# Formats and Encodings

`qip` intentionally chooses old, boring, open formats. The benefits:

- Simpler parsing and fewer edge cases.
- Broad ecosystem of existing tooling.
- Easier for coding agents to generate correct implementations.
- Likely to still work in 10+ years.
- Reduced lock-in to proprietary systems.

For `qip`'s one-input/one-output module model, stable interchange formats make modules more reusable: any module that emits a format can feed any module that consumes it.

## Preferred formats

Formats currently supported directly by a `qip` command or by the modules in this repo's `modules/` directory:

- `application/warc`: website snapshots
- `application/x-tar`: directory archive as one input/output blob
- `image/bmp`: simple uncompressed raster interchange
- `image/svg+xml`: text-based vector graphics that LLMs can read and generate directly
- `image/x-icon`
- `image/gif`
- `text/markdown`
- `text/html`
- `text/javascript`
- `text/x-c`
- `application/vnd.sqlite3`
- `application/xml`

Examples:

- `qip route warc ...` emits `application/warc`
- `modules/image/svg+xml/svg-rasterize.wasm` maps `image/svg+xml -> image/bmp`
- `modules/application/warc/warc-to-static-tar-no-trailing-slash.wasm` maps `application/warc -> application/x-tar`

Tradeoffs:

- BMP is larger than PNG/JPEG on disk, but excellent as an internal interchange format because it is straightforward to parse and transform.
- Tar is uncompressed; in the future we might pair it with compression.
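
To illustrate why BMP is straightforward to parse, here is a minimal sketch in Python that builds an uncompressed 24-bit BMP and reads its headers back with fixed-offset `struct` calls. The helper names `build_bmp_24` and `parse_bmp_header` are hypothetical and are not part of `qip`; the sketch assumes the common 40-byte `BITMAPINFOHEADER` variant and is not how any `qip` module is implemented.

```python
import struct

def build_bmp_24(width, height, rgb):
    """Build a minimal uncompressed 24-bit BMP filled with one color."""
    row_size = (width * 3 + 3) & ~3  # each row is padded to a multiple of 4 bytes
    r, g, b = rgb
    # Pixels are stored in B, G, R order.
    row = bytes([b, g, r]) * width + b"\x00" * (row_size - width * 3)
    pixels = row * height
    offset = 14 + 40  # file header + BITMAPINFOHEADER
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,   # header size, width, height (positive = bottom-up)
        1, 24, 0,            # planes, bits per pixel, compression (0 = none)
        len(pixels),         # size of the pixel data in bytes
        2835, 2835,          # pixels per metre (about 72 DPI)
        0, 0,                # palette colors used / important
    )
    return file_header + info_header + pixels

def parse_bmp_header(data):
    """Read the fixed-offset fields needed to locate and interpret pixels."""
    magic, file_size, _, _, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    if magic != b"BM":
        raise ValueError("not a BMP file")
    _, width, height, _, bpp, compression = struct.unpack_from("<IiiHHI", data, 14)
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "width": width,
        "height": height,
        "bits_per_pixel": bpp,
        "compression": compression,
    }

header = parse_bmp_header(build_bmp_24(3, 2, (255, 0, 0)))
```

Everything a consumer needs sits at a fixed offset, with no decompression step, which is the property that makes the larger file size an acceptable trade inside a pipeline.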

## Encodings

Formats and encodings are at different layers:

- Formats are container/file semantics (`image/svg+xml`, `application/warc`, `image/bmp`).
- Encodings are byte/value representations used within processing stages.

`qip` currently supports these encodings:

- `UTF-8` for text pipelines (`input_utf8_cap` / `output_utf8_cap`)
- `RGBA f32` for image filter tiles in `qip image` (`tile_rgba_f32_64x64`)

Why these two:

- UTF-8 is the default, broadly interoperable text encoding. It is ASCII-compatible and has no byte-order variants, which makes it easier to process than alternatives like UTF-16.
- RGBA float32 preserves precision during chained image transforms and maps cleanly to GPU/shader-style workflows.
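
The precision argument for RGBA float32 can be sketched as follows. This example assumes a 64x64 tile of interleaved little-endian float32 RGBA values in the range 0.0 to 1.0; that layout is an assumption made for illustration, not a description of the actual `tile_rgba_f32_64x64` interface. The function names are hypothetical.

```python
import struct

TILE = 64       # tile edge length in pixels (assumed from the name 64x64)
CHANNELS = 4    # R, G, B, A
COUNT = TILE * TILE * CHANNELS

def u8_to_f32_tile(rgba8):
    """Convert 8-bit RGBA bytes to packed float32 values in 0.0..1.0."""
    if len(rgba8) != COUNT:
        raise ValueError("expected exactly one 64x64 RGBA tile")
    return struct.pack(f"<{COUNT}f", *(v / 255.0 for v in rgba8))

def scale_f32_tile(tile, factor):
    """Multiply every channel by a factor, staying in float32."""
    values = struct.unpack(f"<{COUNT}f", tile)
    return struct.pack(f"<{COUNT}f", *(v * factor for v in values))

def f32_tile_to_u8(tile):
    """Quantize back to 8-bit only once, at the end of the chain."""
    values = struct.unpack(f"<{COUNT}f", tile)
    return bytes(min(255, max(0, round(v * 255.0))) for v in values)

source = bytes([5]) * COUNT

# Chained in float32: halve, then double, then quantize once.
tile = u8_to_f32_tile(source)
via_f32 = f32_tile_to_u8(scale_f32_tile(scale_f32_tile(tile, 0.5), 2.0))

# Chained in 8-bit integers: the intermediate result is truncated.
via_u8 = bytes((v // 2) * 2 for v in source)
```

Under these assumptions one tile occupies 64 x 64 x 4 x 4 = 65,536 bytes. The float path returns the original value of 5 after the round trip, while the 8-bit path drifts to 4 because the intermediate result was truncated.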

## Quick Decision Guide

If you need:

- One file that represents many files: use `application/x-tar`
- A snapshot of routed web output: use `application/warc`
- Vector graphics interchange: use `image/svg+xml`
- Simple raster interchange between modules: use `image/bmp`
- General text transforms: use `UTF-8` modules in `modules/utf8/`
- Image filter pipelines: use `RGBA f32` via `qip image`
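
As a sketch of the "one file that represents many files" case, the following Python uses the standard-library `tarfile` module to pack a set of in-memory files into a single uncompressed tar blob and unpack it again. The helper names `pack_files` and `unpack_files` are hypothetical and only illustrate the idea; they are not part of `qip`.

```python
import io
import tarfile

def pack_files(files):
    """Pack a {name: bytes} mapping into one uncompressed tar blob."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):  # sorted for a stable member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()

def unpack_files(blob):
    """Read every regular file out of an uncompressed tar blob."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out

site = {
    "index.html": b"<h1>Hello</h1>",
    "css/site.css": b"body { margin: 0 }",
}
blob = pack_files(site)
restored = unpack_files(blob)
```

Because the whole directory travels as one byte string, it fits a one-input/one-output module boundary without any side channel for extra files.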

## When not to use these defaults

- Use richer app-specific formats only when their extra semantics are required.
- Keep module boundaries on simple formats, then adapt at ingress/egress.
- Use PNG/JPEG/WebP at system edges where compression is the priority.
