Mar: The Modern Archive Format

Core data infrastructure for data of all sizes.

Mar is EarthFrame's archive format and data interaction toolkit. If data is the new oil, think of Mar as the new tar. Mar improves on traditional archive formats like TAR and ZIP by adding efficient random access, modern compression algorithms, built-in checksums, good defaults with user customization, and more.



Archiving Your Data

Mar, much like TAR and ZIP, is both an archive format and a command-line tool for working with that format. Mar uses a well-defined binary format that includes a header and one or more data blocks. Mar is often faster than TAR, and Mar archives are often the same size as or smaller than tarballs compressed with GZIP or LZ4.

Create archives of your files with a simple command:

Create
mar create <archive name> <one or more files or directories>

Compression and Space Savings

Mar supports modern compression algorithms, including ZSTD, and uses the highly optimized libdeflate library for GZIP compression. Mar archives are usually the same size as or smaller than comparable compressed tarballs, and much faster to create.

Mar archives can be listed without unpacking any data blocks, so you do not need to decompress your files to disk to see what's inside an archive.

List
mar list <archive>
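
To see why a header index makes listing and single-file access cheap, here is a minimal Python sketch of a toy container. It is not the Mar binary format: the 8-byte length prefix, the JSON header, the field names, and the zlib codec are all assumptions chosen for illustration.

```python
import io
import json
import struct
import zlib

# Toy container, NOT the Mar format: an 8-byte big-endian header length,
# a JSON header describing every file, then one independently compressed
# block per file.


def build_archive(files: dict[str, bytes]) -> bytes:
    entries, blocks, offset = [], [], 0
    for path, data in files.items():
        block = zlib.compress(data)
        entries.append({
            "path": path,
            "codec": "zlib",           # decompression info travels with the data
            "offset": offset,          # relative to the start of the block area
            "size": len(block),        # compressed size in bytes
            "crc32": zlib.crc32(data), # checksum of the uncompressed bytes
        })
        blocks.append(block)
        offset += len(block)
    header = json.dumps(entries).encode("utf-8")
    return struct.pack(">Q", len(header)) + header + b"".join(blocks)


def read_header(stream):
    """Return (entries, byte position where the block area starts)."""
    stream.seek(0)
    (header_len,) = struct.unpack(">Q", stream.read(8))
    entries = json.loads(stream.read(header_len))
    return entries, 8 + header_len


def list_paths(stream):
    # Only the header is read; no data block is touched or decompressed.
    entries, _ = read_header(stream)
    return [entry["path"] for entry in entries]


def extract_one(stream, path):
    # Seek straight to the requested block and decompress only that block.
    entries, data_start = read_header(stream)
    entry = next(e for e in entries if e["path"] == path)
    stream.seek(data_start + entry["offset"])
    data = zlib.decompress(stream.read(entry["size"]))
    if zlib.crc32(data) != entry["crc32"]:
        raise ValueError(f"checksum mismatch for {path}")
    return data


archive = build_archive({
    "big_data/big_data_1.pq": b"a" * 10_000,
    "little_data/notes.txt": b"hello from the archive",
})
stream = io.BytesIO(archive)
print(list_paths(stream))
print(extract_one(stream, "little_data/notes.txt"))
```

Because each block is compressed independently and located through the header, the cost of reading one file does not depend on how many other files the archive holds.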

Sharing Your Data

Mar makes it easy to share large collections of files as a single archive. Recipients can extract all files, get individual files by their path, or even pipe data directly from the archive into other processes.

Create
mar create my_archive.mar big_data/ little_data/

Extract everything:

Extract All
mar extract my_archive.mar

Or extract specific files:

Extract Specific Files
mar extract my_archive.mar big_data/big_data_1.pq big_data/big_data_104.pq
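
For scripted workflows, the commands above can be driven from Python. The sketch below uses only the invocation forms shown on this page; the helper names `mar_argv` and `run_mar` are hypothetical, and nothing is assumed about the output format of mar itself.

```python
import shutil
import subprocess


def mar_argv(subcommand: str, *args: str) -> list[str]:
    """Build an argument list for the mar CLI (no shell involved)."""
    return ["mar", subcommand, *args]


def run_mar(subcommand: str, *args: str) -> subprocess.CompletedProcess:
    """Run mar and raise CalledProcessError if it exits with an error."""
    if shutil.which("mar") is None:
        raise FileNotFoundError("mar executable not found on PATH")
    return subprocess.run(mar_argv(subcommand, *args), check=True)


# Show the commands that would be run, without executing them.
print(mar_argv("create", "my_archive.mar", "big_data/", "little_data/"))
print(mar_argv("extract", "my_archive.mar", "big_data/big_data_1.pq"))
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.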

Data Redaction

Mar supports data redaction within the archive, enabling data removal without having to fully extract and re-archive. Perfect for compliance and privacy workflows.

Redact Data
mar redact -o redacted_archive.mar my_archive.mar big_data/big_data_4.pq

Universal Accessibility

Mar archives store decompression information right in the archive itself, so recipients don't have to guess or trust your file extensions. By creating a strict specification, Mar makes it easy to reliably share your data.

  • Cross-platform – Supports Linux and macOS
  • Multi-architecture – Tested on x86 and ARM systems
  • Easy installation – Coming to package managers

Reducing Cloud Storage Costs

Mar compresses your data and makes random access efficient. Compressed archives take less space than the same data stored uncompressed, often by a factor of 2 to 4.
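
As a rough worked example of what a compression factor of 2 to 4 means for a storage bill, the sketch below uses a hypothetical dataset size and a hypothetical price per TB per month. Neither number comes from Mar or from any cloud provider.

```python
def stored_size_tb(raw_tb: float, compression_ratio: float) -> float:
    """Size on disk after compression, given raw size and ratio (raw / stored)."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_tb / compression_ratio


def monthly_storage_cost(raw_tb: float, compression_ratio: float,
                         price_per_tb_month: float) -> float:
    return stored_size_tb(raw_tb, compression_ratio) * price_per_tb_month


RAW_TB = 10.0               # hypothetical dataset size
PRICE_PER_TB_MONTH = 20.0   # hypothetical price, in any currency

for ratio in (1.0, 2.0, 4.0):
    size = stored_size_tb(RAW_TB, ratio)
    cost = monthly_storage_cost(RAW_TB, ratio, PRICE_PER_TB_MONTH)
    print(f"ratio {ratio:.0f}x: {size:.1f} TB stored, {cost:.2f} per month")
```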

With the mar-s3 package (currently in private beta), you can selectively list files in your remote archive and download just the ones you need, significantly reducing egress costs. mar-s3 even includes a caching layer for efficient repeated downloads.


Indexing and Searching

Mar's indexing system includes support for sidecar files that associate new indices with the filename index in the header. This enables implementing new index types without needing to update the core specification.

Today, Mar supports lexical similarity search using the MinHash sidecar index, enabling retrieval of similar texts from an archive based on a query. Full semantic search using a Mar vector sidecar index and the mar-embed package is expected in the April release.
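
For readers unfamiliar with MinHash, the following Python sketch shows the general technique: each text is reduced to a short signature, and the fraction of matching signature positions estimates the Jaccard similarity of the texts' shingle sets. This is not Mar's sidecar index; the shingle size, the signature length, and the use of salted BLAKE2b as the hash family are assumptions made for illustration.

```python
import hashlib

NUM_HASHES = 64  # signature length; an arbitrary choice for this sketch


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of lowercase character k-grams in the text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed: int, shingle: str) -> int:
    # A salted BLAKE2b digest stands in for one member of a hash family.
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "big"),
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(text: str) -> list[int]:
    # For each hash function, keep the smallest hash over all shingles.
    grams = shingles(text)
    return [min(_hash(seed, g) for g in grams) for seed in range(NUM_HASHES)]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    # The fraction of matching positions estimates the Jaccard similarity.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


documents = {
    "fox_1": "the quick brown fox jumps over the lazy dog",
    "fox_2": "the quick brown fox jumped over a lazy dog",
    "other": "zstd compression makes archives smaller",
}
query_sig = minhash_signature("a quick brown fox jumps over the dog")
ranked = sorted(
    documents,
    key=lambda name: estimated_similarity(
        query_sig, minhash_signature(documents[name])),
    reverse=True,
)
print(ranked)
```

In a real index the signatures would be computed once and stored alongside the archive, so a query only needs to compare short signatures rather than reread every text.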


Built for Agents

Mar's self-describing CLI makes it easy for agents and LLMs to figure out how to use it without needing a specialized MCP server or fine-tuning. Mar was designed from day one to be agent-driven and works great with modern tool-calling LLMs right out of the box—and it's only getting better.


The Future of Data Sovereignty

At EarthFrame, Mar already powers our internal data sovereignty toolkit, stores versioned releases of our documentation, and allows us to easily share files and data with each other. We're building an ecosystem of tools around Mar to make storing, archiving, and sharing your data easier, faster, and cheaper.

Mar's development is just getting started, and we expect a 1.0 release in late 2026.


Get Started with Mar

Mar is available as open source on GitHub under the Apache-2.0 license. Explore the full source code and documentation, and contribute to the project:

View Mar on GitHub →

Installation and detailed usage examples are available in the Mar GitHub repository.