What Does WARC Stand For?
WARC stands for Web ARChive. It is a file format specifically designed to store captured web content in a way that preserves every technical detail of how that content was delivered over the internet. If you have ever wondered how organisations preserve websites for legal, compliance, or historical purposes, the answer almost always involves WARC files.
The WARC format was developed by the International Internet Preservation Consortium (IIPC), a group of national libraries, archives, and technology organisations dedicated to preserving the web. After years of collaborative development, the format was formalised as an international standard: ISO 28500, first published in 2009 and updated in 2017 as ISO 28500:2017.
Today, WARC is the most widely adopted standard for website archiving. It is used by the Internet Archive, national libraries around the world, government agencies, and enterprise archiving providers including Aleph Archives.
How a WARC File Works
To understand a WARC file, it helps to understand what happens when you visit a website. When your browser loads a page, it does not receive a single file. It sends an HTTP request to the web server and receives an HTTP response. That response includes headers (technical metadata such as the content type, server information, caching instructions, and timestamps) and the body (the actual content – HTML, CSS, JavaScript, images, or other resources). A single web page may trigger dozens or even hundreds of these request-response exchanges as the browser fetches the HTML document, stylesheets, scripts, images, fonts, and third-party resources.
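The split between headers and body described above can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `split_http_response` and the sample response are made up for illustration, and it assumes a simple, uncompressed HTTP/1.x response held entirely in memory.

```python
def split_http_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line, headers, and body."""
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return status_line, headers, body


# Hypothetical response, invented for this example.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Cache-Control: max-age=3600\r\n"
    b"Date: Mon, 01 Jan 2024 12:00:00 GMT\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

status_line, headers, body = split_http_response(raw_response)
print(status_line)
for name, value in headers:
    print(f"{name}: {value}")
print(len(body), "bytes of body")
```

A screenshot or PDF keeps only a rendering of the body; the status line and headers printed here are the parts that simpler formats discard.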
A WARC file captures all of this. Every HTTP exchange is stored as a set of discrete, linked records within the WARC file, which together hold:
- The HTTP request – exactly what was sent to the server, including headers
- The HTTP response – exactly what the server returned, including all headers and the full response body
- Timestamps – the precise date and time of each capture in UTC, recorded to at least the second
- Metadata – information about the archiving process itself, such as which software performed the capture and what configuration was used
This means a WARC file does not just preserve what a website looked like. It preserves how the website was delivered – the complete technical transaction between client and server.
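To make the record structure concrete, the sketch below serialises one captured exchange as two WARC records: a response record and a request record that points back to it through the standard `WARC-Concurrent-To` field. The helper name `build_warc_record`, the example URL, and the HTTP messages are assumptions made for illustration. Real archiving tools also add fields such as digests and a `warcinfo` record, and usually compress each record.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(warc_type, target_uri, http_message, concurrent_to=None):
    """Serialise one WARC record (header fields + content block) as bytes."""
    record_id = f"<urn:uuid:{uuid.uuid4()}>"
    capture_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    fields = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", capture_time),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", f"application/http;msgtype={warc_type}"),
        ("Content-Length", str(len(http_message))),
    ]
    if concurrent_to:
        # Links this record to another record from the same capture event.
        fields.append(("WARC-Concurrent-To", concurrent_to))
    header = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields) + "\r\n"
    # Each record ends with two CRLFs after the content block.
    return record_id, header.encode("utf-8") + http_message + b"\r\n\r\n"


# Hypothetical exchange, invented for this example.
uri = "http://example.com/"
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Hello</html>"

response_id, response_record = build_warc_record("response", uri, response)
_, request_record = build_warc_record("request", uri, request, concurrent_to=response_id)

warc_bytes = response_record + request_record
print(warc_bytes.decode("utf-8"))
```

The printed output shows the pattern that repeats throughout a WARC file: a version line, named header fields, a blank line, and then the untouched HTTP message.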
The ISO 28500:2017 Standard
ISO 28500:2017 defines the WARC file format in precise technical detail. The standard specifies the structure of WARC records, the required and optional fields, the encoding rules, and the mechanisms for linking related records together.
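As a rough illustration of what "required fields" means in practice, the sketch below parses a single uncompressed WARC record and checks for the four header fields the standard makes mandatory on every record: `WARC-Record-ID`, `Content-Length`, `WARC-Date`, and `WARC-Type`. The function names and the sample record are hypothetical, and the parser is deliberately simplified (for example, it ignores header line folding and compression).

```python
MANDATORY_FIELDS = ("WARC-Record-ID", "Content-Length", "WARC-Date", "WARC-Type")


def parse_warc_record(raw: bytes):
    """Parse one uncompressed WARC record into (version, fields, block)."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    fields = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    # Content-Length gives the exact size of the content block in bytes.
    length = int(fields.get("Content-Length", "0"))
    return version, fields, rest[:length]


def missing_mandatory_fields(fields):
    """Return the mandatory field names absent from a parsed record."""
    present = {name.lower() for name in fields}  # field names are case-insensitive
    return [name for name in MANDATORY_FIELDS if name.lower() not in present]


# Hypothetical record, invented for this example.
sample = (
    b"WARC/1.1\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:12345678-1234-5678-1234-567812345678>\r\n"
    b"WARC-Date: 2024-01-01T12:00:00Z\r\n"
    b"WARC-Target-URI: http://example.com/hello.txt\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

version, fields, block = parse_warc_record(sample)
print(version, fields["WARC-Type"], block)
print("missing:", missing_mandatory_fields(fields))
```

Because the field names and layout are fixed by the specification, a check like this works on records written by any conforming tool.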
Why does having an ISO standard matter? For several important reasons:
Interoperability. A WARC file created by one tool can be read by any other tool that supports the standard. Your archives are not locked into a proprietary format that depends on a single vendor’s software.
Longevity. ISO standards are designed to endure. The WARC format will remain readable for decades because its specification is publicly documented and maintained by an international standards body. This is critical for archives that must be preserved for ten, twenty, or fifty years.
Legal acceptance. An archive stored in a publicly documented ISO format generally carries more weight with courts and regulatory bodies than one stored in a proprietary or ad hoc format, because the standard provides a verifiable framework for how the data was structured and preserved.
Completeness. The standard defines record types for the full HTTP transaction, covering the request and the response, not just the visible content. An archive that records all of them provides the level of detail needed to prove exactly what was published on a website at a specific point in time.
WARC vs. Other Formats
Organisations sometimes attempt to archive websites using simpler methods. Each of these approaches has significant limitations compared to WARC-based archiving.
WARC vs. Screenshots (PNG/JPEG)
A screenshot captures a static image of what appears on screen at a single moment. It preserves the visual appearance of a page but nothing else. There are no links, no underlying HTML, no metadata about how the content was delivered, and no way to verify that the screenshot has not been altered. A standard screenshot also misses content below the fold, behind interactive elements, or within embedded media players. For legal purposes, a screenshot is a photograph of evidence rather than the evidence itself.
WARC vs. PDF
A PDF export of a web page preserves text and layout in a portable format, but it strips away interactivity, dynamic content, embedded videos, and the technical context of how the page was delivered. PDFs do not capture HTTP headers, server responses, or the dozens of auxiliary resources that contribute to a page’s appearance. A PDF of a web page is a flattened representation, not a faithful reproduction.
WARC vs. HTML Download (Save As)
Using a browser’s “Save As” function downloads the HTML file and some associated resources to your local machine. This approach frequently breaks: images fail to load, CSS stylesheets are missing, JavaScript does not execute, and the page looks nothing like the original. There are no timestamps, no HTTP headers, and no chain of custody. It is the digital equivalent of tearing pages from a book and hoping you captured the whole story.
WARC vs. MHTML
MHTML (MIME HTML) bundles a web page and its resources into a single file. It is better than a simple HTML download, but it does not record the original HTTP requests or the full response headers, does not reliably preserve JavaScript-driven content, and has inconsistent support across browsers. It was never designed for archival purposes and does not meet the requirements of any major compliance framework.
Why WARC Is the Gold Standard
For any organisation that needs to prove what its website displayed at a specific point in time, whether for regulatory compliance, intellectual property protection, litigation support, or corporate governance, WARC is the only format discussed here that preserves the complete record needed for legally defensible evidence.
A WARC-based archive can demonstrate:
- Exactly what content was published on every page of the website
- Exactly when it was captured, with precise timestamps
- Exactly how it was delivered, including all HTTP headers and server responses
- The complete technical context, including all resources that contributed to the page’s appearance and functionality
This level of completeness is what distinguishes a proper website archive from a collection of screenshots or PDF exports. In legal and regulatory contexts, the difference is consequential. A screenshot can be challenged as incomplete, manipulated, or taken out of context. A WARC-based archive, with its complete HTTP transaction records and metadata, is far more difficult to dispute.
How Aleph Archives Uses WARC
At Aleph Archives, every website archive we produce is stored in fully ISO 28500-compliant WARC files. We adopted the WARC standard when we founded the company in 2010, and it has been the foundation of everything we build.
But we go beyond the standard. Every WARC file we produce is secured with two independent cryptographic hashes, computed with the SHA-512 and RIPEMD-160 algorithms. These hashes create a unique digital fingerprint for every archived resource. If even a single byte of the archived content were modified after capture, the recomputed hash would no longer match the recorded one, revealing the tampering as soon as the archive is verified.
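The general idea of fingerprinting content with two hash algorithms can be sketched with Python's standard `hashlib` module. This is an illustration of the technique only, not a description of Aleph Archives' actual implementation: the function names and sample content are hypothetical, and RIPEMD-160 is skipped when the local OpenSSL build does not provide it.

```python
import hashlib


def fingerprint(data: bytes) -> dict:
    """Return hex digests of data under SHA-512 and, if available, RIPEMD-160."""
    digests = {"sha512": hashlib.sha512(data).hexdigest()}
    try:
        # RIPEMD-160 is supplied by OpenSSL and may be missing on some systems.
        digests["ripemd160"] = hashlib.new("ripemd160", data).hexdigest()
    except ValueError:
        pass
    return digests


def is_unmodified(data: bytes, recorded: dict) -> bool:
    """True only if every recorded digest still matches the data."""
    return fingerprint(data) == recorded


# Hypothetical archived content, invented for this example.
original = b"HTTP/1.1 200 OK\r\n\r\n<html>Archived page</html>"
recorded = fingerprint(original)  # stored at capture time

tampered = original.replace(b"Archived", b"Modified")
print(is_unmodified(original, recorded))
print(is_unmodified(tampered, recorded))
```

Using two unrelated algorithms means an attacker would have to find content that collides under both at once, which is considerably harder than defeating either one alone.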
Our archives are stored on WORM (Write Once Read Many) storage, which physically prevents modification after the initial write. Combined with cryptographic verification, this creates a tamper-evident chain of custody from the moment of capture through long-term storage.
The result is a website archive that is not merely a copy of what was online. It is a cryptographically verified, independently auditable, legally defensible record of exactly what was published on the web, preserved in the international standard format designed for precisely this purpose.
Getting Started
Whether you need to archive your own corporate website for compliance, preserve a competitor’s published claims for intellectual property protection, or maintain a complete record of your digital presence for governance purposes, the WARC format and ISO 28500 standard provide the foundation for reliable, long-term website preservation.
If you have questions about how WARC-based archiving can serve your organisation’s needs, contact Aleph Archives. We have been building on this standard since 2010, and we are always happy to explain how it works.


