The Most Dangerous Assumption in Digital Compliance
There is a belief that persists across corporate IT departments, legal teams, and compliance offices worldwide: “We back up our CMS every night, so our website is archived.” This assumption is understandable. It is also wrong. And for organisations operating in regulated industries, it can be genuinely dangerous.
A content management system backup and a website archive are fundamentally different things. They serve different purposes, capture different data, and provide different levels of legal protection. Confusing the two can leave an organisation exposed precisely when it believes it is covered.
What a CMS Backup Actually Captures
When you back up a WordPress, Drupal, Sitecore, or Adobe Experience Manager installation, you are typically preserving two things:
- The database – which contains your content in raw form: text fields, metadata, configuration settings, user accounts, and the relationships between content objects
- The file system – which contains uploaded media (images, PDFs, videos), theme files, plugin code, and configuration files
This is the raw material from which your website is assembled. It is not the website itself.
Think of it this way: a CMS backup is like preserving all the ingredients and the recipe for a meal. It is not the same as preserving the meal as it was served to your guests. The finished dish – the website as it appears to visitors – requires a kitchen (the web server), a chef (the CMS rendering engine), and all the side dishes brought by other guests (third-party content, CDN-hosted resources, external scripts).
What a CMS Backup Misses
The gap between what a CMS backup contains and what a visitor actually sees on your website is significant. Here is what falls through the cracks:
Rendered Output
Your CMS takes raw content from the database and transforms it through templates, themes, and rendering logic into the HTML that visitors see. A CMS backup preserves the inputs to this process, not the output. If your theme changes, your plugins update, or your rendering logic evolves, restoring an old backup may produce a page that looks nothing like what was actually published at the time.
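A minimal Python sketch can make this concrete. The content row, the two template versions, and the `render` helper below are hypothetical stand-ins for a real CMS rendering pipeline; the only point is that identical stored content yields different HTML once the template changes.

```python
from string import Template

# Hypothetical database row: this is the part a CMS backup preserves.
content = {"title": "Risk Disclosure", "body": "Capital is at risk."}

# Two hypothetical versions of the same page template.
template_v1 = Template("<h1>$title</h1><p class='warning'>$body</p>")
template_v2 = Template("<h2>$title</h2><details><summary>More</summary>$body</details>")


def render(template: Template, row: dict) -> str:
    """Combine stored content with a template, as a CMS does at request time."""
    return template.substitute(row)


published_then = render(template_v1, content)  # what visitors saw at the time
restored_now = render(template_v2, content)    # what a restore produces after a theme change

# Same database content, different published page.
print(published_then == restored_now)
```

In the second version the disclosure text is still present, but it sits inside a collapsed element, which is exactly the kind of difference a regulator may care about and a database backup cannot show.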
Third-Party Content
Modern websites pull content from dozens of external sources: embedded YouTube videos, social media feeds, Google Maps widgets, live chat tools, review aggregators, stock tickers, and advertising networks. None of this content lives in your CMS database. When you restore a backup, this third-party content is either missing or, worse, loaded in its current version rather than the version that was displayed on the date you need to document.
CDN-Hosted Assets
Many enterprise websites serve images, scripts, and stylesheets through content delivery networks. These assets may not be stored in your CMS file system at all, in which case a CMS backup will not include them. If the CDN purges old content or the asset URLs change, those resources are lost.
Dynamic and Personalised Content
If your website displays different content based on user location, device type, time of day, A/B testing configurations, or personalisation rules, a CMS backup cannot capture these variations. The database stores the rules, not the results. The actual experience that a specific visitor had at a specific moment is not preserved.
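The difference between storing rules and storing results can be sketched in a few lines of Python. The rule structure, the `banner_for` function, and the visitor attributes are all hypothetical; real personalisation engines are far more elaborate, but the shape of the problem is the same.

```python
# Hypothetical personalisation rules, as they might sit in a CMS database.
RULES = [
    {"when": {"country": "DE"}, "banner": "Germany-only offer"},
    {"when": {"country": "GB", "device": "mobile"}, "banner": "UK mobile offer"},
]
DEFAULT_BANNER = "Standard offer"


def banner_for(visitor: dict) -> str:
    """Return the banner of the first rule whose conditions all match the visitor."""
    for rule in RULES:
        if all(visitor.get(key) == value for key, value in rule["when"].items()):
            return rule["banner"]
    return DEFAULT_BANNER


# The backup contains RULES. It does not contain this mapping of visitors to results.
visitors = [
    {"country": "DE", "device": "desktop"},
    {"country": "GB", "device": "mobile"},
    {"country": "GB", "device": "desktop"},
]
for visitor in visitors:
    print(visitor, "->", banner_for(visitor))
```

A backup of `RULES` says what could have been shown. Only a capture taken from a given visitor context records what actually was shown.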
JavaScript-Rendered Content
Websites built with modern JavaScript frameworks like React, Angular, or Vue often generate their visible content in the browser. In that case the CMS may serve only a minimal HTML shell and a bundle of JavaScript. A CMS backup preserves this shell and code, but not the fully rendered page that visitors actually see.
CMS Backup vs. WARC-Based Website Archive
| Aspect | CMS Backup | WARC-Based Archive |
|---|---|---|
| What is preserved | Database + file system | Complete HTTP request-response pairs for every resource |
| Rendered output | Not captured | Fully rendered pages as visitors saw them |
| Third-party content | Not included | Captured as part of the page |
| Timestamps | Backup creation time only | Per-resource capture timestamps |
| HTTP headers | Not included | Full request and response headers preserved |
| Cryptographic verification | Not standard | Hash-based integrity verification available |
| Legal defensibility | Weak – no proof of what was displayed | Strong – complete record of published content |
| Chain of custody | Not established | Documented from capture to storage |
| Format standard | Vendor-specific | ISO 28500 international standard |
| Replay capability | Requires full CMS stack to render | Can be replayed independently in a browser |
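
To show what a per-resource record with timestamps, headers, and a digest looks like, the following Python sketch assembles a single WARC `response` record by hand using only the standard library. The `build_warc_response` function and the example URL are assumptions made for illustration. This is not a complete or production-grade WARC writer; dedicated tools such as the warcio library handle request records, compression, and many details omitted here.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_response(uri: str, http_response: bytes) -> bytes:
    """Wrap one raw HTTP response (status line, headers, body) in a WARC record."""
    # Digest over the whole record block; SHA-1 in Base32 is a common convention.
    block_digest = base64.b32encode(hashlib.sha1(http_response).digest()).decode("ascii")
    captured_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {captured_at}",           # per-resource capture timestamp
        f"WARC-Target-URI: {uri}",
        f"WARC-Block-Digest: sha1:{block_digest}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines, a blank line, the content block, then two CRLFs close the record.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + http_response + b"\r\n\r\n"


# Hypothetical captured response for an example URL.
http_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<h1>Risk Disclosure</h1>"
)
record = build_warc_response("https://www.example.com/disclosures", http_response)
print(record.decode("utf-8"))
```

The HTTP status line and response headers are stored verbatim inside the record, which is why the archive can later be replayed without the original CMS.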
The Legal and Compliance Gap
For organisations in regulated industries – financial services, pharmaceuticals, healthcare, telecommunications, government – the distinction between a CMS backup and a website archive is not academic. It has direct legal consequences.
No Proof of Publication
A CMS backup can demonstrate what content existed in your database. It cannot prove what that content looked like when it was published on your website. In a regulatory investigation or litigation, the question is not “What was in your database?” but “What did visitors to your website actually see?”
No Timestamps on Content
A CMS backup has a timestamp for when the backup was created. It does not have timestamps for when specific pages were rendered and displayed to visitors. If a regulator asks when a specific disclosure was visible on your website, a CMS backup cannot answer that question.
No Chain of Custody
Legal defensibility requires an unbroken chain of custody: a documented record of how evidence was captured, stored, and protected from the moment of creation. CMS backups are typically created by automated scripts, stored on shared servers, and managed by IT teams with broad access. There is typically no cryptographic proof that the backup has not been modified since creation.
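One common way to make a custody record tamper-evident is a hash chain, in which every log entry includes the hash of the entry before it. The Python sketch below illustrates the idea under stated assumptions: the `append_entry` and `verify_log` functions, the field names, and the timestamps are hypothetical, and a real system would also need protected storage and access controls around the log itself.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _hash_body(body: dict) -> str:
    """Hash a log entry's fields in a stable (sorted-key) JSON encoding."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, event: str, artifact_sha256: str, timestamp: str) -> dict:
    """Append a custody event that is linked to the previous entry by its hash."""
    body = {
        "event": event,
        "artifact_sha256": artifact_sha256,
        "timestamp": timestamp,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = {**body, "entry_hash": _hash_body(body)}
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Recompute every hash and link; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev_hash or _hash_body(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


log = []
digest = hashlib.sha256(b"example archive bytes").hexdigest()
append_entry(log, "captured", digest, "2024-01-15T09:00:00Z")
append_entry(log, "stored", digest, "2024-01-15T09:05:00Z")
intact = verify_log(log)

log[0]["event"] = "edited"      # simulate someone altering the record afterwards
tampered = verify_log(log)
print(intact, tampered)
```

A nightly backup script that simply copies files to a shared server leaves no comparable trail, which is the gap this section describes.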
No Independence from the Platform
To demonstrate what a page looked like using a CMS backup, you must restore the backup – which requires the same CMS version, the same plugins, the same theme, and the same server configuration. If any of these components have changed (as they inevitably do over time), the restored site may not accurately represent what was published. A WARC-based archive, by contrast, is self-contained and can be replayed independently.
When CMS Backups Fail
Consider these scenarios where a CMS backup proves inadequate:
Regulatory audit. A financial regulator asks your firm to demonstrate exactly what risk disclosures were visible on your website on a specific date six months ago. Your CMS backup contains the disclosure text, but the page template has been updated three times since then. Restoring the backup produces a page that looks different from what was actually published. You cannot prove compliance.
Intellectual property dispute. A competitor claims you copied their product descriptions. You need to prove that your website displayed your original content before theirs was published. Your CMS backup shows when content was added to the database, but not when or how it appeared on the live website. The timestamps do not prove publication.
Brand integrity incident. Your website was defaced or displayed incorrect information due to a compromised plugin. You need to document exactly what visitors saw during the incident. Your CMS backup from that period shows the correct database content – the defacement happened at the rendering layer, which the backup did not capture.
Compliance deadline. A new regulation requires you to have displayed specific terms and conditions on your website by a certain date. Your CMS backup shows the content was in the database, but you cannot prove it was actually rendered and visible to visitors on the required date.
What You Need Instead
A proper website archive captures your website as it actually appeared to visitors – the fully rendered pages, with all resources, in the ISO 28500 WARC format. It provides per-page timestamps, complete HTTP transaction records, and cryptographic verification that the archive has not been modified since capture.
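As an illustration of what "cryptographic verification" means in practice, the sketch below computes a SHA-512 digest of an archive file at capture time and checks it again later. The file name and the `sha512_of_file` and `is_unmodified` helpers are hypothetical, and the approach only works if the recorded digest is itself kept somewhere it cannot be altered.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha512_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at capture time."""
    return hmac.compare_digest(sha512_of_file(path), recorded_digest)


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.warc"          # hypothetical archive file
    archive.write_bytes(b"WARC/1.1\r\n(captured records)")
    recorded = sha512_of_file(archive)            # digest recorded at capture time

    unchanged = is_unmodified(archive, recorded)  # file is still as captured

    archive.write_bytes(b"WARC/1.1\r\n(altered records)")
    changed = is_unmodified(archive, recorded)    # any change alters the digest

print(unchanged, changed)
```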
At Aleph Archives, we capture websites using browser-based archiving technology that renders every page exactly as a visitor would see it, including JavaScript-generated content, third-party resources, and dynamic elements. Every capture is stored in ISO 28500-compliant WARC files with SHA-512 and RIPEMD-160 cryptographic hashes, on WORM storage that physically prevents modification.
This is not a replacement for your CMS backup. You should absolutely continue backing up your CMS for disaster recovery: a backup helps you restore your website after a technical failure. A website archive serves a fundamentally different purpose: it records what your website displayed to the world and when, and it provides legally defensible evidence of that fact.
The Bottom Line
Back up your CMS for disaster recovery. Archive your website for compliance, legal protection, and institutional memory. They are not the same thing, and one cannot substitute for the other.
If your organisation currently relies on CMS backups as its website archiving strategy, contact us to discuss how proper WARC-based archiving can close the gap.


