April 22, 2019

Why CMS Backups Are Not Website Archives

The Most Dangerous Assumption in Digital Compliance

There is a belief that persists across corporate IT departments, legal teams, and compliance offices worldwide: “We back up our CMS every night, so our website is archived.” This assumption is understandable. It is also wrong. And for organisations operating in regulated industries, it can be genuinely dangerous.

A content management system backup and a website archive are fundamentally different things. They serve different purposes, capture different data, and provide different levels of legal protection. Confusing the two can leave an organisation exposed precisely when it believes it is covered.

What a CMS Backup Actually Captures

When you back up a WordPress, Drupal, Sitecore, or Adobe Experience Manager installation, you are typically preserving two things:

  1. The database – which contains your content in raw form: text fields, metadata, configuration settings, user accounts, and the relationships between content objects
  2. The file system – which contains uploaded media (images, PDFs, videos), theme files, plugin code, and configuration files

This is the raw material from which your website is assembled. It is not the website itself.

Think of it this way: a CMS backup is like preserving all the ingredients and the recipe for a meal. It is not the same as preserving the meal as it was served to your guests. The finished dish – the website as it appears to visitors – requires a kitchen (the web server), a chef (the CMS rendering engine), and all the side dishes brought by other guests (third-party content, CDN-hosted resources, external scripts).

What a CMS Backup Misses

The gap between what a CMS backup contains and what a visitor actually sees on your website is significant. Here is what falls through the cracks:

Rendered Output

Your CMS takes raw content from the database and transforms it through templates, themes, and rendering logic into the HTML that visitors see. A CMS backup preserves the inputs to this process, not the output. If your theme changes, your plugins update, or your rendering logic evolves, restoring an old backup may produce a page that looks nothing like what was actually published at the time.
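
To make the difference between inputs and output concrete, here is a minimal Python sketch. The content row and the two templates are hypothetical stand-ins for a database record and two versions of a theme; a real CMS rendering engine is far more involved, but the principle is the same.

```python
from string import Template

# Hypothetical content row, as it might sit in a CMS database backup.
content = {"title": "Risk Disclosure", "body": "Capital is at risk."}

# Two hypothetical versions of the same theme template.
theme_old = Template("<h1>$title</h1><p class='notice'>$body</p>")
theme_new = Template("<details><summary>$title</summary>$body</details>")


def render(template: Template, row: dict) -> str:
    """Combine stored content with a template, as a rendering engine would."""
    return template.substitute(row)


published_then = render(theme_old, content)  # what visitors saw at the time
restored_now = render(theme_new, content)    # what a restore produces today

# Same backed-up content, different rendered page.
print(published_then == restored_now)  # False
```

The backup holds `content` and, at best, whichever template files existed when the backup ran. It holds neither of the two rendered strings.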

Third-Party Content

Modern websites often pull content from many external sources: embedded YouTube videos, social media feeds, Google Maps widgets, live chat tools, review aggregators, stock tickers, and advertising networks. None of this content lives in your CMS database. When you restore a backup, the third-party content is either missing entirely or, worse, loaded in its current version rather than the version that was displayed on the date you need to document.

CDN-Hosted Assets

Many enterprise websites serve images, scripts, and stylesheets through content delivery networks. These assets may not be stored in your CMS file system at all. A CMS backup will not include them. If the CDN purges old content or the asset URLs change, those resources are lost.

Dynamic and Personalised Content

If your website displays different content based on user location, device type, time of day, A/B testing configurations, or personalisation rules, a CMS backup cannot capture these variations. The database stores the rules, not the results. The actual experience that a specific visitor had at a specific moment is not preserved.
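
The "rules, not results" point can be illustrated with a short Python sketch. The rule table, the default banner, and the `banner_for` function are all hypothetical; real personalisation engines evaluate many more signals than a country code.

```python
# Hypothetical personalisation rules, as a CMS database might store them.
RULES = [
    {"country": "DE", "banner": "Offer available in Germany only"},
    {"country": "GB", "banner": "UK customers: regulatory notice applies"},
]
DEFAULT_BANNER = "Welcome"


def banner_for(country: str) -> str:
    """Evaluate the stored rules for one visitor at request time."""
    for rule in RULES:
        if rule["country"] == country:
            return rule["banner"]
    return DEFAULT_BANNER


# A backup contains RULES and DEFAULT_BANNER. It does not contain this
# mapping, which only exists while real visitors are being served.
served = {country: banner_for(country) for country in ("DE", "GB", "FR")}
print(served["FR"])  # Welcome
```

Restoring the backup restores the rules. Which visitor received which banner, and when, is recorded nowhere unless the rendered responses themselves are captured.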

JavaScript-Rendered Content

Websites built with modern JavaScript frameworks like React, Angular, or Vue often generate much of their visible content in the browser. In that case the CMS may serve only a minimal HTML shell and a bundle of JavaScript. A CMS backup preserves this shell and code, but not the fully rendered page that visitors actually see.

CMS Backup vs. WARC-Based Website Archive

| Aspect | CMS Backup | WARC-Based Archive |
| --- | --- | --- |
| What is preserved | Database + file system | Complete HTTP request-response pairs for every resource |
| Rendered output | Not captured | Fully rendered pages as visitors saw them |
| Third-party content | Not included | Captured as part of the page |
| Timestamps | Backup creation time only | Per-resource capture timestamps |
| HTTP headers | Not included | Full request and response headers preserved |
| Cryptographic verification | Not standard | Hash-based integrity verification available |
| Legal defensibility | Weak – no proof of what was displayed | Strong – complete record of published content |
| Chain of custody | Not established | Documented from capture to storage |
| Format standard | Vendor-specific | ISO 28500 international standard |
| Replay capability | Requires full CMS stack to render | Can be replayed independently in a browser |
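
For readers unfamiliar with WARC, the Python sketch below assembles the header block of a single WARC `response` record using only the standard library. The field names come from ISO 28500. The URL, the sample HTTP response, the choice of SHA-256 for the digest, and the `build_warc_headers` helper are illustrative assumptions. A real capture tool writes the record body as well and typically includes more fields than shown here.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_headers(target_uri: str, http_response: bytes,
                       captured_at: datetime) -> str:
    """Return the header block of one WARC response record (illustrative)."""
    # Digest of the whole record block, i.e. the raw HTTP response bytes.
    digest = hashlib.sha256(http_response).hexdigest()
    # WARC-Date is expressed in UTC; captured_at must be timezone-aware.
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    fields = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {stamp}",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Block-Digest: sha256:{digest}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    return "\r\n".join(fields) + "\r\n\r\n"


# Hypothetical captured response: status line, headers, and body as delivered.
response = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
            b"<h1>Risk Disclosure</h1>")
headers = build_warc_headers(
    "https://example.com/disclosures",
    response,
    datetime(2019, 4, 22, 9, 30, tzinfo=timezone.utc),
)
print(headers)
```

Each captured resource gets its own record of this kind, which is where the per-resource timestamps, preserved HTTP headers, and hash-based verification in the table come from.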

For organisations in regulated industries – financial services, pharmaceuticals, healthcare, telecommunications, government – the distinction between a CMS backup and a website archive is not academic. It has direct legal consequences.

No Proof of Publication

A CMS backup can demonstrate what content existed in your database. It cannot prove what that content looked like when it was published on your website. In a regulatory investigation or litigation, the question is not “What was in your database?” but “What did visitors to your website actually see?”

No Timestamps on Content

A CMS backup has a timestamp for when the backup was created. It does not have timestamps for when specific pages were rendered and displayed to visitors. If a regulator asks when a specific disclosure was visible on your website, a CMS backup cannot answer that question.

No Chain of Custody

Legal defensibility requires an unbroken chain of custody: a documented record of how evidence was captured, stored, and protected from the moment of creation. CMS backups are typically created by automated scripts, stored on shared servers, and managed by IT teams with broad access. There is no cryptographic proof that the backup has not been modified since creation.
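
As a hedged illustration of what cryptographic proof of integrity involves, the Python sketch below records a SHA-512 digest at capture time and checks it again later. The function names and sample bytes are hypothetical. A digest alone is not a full chain of custody: the recorded digest must itself be stored where it cannot be altered, and the capture process must be documented.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-512 digest of the captured bytes as a hex string."""
    return hashlib.sha512(data).hexdigest()


def is_unmodified(data: bytes, recorded_digest: str) -> bool:
    """Check stored bytes against the digest recorded at capture time."""
    return fingerprint(data) == recorded_digest


# Hypothetical captured page and the digest recorded at capture time.
captured = b"<h1>Risk Disclosure</h1><p>Capital is at risk.</p>"
recorded = fingerprint(captured)

# Any later change to the stored bytes changes the digest.
tampered = captured.replace(b"at risk", b"guaranteed")

print(is_unmodified(captured, recorded))  # True
print(is_unmodified(tampered, recorded))  # False
```

A nightly backup script that writes a database dump to a shared server records no such digest, so nothing distinguishes an untouched backup from an edited one.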

No Independence from the Platform

To demonstrate what a CMS backup contained, you must restore it, which requires the same CMS version, the same plugins, the same theme, and the same server configuration. If any of these components have changed (as they inevitably do over time), the restored backup may not accurately represent what was published. A WARC-based archive, by contrast, is self-contained: it stores the responses as they were delivered, so it can be replayed with WARC replay software without the original CMS stack.

When CMS Backups Fail

Consider these scenarios where a CMS backup proves inadequate:

Regulatory audit. A financial regulator asks your firm to demonstrate exactly what risk disclosures were visible on your website on a specific date six months ago. Your CMS backup contains the disclosure text, but the page template has been updated three times since then. Restoring the backup produces a page that looks different from what was actually published. You cannot prove compliance.

Intellectual property dispute. A competitor claims you copied their product descriptions. You need to prove that your website displayed your original content before theirs was published. Your CMS backup shows when content was added to the database, but not when or how it appeared on the live website. The timestamps do not prove publication.

Brand integrity incident. Your website was defaced or displayed incorrect information due to a compromised plugin. You need to document exactly what visitors saw during the incident. Your CMS backup from that period shows the correct database content, because the defacement happened at the rendering layer, and the backup holds no record of the output that layer produced.

Compliance deadline. A new regulation requires you to have displayed specific terms and conditions on your website by a certain date. Your CMS backup shows the content was in the database, but you cannot prove it was actually rendered and visible to visitors on the required date.

What You Need Instead

A proper website archive captures your website as it actually appeared to visitors – the fully rendered pages, with all resources, in the ISO 28500 WARC format. It provides per-page timestamps, complete HTTP transaction records, and cryptographic verification that the archive has not been modified since capture.

At Aleph Archives, we capture websites using browser-based archiving technology that renders every page as a visitor would see it, including JavaScript-generated content, third-party resources, and dynamic elements. Every capture is stored in ISO 28500-compliant WARC files with SHA-512 and RIPEMD-160 cryptographic hashes, on WORM storage that physically prevents modification.

This is not a replacement for your CMS backup. You should absolutely continue backing up your CMS for disaster recovery purposes. But a CMS backup serves a different purpose: it helps you restore your website after a technical failure. A website archive serves a fundamentally different purpose: it records what your website displayed to the world and when, and it provides legally defensible evidence of that fact.

The Bottom Line

Back up your CMS for disaster recovery. Archive your website for compliance, legal protection, and institutional memory. They are not the same thing, and one cannot substitute for the other.

If your organisation currently relies on CMS backups as its website archiving strategy, contact us to discuss how proper WARC-based archiving can close the gap.

See the Most Complete Web Archives in Action

Schedule a 15-minute demo to discover how Aleph Archives automates regulatory web archiving for your organisation.
