The GDPR and Digital Preservation: A Misunderstood Relationship
When the General Data Protection Regulation became applicable on 25 May 2018, it fundamentally changed how European organisations think about data. Every department – from marketing to legal to IT – was forced to re-examine how personal data is collected, processed, and stored. Website archiving was no exception.
Yet seven years later, many organisations still misunderstand how the GDPR applies to their website archiving practices. Some have stopped archiving their websites entirely, believing that preserving web content conflicts with data protection principles. Others continue archiving without any consideration of GDPR requirements, exposing themselves to regulatory risk. Both approaches are wrong.
The reality is more nuanced and, ultimately, more encouraging: proper website archiving, when implemented correctly, actually supports GDPR compliance rather than undermining it. Understanding why requires a careful examination of the regulation’s core principles and how they intersect with the legitimate need to preserve digital records.
Why Organisations Must Archive Their Websites
Before examining the GDPR’s specific requirements, it is worth establishing why website archiving matters in the first place. A modern corporate website is not merely a marketing brochure. It is a legal document. It contains terms and conditions, privacy policies, cookie disclosures, product claims, pricing information, regulatory statements, and promotional content – all of which create legal obligations and liabilities.
Regulatory bodies across Europe require organisations to maintain records of their public-facing digital content. Financial services firms must preserve website content under MiFID II and national financial conduct regulations. Pharmaceutical companies must document their promotional websites to satisfy European Medicines Agency requirements. Public sector organisations face transparency obligations that extend to their digital presence.
When a dispute arises – whether it concerns misleading advertising, contractual terms, or regulatory compliance – the organisation that can produce a verified, timestamped record of exactly what its website displayed on a specific date holds a decisive advantage. The organisation that cannot produce such records faces an uncomfortable burden of proof.
How the GDPR Applies to Website Archives
The GDPR governs the processing of personal data. A website archive will inevitably contain personal data: names in blog posts, photographs of employees, contact details on team pages, testimonials from customers, and personal information submitted through forms. This means website archiving constitutes data processing under the GDPR, and organisations must comply with the regulation’s core principles.
Lawful Basis for Processing
Every processing activity requires a lawful basis under Article 6 of the GDPR. For website archiving, two bases are most commonly applicable.
Legitimate interests (Article 6(1)(f)) is the most robust basis for most private-sector organisations. Public authorities cannot rely on it for processing carried out in the performance of their tasks and will usually need one of the other bases. The legitimate interests in website archiving are substantial: regulatory compliance, legal protection, dispute resolution, and institutional record-keeping. These interests must be balanced against the data subjects’ rights, but in most cases the balance favours archiving, particularly when the archived content was already published on a public-facing website.
Legal obligation (Article 6(1)(c)) applies when sector-specific regulations mandate the preservation of website content. Financial services firms required to archive their websites under national financial conduct rules, for example, have a clear legal obligation that provides an independent basis for processing.
The Right to Erasure and Its Limits
Article 17 of the GDPR – the “right to be forgotten” – is the provision that causes the most confusion in the context of website archiving. Data subjects have the right to request the erasure of their personal data, and organisations must comply unless an exception applies.
The critical point is that the GDPR explicitly provides exceptions to the right to erasure. Article 17(3) permits continued processing where it is necessary for compliance with a legal obligation (Article 17(3)(b)), for archiving purposes in the public interest (Article 17(3)(d)), or for the establishment, exercise, or defence of legal claims (Article 17(3)(e)). These exceptions are directly relevant to website archiving.
An organisation that archives its website for regulatory compliance purposes is not obligated to delete personal data from those archives simply because a data subject requests it, provided the retention is genuinely necessary for the stated purpose. The key word is “necessary.” Organisations must be able to demonstrate that retaining the data serves a specific, documented purpose and that the retention period is proportionate to that purpose.
This does not mean the right to erasure can be ignored. It means organisations need a clear, documented policy that explains why archived website content is retained, for how long, and under what circumstances data may or may not be removed from archives. The policy must be defensible, proportionate, and transparent.
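One way to make such a policy concrete is to record each retention purpose as data and check an erasure request against it. The following is a minimal sketch, not a legal decision procedure: the rule names, the retention lengths, and the `erasure_decision` function are hypothetical, and the Article 17(3) labels are simply whatever grounds the organisation has documented for itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    purpose: str         # documented reason for keeping the capture
    exception: str       # Article 17(3) ground the organisation relies on
    retention_days: int  # how long this purpose is taken to justify retention


def erasure_decision(captured_on: date, requested_on: date,
                     rules: list[RetentionRule]) -> dict:
    """Report which documented purposes still cover a capture on the request date."""
    active = [
        rule for rule in rules
        if requested_on <= captured_on + timedelta(days=rule.retention_days)
    ]
    return {
        # Retention is only defensible while at least one purpose is still running.
        "retain": bool(active),
        "grounds": [f"{rule.purpose} ({rule.exception})" for rule in active],
        # Date on which the last remaining purpose lapses, or None if none apply.
        "review_by": max(
            (captured_on + timedelta(days=rule.retention_days) for rule in active),
            default=None,
        ),
    }


# Hypothetical policy: the periods are illustrative, not taken from any regulation.
RULES = [
    RetentionRule("regulatory record-keeping", "Art. 17(3)(b)", 5 * 365),
    RetentionRule("defence of legal claims", "Art. 17(3)(e)", 3 * 365),
]

decision = erasure_decision(date(2021, 3, 1), date(2025, 3, 1), RULES)
print(decision["retain"], decision["grounds"], decision["review_by"])
```

The output of a check like this is a starting point for a human reviewer. Whether retention is genuinely "necessary" in a given case remains a legal judgment that the code cannot make.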
Data Minimisation and Purpose Limitation
The GDPR requires that personal data be adequate, relevant, and limited to what is necessary (data minimisation) and processed only for specified, explicit, and legitimate purposes (purpose limitation). These principles do not prohibit website archiving, but they require organisations to be intentional about what they archive and why.
A website archive captures the website as it was published – content that the organisation itself chose to make public. Archiving that published content for compliance, legal protection, or institutional record-keeping is consistent with both principles, provided the organisation has documented its purposes and established appropriate retention periods.
Data Sovereignty: Why Hosting Location Matters
The GDPR does not dictate where within the European Economic Area personal data is stored, but it restricts transfers of personal data to countries outside it. Such transfers are subject to additional safeguards, and the regulatory landscape for international transfers has become increasingly complex following the Court of Justice's Schrems II judgment of July 2020.
For European organisations, hosting website archives within the EEA – or in a country with an adequate level of data protection – largely removes transfer-related compliance risk. Switzerland, where Aleph Archives is headquartered, has been recognised by the European Commission as providing an adequate level of data protection since 2000, and the Commission confirmed that finding in its January 2024 review of existing adequacy decisions. This means that personal data in website archives can be transferred to and stored in Switzerland on the same footing as within the EEA, without the complexities of Standard Contractual Clauses, Binding Corporate Rules, or other transfer mechanisms.
This is not a trivial consideration. Organisations that store website archives with providers based in jurisdictions without an adequacy decision – or that route data through servers in such jurisdictions – must implement and maintain additional legal safeguards. These safeguards are administratively burdensome, subject to ongoing legal uncertainty, and potentially vulnerable to challenge by data protection authorities.
Cookie Consent Pages and Their Archiving Implications
Modern European websites are required to obtain informed consent before setting non-essential cookies. This requirement, stemming from the ePrivacy Directive and enforced through national implementations, has produced a ubiquitous feature of the European web: the cookie consent banner.
Cookie consent pages present a specific challenge for website archiving. The content displayed to a visitor depends on their consent choices. A visitor who accepts all cookies may see personalised content, targeted advertising, and embedded third-party elements. A visitor who rejects non-essential cookies may see a materially different version of the same page.
For compliance purposes, both versions may need to be archived. The default view – what a first-time visitor sees before making a consent choice – represents the initial presentation of the website and should always be captured. The fully consented view may also need to be archived, particularly for organisations in regulated industries where the complete user experience is subject to regulatory scrutiny.
A robust website archiving system must be capable of handling consent dialogs intelligently: capturing the initial state, interacting with consent mechanisms, and preserving the resulting page states. This is a technical challenge that distinguishes professional web archiving solutions from simple screenshot tools or basic crawlers that cannot interact with dynamic page elements.
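The capture sequence described above can be sketched as a loop over consent states. This is an illustration under stated assumptions, not a description of any particular archiving product: the state names, the `Capture` record, and the `render` callable are hypothetical, and `fake_render` stands in for a real browser-driven renderer that would click through the consent banner.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

# Hypothetical consent states; the default (no choice made) view comes first.
CONSENT_STATES = ("no_choice", "reject_non_essential", "accept_all")


@dataclass(frozen=True)
class Capture:
    url: str
    consent_state: str
    captured_at: str  # ISO 8601 timestamp in UTC
    body: bytes


def capture_consent_variants(url: str,
                             render: Callable[[str, str], bytes]) -> list[Capture]:
    """Capture the same page once per consent state, labelling each result."""
    captures = []
    for state in CONSENT_STATES:
        body = render(url, state)
        timestamp = datetime.now(timezone.utc).isoformat()
        captures.append(Capture(url, state, timestamp, body))
    return captures


def fake_render(url: str, state: str) -> bytes:
    """Stand-in renderer: only the fully consented view embeds a third-party script."""
    extras = b"<script src='ads.js'></script>" if state == "accept_all" else b""
    return b"<html><body>Pricing page" + extras + b"</body></html>"


captures = capture_consent_variants("https://example.com/pricing", fake_render)
for capture in captures:
    print(capture.consent_state, len(capture.body))
```

Labelling each capture with its consent state keeps the default view and the fully consented view distinguishable later, when someone needs to show which version a given visitor would have seen.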
How Proper Website Archiving Supports GDPR Compliance
Rather than conflicting with the GDPR, a well-implemented website archiving programme actually strengthens an organisation’s compliance posture in several important ways.
Demonstrating Transparency
The GDPR requires organisations to be transparent about how they process personal data. Privacy notices, cookie policies, and data processing disclosures must be accurate, up-to-date, and accessible. A website archive provides a verifiable, timestamped record of exactly what privacy information was published on the organisation’s website at any given time. When a data protection authority asks what privacy notice was in effect on a particular date, an archived record provides a clear answer.
Supporting Data Subject Rights
When data subjects exercise their rights – whether requesting access to their data, objecting to processing, or seeking erasure – organisations must respond without undue delay and in any event within one month of receiving the request, extendable by two further months for complex or numerous requests (Article 12(3)). Website archives help organisations verify what data was published, when it was published, and in what context, enabling faster and more accurate responses to data subject requests.
Proving Compliance During Investigations
Data protection authorities across Europe have become increasingly active in investigating and enforcing the GDPR. When an investigation is opened, the organisation must demonstrate that it was compliant at the relevant time – not merely that it is compliant today. Website archives captured at the time and stored in a tamper-evident format provide contemporaneous evidence of compliance that is very difficult to fabricate after the fact.
Preserving Contractual Records
Website terms and conditions, acceptable use policies, and other contractual documents published on a website form binding agreements with users. Archiving these documents in a tamper-evident format ensures that the organisation can prove exactly what terms were in effect at any given time, which is essential for resolving contractual disputes.
Building a GDPR-Compliant Website Archiving Programme
Organisations that wish to archive their websites in compliance with the GDPR should address several key elements.
Document the lawful basis. Identify and record the specific lawful basis for archiving, whether legitimate interest, legal obligation, or another basis. Conduct and document a legitimate interest assessment if relying on Article 6(1)(f).
Establish retention periods. Define how long website archives will be retained, based on the purposes for which they are maintained. Regulatory retention requirements, limitation periods for legal claims, and institutional needs should all inform the retention schedule.
Implement access controls. Restrict access to website archives to authorised personnel with a documented need. Maintain audit logs of who accesses the archives and when.
Choose a hosting jurisdiction carefully. Store archives in the EEA or in a jurisdiction with an adequacy decision, such as Switzerland, to avoid the complexities of international data transfers.
Use tamper-evident storage. Archives stored in ISO 28500 WARC format with cryptographic verification – such as the dual SHA-512 and RIPEMD-160 signatures used by Aleph Archives – provide the evidentiary integrity that both data protection authorities and courts expect from digital records.
Update the privacy notice. Ensure the organisation’s privacy notice discloses the archiving activity, including the purposes, lawful basis, and retention periods.
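To illustrate the tamper-evident storage element above, the following minimal sketch records two independent digests of a captured payload and re-checks them later. It is an assumption-laden illustration, not Aleph Archives' implementation: it works on raw bytes rather than WARC records, it computes unkeyed digests rather than signatures, and the function names are hypothetical. RIPEMD-160 is not available in every Python build (it depends on the underlying OpenSSL configuration), so the sketch skips any algorithm the runtime cannot provide.

```python
import hashlib


def compute_digests(payload: bytes,
                    algorithms=("sha512", "ripemd160")) -> dict[str, str]:
    """Return a hex digest per algorithm, skipping any this Python build lacks."""
    digests = {}
    for name in algorithms:
        try:
            hasher = hashlib.new(name)
        except ValueError:
            # Raised when the algorithm is unavailable, e.g. RIPEMD-160 on some builds.
            continue
        hasher.update(payload)
        digests[name] = hasher.hexdigest()
    return digests


def verify_payload(payload: bytes, recorded: dict[str, str]) -> bool:
    """True only if every recorded digest that can be recomputed still matches."""
    current = compute_digests(payload, tuple(recorded))
    return bool(current) and all(
        current[name] == recorded[name] for name in current
    )


original = b"<html><body>Terms and conditions, version 3</body></html>"
recorded = compute_digests(original)

print(sorted(recorded))
print(verify_payload(original, recorded))
print(verify_payload(original.replace(b"version 3", b"version 4"), recorded))
```

Recording two unrelated hash functions means that a practical attack on one of them does not by itself undermine the integrity record. In a real deployment the recorded digests would need to be stored separately from the archive itself, or signed, so that they cannot be rewritten along with the content.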
Conclusion
The GDPR does not prohibit website archiving. It requires organisations to archive responsibly, transparently, and with proper legal justification. For organisations that take compliance seriously, a well-designed website archiving programme is not a liability – it is an asset that demonstrates good governance, supports regulatory compliance, and provides the evidentiary foundation for defending the organisation’s interests.
European organisations that neglect website archiving out of misplaced GDPR concerns leave themselves exposed to far greater risks: the inability to prove what their website displayed, the absence of contemporaneous compliance records, and the lack of defensible evidence when disputes arise.
The organisations that get this right are the ones that treat website archiving not as a conflict with data protection, but as an integral part of it.


