Why Every Organisation Needs a Website Archiving Policy
Most organisations have policies governing the retention of financial records, contracts, and correspondence. Many have policies for preserving physical documents and maintaining backups of critical IT systems. Far fewer have a formal policy for archiving their websites – despite the fact that a website is often the most public, most scrutinised, and most legally consequential document an organisation publishes.
This gap is both common and dangerous. A corporate website contains terms and conditions that bind customers, privacy policies that create regulatory obligations, product claims that expose the company to liability, pricing information that affects contractual relationships, and regulatory disclosures that must be maintained for compliance. When any of this content is changed, removed, or disputed, the organisation that lacks a verifiable record of what was published and when is at a significant disadvantage.
A website record retention policy transforms website archiving from an ad hoc activity – if it happens at all – into a structured, repeatable programme with clear responsibilities, defined standards, and documented procedures. It ensures that the organisation’s digital presence is preserved with the same rigour applied to its other critical business records.
This guide provides a framework for building such a policy, drawing on the requirements of multiple regulatory regimes and the practical experience of preserving websites for Fortune 500 companies and regulated institutions across six industries.
Key Elements of a Website Retention Policy
A comprehensive website record retention policy should address six core elements: scope, capture frequency, retention periods, storage requirements, access controls, and roles and responsibilities.
1. Scope
The scope section defines which web properties are covered by the policy. This is the foundation of the entire programme, and it must be comprehensive. Many organisations focus exclusively on their primary corporate website while overlooking web properties that are equally – or more – exposed to regulatory and legal risk.
A thorough scope definition should include:
Primary corporate website. The main website, including all subdomains, landing pages, and microsites hosted under the organisation’s primary domain.
Product and service websites. Dedicated websites for specific products, services, or brands, which may be hosted on separate domains.
Investor relations and regulatory pages. Pages containing financial disclosures, SEC filings, annual reports, and other regulated content.
Careers and recruitment pages. Job postings and employment-related content, which may be subject to equal opportunity and labour regulations.
Blog and editorial content. Corporate blogs, thought leadership content, and press releases, which constitute public statements by the organisation.
E-commerce pages. Product listings, pricing information, terms of sale, and return policies, all of which create contractual obligations.
Customer-facing portals. Help centres, FAQ sections, knowledge bases, and support pages that may contain representations about product capabilities or service commitments.
Campaign and promotional pages. Temporary landing pages, promotional microsites, and event-specific web content that may contain time-limited offers or claims.
The scope should be reviewed and updated at least annually, and whenever the organisation launches new web properties or retires existing ones.
2. Capture Frequency
The capture frequency defines how often each web property is archived. The appropriate frequency depends on how frequently the website changes and the level of regulatory scrutiny it receives.
Daily capture is appropriate for websites that are updated frequently, such as news-oriented sites, e-commerce platforms with dynamic pricing, or financial services websites with regularly updated market commentary. Daily capture ensures that changes are documented within a 24-hour window.
Weekly capture is suitable for most corporate websites that are updated on a regular but not daily basis. Weekly capture provides a reasonable balance between comprehensive documentation and resource efficiency.
Monthly capture is the minimum frequency for websites that change infrequently, such as static informational sites or archived microsites that are no longer actively maintained. Monthly capture ensures that even low-activity sites are periodically documented.
Event-driven capture should supplement scheduled captures whenever substantive content changes are made. Policy changes, product launches, pricing updates, regulatory disclosure modifications, and terms of service revisions should all trigger immediate archiving, regardless of the regular schedule.
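Some event-driven triggers can be detected automatically, so they do not depend on someone remembering to request a capture. The sketch below assumes the archiving system can fetch a page's current HTML and has stored a digest from the last capture. The function names are hypothetical. It only collapses whitespace before hashing. Real pages often embed timestamps, session tokens or rotating content, and a production detector would also need to ignore those.

```python
import hashlib
import re
from typing import Optional


def normalise(html: str) -> str:
    """Collapse runs of whitespace so purely cosmetic reformatting is not treated as a change."""
    return re.sub(r"\s+", " ", html).strip()


def content_digest(html: str) -> str:
    """SHA-256 of the normalised page text, used only for change detection."""
    return hashlib.sha256(normalise(html).encode("utf-8")).hexdigest()


def needs_event_capture(current_html: str, last_captured_digest: Optional[str]) -> bool:
    """True if the page has never been archived or differs from the last archived version."""
    if last_captured_digest is None:
        return True
    return content_digest(current_html) != last_captured_digest


# Illustrative use with two versions of a hypothetical terms page.
terms_v1 = "<h1>Terms</h1>\n<p>Refunds within 30 days.</p>"
terms_v2 = "<h1>Terms</h1>\n<p>Refunds within 14 days.</p>"
baseline = content_digest(terms_v1)
print(needs_event_capture(terms_v1.replace("\n", "   "), baseline))  # False: whitespace only
print(needs_event_capture(terms_v2, baseline))                        # True: substantive change
```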
The policy should specify the default capture frequency and identify any web properties that require a different schedule based on their content, regulatory exposure, or update frequency.
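A scheduler can check a default frequency and its documented exceptions if they are written down as data. This is a minimal sketch with three assumptions. The property names are hypothetical, a month is approximated as 30 days, and properties that have never been captured are treated as overdue.

```python
from datetime import datetime, timedelta

# Intervals for the scheduled frequencies described above.
INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),  # simplification: a month is treated as 30 days
}

DEFAULT_FREQUENCY = "weekly"

# Hypothetical web properties whose schedule deviates from the default.
OVERRIDES = {
    "shop.example.com": "daily",                # dynamic pricing
    "legacy-campaign.example.com": "monthly",   # retired microsite
}


def frequency_for(prop: str) -> str:
    return OVERRIDES.get(prop, DEFAULT_FREQUENCY)


def overdue(in_scope: list[str], last_capture: dict[str, datetime], now: datetime) -> list[str]:
    """In-scope properties never captured, or whose last capture is older than their interval."""
    due = []
    for prop in in_scope:
        last = last_capture.get(prop)
        if last is None or now - last >= INTERVALS[frequency_for(prop)]:
            due.append(prop)
    return sorted(due)
```

Storing the schedule as data has a side benefit. The same table serves as the documented list of default and exception frequencies that the policy asks for.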
3. Retention Periods
The retention period defines how long website archives are preserved before they may be disposed of. Retention periods should be based on a combination of legal requirements, regulatory obligations, and business needs.
Regulatory minimums. Industry-specific regulations often prescribe minimum retention periods. SEC Rule 17a-4 requires broker-dealers to preserve business communications for at least three years. FINRA Rule 4511 requires member firms to preserve their books and records in a format that complies with Rule 17a-4, with a six-year default period for records that have no other specified retention period. The FDA expects pharmaceutical companies to maintain promotional records for at least two years after the last date of use. European regulations under MiFID II require investment firms to retain records for at least five years.
Statute of limitations. The applicable statute of limitations for potential legal claims should inform retention periods. If the limitation period for a contract dispute is six years, website archives containing contractual terms should be retained for at least that period. If the limitation period for a product liability claim is longer, archives of product-related web content should be retained accordingly.
Litigation hold obligations. When litigation is reasonably anticipated, the organisation has a duty to preserve all potentially relevant evidence, including website archives. The retention policy should include provisions for implementing litigation holds that override normal retention schedules.
Business needs. Beyond legal and regulatory requirements, organisations may have legitimate business reasons for retaining website archives, such as brand history documentation, competitive intelligence, or institutional memory.
The policy should specify default retention periods by content category and include a process for extending retention when litigation holds, regulatory investigations, or other circumstances require it.
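The interaction between these rules can be made explicit in code. The sketch below assumes two rules that the policy itself would need to state. First, an archive covering several content categories is kept for the longest applicable period. Second, an active litigation hold blocks disposal regardless of dates. The periods in `RETENTION_YEARS` are hypothetical placeholders for legal and compliance to set. They are not recommendations.

```python
from datetime import date

# Hypothetical retention periods in years, by content category.
RETENTION_YEARS = {
    "terms_and_conditions": 6,  # e.g. a six-year contract limitation period
    "investor_relations": 6,
    "marketing": 3,
}


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February mapped onto a non-leap year
        return d.replace(year=d.year + years, day=28)


def disposal_date(captured: date, categories: list[str]) -> date:
    """An archive spanning several categories is kept for the longest applicable period."""
    years = max(RETENTION_YEARS[c] for c in categories)
    return add_years(captured, years)


def may_dispose(captured: date, categories: list[str], today: date, on_hold: bool) -> bool:
    """A litigation hold overrides the schedule: archives under hold are never eligible."""
    if on_hold:
        return False
    return today >= disposal_date(captured, categories)
```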
4. Storage Requirements
The storage requirements section defines the technical standards for how website archives are stored. These requirements directly affect the legal defensibility and regulatory acceptability of the archives.
File format. Website archives should be stored in ISO 28500 WARC (Web ARChive) format, the internationally recognised standard for web archiving. WARC files preserve the complete HTTP transaction for every resource, including request and response headers, content bodies, and metadata. This level of completeness is essential for regulatory compliance and legal defensibility.
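The sketch below serialises a single WARC/1.1 `response` record using only the Python standard library, to show the record structure concretely. It is a simplified illustration. A complete capture would normally also write a `request` record, usually a `warcinfo` record, and digest fields. In practice, a maintained library such as warcio would be used rather than hand-built records. The URL and HTTP response are made up.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_response: bytes, captured_at: datetime) -> bytes:
    """Serialise one WARC/1.1 'response' record wrapping a raw HTTP response.

    captured_at should be timezone-aware; it is written in UTC as WARC-Date.
    """
    warc_date = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=response",
        f"Content-Length: {len(http_response)}",  # length of the block, in bytes
    ]
    header = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # The block is the HTTP message exactly as received: status line, headers and body.
    return header + http_response + b"\r\n\r\n"


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Terms</p>\n"
)
record = warc_response_record(
    "https://www.example.com/terms", raw, datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
)
print(record.decode("utf-8"))
```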
Cryptographic integrity. Every archive should include cryptographic hash digests computed at the time of capture. Dual-algorithm hashing, such as the SHA-512 and RIPEMD-160 approach used by Aleph Archives, provides tamper-evident verification that the archive has not been modified since creation.
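The sketch below computes two digests of an archive in a single pass. By default it pairs SHA-512 with SHA3-256, which stands in for the second algorithm. Python's `hashlib` exposes RIPEMD-160 (as `hashlib.new("ripemd160")`) only when the underlying OpenSSL build provides it. This pairing is an assumption and is not Aleph Archives' implementation. Digests only reveal tampering if the recorded values are themselves kept where they cannot be altered, for example on WORM storage.

```python
import hashlib
import io
from typing import BinaryIO


def dual_digest(stream: BinaryIO, algorithms=("sha512", "sha3_256"), chunk_size=1 << 20) -> dict[str, str]:
    """Hash a stream once, feeding every chunk to each algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(stream: BinaryIO, recorded: dict[str, str]) -> bool:
    """Recompute every recorded digest; any mismatch means the bytes have changed."""
    return dual_digest(stream, algorithms=tuple(recorded)) == recorded


# In practice the stream would be a file opened in "rb" mode; BytesIO keeps the example self-contained.
archived = b"WARC/1.1\r\nWARC-Type: response\r\n..."
recorded = dual_digest(io.BytesIO(archived))
print(verify(io.BytesIO(archived), recorded))               # True
print(verify(io.BytesIO(archived + b" edited"), recorded))  # False
```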
WORM storage. Archives should be stored on Write Once, Read Many (WORM) storage media that prevents alteration or deletion of records during the retention period. WORM storage satisfies the most stringent regulatory requirements, including SEC Rule 17a-4, and provides the strongest available assurance of archive integrity.
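WORM guarantees come from the storage platform itself, not from application code. Even so, it can help to state the required behaviour precisely. The class below is a hypothetical in-memory model of those rules: no overwrite, and no deletion before the retain-until date. It is useful for describing acceptance tests against a real storage service, but it is not itself compliant storage.

```python
from datetime import date


class WormViolation(Exception):
    """Raised when an operation would overwrite or prematurely delete a record."""


class WormStore:
    """In-memory model of WORM behaviour for writing acceptance tests, not a compliant store."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, date]] = {}

    def put(self, key: str, data: bytes, retain_until: date) -> None:
        if key in self._objects:
            raise WormViolation(f"{key} already exists and cannot be overwritten")
        self._objects[key] = (data, retain_until)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def delete(self, key: str, today: date) -> None:
        _, retain_until = self._objects[key]
        if today < retain_until:
            raise WormViolation(f"{key} is retained until {retain_until.isoformat()}")
        del self._objects[key]
```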
Geographic location. The policy should specify where archives are stored, particularly for organisations subject to data sovereignty requirements. European organisations subject to the GDPR should ensure archives are stored within the EEA or in a jurisdiction with an adequate level of data protection, such as Switzerland.
Redundancy. Archives should be stored with appropriate redundancy to protect against data loss. The policy should specify the minimum number of copies and the geographic separation between storage locations.
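Redundancy rules can be checked automatically against an inventory of storage sites. This sketch makes three assumptions. Each copy's location is known as latitude and longitude. Distance is measured with the haversine great-circle formula. "Geographic separation" means every pair of copies is at least a minimum distance apart. The site data and thresholds are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def redundancy_problems(copies: list[tuple[str, float, float]], min_copies: int, min_km: float) -> list[str]:
    """Check the copy count and that every pair of storage sites is at least min_km apart."""
    problems = []
    if len(copies) < min_copies:
        problems.append(f"only {len(copies)} copies, policy requires {min_copies}")
    for (a, lat_a, lon_a), (b, lat_b, lon_b) in combinations(copies, 2):
        d = haversine_km(lat_a, lon_a, lat_b, lon_b)
        if d < min_km:
            problems.append(f"{a} and {b} are {d:.0f} km apart, policy requires {min_km:.0f} km")
    return problems
```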
5. Access Controls
Access controls define who may view, search, and retrieve website archives, and under what circumstances. These controls are important for both security and regulatory compliance.
Role-based access. Define specific roles with appropriate access levels. Legal teams may need full access for litigation support. Compliance officers may need access for regulatory audit preparation. Marketing teams may need limited access for brand history research. IT administrators may need technical access for system maintenance.
Authentication. Require strong authentication for archive access, including multi-factor authentication for systems containing sensitive content.
Audit logging. Maintain detailed logs of all access to the archive, including who accessed it, when, what they viewed or retrieved, and for what stated purpose. These logs support chain of custody documentation and demonstrate responsible data stewardship.
Export controls. Define procedures for exporting archived content, particularly when producing records for regulatory examinations, legal proceedings, or other external purposes. Exports should be documented, authorised, and tracked.
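The access-control elements above can be sketched together. The sketch has three parts: a role-to-permission map, an extra authorisation requirement for exports, and an audit log. The log records who did what, when and why, including refused requests. The roles, permission names and approval reference are hypothetical. Authentication, including MFA, is assumed to happen before this code is reached. The log is hash-chained, so editing an entry after the fact is detectable. Anyone able to rewrite the whole log could still recompute the chain, so the log itself still needs protected storage.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical roles and the actions each may perform.
ROLE_PERMISSIONS = {
    "legal": {"search", "view", "retrieve", "export"},
    "compliance": {"search", "view", "retrieve", "export"},
    "marketing": {"search", "view"},
    "it_admin": {"maintain"},
}

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only access log; each entry includes the hash of the entry before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, **fields) -> None:
        body = {**fields, "prev": self.entries[-1]["hash"] if self.entries else GENESIS}
        self.entries.append({**body, "hash": _entry_hash(body)})

    def verify(self) -> bool:
        """False if an entry was edited, reordered or deleted (deleting only the newest is not detected)."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def request_access(log: AuditLog, user: str, role: str, action: str, record_id: str,
                   purpose: str, approval_ref: Optional[str] = None) -> bool:
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    if action == "export" and not approval_ref:
        allowed = False  # exports need a documented authorisation as well as the right role
    log.append(user=user, role=role, action=action, record=record_id, purpose=purpose,
               approval=approval_ref, allowed=allowed,
               at=datetime.now(timezone.utc).isoformat())
    return allowed
```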
6. Roles and Responsibilities
The policy should clearly assign responsibility for each aspect of the website archiving programme.
Policy owner. A senior executive – typically the General Counsel, Chief Compliance Officer, or Chief Information Officer – should own the policy and be responsible for its maintenance and enforcement.
Archiving operations. The team or provider responsible for executing the archiving programme should be identified, along with their specific responsibilities for capture scheduling, quality assurance, and storage management.
Legal and compliance oversight. The legal and compliance functions should be responsible for defining retention periods, implementing litigation holds, and responding to regulatory inquiries that involve website archives.
Periodic review. The policy should specify a review cycle – at least annually – during which the scope, capture frequency, retention periods, and other elements are evaluated and updated as needed.
Industry-Specific Requirements
While the framework above applies broadly, specific industries have additional requirements that should be reflected in the retention policy.
Financial Services
Financial firms subject to SEC, FINRA, FCA, or equivalent regulations face the most prescriptive requirements. Website content constitutes regulated communications that must be preserved in non-rewritable, non-erasable format. Retention periods of three to six years are typical, with some records requiring preservation for the life of the enterprise. The policy should specifically address the preservation of investment performance data, risk disclosures, and fee information published on the firm’s website.
Pharmaceutical and Healthcare
Pharmaceutical companies must preserve promotional website content in accordance with FDA expectations. The policy should address the archiving of branded product sites, unbranded disease awareness sites, clinical trial information, and patient support portals. Integration with the company’s medical-legal-regulatory review process is essential, and archives should be triggered whenever approved content changes are published.
Government and Public Sector
Government agencies face transparency requirements, including freedom of information laws and open records statutes, that create specific obligations for preserving website content. Retention periods are often prescribed by statute and may extend to permanent retention for certain categories of records. The policy should address compliance with applicable records management statutes and the agency’s records schedule.
Insurance
Insurance companies publish policy terms, coverage descriptions, and claims procedures on their websites. These representations create contractual obligations and regulatory exposure. The retention policy should ensure that all customer-facing website content is archived with sufficient frequency to document changes, and retained for periods consistent with the applicable statute of limitations for insurance disputes.
Energy and Utilities
Energy companies and utilities publish environmental disclosures, safety information, rate schedules, and regulatory filings on their websites. These publications are subject to oversight by environmental agencies, public utility commissions, and other regulators. The retention policy should address the preservation of regulated disclosures and ensure compliance with sector-specific recordkeeping requirements.
Automated Scheduling vs. Manual Capture
A website retention policy must address how captures are executed. The two approaches – automated scheduling and manual capture – have fundamentally different reliability profiles.
Automated Scheduling
Automated scheduling is the foundation of a reliable archiving programme. An automated system captures the defined web properties at the specified intervals without human intervention. It does not forget, it does not skip a scheduled capture because of workload pressures, and it does not introduce inconsistencies in capture methodology.
Automated scheduling should be configured to cover the full scope of web properties at the frequencies specified in the policy. The system should monitor capture success and alert designated personnel when a capture fails or produces incomplete results.
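Failure alerting can be sketched as a check over each cycle's capture results. The result fields and thresholds below are hypothetical, and sending the alerts (by email, paging or ticketing) is left out. The final check matters most. A scheduled job that never ran produces no error of its own, so the alerting must compare the results against the schedule.

```python
from dataclasses import dataclass


@dataclass
class CaptureResult:
    property_name: str
    succeeded: bool          # did the crawl job complete at all?
    resources_captured: int  # pages, images, scripts, etc. written to the archive
    resources_failed: int    # sub-resources that errored or timed out


def capture_alerts(scheduled: set[str], results: list[CaptureResult],
                   expected_min_resources: dict[str, int],
                   max_failure_ratio: float = 0.02) -> list[str]:
    alerts = []
    for r in results:
        if not r.succeeded:
            alerts.append(f"{r.property_name}: capture failed")
            continue
        total = r.resources_captured + r.resources_failed
        minimum = expected_min_resources.get(r.property_name, 1)
        if r.resources_captured < minimum:
            alerts.append(f"{r.property_name}: only {r.resources_captured} resources captured "
                          f"(expected at least {minimum})")
        elif total and r.resources_failed / total > max_failure_ratio:
            alerts.append(f"{r.property_name}: {r.resources_failed} of {total} resources failed")
    # A scheduled job that produced no result at all is the gap most easily missed.
    ran = {r.property_name for r in results}
    for missing in sorted(scheduled - ran):
        alerts.append(f"{missing}: no capture ran")
    return alerts
```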
Manual Capture
Manual capture – where a person initiates an archiving action – has a role as a supplement to automated scheduling, not as a replacement. Manual capture is appropriate for event-driven archiving: capturing a website immediately before or after a major content change, preserving a competitor’s website in response to a specific legal concern, or capturing a web page that is expected to be removed.
The policy should be clear that manual capture supplements automated scheduling. Programmes that rely exclusively on manual capture inevitably develop gaps – pages that were not captured, periods that were not covered, and documentation of inconsistent quality.
A Framework for Your Policy
The following framework provides a starting point for organisations developing their first website retention policy or formalising an existing programme.
Section 1: Purpose and Scope. State the purpose of the policy, identify the web properties covered, and define the organisational units responsible for compliance.
Section 2: Capture Standards. Specify the file format (ISO 28500 WARC), cryptographic integrity requirements, and capture completeness standards. Define what constitutes a complete capture, including all page resources, interactive elements, and linked content.
Section 3: Capture Schedule. Define the default capture frequency and identify web properties that require a different schedule. Specify the circumstances that trigger event-driven captures.
Section 4: Retention Schedule. Define retention periods by content category, referencing applicable regulatory requirements. Include provisions for litigation holds and regulatory preservation orders.
Section 5: Storage and Security. Specify storage media, WORM compliance, geographic location, redundancy requirements, and encryption standards.
Section 6: Access and Retrieval. Define role-based access controls, authentication requirements, audit logging, and export procedures.
Section 7: Roles and Responsibilities. Assign ownership, operational responsibility, and oversight functions. Define escalation procedures for policy exceptions.
Section 8: Review and Maintenance. Establish the annual review cycle and define the process for updating the policy in response to regulatory changes, organisational changes, or lessons learned.
Conclusion
A website record retention policy is not a bureaucratic exercise. It is the document that transforms website archiving from an afterthought into a governed, defensible programme. It ensures that the organisation’s most public document – its website – is preserved with the same discipline applied to its financial records, its contracts, and its regulatory filings.
The organisations that build this discipline now will find themselves prepared when a regulator asks what their website displayed three years ago, when a litigant challenges the terms that were published on a specific date, or when an internal investigation requires a verified record of the organisation’s digital history.
The cost of building a website retention policy is measured in hours of planning and drafting. The cost of not having one is measured in regulatory fines, adverse legal judgments, and the irretrievable loss of institutional memory. The choice is straightforward.


