Why Our Competitors Left Web Archiving (And Why We Never Will)

The Industry We Helped Build Looks Very Different Today

When Aleph Archives was founded in 2010, the web archiving market was small and specialized. A handful of companies shared a common mission: capture the live web and preserve it for the future. We were all solving the same hard problem, competing on who could do it best.

Fifteen years later, the landscape has changed dramatically. The companies that once competed directly with us in web archiving have, one by one, expanded into adjacent markets. Their product pages now feature capabilities that have little to do with preserving websites. Their engineering teams are split across multiple product lines. Their marketing speaks a broader language.

This is not a criticism. It is a factual observation about how the market has evolved, and it has important implications for any organization that depends on the quality and reliability of its web archives.

The Great Diversification

A close look at our competitors’ own websites tells a clear story of strategic diversification away from web archiving as a core focus.

MirrorWeb: From Web Archiving to “Unified Platform”

MirrorWeb began as a web archiving company. Today, their website tells a different story. Their navigation menu is organized under a “Unified Platform” banner that lists four co-equal product categories: Website Archiving, Email & Communication Archiving, Social Media Archiving, and Mobile Archiving. Web archiving is one line item among four.

Their Instant Messaging Archiving page advertises capture capabilities for Microsoft Teams, Bloomberg, Slack, Google Chat, and Symphony. Their Mobile Archiving page promotes the capture of SMS, WhatsApp, Zoom, LinkedIn, X, TikTok, and more. Their compliance pages describe “MirrorWeb Insight,” a product that “captures messages across Slack, Teams, mobile SMS, WhatsApp, Gmail, and Microsoft 365 – all threaded and displayed with full context.”

Their blog content reflects the shift as well. Posts such as “Communications Compliance: A Complete Guide to Digital Communications Capture” discuss how “Teams hop from Outlook to Slack to WhatsApp in seconds” and explain how MirrorWeb “connects directly to each platform’s native API or journal stream: Outlook, Gmail, Teams, Slack, WhatsApp, Zoom, LinkedIn, SMS, Bloomberg IB and dozens more.”

Web archiving remains on their menu, but it now shares attention and engineering resources with an expansive communications archiving portfolio.

Hanzo: From Web Archiving to Legal Data Management

Hanzo’s transformation is perhaps the most striking. Visit hanzo.co today and you will find a company that describes itself as offering “Legal Data Management Software.” Their tagline reads: “Hanzo empowers you to streamline legal holds, investigations, and litigation with a powerful, AI-driven product suite – built for cloud-based collaboration data.”

Their product suite now consists of three distinct solutions: Chronicle (automated web archiving), Illuminate (collaboration data analysis and eDiscovery), and Spotlight AI (AI-powered relevancy assessment for eDiscovery). Their use cases page lists Legal Hold and Preservation, Internal Investigations, eDiscovery for complex modern sources, Early Case Assessment with AI Relevancy, and Compliance Review for DSAR/GDPR.

Illuminate, their eDiscovery product, is described as “a pioneering dynamic data management and eDiscovery solution” that allows users to “create and manage in place preservations, collect, investigate, cull, and export dynamic collaboration data.” Spotlight AI “accelerates data discovery and investigations by automatically identifying and guiding users to relevant content hidden within volumes of complex data.” Hanzo even holds a patent for Spotlight AI, emphasizing their “GenAI Leadership in Collaboration Data.”

Chronicle – the web archiving product – still exists. But it is one solution inside a platform whose identity and investment have clearly shifted toward AI-powered legal data management.

Pagefreezer: From Web Archiving to Investigations and Compliance

Pagefreezer’s product menu now lists five distinct archiving products: Website Archiving, Social Media Archiving, Microsoft Teams Archiving, Workplace from Meta Archiving, and Text Message Archiving. Beyond archiving, they offer Website & Social Media Evidence Collection and On-Demand Collection Services.

Their content library reveals an even wider strategic expansion. Pagefreezer now publishes extensive OSINT (Open Source Intelligence) investigation guides covering X/Twitter, Instagram, Facebook, Reddit, Discord, LinkedIn, WhatsApp, and even Bluesky. They publish “The Ultimate Social Media Investigations Guide” and guides on social media fraud investigations and insurance fraud.

Their “Complete Slack Field Guide for Legal & Compliance Teams” explains how Slack content should be archived to meet enterprise compliance needs, positioning the company firmly in the enterprise collaboration compliance space.

Website archiving remains a product, but the company’s energy and content strategy have clearly pivoted toward social media investigations, OSINT, and multi-channel compliance.

Why Companies Diversify Away from Web Archiving

The pattern is consistent across all three competitors, and the reasons are not difficult to understand. Web archiving is, by a significant margin, the most technically demanding form of digital archiving.

Consider what API-based archiving looks like. To archive Slack messages, Microsoft Teams conversations, or social media posts, you connect to a well-documented API. The platform returns structured data in predictable formats. The data model is stable. New messages arrive in an orderly stream. The engineering challenge is real, but it is fundamentally a solved problem: ingest structured data, store it securely, make it searchable.
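To make the "ingest, store, search" pipeline concrete, here is a minimal sketch in Python. It is an illustration only: the payload shape, the field names, and the MessageArchive class are hypothetical and do not correspond to any particular platform's API or any vendor's product.

```python
import json
from collections import defaultdict

# Hypothetical API response. The field names are illustrative assumptions,
# not the schema of any real messaging platform.
SAMPLE_RESPONSE = json.dumps([
    {"id": "m1", "channel": "general", "user": "alice",
     "ts": "2025-01-10T09:00:00Z", "text": "Quarterly report is ready"},
    {"id": "m2", "channel": "general", "user": "bob",
     "ts": "2025-01-10T09:01:30Z", "text": "Thanks, reviewing the report now"},
])


class MessageArchive:
    """Toy in-memory archive: stores records by id and indexes their words."""

    def __init__(self):
        self.records = {}
        self.index = defaultdict(set)

    def ingest(self, raw_json):
        # Structured input: every message already has the same known fields.
        for message in json.loads(raw_json):
            self.records[message["id"]] = message
            for word in message["text"].lower().split():
                self.index[word].add(message["id"])

    def search(self, word):
        # Return matching records in a stable order (sorted by id).
        ids = sorted(self.index.get(word.lower(), set()))
        return [self.records[i] for i in ids]


archive = MessageArchive()
archive.ingest(SAMPLE_RESPONSE)
print([m["id"] for m in archive.search("report")])  # ['m1', 'm2']
```

A production system would add durable storage, retention rules, and access control, but the shape of the problem stays the same because the platform delivers the data already structured.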

Now consider what web archiving requires. Modern websites are built with ever-changing JavaScript frameworks. Single-page applications dynamically render content that exists nowhere in the initial HTML. Authentication walls, CAPTCHAs, and anti-bot measures actively resist automated capture. Cookie consent dialogs, infinite scrolling, lazy-loaded images, embedded iframes, complex CSS animations, and personalized content all conspire to make faithful web archiving extraordinarily difficult.
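A small Python sketch can illustrate the first of these problems: content that is absent from the initial HTML. The two sample documents and the helper names are made up for this example, and the extractor is deliberately simplistic. It only shows that a capture tool reading the raw response of a single-page application finds nothing to preserve until the page's JavaScript has run.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader could see in the HTML as delivered."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical pages: one rendered on the server, one an empty SPA shell
# whose content would only appear after /app.js executes in a browser.
SERVER_RENDERED = (
    "<html><body><h1>Annual Report</h1>"
    "<p>Revenue grew this year.</p></body></html>"
)
SPA_SHELL = (
    '<html><body><div id="root"></div>'
    '<script src="/app.js"></script></body></html>'
)

print(repr(visible_text(SERVER_RENDERED)))  # 'Annual Report Revenue grew this year.'
print(repr(visible_text(SPA_SHELL)))        # ''
```

Saving the second response as delivered would preserve an empty container. Capturing what a visitor actually sees requires executing the page in a real browser engine, which is where much of the difficulty described above begins.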

Every year, the web gets harder to archive. New frontend frameworks emerge. Websites adopt increasingly sophisticated client-side rendering. Content delivery becomes more dynamic and more personalized. The engineering challenge does not plateau – it compounds.

The economics follow naturally. API-based archiving offers higher margins, lower engineering costs, and a more predictable development roadmap. You can build a messaging archiving connector once and maintain it with a small team. Web archiving requires continuous, intensive R&D just to keep pace with the evolving web. Diversification into API-based archiving is, from a pure business perspective, entirely rational.

What Gets Lost When Focus Shifts

Diversification makes business sense. But it comes at a cost that is not always visible to customers.

When web archiving becomes one product among many, it inevitably receives a smaller share of engineering investment. A company running four or five product lines must allocate developers, QA engineers, product managers, and infrastructure budgets across all of them. The newest, fastest-growing products tend to receive the most attention and resources. The legacy product – often web archiving, the original business – is expected to sustain itself.

The hardest problems in web archiving – achieving high-fidelity capture of JavaScript-rendered content, handling single-page applications, adapting to new anti-bot technologies, preserving interactive elements – require sustained, focused R&D. These are not problems you solve once. They require dedicated teams who think about nothing else, who track every major browser update, every new frontend framework, and every shift in how the web is built.

When a company’s leadership is focused on launching AI-powered eDiscovery tools, building messaging connectors for Bloomberg and WhatsApp, or creating OSINT investigation platforms, the web archiving product risks becoming a maintenance-mode offering. It still works. It still captures pages. But the gap between what it captures and what a modern website actually looks like to a user may quietly widen.

For customers whose primary need is web archiving, this distinction matters. There is a meaningful difference between a company that does web archiving among other things and a company that does nothing but web archiving.

Our Choice: Stay Focused

Aleph Archives was founded in 2010 with one purpose: to archive the web with the highest possible fidelity. Fifteen years later, that purpose has not changed.

We have faced the same market pressures our competitors faced. Investors and advisors suggested we expand into social media archiving, messaging compliance, or eDiscovery platforms. The same economic logic applied to us: API-based archiving is easier to build, easier to sell, and easier to scale.

We chose a different path. We chose to invest every engineering hour, every research initiative, and every ounce of institutional expertise into making web archiving better. Not broader. Better.

Our archives are stored as WARC files compliant with ISO 28500, the international standard format for web archives, developed by the International Internet Preservation Consortium. Every archived page is cryptographically verified, providing tamper-evident proof of exactly what appeared on the web and when. We capture the live web as it actually appears to users – not flat screenshots, not incomplete HTML snapshots, not simplified representations that strip away the dynamic elements that define the modern web experience.
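For readers unfamiliar with how a digest makes an archived record tamper-evident, the following Python sketch shows the general idea. It is not a description of Aleph Archives' implementation: the record layout is a simplified, WARC-style approximation, the function names are hypothetical, and the choice of SHA-256 with a hex encoding is an assumption made for the example.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def payload_digest(payload: bytes) -> str:
    """Labelled digest of the record block (SHA-256 is an assumed choice)."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def build_record(target_uri: str, payload: bytes) -> bytes:
    """Build a simplified WARC-style 'resource' record (illustrative only)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.1",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Block-Digest: {payload_digest(payload)}",
        "Content-Type: text/html",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines, a blank line, the block, then two CRLFs to end the record.
    head = "\r\n".join(headers).encode("utf-8")
    return head + b"\r\n\r\n" + payload + b"\r\n\r\n"


def verify_record(record: bytes) -> bool:
    """Recompute the block digest and compare it with the stored value."""
    header_bytes, _, rest = record.partition(b"\r\n\r\n")
    fields = {}
    for line in header_bytes.decode("utf-8").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    block = rest[: int(fields["content-length"])]
    return payload_digest(block) == fields["warc-block-digest"]


record = build_record("https://example.com/", b"<html>original</html>")
tampered = record.replace(b"original", b"modified")  # same length, new bytes

print(verify_record(record))    # True
print(verify_record(tampered))  # False
```

A digest stored next to the content only detects changes to the content. For the digest itself to be trustworthy it would also need to be protected separately, for example with signatures or trusted timestamps, and a production pipeline would typically rely on a maintained WARC library rather than hand-built records.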

When a new JavaScript framework gains adoption, our engineering team studies it and adapts our capture technology. When browsers change their rendering engines, we respond. When websites deploy new anti-bot measures, we find ways to archive content without interfering with normal operations. This is what focus makes possible: the ability to respond to the hardest problems with the full weight of your organization’s capability.

A Question of Priorities

None of this is meant as a criticism of our competitors. MirrorWeb, Hanzo, and Pagefreezer are well-run companies that made strategic decisions to grow their businesses. Diversification is a valid and often necessary business strategy. Their broader platforms serve real market needs, and their customers benefit from having multiple capabilities under one roof.

But organizations that depend on web archiving for regulatory compliance, legal protection, or institutional memory should ask a straightforward question: is the company preserving my web presence focused on solving the hardest problems in web archiving, or is web archiving one of many products competing for attention?

At Aleph Archives, web archiving is not our legacy product. It is not the original business that funds newer, more exciting ventures. It is our only product. It receives one hundred percent of our engineering investment, one hundred percent of our research budget, and one hundred percent of our institutional focus.

We are here for the next fifteen years, solving the same hard problem we have been solving since 2010 – because the web keeps changing, and someone needs to be fully dedicated to preserving it.