Case Study 1
One of the most influential investment and wealth management groups has chosen the CAMA® web archiving platform. The group is known for offering individualized, high-quality service to its clients. Its websites contain a tremendous amount of data and dynamic content (including links, rich media, video, and Flash).
The group’s needs were mainly:
- Preserve its web presence to comply with its digital data retention/management policy and the regulations to which it is subject
- Quickly access all archived data in its original form
All crawls were automated, archives were replayable seconds after each crawl finished, and with the on-demand web archive export feature (PDF/PNG) the customer could instantly download any archived page.
Online marketing/communications can present a challenge for securities traders, investment advisors, banks, and others in the financial services industry. The benefits of advancing technologies must be weighed against the risks associated with non-compliance in the area of books and records retention.
Failure to meet industry standards can result in hefty fines and bad publicity. Multiple sets of guidelines for the financial industry (issued by the SEC, FDIC, FSA, FINRA, and others, alongside laws such as SOX) demand the preservation of business records (both paper and electronic) in such a way that the data can be reproduced in a timely and complete manner for a regulator. These requirements are now being extended to cover newer tools such as social media platforms, and FINRA has advised that no compliance grace period will apply to these new technologies. It is critical that firms implement a robust records retention policy for their websites and social media pages. Should your corporate web presence be investigated or questioned, a faithful representation of your company's online activity is a necessity; that is exactly what CAMA® provides. Website archiving is vital to fulfilling many key FINRA and SEC regulations. Start complying today.
The CAMA® web archiving platform combines open standards (WARC, ISO 28500:2009) with innovative features: scheduled crawls, export of web archives as PDF/PNG, antiviral checks, the CAMA® Appliance, real-time result deduplication, and multilingual search and translation.
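To make the WARC (ISO 28500) standard mentioned above concrete, here is a hedged sketch of how a single uncompressed WARC/1.0 response record is laid out. This is for illustration only, not the CAMA® implementation; real archives also carry warcinfo records, payload digests, and compression.

```python
import uuid
from datetime import datetime, timezone

def build_warc_response(url: str, http_bytes: bytes) -> bytes:
    """Assemble a single uncompressed WARC/1.0 response record (sketch)."""
    headers = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {url}\r\n"
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}\r\n"
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>\r\n"
        "Content-Type: application/http; msgtype=response\r\n"
        f"Content-Length: {len(http_bytes)}\r\n"
        "\r\n"
    ).encode("utf-8")
    # A record is the header block plus the content block,
    # terminated by two CRLF sequences.
    return headers + http_bytes + b"\r\n\r\n"

record = build_warc_response(
    "http://example.com/",
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>",
)
```

Because the stored block is the raw HTTP exchange, a replay tool can later reproduce exactly what the browser received at capture time.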
Case Study 2
A regulated government financial agency needed a fully automated, in-house web archiving solution to collect and archive its data.
The agency’s needs were mainly:
- Ensure that the data was perfectly archived and stored on-site
- Monitor the whole process
- Cope with the growing amount of digital data and access it instantly
Aleph Archives provides a digital timestamp and signature for each archived page, ensuring data integrity and authenticity.
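Aleph Archives' exact timestamping and signing scheme is not published; the following is a minimal sketch of the general idea, fingerprinting each archived page with a SHA-256 digest, a capture timestamp, and a keyed seal. The key and function names are hypothetical; a production system would use PKI-backed signatures and a qualified timestamping authority rather than a shared HMAC key.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

SECRET_KEY = b"demo-signing-key"  # hypothetical; real systems use PKI/HSM-backed keys

def seal_page(url: str, content: bytes) -> dict:
    """Attach a digest, capture timestamp, and HMAC seal to an archived page."""
    digest = hashlib.sha256(content).hexdigest()
    timestamp = datetime.now(timezone.utc).isoformat()
    payload = json.dumps({"url": url, "sha256": digest, "ts": timestamp}, sort_keys=True)
    seal = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"url": url, "sha256": digest, "ts": timestamp, "seal": seal}

def verify_page(record: dict, content: bytes) -> bool:
    """Re-derive the digest and seal to prove the page has not been altered."""
    if hashlib.sha256(content).hexdigest() != record["sha256"]:
        return False
    payload = json.dumps(
        {"url": record["url"], "sha256": record["sha256"], "ts": record["ts"]},
        sort_keys=True,
    )
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["seal"])

rec = seal_page("http://example.gov/page", b"<html>report</html>")
```

Any later change to the page content, or to the recorded timestamp, makes verification fail, which is what regulators mean by integrity and authenticity.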
Virtually all government agencies have regulatory obligations to preserve electronic content: an agency's online content is increasing in both complexity and volume, and governments are held accountable for the information they publish on the web.
The 2006 changes to the Federal Rules of Civil Procedure indicate that all organizations (including governments) must be able to find, capture, and produce electronically stored information that might be relevant to a judicial or regulatory request. This cannot be done with server backups, CMS revision control, or other outdated methods. You need a solution that can provide indisputable proof of your online records' integrity and authenticity (as required by the Federal Rules of Evidence). For example, in 2010 the Executive Office of the President (EOP) issued a solicitation to: "Provide the necessary services to capture, store, extract to approved formats, and transfer content published by EOP on publicly-accessible web sites, along with information posted by non-EOP persons on publicly-accessible web sites where the EOP offices under PRA maintains a presence, throughout the term of the contract."
Other requirements come from:
- Presidential Records Act (PRA)
- National Archives and Records Administration (NARA)
- E-GOV - electronic records management initiatives
- Guidance on Managing Records in Web 2.0/Social Media Platforms, October 20th, 2010
- Library of Congress
- Federal Rules of Civil Procedure (FRCP)
- Department of Commerce
- Department of Energy
- Department of Justice
- Environmental Protection Agency
- Office of Management & Budget
- Securities and Exchange Commission Rules (SEC)
- Library & Archives Canada
Case Study 3
A major hardware manufacturer was looking for a business intelligence solution capable of gathering data and making it accessible at any time. This client is not under any regulation; however, it needed to efficiently automate the gathering of important data on market trends and the competition.
The manufacturer's needs were mainly:
- Collect a large amount of valuable data and access it immediately
- Archive all pertinent data
- Provide accurate data to all departments
At Aleph Archives we do not consider “compliance and litigation support” as the main need that leads companies to acquire our solutions. There are different technologies and solutions built around that purpose only. CAMA® is a solution that goes beyond the “compliance” need to provide automated powerful business intelligence and monitoring tools.
CAMA® is part of the four main decisional steps in BI:
- Data extraction (web scraping): to obtain significant results, one must gather web-based data wherever it resides. When connected to web data sources, CAMA® gathers relevant data and centralizes it in its distributed data warehouse
- Strengthening: once centralized, data must be analyzed and distributed inside the data warehouse. This pre-processing makes it easier for CAMA® tools to access the data, since data warehouses are automated
- Processing: from a request built on dedicated search forms, the analysis tool collates related data to find relevant information
- Reporting: broadcasting and presenting information with added value, so that it is as readable as possible for the decision-maker
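The four steps can be sketched end to end with a toy in-memory SQLite warehouse. The table layout and queries below are illustrative assumptions, not CAMA®'s actual schema; the point is only how extracted pages are centralized, searched, and reported.

```python
import sqlite3

# In-memory table standing in for the central data warehouse (illustration only).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, captured TEXT, body TEXT)")

def centralize(url, captured, body):
    """Strengthening: load an extracted page into the central warehouse."""
    con.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", (url, captured, body))

def search(term):
    """Processing: collate pages matching a search-form request."""
    return con.execute(
        "SELECT url, captured FROM pages WHERE body LIKE ? ORDER BY captured DESC",
        (f"%{term}%",),
    ).fetchall()

def report(rows):
    """Reporting: present the results readably for the decision-maker."""
    return "\n".join(f"{captured}  {url}" for url, captured in rows)

centralize("http://rival.example/pricing", "2011-05-01", "new pricing model announced")
centralize("http://rival.example/news", "2011-06-12", "product launch and pricing cut")
```

A decision-maker querying `report(search("pricing"))` would see both competitor pages, newest first.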
Social Media Preservation
A Law Journal survey of tools and best practices for social media data preservation.
Going beyond web archiving technology
At Aleph Archives, we believe in two missions.
First, we work hard to build the most advanced solutions for our customers and partners. Second, we want our technology to help organizations and libraries preserve data.
When we started building CAMA®, our main goal was to "innovate with a powerful solution"; since then, we have kept improving it beyond basic compliance and data preservation needs.
We currently offer CAMA® as a "multipurpose" web archiving solution, with too many new features and improvements to list here. In 2011, we decided to provide our customers and partners with a new solution offering better performance at a competitive cost.
We have put enough work and trust into our technology to install it entirely on our customers' premises. This version of the solution is fully automated and requires very limited human intervention.
With this advanced technology, we are confident we can provide an excellent service to our customers. At Aleph Archives, we are also launching partnerships with many non-profit organizations involved in data preservation and web archiving. We have already started work on more than six projects to help curators and archivists in America, Europe, Asia, and Africa.
We would like to thank the Internet Archive for inspiring us, and for sharing the same passion and devotion for web archiving.
A Strategic Decision
Choosing a web archiving solution is a strategic decision for a company. Most companies consider web archiving for compliance purposes only.
These are the reasons why a web archiving solution can be a strategic decision for the company:
- Compliance and litigation support
- Gathering and processing data, using business intelligence tools
- Keeping the data helps departments analyse communication, marketing, pricing, and development strategies, and find new alternatives
- Managing costs efficiently (online marketing and communication campaigns, etc.)
Web archiving helps you understand where you lost money, how you can be more competitive, and what the competition is doing, on a consistent basis (through data published on the internet).
Like Customer Relationship Management solutions, web archiving helps you collect, organize, process, monitor, and retrieve data; these actions lead to profit generation.
Web archiving buyer's guide
Web archiving is fairly new compared to other data retention practices that have existed for a long time.
That makes it harder for a buyer to gather the necessary information and make a smart purchasing decision.
This guide contains basic information about web archiving, lists some of the most popular providers on the market, and sets out criteria to guide your choice.
Web archiving has made significant progress over the last five to seven years. It now offers a choice of approaches in both policy and supporting technology. These choices should be considered carefully before making a purchase.
Client-side archiving uses an archival crawler, derived from search-engine crawler technologies, with significant enhancements to ensure that complex and hard-to-reach content can be found, captured, and stored without change. Starting from seed pages or entry points, these tools automatically capture pages and parse them to extract all links. The process repeats as long as newly discovered pages remain within the scope defined for the crawl. The captured web content and embedded files are stored unchanged (original, authentic copies, exactly equivalent to what a generic user would have received in their browser at the time) and preserved in a flat, standards-based, self-contained file format that can confidently be considered future-proof.
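The capture-parse-enqueue loop described above can be sketched with only the standard library. This is a minimal illustration of the link-extraction and scoping steps, not a production crawler; real archival crawlers add politeness rules, deduplication, and far more robust path-finding.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Pull href/src links out of a captured page, as an archival crawler would."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))

def in_scope(url, seed_host):
    """Keep the crawl within the scope defined by the seed's host."""
    return urlparse(url).netloc == seed_host

html = ('<a href="/about">About</a> <img src="logo.png"> '
        '<a href="http://other.example/x">out</a>')
parser = LinkExtractor("http://example.com/index.html")
parser.feed(html)
# Newly discovered, in-scope URLs become the next round of the crawl frontier.
frontier = [u for u in parser.links if in_scope(u, "example.com")]
```

Out-of-scope links (here, `http://other.example/x`) are discarded, which is how a crawl stays bounded to the seed list.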
Such unchanged, authentic storage is especially important in a legal context. To be effective, this method requires a crawler with excellent link-extraction and path-finding algorithms that work across a wide range of circumstances and site/page designs. In addition to client-side archiving, there are two alternative methods of capturing web content. Both must be operated from the server side, require prior authorisation, and need access to both front-end and back-end servers.
The first of these alternative methods, called transaction archiving, consists of the systematic capture and archiving of all browser/server exchanges (request/response pairs), resulting from the interaction of users with sites, regardless of their content type and how they are produced. Transaction archiving enables tracking and recording of every actual instantiation of content in an authentic flat HTML form, easy to maintain and preserve over time.
Moreover, it can be used to archive hidden web content, provided this content is requested, i.e. read, by the websites’ users during the capture time.
However, transaction archiving generates unnecessary duplicates of frequently visited pages and raises serious privacy concerns, as the method implicitly relies on usage tracking.
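The request/response capture at the heart of transaction archiving can be sketched as a toy WSGI middleware. This is an illustrative assumption about how such a recorder could sit in front of an application, not any vendor's implementation; note that it sees every user interaction, which is exactly where the privacy concern above comes from.

```python
import time

def transaction_archiver(app, archive):
    """WSGI middleware sketch: record every request/response pair it sees."""
    def wrapper(environ, start_response):
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            # Intercept the response status/headers on their way out.
            captured["status"], captured["headers"] = status, headers
            return start_response(status, headers, exc_info)

        body = b"".join(app(environ, recording_start_response))
        archive.append({
            "ts": time.time(),
            "path": environ.get("PATH_INFO", ""),
            "status": captured.get("status"),
            "body": body,
        })
        return [body]
    return wrapper

def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>hello</html>"]

archive = []
app = transaction_archiver(demo_app, archive)
result = app({"PATH_INFO": "/page"}, lambda s, h, e=None: None)
```

Each archived entry is a flat, authentic snapshot of one actual exchange, which is why transaction archives are easy to maintain and replay.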
The second, and more obvious, alternative to client-side archiving is "server-side archiving". This consists of directly copying files in the document folders to back-up servers. Although it might appear to be the simplest approach, it is in fact seriously flawed from both the preservation and archive-access points of view.
To make certain that any web content archived using this method can be properly restored, server-side archiving requires that all original CMSs, databases and other software are archived alongside the content or are actively maintained in an operational state; or that the content is migrated to newer CMSs, databases, etc. In any case, these activities will be required for the whole period of archive retention. Interestingly, IT backups essentially rely on this method in almost all cases, systematically failing to meet long-term preservation and access capabilities that are essential for legal and compliance requirements.
However, for some types of hidden-web content, this method can prove to be useful, mainly in situations where it is required to archive parts of websites that a client-side crawler cannot reach.
Although most, if not all, of the solutions available on the market today use a client-side archiving approach, we can split them into two categories.
We will call the first category website copiers, owing to certain similarities with HTTrack; this technology consists mainly of taking snapshots of websites and archiving them.
- low cost solution
- small disk usage
- no dynamic content played
- does not replay the archives
- low crawl depth
- Examples of companies on the market: Iterasi, Nextpoint. These solutions are suitable for litigation support and compliance (but not for dynamic media).
The second category uses web bots (i.e. crawlers) that capture entire web pages, including social media. The pages are stored exactly as they were captured (including links, rich media, video, and Flash).
- a technology that captures multiple web formats on dynamic websites
- high-quality archive accessibility and rendering
- fulltext search for large web archive collections
- deduplicated full-text search results in real-time
- daily archiving capabilities
- support of WARC ISO file format
- In-House solutions
- These solutions cost more than those in the first category
- They consume more resources (disk space, CPU, etc.)
Some companies claim to offer an in-house solution, but most of them only store the data in-house. Examples of companies offering this category of solution: Aleph Archives, Hanzo Archives, Pagefreezer.
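One of the second category's distinguishing features, real-time deduplication, is commonly implemented by keying captures on a payload digest, analogous to WARC "revisit" records. The class below is a hedged sketch of that idea, not a description of any particular vendor's product.

```python
import hashlib

class DedupStore:
    """Keep one stored copy per unique payload; identical later captures become revisits."""

    def __init__(self):
        self.by_digest = {}   # payload digest -> URL of the first stored capture
        self.revisits = []    # (url, digest) pairs pointing at the stored copy

    def add(self, url, payload: bytes) -> bool:
        """Return True if the payload was new and stored, False if deduplicated."""
        digest = hashlib.sha1(payload).hexdigest()
        if digest in self.by_digest:
            # Already archived: record a lightweight revisit instead of a full copy.
            self.revisits.append((url, digest))
            return False
        self.by_digest[digest] = url
        return True

store = DedupStore()
first = store.add("http://example.com/a", b"<html>same</html>")
second = store.add("http://example.com/a?session=42", b"<html>same</html>")
```

Deduplication like this is what keeps daily crawls of slowly changing sites from multiplying disk usage.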
Most of the solutions mentioned above are cloud based; pricing differs from one company to another, since only a few competitors offer complete solutions.
Usually, and logically, solutions that take only snapshots are more affordable, thanks to less complicated technology and smaller disk usage. Solutions offering a full, in-depth capture of websites are more expensive; they usually charge per URL and base their price on the archiving frequency, the scope (list of URLs), and operating fees (maintenance, data security, retention, etc.).
Some companies base their prices on data storage volume.
In-house solutions: very few companies provide a fully automated in-house solution. The price of an in-house solution is hard to determine and is usually higher than that of a cloud-based one; however, it can be considered a one-time fee (the customer purchases the licence), plus maintenance and support if the customer chooses a support plan.
Recommendations before buying a web archiving solution
Specify your needs: if you are under regulation, you can be compliant using any of the solutions mentioned above. However, in making your decision you should consider that a web archiving solution can go beyond compliance and litigation support, for example by providing relevant data to departments and preserving your corporate heritage. Numerous corporations and enterprises not under any regulation acquire a web archiving solution for business intelligence and social media monitoring, in order to enhance their customer service and counter false claims.
Acquiring a web archiving solution can be an investment or an annual expense. An in-house solution can be a long-term investment, giving you more freedom, better security, and no latency. Some companies offering technologically competitive solutions recommend in-house deployment.
Ask about the archiving process, judge whether it suits your company, and compare the capabilities of all the solutions.