In collaboration with the US National Archives and Records Administration (NARA), the Library of Congress and others, the PDF Association will participate in an Andrew W. Mellon Foundation project to identify the essential characteristics and optimal functional requirements of email messages and necessary related information in a PDF technology-based archive.
The project runs for six months. Its objective is to publish a technical white paper defining how email messages, along with their identified essential characteristics and functionality, should be converted into PDF containers that can be considered, in the context of captured information, provably authentic and complete email records.
Project deliverables include a published report and appendices that define the significant characteristics of email required to meet the needs of the email archiving community. In addition, the report will lay out use cases for email-to-PDF software, while providing recommendations that vendors can use to build such archiving capabilities into email clients or third-party tools.
From 2016 to 2018, the Andrew W. Mellon Foundation and the Digital Preservation Coalition supported the Task Force on Technical Approaches for Email Archives, which released a report of its findings: The Future of Email Archives (PDF). Published by the Council on Library and Information Resources (CLIR) in August 2018, this report provides a detailed analysis of the technical challenges to preserving email and constructs a working agenda for the community to improve and refine the technical framework for email archiving, including developing interoperable toolkits to fill the gaps.
One of the gaps identified, which is also a goal of many other email archiving projects, is the need to identify a format or formats appropriate for storing email for long-term preservation. One strong contender identified in the Email Task Force report is PDF, and specifically, the PDF/A subset.
The traditional “print and file” approach to email archiving is cumbersome. As practical experience and digital preservation have advanced, the method has also come to be recognized as destructive: it loses contextual information such as metadata, changes the look and feel of email and the associated user experience, and dissociates messages from their attachments.
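To make that loss concrete, here is a minimal Python sketch that inventories the parts of a message a printed page typically does not carry: the full header set and the attachments. It is an illustration only, not part of the project or its deliverables. The sample message and the `inventory` helper are hypothetical; the only library used is Python's standard `email` package.

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample message, invented purely for illustration.
RAW_MESSAGE = """\
From: alice@example.org
To: bob@example.org
Subject: Quarterly report
Date: Mon, 01 Oct 2018 09:30:00 +0000
Message-ID: <1234@example.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"

Please find the report attached.
--BOUNDARY
Content-Type: text/csv; name="report.csv"
Content-Disposition: attachment; filename="report.csv"

quarter,total
Q3,42
--BOUNDARY--
"""


def inventory(raw: str) -> dict:
    """Return the headers, plain-text body, and attachment names of a raw message."""
    msg = message_from_string(raw, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        # A list of pairs (not a dict) so repeated headers such as Received survive.
        "headers": [(name, str(value)) for name, value in msg.items()],
        "body": body.get_content().strip() if body is not None else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


record = inventory(RAW_MESSAGE)
print([name for name, _ in record["headers"]])
print(record["attachments"])
```

A printout of this message would usually show only the visible body and a few display headers. The remaining headers and the attached file are the kind of context that would have to be carried along separately.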
As validated by the U.S. Government's Managing Government Records Directive (PDF), which required that by 2016 federal agencies maintain all email records in an accessible electronic format, the records management and archival communities have long recognized that “print and file” is no longer an acceptable records management approach for email.
As part of its support for this directive, and in furtherance of its own mission to bolster the continued integrity of electronic records throughout the federal government, the National Archives and Records Administration (NARA) issues formal guidance to federal agencies defining the file formats they may use when transferring electronic records for permanent retention. While this guidance is directed at federal agencies, in practice it is widely adopted by the archival and records management communities at large.
Maintaining email for archival purposes using often unfamiliar and/or proprietary formats is a daunting task, so many organizations choose the safe and predictable PDF format, in no small part due to its similarity to the familiar "print and file" approach. With the sunset of the print and file method, saving email messages out of their native applications as PDF files provides one potential preservation pathway where records remain in a familiar format that can be managed in an electronic record-keeping system. Additionally, some legacy email systems and systems supporting email encryption only provide support for exporting messages as PDF files. For these reasons, as well as other enabling features of the format, "saving as PDF" remains a compelling option for many organizations and archives.
Since PDF is designed to replicate the look of paper documents, some of the same problems encountered when converting email messages to paper are evident in conversions to PDF. This grant seeks to begin addressing those problems.
Ironically, it is the flexibility of PDF that makes preserving email messages challenging. In the absence of an industry-supported profile of PDF for the purpose of archiving email there are simply too many ways to store and associate the various components of email in PDF format documents. Moreover, the lack of such a profile inhibits development of end user applications for interacting with such archived email.
Current tools vary widely in how they handle the archivally-significant properties of email; none address them in a manner that is considered fully “archival”, according to Principal Investigator Chris Prom. Email messages converted to PDF files include the following known issues:
In spite of these issues, inconsistencies, and complexities, PDF is a very viable option for email archiving:
PDF could be a powerful solution for archiving email, but developing the profile of PDF needed to meet archival requirements will take significant work. This Andrew W. Mellon Foundation-funded project assembles a team to draft and publish a report detailing the specific properties of email that are key to archiving in the PDF context, along with use cases for converting archival emails to PDF and recommendations to the vendor community regarding development of the profile during a planned secondary phase of the project.
The project is led by its Principal Investigator (PI), Christopher Prom, Professor and Dean for Digital Strategies at the University of Illinois at Urbana-Champaign. The team consists of a wide variety of government, academic and industry experts, including members and representatives of the PDF Association.
The project's intent is to provide building blocks for interoperable technical solutions for email archiving. Once the archival needs are defined in a practical way, systems builders can use them as functional requirements, which will yield consistent email packages across multiple vendors and platforms.
The primary outcome of the current project will be a technical white paper identifying the significant properties of email and making them actionable in a transformation to PDF. A secondary outcome is direct engagement between the email and PDF communities. By establishing working groups and formal connections between PDF vendors and members of the email community in the academic and government spheres, the project aims to serve real needs for advanced email archiving with simplified pathways that help users more easily find their way to best practice in record-keeping.
The work done during this project will aid a planned follow-up project in which both PDF and email industry experts will develop:
Members of the PDF Association and others interested in learning more about the organization's involvement with this project should contact Executive Director Duff Johnson.
Duff serves the PDF industry as ISO Project co-Leader and US TAG chair for both ISO 32000 (the PDF specification) and ISO 14289 (PDF/UA). As Executive Director of the PDF Association, Duff coordinates several working groups, speaks at a wide variety of industry events and promotes the advancement and adoption of PDF technology worldwide. An independent consultant, Duff Johnson is …