Recap of international web archiving community meeting

June 12, 2014

Web archivists Ahmed AlSum and Nicholas Taylor and LOCKSS Chief Scientist David Rosenthal recently attended the International Internet Preservation Consortium (IIPC) General Assembly, an annual meeting of national libraries, research universities, non-profits, and service providers engaged in web archiving. This was the first General Assembly we all attended since Stanford University Libraries (SUL) joined the IIPC, though we had all previously attended meetings under the auspices of other organizations.

Niels Brügger's closing remarks best captured the emergent theme of the meeting: how can we best serve researchers, broadly construed? The word clouds on the fourth and fifth slides of his presentation (PPT) helped to visualize how the focus of the international web archiving community has shifted over the past decade.

In keeping with the emphasis on understanding how web archives are being used, the open day (PDF) consisted of presentations by researchers working with historical web content. Examples included an initiative to create distributed web science research centers (PPT), the user demographics of shuttering consumer web services (PDF), the proffering of web archive datasets on cloud infrastructure (PPT), and an architecture for archiving cited web addresses in scholarly publications deposited into a repository.

The presentations and discussions from the member-only days (PDF) have not been systematically gathered, but some are available. There were discussions about collaborative or, at least, mutually informed collection development; models of close collaboration between researchers and web archiving organizations; exchange of best practices for full-text indexing; and updates on the OpenWayback collaborative development effort.

The last day and a half consisted of open workshops (PDF) on topics including crawl engineering, the web archiving tool landscape, the role and responsibilities of curators, and novel crawler architectures for capturing dynamic content or facilitating creation of precise corpora through interactive archiving. I co-organized the Curator Tools Fair (PDF) with Abbie Grotke and presented on strategic web archive collection development.

SUL will be assuming an increasing role in the IIPC in the coming year. I have stepped up as co-lead of the Access Working Group along with Daniel Gomes; we will continue to contribute to a technical proposal for profiling of web archives to enable scalable Memento aggregation, and we are exploring co-hosting the next General Assembly in the San Francisco Bay Area in collaboration with California Digital Library and Internet Archive.