You might already be familiar with blog archiving and collections of blogs that web archiving projects are putting together. There’s the collection of blogs from the Wellcome Library, for example, or from the British Library, or the Library of Congress BLawg Archive. The common and familiar scenario is that an organisation runs a web crawler such as HTTrack or Heritrix to capture copies of content – in this case, blogs – and provides subsequent access to the web site (blog) as an integral whole. This is perfectly acceptable if the requirement is that the site is presented as an integral whole. ArchivePress, on the other hand, is based upon the premise that organisations may not have this requirement and can have different reasons for wishing to capture copies of blog content, and different intentions for managing and using the content once they have it. For instance:

Scenario 1: A university has given its academics free rein to blog on whatever software platforms they choose. It later realises that this output is of academic and record-keeping value, but that it has no record of it and that any attempt to force staff to switch to an internally hosted service would probably be badly received. ArchivePress is installed and configured to collect copies of all blog posts from a list of pre-selected blogs; the contents are consolidated into a single database which can then be re-presented to the user via a single interface branded with the University logo.

Scenario 2: A local archiving institution wishes to collect blog content around a particular local theme (for example, about a local author or event) and aggregate the contents into a single, easily searchable resource for access via its website. It does not have the technical resources to manage a harvest-based web archiving approach, nor the finances to invest in a commercial web archiving service. ArchivePress can be implemented and managed with very little technical knowledge and enables the institution to collect copies of posts and comments from a pre-compiled list of blogs, store them in an easy-to-manage database, and re-present the contents as a single resource for users.

For these organisations, it is the raw and aggregated content that is the primary target, not the complete website that is used to host each blog. Their core requirement is the consolidation of content from different origins into a single resource for re-use and re-purposing. ArchivePress enables them to focus on this content, along with the necessary metadata to identify each resource, and can meet their needs better than a ‘traditional’ harvesting approach.
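
To make the idea concrete, here is a minimal sketch in Python of the "collect and consolidate" approach described above. It is illustrative only and is not ArchivePress code. It assumes that posts are gathered from each blog's RSS feed (this post does not specify the collection mechanism), and the sample feed, the table layout, and the function names `parse_items` and `store` are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for one blog on the pre-selected list.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.org/</link>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/first</link>
      <guid>http://blog.example.org/first</guid>
      <pubDate>Mon, 01 Jun 2009 10:00:00 GMT</pubDate>
      <description>Hello world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/second</link>
      <guid>http://blog.example.org/second</guid>
      <pubDate>Tue, 02 Jun 2009 10:00:00 GMT</pubDate>
      <description>More content.</description>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Yield one dict per post: the content plus the metadata that identifies it."""
    channel = ET.fromstring(rss_text).find("channel")
    source = channel.findtext("title", default="")
    for item in channel.findall("item"):
        yield {
            # Fall back to the link if the feed gives no guid.
            "guid": item.findtext("guid") or item.findtext("link"),
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "body": item.findtext("description", default=""),
        }


def store(conn, items):
    """Insert posts into a single table; return how many rows were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "guid TEXT PRIMARY KEY, source TEXT, title TEXT, "
        "link TEXT, published TEXT, body TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    # The primary key on guid means a post already collected is skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO posts VALUES "
        "(:guid, :source, :title, :link, :published, :body)",
        items,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    return after - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("new posts stored:", store(conn, parse_items(SAMPLE_RSS)))
```

Running the collection step repeatedly over every feed on the list would build up one consolidated table, which is the kind of single resource the two scenarios call for. Keying each row on the post's identifier is one simple way to avoid storing the same post twice.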

We appreciate that this is only the first step in preservation: the simple act of collecting the content into a database does not mean it has been preserved! However, we believe that the tools we will use and the infrastructure we will provide will be conducive to preservation. Just how we’ll do that will be covered in more detail in a subsequent post.

We also recognise that there are all sorts of legal issues that would need to be addressed before an institution should implement ArchivePress. Whilst we won’t be providing legal advice per se, we will be exploring these issues during the course of the project. First, however, we will be looking into user requirements in more detail. This is vital to ensure that we provide the functionality required by institutional and organisational users. More posts on this subject will appear as our work progresses.

4 Responses to “ArchivePress: a different tool for a different purpose”

  1. [...] where ArchivePress could be used – not just by archival institutions (as per scenario 2 in my post below) but also by local groups who want to develop an archival collection along a particular theme and [...]