The Daily Dose Presents: Digital Curation at the British Library

Welcome back to the Daily Dose, today with a bit of Rogue Scholar thrown in.

In the past few weeks, we have been featuring various museum and library collections, and today I have invited Dr. James Baker to the forum. He is a new digital curator for the British Library, and if you’ve never been there, here is a snap of me doing the happy dance out front. I do love it there.

Thank you, James, for joining and participating in the ongoing conversation among librarians, curators and an interested public excited about outreach! I hope to feature more from the British Library in the future–such a wonderful place!

About James Baker

Much as I’m usually uneasy about people defining themselves by their profession, I’m comfortable calling myself by my trade: I am a historian. All the same, I admit that History took some time to really ‘grab’ me.

Having left the University of Southampton in 2005 with a BA, an MA, an enthusiasm for maps and a sense that eighteenth century London was deliciously vile (at least for those who didn’t have to live there…), I went into the world of insurance in search of a career. But the eighteenth century lingered at the back of my mind – and in particular eighteenth century graphic satire – so much so that it brought me back to higher education in 2007 as a doctoral student at the University of Kent. My thesis – Isaac Cruikshank and the notion of British liberty, 1783-1811 – was completed in 2010. It explored the satirical prints (often known as ‘caricatures’) designed by Isaac Cruikshank, their significance in the late-Georgian marketplace and their emphasis on social as opposed to high political drama. These social satires included many on medical men, which provided useful evidence for demonstrating how print audiences were obsessed with deploying domestic examples of unacceptable behaviour (and hence not just the behaviour of foreigners) to define their conception of British liberty.

Since completing my thesis, I’ve pursued an alt-ac career, combining lecturing at the University of Kent with work on Kent’s institutional research repository, cataloguing archives for the Rochester Bridge Trust, and project managing City and Region. I also retained my research interests, drifting further towards digital humanities and the history of technology along the way. In 2012 I was awarded a postdoctoral fellowship by the Paul Mellon Centre for Studies in British Art, the purpose of which was to free up some time to convert my thesis into a book, and in March 2013 I was appointed Digital Curator at the British Library. I hope to have the book completed by the end of the year.

About the collections

As a relative newbie at the British Library, I feel far from qualified to discuss the wealth and richness of the digital collections we hold, which extend from medieval manuscripts to digitised newspapers, early spoken word recordings, and – since April 2013 – an archive of the whole UK web domain. Further, a digital curator doesn’t so much curate the digital collections at the British Library; rather, our work (there are five of us in total) involves opening up those collections, whether by advising on infrastructure for access, working on collaborative research projects or offering training in digital scholarship methods.

Nonetheless, a flavour of what we hold is captured by the tagline for the British Library Labs project: ‘Every book tells a story, but what can 68,000 books tell you?’. The short answer is of course quite a lot, but who has the capacity to remember – let alone read, process and cross-analyse – large corpora of books? And so researchers turn to digitised books and digital methods to read the books for us: a type of collection-level analysis often referred to as ‘distant reading’ (Moretti, 2005). But distant reading can only work if the stories locked within thousands or millions of books are discoverable by a machine and its operator: the researcher.

In March we set about improving the usability of a digital asset well overdue researcher attention: the 68,000 (or so) books that make up the 19th Century Printed Book Dataset, a collection digitised during the now defunct Microsoft Books Project. The work involved creating a series of Python scripts which reorganise, refine and index this content so researchers can more easily set their corpus analysis tools to work on the data therein. These scripts are currently crunching away on the data, but soon we will have around 30TB of out of copyright data ready for exploitation by any interested researcher: most of them first editions, many published in London, but with plenty for non-British historians to get their teeth into (samples here, list of books here). In the short term this archive will be held locally (on my desk in fact!), but in the future we hope to open it up for interrogation online.
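To give a flavour of what ‘indexing’ a corpus can mean in practice, here is a minimal sketch in Python. It is not the Library’s actual code: the function name `build_index`, the assumption that each book is a plain-text `.txt` file named after its identifier, and the choice of a CSV index are all hypothetical, made up purely for illustration.

```python
import csv
from pathlib import Path


def build_index(corpus_dir, index_path):
    """Walk corpus_dir for .txt files and write one CSV row per book.

    Assumption (hypothetical layout): one plain-text file per book,
    with the file name (minus .txt) serving as the book identifier.
    """
    rows = []
    for text_file in sorted(Path(corpus_dir).rglob("*.txt")):
        # Replace undecodable bytes rather than failing on messy OCR output.
        text = text_file.read_text(encoding="utf-8", errors="replace")
        rows.append({
            "book_id": text_file.stem,
            "path": str(text_file.relative_to(corpus_dir)),
            "word_count": len(text.split()),
        })

    # Write the index so other tools can find books without re-walking the tree.
    with open(index_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["book_id", "path", "word_count"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A researcher could then load the resulting CSV into their tool of choice and pick out, say, only the longer volumes before pointing corpus analysis software at them.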

The 19th Century Printed Book Dataset is an example of the sort of digital collection the British Library is best known for: digitised copies of old stuff. But we’re also beginning to capture the aspects of our heritage that never made it out of digital form. From 6 April, the British Library, the National Library of Scotland, the National Library of Wales, Bodleian Libraries, Cambridge University Library and Trinity College Dublin gained powers to archive the entire UK web. This is a huge undertaking, automating at scale what has until now been a more modest, hand-curated exercise run by the UK Web Archive. The potential value of a web archive to the scholarly community is obvious, and I’m looking forward to being here to support the ideas researchers come up with.

But of course, historians tend to play public documents off against private documents in their work, and this should be no different for born-digital archives. As a consequence the British Library is also seeking to explore ways of capturing, archiving and reusing personal digital archives. Jeremy John, Curator for e-Manuscripts, heads up our eMSS Lab, where the software, emails and documents held on personal computers are treated with the same care as any fragile manuscript or other historical material. The archives of the poet Wendy Cope highlight the challenge: to archive emails and other digital correspondence as we do handwritten and typed letters, journals, and memos. Again this is a huge undertaking, but a necessary one, without which we would be at serious risk of losing vast swathes of our national heritage. If you are interested in knowing more about Jeremy’s work (I certainly can’t do it justice), I’d heartily recommend taking a look at a talk on Personal Digital Archiving which he gave in 2011.

These examples are only a fraction of the digital collections the British Library holds. These digital collections are constantly expanding and our means of providing access to them constantly improving, so for the latest developments stay tuned to the British Library Digital Scholarship blog!
