Ed Summers, a software developer on the Library of Congress’s Chronicling America project (his digital life is at http://inkdroid.org/), noticed my “Chronicling America” blog entry of the other day, in which I talked about the predictive URLs I found on the site.
He pointed out that Chronicling America has a published API (application programming interface) that explains how to access the site’s content. The API is at: http://chroniclingamerica.loc.gov/about/api/
The API facilitates the following functions:
- Search: results come back as HTML, JSON, or Atom. HTML gives a web page for simple human reading, JSON (JavaScript Object Notation) lets a web page manipulate the returned data, and an Atom feed can be read in a feed reader such as Google Reader or Bloglines.
- Link: you can link to “titles, issues, editions, and pages” using “LCCNs, dates, issue numbers, edition numbers, and page sequence numbers.” Using some of the examples on the site, you can quickly predict and test potential URLs, then use and share them. You can also generate URLs out of a database, once you understand the rules.
- Linked Data: using published, standard ontologies, you can use the Chronicling America database to get at related content on the “semantic web”, where that content is similarly tagged. Using RDF/OWL (Resource Description Framework / Web Ontology Language) technologies, this content can be delivered to users in new and creative ways.
- Aggregations: Chronicling America has assembled collections of related items (such as the JPEG 2000, PDF, and OCR text of the same newspaper page) using a technology called OAI-ORE (Open Archives Initiative Object Reuse and Exchange).
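To make the Search and Link items concrete, here is a minimal Python sketch of generating URLs from a few fields, the way you might from a database. The path layout (`/lccn/.../ed-N/seq-N/`), the search path, and the `andtext` and `format` parameter names are my reading of the examples on the API page, so treat them as assumptions and check them there. The function names are my own, and the LCCN and date in the usage note are only illustrative values.

```python
from urllib.parse import urlencode

# Base address of the site, as given in the post.
BASE = "http://chroniclingamerica.loc.gov"


def page_url(lccn, date, edition, sequence):
    """Build a link to one newspaper page.

    lccn     -- Library of Congress Control Number of the title
    date     -- issue date as YYYY-MM-DD
    edition  -- edition number
    sequence -- page sequence number within the issue
    The path pattern is an assumption based on the site's examples.
    """
    return f"{BASE}/lccn/{lccn}/{date}/ed-{edition}/seq-{sequence}/"


def search_url(text, fmt="json"):
    """Build a page-search link; fmt is assumed to be 'html', 'json', or 'atom'."""
    query = urlencode({"andtext": text, "format": fmt})
    return f"{BASE}/search/pages/results/?{query}"
```

Calling `page_url("sn86069873", "1900-01-05", 1, 2)` produces a link you can paste into a browser to test the prediction, and `search_url("thomas", "atom")` produces a feed address you could hand to a feed reader.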
I am amazed by the scope of this project, as well as how openly the content is being made available. Here’s a brief snippet from their API page about the scope of Chronicling America:
There are more than a million digitized newspaper pages in Chronicling America. These pages span several decades and many U.S. states and territories. New batches of data come in from partner institutions throughout the year and are added to the site regularly.
The openness of the content, with such a rich, published API, means that this content is ripe for repurposing, and the site itself can teach you how to get to its own content. Just as I noticed the predictive URLs, the folks at Chronicling America write:
Details about these interfaces are below. In case you want to dive right in, though, we use HTML link conventions to advertise the availability of these views. If you are a software developer or researcher or anyone else who might be interested in programmatic access to the data in Chronicling America, we encourage you to look around the site, “view source” often, and follow where the different links take you to get started.
I intend to do just that. What an exciting and powerful resource.
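As a first step in that direction, here is a rough sketch of doing “view source” programmatically: it collects the `<link rel="alternate">` elements from a page’s HTML, which is one common way sites advertise other views of the same resource. Whether Chronicling America uses exactly this `rel` value is an assumption on my part, and the sample markup below is invented for illustration, not copied from the site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # Keep only links that advertise an alternate view and have a target.
        if attr.get("rel") == "alternate" and "href" in attr:
            self.links.append((attr.get("type"), attr["href"]))


def alternate_links(html):
    """Return the alternate views advertised in an HTML document."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page source, made up for this example.
SAMPLE = """<html><head>
<link rel="alternate" type="application/rdf+xml" href="/lccn/sn00000000.rdf">
<link rel="stylesheet" href="/static/site.css">
</head><body></body></html>"""

for media_type, href in alternate_links(SAMPLE):
    print(media_type, href)
```

Fed the real source of a page from the site, the same function should list whatever alternate views that page advertises, which you can then follow one by one.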