Chronicling America API

Ed Summers, whose digital life is at http://inkdroid.org/ and who is a software developer on the Library of Congress’s Chronicling America project, noticed my “Chronicling America” blog entry of the other day, where I talked about the predictive URLs I found on the site.

He pointed out that Chronicling America has a published API (application programming interface) that explains how one can access the site’s content. The API documentation is at: http://chroniclingamerica.loc.gov/about/api/

The API facilitates the following functions:

  • Search: results are returned as HTML, JSON, or Atom. HTML allows simple human reading of a web page; JSON (JavaScript Object Notation) lets scripts on a web page manipulate the returned data; and an Atom feed can be read in a feed reader, such as Google Reader or Bloglines.
  • Link: to “titles, issues, editions, and pages” using “LCCNs, dates, issue numbers, edition numbers, and page sequence numbers.” Using some of the examples on the site, you can quickly predict and test potential URLs, then use and share them. You can also generate URLs out of a database, once you understand the rules.
  • Linked Data: using published, standard ontologies, you can use the Chronicling America database to get at related content on the “semantic web”, where that content is similarly tagged. With RDF/OWL (Resource Description Framework / Web Ontology Language) technologies, this content can be delivered to users in new and creative ways.
  • Aggregations: Chronicling America has assembled collections of related items (such as the JPEG 2000, PDF, and OCR text of the same newspaper page) using a technology called OAI/ORE (Open Archives Initiative, Object Reuse and Exchange).
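To make the Search and Link items a little more concrete, here is a minimal Python sketch of generating URLs from database-style rows. The path layout (`/lccn/<LCCN>/<date>/ed-<n>/seq-<n>/`), the search path, and the `andtext` and `format` parameter names are my assumptions about the URL rules, not something quoted in this post, so check them against the API page before relying on them. The function names and the sample row are hypothetical placeholders.

```python
from datetime import date
from urllib.parse import urlencode

# Host taken from the API link in this post.
BASE_URL = "http://chroniclingamerica.loc.gov"


def page_url(lccn: str, issue_date: date, edition: int, sequence: int) -> str:
    """Build a page link from an LCCN, issue date, edition number,
    and page sequence number.

    Assumed path layout: /lccn/<LCCN>/<YYYY-MM-DD>/ed-<n>/seq-<n>/
    """
    return (
        f"{BASE_URL}/lccn/{lccn}/{issue_date.isoformat()}"
        f"/ed-{edition}/seq-{sequence}/"
    )


def search_url(text: str, fmt: str = "json") -> str:
    """Build a full-text search link that asks for HTML, JSON, or Atom.

    The endpoint path and the 'andtext' / 'format' parameter names
    are assumptions; verify them on the API page.
    """
    if fmt not in {"html", "json", "atom"}:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"andtext": text, "format": fmt})
    return f"{BASE_URL}/search/pages/results/?{query}"


# Placeholder rows standing in for records pulled from a database:
# (LCCN, issue date, edition number, page sequence number).
rows = [("sn86069873", date(1900, 1, 5), 1, 2)]

for row in rows:
    print(page_url(*row))
print(search_url("thomas", "atom"))
```

Nothing here touches the network; the sketch only assembles strings, so each generated URL can be pasted into a browser and tested by hand before it is shared or stored.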

I am amazed by the scope of this project, as well as how openly the content is being made available. Here’s a brief snippet from their API page about the scope of Chronicling America:

There are more than a million digitized newspaper pages in Chronicling America. These pages span several decades and many U.S. states and territories. New batches of data come in from partner institutions throughout the year and are added to the site regularly.

The openness of the content, with such a rich, published API, means that this content is ripe for re-purposing, and the site itself can teach you how to get to its own content. Just as I noticed the predictive URLs, the folks at Chronicling America write:

Details about these interfaces are below. In case you want to dive right in, though, we use HTML link conventions to advertise the availability of these views. If you are a software developer or researcher or anyone else who might be interested in programmatic access to the data in Chronicling America, we encourage you to look around the site, “view source” often, and follow where the different links take you to get started.

I intend to do just that. What an exciting and powerful resource.
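The “view source” advice quoted above can also be followed with a script. Here is a minimal sketch using only Python’s standard library to list the alternate views a page advertises in its `<link>` elements. I am assuming the site uses `rel="alternate"`, which is the common HTML convention; the sample markup and its paths are made up for illustration and are not copied from the site.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing.
        rels = (attr.get("rel") or "").lower().split()
        if "alternate" in rels and attr.get("href"):
            self.links.append((attr.get("type"), attr["href"]))


def find_alternate_links(html: str):
    """Return the alternate views advertised in an HTML document."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical markup standing in for a page's source.
SAMPLE_HTML = """
<html><head>
<title>Example page</title>
<link rel="stylesheet" href="/static/site.css">
<link rel="alternate" type="application/json" href="/example/page.json">
<link rel="alternate" type="application/rdf+xml" href="/example/page.rdf">
</head><body></body></html>
"""

for media_type, href in find_alternate_links(SAMPLE_HTML):
    print(media_type, href)
```

To try it on a real page, feed the function the page’s HTML source instead of the sample string.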
