Using the Wayback Machine for Genealogy

The Wayback Machine, a project of the Internet Archive (current version: http://web.archive.org/; new beta version at http://waybackmachine.org/), is an attempt to archive the complete content of the Internet. Brewster Kahle, co-founder of the Internet Archive, spoke about the project in the Saturday keynote address at RootsTech 2011.

The key purpose of the Internet Archive is to preserve the Internet for future historians and other researchers, so that they can know what we were saying and doing in this often ephemeral environment.

But it can also help us in the here and now. If a once-public site has disappeared, you may be able to find it at a new address with Google; failing that, you may find a copy in the Internet Archive.

For example, an old Rootsweb page that I am in the process of migrating to this site contains a link that no longer works. (In web parlance, this is “link rot”.)

The broken link points to:

http://www.geocities.com/Heartland/Hollow/1936/index.html

When I try to navigate to this site, I get a message saying:

“Sorry, the GeoCities website you were trying to visit is no longer available.
GeoCities has closed, but there’s a lot more to explore on Yahoo!”

This does not offer much solace. However, when I go to the Wayback Machine and enter the URL I was looking for, I am taken to the following page:

http://web.archive.org/web/*/http://www.geocities.com/Heartland/Hollow/1936/index.html
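If you have a long list of dead links to check, the pattern above is easy to build by hand or in a small script: the listing page is just the prefix http://web.archive.org/web/*/ followed by the original address. Here is a minimal sketch, assuming that prefix keeps working as shown; the function name wayback_listing_url is my own, not part of any Internet Archive tool.

```python
# Minimal sketch: build the Wayback Machine "all snapshots" listing URL
# for a dead link. Assumes the http://web.archive.org/web/*/ prefix
# pattern shown above; wayback_listing_url is a hypothetical helper name.
WAYBACK_LISTING_PREFIX = "http://web.archive.org/web/*/"


def wayback_listing_url(dead_url: str) -> str:
    """Return the Wayback Machine page listing the snapshots of dead_url."""
    # Strip stray whitespace left over from copying the link out of a page.
    return WAYBACK_LISTING_PREFIX + dead_url.strip()


if __name__ == "__main__":
    old = "http://www.geocities.com/Heartland/Hollow/1936/index.html"
    print(wayback_listing_url(old))
```

Pasting the printed address into a browser should land on the same listing page as typing the URL into the Wayback Machine's search box.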

Alternatively, if I enter the same search in the beta version of the new Wayback Machine, I get to:

http://waybackmachine.org/*/http://www.geocities.com/Heartland/Hollow/1936/index.html

This page shows me the snapshots the Internet Archive has made of the page over time. When I click on the most recent one, I see that it links to a new location:

http://freepages.genealogy.rootsweb.ancestry.com/~pre1800vias/

I can also look at other snapshots to see what the site looked like at that time.
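Individual snapshots have addresses of their own, with a 14-digit timestamp (year, month, day, hour, minute, second) in place of the asterisk. The sketch below builds such an address for a date you pick, assuming that YYYYMMDDhhmmss convention; the archive is expected to redirect to the nearest capture it actually holds. The function name and the example date are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Minimal sketch: build a Wayback Machine URL for the snapshot of a page
# nearest a chosen date. Assumes the 14-digit YYYYMMDDhhmmss timestamp
# convention; wayback_snapshot_url is a hypothetical helper name.


def wayback_snapshot_url(url: str, when: datetime) -> str:
    """Return a Wayback Machine URL for url as it looked around `when`."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # e.g. 20091001000000
    return f"http://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    page = "http://www.geocities.com/Heartland/Hollow/1936/index.html"
    # Hypothetical example date: ask for the page as of 1 October 2009.
    print(wayback_snapshot_url(page, datetime(2009, 10, 1)))
```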

The Internet Archive cannot capture the whole Internet instantaneously; instead, every couple of months it crawls most of the public web, captures what has changed, and moves on. You should not rely on it, either as a web user or as a webmaster, but it can prove very handy at times. Try it the next time you run across a link that you are sure used to work but no longer does.
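For webmasters checking many old links at once, the Internet Archive also offers a small availability service that reports the closest archived copy of an address. The sketch below assumes the endpoint https://archive.org/wayback/available and a JSON reply containing an "archived_snapshots" object with a "closest" entry; treat both as assumptions to verify against the Archive's own documentation. The function names are hypothetical, and only check_link touches the network.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Minimal sketch of asking the Wayback Machine whether a dead link was
# archived. ASSUMPTION: the endpoint and the response shape
# {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}
# follow the Internet Archive's availability service; verify before use.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(url: str) -> str:
    """Build the availability query for a possibly dead link."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest archived copy's URL, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_link(url: str) -> Optional[str]:
    """Query the archive over the network (needs internet access)."""
    with urlopen(availability_query_url(url), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    print(check_link("http://www.geocities.com/Heartland/Hollow/1936/index.html"))
```

Keeping the parsing in closest_snapshot, separate from the network call, makes it easy to try the logic on a saved reply before pointing the script at a whole page of links.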