Within the past 2 weeks, the Internet Archive has released the first beta of a new version of the Wayback Machine. It offers web researchers several new features (with more to come) and a clean, no-nonsense user interface. Wayback is an essential internet research tool and should be one of the first resources discussed when teaching basic web-based research skills. It offers more than 150 billion archived web pages, with some material dating back to 1996.
Like any research tool, it has limitations. For example, it’s far from a complete record of every change made to every web page, there is a delay of at least 6 months before archived material becomes accessible, and it’s not keyword searchable. Nevertheless, Wayback is the easiest, quickest, and in some cases only way for a researcher to find material that is no longer available on the web.
The first thing you’ll notice is that Wayback now has its own URL. You can access the beta at http://www.waybackmachine.org.
When you arrive at the site, you’ll find little more than a search box, two buttons, and a bit of text below the search box. That’s it. This is in stark contrast to the massive amount of text surrounding the interface (now being referred to as the “classic interface”) at http://www.archive.org or http://web.archive.org. The two buttons are labeled “Latest” and “Show All.”
The “Latest” button takes you directly to the most recent version of the page in the Wayback database, skipping the page that links to the versions of the page/URL archived on various dates. More on that in a moment.
Next up is the “Show All” button. When you click this button you’ll be moved to another new feature. Instead of heading to a page that simply lists the dates when archived versions of the page/URL are available, you’re taken to a page that not only looks great but also provides users with a large amount of useful data in a very small space.
Let’s take a look at this page from top to bottom using the White House homepage as the example (http://waybackmachine.org/*/http://www.whitehouse.gov).
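The example address above suggests a simple pattern: the site address, then /*/, then the full URL of the page you want to research. Here is a minimal Python sketch of that pattern. It assumes the pattern holds for other URLs, which we infer only from this one example; the function name show_all_url is our own invention and not part of any Wayback tool.

```python
def show_all_url(page_url, base="http://waybackmachine.org"):
    """Build a "Show All" style address: base, then /*/, then the page URL.

    Assumption: the pattern is inferred from the single White House
    example in the article, not from official documentation.
    """
    return f"{base}/*/{page_url}"


# Rebuild the address used in the article's example.
print(show_all_url("http://www.whitehouse.gov"))
```

If the pattern holds, swapping in another page URL should take you straight to that page's overview.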
First, you’re provided with several “fast facts” about what Wayback has crawled and archived for the page you’re reviewing. You’ll learn how many times the URL has been crawled since the Wayback Machine began in 1996 and the first time the URL was crawled. By the way, to use Wayback terminology, each crawl creates a “snapshot” of the page.
Next you’ll see the sparkline toolbar. Sparklines (the dark vertical lines) let you quickly visualize, based on the length of each bar, the number of times a specific page was crawled during a specific year. For example, note the differences between the sparklines for 1998, 2001, and 2009.
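To make the idea concrete, a sparkline is essentially a count of snapshots per year drawn as a bar. The short Python sketch below shows that counting step. The snapshot times are hypothetical, invented purely for illustration, and this is not how the Wayback Machine itself computes its display.

```python
from collections import Counter
from datetime import datetime


def crawls_per_year(snapshots):
    """Count how many snapshot timestamps fall in each calendar year."""
    return Counter(ts.year for ts in snapshots)


# Hypothetical snapshot times, invented for illustration only.
sample = [
    datetime(1998, 12, 2, 8, 30),
    datetime(2001, 9, 15, 4, 10),
    datetime(2001, 9, 15, 13, 45),
    datetime(2001, 11, 3, 22, 5),
    datetime(2009, 1, 20, 17, 0),
    datetime(2009, 6, 1, 9, 15),
]

counts = crawls_per_year(sample)

# Draw one text "bar" per year; a longer bar means more crawls.
for year in sorted(counts):
    print(year, "|" * counts[year])
```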
You can also click on a specific year (let’s use 2001) and be taken to a calendar display that provides greater detail and allows you to quickly see the dates when a specific page was crawled during a specific year.
When we looked at the sparkline display you might have noticed that the Whitehouse.gov page was crawled more frequently in the last quarter of 2001. The calendar display makes this even clearer. After 9/11 you can see the crawl intensify.
Each small circle on the calendar shows that the page was crawled once on that date. The larger circles indicate that the URL was crawled multiple times that day. If you move your cursor over the large circle on Sept. 15, 2001, you’ll see that the page was crawled three times that day. We know this because three specific times (also a new feature) are provided.
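The calendar logic can be sketched in a few lines of Python: group snapshot times by date, then use a small marker for a single crawl and a large one for several. The three times shown for Sept. 15, 2001 are hypothetical placeholders (the article does not list the actual times), and the names crawls_by_day and circle_size are our own.

```python
from collections import defaultdict
from datetime import datetime


def crawls_by_day(snapshots):
    """Group snapshot timestamps by calendar date, keeping the times of day."""
    days = defaultdict(list)
    for ts in sorted(snapshots):
        days[ts.date()].append(ts.time())
    return dict(days)


def circle_size(times):
    """Small circle for a single crawl, large circle for more than one."""
    return "large" if len(times) > 1 else "small"


# Hypothetical times, invented for illustration only.
sample = [
    datetime(2001, 9, 15, 4, 10),
    datetime(2001, 9, 15, 13, 45),
    datetime(2001, 9, 15, 21, 30),
    datetime(2001, 9, 16, 7, 0),
]

calendar = crawls_by_day(sample)
for day, times in calendar.items():
    print(day, circle_size(times), [t.strftime("%H:%M") for t in times])
```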
Click any or all of these links to see what the page looked like at that time. It’s important to remember that a crawl and its resulting snapshot do not necessarily mean the page had been updated since the previous crawl.
Let’s head back to the toolbar and take a look at the arrows found to the right of the sparklines. They allow you to navigate forward or backward in time, one snapshot at a time. The date shown on the black background is the date the snapshot was taken.
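If you have saved copies of several snapshots, one way to tell which ones actually differ is to compare a fingerprint (hash) of each with the one before it. The Python sketch below illustrates that idea. The labels and page text are hypothetical, and this is our own approach for illustration, not a feature of the Wayback Machine.

```python
import hashlib


def changed_snapshots(snapshots):
    """Return the labels of snapshots whose content differs from the previous one.

    snapshots: list of (label, content) pairs in chronological order.
    The first snapshot is always reported, since nothing precedes it.
    """
    changed = []
    previous_digest = None
    for label, content in snapshots:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest != previous_digest:
            changed.append(label)
        previous_digest = digest
    return changed


# Hypothetical snapshots, invented for illustration only.
sample = [
    ("crawl 1", "<html>Welcome</html>"),
    ("crawl 2", "<html>Welcome</html>"),   # crawled again, page unchanged
    ("crawl 3", "<html>Welcome back</html>"),
]

print(changed_snapshots(sample))
```

In practice, archived pages can differ in trivial ways (timestamps, counters), so an exact comparison like this one may report changes a human reader would not notice.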
Also, if you move your cursor over the sparkline, you can see the dates change to those where other snapshots are available. This is a very useful way to quickly navigate and access the same page on different days.
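Stepping forward or backward one snapshot at a time amounts to finding the nearest neighbor in a sorted list of snapshot dates. Here is a minimal Python sketch of that idea using the standard bisect module. The dates are hypothetical, and the function names are our own rather than anything the Wayback Machine exposes.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def previous_snapshot(sorted_dates, current):
    """Return the latest snapshot date before current, or None if there is none."""
    i = bisect_left(sorted_dates, current)
    return sorted_dates[i - 1] if i > 0 else None


def next_snapshot(sorted_dates, current):
    """Return the earliest snapshot date after current, or None if there is none."""
    i = bisect_right(sorted_dates, current)
    return sorted_dates[i] if i < len(sorted_dates) else None


# Hypothetical snapshot dates, invented for illustration; must be sorted.
snaps = [date(2001, 9, 10), date(2001, 9, 15), date(2001, 9, 21)]

print(previous_snapshot(snaps, date(2001, 9, 15)))
print(next_snapshot(snaps, date(2001, 9, 15)))
```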
This first release of the Wayback Machine beta already delivers major improvements both in navigation and in the amount of data available for a specific page. Based on recent feedback, the developers are already working on a few more, such as additional detail in the timeline graphs to indicate where changes occurred and whether a page ends up being redirected.
Kudos to the Wayback Machine Team! We’re looking forward to seeing what’s coming next.
It’s good to see that the web and the long-term preservation of its material are being written about and discussed more frequently by a growing number of info pros. This is a very important topic, not only in terms of the record we will leave future generations but also for the simple task of tracking down born-digital material that was placed on the web only 5 years ago and is no longer available. At the same time, personal digital archiving is also getting more and more attention.
One solution that some organizations are using is the Archive-It service, also from the Internet Archive. This fee-based service allows users to specify exactly which URLs they want crawled and how frequently they are to be recrawled. In the past several years, the recrawl rate has taken on extra importance because of the constant creation and updating of social media.
From the search/research perspective, Archive-It provides access to nearly 1,300 collections of archived web pages (as selected by the client). They range from a collection of multiple social media sites from NASA to subject-based collections from the University of Toronto.
Also, unlike the Wayback Machine, the pages available via Archive-It are keyword searchable. If you’re not familiar with Archive-It, it’s worth a look; the service can be very valuable to organizations and researchers.