Searching the Vanishing Web

cartoon: rabbit ears disappear into a top hat, a comment on the vanishing Web... going, going, gone.

Gone, but not forgotten!

Have you ever returned to a favorite site, only to discover the dreaded 404 (page not found) error message? Have you ever clicked through a Google result to find the page missing or altered since it was last indexed?

The Web changes all the time; pages are added or removed every day. Top Websites are regularly overhauled. Some sites move to new addresses or are taken down when their Webmasters find other interests. Millions of pages of information vanish from the Web every day.

This lost content can be thought of as the Vanishing Web. But is a page really gone once it disappears? Is it possible to find something no longer on the net? Web pages that vanish are gone but not quite forgotten. There are steps you can take to retrieve lost pages.

We'll look at three strategies that might save your day by pulling the missing rabbit from the proverbial hat:

Google's Cache feature
Researching the Internet Archives
Creating a personal archive for offline browsing

Using Google's Cache

The Google Cache is a powerful feature offered by the popular search engine. Typically, when you click one of Google's recommended links, you go to the actual Website. If the page is missing, or has changed and no longer contains the information you seek, you can fall back on Google's 'Cached' feature. To use it, click the dropdown arrow to the right of the green URL link to the site and look for 'Cached' in the dropdown. Clicking 'Cached' takes you to Google's indexed copy of the page rather than to the actual Website. The cached page appears with your keywords highlighted, making it easier to skim to the pertinent information. In addition, the first line of the page shows the date Google retrieved the page. CachedView.com can also retrieve pages from Google's cache.

Example:

google screen shot with the word cached showing

The Google cache: Operator

Google also provides a specific operator that reveals the version of the Web page currently held in the Google index. To use this operator, type it into the Google search bar. The syntax is cache:www.domainname.com, or in general cache:URL. There is no space between the operator and the Web address; the feature won't work if you insert a space.

Example:

screen shot of google with cache: operator highlighted
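
The no-space rule is easy to get wrong when a query is put together by hand. The following is a minimal Python sketch that assembles the operator string in the cache:URL form given above. The function names build_cache_query and build_search_url are hypothetical, and the https://www.google.com/search?q= address is an assumption about the search URL rather than something stated in this tutorial.

```python
from urllib.parse import quote


def build_cache_query(page_url: str) -> str:
    """Return 'cache:' joined directly to the address, with no space between."""
    cleaned = page_url.strip()  # drop accidental leading/trailing spaces
    if not cleaned:
        raise ValueError("a Web address is required")
    if any(ch.isspace() for ch in cleaned):
        raise ValueError("a Web address cannot contain spaces")
    return "cache:" + cleaned


def build_search_url(page_url: str) -> str:
    """Percent-encode the query and append it to the assumed search address."""
    query = build_cache_query(page_url)
    return "https://www.google.com/search?q=" + quote(query, safe="")


print(build_cache_query("  www.domainname.com "))
print(build_search_url("www.domainname.com"))
```

The first function yields the text to type into the search bar; the second only shows how the same text might be placed in a link, where the colon has to be percent-encoded.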

If this cached information is crucial, consider making an archive copy of the page. After all, the next time Google updates its index, the cached copy will likely be removed or replaced.

Researching the Internet Archives

When a print publication is issued an ISBN in the United States, an archive copy of the publication is sent to the Library of Congress. Similar procedures for creating a permanent record of print publications are in place in other countries. This isn't the case for Web publications. Indeed, some fear we've stumbled into an Internet Dark Ages, with our digital cultural history vanishing as we speak. However, the need to archive Internet information is being recognized around the world. Increasingly, national libraries are making copies of culturally significant Web pages. Of particular note is Egypt's multilingual archive effort at the Library of Alexandria: http://www.bibalex.org/Website

The Internet Archive

In the United States, a partnership between Alexa and the Internet Archive is amassing a huge collection that documents the World Wide Web back to 1996. The Internet Archive claimed to have 30 billion pages in the vault in 2003. By 2016, that number had grown to 273 billion. The Internet Archive includes Web pages, moving images, texts, and audio files. These archived pages can be accessed via the Wayback Machine: http://www.archive.org

screen shot of wayback machine search box, part of the Internet Archive

Using the standard search box, you cannot search the Wayback Machine by concept, keyword, or popularity ranking. You'll need the Web address (URL) of the Website you are seeking. However, it is possible to search by title, creator, collection, media type, and more using the site's robust Advanced Search tools.
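
Because the Wayback Machine is looked up by address, a lookup link can be assembled directly from the URL you already have. The following Python sketch is one way to do that. It assumes the https://web.archive.org/web/ address pattern, which is not given in this tutorial, and the function name wayback_url is hypothetical.

```python
from datetime import date
from typing import Optional

# Assumed address pattern for archived captures (not stated in this tutorial).
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(site: str, when: Optional[date] = None) -> str:
    """Build a Wayback Machine link for a site.

    With no date, the link uses '*' in place of a timestamp to ask for the
    list of captures; with a date, it asks for a capture near that day.
    """
    site = site.strip()
    if not site:
        raise ValueError("a Web address is required")
    if when is None:
        return f"{WAYBACK_PREFIX}/*/{site}"
    return f"{WAYBACK_PREFIX}/{when:%Y%m%d}/{site}"


print(wayback_url("www.domainname.com"))
print(wayback_url("www.domainname.com", date(2003, 6, 1)))
```

The sketch only builds the link; opening it in a browser shows whether the archive actually holds a copy from that period.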

Creating a Personal Archive of a Website

You can create a copy of your favorite Web materials on your hard drive using your browser's 'Save As' feature. This lets you keep an 'offline' copy of the Web pages you really value and use the information without going online. The technique, often called 'offline browsing', is particularly useful if you use a portable computer, since you can save now and read later. It is also a good option if you have narrow bandwidth or limited access to the Internet. Of course, offline copies won't offer database interactivity, and they won't remain current unless you update them regularly. Consider uploading your archived copies to a cloud service such as Google Drive or Dropbox, which makes your personal archive accessible from any computer to you alone or to those with whom you choose to share it.

Example: Firefox

  • To create an 'offline' copy using Firefox, right-click the Web page you want to save and choose Save Page As from the menu.
  • On the Save As panel, enter a file name (unless you want to keep the one suggested) and select a Save As format (Web page, text file, etc.).
  • Note that this procedure saves only the page you have opened in your browser, not the other linked pages on the site.
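
The same single-page save can be scripted. The following is a minimal Python sketch using only the standard library. It is not what Firefox does internally, and the function names suggest_filename and save_page are hypothetical. Like the browser procedure above, it stores only the one page requested and does not follow links to other pages on the site.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def suggest_filename(url: str) -> str:
    """Use the last segment of the URL path, or index.html if there is none."""
    name = PurePosixPath(urlparse(url).path).name
    return name or "index.html"


def save_page(url: str, folder: str) -> Path:
    """Fetch one page and write its raw bytes into the given folder."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the archive folder if needed
    target = target_dir / suggest_filename(url)
    with urllib.request.urlopen(url) as response:
        target.write_bytes(response.read())
    return target


print(suggest_filename("http://www.domainname.com/"))
print(suggest_filename("http://www.domainname.com/docs/page.html"))
```

Calling save_page("http://www.domainname.com/", "archive") would write the page to archive/index.html. Images and style sheets referenced by the page are not fetched, so the saved copy may look plainer than the original.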

Commercial archiving packages

There are also third-party software products that can help you create your own Internet archive of Web pages and social media (Twitter, Facebook, and much more). Expect to pay a fee for these services. These products are powerful and provide more automation than the simple functions of the most popular browsers. A few examples include PageFreezer, Blue Squirrel, and Spidersoft.

Armed with the knowledge to use Google's Cache, research the Internet Archives, and make personal backups of treasured Web pages, you are better equipped to deal with the Vanishing Web. Luckily, the resources and materials available on the Internet continue to grow; still, it is useful to be able to snatch back an important page from the verge of extinction!

What about Copyright?

An archival copy of a Website that is strictly for personal reference meets fair use requirements and does not violate copyright. However, copyright protections still apply, so it is best to have the permission of the Website's author before creating an archive.

Authored by Dennis O'Connor 2003 | updated 2016