What Is The Invisible Web?

What Is The Invisible Web? How Can You Search It? Why Would You Want To?

There are parts of the Internet that search engines simply can't see. Their automated "crawlers" either miss these areas or are locked out of them. Behind the barriers lie treasure troves of quality information. Collectively, this information is called the Invisible Web. The popular search engines miss many categories of information. The most common are:

webpages that have been intentionally skipped by search engine crawlers
webpages that are dynamically assembled from online database content
password-protected webpages
non-HTML resources like image, audio, animation, and PDF files

Intentionally skipped: Webpages that the crawlers have missed or skipped are sometimes called the nearly visible or opaque web. These pages could be indexed, but they are skipped intentionally to save the search company time and money. Because crawling and indexing are expensive, search engines limit the number of pages they copy from each site. This can leave hundreds of pages out of the search index even though they remain available to the site's users. (For more on this topic see the IMSA Module: Opaque Web.)
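The effect of a per-site page limit can be illustrated with a minimal sketch. It assumes a made-up site described as a dictionary of links and a hypothetical limit called max_pages; real search engines decide what to crawl in ways this article does not describe.

```python
from collections import deque

def crawl_with_budget(links, start, max_pages):
    """Breadth-first crawl that stops once max_pages pages have been copied."""
    indexed = []
    seen = {start}
    queue = deque([start])
    while queue and len(indexed) < max_pages:
        page = queue.popleft()
        indexed.append(page)
        # Queue every newly discovered link for a later visit.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed

def all_pages(links):
    """Every page that appears in the link map, as a source or a target."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    return pages

# Hypothetical site: each key is a page, each value lists the pages it links to.
site = {
    "/": ["/about", "/catalog"],
    "/catalog": ["/catalog/1", "/catalog/2", "/catalog/3"],
}

indexed = crawl_with_budget(site, "/", max_pages=3)
skipped = sorted(all_pages(site) - set(indexed))
print("indexed:", indexed)
print("opaque: ", skipped)
```

With a limit of three pages, the three catalog item pages are never copied. They still exist for visitors to the site, but they would not appear in this crawler's index.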

Dynamic material created on demand: Online databases create HTML pages to match your criteria. These pages are dynamically assembled when you query the database. Search engines do not index the contents of online databases, and they cannot index dynamically assembled pages that don't exist until a user's query creates them. Searching for a book on Amazon.com is one example of using the Invisible Web. Amazon's database will assemble a unique page to match your requests.
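A small sketch may make the idea of an assembled page more concrete. The book table, its two rows, and the function names below are invented for illustration and say nothing about how Amazon or any other site is actually built. The point is only that the HTML is produced at the moment of the query, so there is no stored page for a crawler to find ahead of time.

```python
import sqlite3
from html import escape

def make_catalog():
    """Create a tiny in-memory book database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [
            ("Web Searching Basics", "A. Author"),
            ("Gardening at Home", "B. Writer"),
        ],
    )
    return conn

def build_results_page(conn, keyword):
    """Assemble an HTML results page for one query, at request time."""
    rows = conn.execute(
        "SELECT title, author FROM books WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    ).fetchall()
    items = "".join(
        f"<li>{escape(title)} by {escape(author)}</li>" for title, author in rows
    )
    return (
        f"<html><body><h1>Results for {escape(keyword)}</h1>"
        f"<ul>{items}</ul></body></html>"
    )

conn = make_catalog()
page = build_results_page(conn, "web")
print(page)
```

Each different keyword yields a different page, and none of those pages exists until someone asks for it.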

Password-protected information: Many sites have password-protected webpages. Search engine spiders can reach the front door, but can't crawl in. On the other side of the barrier is quality information developed and categorized by professionals. Before you search these pages you must first establish an account. Some sites are free; others charge a fee. Regardless, the materials beyond the password barrier can't be reached by search engines and remain invisible until you establish an account, obtain the key, and log in to the website.
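The "front door" idea can be sketched with a simulated site. Everything here is an assumption made for illustration: the two pages, the account name and password, and the fetch function, which stands in for a real web request and answers with status 401 (the standard HTTP "Unauthorized" code) when no valid login is supplied.

```python
# Hypothetical accounts and pages; the boolean marks a page as protected.
ACCOUNTS = {"reader": "s3cret"}
PAGES = {
    "/": (False, "Welcome. Log in to read the archive."),
    "/archive/report": (True, "Professionally reviewed report text."),
}

def fetch(path, credentials=None):
    """Return (status, body); protected pages need a valid (user, password)."""
    protected, body = PAGES[path]
    if protected:
        if credentials is None or ACCOUNTS.get(credentials[0]) != credentials[1]:
            return 401, ""
    return 200, body

def crawl(paths):
    """A crawler has no account, so it indexes only pages that answer 200."""
    index = {}
    for path in paths:
        status, body = fetch(path)
        if status == 200:
            index[path] = body
    return index

crawler_index = crawl(PAGES)
print("crawler sees:", list(crawler_index))
print("member sees: ", fetch("/archive/report", ("reader", "s3cret")))
```

The crawler's index holds only the welcome page, while a reader who logs in receives the report.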

Non-HTML formats: Search engines were originally designed to comb through HTML text pages and create an index of keywords. Non-HTML file formats that didn't contain much text were routinely skipped. These file formats include image, audio, animation, and PDF files. Recently, some of the commercial search engines have added image and PDF files to their indexes. Additionally, specialized search engines are available to help you find these and other types of files. (For more on this topic see the IMSA Module: Formats.)
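As a rough illustration of a keyword index built only from HTML text, the sketch below uses Python's standard html.parser module. The three documents and their content types are made up, and the indexer is far simpler than anything a commercial search engine would use.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text found between HTML tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def build_index(documents):
    """Map each keyword to the pages containing it; skip non-HTML files."""
    index = defaultdict(set)
    skipped = []
    for url, (content_type, payload) in documents.items():
        if content_type != "text/html":
            skipped.append(url)  # image, audio, PDF, and so on
            continue
        parser = TextExtractor()
        parser.feed(payload)
        parser.close()
        text = " ".join(parser.chunks).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(url)
    return index, skipped

# Hypothetical documents: URL -> (content type, content)
docs = {
    "/whales.html": ("text/html", "<h1>Whales</h1><p>Blue whales migrate.</p>"),
    "/whales.pdf": ("application/pdf", b"%PDF-1.4 ..."),
    "/song.mp3": ("audio/mpeg", b"\x00\x01"),
}

index, skipped = build_index(docs)
print(sorted(index))
print("not indexed:", skipped)
```

Only the HTML page contributes keywords. The PDF and audio files are passed over, which is how such files ended up invisible to early search engines.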

Vast resources await: The Invisible Web is a vast resource, estimated to be from 2 to 500 times the size of the easily accessible information on the public web. The materials found on the Invisible Web are often more focused, current, and professionally relevant than what you can find on the public web using search engines. Knowing how to use Invisible Web resources will make you a more efficient and powerful researcher. (For more on this topic see the IMSA Module: How Many Pages Are There On The WWW?)

Authored by Dennis O'Connor 2003-2005