Searching the Invisible (Deep) Web

There are parts of the Internet that search engines simply can't see. The page you are reading is one of them: a search engine's bot doesn't have the username and password needed to access this subscription-only page. Other times, the bot just doesn't go deep enough into a site to collect all the information saved there. Behind such barriers lie treasure troves of quality information. Collectively, this information is called the Invisible or Deep Web, and it vastly exceeds the amount of information that is visible to search engines. Think of an iceberg: search engines show you only the tip. You are missing a lot if you rely only on search bots.

Types of information not indexed by search engines include the following.

Intentionally skipped content

Webpages that are missed or skipped by search bots are sometimes called the nearly visible or opaque Web. These pages could be indexed but are skipped intentionally to save the search company time and money. Because crawling and indexing are expensive, search engines limit the number of pages they copy from each site. This can leave hundreds of pages out of the search index, even though they remain available to the site's users. (For more on this topic, see the IMSA Module: Opaque Web.)
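The effect of a per-site page limit can be illustrated with a small simulation. This is only a sketch under stated assumptions: the site map, the page paths, and the `max_pages` limit are all hypothetical, and real search engines decide what to crawl in far more elaborate ways.

```python
from collections import deque

def crawl_with_budget(links, start, max_pages):
    """Breadth-first crawl of an in-memory link map, stopping at max_pages."""
    indexed = []
    seen = {start}
    queue = deque([start])
    while queue and len(indexed) < max_pages:
        page = queue.popleft()
        indexed.append(page)  # the bot "copies" this page into its index
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed

# Hypothetical site: each page maps to the pages it links to.
site = {
    "/": ["/news", "/archive"],
    "/news": ["/news/1", "/news/2"],
    "/archive": ["/archive/2003", "/archive/2004"],
}

indexed = crawl_with_budget(site, "/", max_pages=3)
all_pages = set(site) | {t for targets in site.values() for t in targets}
skipped = sorted(all_pages - set(indexed))

print("Indexed:", indexed)
print("Skipped:", skipped)
```

With a limit of three pages, the bot in this sketch indexes only the top-level pages. The deeper pages stay online for visitors but never reach the index.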

Dynamic material created on demand

Online databases create HTML pages to match your criteria. These pages are dynamically assembled when you query the database, as opposed to static pages that don't change unless someone edits them. Search engines do not index the contents of online databases, and they cannot index dynamically assembled pages that don't exist until the user creates them. Searching for a book on Amazon.com is one example of using the Invisible Web: Amazon's database assembles a unique page to match your request. You can usually tell whether a page is dynamically created by looking at the URL. Below is another typical example. Note the string of parameters following the question mark, and the absence of a file extension such as .html or .pdf.

https://www.wunderground.com/?cm_ven=PS_GGL_Weather_9302015_1&par=MK_GGL&gclid=CJDnnduGtdACFYM2gQodtaoIrA
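The URL check described above can be sketched in a few lines of Python using the standard library's `urllib.parse` module. The function names `looks_dynamic` and `query_parameters` are made up for this illustration, and the rule is only a rough heuristic that mirrors the text: a query string is present and the path has no file extension. Some dynamic pages do carry an extension (for example .php), so treat the result as a hint, not proof.

```python
import posixpath
from urllib.parse import parse_qs, urlparse

def looks_dynamic(url):
    """Heuristic: the URL has a query string and its path has no file extension."""
    parts = urlparse(url)
    _, extension = posixpath.splitext(parts.path)
    return bool(parts.query) and extension == ""

def query_parameters(url):
    """Return the parameters after the '?' as a simple name-to-value mapping."""
    return {name: values[0] for name, values in parse_qs(urlparse(url).query).items()}

example = ("https://www.wunderground.com/?cm_ven=PS_GGL_Weather_9302015_1"
           "&par=MK_GGL&gclid=CJDnnduGtdACFYM2gQodtaoIrA")

print(looks_dynamic(example))                           # True
print(looks_dynamic("https://example.com/about.html"))  # False
print(sorted(query_parameters(example)))                # ['cm_ven', 'gclid', 'par']
```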

Password-protected information

Many sites have password-protected Webpages. Search bots can reach the front door, but can't crawl in. On the other side of the barrier is quality information developed and categorized by professionals. Before you can search these pages, you must first establish an account. Some sites are free; others charge a fee. Regardless, the materials beyond the password barrier can't be reached by search engines and remain invisible until you establish an account, obtain the key, and log in to the Website.

How to access the Deep Web

Most online research databases require a subscription. Fortunately, libraries pay for these subscriptions, and if you are a student at an institution with such a library, you can use its access for free. Ask a librarian if you need help. In the case of other subscription sites and databases, you may need to buy a membership or sign up for a limited-time free trial. LinkedIn is one example of a database that provides a 30-day trial, which lets you search its member database, something search bots cannot access.

On free-access sites that aren't completely indexed by bots, your best bet is to browse until you find what you think may be hiding deep down. Browsing is simply clicking on links that may take you into un-indexed portions of a site. This type of searching is entirely speculative and requires patience, careful attention to keywords, and luck.

On dynamically assembled sites, use the search engine the site provides to look up relevant keywords. Site search engines vary in how they process queries, so don't be surprised if one doesn't respond the same way Google does. Keep your queries simple.

Vast resources await

The Deep Web is a vast resource estimated to be from 20 to 500 times the size of the information accessible on the surface of the public Web. The materials found on the Invisible Web are often more focused, current, and professionally relevant than what you can find on the public Web using search engines. Knowing how to use Invisible Web resources will make you a more efficient and powerful researcher.

Authored by Dennis O'Connor 2003-2005 | updated 2016