Searching the Opaque or Nearly Invisible Web

What is the nearly invisible or opaque Web?

The public Web is open and freely available to search engines. The invisible Web is also part of the Internet, but it is inaccessible to the robotic Web-crawling technology search engines use to automatically build and update their indexes. (For more on this topic, see the Micro Module: What Is the Invisible Web?) In this module we will consider information that bridges the public and invisible Web: the 'nearly invisible' or 'opaque' Web.

The opaque or nearly invisible Web is information on a public Website that has not been indexed by the search bots sent out by search engines. The information is 'indexable', but it hasn't yet been indexed. Bots may simply miss a page because search engines limit the number of pages they index from each site. Webmasters can also exclude pages by using the robots.txt file or special HTML tags.

Think of the nearly invisible or opaque Web as Web pages that are just one click beyond the reach of a search engine. The Website itself has been visited, and some of its pages have been copied into the search engine's index. However, due to storage limitations, not all pages on a site are visited by every search engine.

Why would this happen? Crawling the Web is not cheap, and storing copies of every page is expensive. For this reason search engines impose limits on the number of pages they record at any given site. With a limited 'depth of crawl' the robotic spider might copy 150 to 300 pages from a 1,000-page site and leave the other 700 or more out of the index. This un-indexed information is said to be part of the nearly invisible or opaque Web. The information is out there, but you'll have to find your way to it indirectly by browsing: following links on the Website itself. You can click through to these pages once you are on the site, but you won't see them showing up on a search engine hit list.

How does search engine 'depth of crawl' create the opaque Web?

Some engines impose limits on the number of pages they record at any given site. With a limited 'depth of crawl' the search bot might copy part of a site while leaving other pages out of the index for that site. If a Website has a thousand pages but only 100 are crawled and indexed, the depth of crawl has turned the other 900 pages into 'opaque Web' content.

Are there ways to make a Webpage intentionally 'opaque'?

A Webmaster may choose to 'hide' pages from search engine crawlers in two ways. The first is a plain-text file called robots.txt, placed at the top level of the site, which instructs bots to skip particular pages or sub-directories of information. The second is the HTML robots meta tag added to an individual page: a NOINDEX value tells a crawler to skip the page entirely, while a NOFOLLOW value allows the page to be indexed but blocks the bot from following links on that page.
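
As a minimal sketch, here is what both techniques look like. The example.com address, the /drafts/ directory, and the page names are made up for illustration; crawlers treat these directives as requests to honor, not locks on the content.

    # robots.txt, placed at the top level of the site (e.g. http://example.com/robots.txt)
    User-agent: *            # these rules apply to all crawlers
    Disallow: /drafts/       # skip every page in the /drafts/ sub-directory
    Disallow: /private.html  # skip this single page

    <!-- robots meta tags, placed in the <head> of an individual page -->
    <meta name="robots" content="noindex">   <!-- do not add this page to the index -->
    <meta name="robots" content="nofollow">  <!-- index the page, but do not follow its links -->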

Are the same pages nearly invisible or opaque to all search engines?

Each search engine has its own unique index. What is opaque to one search engine might be indexed and highly visible to another. This is another good reason to always use more than one search engine when researching information. Also, how long a page will remain hidden is hard to determine. Search engines are constantly updating and revising their indexes. What's hidden today may be visible tomorrow.

How to search the opaque Web

Knowing that important information may be hidden behind the next click on a Web page could entice you to look deeply into the sites you visit. If you find a good Website, spend time exploring it in depth. If the Website has a sitemap, use it to dig into the information; a simple sketch of one appears below. Who knows, you may unearth an opaque gem of information that will shine when held up to the lens of your research! (For more on these topics see What is a Sitemap?)
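
A sitemap is often just a page of links to every section of a site, including pages a crawler may never have copied. The page and directory names below are hypothetical:

    <!-- sitemap.html: a hypothetical site map page listing every section of the site -->
    <h1>Site Map</h1>
    <ul>
      <li><a href="/tutorials/index.html">Tutorials</a></li>
      <li><a href="/tutorials/archive-2003.html">Tutorial Archive</a></li> <!-- deep pages like this may be absent from a search engine's index -->
      <li><a href="/staff/contacts.html">Staff Contacts</a></li>
    </ul>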

Authored by Dennis O'Connor 2003 | updated 2016