Investigative Searching

As if finding the right content for a project or paper weren't hard enough, the search usually doesn't end there. Tracking down the citation information for an online source is a requirement of ethical use. Think of it as investigative, or secondary, searching.

The difference between searching for content and searching for citation information is that the latter often involves a different set of techniques. Instead of relying on a search engine for efficiency, investigative searching often resorts to old-fashioned careful scanning, judicious browsing, sifting through HTML code and trying out a few specialized search engines.

"What Information am I looking for?"

Using the Digital Information Fluency Model to describe investigative searching, the first three questions result in different answers. In response to "What Information am I looking for?" the answers become very specific:

Author's name(s)
Publisher's name
Date created
Document title
Title of the website
URL

Prior to this point, let's assume that credible information on the topic was retrieved; now even more specific information is needed. Some of it should have been uncovered while verifying the credibility of the information, such as the author's name or the date of publication. If not, now it HAS to be located if it is going to be cited.

Techniques that are used most often in tracking down specific citation information include:

Scanning
Find on this page
Truncate the URL
View Page info
View Page source
Search the (surface) web
Search the Deep Web

"Where am I going to look for the information?"

Normally, start where you are and expand your search outward as necessary. Many times the information needed is right in front of you but not where you expect it to be. An author's name may be embedded in an article rather than at the beginning or end. The date may be posted in a sidebar instead of the bottom of the page. The publisher's name may be part of a banner rather than part of the copyright.

If careful scanning or the find command reveals the information simply is not part of the current page, then the next place to look is in the page code (described below). If that turns up nothing, then it's time to start browsing nearby, which usually means clicking a link on the page for relevant information (about us, contact us, references, etc.) or truncating the URL back to a logical break, for example a person's or an organization's name.

If looking nearby fails to produce results, then try going to the root site of the URL to see if there's a way to search the site (a search engine or subject directory) or using an external search engine such as Google or a Deep Web resource such as Whois.net.

"How am I going to get there?"

Let's say that a good scan of the page turns up nothing. In case a critical word or term was missed, this is an opportunity to use the FIND command. Launch the FIND command by pressing Ctrl + F (Cmd + F on a Mac). Into the search box, enter a term that may be associated with the information you are seeking. This is experimental searching, since you don't know exactly what you are looking for. But there are a few key terms or symbols that might be placed close to what you want to find. A copyright symbol, ©, or the word copyright is usually placed before a date and a copyright owner. The copyright may also provide a valuable clue about the publisher. You will have to read carefully what the FIND command retrieves--it may find information about a completely different article. Other terms that may be helpful are 'contact,' '20' (e.g., to find a recent year on a page: 20xx) and 'last updated.' The FIND command can also be used to find links to 'home,' 'about us' and 'search,' which may come in handy if the search has to be expanded.
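The same experimental search can be sketched in code. The following is a minimal illustration in Python, not part of the original technique: it assumes the page text has already been copied into a string, and the function name find_clues, the pattern list and the sample text are all made up for the example.

```python
import re

# Clue terms suggested above; the dictionary keys are illustrative labels.
CLUE_PATTERNS = {
    "copyright": re.compile(r"©|copyright", re.IGNORECASE),
    "year": re.compile(r"\b20\d{2}\b"),  # a four-digit year starting with 20
    "last updated": re.compile(r"last\s+updated", re.IGNORECASE),
    "contact": re.compile(r"contact", re.IGNORECASE),
}


def find_clues(page_text, context=30):
    """Return (clue label, surrounding snippet) pairs, like repeated FIND hits."""
    hits = []
    for label, pattern in CLUE_PATTERNS.items():
        for match in pattern.finditer(page_text):
            start = max(0, match.start() - context)
            end = min(len(page_text), match.end() + context)
            hits.append((label, page_text[start:end].strip()))
    return hits


# Invented sample text standing in for a real page.
sample = "Home | About Us | Contact\nLast updated: March 3, 2011\n© 2011 Example Press"
for label, snippet in find_clues(sample):
    print(f"{label}: {snippet!r}")
```

As with the FIND command itself, every hit still has to be read in context, since a matched year or copyright line may belong to a different article on the same page.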

View Page Source. Remaining on the original page, right-click and select View Page Source (or press Ctrl+U, or from the menu: View > Page Source). Information in the head of the page, including the meta-tags, may be helpful, especially the title of the page. Use the FIND command to search for <title> to ascertain the exact document title if you can't find one on the page. The title of the website can be found the same way: go to the home page (by truncating the URL) and search its source for <title>.
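For readers comfortable with a little code, the lookup can be sketched with Python's standard html.parser module. This is only an illustration under stated assumptions: the page source has already been saved as a string, the class and function names are hypothetical, and the author and date values in the sample are invented.

```python
from html.parser import HTMLParser


class HeadInfoParser(HTMLParser):
    """Collect the <title> text and any <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            name = attr_map.get("name")
            content = attr_map.get("content")
            if name and content:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept.
        if self._in_title:
            self.title += data


def extract_head_info(html_source):
    """Return (title, meta dictionary) for a string of page source."""
    parser = HeadInfoParser()
    parser.feed(html_source)
    return parser.title.strip(), parser.meta


# Invented page source; the author and date are placeholders.
sample_source = """<html><head>
<title>Investigative Searching</title>
<meta name="author" content="Jane Example">
<meta name="date" content="2011-03-03">
</head><body><p>Article text</p></body></html>"""

title, meta = extract_head_info(sample_source)
print(title)
print(meta)
```

Many pages carry no author or date meta-tags at all, so an empty result here simply means the search has to move on to the next technique.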

Page Info. In certain browsers (e.g., Firefox), right-clicking on a page will allow you to view Page Info, which contains information about the last date the page was modified. If nothing else indicates a date, this may be the best option. Be careful if the date listed is the same day as today--some pages automatically update to the current time without the information having been modified. A related search technique that uses JavaScript can be found below:

Last Modified Date Search Wizard

This will return a last modified date. However, be wary if the date returned is simply the current server time (today's date, down to the minute).
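The "be wary" check can be made concrete. The sketch below is in Python and rests on an assumption not stated in the original: that the date in question is available as an HTTP Last-Modified header value. The function name check_last_modified and the five-minute tolerance are illustrative choices, not an established rule.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def check_last_modified(header_value, now=None, tolerance=timedelta(minutes=5)):
    """Parse a Last-Modified value and flag it if it is suspiciously close to now."""
    modified = parsedate_to_datetime(header_value)
    if modified.tzinfo is None:
        # Treat a value without a time zone as UTC (an assumption of this sketch).
        modified = modified.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    suspicious = abs(now - modified) <= tolerance
    return modified, suspicious


# Invented values: the page claims to have changed one minute before "now".
pretend_now = datetime(2011, 3, 3, 14, 31, tzinfo=timezone.utc)
modified, suspicious = check_last_modified("Thu, 03 Mar 2011 14:30:00 GMT", now=pretend_now)
print(modified.date(), "suspicious" if suspicious else "plausible")
```

A date flagged as suspicious is not necessarily wrong, but it should not be cited without a second source, such as a date printed on the page itself.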

Browsing Links. To move beyond the page being cited, look for links such as: Home, About Us, Contact Us, and Site Map. Many times these lead to information about the author or the publisher.

Truncating the URL. By removing sections of the URL, starting at the end, there's a chance that information about the author, publisher and/or date may be uncovered as you move closer to the root of the site. You may need to use the same techniques you used on the original page to find information on any of these new pages.
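To see what truncation produces, here is a minimal Python sketch using the standard urllib.parse module. The function name truncate_url and the example address are hypothetical; the sketch drops any query string or fragment and then removes one path segment at a time until it reaches the root of the site.

```python
from urllib.parse import urlsplit, urlunsplit


def truncate_url(url):
    """Return progressively shorter versions of url, ending at the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    for depth in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:depth])
        if 0 < depth < len(segments):
            path += "/"  # intermediate levels are treated as directories
        # Empty strings discard the query string and fragment.
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


# Invented address with a person's name as a logical break point.
for candidate in truncate_url("https://www.example.org/staff/jsmith/papers/searching.html"):
    print(candidate)
```

Each address in the list is a place to try in the browser; not every level will exist, and some sites redirect truncated addresses back to the home page.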

Searching a different database. Moving away from the site to conduct a Deep Web search will result in problems unless you have something specific for which to search, such as an author's name, the document title, the website title or the URL. A commercial search engine (e.g., Google, Bing, Yahoo!) may be worth searching for an author's name or URL. Even better may be the author's name AND the document title. This could lead to elusive publication date information. Searching for the URL could lead to pages that link to your source page that may include a valuable description of the contents, including an author's name.

Another type of Deep Web search to conduct is to go to a database that specializes in certain types of information. Whois.net retrieves information about the registered owner of a website when the URL is entered in its search engine. Whois.net will not have useful records for every site, but it is worth a try if publisher information is critical. Another database to try is archive.org, a massive collection of 'retired' web pages. Enter the URL of the page you want to cite and see if archive.org's "Wayback Machine" can retrieve an older copy of it. You may be able to learn more about the page, its author, date, etc. by going back in time.
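The Wayback Machine lookup can also be done programmatically. The Python sketch below builds a query for archive.org's availability service and reads the closest snapshot out of a decoded reply. It makes no network request itself, the function names are hypothetical, and the reply shown is an invented example whose shape is assumed from the service's published examples, so treat the field names as assumptions to verify.

```python
from urllib.parse import urlencode

WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def wayback_query_url(page_url, timestamp=None):
    """Build the address that asks whether an archived copy of page_url exists."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # preferred date, written as YYYYMMDD
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (snapshot URL, timestamp) from a decoded reply, or None if absent."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url"), closest.get("timestamp")


print(wayback_query_url("http://www.example.org/page.html", "20060101"))

# Invented reply, shaped like the assumed JSON structure.
sample_reply = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://www.example.org/page.html",
            "timestamp": "20060101000000",
        }
    }
}
print(closest_snapshot(sample_reply))
```

The timestamp of the earliest snapshot only shows that the page existed by that date; it is not necessarily the date the page was written.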

"What if I still can't find it?"

All the techniques in the world won't retrieve information that's not there. If you believe you conducted a reasonable search and came up short--and still want to cite the material--your best option is to cite it with the information missing. In this case, the conventions of style do allow for citations in which there is an anonymous author: leave it blank and start the citation with the document title (MLA, Chicago). A date may have to be limited to just the year. For such exceptions, refer to a style manual or your school's policy on citations. Keep in mind that document and website titles can usually be found in the page source, and the publisher is either the site host or the owner of a personal web page.

Have your students check with you about including incomplete citations in their work. Policies vary from school to school, so follow the standards set by your school or, lacking that, your best judgment.