Winter 2019

Feature: Database Mining

Where to Search

Database mining is a competency* that starts with this question: Where do you search?

Students already know how to answer the question. So do you. The second question in the search process (#1 is “for what am I searching?”) is “where do I search?” The vast majority of the western world turns to Google. It’s America’s go-to search engine. And it’s good at what it does.

But it doesn’t do everything.

Imagine for a moment that ALL the information ever created were available on Google. It’s an incomprehensible number of documents, files and artifacts of all types. In May 2018, Forbes reported this:

“The amount of data we produce every day is truly mind-boggling. There are 2.5 quintillion bytes of data created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT). Over the last two years alone 90 percent of the data in the world was generated.” Source: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#7f179d6660ba

Those bytes translate into many more bits of information than any human can digest or comprehend. Collectively, we create more than any of us can read or use.

The term info-whelm has been around since at least 2012. It describes the dilemma of having too much information. Not all topics have the same depth of coverage, but in general the problem isn’t a shortage of information; it’s a surplus.

For this reason, it makes a lot of sense to use Google. It serves up the most relevant information at the top of the list, right?


It also serves up ads and matches to whatever query it received. While Garbage In, Garbage Out (GIGO) is still an issue, the fundamental problem is that Google doesn’t have access to all the information out there. It has a lot, to be sure, but only a fraction of what is being produced. If it weren’t selective about what it indexes, Google’s server farms would buckle under the weight of information production.

Looking at it another way, Google may catalog too much information, or at least too much information you don't need--what you are looking for is a minuscule fraction of everything in there. Google is less selective than other databases (yes, it is a database), so its topical range is immense.

I usually find what I’m looking for on Google. I’m fine until I need something specific for research, something fresh (not yet stale), or assurance that the information I am retrieving has been vetted by an authoritative source. Google doesn’t have that mastered yet; it serves up the closest “smart” match to what I think I’m searching for. The rest is up to me.

So while Google is a good place to start searching, there are other databases to mine with which students and teachers should be familiar.

Start with Google

To get a general sense of information available, start with a Google (or Yahoo, Bing, etc.) search. Note the keywords used in the abstracts (results or snippets). Many times these contain helpful keywords that can be used to home in on information.

Specialized Databases

There’s a database on practically any topic. The advantage of turning to one of these is that it specializes in that information. It may vet the content it indexes, although don’t assume this is always true. Whether you are looking for Broadway Shows, Rollercoaster Parks or Bison in North America (three of our search challenges), there is likely a specialized repository of that information. Each of these databases has its own search engine, and Google doesn’t index everything in them.

  • Use Google to search for: __________ database. (Fill in the blank with the subject area.) Then go to the most comprehensive-looking result and search there. Thank Google for being helpful.
  • Reference databases include the specialized databases to which libraries subscribe. Don't forget, if you're at the library, that you can always ask a librarian for help. Here’s a typical university list: https://libguides.umflint.edu/reference/databases
  • Wikipedia. While many may question this choice, a wiki is an example of a group-curated database. The expertise of the contributors can't be verified due to anonymity, and accuracy is always suspect, but the references at the end can be a source of helpful information. Don't overlook them as bibliographic suggestions.
  • Library of Congress. This collection is vetted and includes many kinds of media besides books: audio recordings, films, videos, legislation, manuscripts, maps, music scores, newspapers, periodicals, personal narratives, photos, drawings, software, 3D objects and more. Search here.
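
For readers who like to see a step spelled out, the first tip in the list (search Google for "__________ database") can be sketched in a few lines of Python. This is only an illustration, not a required tool: the function name `database_search_url` is made up for this example, and the Google address with its `q` parameter is assumed to be the ordinary search URL you see in your browser's address bar.

```python
from urllib.parse import urlencode


def database_search_url(subject, engine="https://www.google.com/search"):
    """Fill in the blank: build a search URL for '<subject> database'.

    Hypothetical helper for illustration. The default engine address and
    its 'q' parameter are assumptions; substitute another engine if you like.
    """
    query = f"{subject.strip()} database"
    # urlencode turns spaces into '+' so the query is safe to put in a URL
    return f"{engine}?{urlencode({'q': query})}"


# The three search challenges mentioned above
for topic in ["Broadway Shows", "Rollercoaster Parks", "Bison in North America"]:
    print(database_search_url(topic))
```

Opening one of the printed links is the same as typing the filled-in phrase into the search box. The important step still comes afterward: choosing the most comprehensive-looking result and searching inside that database.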

In compiling a robust list of references, students should always be guided towards databases like these for targeted, vetted information. While it may be easier to compile a list yourself, when you do that, you've hidden an important step in the search process from them.

*Database mining in the usual sense has both similarities to and differences from what is described here as database mining: instead of using software to discover patterns in the data in a database, this search competency is a manual process that depends on determining the relevance of the data returned by a query.

