


Harry E. Pence
SUNY Oneonta
Oneonta, NY
(Available on the WWW at

Note: This article was scanned using OCR from the Spring 1997 CCCE Newsletter. Please contact us if you identify any OCR errors.
It is becoming increasingly common for students to use the World Wide Web as a source of information, even in science courses. This creates several challenges for science faculty members. In addition to teaching students how to effectively use information to create papers and talks, it has become much more important to help students learn how to evaluate data. The WWW is a confusing combination of truth and falsehood, and it can require a discerning eye to distinguish between the two.
In addition, faculty are finding it difficult to direct students to the best search engine. Since the search engine is the main tool for finding information on the web, choosing one can be a key decision in student research. The most widely advertised engines are often relatively poor for scientific searches. Indeed, it sometimes seems that there is an inverse relationship between the amount of advertising and the effectiveness of the engine.
As discussed in an earlier article ( search engines, there are at least three important criteria that should be used to evaluate search engines: comprehensiveness, currency, and efficiency. Comprehensiveness is a measure of what fraction of all web sites the search engine actually reviews. Currency measures how often the search engine revisits sites to determine whether they have changed. Efficiency is determined by whether the most useful sites are not just included but listed early in the search results. Comparing engines on these criteria is problematic because the characteristics of engines seem to change constantly.
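The three criteria can be made concrete with a little arithmetic. The Python sketch below is only one possible way to quantify them. It assumes comprehensiveness is indexed pages divided by total pages. It treats currency as the average number of days since the engine last visited a set of sample pages. It measures efficiency as the share of useful sites among the first k results, a measure usually called precision at k. The function names, the default k = 10, and all inputs are hypothetical and do not come from any published study.

```python
from datetime import date


def comprehensiveness(indexed_pages: int, total_pages: int) -> float:
    """Fraction of all web pages that the engine's index covers."""
    return indexed_pages / total_pages


def currency(last_visits: list[date], today: date) -> float:
    """Average age in days of the engine's copies of sample pages.

    Assumed proxy for revisit frequency: a lower number means more current.
    """
    return sum((today - visited).days for visited in last_visits) / len(last_visits)


def efficiency(results: list[str], useful: set[str], k: int = 10) -> float:
    """Share of the first k results judged useful (precision at k)."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for url in top if url in useful) / len(top)


# Hypothetical inputs, for illustration only.
print(comprehensiveness(300, 800))                                        # 0.375
print(currency([date(2000, 3, 1), date(2000, 2, 20)], date(2000, 3, 4)))  # 8.0
print(efficiency(["a", "b", "c", "d"], {"a", "c"}, k=2))                  # 0.5
```

For example, a hypothetical engine indexing 300 million of 800 million pages scores 0.375 on comprehensiveness no matter how well it ranks the pages it does hold. That is why efficiency has to be measured separately.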
During the past few months, several of the main search engines have been competing to attract more traffic to their sites by making claims about the effectiveness of their products. The mode of competition has ranged from increasing the size of the engine's index (a generally beneficial effort) to publicizing rationalizations of why an engine with poorer metrics is still preferable (including blatant deception). The purpose of this article is to suggest where an individual can go to obtain unbiased, up-to-date information about search engines.
A major source of data about the accessibility of science information on the Web is provided by Steve Lawrence and C. Lee Giles of the NEC Research Institute in Princeton, NJ. Their three studies of web search engines are especially valuable because they are mainly concerned with scientific searches. All three are available on the web, including the most recent ( Since this is apparently an ongoing study, it would be a good idea to check back from time to time for more recent results.
Danny Sullivan's Search Engine Watch ( seems to be the most extensive source of general information about engines. This site includes a current listing of index sizes (http://www.search Page down to see that even the largest index covers less than half of all web sites. Sullivan's site also links to many reviews ( The only problem with this site is that there is so much information that it is easy to click away from something and then be unable to find it again. Either drop bread crumbs as you surf or use the excellent site-specific search engine that is provided. Sullivan also offers a subscription service that promises even more information, but the part of the site that is open to the general public is an excellent starting point for anyone who is trying to keep up with recent search engine developments. These sections only scratch the surface of what Sullivan offers. Be sure to look at the informative tutorial on how engines work (, and at Search Tips (/index.html), which, as the name suggests, explains how the major engines work and how to maximize the chance that a given engine will return the best possible results. The tutorials are aimed at all levels of experience, ranging from novice (, through advanced tips ( and Boolean algebra (
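Boolean searching is easier to grasp with a toy example. The Python sketch below builds a tiny inverted index, which maps each word to the documents containing it. It then evaluates AND, OR, and NOT as set intersection, union, and difference. It is a simplified illustration under assumed inputs, not a description of how any particular engine actually processes queries. The sample pages and function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def matches(index: dict[str, set[str]], term: str) -> set[str]:
    """Documents containing a single term (empty set if the term is unknown)."""
    return index.get(term.lower(), set())


# Hypothetical sample pages standing in for an engine's index.
docs = {
    "page1": "benzene toxicity data",
    "page2": "benzene ring structure",
    "page3": "toxicity of lead",
}
index = build_index(docs)

# benzene AND toxicity: intersection keeps pages with both words.
both = matches(index, "benzene") & matches(index, "toxicity")
# benzene OR lead: union keeps pages with either word.
either = matches(index, "benzene") | matches(index, "lead")
# benzene AND NOT toxicity: difference removes pages mentioning toxicity.
without = matches(index, "benzene") - matches(index, "toxicity")

print(sorted(both))     # ['page1']
print(sorted(either))   # ['page1', 'page2', 'page3']
print(sorted(without))  # ['page2']
```

Notice how quickly NOT narrows a search. It is a good way to exclude an unwanted meaning of a word, but here it also discards page1, which contains benzene data and merely mentions the excluded term.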
The third major resource on index size and number of dead links is Search Engine Showdown (, a site maintained by Greg Notess. He factors the dead-link data into the index size charts and gets results that look somewhat different from the numbers often cited by the search engines themselves ( ). Another feature of Notess's site is Search Engine Inconsistencies ( These pages list search engine problems, both temporary and long-term, for four of the most popular engines: AltaVista, Google, HotBot, and Northern Light.
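The idea of factoring dead links into index size can be illustrated with simple arithmetic. Notess's actual procedure is not described here, so the Python sketch below assumes the simplest version. It discounts each engine's claimed index size by the fraction of sampled links that turned out to be dead, then re-ranks the engines. The engine names, figures, and the function `effective_index_size` are hypothetical.

```python
def effective_index_size(claimed_pages: float, dead_link_rate: float) -> float:
    """Discount a claimed index size by the fraction of its links found to be dead.

    Assumes dead links are spread evenly through the index, so a rate measured
    on a sample can be applied to the whole claimed total.
    """
    if not 0.0 <= dead_link_rate <= 1.0:
        raise ValueError("dead_link_rate must be between 0 and 1")
    return claimed_pages * (1.0 - dead_link_rate)


# Hypothetical engines: (claimed size in millions of pages, dead-link rate).
claims = {
    "EngineA": (300, 0.20),
    "EngineB": (250, 0.02),
}

adjusted = {name: effective_index_size(size, rate) for name, (size, rate) in claims.items()}
# Rank engines by their adjusted (live-page) index size, largest first.
ranking = sorted(adjusted, key=adjusted.get, reverse=True)

for name in ranking:
    print(f"{name}: {adjusted[name]:.0f} million live pages")
```

In this made-up case, the engine with the larger claim drops to second place once dead links are removed. That is the kind of reordering that can make dead-link-adjusted charts look different from the engines' own figures.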
After the big three, there are a number of sites that may not be as generally useful but do have some interesting features. Search IQ ( has a listing of more engines than you probably dreamed existed, complete with a rating number (the IQ) and some very frank criticisms.
The portal concept is still a hot topic on the net. If you wish to compare the features of the various portals (which often appear to be search engines to those of us who think the net is primarily an information source instead of a place to catch suckers), there is a site devoted to these comparisons called Traffick ( Even for those who don't care about portals, the set of articles under the general title "Andrew's Metaguide" offers good analysis of topics related to searching. uses live guides, who write brief reviews, with appropriate links, about topics of general interest. The special section dedicated to web searching (!websearch) is written by Chris Sherman, who has gathered an impressive variety of useful articles and links.
It seems inevitable that the WWW will continue to evolve rapidly and in unpredictable directions. Search techniques for printed materials may remain relatively constant for years and even decades, but the internet world can change within months or even days. The only way to remain current in the new information environment is to use the environment itself to keep track of new developments. It is hoped that these net references will make this job a little easier.


03/01/00 to 03/04/00