The Major Search Engines

Many of the search engines widely used today, including Google, Inktomi, and WebCrawler, started as university research projects. The research topic was fast, effective retrieval of data from large databases. In these cases the database is the web and all the linked pages that comprise it.

Google has by far the most users of any search engine. While the majority of other search engines were switching to a directory model, Google continued to improve its natural search results. Other search engines, such as Netscape and AOL, started using Google to generate their search results. Yahoo also used Google for its main search results section until February 2004, when it dropped Google and started using its own search technology.

Currently the other major players among natural search engines are Yahoo, AltaVista, Teoma, AskJeeves, HotBot, and AllTheWeb. Major directory search engines include Yahoo and OpenDirectory. The bigger players in the pay-per-click and pay-for-inclusion search industry are Overture, About.com, Inktomi, and LookSmart.

Below we have included a brief description of some of the major search engines, their search algorithms, and result formats. At the bottom of this page you can find a link to an expanded description of major search engines and their functions.

Google: Google's spider, Googlebot, crawls the web by following page links. Google stores crawled pages in a database and returns results based not only on text matching but also on a proprietary PageRank algorithm that determines the "importance" of a web page. For a more detailed explanation of how the PageRank algorithm works, read our article "How is Google PageRank determined?"
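To make the idea of link-based "importance" concrete, here is a minimal sketch of the simplified, publicly described PageRank iteration. It is not Google's proprietary production algorithm. The function name pagerank, the damping value of 0.85, the iteration count, and the four-page toy web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with importance spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site used only for illustration.
toy_web = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, "home" ends up with the highest score because every other page links to it, while "orphan" scores lowest because nothing links to it.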

Yahoo: Yahoo recently acquired Inktomi (now a pay-for-inclusion service) and Overture (a pay-per-click service), and in mid-February 2004 it switched from Google's search function to its own natural search technology. Yahoo's spider, Yahoo! Slurp, crawls pages by following HREF links. Preliminary analysis of Yahoo's results indicates that the top results are similar to Google's, with the differences growing further down the results pages.
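Crawling by HREF links means the spider reads a page, collects the targets of its anchor tags, and queues those URLs for later visits. The sketch below shows only the link-collection step, using Python's standard library. The class name LinkExtractor, the helper extract_links, and the sample HTML and URLs are hypothetical, and no real spider's internals are implied.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content; a real spider would download it first.
sample_html = (
    '<p><a href="/about.html">About</a> '
    '<a href="http://example.org/">Partner</a> '
    '<a name="top">No link here</a></p>'
)
found = extract_links(sample_html, "http://example.com/index.html")
print(found)
```

A full crawler would repeat this for each discovered URL and keep track of pages it has already visited.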

AltaVista: One of the first full-text search services on the internet, AltaVista has declined in popularity but remains one of the major players. When you run a search, AltaVista suggests ways to refine it by listing options in a side panel. To generate this list of suggestions, the search engine looks at the most common terms in the pages that best match the user’s query.
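The refinement idea described above can be sketched roughly as follows: pick the pages that best match the query, then count which other terms appear most often in them. This is a toy illustration, not AltaVista's actual implementation. The scoring rule (count of query-term occurrences), the stopword list, the function names, and the sample pages are all assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real engine would use a larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "with"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def suggest_refinements(query, pages, top_pages=3, top_terms=2):
    query_terms = set(tokenize(query))

    # Score each page by how often the query terms occur in it.
    scored = []
    for text in pages:
        tokens = tokenize(text)
        score = sum(1 for token in tokens if token in query_terms)
        if score > 0:
            scored.append((score, tokens))
    scored.sort(key=lambda item: item[0], reverse=True)

    # Count the remaining terms in the best-matching pages only.
    counts = Counter()
    for _, tokens in scored[:top_pages]:
        counts.update(
            t for t in tokens if t not in query_terms and t not in STOPWORDS
        )
    return [term for term, _ in counts.most_common(top_terms)]


# Hypothetical page texts standing in for an index.
pages = [
    "jaguar car dealer prices for the jaguar car",
    "jaguar car reviews and car prices",
    "jaguar habitat in the rainforest",
    "cooking recipes for pasta",
]
suggestions = suggest_refinements("jaguar", pages)
print(suggestions)
```

Because the suggestions come from page content, they reflect what matching documents talk about rather than what users actually type.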

AskJeeves: This search engine was the first attempt to answer natural language questions. The philosophy of AskJeeves has always been “to humanize the online search experience.” So, if the user searches for “value of golden ratio,” the result is “The golden mean is 1.61803398875,” followed by a list of web page results that, in theory, contain the best answers to the query. AskJeeves also offers a related search terms section. Unlike AltaVista, AskJeeves generates these suggestions from the popularity and frequency of real user search queries.
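Suggestions driven by real user queries can be sketched as a frequency count over a query log: past queries that share a word with the current one are ranked by how often users entered them. This is only an illustration of the general approach, not AskJeeves's actual method. The function related_searches, the word-overlap matching rule, and the sample log are assumptions.

```python
from collections import Counter


def related_searches(query, query_log, limit=3):
    """Rank logged queries that share a word with `query` by frequency."""
    normalized = query.lower().strip()
    terms = set(normalized.split())

    # How often each distinct query appears in the log.
    counts = Counter(entry.lower().strip() for entry in query_log)

    candidates = [
        (count, logged)
        for logged, count in counts.items()
        if logged != normalized and terms & set(logged.split())
    ]
    # Most frequent first; alphabetical order breaks ties.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [logged for _, logged in candidates[:limit]]


# Hypothetical query log; a real one would hold millions of entries.
log = [
    "golden ratio",
    "golden ratio in art",
    "golden ratio in art",
    "golden ratio formula",
    "golden ratio formula",
    "golden ratio formula",
    "weather forecast",
    "golden retriever",
]
related = related_searches("golden ratio", log)
print(related)
```

Note how "golden retriever" appears as a suggestion purely because users searched for it and it shares a word with the query, which is the kind of result a content-based approach would not produce.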

Teoma: Acquired by AskJeeves, Teoma boasts a proprietary "Subject-specific Popularity" algorithm that determines the importance of a page. Teoma attempts to organize websites into communities by subject and to determine the importance of every site within its respective community.
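Teoma's algorithm is proprietary, so the following is only a toy illustration of the general idea of ranking a site within its own subject community: a link counts toward a site's score only when it comes from another site on the same subject. The subject labels, the function community_popularity, and the sample sites are all hypothetical.

```python
from collections import defaultdict


def community_popularity(subjects, links):
    """Rank sites inside each subject by links from same-subject sites.

    `subjects` maps site -> subject label.
    `links` is a list of (source, target) pairs.
    """
    scores = defaultdict(int)
    for source, target in links:
        if source == target:
            continue  # ignore self-links
        if source in subjects and target in subjects:
            # Only links from within the same community count.
            if subjects[source] == subjects[target]:
                scores[target] += 1

    ranking = defaultdict(list)
    for site, subject in subjects.items():
        ranking[subject].append((site, scores[site]))
    for subject in ranking:
        # Highest score first; site name breaks ties.
        ranking[subject].sort(key=lambda item: (-item[1], item[0]))
    return dict(ranking)


# Hypothetical sites and links used only for illustration.
subjects = {
    "chess-club": "chess",
    "chess-openings": "chess",
    "chess-news": "chess",
    "pasta-blog": "cooking",
    "bread-blog": "cooking",
}
links = [
    ("chess-club", "chess-openings"),
    ("chess-news", "chess-openings"),
    ("pasta-blog", "chess-club"),
    ("bread-blog", "chess-club"),
    ("bread-blog", "pasta-blog"),
]
by_subject = community_popularity(subjects, links)
print(by_subject["chess"])
```

Here "chess-club" receives two links, but both come from cooking sites, so within the chess community it ranks below "chess-openings".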

Inktomi: Acquired by Yahoo, Inktomi is a natural search engine that also offers a pay-for-inclusion service. This program guarantees that your pages are included in the search results, and that the Inktomi spider frequently visits your site. However, this search engine does not market itself to human users. Instead, it offers its search service to other search engines.

To learn more, see the detailed overview of the major search engines on SearchEngineWatch.com, written by recognized search engine expert Danny Sullivan.

Other resources:

Web Robots FAQ
An in-depth explanation of search engine agent functions. Includes a database of known and registered search engine spiders. Although it is becoming somewhat outdated as the number of search user agents keeps growing, it is still informative.
http://www.robotstxt.org/wc/robots.html