
Saturday 2 February 2008

-History of search engines

Where would we be without 'em? Our experience of the Internet is often facilitated through the use of search engines and search directories. Before they were invented, people's Net experiences were confined to plowing through sites they already knew of in the hopes of finding a useful link, or to finding what they wanted through word of mouth.
As author Paul Gilster puts it in Digital Literacy, "How could the world beat a path to your door when the path was uncharted, uncatalogued, and could be discovered only serendipitously?" This may have been adequate in the early days of the Internet, but as the Net continued to grow exponentially, it became necessary to develop a means of locating desired content.
At first search services were quite rudimentary, but in the course of a few years they have grown quite sophisticated.
Not to mention popular. Search services are now among the most frequented sites on the Web with millions of hits every day.
Even though there is a difference between search engines and search directories (although less so every day), I will adopt the common usage and call all of them search engines.
Archie and Veronica
The history of search engines seems to be the story of university student projects evolving into commercial enterprises and revolutionizing the field as they went.
Certainly, that is the story of Archie, one of the first attempts at organizing information on the Net. Created in 1990 by Alan Emtage, a McGill University student, Archie archived what at the time was the most popular repository of Internet files, Anonymous FTP sites.
Archie is short for "archives"; the name was trimmed to conform to UNIX conventions for short program names.
What Archie did for FTP sites, Veronica did for Gopherspace. Veronica was created in 1992 at the University of Nevada. Jughead was a similar Gopherspace index.
Robots
Archie and Veronica were for the most part indexed manually. The first real search engine, in the sense of a completely automated indexing system, was MIT student Matthew Gray's World Wide Web Wanderer.
The Wanderer robot was intended to track the growth of the Web, initially counting only web servers. Soon after its launch it began capturing URLs as well. The resulting list formed the first database of websites, called Wandex.
Robots at this time were quite controversial. For one, they consumed a lot of network bandwidth, and they would index sites so rapidly that it was not uncommon for them to crash servers.
In the Glossary for Information Retrieval, Scott Weiss describes a robot as:
[a] program that scans the web looking for URLs. It is started at a particular web page, and then accesses all the links from it. In this manner, it traverses the graph formed by the WWW. It can record information about those servers for the creation of an index or search facility.
Most search engines are created using robots. The problem is that, if not written properly, they can make a large number of hits on a server in a short space of time, causing the system's performance to degrade.
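To make the mechanics concrete, here is a minimal sketch in Python of the kind of robot Weiss describes: start from a seed page, follow links breadth-first, and record each page's URL and title. The seed URL, page limit, and one-second politeness delay are illustrative assumptions, not details of any engine discussed in this article.

    # A minimal web robot sketch: start at one page, follow links breadth-first,
    # and record each page's URL and title as a crude index (a toy "Wandex").
    import time
    import urllib.parse
    import urllib.request
    from collections import deque
    from html.parser import HTMLParser

    class LinkAndTitleParser(HTMLParser):
        """Collects the page title and all anchor hrefs."""
        def __init__(self):
            super().__init__()
            self.links, self.title, self._in_title = [], "", False
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)
            elif tag == "title":
                self._in_title = True
        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False
        def handle_data(self, data):
            if self._in_title:
                self.title += data

    def crawl(seed, max_pages=50):
        index = {}                        # URL -> page title
        queue, seen = deque([seed]), {seed}
        while queue and len(index) < max_pages:
            url = queue.popleft()
            try:
                page = urllib.request.urlopen(url, timeout=10)
                html = page.read().decode("utf-8", "replace")
            except Exception:
                continue                  # unreachable page: skip it
            parser = LinkAndTitleParser()
            parser.feed(html)
            index[url] = parser.title.strip()
            for href in parser.links:
                absolute = urllib.parse.urljoin(url, href)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
            time.sleep(1)                 # be polite: don't hammer any one server
        return index

The politeness delay is the whole point of the passage above: an ill-behaved robot that skips it can flood a single server with requests in a short space of time.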
The First Web Directory
In response to the problems with automated indexing of the Web, Martijn Koster created Aliweb (Archie-Like Indexing of the Web) in October 1993. This was the first attempt to create a directory for just the Web.
Instead of relying on a robot, webmasters submitted a file with their URL and their own description of it. This allowed for more accurate, detailed listings.
Unfortunately, the application file was difficult to fill out, so many websites were never listed with Aliweb.
Spiders
By December 1993, three more robots, now known as spiders, were on the scene: JumpStation, the World Wide Web Worm (developed by Oliver McBryan, bought out by Goto.com in 1998) and the Repository-Based Software Engineering (RBSE) spider.
RBSE made the important step of listing the results based on relevancy to the keyword. This was crucial. Prior to that, the results were in no particular order and finding the right location could require plowing through hundreds of listings.
Excite was launched in February 1993 by Stanford students and was then called Architext. It introduced concept-based searching, a complicated procedure that utilized statistical word relationships, such as synonyms. This turned up results that other engines might have missed when the exact keyword was not entered.
WebCrawler, which was launched on April 20, 1994, was developed by Brian Pinkerton of the University of Washington.
It added a further degree of accuracy by indexing the entire text of webpages. Other search engines only indexed the URL and titles, which meant that some pertinent keywords might not be indexed. This also greatly improved the relevancy rankings of their results.
As an interesting aside, WebCrawler offers an insightful service, WebCrawler Search Voyeur, that lets you view what people are searching for as they enter their queries. You can even stop it and see the results.
Search Directories
There was still the problem that searchers had to know what they were looking for, which, as I can attest, is often not the case. The first browsable Web directory was EINet Galaxy, now known as Tradewave Galaxy, which went online in January 1994. It organized sites into categories, subcategories, and so on.
Users could narrow their search until presumably they found something that caught their eye.
It still exists today and offers users the opportunity to help coordinate directories, becoming an active participant in cataloging the Internet in their field.
It was Yahoo!, however, that perfected the search directory.
Yahoo! grew out of the webpages of two Stanford University students, David Filo and Jerry Yang, listing their favourite links (such pages were quite popular back then).
Started in April 1994 as a way to keep track of their personal interests, Yahoo soon became too popular for the university server.
Yahoo's user-friendly interface and easy-to-understand directories have made it the most used search directory. But because everything is reviewed and indexed by people, its database is relatively small, accounting for approximately 1% of webpages.
The Big Guns
When a search fails on Yahoo it automatically defaults to AltaVista’s search.
AltaVista was late onto the scene in December 1995, but made up for it in scope.
AltaVista was not only big, but also fast. It was the first to adopt natural language queries as well as Boolean search techniques. And to aid in this, it was the first to offer "Tips" for good searching prominently on the site. These advances made for unparalleled accuracy and accessibility.
But AltaVista had competition: HotBot, introduced May 20, 1996 by Paul Gauthier and Eric Brewer at Berkeley. Powered by the Inktomi search engine, it was initially licensed to Wired magazine's website. It has occasionally boasted that it can index the entire Web.
Indexing up to 10 million pages per day, it ranked among the most powerful search engines of the time.
Meta-Engines
The next important step in search engines was the rise of meta-engines. Essentially, they offer nothing new of their own: they simultaneously compile search results from various search engines, then list the results according to the collective relevancy.
The first meta-engine was MetaCrawler, developed in 1995 by Erik Selberg, a Master's student at the University of Washington; it is now called Go2net.com.
Skewing Relevancy
Prior to Direct Hit, launched in the summer of 1998, there were two types of search engines: author-controlled services, such as AltaVista and Excite, in which the results are ranked by keyword relevancy, and editor-controlled services, such as directories like Yahoo and LookSmart, in which people manually decide on placement. Direct Hit, as inventor Gary Culliss relates, "represents a third kind of search, one that's user-controlled, because search rankings are dependent on the choices made by other users."
As users choose to visit a listed link, Direct Hit keeps track of that data and uses the collected hit ratio to calculate relevancy. So the more people who go to a site from Direct Hit, the higher it will appear in their results.
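As a rough illustration of this user-controlled idea (a toy model, not Direct Hit's actual formula), one could re-rank keyword matches by the fraction of past searchers who clicked each listing:

    # Toy sketch of Direct Hit-style ranking: re-order keyword matches by how
    # often past searchers who saw each listing actually clicked it.
    # All site names and counts here are invented for illustration.
    clicks = {"site-a.com": 120, "site-b.com": 45, "site-c.com": 310}
    impressions = {"site-a.com": 400, "site-b.com": 90, "site-c.com": 2000}

    def hit_ratio(url):
        # fraction of the times the listing was shown that users chose it
        return clicks.get(url, 0) / max(impressions.get(url, 1), 1)

    results = ["site-c.com", "site-a.com", "site-b.com"]   # raw keyword matches
    ranked = sorted(results, key=hit_ratio, reverse=True)
    print(ranked)  # site-b.com first: searchers chose it half the time it appeared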
Google, which has run as a research project at Stanford University since late 1997, also attempts to improve relevancy rankings. Google uses PageRank, which basically monitors how many sites link to a given page. The more sites that link to a given site, and the more important those sites are, the higher its ranking in the result list.
It does give a slight advantage to .gov and .edu domains. Basically, it is trying to do what Yahoo does but without the need for costly human indexing.
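The idea behind PageRank can be sketched with its standard power-iteration form; the three-page link graph and the damping factor of 0.85 below are textbook illustrations, not Google's production details:

    # PageRank sketch: a page's score is fed by the scores of the pages
    # linking to it, each divided across that page's outgoing links.
    links = {            # page -> pages it links out to (invented graph)
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
    }
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    damping = 0.85       # probability the "random surfer" follows a link

    for _ in range(50):  # iterate until the scores settle
        new_rank = {}
        for p in pages:
            inbound = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / len(pages) + damping * inbound
        rank = new_rank

    print(sorted(rank.items(), key=lambda kv: -kv[1]))
    # "c" scores highest: it is linked from both other pages, including all
    # of b's single vote, so it collects the most link weight.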
Is This Fair?
Another way of fixing relevancy rankings is by selling prominent placement, as Goto.com, founded by Bill Gross's idealab, does. This practice caused quite a controversy.
Apparently, there was some doubt as to the actual relevancy of its paid prominent listings. Goto insists that their clients must adhere to a "strict policy" of relevance to the corresponding keywords.
Their corporate site defends its approach:
"In other search engines, there is no cost to spamming or word stuffing or other tricks that advertisers use to increase their placement within search results. When you get conscious decisions involved, and you associate a cost to them, you get better results... GoTo uses a revolutionary new principle for ranking search results by allowing advertisers to bid for consumer attention, and lets the market place determine the rankings and relevance."
For the right amount of money you can ensure your site is placed #1. Check out the words that are still "unbidden".
Goto.com, in turn, filed suit in February 1999 against the Disney-owned Go Network, claiming the Go logo looked confusingly similar to its own. That is for the courts to decide now.
Finding a niche
As search engines try to index the entire Web, some search engines have found their niche by narrowing their field to a specific subject or geographical region.
Argos was the first to offer a Limited Area Search Engine. Launched October 3, 1996, it indexes only sites dealing with medieval and ancient topics, and a panel decides whether a site is suitable for inclusion.
Their mandate was to combat problems like this example (from their site):
"At the time of this writing, a search for "Plato" on the Internet search engine, Infoseek, returned 1,506 responses. Of the first ten of these, only five had anything to do with the Plato that lived in ancient Greece, and one of these was a popular piece on the lost city of Atlantis. The other five entries dealt with such things as a home automation system called, PLATO(tm) for Windows, and another PLATO(r), an interactive software package for the classroom. Elsewhere near the top of the Infoseek list was an ale that went by the name of Plato, a guide to business opportunities in Ireland, and even a novel called the "Lizard of Oz."
Such specializing has also proven effective for MathSearch, Canada.com, and hundreds of others.
Ask Jeeves' niche is making search engines more searchable for the average user. (Who really knows Boolean anyway?) Founded in 1996, but not really well-used until recently, Ask Jeeves takes a more human approach, refining natural language queries so that users can ask normal questions, such as "Whatever happened to Upper Volta?"
When a question is entered, it is matched against similar queries Ask Jeeves has already answered, and those matches are offered as results. This is supposed to help guide users to the desired location when they might not know themselves how else to find it.
The Next Generation
There is no denying that these sites are among the most popular websites. They mark the daily entry point into the Web experience.
Search engines are trying to offer more and to be more, whether it is Northern Light's private fee-based online library or Yahoo offering free email and content (news, horoscopes, etc.). Search engines are continuing to evolve.
We are seeing more sophisticated spiders for finding and indexing sites, increasingly user-friendly searching techniques and interfaces, expanding databases, and improved relevancy of results.
(Now if they could just make some money doing it, as most of the companies mentioned continue to operate at a loss.)
As I learned while researching this topic, search engines may open up the door to the World Wide Web, but not without some difficulty. Searching is far from easy or perfect.
As the Web continues to grow rapidly, the need for better search engines only increases.

-Search Engine History

B.C.-1956: The Dawn of Computing
Before Christ, there was the counting aid, the abacus. Some centuries later, in 1642, Blaise Pascal builds a mechanical calculator. Around 1820, Charles Babbage follows up with his steam-powered Difference Engine, and the Countess of Lovelace, Augusta Ada Byron, ponders programming it after having met him.
The first computer (a programmable calculator), by German engineer Konrad Zuse, is completed in 1941. Britain and the USA take over the computing technology field with Colossus, ENIAC, the transistor (by Bell Telephone), and UNIVAC — the "Universal Automatic Computer".
1957-1990: Previously on the Internet ...
In 1957, ARPA (the Advanced Research Projects Agency, within the Department of Defense, DoD) is created to foster US technology. Some ten years later, DARPA marks the beginnings of the Internet. Intel is founded in '68, Doug Engelbart spends time showcasing his revolutionary ideas of word processing, and a year later, Xerox creates the equally revolutionary think-tank PARC, the Palo Alto Research Center. Universities are slowly being connected together via ARPANET in 1969. In 1977, the Apple II is born, followed by the IBM PC in '81. 1984, the year of cyberpunk novel Neuromancer, sees the introduction of the Domain Name System (DNS).
In the late 80s, the number of Internet hosts breaks 100,000, and people are starting to get lost. In 1990, before the days of the World Wide Web, McGill University student Alan Emtage creates the FTP indexing search tool Archie. One year later, Mark McCahill introduces the alternative Gopher. Veronica (Archie's girlfriend in the comic books, and the "grandmother of search engines") appears on the scene in 1992, spidering Gopherspace texts, and Jughead arrives in '93.
1990-1993: WWW, and WWWW
In the meantime, the World Wide Web, created by Tim Berners-Lee* and released by CERN (the European Organization for Nuclear Research) in '91, is starting to take off. And 1993, the year the first popular web browser, Mosaic, takes the world by storm, also sees the first acclaimed Web robot, Matthew Gray's World Wide Web Wanderer. Martijn Koster announces meta-tag spidering Aliweb in late '93.
*For the story on how the Web got invented, this is the book by the man who did it; "Weaving the Web" by Tim Berners-Lee.
1994: Search Engines See the Light
The World Wide Web is becoming the most important Internet service. Pizza can be ordered online, and soon Sun will give birth to Java programming technology. In early 1994, Jerry Yang and David Filo of Stanford University start Yahoo!* in their attempt to exert some kind of order on an otherwise anarchic collection of documents. Some months later in Washington, Brian Pinkerton's WebCrawler is getting to work; over at Carnegie Mellon, Dr. Michael Mauldin creates Lycos (the name comes from the Latin for "wolf spider").
*"Yahoo" might be short for "Yet Another Hierarchical Officious Oracle," but the two creators insist they selected the name because they considered themselves yahoos.
1995-1997: Dot-Com Rising
Metacrawler, Excite (late 1995), AltaVista (late 1995), later Inktomi/HotBot (mid-1996), AskJeeves and GoTo; more and more search engines appear. Yahoo, actually a directory, is the leader, but AltaVista — meaning "a view from above", a wordplay on (Palo) Alto-Vista — launched in 1995 (and bought by Compaq in 1997), is gaining popularity.
1998-2002: Google et al
It's late 1998. Stanford's Larry Page and Sergey Brin reinvent search ranking technology with their paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" and start what some time later becomes the most successful search engine in the world: Google. The uncluttered interface, speed and search result relevancy were cornerstones in winning over tech-savvy users, who were later followed by pretty much everyone looking for something online. Other contenders, like MSN, are pretty much left in the dust. In September 1999, Google left Beta status.
Search engine optimization becomes a bigger and bigger field, with experts trying to boost the rankings of commercial websites.
In 2000, Yahoo and Google become partners. In late 2000, Google is handling over 100 million daily search requests. In 2001, AskJeeves acquires Teoma, and GoTo is renamed Overture.
2003: Searching Today
These days, we can find more than ever, faster than dreamed of, but we're also taking it for granted. Information at your fingertips; when you have a question, fire up Google. The answer's out there.
Google, with its 200 million daily hits and over 3 billion indexed WWW pages, is hated and loved at the same time, but is undeniably the most relevant search engine of today (and the biggest player in the field). Google is constantly coming up with new, focused services to enhance web search and everything that comes with it. But it's a fast world, with others lurking around the corner — most notably the Norwegian FAST/AllTheWeb, which went online 5 years ago — and throne and scepter can be inherited by anyone who dares to top Google's search technology.
For more see the Archive.
Google Blogoscoped is © 2003-2004 by Philipp Lenssen. I am not working for Google. GOOGLE is a trademark of Google Inc. You might also be interested to read the Urban Legend History of Google.

-A Brief History of Search Engines

"How could the world beat a path to your door when the path was uncharted, uncatalogued, and could be discovered only serendipitously?" — Paul Gilster, Digital Literacy
The World Wide Web is different from anything we have known. Within the virtual reality of the Web, we can only see and hear things (at least at the time of this article). Because of this limitation, the Web forces us to find new ways to interact.
For instance, if I wanted to buy a book, I would go down to the local book store, select one I like, pay for it, and go home. The book store is usually in a visible place and has a sign out front, making it relatively easy to find.
But in cyberspace, there's no place to "turn." I have only my computer screen in front of me. Somehow, I need to find a place to purchase the book I want. There's no street on my screen so I can't drive around on the Web (I could "surf," but that's hit and miss; even then I still need to know where to start). Sometimes it's obvious: type in the name of the bookstore, add a .COM (as in barnesandnoble + .com) and it's a pretty good bet you're going to end up where you want to go. But what if it's a specialty bookstore and doesn't have a Web site with an obvious URL?
One solution to this problem is the search engine. In fact, it's probably one of the most widely used methods for navigating in cyberspace. Considering the amount of information that's available from a good search engine, it's similar to having the Yellow Pages, a guide book and a road map all-in-one.
Search engines can provide much more information than just the URL of a Web site. They can also locate reviews, help to compare prices, and even find if there have been any reported problems with the product or the manufacturer. Typing in "books" into the Google search engine returns about 9,270,000 results. If we refine the search to "books, Internet", we end up with about 6,070,000 results. We can narrow our search further to "books, Internet, search engines" and we will get about 803,000 results. If we know the book's author, let's say Danny Sullivan, then we enter "books, Internet, search engines, sullivan" and Google now returns about 10,900 results (of course, these results will change from day to day).
For many people, using search engines has become routine. Not bad for a technology that's not even 20 years old. But how did search engines come into being? What are the origins of this entity that prowls the outer reaches of cyberspace?
Note: This is by no means an exhaustive history of search engines. There's a resource list at the end of this article for more in-depth research.
The Early Beginnings of the Internet and the World Wide Web (or Your Tax Dollars at Work)
In 1957, after the U.S.S.R. launched Sputnik (the first artificial earth satellite), the United States created the Advanced Research Projects Agency (ARPA) as a part of the Department of Defense. Its purpose was to establish U.S. leadership in science and technology applicable to the military.
Part of ARPA's work was to prepare a plan for the United States to maintain control over its missiles and bombers after a nuclear attack. Through this work the ARPANET — a.k.a. the Internet — was born. The first ARPANET connections were made in 1969 and in October 1972 ARPANET went 'public.'
Almost 20 years after the creation of the Internet, the World Wide Web was born to allow the public exchange of information on a global basis. It was built on the backbone of the Internet.
According to Tim Berners-Lee, creator of the World Wide Web, "The Internet [Net] is a network of networks. Basically it is made from computers and cables.... The [World Wide] Web is an abstract imaginary space of information. On the Net, you find computers — on the Web, you find documents, sounds, videos, ... information. On the Net, the connections are cables between computers; on the Web, connections are hypertext links. The Web exists because of programs which communicate between computers on the Net. The Web could not be without the Net. The Web made the Net useful because people are really interested in information and don't really want to have [to] know about computers and cables."
With information being shared worldwide, there was eventually a need to find that information in an orderly manner.
Archie, Veronica, and Jughead (or The History of Search Engines Beginning at Riverdale High)
The very first tool used for searching on the Internet was called "Archie". (The name stands for "archives" without the "v", not the kid from the comics). It was created in 1990 by Alan Emtage, a student at McGill University in Montreal. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of filenames.
While Archie indexed computer files, "Gopher" indexed plain text documents. Gopher was created in 1991 by Mark McCahill at the University of Minnesota. (The program was named after the school's mascot). Because these were text files, most of the Gopher sites became Web sites after the creation of the World Wide Web.
Two other programs, "Veronica" and "Jughead," searched the files stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from various Gopher servers.
I, Robot
In 1993, MIT student Matthew Gray created what is considered the first robot, called World Wide Web Wanderer. It was initially used for counting Web servers to measure the size of the Web. The Wanderer ran monthly from 1993 to 1995. Later, it was used to obtain URLs, forming the first database of Web sites called Wandex.
According to The Web Robots FAQ, "A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. Web robots are sometimes referred to as web wanderers, web crawlers, or spiders. These names are a bit misleading as they give the impression the software itself moves between sites like a virus; this is not the case, a robot simply visits sites by requesting documents from them."
Initially, the robots created a bit of controversy as they used large amounts of bandwidth, sometimes causing the servers to crash. The newer robots have been tweaked and are now used for building most search engine indexes.
In 1993, Martijn Koster created ALIWEB (Archie-Like Indexing of the Web). ALIWEB allowed users to submit their own pages to be indexed. According to Koster, "ALIWEB was a search engine based on automated meta-data collection, for the Web."
Enter the Accountants
Eventually, as it seemed that the Web might be profitable, investors started to get involved and search engines became big business.
Excite was introduced in 1993 by six Stanford University students. It used statistical analysis of word relationships to aid in the search process. Within a year, Excite was incorporated and went online in December 1995. Today it's a part of the AskJeeves company.
EINet Galaxy (Galaxy) was established in 1994 as part of the MCC Research Consortium at the University of Texas, in Austin. It was eventually purchased from the University and, after being transferred through several companies, is a separate corporation today. It was created as a directory, containing Gopher and telnet search features in addition to its Web search feature.
Jerry Yang and David Filo created Yahoo in 1994. It started out as a listing of their favorite Web sites. What made it different was that each entry, in addition to the URL, also had a description of the page. Within a year the two received funding and Yahoo, the corporation, was created.
Later in 1994, WebCrawler was introduced. It was the first full-text search engine on the Internet; the entire text of each page was indexed for the first time.
Lycos introduced relevance retrieval, prefix matching, and word proximity in 1994. It was a large search engine, indexing over 60 million documents in 1996; the largest of any search engine at the time. Like many of the other search engines, Lycos was created in a university atmosphere at Carnegie Mellon University by Dr. Michael Mauldin.
Infoseek went online in 1995. It didn't really bring anything new to the search engine scene. It is now owned by the Walt Disney Internet Group and the domain forwards to Go.com.
Alta Vista also began in 1995. It was the first search engine to allow natural language inquiries and advanced searching techniques. It also provides a multimedia search for photos, music, and videos.
Inktomi started in 1996 at UC Berkeley. In June of 1999 Inktomi introduced a directory search engine powered by "concept induction" technology. "Concept induction," according to the company, "takes the experience of human analysis and applies the same habits to a computerized analysis of links, usage, and other patterns to determine which sites are most popular and the most productive." Inktomi was purchased by Yahoo in 2003.
AskJeeves and Northern Light were both launched in 1997.
Google was launched in 1997 by Sergey Brin and Larry Page as part of a research project at Stanford University. It uses inbound links to rank sites. In 1998 MSN Search and the Open Directory were also started. The Open Directory, according to its Web site, "is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors." It seeks to become the "definitive catalog of the Web." The entire directory is maintained by human input.

-ALIWEB:

In October of 1993, Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB, in response to the Wanderer. ALIWEB crawled meta information and allowed users to submit the pages they wanted indexed, along with their own page descriptions. This meant it needed no bot to collect data and did not use excessive bandwidth. The downside of ALIWEB was that many people did not know how to submit their site.
Robots Exclusion Standard:
Martijn Koster also hosts the web robots page, which created standards for how search engines should index or not index content. This allows webmasters to block bots from their site on a whole-site or page-by-page basis.
By default, if information is on a public web server and people link to it, search engines generally will index it.
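For illustration, a robots.txt file placed at the site root might look like this; the paths and bot name are hypothetical:

    # robots.txt, served from the site root (e.g. example.com/robots.txt)
    # Rules for all robots:
    User-agent: *
    Disallow: /private/
    Disallow: /drafts/page.html

    # A specific crawler can be shut out of the whole site:
    User-agent: BadBot
    Disallow: /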
In 2005 Google led a crusade against blog comment spam, creating a nofollow attribute that can be applied at the individual link level. After this was pushed through, Google quickly widened the stated purpose of nofollow, claiming it was for any link that was sold or not under editorial control.
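In markup, the attribute is applied to an individual anchor tag, for example (URL hypothetical):

    <!-- A visitor-supplied comment link. Engines that honor rel="nofollow"
         will not pass link popularity to the target. -->
    <a href="http://example.com/some-page" rel="nofollow">commenter's link</a>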
Primitive Web Search:
By December of 1993, three full-fledged, bot-fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider. JumpStation gathered info about the title and header from Web pages and retrieved these using a simple linear search. As the web grew, JumpStation slowed to a stop. The WWW Worm indexed titles and URLs. The problem with JumpStation and the World Wide Web Worm was that they listed results in the order they found them, with no discrimination. The RBSE spider did implement a ranking system.
Since early search algorithms did not do adequate link analysis or cache full page content, it was extremely hard to find something if you did not know the exact name of what you were looking for.
Excite:

Excite came from the project Architext, which was started in February 1993 by six Stanford undergrad students. They had the idea of using statistical analysis of word relationships to make searching more efficient. They were soon funded, and in mid-1993 they released copies of their search software for use on web sites.
Excite was bought by a broadband provider named @Home in January 1999 for $6.5 billion, and was renamed Excite@Home. In October 2001, Excite@Home filed for bankruptcy. InfoSpace bought Excite from bankruptcy court for $10 million.
Web Directories:
VLib:

When Tim Berners-Lee set up the web, he created the Virtual Library, which became a loose confederation of topical experts maintaining relevant topical link lists.

EINet Galaxy

The EINet Galaxy web directory was born in January of 1994. It was organized similarly to how web directories are today. The biggest reason EINet Galaxy became a success was that it also contained Gopher and Telnet search features in addition to its web search feature. The size of the web in early 1994 did not really require a web directory; however, other directories soon followed.
Yahoo! Directory
In April 1994 David Filo and Jerry Yang created the Yahoo! Directory as a collection of their favorite web pages. As their number of links grew, they had to reorganize and become a searchable directory. What set the directories above The Wanderer was that they provided a human-compiled description with each URL. As the Yahoo! Directory grew, Yahoo! began charging commercial sites for inclusion, and those inclusion rates increased over time. The current cost is $299 per year. Many informational sites are still added to the Yahoo! Directory for free.
Open Directory Project
In 1998 Rich Skrenta and a small group of friends created the Open Directory Project, a directory anybody can download and use in whole or in part. The ODP (also known as DMOZ) is the largest internet directory, almost entirely run by a group of volunteer editors. The Open Directory Project grew out of the frustration webmasters faced waiting to be included in the Yahoo! Directory. Netscape bought the Open Directory Project in November 1998. Later that same month, AOL announced its intention to buy Netscape in a $4.5 billion all-stock deal.
LII

Google offers a librarian newsletter to help librarians and other web editors make information more accessible and categorize the web. The second Google librarian newsletter came from Karen G. Schneider, director of the Librarians' Internet Index. LII is a high quality directory aimed at librarians. Her article explains what she and her staff look for when evaluating quality, credible resources to add to the LII. Most other directories, especially those with a paid inclusion option, hold lower standards than select, limited catalogs created by librarians.
The Internet Public Library is another well-kept directory of websites.
Business.com

Due to the time-intensive nature of running a directory, and the general lack of scalability of the business model, the quality and size of directories drop off sharply past the first half dozen or so general directories. There are also numerous smaller industry-specific, vertical, or locally oriented directories. Business.com, for example, is a directory of business websites.
Looksmart

Looksmart was founded in 1995. They competed with the Yahoo! Directory, with the two frequently raising their inclusion rates back and forth. In 2002 Looksmart transitioned into a pay-per-click provider, which charged listed sites a flat fee per click. That caused the demise of any good faith or loyalty they had built up, although it allowed them to profit by syndicating those paid listings to some major portals like MSN. The problem was that Looksmart became too dependent on MSN, and in 2003, when Microsoft announced they were dumping Looksmart, it basically killed their business model.
In March of 2002, Looksmart bought a search engine by the name of WiseNut, but it never gained traction. Looksmart also owns a catalog of content articles organized in vertical sites, but due to limited relevancy Looksmart has lost most (if not all) of its momentum. In 1998 Looksmart tried to expand its directory by buying the non-commercial Zeal directory for $20 million, but on March 28, 2006, Looksmart shut down the Zeal directory, hoping to drive traffic using Furl, a social bookmarking program.
Search Engines vs Directories:
All major search engines have some limited editorial review process, but the bulk of relevancy at major search engines is driven by automated search algorithms which harness the power of the link graph on the web. In fact, some algorithms, such as TrustRank, bias the web graph toward trusted seed sites without requiring a search engine to take on much of an editorial review staff. Thus, some of the more elegant search engines allow those who link to other sites to in essence vote with their links as the editorial reviewers.
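The biasing idea can be sketched as a seed-weighted variant of the PageRank iteration shown earlier (a simplification in the TrustRank spirit, not the published algorithm in full): the random-jump mass is given only to hand-picked trusted sites, so trust flows outward from them along links. The graph and seed set below are invented.

    # Seed-biased ranking sketch: same iteration as PageRank, but the jump
    # probability goes only to trusted seed sites.
    links = {
        "seed.org": ["a.com", "b.com"],
        "a.com":    ["b.com"],
        "b.com":    ["a.com"],
        "spam.biz": ["spam.biz"],     # isolated: nothing trusted links to it
    }
    pages = list(links)
    seeds = {"seed.org"}
    damping = 0.85
    trust = {p: (1.0 if p in seeds else 0.0) for p in pages}

    for _ in range(50):
        new_trust = {}
        for p in pages:
            inbound = sum(trust[q] / len(links[q]) for q in pages if p in links[q])
            jump = (1 - damping) / len(seeds) if p in seeds else 0.0
            new_trust[p] = jump + damping * inbound
        trust = new_trust

    print(sorted(trust.items(), key=lambda kv: -kv[1]))
    # a.com and b.com inherit trust from the seed; spam.biz ends with none.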
Unlike highly automated search engines, directories are manually compiled taxonomies of websites. Directories are far more cost and time intensive to maintain due to their lack of scalability and the necessary human input to create each listing and periodically check the quality of the listed websites.
General directories are largely giving way to expert vertical directories, temporal news sites (like blogs), and social bookmarking sites (like del.icio.us). In addition, each of those three publishing formats also aids in improving the relevancy of major search engines, which further cuts the need for (and profitability of) general directories.
WebCrawler:

Brian Pinkerton of the University of Washington released WebCrawler on April 20, 1994. It was the first crawler that indexed entire pages. Soon it became so popular that during daytime hours it could not be used. AOL eventually purchased WebCrawler and ran it on their network. Then in 1997, Excite bought out WebCrawler, and AOL began using Excite to power its NetFind. WebCrawler opened the door for many other services to follow suit. Within a year of its debut came Lycos, Infoseek, and OpenText.
Lycos:

Lycos was the next major search development, designed at Carnegie Mellon University around July of 1994. Michael Mauldin was responsible for this search engine and remained the chief scientist at Lycos Inc.
On July 20, 1994, Lycos went public with a catalog of 54,000 documents. In addition to providing ranked relevance retrieval, Lycos provided prefix matching and word proximity bonuses. But Lycos' main difference was the sheer size of its catalog: by August 1994, Lycos had identified 394,000 documents; by January 1995, the catalog had reached 1.5 million documents; and by November 1996, Lycos had indexed over 60 million documents -- more than any other Web search engine. In October 1994, Lycos ranked first on Netscape's list of search engines by finding the most hits on the word 'surf'.
Infoseek:
Infoseek also started out in 1994, claiming to have been founded in January. They really did not bring a whole lot of innovation to the table, but they offered a few add-ons, and in December 1995 they convinced Netscape to use them as its default search, which gave them major exposure. One popular feature of Infoseek was allowing webmasters to submit a page to the search index in real time, which was a search spammer's paradise.
AltaVista:

AltaVista's online debut came during this same month. AltaVista brought many important features to the web scene. It had nearly unlimited bandwidth (for that time), was the first to allow natural language queries and advanced searching techniques, and allowed users to add or delete their own URL within 24 hours. It even allowed inbound link checking. AltaVista also provided numerous search tips and advanced search features.
Due to mismanagement, a fear of result manipulation, and portal-related clutter, AltaVista was largely driven into irrelevancy around the time Inktomi and Google started becoming popular. On February 18, 2003, Overture signed a letter of intent to buy AltaVista for $80 million in stock and $60 million in cash. After Yahoo! bought out Overture, they rolled some of the AltaVista technology into Yahoo! Search, and occasionally use AltaVista as a testing platform.
Inktomi:
The Inktomi Corporation came about on May 20, 1996 with its search engine HotBot. Two UC Berkeley cohorts created Inktomi from the improved technology gained from their research. HotWired listed this site and it quickly became hugely popular.
In October of 2001 Danny Sullivan wrote an article titled Inktomi Spam Database Left Open To Public, which highlights how Inktomi accidentally allowed the public to access their database of spam sites, which listed over 1 million URLs at that time.
Although Inktomi pioneered the paid inclusion model, it was nowhere near as efficient as the pay-per-click auction model developed by Overture. Licensing their search results also was not profitable enough to pay for their scaling costs. They failed to develop a profitable business model, and sold out to Yahoo! for approximately $235 million, or $1.65 a share, in December of 2003.
Ask.com (Formerly Ask Jeeves):

In April of 1997 Ask Jeeves was launched as a natural language search engine. Ask Jeeves used human editors to try to match search queries. Ask was powered by DirectHit for a while, which aimed to rank results based on their popularity, but that technology proved too easy to spam as the core algorithm component. In 2000 the Teoma search engine was released, which uses clustering to organize sites by Subject Specific Popularity, which is another way of saying they tried to find local web communities. In 2001 Ask Jeeves bought Teoma to replace the DirectHit search technology.
Jon Kleinberg's Authoritative Sources in a Hyperlinked Environment [PDF] was a source of inspiration that led to the eventual creation of Teoma. Mike Grehan's Topic Distillation [PDF] also explains how subject-specific popularity works.
On March 4, 2004, Ask Jeeves agreed to acquire Interactive Search Holdings for 9.3 million shares of common stock and options, plus $150 million in cash. On March 21, 2005, Barry Diller's IAC agreed to acquire Ask Jeeves for 1.85 billion dollars. IAC owns many popular websites like Match.com, Ticketmaster.com, and Citysearch.com, and is promoting Ask across its other properties. In 2006 Ask Jeeves was renamed Ask, and the separate Teoma brand was killed.
AllTheWeb

AllTheWeb was a search technology platform launched in May of 1999 to showcase Fast's search technologies. They had a sleek user interface with rich advanced search features, but on February 23, 2003, AllTheWeb was bought by Overture for $70 million. After Yahoo! bought out Overture they rolled some of the AllTheWeb technology into Yahoo! Search, and occasionally use AllTheWeb as a testing platform.
Meta Search Engines
Most meta search engines draw their search results from multiple other search engines, then combine and rerank those results. This was a useful feature back when search engines were less savvy at crawling the web and each engine had a significantly unique index. As search has improved the need for meta search engines has been reduced.
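As a sketch of the general approach (merging formulas vary by engine), a meta-engine can score each result by its position in every list that contains it, a simple Borda-style count; the engine names and result lists below are invented:

    # Meta search sketch: query several engines, then merge their ranked
    # lists by awarding points for each list position (higher = better).
    engine_results = {
        "engine-1": ["site-a.com", "site-b.com", "site-c.com"],
        "engine-2": ["site-b.com", "site-a.com", "site-d.com"],
        "engine-3": ["site-b.com", "site-c.com", "site-a.com"],
    }

    scores = {}
    for results in engine_results.values():
        for position, url in enumerate(results):
            # first place earns the most points, later places fewer
            scores[url] = scores.get(url, 0) + (len(results) - position)

    merged = sorted(scores, key=scores.get, reverse=True)
    print(merged)   # site-b.com first: consistently strong across all engines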
Hotbot was owned by Wired, had funky colors, fast results, and a cool name that sounded geeky, but it died off not long after Lycos bought it and ignored it. It was later reborn as a meta search engine. Unlike most meta search engines, Hotbot only pulls results from one search engine at a time, but it allows searchers to select among a few of the more popular search engines on the web. Currently Dogpile, owned by Infospace, is probably the most popular meta search engine on the market, but like all other meta search engines, it has limited market share.
One of the larger problems with meta search in general is that most meta search engines tend to mix pay per click ads in their organic search results, and for some commercial queries 70% or more of the search results may be paid results. I also created Myriad Search, which is a free open source meta search engine without ads.
Vertical Search
The major search engines are fighting for content and marketshare in verticals outside of the core algorithmic search product. For example, both Yahoo and MSN have question answering services where humans answer each other's questions for free. Google has a similar offering, but question answerers are paid for their work.
Google, Yahoo, and MSN are also fighting to become the default video platform on the web, which is a vertical where an upstart named YouTube also has a strong position.
Yahoo and Microsoft are aligned on book search in a group called the Open Content Alliance. Google, going it alone in that vertical, offers a proprietary Google Book search.
All three major search engines provide a news search service. Yahoo! has partnered with some premium providers to allow subscribers to include that content in their news search results. Google has partnered with the AP and a number of other news sources to extend their news database back over 200 years. And Topix.net is a popular news service which sold 75% of its ownership to 3 of the largest newspaper companies. Thousands of weblogs are updated daily reporting the news, some of which are competing with (and beating out) the mainstream media. If that were not enough options for news, social bookmarking sites like Del.icio.us frequently update recently popular lists, there are meme trackers like Techmeme that track the spread of stories through blogs, and sites like Digg allow their users to directly vote on how much exposure each item gets.
Google also has a Scholar search program which aims to make scholarly research easier to do.
In some verticals, like shopping search, other third party players may have significant marketshare, gained through offline distribution and branding (for example, yellow pages companies), or gained largely through arbitraging traffic streams from the major search engines.
On November 15, 2005 Google launched a product called Google Base, which is a database of just about anything imaginable. Users can upload items and title, describe, and tag them as they see fit. Based on usage statistics this tool can help Google understand which vertical search products they should create or place more emphasis on. They believe that owning other verticals will allow them to drive more traffic back to their core search service. They also believe that targeted measured advertising associated with search can be carried over to other mediums. For example, Google bought dMarc, a radio ad placement firm. Yahoo! has also tried to extend their reach by buying other high traffic properties, like the photo sharing site Flickr, and the social bookmarking site del.icio.us.

Search Engine Marketing
Search engine marketing is marketing via search engines, done through organic search engine optimization, paid search engine advertising, and paid inclusion programs.
Paid Inclusion
As mentioned earlier, many general web directories charge a one time flat fee or annually recurring rate for listing commercial sites. Many shopping search engines charge a flat cost per click rate to be included in their databases.
As far as major search engines go, Inktomi popularized the paid inclusion model. They were bought out by Yahoo in December of 2003. After Yahoo dropped Google and rolled out their own search technology they continued to offer a paid inclusion program to list sites in their regular search results. Yahoo Search Submit is the only organic search paid inclusion program remaining from the major search providers. Search Submit is sold both on a yearly flat rate basis, and on a category based per click basis.

Pay Per Click
Pay per click ads allow search engines to sell targeted traffic to advertisers on a cost per click basis. Typically pay per click ads are keyword targeted, but in some cases, some engines may also add in local targeting, behavioral targeting, or allow merchants to bid on traffic streams based on demographics as well.
Pay per click ads are typically sold in an auction where the highest bidder ranks #1 for that keyword. Some engines, like Google and Microsoft, also factor ad clickthrough rate into the click cost. Doing so ensures their ads get clicked on more frequently, and that their advertisements are more relevant. A merchant who writes compelling ad copy and gets a high CTR will be allowed to pay less per click to receive traffic.
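A toy calculation shows why factoring clickthrough rate in rewards compelling ads; the bids, CTRs, and the simple bid-times-CTR ranking below are illustrative, not any engine's actual formula:

    # Ad auction sketch: rank ads by bid x CTR rather than bid alone.
    ads = [
        {"advertiser": "A", "bid": 1.00, "ctr": 0.010},   # high bid, dull ad
        {"advertiser": "B", "bid": 0.60, "ctr": 0.030},   # lower bid, good ad
    ]
    for ad in ads:
        ad["rank_score"] = ad["bid"] * ad["ctr"]

    ads.sort(key=lambda ad: ad["rank_score"], reverse=True)
    print([ad["advertiser"] for ad in ads])   # ['B', 'A']
    # B's $0.60 bid with a 3% CTR (score 0.018) beats A's $1.00 bid with a
    # 1% CTR (score 0.010): the well-written ad wins the top slot for less.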
Overture (Formerly GoTo)

Overture, the pioneer in paid search, was originally launched by Bill Gross under the name GoTo in 1998. His idea was to arbitrage traffic streams and sell them with a level of accountability. John Battelle's The Search has an entertaining section about Bill Gross and the formation of Overture. John also published that section on his blog.
“The more I [thought about it], the more I realized that the true value of the Internet was in its accountability,” Gross tells me. “Performance guarantees had to be the model for paying for media.”
Gross knew offering virtually risk-free clicks in an overheated and ravenous market ensured GoTo would take off. And while it would be easy to claim that GoTo worked because of the Internet bubble's ouroboros-like hunger for traffic, the company managed to outlast the bust for one simple reason: it worked.
While Overture was wildly successful, it had two major downfalls which prevented them from taking Google's market position:
Destination Branding: Google allowed itself to grow into a search destination. Bill Gross decided not to grow Overture into one because he feared doing so would cost him distribution partnerships. When AOL selected Google as an ad partner, in spite of Google also growing its own brand, that was pretty much the nail in the coffin for Overture as the premiere search ad platform.
Ad Network Efficiency: Google AdWords factors ad clickthrough rate into their ad costs, which ensures higher relevancy and more ad network efficiency. As of September 2006 the Overture platform (then known as Yahoo! Search Marketing) still did not fix that problem.
Those two faults meant that Overture was heavily reliant on its two largest distribution partners - Yahoo! and Microsoft. Overture bought out AltaVista and AllTheWeb to try to win some leverage, but ultimately they sold out to Yahoo! on July 14, 2003 for $1.63 billion.
Google AdWords
Google AdWords launched in 2000. The initial version was a failure because it priced ads on a flat CPM model. Some keywords were overpriced and unaffordable, while others were sold inefficiently, at too cheap a price. In February of 2002, Google relaunched AdWords, selling the ads in an auction similar to Overture's, but also adding ad clickthrough rate as a factor in the ad rankings.
Affiliates and other web entrepreneurs quickly took to AdWords because the precise targeting and great reach made it easy to make great profits from the comfort of your own home, while sitting in your underwear :)
Over time, as AdWords became more popular and more mainstream marketers adopted it, Google began closing some holes in their AdWords product. For example, to fight off noise and keep their ads as relevant as possible, they disallowed double serving of ads to one website. Later they started looking at landing page quality and establishing quality based minimum pricing, which squeezed the margins of many small arbitrage and affiliate players.
Google intends to take the trackable ad targeting allowed by AdWords and extend it into other mediums. Google has already tested print and newspaper ads. Google allows advertisers to buy graphic or video ads on content websites. On January 17, 2006, Google announced they bought dMarc Broadcasting, which is a company they will use to help Google sell targeted radio ads.
On September 15, 2006, Google partnered with Intuit to allow small businesses using QuickBooks to buy AdWords from within QuickBooks. The goal is to help make local ads more relevant by getting more small businesses to use AdWords.
On March 20, 2007, Google announced they were beta testing creating a distributed pay per action affiliate ad network. On April 13, 2007 Google announced the purchase of DoubleClick for $3.1 billion.
Google AdSense
On March 4, 2003 Google announced their content targeted ad network. In April 2003, Google bought Applied Semantics, which had CIRCA technology that allowed them to drastically improve the targeting of those ads. Google adopted the name AdSense for the new ad program.
AdSense allows web publishers large and small to automate the placement of relevant ads on their content. Google initially started off by allowing textual ads in numerous formats, but eventually added image ads and video ads. Advertisers could choose which keywords they wanted to target and which ad formats they wanted to use.
To help grow the network and make the market more efficient, Google added a link which allows advertisers to sign up for an AdWords account from content websites, and Google allowed advertisers to buy ads targeted to specific websites, pages, or demographic categories. Ads targeted at websites are sold on a cost per thousand impressions (CPM) basis in an ad auction against other keyword-targeted and site-targeted ads.
Google also allows some publishers to place AdSense ads in their feeds, and some select publishers can place ads in emails.
To prevent the erosion of value of search ads Google allows advertisers to opt out of placing their ads on content sites, and Google also introduced what they called smart pricing. Smart pricing automatically adjusts the click cost of an ad based on what Google perceives a click from that page to be worth. An ad on a digital camera review page would typically be worth more than a click from a page with pictures on it.
Yahoo! Search Marketing
Yahoo! Search Marketing is the rebranded name for Overture after Yahoo! bought them out. As of September 2006 their platform is generally the same as the old Overture platform, with the same flaws: ad CTR is not factored into click cost, it's hard to run local ads, and it is just generally clunky.
Microsoft AdCenter
Microsoft AdCenter was launched on May 3, 2006. While Microsoft has limited marketshare, they intend to increase it by baking search into Internet Explorer 7. On the features front, Microsoft added demographic targeting and dayparting features to the pay-per-click mix. Microsoft's ad algorithm includes both cost per click and ad clickthrough rate.
Microsoft also created the XBox game console, and on May 4, 2006 announced they bought a video game ad targeting firm named Massive Inc. Eventually video game ads will be sold from within Microsoft AdCenter.

Search Engine Optimization
What is SEO?
Search engine optimization is the art and science of publishing information in a format which will make search engines believe that your content satisfies the needs of their users for relevant search queries. SEO, like search itself, is older than its name: it was not originally even called search engine optimization, and to this day most people are still uncertain where that phrase came from.
Early SEO
Early search engine optimization consisted mostly of using descriptive file names, page titles, and meta descriptions. As search advanced, on-the-page factors grew more important, and people started aiming for specific keyword densities.
Link Analysis
One of the big things that gave Google an advantage over their competitors was the introduction of PageRank, which graded the value of a page based on the number and quality of links pointing at it. Up until the end of 2003 search was exceptionally easy to manipulate. If you wanted to rank for something all you had to do was buy a few powerful links and place the words you wanted to rank for in the link anchor text.
Search Gets More Sophisticated
On November 15, 2003, Google began heavily introducing many more semantic elements into its search product. Researchers and SEOs alike have noticed wild changes in search relevancy during that update and many times since, but many searchers remain oblivious to the changes.
Search engines would prefer to bias search results toward informational resources to make the commercial ads on the search results appear more appealing. You can see an example of how search can be biased toward commercial or informational resources by playing with Yahoo! Mindset.
Curbing Link Spam
On January 18, 2005, Google, MSN, and Yahoo! announced the release of a NoFollow tag which allows blog owners to block comment spam from passing link popularity. People continued to spam blogs and other resources, partly because search engines may still count some nofollow links, and partly because many of the pages they spammed still rank.
Since 2003 Google has come out with many advanced filters and crawling patterns to help make quality editorial links count more and to depreciate the value of many overtly obvious paid links and other forms of link manipulation.
Historical, Editorial, & Usage Data
Older websites may be given more trust in relevancy algorithms than newer websites (just existing for a period of time is a signal of quality). All major search engines use human editors to help review content quality and improve their relevancy algorithms. Search engines may also factor in user acceptance and other usage data to help determine if a site needs to be reviewed for editorial quality and whether its linkage data is legitimate.
Google has also heavily pushed giving away useful software, tools, and services which allow them to personalize search results based on the searcher's historical preferences.
Self Reinforcing Market Positions
In many verticals search is self reinforcing, as in a winner take most battle. Jakob Nielsen's The Power of Defaults notes that the top search result is clicked on as often as 42% of the time. Not only is the distribution and traffic stream highly disproportionate, but many people tend to link to the results that were easy to find, which makes the system even more self reinforcing, as noted in Mike Grehan's Filthy Linking Rich.
A key thing to remember if you are trying to catch up with another website is that you have to do better than what was already done, and enough better that it is comment-worthy or citation-worthy. You have to make people want to switch their world view to seeing you as an authority on your topic. Search engines will follow what people think.
Hypocrisy in Search
Google engineer Matt Cutts frequently comments that any paid link should have the nofollow attribute applied to it, although Google hypocritically does not place the nofollow attribute on links they buy. They also have placed their ads on the leading Warez site and continued to serve ads on sites that they banned for spamming. Yahoo! Shopping has also been known to be a big link buyer.
Much of the current search research is based on the view that any form of marketing, promotion, or SEO is spam. If that were true, it would make little sense for Google to teach SEO courses, which it does.

Google

Early Years
Google's corporate history page gives a pretty thorough background on the company, starting from when Larry Page met Sergey Brin at Stanford in 1995 right up to the present day.
By January of 1996, Larry and Sergey had begun collaboration on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website. Larry, who had always enjoyed tinkering with machinery and had gained some notoriety for building a working printer out of Lego™ bricks, took on the task of creating a new kind of server environment that used low-end PCs instead of big expensive machines. Afflicted by the perennial shortage of cash common to graduate students everywhere, the pair took to haunting the department's loading docks in hopes of tracking down newly arrived computers that they could borrow for their network.
A year later, their unique approach to link analysis was earning BackRub a growing reputation among those who had seen it. Buzz about the new search technology began to build as word spread around campus.
BackRub ranked pages using citation analysis, a concept popular in academic circles: if someone cites a source, they usually think it is important. On the web, links act as citations. In the PageRank algorithm links count as votes, but some votes count more than others. Your ability to rank, and the strength of your votes for others, depends on your authority: how many people link to you, and how trustworthy those links are.
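The published PageRank idea reduces to surprisingly little code. Below is a minimal power-iteration sketch following the classic formula with the usual 0.85 damping factor; Google's production system obviously layers many refinements on top that outsiders can't see, so treat this as a teaching toy:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Minimal power-iteration PageRank.

    `links` maps each page to the pages it links to. Follows the
    classic published formula; production systems differ heavily.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # dangling pages vote evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new[target] += share
        rank = new
    return rank

# A four-page toy web: everyone links to "c", so "c" ranks highest,
# and "a" benefits because the powerful "c" links back to it.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Notice how a single link from the heavily linked "c" is worth more than several links from obscure pages; that is the "some votes count more than others" property that made anchor-text link buying so effective.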
Google launched in 1998. Larry and Sergey tried to shop their PageRank technology around, but nobody was interested in buying or licensing it at the time.
Winning the Search War
Later that year Andy Bechtolsheim gave them $100,000 in seed funding, and Google received $25 million from Sequoia Capital and Kleiner Perkins Caufield & Byers the following year. In 1999 AOL selected Google as a search partner, and Yahoo! followed suit a year later. In 2000 Google also launched its popular Google Toolbar, and it has gained search market share year over year ever since.
In 2000 Google launched its AdWords program, selling ads on a CPM basis. In 2002 it retooled the service, selling ads in an auction that factored in both bid price and ad clickthrough rate. On May 1, 2002, AOL announced it would use Google to deliver its search-related ads, a strong turning point in Google's battle against Overture.
In 2003 Google also launched their AdSense program, which allowed them to expand their ad network by selling targeted ads on other websites.
Going Public
Google used a two-class stock structure, decided not to give earnings guidance, and offered its shares in a Dutch auction. It received virtually limitless negative press for the perceived hubris of "AN OWNER'S MANUAL" FOR GOOGLE'S SHAREHOLDERS. After some controversy surrounding an interview in Playboy, Google dropped its IPO offer range from $108-$135 per share to $85-$95. Google went public at $85 a share on August 19, 2004, and its first trade was at 11:56 am ET at $100.01.
Verticals Galore!
In addition to running the world's most popular search service, Google also runs a large number of vertical search services, including:
Google News: Google News launched in beta in September 2002. On September 6, 2006, Google announced an expanded Google News Archive Search that goes back over 200 years.
Google Book Search: On October 6, 2004, Google launched Google Book Search.
Google Scholar: On November 18, 2004, Google launched Google Scholar, an academic search program.
Google Blog Search: On September 14, 2005, Google announced Google Blog Search.
Google Base: On November 15, 2005, Google announced the launch of Google Base, a database of uploaded information describing online or offline content, products, or services.
Google Video: On January 6, 2006, Google announced Google Video.
Google Universal Search: On May 16, 2007 Google began mixing many of their vertical results into their organic search results.
Just Search, We Promise!
Google's corporate mission statement is:
Google's mission is to organize the world's information and make it universally accessible and useful.
However, that statement includes many things outside the traditional mindset of search, and Google maintains that ads are a type of information. This other information includes:
Email: Google launched Gmail on March 31, 2004, offering email search and gigabytes of storage space.
Maps: On October 27, 2004, Google bought Keyhole. On February 8, 2005, Google launched Google Maps.
Analytics: On March 29, 2005, Google bought Urchin, a website traffic analytics company. Google renamed the service Google Analytics.
Radio ads: Google bought dMarc Broadcasting on January 17, 2006.
Ads in other formats: Google tested magazine ads and newspaper ads.
Office productivity software: On March 9, 2006, Google bought Writely, an online collaborative document creation and editing product.
Calendar: On April 14, 2006, Google launched Google Calendar, which allows you to share calendars with multiple editors and include calendars in web pages.
Checkout: On June 29, 2006, Google launched Google Checkout, a way to store your personal transaction related information online.
Paying for Distribution
In addition to having strong technology and a strong brand, Google also pays for a significant portion of its search market share.
On December 20, 2005 Google invested $1 billion in AOL to continue their partnership and buy a 5% stake in AOL. In February 2006 Google agreed to pay Dell up to $1 billion for 3 years of toolbar distribution. On August 7, 2006, Google signed a 3 year deal to provide search on MySpace for $900 million. On October 9, 2006 Google bought YouTube, a leading video site, for $1.65 billion in stock.
Google also pays Mozilla and Opera hundreds of millions of dollars to be the default search provider in their browsers, bundles their Google Toolbar with software from Adobe and Sun Microsystems, and pays AdSense ad publishers $1 for Firefox + Google Toolbar installs, or up to $2 for Google Pack installs.
Google also builds brand exposure by placing Ads by Google on their AdSense ads and providing Google Checkout to commercial websites.
Google Pack is a package of useful software, including the Google Toolbar and software from many other companies. At the same time, Google helps ensure its toolbar is considered legitimate, and that competitors do not use sleazy distribution techniques, by sponsoring StopBadware.org.
Google's distribution, vertical search products, and other portal elements give it a key advantage in understanding our needs and wants: the largest Database of Intentions.
Editorial Partnerships
Google has moved away from a purely algorithmic approach toward a hybrid editorial approach. In April of 2007, Google started mixing recent news results into its organic search results, and after buying YouTube it started mixing videos directly into search results as well.
Webmaster Communication
Since the Florida update in 2003 Google has looked much deeper into linguistics and link filtering. Google's search results are generally the hardest search results for the average webmaster to manipulate.
Matt Cutts, Google's lead engineer in charge of search quality, regularly blogs about SEO and search. Google also has an official blog and has blogs specific to many of their vertical search products.
On November 10, 2004, Google opened up their Google Advertising Professional program.
Google also helps webmasters understand how Google is indexing their site via Google Webmaster Central. Google continues to add features and data to their webmaster console for registered webmasters while obfuscating publicly available data.
For an informal look at what working at Google was like from the inside between 1999 and 2005, try Xooglers, a blog by former Google brand manager Doug Edwards.
Information Retrieval as a Game of Mind Control
In October of 2007 Google attempted to manipulate the public perception of buying and selling links by announcing that it would penalize known link sellers, then manually editing the toolbar PageRank scores of some well-known blogs and other large sites. Those edits did not change search engine rankings or traffic flows; the PageRank update was entirely cosmetic.

Yahoo!
Getting Into Search
Yahoo! was founded in 1994 by David Filo and Jerry Yang as a directory of websites. For many years they outsourced their search service to other providers, considering it secondary to their directory and other content features, but by the end of 2002 they realized the importance and value of search and started aggressively acquiring search companies.
Yahoo! purchased Inktomi in December 2002; Overture purchased AllTheWeb and AltaVista in 2003; Yahoo! then consumed Overture in July 2003 and combined the technologies from the various search companies it had bought into a new search engine. Yahoo! dumped Google in favor of its own in-house technology on February 17, 2004.
Getting Social
In addition to building out their core algorithmic search product, Yahoo! has largely favored the concept of social search.
On March 20, 2005 Yahoo! purchased Flickr, a popular photo sharing site. On December 9, 2005, Yahoo! purchased Del.icio.us, a social bookmarking site. Yahoo! has also made a strong push to promote Yahoo! Answers, a popular free community driven question answering service.
Yahoo! has a cool Netrospective of its first 10 years and a brief overview of its corporate history here, and Bill Slawski posted a list of many of the companies Yahoo! has consumed since Overture.
On July 2, 2007, Yahoo! launched their behaviorally targeted SmartAds product.

Microsoft
MSN Search launched in 1998, but Microsoft did not get serious about search until after Google proved the business model. Until then, Microsoft relied primarily on partners like Overture, LookSmart, and Inktomi to power its search service.
Microsoft launched a technology preview of its search engine around July 1, 2004, and formally switched from Yahoo!'s organic search results to its own in-house technology on January 31, 2005. MSN announced it had dumped Yahoo!'s search ad program on May 4, 2006.
On September 11, 2006, Microsoft announced they were launching their Live Search product.
Other Engines
One would be foolish to think there is not a better way to index the web; a new, creative idea is probably just under our noses. The fact that Microsoft is making a large investment in developing new search technology should be cause for concern for the other major search engines.
Throughout this history many smaller search engines have come and gone, as the industry has struggled to find a balance between profitability and relevancy. Some of the newer concepts are website clustering, semantics, and smaller industry-specific search engines and portals, but search may yet be attacked from entirely different angles.
On October 5, 2004, Bill Gross (the founder of Overture and a pioneer of paid search) relaunched Snap as a search engine with a completely transparent business model, showing search volumes, revenues, and advertisers. Snap has many advanced sorting features, but may offer more than most searchers are looking for: people like search for its perceived simplicity, even if the behind-the-scenes process is quite complex.
Outside of core technology, search is being attacked and commoditized on four other fronts:
Browser & Software Distribution: Search companies pay computer manufacturers and software companies hundreds of millions, even billions, of dollars each year to bundle their search toolbars with those products.
Social Search: Large social networks have significant reach and a ton of page views. Yahoo! is rumored to be entertaining buying the social network Facebook for nearly a billion dollars, and has already bought the social picture site Flickr and the social bookmarking site Del.icio.us. In August of 2006 Google signed a 3-year, $900 million contract to provide search and advertising on MySpace. In addition, some companies, like Eurekster, are trying to create products that let groups of webmasters build topic- or community-specific search services.
Content Providers: Some content providers are trying to publish content on their own domains and build off their brand. Some are refusing to be included in search indexes. Some are requiring a kickback to be indexed. Some are unsure of what they want and are choosing to sue search engines, either for further brand exposure, or to gain further negotiation leverage.
Content Aggregators: Search is just one way of finding information. Via RSS feeds and various other technologies, many sites offer what some people call persistent search: a way to receive any information about a specific topic as it becomes available. Google also bought YouTube for $1.65 billion in stock; YouTube consists largely of pirated content which Google can organize and publish ads against based on usage data and other forms of ad targeting.

Search & Legal Issues
Privacy
In 2005 the DoJ obtained search data from AOL, MSN, and Yahoo!. Google denied the request and was sued for its search data in January of 2006; Google beat the lawsuit and was only required to hand over a small sample of data.
In August of 2006 AOL Research released over three months' worth of personal search data from 650,000 AOL users, and a New York Times article identified one of the searchers by name. In 2007 the European Union aggressively probed search companies, aiming to limit data retention and protect searcher privacy rights.
Publishing & Copyright Lawsuits
As more people create content, attention becomes more scarce, and due to the tragedy of the commons many publishing businesses and business models will die. Many traditional publishing companies enjoyed the profits of running what were essentially regional monopolies. Search, and other forms of online media, allow for better targeting and less wasteful, more efficient business models. Out of growing irrelevancy, a fear of change, and a fear of disintermediation, many traditional publishing companies have fought search.
In an interview with Danny Sullivan, Eric Schmidt said he thought many of the lawsuits Google faces are business deals done in a courtroom.
Newspapers
In September of 2006 some Belgian newspaper companies won a copyright lawsuit against Google News, a ruling that suggests the Belgian judges did not understand how search or the internet work. Some publisher groups are trying to create an arbitrary information-access protocol, Agence France Presse (AFP) sued Google to make it drop AFP's news coverage, and Google paid the AP a licensing fee.
Books
In September of 2005 the Authors Guild sued Google. In October of 2005 major book publishing houses also sued Google over Google Print.
Photos
Perfect 10, a pornography company, sued Google for including cached copies of stolen content in its image index, and for allowing publishers to collect income from stolen copyrighted content via Google AdSense.
Access to Hate Information
In May of 2000 a French judge required Yahoo! to stop providing access to auctions selling Nazi memorabilia.
Many requests for information removal are covered on Chilling Effects and by the EFF. Eric Goldman tracks these cases as well.
Pay Per Click & Ad Targeting Lawsuits
In 1999 Playboy sued Excite and Netscape for selling banner impressions triggered by searches for Playboy.
Overture sued Google for patent infringement. Just prior to Google's IPO, Google settled with Yahoo! (which by then had bought Overture) by giving it 2.7 million shares of class A Google stock.
Geico took Google to court in the US for trademark violation because Google allowed Geico to be a keyword trigger to target competing ads. Geico lost this case on December 15, 2004. Around the same time Google lost a similar French trademark case filed by Louis Vuitton.
Lane's Gifts sued Google for click fraud, but did not have a strong, well-put-together case. Google's lawyers pushed them into a class-wide out-of-court settlement of up to $90 million in AdWords credits. The March 2006 settlement aimed to absolve Google of any click fraud related liabilities back through 2002, when Google launched its pay-per-click model.
Search User Information
The US government requested that the major search companies turn over a significant amount of search-related data. Yahoo!, MSN, and AOL complied. The Google blog announced that Google fought the subpoena:
In August, Google was served with a subpoena from the U. S. Department of Justice demanding disclosure of two full months’ worth of search queries that Google received from its users, as well as all the URLs in Google’s index.
A judge stated that Google did not have to turn over search usage data.
AOL not only shared information with the government; as noted above, AOL Research also accidentally made users' search records public.
Search as a Commoditizer
Each search company has its own business objectives and technologies for defining relevancy. The three biggest issues search engines are fighting over are:
Publishing Rights: All search engines are fighting to gain the rights to index quality content. Some of the highest quality content is so expensive to create and market that there is no business model for sharing it openly on the web. Worse yet, as more people get into web publishing, businesses that delay getting their content indexed lose authority and distribution the entire time they wait. This, plus the fear of disintermediation, is part of the reason there are so many lawsuits.
Distribution: The more distribution you have, the more profit you can leverage to buy more content or strike better content partnerships. More distribution also means you can send more visitors (and thus profit) to anyone who lets you index their content, and the extra usage data may help engines improve their relevancy algorithms.
Ad Network Size & Efficiency: Efficient ad networks can afford to pay for more distribution, and thus help the search company gain more content and distribution.
In order to lock users in, search engines offer things like free email, news search, blogging platforms, content hosting, office software, calendars, and feature-rich toolbars. In some cases the software or service is not only free but expensive to provide: Google does not profit from Google News, yet it paid the AP content licensing fees, and hosting Google Video can't be cheap.
In an attempt to collect more data, better target ads, and improve conversion rates, Google offers:
a free analytics product
free cross platform tracking
free Wi-Fi internet access in San Francisco and Mountain View
a free wallet product which makes it quick and easy to buy products
The end goal of search is to commoditize the value of as many brands and markets as possible in order to keep adding value to the search box. Search engines want to commoditize the value of creating content while increasing the value of spreading ideas, the value of attention, and the importance of conversion.
As they make the network more and more efficient they can eat more and more of the profits, which was a large part of the reasoning behind Jakob Nielsen's Search Engines as Leeches on the Web.
Selling Search as an Ecosystem
Because search aims to gain distribution by virtually any means possible, the search engines that do the best job of branding, and of getting people to believe in their goals, ideals, and ecosystem, win. Search engines are fighting on this front in many ways, and not all of them are even on the web; for example, they try to attract the smartest minds by sharing research. Google goes so far as to offer free pizza!
Google hires people to track webmaster feedback across the web. Matt Cutts frequently blogs about search and SEO because it is important to him that others see search, SEO, and Google from his perspective. He offered free tips on Google Video in no small part because it was important for Google Video to beat out YouTube and become the default video platform on the web. Once it was clear Google had lost the video battle, it decided to buy YouTube.
Beyond selling their beliefs and ideology to get people excited about the field, recruit new workers, and get others to act in ways that benefit their business models, search engines also provide APIs, opening portions of their systems enough to leverage the free work of other smart, creative, and passionate people.
Selling search as an ecosystem goes so far that Google puts out endless betas, letting users become unpaid testers and advocates of its products. Even if the other search engines matched Google on relevancy, they would still be losing the search war due to Google's willingness to take big risks, Google's brand strength, and how much better Google sells search as an ecosystem.
Extending Search
Google wants to make content ad-supported and freely accessible. On October 9, 2006, Google announced it was acquiring YouTube for $1.65 billion in stock. In March 2007, Viacom sued Google / YouTube for $1 billion for copyright infringement. In 2007 Microsoft pushed back against Google's market position, calling Google a copyright infringer (for scanning books) and publishing research claiming that many of Google's Blogspot-hosted blogs are spam.
Social Search
In 2006 and 2007 numerous social bookmarking and decentralized news sites became popular. Del.icio.us, a popular social bookmarking site, was bought by Yahoo!, while Digg.com features fresh news and other items of interest on its home page based on user votes.

Text REtrieval Conference (TREC):
In 1992 TREC was launched to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. In addition to supporting the evolution of search, TREC also creates special tracks for vertical search and popular publishing models; in 2006, for example, it created a blog track. Past TREC publications are posted here.
Other Search Conferences:
There are a number of other popular conferences covering information retrieval.
ACM SIGIR
Search Engine Meeting
AirWeb
Search Science lists a number of conferences on the right side of the Search Science blog.
There are also a number of conferences that cover search primarily from a marketer's perspective. The three best-known are:
PubCon - hosted by Brett Tabke, owner of WebmasterWorld.
Search Engine Strategies
Search Marketing Expo - hosted by Danny Sullivan, editor of Search Engine Land
Sources and Further Reading:
Many of the following have not been updated in years, or cover only part of the search timeline, but as a collection they helped me out a lot. SearchEngineWatch is amazingly comprehensive if you piece together all of the articles Danny Sullivan has published.
Archive.org - The Internet Archive Wayback Machine (especially useful to view old content if any resource links break).
SearchEngineWatch - Danny Sullivan's site about search. Here are some important events from his first decade of writing about search.
Where Are They Now? Search Engines We've Known & Loved - Danny Sullivan reflects on some of the search engines that have passed away.
Google Corporate History - The history of Google, from 1995 to today.
Shmula - timeline of Google, Microsoft, and Yahoo! acquisitions.
SEO Consultants - Offers a timeline of important events associated with the history of search and the internet. Up to date through 2006.
Story about the foundation and history of Ask - by Kelli Anderson, through 2006.
John Wiley & Sons - by Wes Sonnenreich, a history of search through 1997.
Google Blogoscoped - by Philipp Lenssen, a history of search through 2003.
CBS MarketWatch [PDF] - colored timeline of search through 2004.
Wikipedia: Search Engines & Timeline of Hypertext
Search Engines: Evolution & Revolution - by Glen Farrelly, covers 1990 through 1999.
Enquiro Eye Tracking Study - report of how humans interact with search results.
History of the Internet, Internet for Historians (and just about everyone else) - Richard T. Griffith's article about the history of the web, from 1991 to 2001.
Searching on the Web - Link to a page that is no longer active, but whose content still exists on the wonderful Archive.org. Contents cover the 1960s through 2000.
W3C: A Little History of the World Wide Web - by Robert Cailliau, from 1945 to 1995.
Search History Page from Sympatico - Link to a page that is no longer active, but whose content still exists on the wonderful Archive.org. Contents are from 1990 to 2003.
Hobbes' Internet Timeline - by Robert H'obbes' Zakon, from 1957 through 2004.
How Yahoo Blew It - Wired article about how Google beat out Yahoo to win the search market race
A Brief History of the Internet - by Walt Howe, from 1960 through 2006.
The Sergey Brin Story
Search Industry Explained - Search and Go discusses search history and SEO history
ISOC Internet Histories - list of various histories of the Internet.
Books About Search
The Search - John Battelle's book about the history of search and how search intersects with media and culture.
The Google Story - David Vise's book about Google.
The Google Legacy - Stephen E. Arnold's book about why he believes Google is in a better market position than its competitors.