What's the difference between a spider, a search engine and a directory?

The Internet is a many-headed beast: it enables users all over the world to communicate with each other, to play games online, to surf the Web or plug in to music and movies. It also happens to be the world's largest and most dynamic database, containing information on virtually everything under the sun. The World Wide Web today consists of something between 200 and 250 million pages, with hundreds more springing up every day!

Now, while this is all fine and dandy, an interesting and important question arises: how does one cut through this information overload, filter out irrelevant material and identify relevant and useful data? And the answer to this question [because, like princesses and frogs, questions and answers must go together] is a search engine, which will scour the Internet for the information you require and present it to you, neatly sorted and tabulated!

So this week, we're going to tell you a little about how these magical creatures work, and also take them for a little test-drive to see if they are really worthy of the glorious send-up we gave them two sentences ago :-)

First of all, let's talk a little about the various types of search engines:

Engines: These are automatically generated, maintained and updated lists of Web pages and Web sites. They are also called "spiders" or "crawlers". Examples of these would be AltaVista or Lycos. A search on one of these engines usually results in a large number of matches, many of which are irrelevant.

Directories: These are not search engines per se, but rather databases of Web sites, classified into different categories. Human intervention is usually required for the classification part of the process. Yahoo! is an example of a directory. The human role here can sometimes result in more focused results, as compared to a spider.

Metasearch engines: These are engines that process queries on more than a single engine at a time. For example, a query for "Italian recipes" on a metasearch engine like Dogpile would display results from AltaVista, Infoseek, Webcrawler, Lycos and other engines. The advantage of a metasearch engine is that the user does not need to remember the specific search syntax for each and every engine he uses; all he needs to know is the syntax for the metasearch engine, and the metasearch engine automatically sends a correctly-formatted query to the others.

Hybrids: Many search engines also include their own directories; for example, Infoseek has its own directory of Web sites, indexed and catalogued into different categories. Such engines are called "hybrids" as they display the features of both spiders and directories.

Accurate search engines: The last type of engine, this one is yet to be found anywhere in cyberspace; and it seems that we will have to wait a few more years for it to evolve ;-)
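To make the metasearch idea above a little more concrete, here is a minimal sketch in Python of how one query might be rewritten for several engines. Everything in it is an assumption made for illustration: the engine names, the example.com-style addresses, the parameter names and the "+" prefix rule are placeholders, not the real interfaces of Dogpile or of any engine mentioned in this article.

```python
from urllib.parse import urlencode

# Hypothetical per-engine settings: the hosts, parameter names and prefix
# rules are placeholders, not the real interface of any search engine.
ENGINES = {
    "engine-a": {"base": "http://engine-a.example/search",
                 "param": "q", "require_prefix": "+"},
    "engine-b": {"base": "http://engine-b.example/find",
                 "param": "query", "require_prefix": ""},
}


def build_queries(terms):
    """Translate one list of search terms into one URL per engine."""
    urls = {}
    for name, cfg in ENGINES.items():
        # Rewrite the terms in this engine's (assumed) syntax.
        formatted = " ".join(cfg["require_prefix"] + term for term in terms)
        # urlencode escapes spaces and "+" signs so the URL stays valid.
        urls[name] = cfg["base"] + "?" + urlencode({cfg["param"]: formatted})
    return urls


if __name__ == "__main__":
    for engine, url in build_queries(["Italian", "recipes"]).items():
        print(engine, url)
```

The user types the terms once, and the table of per-engine settings does the remembering. A real metasearch engine would also have to fetch each URL and merge the results, which this sketch leaves out.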

And now for a little primer on how they work:

Search engines consist of the following 3 components:

The spider: The spider is a program designed to look for Web pages, follow each and every link on those pages and all subsequent links and send this information back to the engine's index. Spiders "crawl" Web sites at periodic intervals, perhaps every two or three weeks.

The index: The index is a database maintained by each engine, and contains the information sent back by the spider on its winding journey through the Web. This index is updated as the spider revisits pages and reports what has changed. If a page exists on the Net but is not part of the index, it will not show up in a search on that particular engine.

The search processor: This is the software which actually converts your query into a suitable format for processing, looks it up in the index and ranks and presents the results generated in HTML format on a Web page.
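The three components above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the "Web" is a small made-up dictionary rather than real pages fetched over HTTP, the ranking is a crude word count, and none of the names below come from any real engine.

```python
from collections import deque

# A toy "Web": page address -> (page text, links found on that page).
# All addresses and contents are made up for illustration.
TOY_WEB = {
    "home": ("welcome to our italian kitchen", ["recipes", "about"]),
    "recipes": ("italian recipes for pasta and risotto", ["home", "pasta"]),
    "pasta": ("fresh pasta and dried pasta recipes", ["recipes"]),
    "about": ("about the cooks", []),
    "orphan": ("italian recipes nobody links to", []),
}


def spider(start):
    """The spider: follow links breadth-first and return the pages reached."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for link in TOY_WEB[page][1]:
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def build_index(pages):
    """The index: map each word to the set of crawled pages containing it."""
    index = {}
    for page in pages:
        for word in TOY_WEB[page][0].split():
            index.setdefault(word, set()).add(page)
    return index


def search(index, query):
    """The search processor: pages with every query word, best match first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))

    def score(page):
        # Crude relevance: how often the query words occur on the page.
        text = TOY_WEB[page][0].split()
        return sum(text.count(w) for w in words)

    return sorted(matches, key=lambda p: (-score(p), p))


if __name__ == "__main__":
    index = build_index(spider("home"))
    print(search(index, "italian recipes"))
```

Note that the page called "orphan" contains both search words, but no other page links to it, so the spider never reaches it and it never enters the index.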

By contrast, the procedure at a directory is much simpler: Web sites are either submitted to the directory owner, or the administrators themselves review sites and classify them appropriately in the database. This lack of automation has both advantages and disadvantages: on the upside, queries are likely to return far better results, but at the same time, it takes much longer to get your site entered in the database.

There are a couple more things to remember here:

No search engine can index the entire World Wide Web. Show us an engine which claims to do that, and we'll show you a liar! ;-)

Every search engine has its own peculiar syntax, so you need to be familiar with the syntax of your favourite search engine. Metasearch engines eliminate the need for this, and many of them even have natural-language search capabilities, which makes things much simpler!

In our travels across cyberspace, we have come to rely on AltaVista, Infoseek, Lycos, Yahoo! and Dogpile for all our needs. A search carried out across all five of these engines invariably gives us what we need; and, in particular, we like AltaVista, which has to be one of the most powerful and flexible engines in existence!

Here's a list of popular search engines and directories:

Search engines:
AltaVista [ http://www.altavista.com ]
Lycos [ http://www.lycos.com ]
Hotbot [ http://www.hotbot.com ]
Infoseek [ http://www.infoseek.com ]
Excite [ http://www.excite.com ]
Google [ http://www.google.com ]

Directories:
Yahoo! [ http://www.yahoo.com ]
Netcenter [ http://www.netscape.com ]
Infoseek [ http://www.infoseek.com ]
Lycos [ http://www.lycos.com ]
Excite [ http://www.excite.com ]
About.com [ http://www.about.com ]

Metasearch engines:
Dogpile [ http://www.dogpile.com ]
Metacrawler [ http://www.metacrawler.com ]
Ask Jeeves! [ http://www.askjeeves.com ]

As you can see, quite a few of the engines above span categories. For example, AltaVista works both as a search engine and a directory, as does Infoseek. And many of these offer multiple methods of searching: Ask Jeeves! and AltaVista can respond to natural-language queries ["Where do I find Italian recipes?"], but you can also use special search operators on AltaVista to filter and focus your search [+Italian +recipes -pasta].
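For the curious, here is a small Python sketch of what operators like these might do to a search. It assumes the usual reading, where "+" marks a word that must appear and "-" marks a word that must not; it is our own illustration of the idea, not AltaVista's actual implementation, and the function names are made up.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional


def matches(text, query):
    """True if text has every required term and none of the excluded ones."""
    words = set(text.lower().split())
    required, excluded, optional = parse_query(query)
    if not required <= words or excluded & words:
        return False
    # With no required terms, at least one optional term must appear.
    return bool(required) or not optional or bool(optional & words)


if __name__ == "__main__":
    pages = [
        "Italian recipes for risotto",
        "Italian pasta recipes",
        "French recipes",
    ]
    query = "+Italian +recipes -pasta"
    print([page for page in pages if matches(page, query)])
```

Only the risotto page survives the filter: the second page contains the excluded word "pasta", and the third lacks the required word "Italian".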

And now for our test:

We suddenly needed to find information on something called DQPSK or differential quadrature phase-shift keying, a digital modulation scheme used in cellular technology. So we conducted a search on AltaVista, Infoseek, Yahoo!, Hotbot and Lycos for the terms "DQPSK digital communications FAQ".

And then we thought we'd see if we could find some good-looking digital art on the Net...so we looked for "abstract digital artwork royalty free". And we measured relevance on a scale of 1 to 5, 1 being the least relevant.

We found that AltaVista and Infoseek provided us with fairly good links in both cases, while the rest failed quite miserably ;-)

Engine | Digital art (hits) | DQPSK (hits) | Relevance score
AltaVista | 3,45,321 | 1,70,364 | 4
Infoseek | 2,22,94,939 | 96,27,580 | 4
Yahoo! | 5,86,660 | 1,34,564 | 2
Hotbot | 179 | 9 | 2
Lycos | 98 | 2 | 1

In case you're curious and would like to know a little more about how search engines work, we can recommend a great site at http://www.searchenginewatch.com. Search Engine Watch contains a large amount of information useful to both the webmaster and the novice about how search engines rank pages, how best to promote your site and much more!

And if you have voyeuristic tendencies, then you must visit Web Voyeur [ http://www.webcrawler.com/SearchTicker.html ], a site which allows you to watch what others are searching for online. We guarantee that some of the queries you will see will leave you scratching your head in amazement :-)

And since we always like to end on a high note, we thought that we would also send a certain three-letter word, everyone's favourite pastime, to each of the major search engines, just to see what turned up ;-)

The search term was quite simple; the three letters "sex"...and the results quite remarkable!

AltaVista found 14,303,380 matches, Infoseek found 3,773,881 matches, Yahoo! got 234 matches, Hotbot found 1,729,284 matches and Excite, 584,942!

And while we're quite sure that this says something important about the human race, we're not too sure just what it means...:-)

Till next time, stay healthy!

This article was first published on 25 Oct 1998.