Search Engine History - The Early Years

Nov 10

17:08

2007

David Viney

In this first of three articles on search engine history, SEO Expert David Viney looks at the emergence of the internet and the forerunners to modern, web-based search engines (including WAIS & Archie).

So where did it all start? In the first of three posts on the history of the Search Engines, I look at the history of the internet itself and the forerunners to modern, web-based search engines (in the period immediately prior to the emergence of the first widely-used web browser). In this whistle-stop tour, I take in now forgotten tools like Archie, Veronica and WAIS.

A brief history of the Internet

The internet is arguably the greatest invention of the 20th Century, allowing the almost limitless connection of people to each other and to the resources they seek. Whilst the invention of the telegraph, telephone, radio and computer set the stage for this communications revolution, it was a series of rapid developments in technology during the 1960s that paved the way for the creation of the internet.

The grandparents of the internet were arguably J.C.R. Licklider and Leonard Kleinrock, both at the Massachusetts Institute of Technology (MIT). Licklider was the first head of the computer research program at the Defense Advanced Research Projects Agency (DARPA) and in August 1962 wrote a paper about a "Galactic Network" of globally interconnected computers, through which everyone could quickly access data and programs from any site. Kleinrock made this dream achievable through his work on packet switching theory (from 1961 to 1964) and the creation of the first (however small) wide-area computer network (or WAN) in 1965, connecting a TX-2 computer at MIT to a Q-32 in California.

Kleinrock worked closely with Lawrence G. Roberts in the creation of the WAN and it was Roberts who went on to pen the design for ARPANET (the Advanced Research Projects Agency Network) in late 1966, collaborating increasingly with teams from the National Physical Laboratory (NPL) in the United Kingdom and the RAND Corporation (both of whom had independently developed packet switching technologies without being aware of each other’s work).

During 1968, Bolt Beranek and Newman (BBN) were selected to build ARPANET and in September 1969 the first node was installed at the University of California, Los Angeles (UCLA). A month later, the second node was added (at Stanford Research Institute) and the first Host-to-Host message ever to be sent on the Internet was launched from UCLA. The month in which I was born!

During the period from 1970 to 1972, many computers were added to ARPANET, protocols were developed and software was written. In March 1972, Ray Tomlinson at BBN developed the first ARPANET email system (his first test message is usually reported as something like “QWERTYUIOP”). In the following year, the first ARPANET connections outside the US were made, to NORSAR in Norway and to University College London (UCL) in the United Kingdom. For a great 1972 video documentary on ARPANET, click here.

Whilst the original ARPANET grew quickly during the 1970s, it remained mainly an academic preserve. The key next step in the development of the modern web began in 1982, with the adoption by many participants of the TCP/IP protocol, which was faster, easier to use, and less expensive to implement than earlier protocols. This in turn made it much easier for small networks to connect to the network and for those links to branch in every direction. From this point on, all networks that used TCP/IP referred to themselves as part of the Internet (rather than ARPANET), and the standardization on TCP/IP allowed the number of Internet sites and users to grow exponentially.

To use an analogy, these developments created the easel but there was still precious little paint for the artist to use. Most early mass-market internet tools were overly technical in nature and difficult to use. Do any of you remember terms like WAIS (wide-area search), Archie (file search), Gopher (data retrieval), Newsnet and more?

Two key tools were to change all this forever. In 1989, Tim Berners-Lee and the team at CERN (the European Particle Physics Laboratory) invented the hypertext-based World Wide Web. Four years later, in 1993, the world’s first commercial-strength web browser, Mosaic, was launched by Marc Andreessen of the National Center for Supercomputing Applications (NCSA) in the US. Berners-Lee’s original specifications for URIs, HTTP and HTML were further refined over the coming years and Andreessen went on to develop the Netscape web browser, building on his experience with Mosaic.

The rest, as they say, is history! Since this point, the internet has grown exponentially. According to internetworldstats.com, in December 1995, there were just 16 million internet users (0.4% of the world population) but this had grown to 361 million by December 2000 (an increase of roughly 2,150%) and 1,018 million by December 2005.

The world's first search engines

The father and mother of the modern search engine were Archie and Veronica. Archie, developed in 1990 by Emtage, Heelan and Deutsch (students at McGill University in Montreal), was, in a sense, the world’s first search engine. Archie was a tool used to index FTP archives, allowing users to search for and find specific files. The user had to have a pretty good idea of the file name they were looking for, as Archie only indexed filenames (although wildcards were supported, which helped).

In the earliest versions of Archie, the system worked by simply running a job once a month to log on to each of the member FTP servers and request a listing. These listings were stored in local files to be searched using the Unix grep command. Once a user had found a file in the Archie index, they had to connect to the FTP host and rummage around until they found the file they were looking for (much like the early days of Napster music file sharing nearly 10 years later). This was not for the faint-hearted and the system was only heavily used by tech-heads and academics!
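To make the idea concrete, here is a minimal sketch in Python of a filename-only index with wildcard search. It is an illustration under stated assumptions, not Archie's actual code: the host names, paths and the function names build_index and search are all hypothetical, and a small in-memory dictionary stands in for the listings that the monthly job would have collected.

```python
from fnmatch import fnmatchcase

# Hypothetical listings, standing in for what a monthly job might have
# collected from each member FTP server (hosts and paths are invented).
LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.55.tar.Z", "/pub/docs/README"],
    "archive.example.org": ["/mirrors/tex/latex.tar.Z", "/pub/games/rogue.tar.Z"],
}


def build_index(listings):
    """Flatten per-host listings into (host, path) pairs, like one big listing file."""
    return [(host, path) for host, paths in sorted(listings.items()) for path in paths]


def search(index, pattern):
    """Return (host, path) pairs whose file name matches a wildcard pattern.

    Only the final path component is compared, ignoring case, so the
    searcher has to know (roughly) what the file is called.
    """
    wanted = pattern.lower()
    return [
        (host, path)
        for host, path in index
        if fnmatchcase(path.rsplit("/", 1)[-1].lower(), wanted)
    ]


index = build_index(LISTINGS)
for host, path in search(index, "emacs*"):
    print(host, path)
```

A search for "emacs*" finds the file and tells the user which host to connect to, but a search for a description such as "editor" finds nothing, because only the names of files are indexed.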

The name Archie derives from the word “archive”, but became associated by users with the comic book series of the same name, created by Bob Montana (featuring the fictional teenage characters Archie Andrews, Betty Cooper, Veronica Lodge, Reggie Mantle and Forsythe "Jughead" Jones). As such, when Gopher began to take off in 1992, Foster and Barrie (at the University of Nevada) named their newly developed Gopher search engine Veronica after Archie’s comic book girlfriend. Officially, Veronica stood for “Very Easy Rodent-Oriented Net-wide Index to Computer Archives”.

Veronica was a constantly updated database of the names of almost every menu item on thousands of Gopher servers and could be searched directly from most major Gopher menus. Veronica was, technically, an improvement on Archie in that it (a) indexed the full title of a document rather than just the filename and (b) connected the user directly to the source file with a single click. What neither Archie nor Veronica did, however, was to fully index the target document. This meant that both lacked so-called “semantic ability” – i.e. the ability to connect documents with diverse titles but similar content.

In 1991, Brewster Kahle (at Thinking Machines) launched Wide Area Information Server (WAIS) at Xerox PARC. WAIS only enjoyed a brief presence on the stage of internet history. However, it could certainly be described as the first genuine forerunner to modern search engines, in the sense that it was the first to fully index all the text in Gopher and other internet documents. As Kahle put it at the time, he wanted users to be able to “jump into the middle of the scroll”. WAIS supplemented Veronica, which only searched the menu titles of Gopher sites, but was itself quickly rendered obsolete by the rapid growth of the World Wide Web (which came to replace or front-end all major FTP, Archie, Gopher and WAIS properties).
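The difference between title-only and full-text indexing can be sketched in a few lines of Python. This is a simplified illustration using a basic inverted index, not a description of how Veronica or WAIS were actually implemented; the documents and the names tokenize, build_index and search are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical documents as (title, body) pairs; all content is invented.
DOCS = {
    "doc1": ("Campus computing newsletter", "How to fetch files with anonymous FTP and Gopher"),
    "doc2": ("Notes from the library", "Gopher menus now list the full catalogue"),
    "doc3": ("FTP quick reference", "Commands for listing and retrieving files"),
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs, full_text):
    """Map each word to the set of documents containing it.

    With full_text=False only titles are indexed (title-only search);
    with full_text=True the body text is indexed as well.
    """
    index = defaultdict(set)
    for doc_id, (title, body) in docs.items():
        text = f"{title} {body}" if full_text else title
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(word, set()) for word in words)))


title_index = build_index(DOCS, full_text=False)
full_index = build_index(DOCS, full_text=True)
print(search(title_index, "gopher"))  # no title mentions Gopher
print(search(full_index, "gopher"))   # the body text does
```

With the title-only index, a query only succeeds if the searcher guesses a word from the title. With the full-text index, documents with quite different titles are found through their shared content, which is the gap described above.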

Comment

At the time of writing, internet sales are approximately 15% of all sales in the UK (having risen by 50% over the last year). The numbers are higher still in North America. U-Switch predict 40% of all sales will be over the internet by 2020 and Google is now the world’s number one ranked brand; not bad for a business less than ten years old! Sometimes it seems incredible to me that so much has happened so quickly. My reason for writing this series of articles is, in part, to pay tribute to those early internet and search pioneers, lest we forget their vital contributions.

In part two of the series, I review web search before Google came to dominate, looking at the first web crawler, the World Wide Web Wanderer, and early pioneers like AltaVista and Northern Light.