7 Internet Navigation: Current State
Pages 313-348

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 313...
... , site maps, and search engines. Because they are usually limited to the contents of the site, the problems of general-purpose Web navigation aids are diminished.
From page 314...
... [Table excerpt] Directory: 2; Human/computer; Hierarchical or multi-hierarchical; Fuzzy. 7. Search engine: 2; Computer; Inverted; Ranked.
Note a: "KEYWORD" is capitalized to distinguish it from the use of keywords in traditional information retrieval or in Internet search engines (see Sections 7.1.4 and 7.1.7)
From page 315...
... There is no publicly available Internet-wide file of links; they are maintained locally. However, linkage information is collected and used by all major search engines as an important part of the ranking of responses.
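To illustrate how linkage information can feed into ranking, the following sketch runs a PageRank-style iteration over a toy link graph. The graph, damping factor, and iteration count are illustrative assumptions, not the parameters of any actual search engine, which combine link analysis with many other proprietary signals.

```python
# Minimal sketch of link-based ranking in the spirit of PageRank.
# The toy graph, damping factor, and iteration count are assumptions
# made only for illustration.

def rank_pages(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # dangling pages simply stop propagating score
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(rank_pages(toy_web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Pages that attract links from already well-linked pages accumulate higher scores, which is the sense in which locally maintained linkage information becomes a ranking signal once a search engine has collected it.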
From page 316...
... For these reasons, bookmarks may become less useful with the scaling and maturing of the Internet, leading users to rely on search engines to find even familiar sites and Web pages. The bookmark/favorite mechanism as implemented in current browsers and described above is fairly weak, providing a simple association between a name (provided by either the user or the Web page)
From page 317...
... It is also used in this sense in search engine marketing to refer to the search terms for which a particular marketer is willing to pay.4 However, "keyword" has also been used to denote terms in a controlled vocabulary linked to specific Internet locations by a specific Internet (generally, Web) service.
From page 318...
... KEYWORDS have been replaced in most cases -- except for services catering to non-English language users8 and AOL -- by search engines, which provide a wider-ranging response to keyword terms, and by the sale of search engine keywords to multiple bidders. KEYWORDS have many of the same strengths and weaknesses as domain names for navigation.
From page 319...
... -- the programming language of Web site construction -- specifies the expression of metadata in the form of "metatags" that are visible to search engines (as they collect data from the Web -- see Box 7.2) but are not typically displayed to humans by browsers.
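As a concrete illustration of metatags being machine-readable but not rendered, the short sketch below parses an invented HTML page with Python's standard html.parser and collects the name/content pairs a crawler could see; the sample page and tag names are assumptions made for the example.

```python
# Sketch: metadata in HTML "meta" tags is visible to a crawler's parser
# even though browsers do not display it. Standard library only; the
# sample page and its tag names are invented for illustration.
from html.parser import HTMLParser

SAMPLE_PAGE = """
<html><head>
  <title>Navigation Aids</title>
  <meta name="keywords" content="internet navigation, search engines">
  <meta name="description" content="Survey of Web navigation aids.">
</head><body><p>Only this text is displayed to readers.</p></body></html>
"""

class MetaTagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.metatags = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.metatags[attrs["name"]] = attrs["content"]

collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)
print(collector.metatags)  # {'keywords': ..., 'description': ...}
```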
From page 320...
... The first approach to coordination is for organizations to collaborate in defining common metadata elements that will be used by all of them as a core for their metadata schemes. The best known and best developed of these is the Dublin Core Metadata Element Set, known as the Dublin Core,14 so named because it originated at a meeting in Dublin, Ohio, that was sponsored by the Online Computer Library Center (OCLC)
From page 321...
... To achieve wide adoption, some believe that it needs to be made more suitable to machine-generated descriptions.15 The second approach to coordination is to provide a higher-level structure that can incorporate multiple metadata schemes, enabling them to be deployed in combination to describe a resource with the assurance that the resultant description will be correctly interpreted by any computer program that is compatible with the higher-level structure. The best known and best developed of these higher-level structures is the Resource Description Framework (RDF)
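The sketch below suggests how a Dublin Core description might be expressed within RDF so that any RDF-aware program can interpret it. It assumes the third-party Python package rdflib, and the resource URI and field values are invented purely for illustration.

```python
# Sketch: a Dublin Core description expressed in RDF, using the third-party
# rdflib package (an assumed tool; any RDF library would serve).
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

graph = Graph()
graph.bind("dc", DC)

resource = URIRef("http://example.org/reports/internet-navigation")
graph.add((resource, DC.title, Literal("Internet Navigation: Current State")))
graph.add((resource, DC.creator, Literal("Example Committee")))
graph.add((resource, DC.subject, Literal("Internet navigation")))
graph.add((resource, DC.date, Literal("2005")))

# Because the statements follow RDF's common model, any RDF-aware program
# can interpret them, whichever metadata scheme contributed each element.
print(graph.serialize(format="turtle"))  # returns a str in rdflib 6+
```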
From page 322...
... While its area of application is far broader than navigation, its developers foresee, for example, that software agents will "use this information to search, filter, and prepare information in new and exciting ways to assist the Web user."2 Like metadata and RDF, the applicability and feasibility of the Semantic Web remain the subject of dispute between its advocates and the skeptics.3 The practical implementation and use of the Semantic Web is highly dependent on the broad adoption of RDF and the ontologies it requires. That work has proceeded slowly thus far.
From page 323...
... Because cohesive communities are both the creators and the users of the metadata, their self-interest leads them to want trustworthy metadata and to provide the resources needed to create them and keep them current and accurate.20 Solving that three-component problem is more difficult for the general Web user "community." Metadata would either have to be supplied by independent editors (as it is now for use in directory services) or applied by the resource providers and collected automatically by search engines.
From page 324...
... As a result of the heavy requirements for skilled labor, Internet directories can include only a small selection of all the sites connected by the Web. However, in contrast to search engines, they have the advantage of being able to incorporate listings of many Web sites in the "dark" Web (see "The Deep, Dark, or Invisible Web" in Section 7.1.7)
From page 325...
... 27 to reexamine analogies to the time-tested "yellow pages" model for the Internet. Whereas search engines follow a "search through the visible Web and see where appropriate material can be found" approach (augmented, as discussed below, by paid placements)
From page 326...
... Netscape began an ambitious directory project using volunteers, called Open Directory, which continues to this day and is incorporated into Google.
7.1.7 Navigation via Search Engines
Search engines rely on indices generated from Web pages collected by software robots, called crawlers or spiders.31 These programs traverse the Web in a systematic way, depositing all collected information in a central repository, where it is automatically indexed.32 The selection and ranking of Web pages to include in the response to a query are done by programs that search through the indices, return results, and sort them according to a generally proprietary ranking algorithm.
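A minimal sketch of that crawl-and-deposit stage follows, using only the Python standard library. The seed URL, page limit, and link handling are simplifying assumptions; real crawlers add robots.txt compliance, politeness delays, deduplication, and distributed scheduling.

```python
# Sketch of a breadth-first crawler: fetch pages, extract their links,
# and deposit the raw text in a repository for later indexing.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of anchor tags on one page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(seed, max_pages=10):
    repository = {}                      # URL -> raw HTML, indexed later
    frontier, seen = deque([seed]), {seed}
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue                     # skip unreachable pages
        repository[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return repository

# Example (not run here): repository = crawl("http://example.org/")
```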
From page 327...
... Algorithmic Search
Because they are automated, search engines are the only currently available navigation aids capable of finding and identifying even a moderate fraction of the billions of Web pages on the public Internet. To index and retrieve that much information from the Web, search engine developers must overcome unique challenges in each of the three main parts that make up a search engine: the crawler, the indexer, and the query engine.
From page 328...
... The frequency with which Web pages are re-crawled directly affects the freshness of the results returned from a search engine query. Once the Web pages are retrieved, indexing programs create a word index of the Web by extracting the words encountered on each Web page and recording the Uniform Resource Locator (URL)
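The following sketch shows that indexing idea in miniature: an inverted index mapping each extracted word to the URLs of the pages on which it appears, plus a simple conjunctive lookup. The tokenization and sample pages are assumptions; production indexers also record word positions, handle markup and stemming, and store ranking signals.

```python
# Sketch of the indexing stage: an inverted index from words to URLs,
# with a crude unranked conjunctive query over it.
import re
from collections import defaultdict

def build_inverted_index(repository):
    """repository maps URL -> page text (e.g., from a crawler)."""
    index = defaultdict(set)
    for url, text in repository.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def query(index, terms):
    """Return the URLs containing every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

pages = {
    "http://example.org/a": "internet navigation with search engines",
    "http://example.org/b": "directories organize the web by hand",
}
index = build_inverted_index(pages)
print(query(index, ["search", "engines"]))  # {'http://example.org/a'}
```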
From page 329...
... Consequently, it is in general users' interest that multiple
34 See "iProspect Search Engine Branding Survey," reported in "iProspect Survey Confirms Internet Users Ignore Web Sites Without Top Search Engine Rankings," iProspect press release, November 14, 2002, available at .
From page 330...
... Very skilled and experienced users might even want to know the criteria by which a search engine ranks its results, enabling them to choose the search engine whose criteria best meet their needs. However, commercial search services treat the details of their ranking algorithms as proprietary, since they are a primary means by which the services differentiate themselves from their competitors and minimize the capacity of Web site operators to "game" the system to achieve higher ranks.
From page 331...
... by adopting means to improve their rankings.36 This has led to the development of search engine optimization, in which the site design is optimized to include simple, common
35 See Stefanie Olsen, "Search Engines Rethink Paid Inclusion," c/net news.com, June 23, 2004, available at .
36 See Mylene Mangalindan, "Playing the Search-Engine Game," Wall Street Journal, June 16, 2003, p.
From page 332...
... Second, a large majority of the information potentially reachable on the Web is not visible to them. The parts that they cannot see are called the "deep," the "dark," or the "invisible" Web.38 Various estimates place the size of the invisible Web at hundreds of times larger than the visible or public World Wide Web.39 Web pages can be invisible to search engines for a variety of reasons.40 A primary reason is the increasing use of databases to deliver content
37 See, for example, Google's guidelines at
From page 333...
... Thus, engines41 cannot crawl inside searchable databases such as library catalogs, the Thomas Register of manufacturing information, or indexes of journal literature. A search engine query on "Shakespeare" may retrieve sites that specialize in Shakespearean memorabilia (as described in their Web pages)
From page 334...
... Some of these challenges are being met by specialty search engines, which go beyond the features presented by Google. One of these is Daypop,48 which uses its own kind of link analysis to identify Web logs that are pointed to by other Web log sites from their front pages, rather than from archived or back
47 Scholarly Publishing and Academic Resources Coalition, "The Case for Institutional Repositories: A SPARC Position Paper," available at .
From page 335...
... 50 See Deborah Fallows, Lee Rainie, and Graham Mudd, "The Popularity and Importance of Search Engines," data memo, Pew Internet & American Life Project, August 2004, available at . The results came both from a telephone survey of 1,399 Internet users and from tracking of Internet use by comScore Media Metrix.
From page 336...
... Two-thirds of Americans who are online say they use search engines at least twice a week.
· Using search engines is second only to using e-mail as the most popular Internet activity, except when major news stories are breaking, when getting the news online surpasses using search engines.
From page 337...
... 52 See "iProspect Search Engine Branding Survey," reported in "iProspect Survey Confirms Internet Users Ignore Web Sites Without Top Search Engine Rankings," iProspect press release, November 14, 2002, available at .
From page 338...
... 7.2.1 The Commercial Providers of Navigation Services As noted in Section 6.2.2, the early distinctions between providers of directories and providers of search engines -- when each Web search site featured either algorithmic search engine results or human-powered directory listings54 -- have increasingly become blurred. Technology has helped to automate some of the classification processes for the Yahoo!
From page 339...
... [Table excerpt, page 339: the sources of the directory results, paid results, and algorithmic results used by the major Web search sites, drawn from providers including Open Directory, LookSmart, Zeal, Yahoo!, Overture, Google, Inktomi, and Ask Jeeves.]
From page 340...
... 58 The new metric generated monthly by comScore Media Metrix, beginning in January 2003, provides a better measure of market share by focusing on the number of searches that a search engine handles per month rather than the number of searchers that perform at least one query on the Web search site. The Web search site queries are based on a panel of 1.5 million Web users located within the United States and in non-U.S.
From page 341...
... Typically, those pay-for-access companies also provide other services, such as training, documentation, and extensive customer support, to their users.
59 Commercial search engine companies are exploring possibilities beyond their own search sites.
From page 342...
... Sophisticated algorithms are used by the search services to select which advertisements will appear. These algorithms take into account, among other things, the amount the advertiser is willing to pay if the user clicks on the advertisement, the relevance of the advertisement, and the historic success of the advertisement in generating clicks.
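A hedged sketch of that kind of selection rule appears below: candidate advertisements are scored by a simple product of the advertiser's bid, the ad's estimated relevance, and its historical click-through rate. The weighting, field names, and sample values are assumptions for illustration, not any service's actual formula.

```python
# Sketch of ad selection: rank candidate advertisements by an illustrative
# combination of bid, estimated relevance, and historical click-through
# rate (CTR). The scoring rule is an assumption made for the example.
def select_ads(candidates, slots=3):
    """candidates: list of dicts with 'bid', 'ctr', and 'relevance' keys."""
    def score(ad):
        return ad["bid"] * ad["ctr"] * ad["relevance"]
    return sorted(candidates, key=score, reverse=True)[:slots]

ads = [
    {"name": "A", "bid": 1.50, "ctr": 0.02, "relevance": 0.9},
    {"name": "B", "bid": 0.80, "ctr": 0.05, "relevance": 1.0},
    {"name": "C", "bid": 2.00, "ctr": 0.01, "relevance": 0.7},
]
print([ad["name"] for ad in select_ads(ads)])  # ['B', 'A', 'C']
```

Note that the highest bid alone does not win placement here: an ad with a lower bid but a stronger record of generating clicks can outrank it, which mirrors the trade-offs the passage describes.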
From page 343...
... The paid listings provided by Overture to its affiliated network of Web search sites, including Yahoo!, MSN, InfoSpace, and AltaVista, have been estimated to have handled 46.8 percent of all U.S.-based paid searches; and the paid listings provided by Google, appearing on the search results pages of Google, AOL, InfoSpace, and Ask Jeeves, accounted for 46.6 percent of all U.S.
From page 344...
... The model is sufficiently popular that, as noted earlier, a secondary market of search engine marketers/optimizers has arisen to advise Web sites on how to optimize their bidding for queries.68 The details of the auction systems differ, but the advantage of auctions is that hundreds of thousands of prices can be set by actual demand rather than having to be preset and posted. Since these auctions are subject to gaming, navigation services actively watch for potential fraud by advertisers and monitor the content of advertisers with editorial spot-checking.
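To make demand-set pricing concrete, the sketch below applies a generalized second-price style rule, one common auction design assumed here purely for illustration (the passage notes that the actual systems differ): each winning advertiser pays just enough to stay ahead of the next-highest bid.

```python
# Sketch of a keyword auction with generalized second-price style pricing.
# The mechanism, bid increment, and sample bids are illustrative assumptions.
def run_keyword_auction(bids, slots=2, increment=0.01):
    """bids: {advertiser: bid per click}. Returns [(advertiser, price)]."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for position in range(min(slots, len(ranked))):
        advertiser, _ = ranked[position]
        if position + 1 < len(ranked):
            price = ranked[position + 1][1] + increment
        else:
            price = increment            # no competing bid below this slot
        results.append((advertiser, round(price, 2)))
    return results

print(run_keyword_auction({"alpha": 1.20, "beta": 0.75, "gamma": 0.40}))
# [('alpha', 0.76), ('beta', 0.41)]
```

Because each price is driven by the next bid rather than by a posted rate card, prices for hundreds of thousands of query terms can adjust automatically as demand changes.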
From page 345...
... Consolidation
Over the past 4 years, there has been considerable consolidation in the search services market.69 Several large search engine service provid
69 See, for example, .
From page 346...
... has apparently decided to vertically integrate by buying both a paid-listing provider and a search engine. It is now able to produce by itself the paid listings previously supplied by an independent Overture and the algorithmic search services previously provided by Google.
From page 347...
... Innovation
In the past, as described in Section 6.2, there has been a cycle of innovation, adoption, and displacement of navigation services. It began when some new search engine or directory emerged with better technology, a better user interface, or both, than the incumbent-favored service.
From page 348...
... Conclusion: The importance of the Internet as the infrastructure linking a growing worldwide audience with an expanding array of resources means that improving Internet navigation will remain a profitable goal for commercial developers and a challenging and socially valuable objective for academic researchers.
Conclusion: Since competition in the market for Internet navigation services promotes innovation, supports consumer choice, and prevents undue control over the location of and access to the diverse resources available via the Internet, public policies should support the competitive marketplace that has emerged and avoid actions that damage it.

