Currently Skimming:

Information Retrieval and the Statistics of Large Data Sets
Pages 143-148

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 143...
... The distance of actual documents to this point is used as a measure of relevance. Probabilistic models attempt to estimate, for instance, the conditional probability of seeing particular words in relevant and nonrelevant documents.
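The vector-space idea in the page 143 excerpt can be illustrated with a minimal sketch. It assumes that "this point" is a query represented as a vector of term counts and that closeness is measured by cosine similarity; the excerpt does not say which distance measure the chapter has in mind. The names `term_vector`, `cosine_similarity`, and `rank` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts. Lowercased whitespace tokenization is an
    assumption made for this sketch, not something the chapter specifies."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank("text retrieval", docs)` on a small list of strings orders them by how closely their term vectors align with the query. Practical systems usually weight terms (for example by inverse document frequency) rather than using raw counts, which this sketch omits.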
From page 144...
... Statistical IR methods developed over the past 30 years are suddenly being widely applied in everything from shrink-wrapped personal computer software to large online databases (Dialog, Lexis/Nexis, and West Publishing all fielded their first statistical IR systems in the past three years) and search tools for the Internet.
From page 145...
..., drawing conclusions from databases that mix text and formatted data, and choosing what information sources to search in the first place. On the tools side, a range of powerful techniques from statistics have seen relatively little application in IR, including cross-validation, model ...
From page 146...
... [7] Donna Harman. Overview of the third Text REtrieval Conference (TREC-3)
From page 147...
... Recent trends in hierarchic document clustering: A critical review. Information Processing and Management, 24(5)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.