The National Academies Press

Information Retrieval: Finding Needles in Massive Haystacks
Pages 23-32

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.

From page 23...

... A large part of the problem is that information retrieval tools provide access to textual data whose meaning is difficult to model. There is no simple relational database model for textual information.

From page 24...

... Another way to think about these retrieval problems is that word-matching methods treat words as if they are uncorrelated or independent. A query about "automobiles" is no more likely to retrieve an article about "cars" than one about "elephants" if neither article contains precisely the word "automobile".
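
The independence assumption described in this excerpt can be illustrated with a minimal sketch. The function name `term_overlap` and the two toy documents are assumptions made up for illustration; they do not come from the chapter. Under exact word matching, a query for "automobiles" scores an article about cars no higher than one about elephants.

```python
def term_overlap(query: str, document: str) -> int:
    """Count distinct query words that also appear verbatim in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms)

# Hypothetical toy documents, invented for this sketch.
cars_doc = "new cars and trucks on sale"
elephants_doc = "elephants roam the savanna"

# Neither document contains the literal word "automobiles",
# so both receive the same score of zero.
cars_score = term_overlap("automobiles", cars_doc)
elephants_score = term_overlap("automobiles", elephants_doc)
```

Both scores are zero here, which is the failure the excerpt describes: the matcher has no way to know that "cars" is closer to the query than "elephants".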

From page 25...

... [Term-by-document matrix with document columns C1-C5, M1, ...; cell values lost in extraction.]

Consider a user query about "human computer interaction". Using the oldest and still most common Boolean retrieval method, users specify the relationships among query terms using the logical operators AND, OR, and NOT, and documents matching the request are returned.
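
As a rough sketch of Boolean retrieval, the operators AND, OR, and NOT can be mapped onto set intersection, union, and difference over an inverted index. The four short documents and the helper names `build_index` and `postings` are assumptions for illustration only; the chapter's own example collection did not survive extraction.

```python
# Hypothetical toy collection, invented for this sketch.
docs = {
    "d1": "human machine interface for computer applications",
    "d2": "a survey of user opinion of computer system response time",
    "d3": "human factors in interaction design",
    "d4": "graph minors a survey",
}

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(docs)

def postings(term):
    """Return the ids of documents containing the term (empty set if none)."""
    return index.get(term, set())

# human AND computer
and_hits = postings("human") & postings("computer")
# human OR interaction
or_hits = postings("human") | postings("interaction")
# computer AND NOT survey
not_hits = postings("computer") - postings("survey")
```

A document is either returned or not; there is no ranking, and a document that uses different words for the same concept is never matched.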

From page 26...

... SVD is closely related to Eigen Decomposition, Factor Analysis, Principal Components Analysis, and Linear Neural Nets. We use the truncated SVD to approximate the term-by-document matrix using a smaller number of statistically derived orthogonal indexing dimensions.

From page 27...

... The SVD allows a simple strategy for an optimal approximate fit. If the singular values of S0 are ordered by size, the first k largest may be kept and the remainder set to zero.
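
The truncation step in this excerpt can be sketched with NumPy, assuming "optimal" is meant in the least-squares (Frobenius norm) sense. The function name `truncate_svd` and the small matrix `X` are illustrative assumptions, not taken from the chapter. `numpy.linalg.svd` returns singular values already sorted from largest to smallest, so keeping the first k and zeroing the rest is a one-line operation.

```python
import numpy as np

def truncate_svd(X, k):
    """Return the rank-k approximation of X and the full list of singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_k = s.copy()
    s_k[k:] = 0.0                 # keep the k largest singular values, zero the rest
    return (U * s_k) @ Vt, s      # U * s_k scales each column of U by its singular value

# Hypothetical 5 x 4 matrix, invented for this sketch.
X = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

X_k, s = truncate_svd(X, k=2)

# The Frobenius-norm error of the approximation equals the root of the
# sum of squares of the singular values that were set to zero.
error = np.linalg.norm(X - X_k)
expected_error = np.sqrt(np.sum(s[2:] ** 2))
```

The two error values agree up to floating-point rounding, which is one way to check that the truncation was applied to the smallest singular values.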

From page 28...

... Word-matching methods would use this matrix. For LSI, we then compute the truncated SVD of the matrix, keeping the k largest singular values and the corresponding left and right singular vectors.
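
A minimal sketch of how the truncated SVD might be used for retrieval follows. It is one common variant, not necessarily the chapter's exact procedure: documents and the query are both projected onto the k leading left singular vectors and compared by cosine similarity. The term list, the matrix `X`, and the function name `lsi_scores` are assumptions invented for illustration.

```python
import numpy as np

# Hypothetical term-by-document matrix: rows are terms, columns are documents.
terms = ["automobile", "car", "engine", "elephant", "savanna"]
X = np.array([
    [1, 0, 0, 0],   # automobile appears in doc 0
    [0, 1, 0, 0],   # car appears in doc 1
    [1, 1, 0, 0],   # engine appears in docs 0 and 1
    [0, 0, 1, 1],   # elephant appears in docs 2 and 3
    [0, 0, 1, 0],   # savanna appears in doc 2
], dtype=float)

def lsi_scores(X, query_vec, k):
    """Cosine similarity between a query and every document in a k-dimensional space."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]                      # k leading left singular vectors
    doc_coords = U_k.T @ X              # shape (k, n_docs)
    q = U_k.T @ query_vec               # shape (k,)
    norms = np.linalg.norm(doc_coords, axis=0) * np.linalg.norm(q)
    return (q @ doc_coords) / np.maximum(norms, 1e-12)

# Query containing only the word "automobile".
query = np.zeros(len(terms))
query[terms.index("automobile")] = 1.0

scores = lsi_scores(X, query, k=2)
```

In this toy collection, document 1 contains "car" and "engine" but not "automobile", yet it scores close to 1 because "car" and "automobile" both co-occur with "engine". The two elephant documents score close to 0. Exact word matching would have given document 1 no credit at all.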

From page 29...

... For information retrieval applications, the singular values decrease slowly and we have never seen a sharp elbow in the curve to suggest a likely stopping value. Luckily, there is a range of values for which retrieval performance is quite reasonable.

From page 30...

... 6.0 Conclusions

Information retrieval and filtering applications involve tremendous amounts of data that are difficult to model using formal logics such as relational databases. Simple statistical approaches have been widely applied to these problems for moderate-sized databases with promising results.

From page 31...

... , Overview of the Third Text REtrieval Conference (TREC-3), National Institute of Standards and Technology Special Publication 500-225, 1995.

This material may be derived from roughly machine-read images, and so is provided only to facilitate research.