Linguistic Transformations for Information Retrieval
Pages 937-950

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 937...
... HARRIS ABSTRACT. This paper discusses the application to information retrieval of a particular relation in linguistic structure, called transformations. The method makes possible the reduction of a text, in particular scientific texts, to a sequence of kernel sentences, which is roughly equivalent in information to the original text.
From page 938...
... Any property of language which is stated only in terms of the relative occurrence of physically describable parts is called "formal." Each language is found to have a formal structure, and the structures of various languages are similar to each other in various respects. In each language we find that we can set up and classify elements called "morphemes" (word parts, such as prefixes, or indecomposable words)
From page 939...
... It might seem that not much can be done with this: the satisfaction list for each construction is very large; and it is impossible to say that one or another n-tuple does not satisfy the construction. Thus we cannot say that the noun phrases charged room or atoms evicting overtime would not occur somewhere.
From page 940...
... If we can suitably mark homonymities, we obtain that every sentence structure is a unique sum of products of transformations. The set of transformations is then a quotient set of the set of sentences, and under the natural mapping of the set of sentences onto the set of transformations, those sentences which are carried into the identity transformation are the kernel of the set of sentences.
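The quotient-set statement above can be written out in symbols. This is only a sketch: the names $S$, $T$, $\varphi$, $\iota$, $\sim$ and $K$ are notation assumed here for illustration and do not appear in the excerpt.

```latex
% Assumed notation: S = set of sentences, T = set of transformations,
% \varphi = the natural mapping, \iota = the identity transformation.
\[
  \varphi : S \twoheadrightarrow T, \qquad
  s \sim s' \iff \varphi(s) = \varphi(s'), \qquad
  T \cong S/{\sim}
\]
\[
  K = \{\, s \in S : \varphi(s) = \iota \,\}
\]
```

Read this way, the kernel $K$ is the class of sentences that the natural mapping sends to the identity transformation, which is how the excerpt uses the term.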
From page 941...
... For scientific, factual, and logical material, however, it seems that the relevant information is held constant under transformation, or is varied in a way that depends explicitly on the transformation used. This means that a sentence, or a text, transformed into a sequence of kernels carries approximately the same information as did the original.8 It is for this reason that a problem like information retrieval, which deals with content, can be treated with formal methods—precisely because they simplify linguistic form while leaving content approximately constant.
From page 942...
... It was found that reduction of texts to kernels yielded stretches too small for efficient retrieval. Consider, for example, the sentence: The optical rotatory power of proteins is very sensitive to the experimental conditions under which it is measured, particularly the wavelength of light which is used.
From page 943...
... We would like to obtain larger kernels, preferably of the size and structure that would provide separate kernels for the separate requests of information search. Larger kernels can be obtained simply by omitting some of the transformations, for each omission of a transformation would leave some section or distinction intact.
From page 944...
... Adjoining these kernels are sections which are adjuncts of them or separate kernels connected to them, and which in many cases contain at most one of the words which are centers of the main kernels. These sections often report conditions, detailed operations, and the like, which apply to the main kernel.
From page 945...
... This characterization of informational statuses is tentative and rough, but the relevant fact is that properties of the type mentioned in each case can be recognized by means of the comparison operation introduced above. To the extent that there is a correlation between the types of word recurrence and the informational status of kernels, it will be possible to set up the comparison operation in such a way as to make the desired ...
From page 946...
... in the 0 section of the kernel which it contained, or else omitted altogether. As an example, we take a sentence drawn from the same text as the previous examples: One phase of this research, the dependence of the rotatory properties of proteins on wavelength, is recorded here because it is of special importance to the problem at hand. [Interlinear grammatical labels in the original, in order of appearance: N1, N2, V3, C, N1, V4.]
From page 947...
... On this basis, for example, kernel 2 above turns out to be entirely a repetition of other kernels in the same article, and can therefore be omitted. Such a synonym list goes part way toward indicating logical equivalence between sentences, but only in the direction and to the extent that scientific writing actually permits.
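The use of a synonym list to drop a repeated kernel can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the function names, the synonym table, and the three toy kernels are all hypothetical, and "repetition" is simplified here to "same set of words after synonym mapping", which is narrower than the comparison the paper describes.

```python
def normalize(kernel, synonyms):
    """Map each word to its canonical synonym; return the resulting word set."""
    return frozenset(synonyms.get(word, word) for word in kernel.lower().split())


def drop_repeated_kernels(kernels, synonyms):
    """Keep the first kernel of each synonym-equivalent group, in order."""
    seen = set()
    kept = []
    for kernel in kernels:
        key = normalize(kernel, synonyms)
        if key in seen:
            continue  # same words up to synonymy: treat as a repetition
        seen.add(key)
        kept.append(kernel)
    return kept


# Hypothetical synonym list and kernels, invented for illustration only.
synonyms = {"rotatory": "rotation", "power": "property", "properties": "property"}
kernels = [
    "proteins have rotatory power",
    "rotatory power depends on wavelength",
    "proteins have rotation properties",
]
result = drop_repeated_kernels(kernels, synonyms)
```

Here the third kernel maps to the same word set as the first and is omitted. A real synonym list would have to be restricted in the way the text indicates, to equivalences that scientific writing actually permits.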
From page 948...
... The general question about machine performance of these operations hinges on whether a decision procedure can always be found for the requisite work, and on whether it would be sufficiently short in all cases. Much of the analysis of language structure is based upon comparison of the positions of a great many words in a great many sentences.
From page 949...
... These transformations produce from the original sentence a sequence of tentative kernels, each with its connectors and main grammatical sections marked. At each paragraph division the machine would institute the comparison operation over the kernels of that paragraph (and perhaps with a check of the main kernels of the preceding paragraph)
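The paragraph-level comparison step might look roughly like the following Python sketch. It is not the paper's procedure: the stopword list, the threshold of two kernels, and the rule "a kernel sharing at most one recurring word is an adjunct" are assumptions made here, the last one loosely modeled on the page 944 remark that adjunct sections contain at most one of the center words of the main kernels. The sample kernels are invented.

```python
from collections import Counter

# Assumed stopword list so that function words do not count as recurrences.
STOPWORDS = frozenset({"the", "of", "is", "on", "to", "a", "in", "at"})


def content_words(kernel):
    """Return the set of non-stopword tokens in a kernel."""
    return {word for word in kernel.lower().split() if word not in STOPWORDS}


def compare_paragraph(kernels, min_kernels=2):
    """Find words recurring across kernels and split kernels into main/adjunct."""
    counts = Counter()
    for kernel in kernels:
        counts.update(content_words(kernel))  # a set, so one count per kernel
    recurring = {word for word, n in counts.items() if n >= min_kernels}
    main, adjunct = [], []
    for kernel in kernels:
        shared = content_words(kernel) & recurring
        # Assumed rule: two or more recurring words marks a main kernel.
        (main if len(shared) >= 2 else adjunct).append(kernel)
    return recurring, main, adjunct


# Hypothetical tentative kernels for one paragraph.
paragraph = [
    "rotatory power of proteins is sensitive to conditions",
    "rotatory power of proteins depends on wavelength",
    "light is used at one wavelength",
]
recurring, main, adjunct = compare_paragraph(paragraph)
```

In this toy paragraph the first two kernels share several recurring words and are classed as main, while the third shares only "wavelength" and is classed as an adjunct. Checking against the main kernels of the preceding paragraph, as the text suggests, would amount to passing those kernels in alongside the current ones.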
From page 950...
... More important, as a by-product of analyzing and storing a great many texts it may be possible to collect experience toward a critique of scientific writing and an indication of useful modifications in language and in discourse structure for scientific writing. Science uses more than logic or mathematics but less than language; and in some respects it uses formulations for which language is not very adequate.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.