The Relation Between Completeness and Effectiveness of a Subject Catalogue
C.S.SABEL
Anyone who has frequently undertaken literature searches will have used the references in material already found as a lead to further material relevant to the subject in which the search is being made. This experience suggests that it might be interesting to investigate the material that can be retrieved in this manner in order to see whether too much effort was perhaps being put into aiming at 100% storage of material and at 100% retrieval of the items (regarded as documents) stored.
The preliminary investigation described here analysed the references contained in documents from three different sources dealing with the same subject (controlled thermonuclear reactions).
The A.E.R.E. unclassified reports on radioactivation analysis were also examined to see how far the results from these reports agreed with those obtained from the documents on controlled thermonuclear reactions.
Analysis of references
The 98 documents on controlled thermonuclear reactions studied in this investigation were in three classes. These were: (a) 27 Atomic Energy Research Establishment unclassified reports, (b) 20 published articles by A.E.R.E. authors, other than unclassified reports, (c) 51 United States Atomic Energy Commission unclassified reports, listed in a bibliography prepared by the U.S.A.E.C.
For the purpose of this study, these could be regarded as representing, in each class, a 100% sample of the documents dealing with the subject.
Table 1 sets out the number of times each document was quoted as a reference in other documents. It shows, for example, that seventy-seven documents were not quoted as a reference in any other document, twelve documents were quoted as a reference in one other document, and so on.
C.S.SABEL Atomic Energy Research Establishment, Harwell, England.
TABLE 1 All documents on controlled thermonuclear reactions
No. of documents | Times quoted as reference
77 | 0
12 | 1
3 | 2
6 | 3
Taking each of the three classes of documents on controlled thermonuclear reactions separately, and counting only the references made within that class, we have Tables 2–4.
TABLE 2 A.E.R.E. unclassified reports
No. of documents | Times quoted as reference
19 | 0
4 | 1
4 | 2
TABLE 3 A.E.R.E. authors’ papers
No. of documents | Times quoted as reference
11 | 0
4 | 1
5 | 2
TABLE 4 U.S.A.E.C. unclassified reports
No. of documents | Times quoted as reference
49 | 0
2 | 1
The references in A.E.R.E. unclassified reports on radioactivation analysis, studied as a comparison, gave the breakdown shown in Table 5.
TABLE 5 A.E.R.E. unclassified reports on radioactivation analysis
No. of documents | Times quoted as reference
19 | 0
4 | 1
0 | 2
0 | 3
1 | 4
A 25th report, a bibliography, was excluded from Table 5 as the presence of a bibliography in a subject field will obviously have a considerable influence on a search, provided its existence is known. In this case, as the bibliography was recent, none of the other reports included it as a reference. The bibliography included 17 of the 24 reports.
Consideration of Tables 1 to 5
The very high proportion of documents which are not included as references in any other document suggests that following the references from only some of the documents in a given category is unlikely to retrieve the whole category.
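The proportion referred to here can be read directly off the tables. The following is a minimal sketch, not part of the original study, that recomputes it for each of Tables 1 to 5; the names `TABLES` and `unquoted_fraction` are illustrative only.

```python
# Frequency tables transcribed from Tables 1 to 5, in the form
# {times quoted as reference: number of documents}.
TABLES = {
    "Table 1": {0: 77, 1: 12, 2: 3, 3: 6},
    "Table 2": {0: 19, 1: 4, 2: 4},
    "Table 3": {0: 11, 1: 4, 2: 5},
    "Table 4": {0: 49, 1: 2},
    "Table 5": {0: 19, 1: 4, 2: 0, 3: 0, 4: 1},
}


def unquoted_fraction(table):
    """Fraction of the documents in a table that no other document quotes."""
    total = sum(table.values())
    return table.get(0, 0) / total


for name, table in TABLES.items():
    print(f"{name}: {unquoted_fraction(table):.0%} of documents never quoted")
```

For Table 1 this gives 77 out of 98 documents, or roughly 79%, that could not have been reached through the references of any other document in the sample.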
The results were a disappointment from the statistical viewpoint, in that it had been hoped they would be amenable to analysis as, possibly, a binomial distribution, in which case p = k/n, where p is the probability of one document being quoted in another document, n is the total number of documents, and k is a constant. With a value for k, it would be possible, by weighting the above tables with results from tables of the frequency of quotations in documents, to arrive at, say, a 95% certainty of obtaining 100% of the documents by choosing some number less than the total number of documents and examining the references within these documents. However, the sample above was not large enough for one to be statistically categorical, and the values of k that were obtained were not consistent. Intuitively, it does appear to be true that the probability of a document being quoted in another document is inversely proportional to the total number of documents.
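The paper does not say how k was estimated. As a rough sketch under an assumption of our own, p can be taken as the number of quotations observed divided by the n(n - 1) ordered pairs of distinct documents, and k then follows from p = k/n. The names `estimate_k` and `prob_found` are hypothetical, and `prob_found` further assumes that documents quote one another independently.

```python
# Frequency tables transcribed from Tables 1 to 5, in the form
# {times quoted as reference: number of documents}.
TABLES = {
    "Table 1": {0: 77, 1: 12, 2: 3, 3: 6},
    "Table 2": {0: 19, 1: 4, 2: 4},
    "Table 3": {0: 11, 1: 4, 2: 5},
    "Table 4": {0: 49, 1: 2},
    "Table 5": {0: 19, 1: 4, 2: 0, 3: 0, 4: 1},
}


def estimate_k(table):
    """Estimate k in p = k/n from one frequency table.

    Assumption (not from the paper): p is the observed number of
    quotations divided by the n * (n - 1) ordered pairs of documents.
    """
    n = sum(table.values())
    quotations = sum(times * docs for times, docs in table.items())
    p = quotations / (n * (n - 1))
    return p * n


def prob_found(p, m):
    """Probability that a given document is quoted in at least one of m
    examined documents, assuming independent quoting with probability p."""
    return 1 - (1 - p) ** m


for name, table in TABLES.items():
    print(f"{name}: k is approximately {estimate_k(table):.2f}")
```

Under this assumed estimator, k ranges from about 0.04 for Table 4 to about 0.74 for Table 3, which is at least in keeping with the remark that the values obtained were not consistent.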
Further analysis
In addition, the results were studied to see whether chains of references existed leading from a few to many documents, and to see how far references in one class of document led to documents in another class.
It was found that for the documents on controlled thermonuclear reactions there were no “key” references which led to a large number of others in the subject, and no document was quoted as a reference in each of the three classes of document. In particular, references tended to be restricted to their own class of document. For example, there was no reference in the A.E.R.E. reports or published papers to give entry to the U.S.A.E.C. reports and only one reference in the U.S.A.E.C. reports to an A.E.R.E. published paper.
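A search for chains of references of this kind amounts to asking which documents can be reached from a starting set by following references repeatedly. The sketch below shows one way this might be done, using a breadth-first search. The document identifiers and the `REFERENCES` mapping are invented purely for illustration and are not the documents or references of the study.

```python
from collections import deque


def reachable(references, start):
    """Return the documents reachable from the start set by following
    references, excluding the starting documents themselves."""
    seen = set(start)
    queue = deque(start)
    while queue:
        doc = queue.popleft()
        for cited in references.get(doc, ()):
            if cited not in seen:
                seen.add(cited)
                queue.append(cited)
    return seen - set(start)


# Hypothetical toy data, NOT taken from the study:
# each document maps to the documents it quotes as references.
REFERENCES = {
    "AERE-1": ["AERE-2"],
    "AERE-2": ["AERE-3"],
    "PAPER-1": ["AERE-1"],
    "USAEC-1": ["USAEC-2"],
}

print(sorted(reachable(REFERENCES, ["PAPER-1"])))
print(sorted(reachable(REFERENCES, ["USAEC-1"])))
```

In this toy example nothing in the "USAEC" group can be reached from the other documents, which mirrors in miniature the kind of isolation between classes described above.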
Conclusions
Tables 1 to 5 show that the references in an incomplete list of documents are unlikely to indicate more than a small proportion of the remaining documents, and hence one is unlikely to be justified in retrieving fewer than all the documents that can be found from a subject catalogue.
The further analysis of the references also shows that, in the subject studied, there is no possibility of selecting documents for storage which would in themselves indicate most of the material one would wish to retrieve. The results also indicate the necessity, in planning an information service, of ensuring adequate retrieval of all types of relevant documents.
This is only a preliminary survey and further subject fields and different types of documents should be analysed to see if the restriction of references to documents within their own class is as marked in other fields.