Determining Requirements for Atomic Energy Information from Reference Questions¹
SAUL HERNER and MARY HERNER
In August of 1956, under a grant from the National Science Foundation, the writers undertook a study to determine the relative efficacy of tailor-made classification systems as a basis for coding and organizing scientific information by manual and mechanical means. One phase of this study involved the design of a special classification scheme for libraries in the atomic energy field. As a basis for developing logical categories by which the literature of atomic energy and related subjects might be broken down, a representative number of typical instances in which actual users of this literature asked questions of it were collected. These typical instances took the form of approximately 5000 reasonably current reference questions received by 14 atomic energy research and reference organizations in the United States. In addition to furnishing some insight into how typical users categorize the information contained (or which they hope is contained) in atomic energy literature, the analysis of reference questions was used to determine the working language and idiom of these users.²
At a certain point in the course of the analysis of the reference questions, it became clear that they might also be used to define the information requirements of users of atomic energy information. Obviously, a reference question is an expression of a need for information. If a statistical quantity of such expressions of need could be collected and categorized, it would seem to follow that they could furnish a basis, or at least a partial basis, for defining a group’s information requirements, provided that they were collected in such a way as to characterize the intellectual level and subject interests of the entire spectrum of users. On the basis of the foregoing logic, a small tangent was taken from the study of classification systems, and an auxiliary study was made of the reference questions for their own sake.
SAUL HERNER and MARY HERNER Herner and Company, Washington, D.C.
¹ The work discussed in the present paper was supported in part by a grant from the National Science Foundation.
² A paper by Fred R. Whaley, “Retrieval Questions from the Use of Linde’s Indexing and Retrieval System,” which has been placed in Area 4, also contains an analysis of reference questions.
The auxiliary study was undertaken with the understanding that the method being used has its limitations and pitfalls, as do all other methods for analyzing and defining people’s problems. Reference questions received by libraries and similar reference agencies do not encompass all the problems within organizations or among groups of workers which require searches for information. It is probable that the majority of problems requiring access to information which does not exist immediately within one’s own mind are solved either with no recourse whatever to formal tools such as libraries, or by using the publications in libraries, with no direct reference assistance from librarians.
Thus, a study of reference questions, such as the present one, is actually a study of a certain, narrow type of information requirement: one which leads or permits the information seeker, for one reason or another, to place the task of getting the information he needs in the hands of a person or group outside of himself. But, whatever the motive behind reference questions, their study on a quantitative and qualitative basis would seem to be a means of defining those information requirements which reference librarians and other information specialists are likely to be called upon to meet. The problem is to collect such questions in quantities sufficient to be statistically significant and representative.
Method of study
The reference questions in the present study were collected over a one-year period, beginning in the fall of 1956 and ending in the fall of 1957. The method of collection was as follows. Through the medium of a meeting conducted by the Technical Information Service of the United States Atomic Energy Commission, the Commission’s national laboratories and other prime contractors were asked to collect their current reference questions and to forward them to the authors’ firm. In order to facilitate the transcription and forwarding of questions, each of the participating organizations was given a supply of specially designed forms on which they were to fill in, for each question, the name of the organization receiving the question and the question itself, exactly as it was received. The participating organizations were asked to forward all questions received, regardless of whether they were technical or non-technical.
The foregoing method of collection produced a total of 4696 questions from 14 institutions and organizations. In addition to Atomic Energy Commission prime contractors, the cooperating institutions and organizations included the U.S. Department of Commerce Office of Technical Services, which is a primary disseminator of atomic energy information to individuals and organizations outside the sphere of Atomic Energy Commission contractors. The full list of cooperating institutions and organizations is given in Appendix I.
At the end of the collection period, the questions were categorized first as to subject, second as to number of concepts, and third as to the logical relationship of the concepts to one another, in cases where questions involved two or more concepts.
Of the 4696 questions collected, 3851, or 82.0 per cent, were scientific or technical, involving one or more of the natural or engineering sciences, and 845, or 18.0 per cent, were non-technical. It has been suggested that non-technical questions are often received on such a routine or casual basis that a significant proportion may not have been recorded and forwarded in the present study. Thus, the number of non-technical questions may in reality be somewhat higher than is indicated. However, the relative proportions of kinds of non-technical questions would appear to be valid, as would the relative proportions of the various kinds of technical questions.
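The split reported above is a matter of simple proportion. As a minimal sketch, with the helper name `percent` chosen here purely for illustration, the two percentages can be recomputed from the counts given in the text:

```python
# Counts taken from the text; the helper name `percent` is illustrative only.
TECHNICAL = 3851
NON_TECHNICAL = 845


def percent(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total = TECHNICAL + NON_TECHNICAL
print(total)                          # 4696
print(percent(TECHNICAL, total))      # 82.0
print(percent(NON_TECHNICAL, total))  # 18.0
```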
The nature of the reference questions
The relative proportions of the various categories of non-technical questions are given in Appendix II. The most striking characteristic of these categories is the sharp falling-off after the first three. These, along with their relative percentages, were the following: business and management techniques (31.3 per cent); buyers’ information and prices, together with business and commodity statistics (29.2 per cent); and information about institutions and organizations (16.4 per cent). These first three categories constituted 77 per cent of the non-technical questions, indicating the formidable role of business and related matters in atomic energy research and development programs.
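The concentration in the leading categories can be checked directly from the counts in Appendix II. The following is a minimal sketch rather than the procedure actually used in the study; the function name `top_share` is an assumption introduced for illustration.

```python
# Counts of non-technical questions per category, as listed in Appendix II.
NON_TECHNICAL = {
    "Business and management techniques": 265,
    "Buyers' information and prices: business and commodity statistics": 247,
    "Information about institutions and organizations": 139,
    "Documentation and communication techniques": 28,
    "Spelling: non-technical definitions: identification of quotations": 24,
    "Meeting programs": 22,
    "Popular information on atomic energy": 21,
    "Safety statistics: general safety programs": 20,
    "Education and training": 16,
    "Laws and regulations": 15,
    "History: dates": 14,
    "Requests for bibliography of specific author": 13,
    "Geographical information": 12,
    "Biographical data": 9,
}


def top_share(counts, k):
    """Percentage of all questions that fall in the k largest categories."""
    ranked = sorted(counts.values(), reverse=True)
    return 100.0 * sum(ranked[:k]) / sum(ranked)


print(sum(NON_TECHNICAL.values()))           # 845
print(round(top_share(NON_TECHNICAL, 3), 1)) # 77.0
```

The same function could be applied to any other tabulation of questions by category, provided the counts rather than the rounded percentages are used as input.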
The relative proportions of the various categories of scientific and technical questions are given in Appendix III. In the case of the scientific and technical questions, there was slightly more scatter among the categories, with the decrease from the highest proportions to the lowest a good deal more gradual than in the case of the non-technical questions. Nevertheless, the first three of the twelve categories of scientific and technical questions constituted two-thirds of the total. These first three categories, and their respective percentages of the total number of scientific and technical reference questions, are as follows: description of a process or method of procedure (25.5 per cent), physical, chemical, and engineering properties of substances (24.6 per cent), and description of apparatus or equipment (16.8 per cent). In view of the period during which the questions were collected for the present study, the relatively small interest in radiation effects (2.9 per cent) is rather surprising. But this in itself is perhaps an indication of the advantages of quantitative measurement of requirements as opposed to the use of one’s imagination.
The conceptual structure of the reference questions
Another means by which information requirements may be determined and expressed is in terms of the number of concepts contained in reference questions and how these concepts relate to one another logically. This type of analysis has significance primarily in cases where reference questions are put to storage and retrieval systems capable of performing correlations involving two or more concepts in their searches. As a rule, it is more difficult for a retrieval system, whether manual or mechanical, to perform a search on a subject involving a number of concepts than it is to perform one involving only one concept; and, as the numbers of concepts in questions increase, the searching difficulty increases.
The question of the numbers of concepts contained in reference questions, as well as the relationship of these concepts to one another, has been a subject of broad discussion recently, particularly as interest in mechanical storage and retrieval devices has heightened. It occurred to the authors that their accumulation of 3851 reasonably verbatim technical reference questions presented an interesting opportunity for measuring conceptual complexity on a statistically valid basis. In order to take advantage of this opportunity, the number of discrete concepts in each of the technical reference questions was counted and tabulated. For purposes of the count, a very strict definition was placed on “discrete concepts,” which were taken to mean those significant concepts in a question which could not be subdivided without changing their essential meanings. Thus, the question, “Give me information on engineering in nuclear reactors,” was taken to contain two concepts, “engineering” and “nuclear reactors,” and the question, “Has a hydrogen-fluorine torch been invented?” was considered to contain one concept, “hydrogen-fluorine torch.” In the case of the second question, the word “invented” was not counted as a concept because it was not considered significant to the meaning or understanding of the information that was wanted.
Using the foregoing definition and limitations, a count was made of the concepts contained in the 3851 questions. The results were as follows:
| No. of concepts | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| No. of questions | 466 | 1818 | 1167 | 327 | 73 | 0 |
| Per cent of questions | 12.1 | 47.2 | 30.3 | 8.5 | 1.9 | 0 |
In general, the foregoing statistics are in agreement with the results of previous studies of this kind, with the bulk of reference questions (89.6 per cent) containing three or fewer concepts. This apparent limit on the number of concepts likely to be encountered in reference questions is one which should be taken into account in the design of retrieval mechanisms and capabilities in information storage and retrieval systems.
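The proportion of questions that a retrieval system would cover at any given limit on the number of concepts follows from the counts tabulated above. A minimal sketch, in which the function name `cumulative_percent` is an illustrative assumption:

```python
# Number of technical questions by count of discrete concepts, from the table above.
QUESTIONS_BY_CONCEPTS = {1: 466, 2: 1818, 3: 1167, 4: 327, 5: 73, 6: 0}


def cumulative_percent(dist, max_concepts):
    """Percentage of questions containing at most `max_concepts` concepts."""
    total = sum(dist.values())
    covered = sum(n for concepts, n in dist.items() if concepts <= max_concepts)
    return round(100.0 * covered / total, 1)


for limit in sorted(QUESTIONS_BY_CONCEPTS):
    print(limit, cumulative_percent(QUESTIONS_BY_CONCEPTS, limit))
```

On these figures, a limit of three concepts covers 89.6 per cent of the questions, and a limit of five covers all of them.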
Another question to be pondered in the design of retrieval systems, if they are to reflect actual user requirements, is that of the logical relationship of the concepts contained in a reference question. In order to determine the relationship of these concepts to one another, an analysis and count were made of the numbers of questions in which the concepts constituted logical sums (where the requestor would settle for information about concept A or concept B), logical products (where the requestor had to have information about both concept A and concept B), or logical differences (where the requestor was interested in concept A, but not concept B).
The results of these counts were as follows:
| | Logical products | Logical sums | Logical differences | Totals |
|---|---|---|---|---|
| Number of questions (with more than one concept) | 3773 | 45 | 33 | 3851 |
| Per cent of questions (with more than one concept) | 98.0 | 1.2 | 0.8 | 100 |
Here again, we have a demonstration of the usefulness of studies of actual demands made on information systems in determining how these systems should be set up. While much is made in the literature of the need for retrieval systems and devices that can handle logical sums, products, and differences, it turns out, in the present case at least, that the vast majority of questions involve logical products and that questions involving logical sums or differences are relatively rare. Thus, a system that could handle only logical products would be adequate for all but about 2 per cent of the questions in the present case.
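The three logical relationships counted above correspond to the familiar set operations of union, intersection, and difference. The following is a minimal sketch, assuming a hypothetical index that maps each concept to the set of documents filed under it; the index contents and document numbers are invented for illustration and are not drawn from the study.

```python
# Hypothetical index: concept -> set of document numbers (illustrative data only).
INDEX = {
    "engineering": {1, 2, 3, 5},
    "nuclear reactors": {2, 3, 4},
}


def logical_sum(index, a, b):
    """Documents about concept A or concept B (set union)."""
    return index.get(a, set()) | index.get(b, set())


def logical_product(index, a, b):
    """Documents about both concept A and concept B (set intersection)."""
    return index.get(a, set()) & index.get(b, set())


def logical_difference(index, a, b):
    """Documents about concept A but not concept B (set difference)."""
    return index.get(a, set()) - index.get(b, set())


print(sorted(logical_product(INDEX, "engineering", "nuclear reactors")))     # [2, 3]
print(sorted(logical_sum(INDEX, "engineering", "nuclear reactors")))         # [1, 2, 3, 4, 5]
print(sorted(logical_difference(INDEX, "engineering", "nuclear reactors")))  # [1, 5]
```

A question such as “Give me information on engineering in nuclear reactors” is a logical product in this sense: only documents indexed under both concepts satisfy it.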
It is, of course, entirely possible that a question may actually be a combination of a number of questions, each of which might involve logical sums, products, or differences. Such questions are frequently discussed by writers on storage and retrieval systems. However, in the present case, from the count that was made of the concepts in the questions studied, it would appear that reference questions involving such complexities lie more in the realm of hypothesis than of reality. Here again, the advantages of analyzing user requirements before designing a storage and retrieval system are demonstrated.
Conclusions
From the foregoing discussion, it is evident that useful data on the information requirements of a body of users can be obtained from collecting and analyzing statistical quantities of their reference questions. It is entirely probable that a study such as the present one, based on questions from workers in a field other than atomic energy, would produce results quite different from those obtained in the present case. It is also possible that a study of reference questions in the atomic energy field, conducted at some future time, would produce results at variance with the present ones, first, because the field and the interests of workers in the field are bound to change, and, second, because the capabilities of storage and retrieval mechanisms are likely to improve and offer the searcher greater opportunities for reference help.
APPENDIX I. Organizations and institutions cooperating in the collection of reference questions
Oak Ridge National Laboratory Library, Oak Ridge, Tennessee
Sandia Corporation, Albuquerque, New Mexico
National Reactor Testing Station, Phillips Petroleum Company, Idaho Falls, Idaho
Atomics International, Canoga Park, California
National Lead Company of Ohio, Cincinnati, Ohio
Technical Library, Atomic Energy Commission, Washington, D.C.
University of California (Berkeley) Radiation Laboratory
Union Carbide Nuclear Company, Oak Ridge Gaseous Diffusion Plant Library, Oak Ridge, Tennessee
Technical Information Service Extension, Atomic Energy Commission, Oak Ridge, Tennessee
Brookhaven National Laboratory, Upton, New York
U.S. Department of Commerce Office of Technical Services
University of California (Los Angeles) Atomic Energy Project
General Electric Company, Hanford Laboratories Operation, Richland, Washington
E.I. du Pont de Nemours and Company, Explosives Department, Atomic Energy Division, Aiken, South Carolina
APPENDIX II. Categories of non-technical questions collected
| Category | No. of questions | Per cent |
|---|---|---|
| Business and management techniques | 265 | 31.3 |
| Buyers’ information and prices: business and commodity statistics | 247 | 29.2 |
| Information about institutions and organizations | 139 | 16.4 |
| Documentation and communication techniques | 28 | 3.3 |
| Spelling: non-technical definitions: identification of quotations | 24 | 2.8 |
| Meeting programs | 22 | 2.6 |
| Popular information on atomic energy | 21 | 2.5 |
| Safety statistics: general safety programs | 20 | 2.4 |
| Education and training | 16 | 1.9 |
| Laws and regulations | 15 | 1.8 |
| History: dates | 14 | 1.7 |
| Requests for bibliography of specific author | 13 | 1.5 |
| Geographical information | 12 | 1.4 |
| Biographical data | 9 | 1.1 |
| Totals | 845 | 99.9 |
APPENDIX III. Categories of technical questions collected
| Category | No. of questions | Per cent |
|---|---|---|
| Description of a process or method of procedure | 969 | 25.5 |
| Physical, chemical, and engineering properties of substances | 953 | 24.6 |
| Description of apparatus or equipment | 651 | 16.8 |
| Physical and chemical constants | 635 | 16.4 |
| Biological effects of substances: Hazards: Toxicology | 225 | 5.8 |
| Radiation effects | 112 | 2.9 |
| Materials for specific applications | 101 | 2.6 |
| Composition of materials | 54 | 1.4 |
| Standards and specifications | 46 | 1.2 |
| Technical definitions | 46 | 1.2 |
| Description of meteorological or geological phenomena | 39 | 1.0 |
| Mathematical constants and methods | 20 | 0.5 |
| Totals | 3851 | 99.9 |