National Academies Press: OpenBook

Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies (2022)

Chapter: Appendix B: The Role of Metadata in Assessing the Transparency of Official Statistics

Suggested Citation:"Appendix B: The Role of Metadata in Assessing the Transparency of Official Statistics." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.

Appendix B

The Role of Metadata in Assessing the Transparency of Official Statistics

INTRODUCTION

New expectations and requirements for finding, using, and understanding data are emerging rapidly, and the U.S. federal statistical community needs to keep pace to stay relevant. Transparency and reproducibility are two of these new general concerns, and this note is an attempt at laying out what they mean and how to achieve them. (In putting this together, we did not first perform a systematic literature review.)

As defined in dictionaries, transparency means “the condition of being transparent,” and transparent means “easy to perceive or detect.” So, for us, data or some other resources are transparent when it is easy to perceive or detect where they are, what they mean, and how to use them.

For our purposes, reproducibility is defined as “the extent to which consistent results are obtained when an experiment is repeated.” The related term, repeatability, means “the closeness of the agreement between the results of successive measurements of the same measure carried out under the same conditions of measurement.” Both these terms have a simpler interpretation in science than in federal statistics; however, they are useful for us as well. An example of reproducibility in official statistics would be to check that the value of some measure (especially an economic indicator) remains the same after successive applications of the processing to the collected data. This becomes interesting when part of the process involves human judgment or some randomization. Replicability, on the other hand, would entail taking data from a different source representing the same reference period and checking to see if final results are consistent. For instance, does some measure increase at the same rate as compared to one measured at an earlier time?

Both reproducibility and replicability depend on having access to and understanding of the data and the processing and analytical systems. This means descriptions of the data and the design, collection, processing, and estimation stages are accessible and understandable—in other words, transparent. Therefore, our focus will be transparency. This concept appears to be the more fundamental.

However, this general definition of transparency raises questions. How, in specific circumstances, do we ensure transparency? Is it possible to detect that some system is transparent? We know the answers will depend, in part, on the metadata that are available to a user. Metadata are the descriptive data for a resource, so the ability to find, understand, and use a resource depends on the quality of the metadata available. Here, quality is a measure of how well the metadata provide all the information necessary to allow the user to complete the tasks set forth.

NATURE OF THE CONCEPT OF TRANSPARENCY

As defined above, transparency is a fairly general notion or concept. It is easy to exemplify, but difficult to define. By this, we mean that transparency is difficult to characterize in general and easier when applied to specific circumstances. We end up having to characterize transparency for each application, or kind of application. Why? Or better, what can we say to make this more tractable?

Cognitive psychologists have identified at least two kinds of concepts: entity concepts and relational concepts (Gentner and Kurtz, 2005). Entity concepts are easily characterized, so they can be directly measured. For example, car and dataset are entity concepts. Each specific one is easy to describe. Relational or role concepts are not so easily characterized. They are more general in nature and have to be refined, or specialized, in order to make them characterizable. An example is the word guest. One can be a guest at a party, a guest in a hotel, or a guest user on a secure Web site. These specific cases do not have much in common, but each of them is characterizable.

In federal statistics, employment is an example of a relational concept. Employment by itself is too broad to characterize, but specific areas of employment can be characterized. Compensation, requirements, and employer costs, for example, are refinements that are more easily characterizable. Federal statistics have many such examples, and the concept of transparency has the same nature. In each situation where the issue of transparency comes up, it appears we need to characterize it in its own way.

METADATA SCHEMAS

Characterizing a concept provides criteria for determining whether a specific object or situation corresponds (as described further below) to the concept. We can define a tennis ball as a ball with certain specific features: pneumatic but not inflatable, specific colors, specific smell, fuzzy outside surface, 2.67 inches in diameter plus or minus some tolerance, weight of 2.04 ounces again subject to a tolerance, etc. So, if one is presented with a ball, it is pretty easy to determine if it is a tennis ball by examining the ball’s properties.[1] That is, determining if a specific ball corresponds to being a tennis ball is easy.

The properties of a tennis ball, and those of any object in general, are descriptive of the object. Given the definition of metadata—data being used to describe some object(s)—the properties of a specific object are metadata.

The characteristics of a concept[2] are the categories to which the properties of objects that correspond to the concept belong. Tennis balls must weigh between 1.98 and 2.10 ounces. This is a characteristic. A particular tennis ball might weigh 2.01 ounces, and this is a property of that ball. Note, the weight of this ball (the property) corresponds to what a tennis ball needs to weigh (the characteristic).

The properties are metadata, and their corresponding characteristics are elements of a schema for that metadata. Each characteristic has a set of properties that correspond to it. For instance, the weight of a tennis ball can be any of the values between 1.98 and 2.10 ounces. These properties form the set of allowed values (or value domain) for the element resulting from the characteristic. So, if a particular concept is characterizable, these characteristics lead directly to a metadata schema. The elements and constraints of the schema turn the schema into a specification.
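The relationship between a characteristic, its value domain, and a property can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Characteristic` and the method `corresponds` are hypothetical, and the weight range is treated as an open interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristic:
    """A schema element: a named category whose value domain is an open interval."""
    name: str
    low: float   # exclusive lower bound of the value domain
    high: float  # exclusive upper bound of the value domain
    unit: str

    def corresponds(self, value: float) -> bool:
        """True if a property value falls inside this element's value domain."""
        return self.low < value < self.high


# Characteristic from the text: tennis balls weigh between 1.98 and 2.10 ounces.
weight = Characteristic(name="weight", low=1.98, high=2.10, unit="oz")

# Property of one particular ball; this value is metadata about that ball.
ball_weight = 2.01

print(weight.corresponds(ball_weight))  # True
print(weight.corresponds(2.25))         # False
```

The characteristic lives in the schema, while the property lives with the object; checking correspondence only needs the two together.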

SPECIFICATIONS

A specification is a set of expressions called provisions. This is adapted from ISO/IEC Guide 2.[3] We define provision, and related terms, as follows:

___________________

1 Properties differentiate objects from each other.

2 Characteristics differentiate concepts from each other.

3 https://ec.europa.eu/eurostat/cros/system/files/ISO%20reference%20definitions%20-%20%20guide%202%20-%202004%20-%20rev.doc.

  • provision
    • expression in a specification that takes the form of a statement, an instruction, a recommendation, or a requirement[4]
  • statement
    • expression that conveys information
  • instruction
    • expression that conveys an action to be performed
  • recommendation
    • expression that conveys advice or guidance
  • requirement
    • expression that conveys criteria to be fulfilled.

For our tennis ball example, each characteristic or element is described in Table B-1, below. These descriptions contain provisions. For example, the optionality column determines whether an element is required (requirement), optional (recommendation), or conditional (instruction, “if label present”; and requirement). The conditions column contains instructions and requirements (“weight must be between 1.98 and 2.10 ounces”).

The notion of conformance is used to determine and claim that a specification is being followed properly. Loosely speaking, conformance is the situation under which an object adheres to a specification. More precisely, conformance is defined as follows: fulfilment by a product, process, or service of specified requirements.

CONFORMANCE

It is easy to think that the only consideration in determining conformance is whether the requirement provisions are fulfilled, but that would be incomplete. In the tennis ball example below, another test of whether a ball is a permissible tennis ball is that the height of the bounce is between 53 and 58 inches after the ball is dropped onto a concrete floor from a height of 100 inches. Many of the provisions in that condition are instructions and statements, as we have seen.

So, through conformance claims it is possible to know formally if some specification is being followed. In the tennis ball example, we have a metadata schema in the form of a specification (ignoring color, smell, and fuzziness), as seen in Table B-1.

A ball is a permissible tennis ball if it satisfies all the requirements set forth in the schema in Table B-1, i.e., the ball conforms to the specification. The question of whether a tennis match is being played according to the rules comes down to whether the court, racquets, balls, and net conform to the specifications for each.

___________________

4 These types of provisions are distinguished by the form of wording they employ; e.g., instructions are expressed in the imperative mood, recommendations by the use of the auxiliary “should,” and requirements by the use of the auxiliary “shall.”

TABLE B-1 Elements

| Element name | Conditions           | Optionality                   | Test or other criteria           |
|--------------|----------------------|-------------------------------|----------------------------------|
| Weight       | 1.98 < w < 2.10 oz   | Required                      |                                  |
| Diameter     | 2.57 < d < 2.70 in   | Required                      |                                  |
| Bounce       | 53 < b < 58 in       | Required                      | Dropped 100 inches onto concrete |
| Label        | Brand name           | Optional                      |                                  |
| Digit        | Single-digit numeral | Conditional, if label present | Digit appears below the label    |

NOTE: This is not a complete list, and the optionality is modified for the purposes of illustration.

However, this schema provides multiple ways to conform. These options for conformance are known as varieties. In our example, the varieties arise, for instance, because of the “label” element. It is optional, meaning a ball can conform whether a label is printed on the ball or not. But, if there is a label on the ball, there must also be a digit printed on the ball, and that digit must appear below the label. The solution is for a conformance claim to state which optional elements are selected.
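A conformance check against the elements of Table B-1, including the conditional digit and the resulting varieties, might look like the following minimal sketch. The function names, dictionary keys, and the brand name "Acme" are hypothetical, and the placement of the digit below the label is not modeled.

```python
def check_ball(ball: dict) -> list[str]:
    """Return the violated requirements; an empty list means the ball conforms."""
    problems = []
    # Required elements with open-interval value domains, as in Table B-1.
    required = {
        "weight_oz": (1.98, 2.10),
        "diameter_in": (2.57, 2.70),
        "bounce_in": (53, 58),  # measured after a 100-inch drop onto concrete
    }
    for element, (low, high) in required.items():
        value = ball.get(element)
        if value is None:
            problems.append(f"{element} is required but missing")
        elif not low < value < high:
            problems.append(f"{element}={value} is outside ({low}, {high})")
    # Conditional element: a digit is required only when the optional label is present.
    if ball.get("label") is not None:
        digit = ball.get("digit")
        if not (isinstance(digit, str) and len(digit) == 1 and digit in "0123456789"):
            problems.append("digit must be a single-digit numeral when a label is present")
    return problems


def variety(ball: dict) -> str:
    """State which optional elements were selected (the conformance variety)."""
    return "labeled" if ball.get("label") is not None else "unlabeled"


plain = {"weight_oz": 2.01, "diameter_in": 2.63, "bounce_in": 55}
labeled = {**plain, "label": "Acme", "digit": "3"}
unnumbered = {**plain, "label": "Acme"}  # label without the conditional digit

print(check_ball(plain), variety(plain))      # [] unlabeled
print(check_ball(labeled), variety(labeled))  # [] labeled
print(len(check_ball(unnumbered)))            # 1
```

Both `plain` and `labeled` conform, but they do so as different varieties, which is why a conformance claim needs to say which one applies.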

TRANSPARENCY

As the previous sections of this document describe, there is a series of considerations needed to ensure transparency in federal statistics. The important ones are these:

  • Identify which aspect of the statistical business life cycle needs to be transparent to users;
  • Identify the relevant information needed for the user;
  • Either identify (from an existing specification or standard) or build the needed elements that can hold this information; and
  • Create conformance criteria for this set of elements.

The considerations above embed the idea that transparency is very much a case-by-case problem. Since the sets of metadata elements needed to describe each part of the statistical business life cycle differ, sometimes markedly, the case-by-case approach is further justified.

The relationship between the information needed for transparency and the set of metadata elements in a specification supporting it is key. If the set of metadata elements is not complete in the sense of the information needed, the required information will not all be available. We illustrate this with a simplified example in Table B-2 for describing a variable.

TABLE B-2 Elements for Describing Variables

| Element name | Conditions | Optionality                                   | Test or other criteria      |
|--------------|------------|-----------------------------------------------|-----------------------------|
| name         | text       | required                                      |                             |
| universe     | text       | required                                      |                             |
| question     | text       | required, if result of interview              | result of answered question |
| derivation   | formula    | required, if calculated from other variables  | formal language             |

In the example in Table B-2, variables are described without enumerating or providing a rule for the allowed values. A simple example of an enumerated set of allowed values is {<M, male>, <F, female>} for a variable for “sex of a person.” An example of a rule describing allowed values is {0 <= age < 100} for a variable “age in years of a person” (in this case, age is top-coded to 99).
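The two kinds of allowed values can be written down directly. The sketch below stores the enumerated kind as a mapping from code to meaning and the described kind as a predicate; the names `SEX_VALUES`, `age_rule`, and `top_code` are hypothetical.

```python
# Enumerated value domain: the set of <code, meaning> pairs, stored as a mapping.
SEX_VALUES = {"M": "male", "F": "female"}


def age_rule(age: int) -> bool:
    """Described value domain: the rule 0 <= age < 100."""
    return 0 <= age < 100


def top_code(age: int, cap: int = 99) -> int:
    """Recode any age above the cap to the cap itself."""
    return min(age, cap)


print("M" in SEX_VALUES)        # True
print("X" in SEX_VALUES)        # False
print(age_rule(104))            # False
print(age_rule(top_code(104)))  # True
```

An enumerated domain is checked by membership, while a described domain is checked by evaluating its rule; top-coding is what keeps every recorded age inside the rule.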

Conforming to the specification laid out in Table B-2 does not make the information about variables transparent. There is too much missing information. The set of elements does not match all the information one needs to understand a variable.

One interesting addition to the story about conformance is the distinction between conformance and strict conformance. A system strictly conforms to a specification if it satisfies all the requirements and no others. Conformance, by itself, does not indicate whether this is the case.
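One way to make the distinction concrete is sketched below, under the assumption that "no others" can be read as "supplies no element beyond those the specification names". The element names come from Table B-2; the function names and the `allowed_values` key are hypothetical, and the conditional requirements on question and derivation are left out for brevity.

```python
# Elements of Table B-2; the first two are unconditionally required.
REQUIRED = {"name", "universe"}
SPECIFIED = REQUIRED | {"question", "derivation"}


def conforms(description: dict) -> bool:
    """All unconditionally required elements are present and non-empty."""
    return all(description.get(element) for element in REQUIRED)


def strictly_conforms(description: dict) -> bool:
    """Conforms, and supplies no element beyond those the specification names."""
    return conforms(description) and set(description) <= SPECIFIED


basic = {"name": "age", "universe": "all persons"}
extended = {**basic, "allowed_values": "0 <= age < 100"}  # one extra element

print(conforms(basic), strictly_conforms(basic))        # True True
print(conforms(extended), strictly_conforms(extended))  # True False
```

The extended description still conforms, so a plain conformance claim says nothing about whether extra elements are present.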

A system conforming to the specification in Table B-2 could still be transparent if one extends the information provided by adding metadata elements. One could transform Table B-2 into Table B-3, in which the last three rows are the extended elements and the last two of these are kinds of allowed values. Thus, the interpretation of the enumerated kind is further described in the second of the added rows and the described kind in the third.

TABLE B-3 Extended Elements for Describing Variables

| Element name   | Conditions | Optionality                                   | Test or other criteria                 |
|----------------|------------|-----------------------------------------------|----------------------------------------|
| name           | text       | required                                      |                                        |
| universe       | text       | required                                      |                                        |
| question       | text       | required, if result of interview              | result of answered question            |
| derivation     | formula    | required, if calculated from other variables  | formal language                        |
| allowed values | text       | required                                      | one of 2 kinds                         |
| <enumerated>   |            | if applicable                                 | set of ordered pairs <code, meaning>   |
| <described>    |            | if applicable                                 | rule, formally written                 |

Now, Table B-3 provides transparency, provided the only necessary but missing information was the allowed values. Assuming this, all relevant information is provided in the metadata elements in the specification.
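As a rough illustration, the elements of Table B-3 can be held in a small record type with a check that reports which required elements are missing. The class, the function, and the two boolean flags are hypothetical and are not elements of the table; the check accepts either kind of allowed values and does not insist on exactly one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableDescription:
    name: str
    universe: str
    question: Optional[str] = None     # required if the variable results from an interview
    derivation: Optional[str] = None   # required if calculated from other variables
    enumerated: Optional[dict] = None  # allowed values as {code: meaning}
    described: Optional[str] = None    # allowed values as a formally written rule
    from_interview: bool = False       # hypothetical flag, not a Table B-3 element
    calculated: bool = False           # hypothetical flag, not a Table B-3 element


def missing_elements(v: VariableDescription) -> list[str]:
    """List the required elements that the description fails to supply."""
    missing = []
    if not v.name:
        missing.append("name")
    if not v.universe:
        missing.append("universe")
    if v.from_interview and not v.question:
        missing.append("question")
    if v.calculated and not v.derivation:
        missing.append("derivation")
    # Allowed values are required, in at least one of the two kinds.
    if not v.enumerated and not v.described:
        missing.append("allowed values")
    return missing


sex = VariableDescription(
    name="sex",
    universe="all persons",
    question="What is this person's sex?",
    enumerated={"M": "male", "F": "female"},
    from_interview=True,
)
age = VariableDescription(name="age", universe="all persons", from_interview=True)

print(missing_elements(sex))  # []
print(missing_elements(age))  # ['question', 'allowed values']
```

A description with nothing missing conforms to the extended specification; whether that amounts to transparency still depends on the assumption stated above, that allowed values were the only necessary information missing.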
