
Appendix A: Statistical Metadata Standards - in Detail
Pages 177-218



From page 177...
... In the following, we discuss four of them, in this order: Generic Statistical Business Process Model, Generic Statistical Information Model, Common Statistical Production Architecture, and Common Statistical Data Architecture. All four of these standards are equitable.
From page 178...
... It is a companion standard to the three other UNECE statistical standards described later in this Appendix: Generic Statistical Information Model (GSIM), Common Statistical Production Architecture (CSPA)
From page 179...
... This is what makes GSBPM generic and, therefore, widely applicable. It provides a standard view of the statistical business process, yet it is neither too restrictive nor too theoretical.
From page 180...
... This phase occurs at the first iteration for statistics produced on a regular basis. When improvements are identified in the Evaluate phase, the Design phase may be revisited.
From page 181...
... Using the GSBPM
GSBPM is a standard recognized by the international community of national and international statistical offices. It serves as a reference model.
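As a small illustration of using GSBPM as a reference model, the Python sketch below (not part of the report) tags agency-specific processing steps with the eight top-level GSBPM phases. The phase names follow GSBPM v5.1; the helper function and the example steps are hypothetical.

    from enum import Enum

    class GSBPMPhase(Enum):
        """Top-level phases of the Generic Statistical Business Process Model (v5.1)."""
        SPECIFY_NEEDS = 1
        DESIGN = 2
        BUILD = 3
        COLLECT = 4
        PROCESS = 5
        ANALYSE = 6
        DISSEMINATE = 7
        EVALUATE = 8

    def tag_step(step_name: str, phase: GSBPMPhase) -> dict:
        """Label an internal processing step with its GSBPM phase so that steps
        from different programs can be mapped to the same reference model."""
        return {"step": step_name, "gsbpm_phase": phase.name, "gsbpm_code": phase.value}

    # Hypothetical examples of mapping internal steps onto the reference model.
    print(tag_step("Select sample from frame", GSBPMPhase.COLLECT))
    print(tag_step("Edit and impute missing responses", GSBPMPhase.PROCESS))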
From page 182...
... , on behalf of the international statistical community: https://statswiki.unece.org/display/GSBPM/Clickable+GSBPM+v5. Reproduced under Creative Commons Attribution 4.0 International License: https://creativecommons.org/licenses/by/4.0/legalcode.
From page 183...
... GSIM is an internationally endorsed reference framework for statistical information developed under the auspices of UNECE, and it is an equitable standard, just as GSBPM is. This generic conceptual framework is designed, in part, to help modernize, streamline, and align the work of official statistics in and across national and international statistical offices or agencies.
From page 184...
... [Figure A-2: BLS business process model. The figure lays out ten top-level steps, each with numbered sub-processes: Specify needs; Design survey; Construct frame; Construct sample; Collect data; Review and edit collected data; Calculate estimates; Analyze estimates; Disseminate data; and Archive. The steps are grouped into three bands labeled Sampling, Data Collection, and Estimation and Dissemination. SOURCE: Adapted from Gillman (2018).]
From page 185...
... for each class in the model. Each class corresponds to a useful set of objects that statistical offices should manage.
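To make the idea of managed information objects concrete, here is a deliberately simplified Python sketch of a few GSIM-style objects. The class and attribute names are illustrative assumptions made for this sketch, not the normative GSIM definitions.

    from dataclasses import dataclass, field

    @dataclass
    class Population:
        name: str                 # description of the target population
        reference_period: str     # the time period the population refers to

    @dataclass
    class Variable:
        name: str                 # the concept being measured
        unit_type: str            # the type of unit described, e.g., "person"
        value_domain: str         # how values are represented, e.g., a code list

    @dataclass
    class DataSet:
        identifier: str
        population: Population
        variables: list[Variable] = field(default_factory=list)

    # Two surveys that share a Variable object can recognize that fact explicitly
    # rather than relying on naming conventions alone.
    employment = Variable("employment status", "person",
                          "code list: employed / unemployed / not in labor force")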
From page 186...
... , on behalf of the international statistical community: https://statswiki.unece.org/display/gsim/GSIM+v1.2+Communication+Paper (Figure 1)
From page 187...
... , on behalf of the international statistical community: https://statswiki.unece.org/display/gsim/GSIM+v1.2+Communication+Paper (Figure 2)
From page 188...
... , on behalf of the international statistical community: https://statswiki.unece.org/display/gsim/GSIM+v1.2+Communication+Paper (Figure 3)
From page 189...
... SOURCE: United Nations Economic Commission for Europe (UNECE), on behalf of the international statistical community: https://statswiki.unece.org/display/gsim/GSIM+v1.2+Communication+Paper (Figure 4)
From page 190...
... , on behalf of the international statistical community: https://statswiki.unece.org/display/gsim/GSIM+v1.2+Communication+Paper (Figure 5)
From page 191...
... Accidental Architectures
When statistical agencies build systems without a standard architecture, the systems have great difficulty communicating with one another, resulting in "accidental architectures." It is also difficult to reuse systems across programs. For example, an editing system written for one program may not work in the computing environment of another, so a new system has to be developed and maintained.
From page 192...
... Indeed, most statistical agencies have very similar statistical life cycles and activities (e.g., imputation, data validation, dissemination, mapping), and very similar services have been built many times over for the same statistical processes, but those services are very hard to share and reuse.
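CSPA defines statistical services at a technology-neutral level; the Python sketch below is only a toy analogue of the reuse it aims at. The interface name and the mean-imputation implementation are assumptions made for illustration, not part of the CSPA specification.

    from abc import ABC, abstractmethod
    from typing import Any

    class ImputationService(ABC):
        """A service contract that any program's imputation implementation could satisfy."""

        @abstractmethod
        def impute(self, records: list[dict[str, Any]], variable: str) -> list[dict[str, Any]]:
            """Return records with missing values of `variable` filled in."""

    class MeanImputation(ImputationService):
        """One concrete implementation; another program could swap in a model-based one."""

        def impute(self, records, variable):
            observed = [r[variable] for r in records if r.get(variable) is not None]
            mean = sum(observed) / len(observed) if observed else None
            return [
                {**r, variable: r[variable] if r.get(variable) is not None else mean}
                for r in records
            ]

    # Any survey program that codes against ImputationService can reuse MeanImputation
    # (or a better service) rather than rebuilding imputation for each program.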
From page 193...
... The Common Statistical Data Architecture (CSDA) is a reference architecture and set of guidelines for managing statistical data and metadata throughout an agency's statistical life cycle.
From page 194...
... The Data Documentation Initiative (DDI) is a family of statistical metadata standards and other work products.
From page 195...
... 2. Information is accessible
Statement: Information is discoverable and usable; information is available to all unless there is good reason for withholding it.
Rationale: Ready access to information leads to informed decision making and enables timely response to information needs; users (internal and external) ...
Implications: The organization will foster a culture of information sharing; information will be open by default; the way information is discovered and ...
From page 196...
... original context;
Rationale: Data and their related metadata can be easily reused by other business processes, reducing the need to transform or recreate information; the dependencies and relationships between data objects can be easily known.
Implications: Connections between data objects must be documented; restrictions to data usage must be documented.
From page 197...
... SOURCE: United Nations Economic Commission for Europe (UNECE), on behalf of the international statistical community: https://statswiki.unece.
From page 198...
... SOURCE: United Nations Economic Commission for Europe (UNECE), on behalf of the international statistical community: https://statswiki.unece.org/download/attachments/314934281/CSDA%20v2.0.pdf?
From page 199...
... Codebook is managed in a directly implementable form in XML.6 DDI 3: Lifecycle, version 3.3, is used to describe the entire production cycle for statistical activities -- be they censuses, surveys, or some others -- conducted by national statistical offices. This capability corresponds to the work in U.S.
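A heavily simplified sketch of what variable-level documentation in the spirit of DDI-Codebook can look like, built here with Python's standard XML tooling. The element names (codeBook, dataDscr, var, labl, catgry, catValu) follow the Codebook vocabulary, but namespaces and required sections are omitted and the variable shown is hypothetical, so this is not a valid instance document.

    import xml.etree.ElementTree as ET

    # Build a minimal, Codebook-flavored description of one categorical variable.
    codebook = ET.Element("codeBook")
    data_dscr = ET.SubElement(codebook, "dataDscr")

    var = ET.SubElement(data_dscr, "var", name="EMPSTAT")
    ET.SubElement(var, "labl").text = "Employment status"

    for value, label in [("1", "Employed"), ("2", "Unemployed"), ("3", "Not in labor force")]:
        cat = ET.SubElement(var, "catgry")
        ET.SubElement(cat, "catValu").text = value
        ET.SubElement(cat, "labl").text = label

    print(ET.tostring(codebook, encoding="unicode"))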
From page 200...
... Several national statistical offices around the world have chosen to join the DDI Alliance, including BLS. Adoption of a DDI standard, or consideration of adopting one, is presumably the main incentive for joining the DDI Alliance.
From page 201...
... appears very similar to the set of phases as laid out in GSBPM. This is purposeful, because DDI-Lifecycle is intended to incorporate the survey life cycle in use in national statistical offices.
From page 202...
... Since the DDI standards were originally developed to help data archives, this is not a surprise, and this phase is provided to address these needs.
From page 203...
... resulted from more demanding requirements uncovered through the use of Codebook, especially as the needs of national statistical offices and support for the statistical survey life cycle were recognized. Support for the phases of GSBPM and the reuse requirements for describing ongoing surveys (and not just one-time studies)
From page 204...
... is the latest in the family of DDI standards, though at this writing it is still in draft form. The final release of the standard is expected in late 2021.
From page 205...
... The need for a cross-domain specification became apparent during this work. Serendipitously, efforts in the statistical community showed the need for combining data from multiple sources, which led to the idea for a new standard within the DDI family, this time DDI-CDI.
From page 206...
... The standard was developed by seven international statistical offices and banks: Bank for International Settlements (BIS), European Central Bank (ECB)
From page 207...
... SDMX was approved as an international statistical standard in 2008. The United Nations Statistical Commission at its 39th session "recognized and supported SDMX as the preferred standard for the exchange and sharing of data and metadata, requested that the sponsors continue their work on this initiative and encouraged further SDMX implementations by national and international statistical organizations."15 SDMX was also approved as a technical specification by ISO/TC154 in ISO TS 17369 in 2013.
From page 208...
... In this context, statistical agencies may be confronted with the question of considering SDMX as a solution for harmonizing and automating their multidimensional data and metadata exchanges with international organizations or within their own organization. For instance, the U.S.
From page 209...
... This can help reduce duplicate data storage, improve metadata quality, and enable linkages between datasets.
Benefits of SDMX
Due to the similar nature of the statistical activities across all national and international statistical organizations, many face similar challenges.
From page 210...
... SDMX can also be said to improve coherence through the use of cross-domain concepts, shared code lists, harmonized statistical guidelines, and the extensive reuse of SDMX objects across domains and agencies.
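As a sketch of how that reuse supports coherence, the Python fragment below (illustrative only, not actual SDMX syntax) shows two data structure definitions sharing the same code list and cross-domain concepts. The identifiers are made up, although FREQ, REF_AREA, TIME_PERIOD, and OBS_VALUE mirror common SDMX cross-domain concept names.

    # A shared code list for the FREQ concept, reused by both structures below.
    CL_FREQ = {
        "id": "CL_FREQ",
        "codes": {"A": "Annual", "Q": "Quarterly", "M": "Monthly"},
    }

    DSD_LABOUR = {
        "id": "DSD_LABOUR_FORCE",
        "dimensions": [
            {"concept": "FREQ", "codelist": CL_FREQ["id"]},
            {"concept": "REF_AREA", "codelist": "CL_AREA"},
            {"concept": "TIME_PERIOD"},
        ],
        "primary_measure": "OBS_VALUE",
    }

    DSD_PRICES = {
        "id": "DSD_CPI",
        "dimensions": [
            {"concept": "FREQ", "codelist": CL_FREQ["id"]},   # same code list reused
            {"concept": "REF_AREA", "codelist": "CL_AREA"},
            {"concept": "TIME_PERIOD"},
        ],
        "primary_measure": "OBS_VALUE",
    }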
From page 211...
... Other open-source tools are available: the Statistical Information System Collaboration Community's .Stat Suite, the SDMX Reference Infrastructure, and Fusion Registry. It is strongly recommended to consider these tools before developing a new platform.
From page 212...
... To varying degrees, it can work with GSIM and DDI. In particular, it can operate on all the various data structures that GSIM and DDI describe.
From page 213...
... GENERAL METADATA STANDARDS
There are a number of widely adopted metadata standards (many from the library and digital curation communities)
From page 214...
... This standard does not have any official standing with any national or international statistical agencies, but it is worth noting that it has been ratified as an ISO standard. With regard to describing datasets, Dublin Core can be used to create very general metadata, but it is not designed to provide much variable-level
18. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#section-7.
From page 215...
... Consider again the 15 terms in this element set: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type. Many of these can be adapted to describe statistical datasets, but the fit may be somewhat awkward. For instance, "coverage" can be used to describe the geospatial and temporal range of a dataset.
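For example, a dataset-level record using a subset of the element set might look like the following Python dictionary. The dataset, its values, and the DOI placeholder are hypothetical; note that variable-level detail has no natural home among these terms.

    # Illustrative dataset-level record using a subset of the 15 Dublin Core elements.
    dataset_record = {
        "title": "Example Labor Force Microdata, 2020",
        "creator": "Hypothetical National Statistical Office",
        "date": "2021-06-30",
        "type": "Dataset",
        "format": "text/csv",
        "coverage": "United States; 2020-01 to 2020-12",   # geospatial and temporal range
        "rights": "Public domain",
        "description": "Person-level microdata from a hypothetical labor force survey.",
        "identifier": "doi:10.xxxx/example",                 # placeholder, not a real DOI
        "language": "en",
    }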
From page 216...
... Overall, Dublin Core is an unusual standard to include in a report about statistical data. It is included here not only because it is an important metadata standard, broadly speaking, but also because it is a simple, general standard that may be worth using in tandem with another standard covered here to record metadata at the dataset level.
From page 217...
... Additionally, some statistical metadata standards may be usable in combination with ISO 19115. Statistical agencies with a geographic focus should also review this standard.
From page 218...
... Entities are the physical, digital, or conceptual things that are being created or altered; Agents are the people or software doing the alteration; and Activities are the specific actions taken upon an Entity.26 The W3C has developed specifications for encoding the PROV data model in a range of formats (e.g., XML and RDF), and for using it with a range of existing metadata standards (e.g., Dublin Core)
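As an illustration of the three core classes, the Python sketch below records the provenance of a hypothetical edited dataset using plain stand-in classes. The W3C serializations (e.g., PROV-XML, PROV-O) are richer than this, and all identifiers shown here are made up.

    from dataclasses import dataclass

    # Plain-Python stand-ins for the three core PROV classes.
    @dataclass
    class Entity:
        identifier: str          # the thing created or altered, e.g., a dataset version

    @dataclass
    class Agent:
        identifier: str          # the person or software responsible

    @dataclass
    class Activity:
        identifier: str          # the specific action taken
        used: Entity             # input entity
        generated: Entity        # output entity
        associated_with: Agent

    raw = Entity("lfs-2020-raw")
    edited = Entity("lfs-2020-edited-v2")
    editor = Agent("edit-and-impute-service-1.4")

    editing_run = Activity(
        identifier="editing-run-2021-03-02",
        used=raw,
        generated=edited,
        associated_with=editor,
    )
    # From records like this one, a user can trace which inputs and which software
    # produced any given version of a dataset.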

