
2 Research Opportunities
Pages 17-43



From page 17...
... HUMAN-COMPUTER INTERACTION

One of the real challenges associated with federal statistical data is that the people who make use of it have a variety of goals. There are, first of all, hundreds or thousands of specialists within the statistical system who manipulate the data to produce the reports and indices that government agencies and business and industry depend on.
From page 18...
... Workshop participants observed, however, that many are likely to remain without ready access to information online, raising a set of social and policy questions (Box 2.1). However, over time, a growing fraction of potential users can be expected to gain network access, making it increasingly beneficial to place information resources online, together with capabilities that support their interpretation and enhance the statistical literacy of users.
From page 19...
... The expanding audience for federal statistical data represents both an opportunity and a challenge for information providers.

1 Data on user behavior must be collected and analyzed in ways that are sensitive to privacy concerns and that avoid, in particular, tracking the actions of individuals over time (though this inhibits within-subject analyses).
From page 20...
... The goal is to provide not merely a data set but also tools that allow making sense of the data. Today, most statistical data is provided in tabular form ...

3 See Computer Science and Telecommunications Board, National Research Council.
From page 21...
... Workshop participants pointed to the challenge of developing more accessible forms of presentation as central to expanding the audience for federal statistical data. Statistics represent complex information that might be thought of as multimedia.
From page 22...
... The intent is to allow users to manipulate data displays directly in a much more interactive fashion. Some of the most effective data presentation techniques emerging from human-computer interaction research involve tightly coupled interactions.
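The "tightly coupled interaction" idea can be sketched minimally: as a user drags a slider, the display re-filters and redraws immediately, so manipulation and feedback form one loop. The following is an illustrative sketch, not code from the report; the filtering function stands in for the display-update step, and the data and field names are invented.

```python
# Sketch of the "dynamic query" pattern behind tightly coupled displays:
# each slider movement re-filters the data set, and the visible subset
# updates immediately. Data and field names are illustrative only.

def dynamic_query(records, field, low, high):
    """Return only the records whose `field` lies in [low, high]."""
    return [r for r in records if low <= r[field] <= high]

counties = [
    {"name": "A", "median_income": 41000},
    {"name": "B", "median_income": 58000},
    {"name": "C", "median_income": 73000},
]

# Simulate two successive slider positions; the visible subset narrows.
visible = dynamic_query(counties, "median_income", 40000, 80000)
visible = dynamic_query(visible, "median_income", 50000, 80000)
```

In a real interface this filter would run on every slider event, which is why such operations must be fast enough for interactive rates.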
From page 23...
... An ever-greater fraction of the population has such expectations, affecting how one thinks about disseminating statistical information.

DATABASE SYSTEMS

Database systems cover a range of applications, from the large-scale relational database systems widely used commercially, to systems that provide sophisticated statistical tools and spreadsheet applications that provide simple data-manipulation functionality along with some analysis capability.
From page 24...
... This concept is simple, but selecting and building the required set of basic statistical operations into database systems and creating the integration tools needed to use a workstation to explore databases interactively are significant challenges that will take time. Statistics-related operations that could be built into database systems include the following: · Data-mining operations.
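The concept of building statistical operations into the database engine can be illustrated with a small sketch: aggregation and random sampling run inside SQLite rather than in client code, so individual records need never leave the engine. This is an assumption-laden toy, not the report's proposal; SQLite stands in for a large-scale system.

```python
# Minimal illustration of pushing statistical operations into a database
# engine: aggregation and simple random sampling execute inside SQLite,
# and the client sees only the results. Table and values are invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE income (respondent INTEGER, amount REAL)")
con.executemany("INSERT INTO income VALUES (?, ?)",
                [(i, 1000.0 * i) for i in range(1, 11)])

# Aggregation in the engine: the client never touches individual rows.
(mean_amount,) = con.execute("SELECT AVG(amount) FROM income").fetchone()

# Sampling in the engine: a 3-row simple random sample.
sample = con.execute(
    "SELECT respondent FROM income ORDER BY RANDOM() LIMIT 3").fetchall()
```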
From page 25...
... Today, object-relational systems make it easier for third parties, as well as sophisticated users, to add both new data types and new operations into a database system. Since it is probably not reasonable to push all of the functionality of a statistical analysis product such as SAS into a general-purpose database system, a key challenge is to identify particular aggregation and sampling techniques and statistical operations that would provide the most leverage in terms of increasing both performance and functionality.
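The object-relational idea of letting third parties add new operations can be sketched with SQLite's user-defined aggregates: a new statistical operation (here, the sample range, chosen for brevity) is registered with the engine and then becomes usable directly in SQL. SQLite is a stand-in for a full object-relational system, and the operation is illustrative, not one the report prescribes.

```python
# Sketch of object-relational extensibility: a third party registers a
# new statistical aggregate with the engine, after which it can be used
# in ordinary SQL. The aggregate (sample range) is illustrative.
import sqlite3

class SampleRange:
    def __init__(self):
        self.values = []
    def step(self, value):          # called once per row
        self.values.append(value)
    def finalize(self):             # called once at the end
        return max(self.values) - min(self.values)

con = sqlite3.connect(":memory:")
con.create_aggregate("sample_range", 1, SampleRange)
con.execute("CREATE TABLE obs (x REAL)")
con.executemany("INSERT INTO obs VALUES (?)", [(2.0,), (9.0,), (5.0,)])
(rng,) = con.execute("SELECT sample_range(x) FROM obs").fetchone()
```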
From page 26...
... One pattern identified in the data predicts that when three conditions are met (no previous vaginal delivery, an abnormal second-trimester ultrasound reading, and a malpresenting infant), the patient's risk of an emergency caesarean section rises from a base rate of about 7 percent to approximately 60 percent.5 Data mining finds use in a number of commercial applications. A database containing information on software purchasers (such as age, income, what kind of hardware they own, and what kinds of software they have purchased so far)
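The mined rule described above can be rendered as a simple predicate, which is essentially what a rule-mining algorithm emits. The 7 percent and 60 percent figures come from the text; the record field names are assumptions made for illustration.

```python
# The mined obstetrics rule from the text, written as a predicate.
# The risk figures (7% base rate, 60% when the rule fires) are from the
# text; the field names are illustrative assumptions.
def emergency_csection_risk(record):
    rule_fires = (not record["previous_vaginal_delivery"]
                  and record["abnormal_2nd_trimester_ultrasound"]
                  and record["infant_malpresenting"])
    return 0.60 if rule_fires else 0.07

patient = {"previous_vaginal_delivery": False,
           "abnormal_2nd_trimester_ultrasound": True,
           "infant_malpresenting": True}
risk = emergency_csection_risk(patient)
```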
From page 27...
... While existing algorithms can sometimes be scaled up to handle these new types of data, mining them frequently requires completely new methods. Methods to mine multimedia data together with more traditional data sources could allow one to learn something that had not been known before.
From page 28...
... How might one use that very large, heterogeneous collection of data to augment the more carefully collected but smaller data sets that come from statistical surveys? For example, many companies in the United States have Web sites that provide information on current and new products, the company's location, and other information such as recruiting announcements.
From page 29...
... These are more compact and more readable than typical computer code, although some familiarity with the specification language and comfort with its more formal nature are required. As with computer code itself, a description in a specification language cannot readily be interpreted by a nonexpert user, but it can be interpreted by a tool that can present salient details to nonexpert users.
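The point that a formal specification is opaque to nonexperts but interpretable by a tool can be sketched concretely: a machine-readable variable specification, plus a small tool that renders its salient details in plain English. The spec format here is invented for illustration, not a real specification language.

```python
# A toy machine-readable variable specification and a "tool" that
# interprets it for nonexpert users. The spec format is invented.
spec = {
    "variable": "hours_worked",
    "type": "integer",
    "range": [0, 168],
    "universe": "employed respondents aged 16+",
}

def describe(spec):
    """Render a formal variable spec as a plain-English sentence."""
    lo, hi = spec["range"]
    return (f"{spec['variable']} is an {spec['type']} between {lo} and "
            f"{hi}, collected for {spec['universe']}.")

summary = describe(spec)
```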
From page 30...
... Finally, as the next section discusses, metadata can be particularly important when one wishes to conduct an analysis across data from multiple sources.

INFORMATION INTEGRATION

Given the number of different statistical surveys and agencies conducting surveys, "one-stop shopping" for federal statistical data would make statistical data more accessible.
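At its simplest, integrating data from multiple statistical sources means joining on a shared key, such as a geographic code. The sketch below is an illustration under assumed data: two toy data sets keyed by state FIPS code are merged so a user can query both in one place.

```python
# Minimal sketch of "one-stop" integration: two data sources from
# different (hypothetical) agencies are joined on a shared geographic
# key. All field names and figures are invented for illustration.
employment = {"06": {"unemployment_rate": 5.1},
              "36": {"unemployment_rate": 4.4}}    # keyed by state FIPS
population = {"06": {"population": 39_000_000},
              "36": {"population": 19_500_000}}

integrated = {
    fips: {**employment[fips], **population[fips]}
    for fips in employment.keys() & population.keys()
}
```

Real integration is far harder than this join suggests: sources differ in definitions, reference periods, and geography, which is exactly why metadata matters.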
From page 31...
... While the framework provided by the recently developed XML standard, including the associated document type definitions (DTDs), offers some degree of promise, work is needed to ensure that effective DTDs for federal statistical data sets are defined.
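The XML direction can be made concrete with a small sketch: a data set instance carrying an inline DTD that constrains its structure, parsed with Python's standard library. The element names are invented; an effective DTD for federal statistical data would be a standardized artifact, which is precisely the work the text calls for.

```python
# Toy XML data set with an inline DTD constraining its structure.
# Element and attribute names are invented for illustration.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE dataset [
  <!ELEMENT dataset (observation+)>
  <!ELEMENT observation EMPTY>
  <!ATTLIST observation state CDATA #REQUIRED value CDATA #REQUIRED>
]>
<dataset>
  <observation state="CA" value="5.1"/>
  <observation state="NY" value="4.4"/>
</dataset>"""

root = ET.fromstring(doc)
rates = {obs.get("state"): float(obs.get("value"))
         for obs in root.findall("observation")}
```

Note that ElementTree parses but does not validate against the DTD; validation would require a separate validating parser.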
From page 32...
... Heather Contrino, discussing the American Travel Survey CATI system, observed that if a respondent provides information about several trips during the trip section of the survey and then recalls another trip during the household section, it would be useful if the interviewer could immediately go back to a point in the survey where the new information should be captured and then proceed with the survey. The new CATI system used for the 1995 American Travel Survey provides some flexibility, but more would improve survey work.
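The navigation flexibility Contrino describes can be sketched as a small state machine: the instrument keeps a cursor over its sections, lets the interviewer jump back to capture a late-recalled trip, and then resumes where the interview left off. The class, section names, and API below are illustrative assumptions, not the actual CATI system.

```python
# Sketch of flexible CATI navigation: jump back to an earlier survey
# section to capture a late-recalled answer, then resume. The design
# is an illustrative assumption, not the actual instrument.
class SurveyInstrument:
    def __init__(self, sections):
        self.sections = sections
        self.cursor = 0
        self.responses = {s: [] for s in sections}

    def record(self, answer):
        """Record an answer in the section under the cursor."""
        self.responses[self.sections[self.cursor]].append(answer)

    def jump_to(self, section):
        """Move the cursor to `section`; return the prior position."""
        saved = self.cursor
        self.cursor = self.sections.index(section)
        return saved

    def resume(self, saved_cursor):
        self.cursor = saved_cursor

survey = SurveyInstrument(["trips", "household"])
survey.record("trip to Denver")
survey.cursor = 1                   # interviewer moves on to household
saved = survey.jump_to("trips")     # respondent recalls another trip
survey.record("trip to Boise")
survey.resume(saved)                # continue with the household section
```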
From page 33...
... There are significant research questions regarding the implications of different techniques for administering ...

8 The Bureau of Labor Statistics started using this technology for the Current Employment Survey in 1992. See Richard L
From page 34...
... While some of their analysis can be conducted using public data sets, some of it depends on information that could be used to infer information about individual respondents, including microdata, which are the data sets containing records on individual respondents. Statistical agencies must strike a balance between the benefits obtained by releasing information for legitimate research and the potential for unintended disclosures that could result from releasing information.
From page 35...
... Both technical and nontechnical approaches have a role in improving researcher access to statistical data. Agencies are exploring a variety of nontechnical solutions to complement their technical solutions.
From page 36...
... Second, the statistical agencies, to meet the research needs of their users, are being asked to release "anonymized" microdata to support additional data analyses. As a result, a balancing act must be performed between the benefits obtained from
From page 37...
... The issue of disclosure control has also been addressed in the context of work on multilevel security in database systems, in which the security authorization level of a user affects the results of database queries.13 A simple disclosure control mechanism such as classifying individual records is not sufficient because of the possible existence of an inference channel, whereby information classified at a level higher than that for which a user is cleared can be inferred by that user from information at lower levels (including external information) that the user possesses.

13 See National Research Council and Social Science Research Council.
From page 38...
... The degree to which these techniques need to be unique to specific data types has not been resolved. The bulk of the research by statistics researchers on statistical disclosure limitation has focused on tabular data, and a number of disclosure-limiting techniques have been developed to protect the confidentiality of individual respondents (including people and businesses)
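One of the classic disclosure-limiting techniques for tabular data is primary cell suppression: any cell based on fewer respondents than a threshold is blanked before publication. The sketch below is a minimal illustration; the threshold of 3 and the table contents are assumptions, and real practice adds complementary suppression so blanked cells cannot be recovered from marginal totals.

```python
# Minimal sketch of primary cell suppression for a published count
# table. Threshold and data are illustrative assumptions only.
def suppress_small_cells(table, threshold=3):
    """Replace counts below `threshold` with None ("D" in print)."""
    return {cell: (count if count >= threshold else None)
            for cell, count in table.items()}

table = {("county A", "mining"): 2,    # too few respondents to publish
         ("county A", "retail"): 40,
         ("county B", "mining"): 15}
published = suppress_small_cells(table)
```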
From page 39...
... Identification can also be achieved through a combination of less precise spatial attributes (e.g., county, Census block, hydrologic unit, land use), and care must be taken to ensure that including variables of this sort in a public data set will not allow individual respondents to be uniquely identified.
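The re-identification check implied here can be sketched directly: before release, verify that no combination of quasi-identifying attributes is shared by too few records. This is the k-anonymity criterion (here with k = 2); the records and field choices are illustrative assumptions, not federal data.

```python
# Sketch of a pre-release uniqueness check: flag quasi-identifier
# combinations shared by fewer than k records (k-anonymity, k = 2).
# The records and attribute names are invented for illustration.
from collections import Counter

def risky_combinations(records, quasi_identifiers, k=2):
    """Return attribute combinations held by fewer than k records."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers)
                     for r in records)
    return {combo for combo, n in combos.items() if n < k}

records = [
    {"county": "A", "land_use": "farm", "income": 30000},
    {"county": "A", "land_use": "farm", "income": 32000},
    {"county": "B", "land_use": "mine", "income": 51000},  # unique
]
risky = risky_combinations(records, ["county", "land_use"])
```

A release process would then generalize or suppress the flagged combinations before publishing the file.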
From page 40...
... A third, more general, issue is how to address disclosure limitation when multimedia data such as medical images are considered. Approaches developed for numerical tabular or microdata do not readily apply to images, instrument readings, text, or combinations of them.
From page 41...
... Protecting against disclosure of confidential information and ensuring the integrity of the collection, analysis, and dissemination process are critical issues for federal statistical agencies. For the research community that depends on federal statistics, a key security issue is how to facilitate access to microdata sets without compromising their confidentiality.
From page 42...
... Discussing the challenges of maintaining the back-end systems that support the electronic dissemination of statistics products, Michael Levi of the Bureau of Labor Statistics cited several demands placed on statistics agencies: systems that possess automated failure detection and recovery capabilities; better configuration management including installation, testing, and reporting tools; and improved tools for intrusion prevention, detection, and analysis. As described above, the federal statistical community is moving away from manual, paper-and-pencil modes of data collection to more automated modes.
From page 43...
... The test of these building blocks is how well researchers and technologists can apply them to understand and address the real needs of customers. While there are a number of unsolved research questions in information security, solutions can in many cases be obtained through the application of known security techniques.