Currently Skimming:

3 Characteristics of Scientific and Technical Databases
Pages 6-100

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 6...
... What are the main issues in developing those products? Are you the only source of all or some of your data products?
From page 7...
... GEOGRAPHIC DATA PANEL MR. ONSRUD: My name is, again, Harlan Onsrud with the Department of Spatial Information.
From page 8...
... 1b. What are the main incentives for your database activities (both economic and other)
From page 9...
... The USGS NSDI node encompasses a distributed set of sites organized on the basis of the USGS's four principal data themes: biological resource information, geological information, national mapping information, and water resources information. (See for additional information.)
From page 10...
... (2) Reproduction and distribution costs, with the primary cost drivers being customer service, order taking, accounting, and order fulfillment: These cost drivers are funded by congressional appropriations for legislatively required distributions; all other distributions are funded through cost reimbursement fees.
From page 11...
... The USGS products, information, and services are based on or support natural science data and include the following formats: publications (professional papers, circulars, and general interest), both in electronic and hard copy forms; fact sheets; digital data; maps (including geologic
From page 12...
... If not, please describe the competition you have for your data products and services. The USGS is not the only source of many of its data products, although it produces some specific research products that can be found only at the USGS. The National Water Information System is a unique national database.
From page 13...
... These fees pursue full recovery of costs, including indirect costs such as depreciation of equipment. USGS information products are in the public domain, carry no copyrights, and may be used and shared freely.
From page 14...
... The principal sources of funding for USGS database activities are congressional appropriations, interagency cooperative agreements (other federal agencies, and state and local agencies), and joint funding arrangements for geospatial data collection, analysis, and interpretation.
From page 15...
... 8b. What problems have you had with legal protection of your own database activities and what are some examples of harm to you or misuse of your data that you have experienced, if any?
From page 16...
... potential difficulties in cultivating international partnerships due to WIPO-induced restrictions. Both of these problems will be encountered by any federal agency attempting to provide access to data and information about threatened and endangered species or attempting to partner internationally.
From page 17...
... 1b. What are the main incentives for your database activities (both economic and other)
From page 18...
... If not, please describe the competition you have for your data products and services. Yes, we are the only source for some of the integrated-site data products.
From page 19...
... 7a. What are the principal sources of funding for your database activities?
From page 20...
... GeoSystems is one of the many commercial firms, perhaps one of the more successful firms, that has been taking government geographic data and commercial geographic data and adding value to create services and products that they then make available to other businesses, as well as to the general consuming public. Commercial Data Activity
From page 21...
... It provides digital and multimedia cartography, geographic database development, and comprehensive map and data maintenance through the application of digital and database-driven cartographic techniques. In addition, it offers map-publishing systems, as well as advanced mapping technology and consultation services to clients.
From page 22...
... Business information content is layered within geographic databases that cover the entire world. For Web sites.
From page 23...
... public domain including government-produced maps, digital geographic databases, remotely sensed imagery, and miscellaneous published data/information. Secondary data sources include commercial and non-U.S.
From page 24...
... If not, please describe the competition you have for your data products and services. GeoSystems is the only source for some of the databases used in our products and services.
From page 25...
... Some software products are sold via retail channels (consumer CD-ROMs); however, most are sold to corporate customers (such as the airline reservation systems, car rental agencies, real estate database companies, hotel
From page 26...
... 7a. What are the principal sources of funding for your database activities?
From page 27...
... Probably the major problem area we have faced is in dealing with governments outside the United States that have a particularly restrictive approach to geographic databases. In the extreme, these restrictions can sometimes mean that all government map data are considered sensitive and not releasable to outsiders.
From page 28...
... Some form of legal protection that would proscribe the unauthorized copying and revenue-generating use of a commercial geographic database and derived products, such as maps, is necessary for our company to have future viability. Since we pay significant license fees to third-party data providers and also spend literally millions of dollars on creating, enhancing, and maintaining data, we would be at a significant cost disadvantage if, through unauthorized use, competitors could offer similar products and services.
From page 29...
... There are those who believe that geographic data should be a public good and that even databases such as the NavTech database really ought to be taken over by the federal government and put into the public domain. This is precisely due to the business economics discussed above, that is, the necessity to keep the price for the data high, which will limit its potential use in many applications that could benefit from it.
From page 30...
... have more closely emulated the federal public-domain philosophy. Again, some form of consistency here would be very useful to the industry.
From page 31...
... So, in other words, the public domain comes to you and says, "Well, now we need some help from you." Do you have a differentiated pricing policy? Do you have some kind of a two-tiered product or price discrimination that would favor the public-domain users?
From page 32...
... Government Data Activity James Ostell, National Center for Biotechnology Information Response to Committee Questions Provide a description of your organization and database-related operations. The National Center for Biotechnology Information (NCBI)
From page 33...
... A sampling of current research projects includes detection and analysis of gene organization, repeating sequence patterns, protein domains, and structural elements; creation of a gene map of the human genome; mathematical modeling of the kinetics of HIV infection; and analysis of effects of sequencing errors for database searching, development of new algorithms for database searching and multiple sequence alignment, construction of nonredundant sequence databases, mathematical models for estimation of statistical significance of sequence similarity, and vector models for text retrieval. Additionally, NCBI investigators maintain ongoing collaborations with several institutes within the NIH and with numerous academic and government research laboratories.
From page 34...
... Our international database collaborators, DDBJ and EMBL, also receive data from individual scientists, and the three databases exchange data nightly. DNA and protein sequence data submission is done voluntarily by the scientific community.
From page 35...
... NCBI organized and distributed the sequence data to the mapping centers, and the centers carried out the mapping using a consistent set of radiation hybrid reagents and methodologies. NCBI then developed the database and retrieval systems that provide access to the integrated human gene map, with links to the original source mapping organizations for more detailed information.
From page 36...
... From a technical standpoint, NCBI has facilitated the data submission process by developing two easy-to-use software packages for the preparation of sequence database submissions and by making these available free of charge.
From page 37...
... Each database receives, processes, and maintains data submissions independently, so each database does maintain control over a unique set of sequence submissions. However, the sequence data processed at each of the three databases is exchanged on a daily basis, so that all three databases provide access to essentially the same universe of DNA and protein sequence information.
From page 38...
... 8a. Have you encountered problems from unduly restrictive access or use provisions pertaining to any external source databases?
From page 39...
... Normally, there is a process with the sequence data, where the journals require an accession number from the public database in order to show that as supporting evidence for the paper. So, obviously, you can obtain a copyright for the paper, but once the sequence is deposited in the public database, the public database has a policy of no restrictions.
From page 40...
... DR. OSTELL: Yes, a large number of them republish sequence data, sometimes in the context of, say, a software tool, or a set of analysis software tools.
From page 41...
... Because the rich data resources for biology are largely in the public domain, they have become important testbeds for advances in information technology not readily available elsewhere. A growing trend, which will surely impact ready access to vital information, is the commercialization and restrictive licensing of formerly freely distributed data resources.
From page 42...
... GATA integrates information from DNA and protein sequence databases, gene mapping databases, literature information retrieval systems, and genetics databases among others. EpoDB, which is a prototype framework for building deep coverage databases for a specific problem of interest to biologists.
From page 43...
... If not, please describe the competition you have for your data products and services. The competition largely comes from other research groups, although there are some areas where the difficulty and complications in accessing commercial data have forced us to re-create some of these products.
From page 44...
... 8d. What specific legal or policy changes would you like to see implemented to help address the problems identified above?
From page 45...
... Myra Williams is the president and the chief executive officer of Molecular Applications Group. Commercial Data Activity Myra Williams, Molecular Applications Group Response to Committee Questions 1a.
From page 46...
... They include mining existing databases to extract relevant information for analysis as well as developing value-added databases, which include information extracted from other sources. Some of our database activities are required for us to conduct research in a proprietary environment.
From page 47...
... Our databases are populated with information derived from numerous different sources on the World Wide Web. If legislation should be passed that makes the creation of derivative databases illegal, all of our database activities as well as our current software products would have to be removed from the market.
From page 48...
... DiscoveryBase™ Molecular Applications Group developed DiscoveryBase™ for internal use to duplicate the primary information services that GeneMine™ accesses on the Internet. This server provides us with a secure, stable environment to support our projects and our research programs.
From page 49...
... Certainly, an overwhelming issue would result from a change in copyright law, which limits our ability to extract data from multiple sources and to add value to those data through the use of our proprietary technology.
From page 50...
... If not, please describe the competition you have for your data products and services. The bioinformatics market is a highly competitive one with new companies being announced almost weekly.
From page 51...
... 9. Do you believe the main problems/barriers/issues you have described above are representative of other similar data activities in your discipline or sector?
From page 52...
... funding from the German government. Thus, the privatization of information that used to be in the public domain is something that has been .
From page 53...
... CHEMICAL AND CHEMICAL ENGINEERING DATA PANEL DR. SAXON: In my previous life I was doing research in chemistry, possibly making some contributions and certainly using the products of some of our speakers.
From page 54...
... In 1968, NIST established its formal program on data evaluation, the Standard Reference Data Program, in response to congressional legislation to ensure that "critically evaluated data is available to scientists, engineers, and the general public." The program built upon a decades-long NIST tradition of data evaluation in thermochemistry, thermophysics, and atomic spectroscopy. Today, the Standard Reference Data Program, together with the NIST Measurement and Standards Laboratories, coordinates on a national level the production and dissemination of critically evaluated reference data for the physical sciences and engineering.
From page 55...
... "to provide or arrange for the collection, compilation, critical evaluation, publication and dissemination of standard reference data." It empowers the Department to recover the costs of producing and disseminating reference data and to copyright, on behalf of the United States, standard reference data prepared or made available under the Standard Reference Data Act. Evaluated chemical data are important in diverse areas, including research and development, process and product design, energy efficiency, chemical analysis and identification, custody transfer, and safety, health, and the environment.
From page 56...
... Chemical Abstracts Service registry numbers, experimental conditions, and uncertainties; putting data and auxiliary information in a common electronic format; evaluating the data; developing models to represent the data within their uncertainties; packaging the data in electronic form with appropriate tools for accessing, displaying, and using the data; distributing the data; and providing technical support. Many of these activities are ongoing and highly labor intensive.
From page 57...
... The major cost drivers are evaluation and selection and acquisition of relevant papers from the literature and extracting the data. NIST Chemistry WebBook In the WebBook, NIST primarily makes already-existing data collections available over the Internet. Cost drivers of this database include packaging already-existing data in an electronic database with appropriate tools for accessing, displaying, and using the data; and converting existing data and auxiliary information to a common electronic format.
From page 58...
... NIST Chemistry WebBook The NIST Chemistry WebBook is NIST's first large-scale effort to make its major collections of thermochemical, thermophysical, and spectral reference data for industrially important chemicals available over the Internet. In two years the WebBook has become by far the most comprehensive source of chemical reference data available on the Web, with data for almost 32,000 chemical species.
From page 59...
... NIST distributes its data products using a variety of methods/formats determined primarily by customer needs. The Standard Reference Data Program is the central point of contact for all electronic databases available from NIST.
From page 60...
... NIST will continue to distribute its data products in a variety of forms driven by customer needs. However, we can expect that the Internet will continue to grow rapidly as a method/format for distributing chemical and chemical engineering data and for communicating and exchanging data with users and with other data activities around the world.
From page 61...
... NIST electronic databases are available for sale to any interested party through the NIST Standard Reference Data Program and from secondary distributors who have entered into licensing agreements with NIST. All NIST databases include the following copyright statement: "©
From page 62...
... For example, NIST would have one underlying basis if it tried to recover the costs of collection, compilation, evaluation, publication, and dissemination of standard reference data to the extent practicable and appropriate for each data product, and quite another if it tried to recover no costs at all. In the former case, NIST could make some data products available for free (or for a nominal fee)
From page 63...
... He is the director of information industry relations for the Chemical Abstracts Service, which is part of the American Chemical Society and a service that I think is one of the original and large bibliographic databases. Not-for-Profit Data Activity James Lohr, Chemical Abstracts Service Response to Committee Questions 1a.
From page 64...
... 3. What are the main cost drivers of your operations?
From page 65...
... 4c. Are you the only source of all or some of your data products?
From page 66...
... The two main CAS databases, Chemical Abstracts File and Registry, enjoy copyright protection. Also, as noted above, electronic access to the data in these files is frequently governed by agreements that further protect CAS's interests.
From page 67...
... 8d. What specific legal or policy changes would you like to see implemented to help address the problems identified above?
From page 68...
... Commercial Data Activity Leslie Singer, Institute for Scientific Information Response to Committee Questions Provide a description of your organization and related database activities. The Institute for Scientific Information (ISI)
From page 69...
... 1b. What are the main incentives for your database activities (both economic and other)
From page 70...
... database with new source materials involves three key steps of cataloguing the new journal issues or books, capturing the bibliographic and other ISI data from the source materials, and verifying the integrity of the data and database after the source data has been captured. Publication Processing Cataloging Many functions parallel those in libraries.
From page 71...
... The main cost drivers are volume of materials and the labor required to support the translations, data capture, database support, quality assurance, data extraction and dissemination, and search and retrieval software support.
From page 72...
... Arts and Humanities Data Capture Cited references in the humanities are notorious for being incomplete.
From page 73...
... Index®, Social Sciences Citation Index®, Web of Science℠) and specialty citation indexes (Biochemistry & Biophysics Citation Index™, Biotechnology Citation Index™, Chemistry Citation Index™, CompuMath Citation Index®, Materials Science Citation Index®, and Neuroscience Citation Index™); · Current awareness products such as Current Book Contents®; Current Contents including Current Contents Connect™, and Current Contents editions (Agriculture, Biology & Environmental Sciences; Arts & Humanities; Clinical Medicine; Engineering, Computing & Technology; Life Sciences; Physical, Chemical & Earth Sciences; Social & Behavioral Sciences)
From page 74...
... If not, please describe the competition you have for your data products and services. ISI or its authorized agents are the sole sources of its proprietary search and retrieval software combined with a unique scholarly multidisciplinary database.
From page 75...
... 7a. What are the principal sources of funding for your database activities?
From page 76...
... 8b. What problems have you had with legal protection of your own database activities and what are some examples of harm to you or misuse of your data that you have experienced, if any?
From page 77...
... 9. Do you believe the main problems/barriers/issues you have described above are representative of other similar data activities in your discipline or sector?
From page 78...
... One of our competitors, and maybe it is not a complete overlap, is certainly the National Center for Biotechnology Information (NCBI)
From page 79...
... Principal sources are the observational networks of the National Weather Service, the international World Meteorological Organization (WMO) Global Telecommunications Network exchanges through the World Data Center system, NASA, bilateral agreements with other countries, and special collections gathered in conjunction with global climate change projects.
From page 80...
... Manuscript forms, charts, and paper tapes must be processed and entered prior to quality control. Tapes received routinely from National Weather Service or other entities must be checked for format and completeness before going to the technicians for conversion and quality control.
From page 81...
... If not, please describe the competition you have for your data products and services. NCDC is the only source for most of the products described earlier.
From page 82...
... The customer profile is basically a judgment call on the part of the customer service representative who takes the order. For example, if a law firm requests data to be used in litigation against a business or insurance firm, the customer would be listed under "Legal." The most recent 12-month period shows these categories, most of which have not changed significantly over the past several years.
From page 83...
... 7a. What are the principal sources of funding for your database activities?
From page 84...
... Increased payments to the General Services Administration for rent and utilities are seldom covered fully. Communications costs to access new observing systems of the National Weather Service are only partially covered by increased funding.
From page 85...
... 9. Do you believe the main problems/barriers/issues you have described above are representative of other similar data activities in your discipline or sector?
From page 86...
... Our principal sources are the National Weather Service (NWS) and the National Environmental Satellite, Data, and Information Service; some of the NWS data originate from foreign weather services.
From page 87...
... · We provide "decoder" routines (i.e., format translation codes) that match our data streams and create files (from the IDD)
From page 88...
... The main issues pertaining to software development are the complexities of multiplatform use, keeping pace with data stream changes, exploiting technology advances, and making the software easy to use while offering comprehensive functionality. Unidata disseminates but does not "develop" its quasi-real-time data products.
From page 89...
... We are not the sole source for any data products, but for universities who seek data in quasi-real-time, Unidata is by far the dominant source.
From page 90...
... We have not sought legal protections for our database activities, and we do not think Unidata products have been misused with respect to our rights or those of our data providers. Our view notwithstanding, complaints have been raised to the NWS and the U.S.
From page 91...
... 9. Do you believe the main problems/barriers/issues you have described above are representative of other similar data activities in your discipline or sector?
From page 92...
... The biggest problem that we have has to do with redistribution constraints, preventing our universities from exercising the full range of educational opportunities, which have included, to a very successful extent I believe, the provision of information in the K-12 context.
From page 93...
... The radar data are actually provided or collected or acquired through the National Weather Service radar. The National Weather Service determined that it did not have the resources to broadly distribute those data to the community, even its own weather forecasting offices in the network.
From page 94...
... National Weather Service (Family of Services)
From page 95...
... Raw data available hours before NWS DIFAX charts. DATAsuite DATAsuite incorporates all of WSI's data and value-added products into one offering with the added advantage of including all future data products still in development during the life of a customer's contract.
From page 96...
... Service also includes a full range of specialties, such as consulting, design, animation, programming, and forecasting services. Emerge Agricultural Information Products Emerge is a comprehensive precision agricultural information service that provides real-time site-specific data to subscribers.
From page 97...
... If not, please describe the competition you have for your data products and services. WSI is the largest of the providers of real-time weather information.
From page 98...
... that there is a sufficient market demand. Until recently the commercial terms from many national weather services were far too expensive for us to obtain data from them on a profitable basis.
From page 99...
... 8d. What specific legal or policy changes would you like to see implemented to help address the problems addressed above?
From page 100...
... The problems that we face are representative of those faced by similar data activities elsewhere. The strict time-limit requirements of much of our business is a limitation to some of the unauthorized copying and redistribution issues that other types of information businesses may face.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.