Currently Skimming:

3. Sharing Data and Software
Pages 35-50

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 35...
... WHICH DATA SHOULD BE SHARED? In the context of a published finding, the information that should be shared and the manner in which it should be made available depend on how central it is to the principal conclusions of the paper and to the ability of others to validate or refute it.
From page 36...
... Background information would not be essential for reproducing, verifying, or building on the claims in the paper; it might be considered as background, for instance, because obvious alternative methods or sources of data could be substituted. A corollary to the uniform principle for sharing integral data and materials expeditiously (UPSIDE)
From page 37...
... Clinical databases might contain details that would permit linkages to identify research participants. The committee recognizes that databases arising from clinical studies or treatment trials must be made available in a manner that complies with applicable standards for protection of human subjects (Department of Health and Human Services, 2001)
From page 38...
... A paper that describes a new software package claimed to be useful for investigating specific types of life-science questions presents a slightly different situation. Here, the intended scientific reading audience for the paper is a wider user community, not other computational or mathematical biologists.
From page 39...
... In considering how to determine whether particular DNA sequence data are integral to a scientific paper, workshop participants discussed several hypothetical journal articles about the kangaroo genome. For a paper titled "The Complete Genome Sequence of the Kangaroo," the complete genome is the result of the paper; therefore, the entire genome sequence should be made available as though it were a figure or table in the paper.
From page 40...
... (BLAST is a public resource that allows researchers to scan all publicly available DNA-sequence data for specific sequence homologies.) The software itself is the principal result being announced and therefore considered integral to allowing others to duplicate the claims and should be made available as described above to support the central claim of the paper.
From page 41...
... make software available in a way that meets the principle of publication. Some authors explicitly place source code in the public domain with no restrictions, as they would materials with no commercial value.
From page 42...
... Consistency with the standards described herein for access to integral data and materials requires that such software implementations should be made available to the entire scientific community on the same terms. The principle is that publication involves equal responsibilities and benefits to not-for-profit and for-profit researchers alike.
From page 43...
... However, most database fees are likely to be far more expensive than individual journal subscriptions; and once a journal subscription is paid for, it does not seem reasonable for additional fees to be imposed to gain access to data reported in it.
From page 44...
... In genomics, the community standard is for researchers to deposit DNA sequences in one of the public electronic databases in the International Nucleotide Sequence Database Collaboration, which comprises GenBank, the European Molecular Biology Laboratory Nucleotide Sequence Database, and the DNA Data Bank of Japan (these are henceforth referred to collectively as genome databanks)
From page 45...
... The sequence data in public genome databanks are the starting point for an interconnected web of bioinformatics data resources that serve the larger research community. These resources include the National Center for Biotechnology Information BLAST server, a widely used resource that allows researchers to scan all known, publicly available DNA-sequence data for unexpected and informative homologies; public protein databases derived from the DNA-sequence data in GenBank, such as SWISS-PROT or Protein Information Resource; and public genome browsers, such as Ensembl, which adds useful annotations to eukaryotic genome-sequence data and facilitates their interpretation.
From page 48...
... When companies have published papers in which a database was a central part of the research finding (and were granted an exception to the requirement to place the data in a public data repository), access to the data required an investigator to agree to terms that not only prohibited the use of the data for commercial purposes, but also prohibited other specific uses of the data (see Box 3-2)
From page 49...
... it is not altogether clear that compromising the quid pro quo will be offset by a gain in published research results that could not be made available by other means. It may not be feasible to exert property rights in data that allow them to be published, verified by the scientific community, and provided in "dynamic" format without also facilitating commercial competitors.
From page 50...
... As a way to improve the process of sharing publication-related data, the committee makes the following recommendation: Recommendation 1. The scientific community should continue to be involved in crafting appropriate terms of any legislation that provides additional database protection.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.