
7 Bioinformatics and Data
Pages 143-156



From page 143...
... Specific fields that will benefit include understanding genetic diversity, epidemiology, vaccinology, global health, metabolic reconstruction, systems biology, and personalized medicine. As most of the workshop speakers noted, WGS has made possible giant steps forward for epidemiology and microbial forensics.
From page 144...
... He reviewed the challenges faced in assembling and curating databases, as well as in developing bioinformatics tools to exploit the data efficiently and effectively. Chun, a trained taxonomist, reminded the workshop participants that the traditional concept of species is "groups of actually or potentially interbreeding natural populations that are reproductively isolated from other such groups" (Mayr, 1942).
From page 145...
... Applying genome sequencing to species-level identification is limited because reference genomes often do not exist. Accurate identification based on genome data is possible with average nucleotide identity (ANI), but more type strains need to be sequenced.
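The ANI idea mentioned above can be illustrated with a toy sketch: in practice ANI is computed from BLAST or MUMmer alignments of genome fragments, but the core calculation is a mean percent identity over matched fragment pairs. The sequences and the 95% species cutoff used here are illustrative assumptions, not values from the workshop.

```python
def fragment_identity(a: str, b: str) -> float:
    """Percent identity between two equal-length aligned fragments."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def average_nucleotide_identity(pairs) -> float:
    """Toy ANI: mean identity over matched fragment pairs.

    Real pipelines first fragment one genome and align each piece
    against the other genome, keeping only significant hits.
    """
    return sum(fragment_identity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical aligned fragment pairs from two genomes.
pairs = [("ACGTACGT", "ACGTACGA"), ("GGGGCCCC", "GGGTCCCC")]
ani = average_nucleotide_identity(pairs)  # 87.5 here
# A commonly cited rule of thumb places the species boundary near 95-96% ANI.
same_species = ani >= 95.0
```

The point of the sketch is only that ANI reduces genome-scale comparison to a single, database-friendly number, which is why sequenced type strains are the limiting resource.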
From page 146...
... Databases Chun considers to be up to date and stably funded include the following: • NIH's National Center for Biotechnology Information (NCBI) maintains several databases: GenBank, RefSeq, and Microbial Genomes Resources.
From page 147...
... Pure-culture sample analysis involves assembly of a genome; taxonomic identification; identification of variation, such as SNPs and gene content; tracing of gene transfer; and matching against a database. Metagenomic sample analysis involves assembly of a metagenome, taxonomic analysis of community composition, and matching against a database.
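One step in the pure-culture workflow above, identifying variation such as SNPs, reduces to a simple comparison once the sample genome is aligned to a reference. The following is a deliberately naive sketch (real callers work from read alignments and model sequencing error); the sequences are made up for illustration.

```python
def call_snps(reference: str, sample: str):
    """Naive SNP identification between two aligned, equal-length sequences.

    Returns (position, reference_base, sample_base) for each mismatch.
    Real variant callers additionally weigh read depth and base quality.
    """
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]

# Hypothetical aligned reference and sample fragments.
snps = call_snps("ACGTAAC", "ACGTGAC")
# -> [(4, 'A', 'G')]: an A-to-G substitution at position 4
```

Matching the resulting SNP profile against a database is what enables the epidemiological tracing discussed throughout the chapter.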
From page 148...
... FIGURE 7-1 Evolutionary phylogenetic tree of V. [Figure residue; only the labels survive extraction. Legend: GIs coding for O antigens; known major pathogenicity islands; strain-specific GIs. Strains shown include B33, O139, MJ-1236, AM-19226, BX330286, MAK757, NCTC 8457, TMA21, MZO-3, and MO10.]
From page 149...
... A bioinformatics pipeline typically involves many software tools, parameters, and hidden "know-how." For this reason, it is difficult to reproduce most large-scale genomics papers (e.g., genome assembly, metagenome comparison).
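The reproducibility problem described above is largely a record-keeping problem: the exact tools, versions, and parameters must travel with the results. A minimal sketch, assuming hypothetical tool names and parameters (none taken from the workshop), is a run manifest that is serialized deterministically and fingerprinted:

```python
import hashlib
import json

def record_step(manifest: dict, tool: str, version: str, params: dict) -> None:
    """Append one pipeline step, with its exact parameters, to a run manifest."""
    manifest["steps"].append({"tool": tool, "version": version, "params": params})

manifest = {"steps": []}
# Hypothetical pipeline steps for illustration.
record_step(manifest, "assembler", "1.2.3", {"kmer": 31})
record_step(manifest, "aligner", "0.9.0", {"min_identity": 0.95})

# sort_keys makes the serialization deterministic, so the hash
# identifies this exact configuration of the pipeline.
blob = json.dumps(manifest, sort_keys=True)
fingerprint = hashlib.sha256(blob.encode()).hexdigest()
```

Publishing such a manifest (or its fingerprint) alongside results lets another group detect whether they are running the same pipeline before comparing outputs.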
From page 150...
... Cloud computing offers a way to share metadata, and to send out streams of data that can enable accurate identification, provide global epidemiology and treatment information, and inform genetic engineering. Overall, Chun believes that cloud computing technology and bioinformatics will find a way.
From page 151...
... If one investigator has a genome sequenced using Life Technologies' PGM sequencer and another has one sequenced using an Illumina MiSeq sequencer, the data can be easily compared, put in a database, and will be forward compatible with any technology that comes along. He agrees that a committee will never decide anything, but the genome is the genome, and once we have it, we can use it.
From page 152...
... the Global Microbial Identifier (a multiorganizational international effort)
From page 153...
... Those at Australian research institutions are issued free allocations for use; in instances when large computing resources and many computing hours are needed, users can make special requests. Other paradigms for using cloud computing have emerged, including the HTCondor project, which essentially leverages idle computing time; it is a cycle-scavenging system.
From page 154...
... ; for sequence indexing and generic sequence analysis algorithms, the SeqAn library; and for phylogenetic analysis, graphics processing unit–enabled versions of MrBayes and BEAST (Bayesian Evolutionary Analysis Sampling Trees), which use the BEAGLE library (beagle-lib).
From page 155...
... A recent trend toward educating this community can be seen in the series of workshops called Software Carpentry.7 These workshops offer training in basic computing skills, such as version control, as well as literate programming, so scientists can generate workflows that are reproducible on different systems and engineered according to standards generally accepted in software engineering. In addition, there are a number of efforts to generate point-and-click visual systems that will enable users to generate reproducible workflows.
From page 156...
... Darling thinks that there is probably a way to generate reproducible results and still obtain careful version control of analyses using cloud systems, but it should be something that the service provider builds into the system at the very basic level. This needs to be thought about and discussed with the providers who develop the necessary software.

