B
Challenge Problems in Bioinformatics and Computational Biology from Other Reports
B.1 GRAND CHALLENGES IN COMPUTATIONAL BIOLOGY (David Searls)1
-
Protein structure prediction
-
Homology searches
-
Multiple alignment and phylogeny construction
-
Genomic sequence analysis and gene-finding
B.2 OPPORTUNITIES IN MOLECULAR BIOMEDICINE IN THE ERA OF TERAFLOP COMPUTING (Klaus Schulten et al.)2
-
Study protein-protein and protein-nucleic acid recognition and assembly
-
Investigate integral functional units (dynamic form and function of large macromolecular and supramolecular complexes)
-
Bridge the gap between computationally feasible and functionally relevant time scales
-
Improve multiresolution structure prediction
-
Combine classical molecular dynamics simulations with quantum chemical forces
-
Sample larger sets of dynamical events and chemical species
-
Realize interactive modeling
-
Foster the development of biomolecular modeling and bioinformatics
-
Train computational biologists in teraflop technologies, numerical algorithms, and physical concepts
-
Bring experimental and computational groups in molecular biomedicine closer together.
1 |
D. Searls, “Grand Challenges in Computational Biology,” Computational Methods in Molecular Biology, S. Salzberg, D. Searls, and S. Kasif, eds., Elsevier Science, 1998. |
2 |
K. Schulten, G. Budescu, F. Molnar, Opportunities in Molecular Biomedicine in the Era of Teraflop Computing, NIH Resource for Macromolecular Modeling and Bioinformatics, March 3-4, 1999, Rockville, MD; see http://whitepapers.zdnet.co.uk/0,39025945,60014729p-39000617q,00.htm. |
B.3 WORKSHOP ON MODELING OF BIOLOGICAL SYSTEMS (Peter Kollman and Simon Levin)3
Challenging Issues That Span All Areas of Modeling Systems
-
Integrating data and developing models of complex systems across multiple spatial and temporal scales
-
Scale relations and coupling
-
Temporal complexity and coding
-
Parameter estimation and treatment of uncertainty
-
Statistical analysis and data mining
-
Simulation modeling and prediction
-
-
Structure-function relationships
-
Large and small nucleic acids
-
Proteins
-
Membrane systems
-
General macromolecular assemblies
-
Cellular, tissue, organismal systems
-
Ecological and evolutionary systems
-
-
Image analysis and visualization
-
Image interpretation and data fusion
-
Inverse problems
-
Two-, three- and higher-dimensional visualization and virtual reality
-
-
Basic mathematical issues
-
Formalisms for spatial and temporal encoding
-
Complex geometry
-
Relationships between network architecture and dynamics
-
Combinatorial complexity
-
Theory for systems that combine stochastic and nonlinear effects, often in partially distributed systems
-
-
Data management
-
Data modeling and data structure design
-
Query algorithms, especially across heterogeneous data types
-
Data server communication, especially peer-to-peer replication
-
Distributed memory management and process management
B.4 WORKSHOP ON NEXT-GENERATION BIOLOGY: THE ROLE OF NEXT-GENERATION COMPUTING (Shankar Subramaniam and John Wooley)4
Exemplar Challenges for Bioinformatics and Computational Biology
-
Full genome-genome comparisons
-
Rapid assessment of polymorphic genetic variations
3 |
“Modeling of Biological Systems,” P. Kollman and S. Levin (chairs), a workshop at the National Science Foundation, March 14 and 15, 1996, available at http://www.resnet.wm.edu/~jxshix/math490/Modeling%20of%20Biological%20Systems.htm. |
4 |
S. Subramaniam and J. Wooley, DOE-NSF-NIH 1998 Workshop on Next-Generation Biology: The Role of Next Generation Computing, available at http://cbcg.lbl.gov/ssi-csb/nextGenBioWS.html. |
-
Complete construction of orthologous and paralogous groups of genes
-
Structure determination of large macromolecular assemblies/complexes
-
Dynamical simulation of realistic oligomeric systems
-
Rapid structural/topological clustering of proteins
-
Prediction of unknown molecular structures; protein folding
-
Computer simulation of membrane structure and dynamic function
-
Simulation of genetic networks and the sensitivity of these pathways to component stoichiometry and kinetics
-
Integration of observations across scales of vastly different dimensions and organization to yield realistic environmental models for basic biology and societal needs
B.5 TECHNOLOGIES FOR BIOLOGICAL COMPUTER-AIDED DESIGN (Masaru Tomita)5
-
Enzyme engineering: to refine enzymes and to analyze kinetic parameters in vitro
-
Metabolic engineering: to analyze flux rates in vivo
-
Analytical chemistry: to determine and analyze the quantity of metabolites efficiently
-
Genetic engineering: to cut and paste genes on demand, for modifying metabolic pathways
-
Simulation science: to efficiently and accurately simulate a large number of reactions
-
Knowledge engineering: to construct, edit and maintain large metabolic knowledge bases
-
Mathematical engineering: to estimate and tune unknown parameters
B.6 TOP BIOINFORMATICS CHALLENGES (Chris Burge et al.)6
-
Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome
-
Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript
-
Precise, quantitative models of signal transduction pathways: ability to predict cellular response to external stimuli
-
Determining effective protein-DNA, protein-RNA and protein-protein recognition codes
-
Accurate ab initio structure prediction
-
Rational design of small molecule inhibitors of proteins
-
Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve
-
Mechanistic understanding of speciation: molecular details of how speciation occurs
-
Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein
-
(Infrastructure and education challenge)
-
Education: development of appropriate bioinformatics curricula for secondary, undergraduate, and graduate education
B.7 EMERGING FIELDS IN BIOINFORMATICS (Patricia Babbitt)7
-
Data storage and retrieval, database structures, annotation
-
Analysis of genomic/proteomic/other high-throughput information
5 |
M. Tomita, “Towards Computer Aided Design (CAD) of Useful Microorganisms,” Bioinformatics 17(12):1091-1092, 2001. |
6 |
C. Burge, “Bioinformaticists Will Be Busy Bees,” Genome Technology, No. 17, January, 2002. Available (by free subscription) at http://www.genome-technology.com/articles/view-article.asp?Article=20021023161457. |
7 |
P. Babbitt et al., “A Very Very Very Short Introduction to Protein Bioinformatics,” August 22-23, 2002, University of California, San Francisco, available at http://baygenomics.ucsf.edu/education/workshop1/lectures/w1.print2.pdf. |
-
Evolutionary model building and phylogenetic analysis
-
Architecture and content of genomes
-
Complex systems analysis/genetic circuits
-
Information content in DNA, RNA, protein sequences and structure
-
Metabolic computing
-
Data mining using machine learning tools, neural nets, artificial intelligence
-
Nucleic acid and protein sequence analyses
B.8 TEN GRAND CHALLENGES (Sylvia Spengler)8
-
The origin, structure, and fate of the universe
-
The fundamental structure of matter
-
Earth’s physical systems
-
The diversity of life on Earth
-
The tree of life
-
The language of life
-
The web of life
-
Human ecology
-
The brain and artificial thinking machines
-
Integrating Earth and human systems
-
A knowledge server for planetary management
Research Across Domains: Data
-
Information management—human evolution continued
-
Exponential increase in data and information across domains
-
Access to information across domains—as or more important than the information itself
-
Integration of data across knowledge domains
-
Apply analytical tools across knowledge domains
-
Modeling of complex systems
-
Simulation of phenomena—descriptive science becomes predictive science
Research Across Domains: People
-
Share data across disciplines
-
Build and use analytical and modeling tools across disciplines
-
Work in collaborative, cross-domain groups
Research Across Domains: Time
-
Real-time data access, integration, and analysis
-
Real-time modeling and effects prediction
-
Real-time dissemination of research results
-
Real-time testing by research community
-
Real-time policy discussions
-
Real-time policy decisions
B.9 GRAND CHALLENGES IN BIOMEDICAL COMPUTING (John A. Board, Jr.)9
Biomedical Applications from Coupling Imaging and Modeling
-
Real-time noninvasive three-dimensional imaging of many body systems
-
Real-time generation of three-dimensional patient-specific models
-
Multiple-technology (multimodal) imaging and modeling
-
Whole-organ modeling
-
Multiple-organ system modeling
-
Patient-specific modeling of organ anomalies
-
Model support for (partial) restoration of hearing, coarse vision, and locomotion (via both paralyzed and artificial limbs)
All of these applications make use of:
-
Three-dimensional models
-
Increasingly refined grids and increasing levels of tissue discrimination
-
Anatomically realistic models
-
Special-purpose hardware for visualization
-
Distributed computing techniques.
B.10 ACCELERATING MATHEMATICAL-BIOLOGICAL LINKAGES: REPORT OF A JOINT NSF-NIH WORKSHOP (Margaret Palmer et al.)10
List of Top Ten Problems at the Mathematical Biology Interface
-
Model multilevel systems: from the cells in people, to human communities in physical, chemical, and biotic ecologies.
-
Model networks of complex metabolic pathways, cell signaling, and species interactions.
-
Integrate probabilistic theories: understand uncertainty and risk.
-
Understand computation: gaining insight and proving theorems from numerical computation and agent-based models.
-
Provide tools for data mining and inference.
-
Address linguistic and graph theoretical approaches.
-
Model brain function.
-
Build computational tools for problems with multiple temporal and spatial scales.
-
Provide ecological forecasts.
-
Understand effects of erroneous data on biological understanding.
B.11 GRAND CHALLENGES OF MULTIMODAL BIOMEDICAL SYSTEMS (J. Chen et al.)11
Science Challenges
-
Allow early detection of where and when an infectious disease outbreak occurs, whether it is naturally occurring or man-made, in real time.
9 |
J.A. Board, Jr., “Grand Challenges in Biomedical Computing,” High-Performance Computing in Biomedical Research, T.C. Pilkington, B. Loftis, J.F. Thompson, S.L.Y. Woo, T.C. Palmer, and T.F. Budinger, eds., CRC Press, Boca Raton, FL, 1993. |
10 |
M. Palmer et al., “Accelerating Mathematical-Biological Linkages: Report of a Joint NSF-NIH Workshop,” February 2003, available at www.maa.org/mtc/NIH-feb03-report.pdf. |
11 |
J. Chen et al., “Grand Challenges of Multimodal Bio-Medical Systems,” IEEE Circuits and Systems Magazine, pp. 46-52, 2nd Quarter 2005, available at http://gsp.tamu.edu/Publications/PDFpapers/pap_CASmag_MBM.pdf. |
-
Develop multidimensional drug profiling databases to facilitate drug discovery and to identify biomarkers for diagnosis and monitoring the progress of individual disease treatments.
-
Connect activities and events derived from cellular processes to high-level cognitions.
-
Support personalized medical care and clinical decisions for patients
Technology Challenges and Enabling Technologies
-
Formalization of biological knowledge into predictive models for systems biology and system-based analysis
-
Interdisciplinary training
-
Development of open source, multiscale modality informatics toolkits
B.12 THE DEPARTMENT OF ENERGY’S GENOMES TO LIFE PROGRAM12
21st Century Biology Requiring “Biocomp” Tools
-
Population models, symbiosis, and stability
-
Discrete growth models
-
Reaction kinetics
-
Biological oscillators and switches
-
Coupled oscillators
-
Reaction-diffusion, chemotaxis, and nonlocality
-
Oscillator-generated wave phenomena and patterns
-
Spatial pattern formation with population interactions
-
Mechanical models for generating pattern and form in development
-
Evolution and morphogenesis
A Mathematica for Molecular, Cellular, and Systems Biology
-
Core data models and structures [database management]
-
Optimized functions [core libraries]
-
Scripting environment [e.g., Python, Perl, Ruby]
-
Database accessors and built-in schemas
-
Simulation interfaces
-
Parallel and accelerated kernels
-
Visualization interfaces (for information visualization and scientific visualization)
-
Collaborative workflow and group use interfaces
Hierarchical Biological Modeling Environment
-
Genetic sequences
-
Molecular machines
-
Molecular complexes and modules
-
Networks + pathways [metabolic, signaling, regulation]
-
Structural components [ultrastructures]
-
Cell structure and morphology
-
Extracellular environment
-
Populations and consortia
12 |
R. Stevens, “GTL Software Infrastructure: A Computer Science Perspective,” undated presentation, Argonne National Laboratory, available at www.doegenomics.org/compbio/mtg_1_22_02/RickStevens.ppt. |
Modeling and Simulation Challenges for 21st Century Biology
-
Modeling activity of single genes
-
Probabilistic models of prokaryotic genes and regulation
-
Logical models of regulatory control in eukaryotic systems
-
Gene regulation networks and genetic network inference in computational models and applications to large-scale gene expression data
-
Atomistic-level simulation of biomolecules
-
Diffusion phenomena in cytoplasm and extracellular environment
-
Kinetic models of excitable membranes and synaptic interactions
-
Stochastic simulation of cell signaling pathways
-
Complex dynamics of cell cycle regulation
-
Model simplification
B.13 HIGH-PERFORMANCE COMPUTING, COMMUNICATION, AND INFORMATION TECHNOLOGY GRAND CHALLENGES (LATE 1980s, EARLY 1990s)13
Computing Applications to Map and Sequence Human Genome
-
Understanding protein folding
-
Predicting structure of native protein
-
Exhaustive discovery and analysis of cancer genes
-
Molecular recognition and dynamics
-
Drug discovery