3 A Proposal for Consideration: Sequence-Based Classification of Select Agents
Pages 73-106


From page 73...
... Second, the committee finds that even if it were possible to assign Select Agent status solely on the basis of genome-encoded biological properties, the answer would remain no. Chapter 2 described why accurate prediction of an organism's pathogenicity from its genome sequence is not possible now, and will not be feasible in the foreseeable future -- certainly not at the level of accuracy appropriate for statutory regulations.
From page 74...
... The rapidly expanding capabilities of automated gene synthesis and of synthetic genomics to synthesize and "boot" complete Select Agent genomes mean that the Select Agent Regulations do need to be defined in terms of genome sequence analysis, not by the phenotypic properties of an encoded agent. A Select Agent genome is covered by the Select Agent Regulations whether or not it is ever "booted" into a living agent whose phenotype can be assayed.
From page 75...
... A sequence-based classification system would still be based on a discrete list of Select Agents, but could be used to create a pragmatic "brighter line" for deciding whether a new genome sequence should be regarded as one of the existing Select Agents or not.
NOVEL AGENTS: SYNTHETIC GENOMICS AND THE SELECT AGENT REGULATIONS
We need to examine what we mean by a "novel" synthetic agent.
From page 76...
... nor the IL-4 gene by themselves are Select Agents.
From page 77...
... As the scope of DNA synthesis increases and the technology becomes commoditized, an increasing number of Select Agents can be reconstituted with a modest level of skill in molecular biology.
From page 78...
... For some Select Agents, there are no surrogate experimental hosts for characterizing virulence, and the only suitable host for a human pathogen may be a human. Those considerations raise the research and development bar substantially, and expose such a program to existing legal prohibitions other than the Select Agent Regulations.
From page 79...
... We have distinguished three kinds of novel synthetic organisms because we believe that there is a tendency to imagine nightmare scenarios in which a de novo unnamed pathogen, dissimilar to any known pathogen and thus unrecognizable by any sequence comparison protocol, is created deliberately or accidentally with synthetic genomics. Clearly, a regulatory system like the Select Agent Regulations based on a list of known agents and their genome sequences is not effective for regulating entirely de novo agents.
From page 80...
... Modified Select Agents, made facile by the commoditization of synthetic genomics, constitute the most important and pressing practical issue related to the Select Agent Regulations. The taxonomic nomenclature of microorganisms is designed for wild isolates of actual organisms that have observable growth phenotypes, not for non-natural modified sequences that exist only as genomic DNA.
From page 81...
... By addressing this issue as an example, we would also deal with a number of other scenarios in which synthetic genomics might be used to create modified Select Agents. And we will also be able to deal with some of the most obvious and likely ways that chimeric agents might be assembled with synthetic biology.
From page 82...
... An important principle of automated classification (also known as "supervised learning" in statistical inference) is that given the known sequences of things that we want to label as Select Agents and things that we do not want to label as Select Agents, there is always a classification scheme that can achieve the desired labeling of known sequences with 100 percent accuracy.
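One way to see why perfect labeling of known sequences is always attainable is a nearest-neighbor rule: every known sequence is its own closest match, so it recovers its own label. The sketch below is a minimal illustration under stated assumptions, not the committee's method. The sequences are made-up toy strings, the labels "listed" and "not_listed" are placeholders, and Hamming distance on pre-aligned strings stands in for a real distance measure.

```python
# Toy illustration: a 1-nearest-neighbor rule reproduces the labels of the
# sequences it was built from. All sequences and labels are made up.

def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length aligned strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_label(query: str, known: dict[str, str]) -> str:
    """Return the label of the known sequence closest to the query."""
    best = min(known, key=lambda seq: hamming(query, seq))
    return known[best]


# Hypothetical labeled examples (assumption: distinct sequences, one label each).
KNOWN = {
    "ACGTACGTAC": "listed",
    "ACGTACGTAT": "listed",
    "ACGAACCTAC": "not_listed",
    "TCGAACCTAC": "not_listed",
}

# Each known sequence is at distance 0 from itself, so it gets its own label.
correct = sum(nearest_label(seq, KNOWN) == label for seq, label in KNOWN.items())
training_accuracy = correct / len(KNOWN)
print(training_accuracy)
```

Perfect accuracy on the known set says nothing by itself about how a new, modified sequence will be labeled; that depends on how well the known examples cover the space around each agent.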
From page 83...
... IMPORTANT -- acquiring sequence data and biological information is needed to define the space around Select Agents and close the gap around novel sequences.
From page 84...
... Our main concern in using a broadened definition of a Select Agent's sequence space is to balance the need to encompass the most likely modifications and chimeras with the need to avoid classifying useful non-Select Agent genomes (including those of vaccines and attenuated research strains) as Select Agents.
From page 85...
... SYNTHETIC GENOME CLASSIFICATION UNDER THE CURRENT SELECT AGENT REGULATIONS
As noted earlier, the current Select Agent Regulations cover not just culturable organisms, but also naked DNA, including synthetic genomes. The Select Agent Regulations language confers Select Agent status on "nucleic acids that can produce infectious forms of any of the Select Agent viruses" and "recombinant nucleic acids that encode for the functional form(s)
From page 86...
... The Select Agent Regulations language will need to broaden to cover nucleic acids of all Select Agents that can be booted from synthetic genomes. The stickier point from the standpoint of sequence analysis and sequence-based classification -- and where one enters a grey area of modified synthetic agents, with engineered differences from wild-type Select Agents -- is what exactly is meant by such terms as infectious forms or functional forms or genomic fragments.
From page 87...
... However, we should be able to use sequence-based classification to establish a reasonable operational definition of the sequence space that circumscribes complete agent genomes, as distinct from incomplete genomes or complete genomes of related non-Select Agents. There is an important distinction between identifying a suspicious "sequence of concern" that might be part of a Select Agent and determining that a genome sequence is "complete" ("infectious")
From page 88...
... The NSABB recommended repealing this problematic language "particularly because the misuse of variola virus is adequately covered by other criminal laws already in place" (NSABB 2006:12), including the Select Agent Regulations and the Biological Weapons Convention.
From page 89...
... Partial answers, given the current state of knowledge, suffice for an operational definition of a "complete" Select Agent genome. If an agent actually has 20 genes essential for viability and pathogenesis, and we know only about 10 of them, and we define any genome that contains those 10 genes as a "complete Select Agent genome," our definition is biologically incorrect, but it can suffice as an operational definition.8 That is, for an operational definition of a complete Select Agent genome we can define a parts list of genes that are thought to be necessary but not sufficient for a biologically functional Select Agent genome.
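The operational definition described here amounts to a containment test: a genome is called "complete" when every entry on the parts list is found in it, regardless of whether the list captures everything the organism actually needs. A minimal sketch follows, assuming that some upstream step has already mapped a genome to the set of parts it contains. The part names and the three outcome labels are hypothetical placeholders.

```python
# Hypothetical parts list; the names are placeholders, not real genes.
REQUIRED_PARTS = frozenset({"part_A", "part_B", "part_C"})


def classify_genome(parts_found: set[str],
                    required: frozenset[str] = REQUIRED_PARTS) -> str:
    """Apply the operational definition to the parts detected in a genome.

    'complete' means every listed part is present. The list is treated as
    necessary but not sufficient, so 'complete' is an operational call and
    not a claim that the genome is biologically functional.
    """
    present = required & parts_found
    if present == required:
        return "complete"
    if present:
        return "partial"
    return "none"


print(classify_genome({"part_A", "part_B", "part_C", "unrelated_gene"}))
print(classify_genome({"part_A"}))
```

Keeping the parts list as data rather than logic fits the report's point that any list reflects current knowledge and would need periodic revision.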
From page 90...
... Any given parts list would reflect only the current state of scientific knowledge about each Select Agent.10 It would need to be subject to review and revision to keep up with the state of knowledge distinguishing Select Agents from other organisms.
SEQUENCE ANALYSIS OF INDIVIDUAL "PARTS"
Given a parts list that addresses the content of a Select Agent genome, the other question is how to define sequences that are covered for each part -- the
9 That is, modifications that would not require work on the scale of an offensive bioweapons research program, which we deem to be beyond the scope of concern for counterbioterrorism in general and for the Select Agent Regulations in particular.
From page 91...
... Neither a BLAST score nor an E value is a "distance" suitable for sequence classification into families. A BLAST score measures how likely it is that two sequences are related at all, not how closely related they are, and the E value just measures the statistical significance of the score.
From page 92...
... Thus perhaps counterintuitively, simple percent identity (not percent "similarity," not BLAST score, and not E value) is a reasonable although rough measure of genetic distance.
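Percent identity is simple to compute once two sequences have been aligned. The sketch below is one possible convention, assumed here for illustration: columns in which either sequence has a gap are skipped, and identity is the share of the remaining columns that match. Other conventions for handling gaps exist and give different numbers. The function name and the toy alignment are not from the report.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns in which the residues match.

    Assumes the two strings come from the same pairwise alignment and so
    have equal length. Columns with a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 7 ungapped columns, 6 of which match.
print(round(percent_identity("ACGT-ACG", "ACGTTACC"), 1))
```

Unlike an E value, this number does not change with the size of the database being searched, which is part of why it behaves more like a distance.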
From page 93...
... METHODS FOR SEQUENCE SUBFAMILY CLASSIFICATION
Armed with background about such sequence comparison programs as BLAST, evolutionary distances, evolutionary trees, and functional sequence subgroups as distinct clades on trees, we return to the problem of screening DNA sequence orders for Select Agents. Screening for significant BLAST hits to a database of sequences of concern does not work, because the parts of Select Agents are homologous to parts of many non-Select Agent organisms.
From page 94...
... The position of the new sequence in a tree of known homologous sequences that represent different functional subfamilies (or Select Agent and non-Select Agent sequences) is examined.
From page 95...
... Profile-based sequence classification is highly automatable because the classification system relies on a relatively stable set of sequence alignments of representative sequences that define the desired families.
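The idea of building one profile per family from a curated alignment, then assigning a new sequence to the best-scoring profile, can be sketched with a simple position-specific scoring scheme. This is a deliberately simplified stand-in: production systems of this kind typically use profile hidden Markov models that handle insertions and deletions, which this sketch does not. The family names, the toy ungapped alignments, and the pseudocount of 1 are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_profile(alignment: list[str],
                  pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column log-probabilities estimated from an ungapped alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    total = len(alignment) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        profile.append(
            {ch: math.log((counts[ch] + pseudocount) / total) for ch in ALPHABET}
        )
    return profile


def score(query: str, profile: list[dict[str, float]]) -> float:
    """Log-probability of the query under one profile (same length required)."""
    if len(query) != len(profile):
        raise ValueError("query length must match profile length")
    # Characters outside ALPHABET raise KeyError; a real system would handle them.
    return sum(column[ch] for ch, column in zip(query, profile))


def classify(query: str, profiles: dict[str, list[dict[str, float]]]) -> str:
    """Assign the query to the family whose profile scores it highest."""
    return max(profiles, key=lambda name: score(query, profiles[name]))


# Hypothetical representative alignments for two related families.
FAMILY_ALIGNMENTS = {
    "family_of_concern": ["ACGTAC", "ACGTAT", "ACGTAC"],
    "related_benign": ["ACCAAC", "TCCAAC", "ACCAAC"],
}
PROFILES = {name: build_profile(aln) for name, aln in FAMILY_ALIGNMENTS.items()}

print(classify("ACGTAA", PROFILES))
print(classify("TCCAAC", PROFILES))
```

Because the profiles are built once from stable alignments, classifying a new sequence is only a scoring pass, which is what makes the approach easy to automate.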
From page 96...
... might be specially flagged to indicate that parts of a Select Agent are present even though a complete Select Agent genome is not, for the purposes of prudent follow-up on the part of a DNA synthesis company -- for instance, if an order might represent an attempt to obtain a Select Agent genome in several individually legal pieces. For each Select Agent, given a minimal parts list and a profile-based classification system for each part, the classification system would be tested, benchmarked, and challenged using known genome sequences.
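The follow-up scenario mentioned here, a genome obtained in several individually legal pieces, suggests keeping a running record of which listed parts have appeared across orders. The sketch below is one hypothetical way to do that. Tracking by customer identifier, the class name `OrderScreen`, the part names, and the four outcome labels are all assumptions introduced for illustration and are not described in the report.

```python
from collections import defaultdict

# Hypothetical parts list; the names are placeholders, not real genes.
REQUIRED_PARTS = frozenset({"part_A", "part_B", "part_C"})


class OrderScreen:
    """Track listed parts seen per customer across successive orders."""

    def __init__(self, required: frozenset[str]) -> None:
        self.required = frozenset(required)
        self.seen: dict[str, set[str]] = defaultdict(set)

    def submit(self, customer: str, parts_in_order: set[str]) -> str:
        """Record an order and return a screening outcome for it."""
        hits = self.required & parts_in_order
        if not hits:
            return "clear"
        self.seen[customer] |= hits
        if hits == self.required:
            # A single order contains every listed part.
            return "complete_genome"
        if self.seen[customer] == self.required:
            # No single order was complete, but the orders together are.
            return "complete_across_orders"
        return "yellow_flag"


screen = OrderScreen(REQUIRED_PARTS)
outcomes = [
    screen.submit("customer_1", {"part_A"}),
    screen.submit("customer_1", {"part_B", "unrelated_gene"}),
    screen.submit("customer_1", {"part_C"}),
]
print(outcomes)
```

A yellow flag in this sketch only marks an order for human follow-up; it makes no judgment about the legitimacy of the research behind it.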
From page 97...
... Genome sequences of almost all Select Agents are available, but there has been less emphasis on obtaining genome sequences for closely related non-Select Agents. Future studies are sure to discover numerous new microbial and viral species, and it is desirable that these new discoveries not be misclassified as Select Agents just because they are closely related to Select Agents.
From page 98...
... From the standpoint of dealing with the implications of synthetic biology and synthetic genomes, the utility of the classification system would not be to distinguish successful genome designs from unsuccessful ones -- "bootable" pathogens from inert DNA sequences -- but to distinguish attempts to synthesize dangerous genomes similar to a Select Agent from attempts to synthesize benign genomes from a non-Select Agent organism, a non-pathogenic strain, or a vaccine. The classification system does not distinguish legitimate research from illegitimate research; rather it identifies agents that are restricted under the Select Agent Regulations and provides a means of identifying "sequences of concern" that may be worth monitoring.
From page 99...
... Classifying the current 82 Select Agents would require 82 parts lists and on the order of several thousand different profiles for the parts, and each Select Agent classification would need to be carefully tested and maintained over time. That would be on the same scale as the curation effort involved in the current Pfam or TIGRfams databases for automated protein sequence annotation.
From page 100...
... Nevertheless, such a system would be an improvement over the current process. It would transparently, consistently,
14 "Transform the international dialogue on biological threats: Activities targeted to promote a robust and sustained discussion among all nations as to the evolving biological threat and identify mutually agreed steps to counter it."
15 2009 National Research Council report Responsible Research with Biological Select Agents and Toxins, "RECOMMENDATION 2: To provide continued engagement of stakeholders in oversight of the Select Agent Program, a Biological Select Agents and Toxins Advisory Committee (BSATAC)
From page 101...
... Predictive systems biology will slowly give us a better ability to assess whether a synthetic genome sequence might be more or less hazardous and whether the organism that it encodes might be more or less likely to have the phenotypic properties of Select Agents. Therefore, to the extent that prediction of biological properties of Select Agents does become possible, we believe that it will first be useful in the context of a yellow flag biosafety warning system -- warning investigators and their institutions and institutional biosafety committees (IBCs)
From page 102...
... The Select Agent Regulations cover complete genomes ("nucleic acids that can produce infectious forms of any of the Select Agent viruses" -- language that has been formally interpreted as excluding "genomic fragments from Select Agents")
From page 103...
... The concept of raising a yellow flag on synthetic genomic constructs clearly overlaps with the biosecurity goals of the Select Agent Program -- providing a sort of buffer zone that identifies individual Select Agent synthetic parts that do not rise to the precise inclusive definitions of the "complete, infectious" genomes that come under the Select Agent Regulations. Moreover, we view a yellow flag system more broadly from a biosafety perspective.
From page 104...
... Together, the profile-based classification for Select Agents and the yellow flag system for sequences of concern would address many of the emerging biosecurity concerns posed by synthetic biology. Gene synthesis companies, which have to make daily judgments based on the Select Agent Regulations and other regulations, strongly favor development of such a system.
From page 105...
... The intuitive concepts of sequence-based classification are sufficiently clear for anyone to know whether a sequence is in the vicinity of any reasonable definition of the line. However, a "reasonableness" approach does not solve the problem of vagueness that troubles the DNA synthesis companies, researchers, and law enforcement as they try to apply the Select Agent Regulations.

