2 The Genotype–Phenotype Challenge
Pages 5-16

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 5...
... This has been a truly remarkable achievement by the human genetics research community, she said, but at the same time that achievement points to one of the biggest challenges facing that community today: How does one take these 100,000 variants scattered around the genome and understand what they do -- first at the level of cells and then at the level of tissues, then organs, and then, finally, across entire human beings? Regev provided a big-picture view of the challenges facing biological researchers attempting to understand the genotype–phenotype connection -- that is, how the information coded in the genome leads to the physical characteristics of an organism.
From page 6...
... "So it doesn't really matter in which direction you look," she said. "In theory, for each of these problems -- and many other problems -- the space of possibilities is enormous." Regev noted that the main issue is that researchers do not know, up front, which connections matter and which do not.
From page 7...
... This would not have worked in the past when it was not even possible to measure gene expression levels in single cells. In the past few years, breakthroughs in single-cell genomics have made it possible to measure expression, chromatin, and other molecular profiles in large numbers of individual cells.
From page 8...
... Second, they help in studying the function of genes, that is, their phenotypic mapping. Finally, simply knowing that genes form structured programs helps with such problems as genetic interactions, which otherwise might appear intractable.
From page 9...
... When Regev's group investigated innate lymphoid cells (ILCs) with single-cell RNA sequencing, they found that the ILCs were not discrete cell types but rather spanned a range of continuous cell states. This is difficult to capture when assuming that cells are the basic unit instead of considering the gene programs (Bielecki et al., 2018).
From page 10...
... In essence they were looking for gene programs to lead them to clues about gene function. The result was a collection of gene modules made up of a set of genes whose expression co-varied in specific cell types, and, as it turned out, most of the modules they identified consisted of multiple genes identified by GWAS -- that is, most of the genes that co-varied with a GWAS gene in a particular cell type were themselves GWAS genes.
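The idea of a module as a set of genes whose expression co-varies with a seed gene within one cell type can be illustrated with a small sketch. This is not the method used in the study described above; it is a toy example that assumes a simple Pearson-correlation cutoff, simulated data, and hypothetical gene names (`seed_gwas_gene`, `gene_a`, and so on) and a hypothetical helper `covarying_genes`.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def covarying_genes(expr, seed_gene, threshold=0.5):
    """Return genes whose expression correlates with the seed gene.

    expr maps gene name -> list of expression values, one per cell,
    all taken from a single cell type. The threshold is an arbitrary
    illustrative choice.
    """
    seed = expr[seed_gene]
    return sorted(
        gene
        for gene, values in expr.items()
        if gene != seed_gene and pearson(seed, values) >= threshold
    )


# Simulated data (assumption): one hidden program drives three genes,
# while two other genes vary independently of it.
rng = random.Random(0)
n_cells = 500
program = [rng.gauss(0, 1) for _ in range(n_cells)]

expr = {
    "seed_gwas_gene": [p + rng.gauss(0, 0.3) for p in program],
    "gene_a": [p + rng.gauss(0, 0.3) for p in program],
    "gene_b": [p + rng.gauss(0, 0.3) for p in program],
    "gene_c": [rng.gauss(0, 1) for _ in range(n_cells)],
    "gene_d": [rng.gauss(0, 1) for _ in range(n_cells)],
}

module = covarying_genes(expr, "seed_gwas_gene")
print(module)
```

In this toy setting the genes driven by the shared program are grouped with the seed gene, while the independent genes are left out. Real single-cell data would call for normalization and a more careful statistical treatment than a fixed cutoff.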
From page 11...
... "It's really hard to decide even which screen you should devise for them," she said. "So we devised this screen you do when you don't know anything." Testing 35 genes known to be implicated in autism spectrum disorder against five major cell types, they first examined the effects of individual genes on individual cell types and found very little.
From page 12...
... Using Structured Programs to Understand Genetic Interactions
The third way that genetic programs can be of assistance is that simply knowing that genes form structured programs helps with such problems as understanding genetic interactions. For example, Regev said, "we can use this knowledge that there are expression programs not just to change our analysis of data we already measured, but also to change how we do measurements in the first place." This was, for instance, why she knew that she could get useful results from sparse and noisy data from a large number of cells -- because there was an underlying structure.
From page 13...
... This should work, Regev said, because "transcription factors bind in short degenerate sequences, and most transcription factor binding sites should exist in random DNA." According to a calculation she and her colleagues made in 2009, one transcription factor motif should appear in every 1,500 base pairs, so if one analyzed a library of 10 million 80-base-pair sequences, there should be more than 500,000 such motifs. With this in mind, Carl de Boer, who was working in Regev's laboratory at the time, devised a simple assay: "You would measure the extremely noisy expression level of hundreds of millions of sequence examples (de Boer et al., 2020)
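The motif estimate quoted above can be checked with a short back-of-the-envelope calculation. The sketch below uses only the figures given in the excerpt (one motif per 1,500 base pairs, 10 million sequences of 80 base pairs each); the function name `expected_motif_count` is hypothetical, and the calculation makes the simplifying assumption that edge effects at the ends of each short sequence can be ignored.

```python
# Figures taken from the excerpt above.
BP_PER_MOTIF = 1_500        # one motif expected per 1,500 base pairs
N_SEQUENCES = 10_000_000    # size of the random-sequence library
SEQ_LENGTH_BP = 80          # length of each sequence in base pairs


def expected_motif_count(n_sequences, seq_length_bp, bp_per_motif):
    """Expected number of motif occurrences in a random-sequence library.

    Simplification (assumption): total base pairs divided by the average
    spacing between motifs, ignoring the ends of each short sequence.
    """
    total_bp = n_sequences * seq_length_bp
    return total_bp / bp_per_motif


expected = expected_motif_count(N_SEQUENCES, SEQ_LENGTH_BP, BP_PER_MOTIF)
print(f"{expected:,.0f}")  # prints 533,333
```

The library holds 800 million base pairs in total, so the estimate comes to roughly 533,000 occurrences, consistent with the "more than 500,000" figure in the text.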
From page 14...
... At a small scale, genetic interactions can be studied by profiling-based methods such as Perturb-seq. Regev described one such study involving two genes for transcription factors, NF-κB1 and Rela, that jointly control a program, with Rela activating the program and NF-κB1 suppressing it.
From page 15...
... Using the GWAS genes from the ulcerative colitis work as seeds, they are building modules of two types: cell-type-specific modules that vary across all of the cell types, and program modules in which genes co-vary within a cell type. Then they will examine genetic interactions either where the genes in the same module interact or where the interactions are between genes in different modules.
From page 16...
... These include understanding epigenetics and gene regulation, learning about how environmental interactions and perturbations affect gene expression, and the use of research organisms as models for laboratory work. Her talk also covered some conceptual ideas that other speakers brought up in their talks, including the non-linearity of biological interactions and gene expression, the complexity of understanding genetic function, and the inherent structure present in biological systems.

