Predicting the proportion of essential genes in mouse duplicates based on biased mouse knockout genes.
ABSTRACT In yeast and nematode, the proportion of essential genes among duplicates is lower than among singletons (single-copy genes), owing to functional redundancy. One might expect the same pattern in the mouse genome. However, publicly available mouse knockout data show that the proportion of essential genes in duplicates is similar to that in singletons. The most straightforward interpretation, as claimed in a recent study, is that duplicate genes play a negligible role in mouse genetic robustness. Here we show that recently duplicated genes are highly underrepresented in the current mouse knockout dataset, leading to an overestimation of the proportion of essential genes in duplicates. After estimating the age of mouse duplication events, we developed a simple bias-correcting procedure and show that the bias-corrected proportion of essential genes in mouse duplicates is significantly lower than that in singletons.
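The abstract does not spell out the bias-correcting procedure, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes duplicates can be grouped into duplication-age bins and reweights the per-bin essential proportion from the knockout sample by each bin's share of all duplicates in the genome (post-stratification). The function names, bin labels, and toy counts are all hypothetical.

```python
from collections import Counter


def naive_essential_proportion(knockouts):
    """Uncorrected proportion: essential knockouts / all knockouts.

    knockouts: list of (age_bin, is_essential) pairs for duplicate genes
    that have knockout phenotypes.
    """
    return sum(1 for _, is_essential in knockouts if is_essential) / len(knockouts)


def age_weighted_essential_proportion(knockouts, genome_age_bins):
    """Post-stratified proportion of essential genes among duplicates.

    Assumption (not from the source): within an age bin, knocked-out
    duplicates are representative of all duplicates in that bin.

    genome_age_bins: list with one age-bin label per duplicate gene in
    the whole genome (knocked out or not).
    """
    tested = Counter()
    essential = Counter()
    for age_bin, is_essential in knockouts:
        tested[age_bin] += 1
        essential[age_bin] += int(is_essential)

    genome = Counter(genome_age_bins)
    # Only bins with at least one knockout can contribute an estimate;
    # weights are renormalized over those bins.
    usable = [b for b in genome if tested[b] > 0]
    total = sum(genome[b] for b in usable)
    if total == 0:
        raise ValueError("no age bin has both genome and knockout data")
    return sum(genome[b] / total * essential[b] / tested[b] for b in usable)


# Hypothetical toy data: old duplicates dominate the knockout sample,
# while old and recent duplicates are equally common in the genome.
toy_knockouts = (
    [("old", True)] * 4 + [("old", False)] * 4 + [("recent", False)] * 2
)
toy_genome = ["old"] * 50 + ["recent"] * 50

naive = naive_essential_proportion(toy_knockouts)
corrected = age_weighted_essential_proportion(toy_knockouts, toy_genome)
```

With these toy counts the uncorrected proportion is 0.4, while the age-weighted proportion is 0.25, which illustrates how underrepresentation of recent duplicates can inflate the naive estimate.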
- ABSTRACT: There are two proposed mechanisms for the emergence of gene network robustness: (1) ‘genetic buffering’ from redundant gene networks (i.e. alternative metabolic, regulatory or signal pathways), and (2) functional complementation from duplicate genes. Their relative significance is subject to debate, but recent studies in functional genomics provide some interesting insights. In particular, experiments in yeast using whole-genome libraries of single-gene-deletion mutants and worm whole-genome RNAi show that both mechanisms are important for genetic robustness. Yet, many questions remain.
Trends in Genetics 08/2003; 19(7):354-6. · 9.77 Impact Factor
Available from: xungulab.com
- ABSTRACT: The FASTA3 and FASTA2 packages provide a flexible set of sequence-comparison programs that are particularly valuable because of their accurate statistical estimates and high-quality alignments. Traditionally, sequence similarity searches have sought to ask one question: "Is my query sequence homologous to anything in the database?" Both FASTA and BLAST can provide reliable answers to this question with their statistical estimates; if the expectation value E is < 0.001-0.01 and you are not doing hundreds of searches a day, the answer is probably yes. In general, the most effective search strategies follow these rules:
1. Whenever possible, compare at the amino acid level, rather than the nucleotide level. Search first with protein sequences (blastp, fasta3, and ssearch3), then with translated DNA sequences (fastx, blastx), and only at the DNA level as a last resort (Table 5).
2. Search the smallest database that is likely to contain the sequence of interest (but it must contain many unrelated sequences for accurate statistical estimates).
3. Use sequence statistics, rather than percent identity or percent similarity, as your primary criterion for sequence homology.
4. Check that the statistics are likely to be accurate by looking for the highest-scoring unrelated sequence, using prss3 to confirm the expectation, and searching with shuffled copies of the query sequence (randseq; searches with shuffled sequences should have E ≈ 1.0).
5. Consider searches with different gap penalties and other scoring matrices. Searches with long query sequences against full-length sequence libraries will not change dramatically when BLOSUM62 is used instead of BLOSUM50 (20), or a gap penalty of -14/-2 is used in place of -12/-2. However, shallower or more stringent scoring matrices are more effective at uncovering relationships in partial sequences (3,18), and they can be used to sharpen dramatically the scope of the similarity search.
However, as illustrated in the last section, the E value is only the first step in characterizing a sequence relationship. Once one has confidence that the sequences are homologous, one should look at the sequence alignments and percent identities, particularly when searching with lower quality sequences. When sequence alignments are very short, the alignment should become more significant when a shallower scoring matrix is used, e.g., BLOSUM62 rather than BLOSUM50 (remember to change the gap penalties). Homology can be reliably inferred from statistically significant similarity. Whereas homology implies common three-dimensional structure, homology need not imply common function. Orthologous sequences usually have similar functions, but paralogous sequences often acquire very different functional roles. Motif databases, such as PROSITE (21), can provide evidence for the conservation of critical functional residues. However, motif identity in the absence of overall sequence similarity is not a reliable indicator of homology.
Methods in molecular biology (Clifton, N.J.) 01/2000; 132:185-219.
- ABSTRACT: The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function. A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs (more than one-third of the ORFs in the genome). Of the deleted ORFs, 17 percent were essential for viability in rich medium. The phenotypes of more than 500 deletion strains were assayed in parallel. Of the deletion strains, 40 percent showed quantitative growth defects in either rich or minimal medium.
Science 09/1999; 285(5429):901-6. · 31.03 Impact Factor