ABSTRACT: Recombination is a fundamental evolutionary force. The population recombination rate ρ therefore plays an important role in the analysis of population genetic data; however, it is notoriously difficult to estimate. This difficulty applies both to the accuracy of commonly used estimates and to the computational effort required to obtain them. Some particularly popular methods are based on approximations to the likelihood. They require considerably less computational effort than full-likelihood methods, with little loss of accuracy. Nevertheless, computing these approximate estimates can still be very time consuming, in particular when the sample size is large. Although auxiliary quantities for composite likelihood estimates can be computed in advance and stored in tables, these tables need to be recomputed if either the sample size or the mutation rate θ changes. Here we introduce a new method based on regression combined with boosting as a model selection technique. For large samples, it requires much less computational effort than other approximate methods, while providing similar levels of accuracy. Notably, for a sample of hundreds or thousands of individuals, the regression estimate of ρ can be obtained on a single personal computer within a couple of minutes, while other methods may need a couple of days or months (or even years). When the sample size is smaller (n ≤ 50), our new method remains computationally efficient but produces biased estimates. We expect the new estimates to be helpful when analyzing large samples and/or many loci with possibly different mutation rates.
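To illustrate how boosting can act as a model selection step inside a regression, here is a minimal sketch of generic componentwise L2 boosting. It is not the authors' procedure: the function name, the step size `nu`, the number of steps, and the toy predictors `stat_a` and `stat_b` are all assumptions made for illustration. At each step only the single predictor that best fits the current residual is updated, so predictors that are never chosen keep a coefficient of zero.

```python
def componentwise_l2_boost(columns, y, steps=200, nu=0.1):
    """Componentwise L2 boosting for a linear model (illustrative sketch).

    columns: list of predictor columns (lists of floats, assumed centered).
    Returns (intercept, coefficients); unselected predictors stay at 0.
    """
    n = len(y)
    intercept = sum(y) / n
    residual = [v - intercept for v in y]
    coef = [0.0] * len(columns)
    for _ in range(steps):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j, x in enumerate(columns):
            sxx = sum(a * a for a in x)
            if sxx == 0.0:
                continue  # a constant column carries no information
            sxr = sum(a * r for a, r in zip(x, residual))
            gain = sxr * sxr / sxx  # reduction in squared error for this column
            if gain > best_gain:
                best_j, best_beta, best_gain = j, sxr / sxx, gain
        if best_j is None:
            break  # no predictor improves the fit any further
        # Shrunken update of the single best predictor only.
        coef[best_j] += nu * best_beta
        residual = [r - nu * best_beta * a
                    for r, a in zip(residual, columns[best_j])]
    return intercept, coef


# Hypothetical toy data: the response depends on stat_a only.
stat_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
stat_b = [1.0, -2.0, 0.0, 2.0, -1.0]
response = [3.0 * a + 10.0 for a in stat_a]
intercept, coef = componentwise_l2_boost([stat_a, stat_b], response)
```

In this toy case the uninformative predictor `stat_b` is never selected, which is the sense in which boosting doubles as variable selection.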
ABSTRACT: Pleistocene climate change has had an important effect in shaping intraspecific genetic variation in many species; however, its role in driving speciation is less clear. We examined the possibility of a Pleistocene origin of the only two representatives of the genus Pugionium (Brassicaceae), Pugionium cornutum and Pugionium dolabratum, which occupy different desert habitats in northwest China. We surveyed sequence variation for internal transcribed spacer (ITS), three chloroplast (cp) DNA fragments, and eight low-copy nuclear genes among individuals sampled from 11 populations of each species across their geographic ranges. One ITS mutation distinguished the two species, whereas mutations in cpDNA and the eight low-copy nuclear gene sequences were not species-specific. Although interspecific divergence varied greatly among nuclear gene sequences, in each case divergence was estimated to have occurred within the Pleistocene when deserts expanded in northwest China. Our findings point to the importance of Pleistocene climate change, in this case an increase in aridity, as a cause of speciation in Pugionium as a result of divergence in different habitats that formed in association with the expansion of deserts in China.
ABSTRACT: Population genetic data based on multiple nuclear loci provide invaluable information for understanding the demographic, selective, and divergence histories of extant species. We studied nucleotide variation at 13 nuclear loci in 53 populations distributed among four closely related but morphologically distinct juniper species of the Qinghai-Tibetan Plateau (QTP). We used a novel approach combining Approximate Bayesian Computation and a recently developed neutrality test based on the maximum frequency of derived mutations to examine the demographic and selective histories of individual species, and isolation-with-migration analyses to study the joint history of the species and detect gene flow between them. We found that (1) the four species, which diverged in response to the extensive QTP uplifts, have different demographic histories; (2) two loci, Pgi and CC0822, depart significantly from neutrality in one species, and Pgi is also marginally significant in another; and (3) shared polymorphisms are common, indicating both incomplete lineage sorting and gene flow after species divergence. In addition, the detected unidirectional gene flow provides indirect support for the theoretical prediction that introgression should mostly take place from local to invading species. Our results, together with previous studies, underscore the complex evolutionary histories of plant diversification in the biodiversity-hotspot QTP.
ABSTRACT: Summary statistics are widely used in population genetics, but they suffer from the drawback that no simple sufficient summary statistic exists that captures all the information required to distinguish different evolutionary hypotheses. Here, we apply boosting, a recent statistical method that combines simple classification rules to maximize their joint predictive performance. We show that our implementation of boosting has high power to detect selective sweeps. Demographic events, such as bottlenecks, do not result in a large excess of false positives. A comparison shows that our boosting implementation performs well relative to other neutrality tests. Furthermore, we evaluated the relative contribution of different summary statistics to the identification of selection and found that for recent sweeps integrated haplotype homozygosity is very informative, whereas older sweeps are better detected by Tajima's π. Overall, Watterson's θ was found to contribute the most information for distinguishing between bottlenecks and selection.
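As a small illustration of two of the summary statistics named above, the sketch below computes Tajima's π (the mean number of pairwise differences) and Watterson's θ (the number of segregating sites divided by a harmonic number) from a toy set of aligned haplotypes. The function names and the `sample` data are hypothetical, and the values are per locus rather than per site; this is not the authors' implementation.

```python
from itertools import combinations


def segregating_sites(haplotypes):
    """Count alignment columns in which more than one allele occurs."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


def tajimas_pi(haplotypes):
    """Mean number of differences over all pairs of sequences."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def wattersons_theta(haplotypes):
    """Segregating sites S divided by a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(haplotypes)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(haplotypes) / a_n


# Hypothetical sample: four aligned haplotypes, 0 = ancestral, 1 = derived.
sample = ["0010", "0110", "0100", "1100"]
pi_hat = tajimas_pi(sample)
theta_w = wattersons_theta(sample)
```

Statistics like these would serve as input features to a classifier; how the features are combined is the part handled by boosting.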
ABSTRACT: We used a machine learning method, the nearest neighbor algorithm (NNA), to learn the relationship between miRNAs and their target proteins, generating a predictor that can judge whether a new miRNA-target pair is true or not. We acquired 198 positive (true) miRNA-target pairs from Tarbase and the literature, and generated 4,888 negative (false) pairs through random combination. A 0/1 system and the frequencies of single nucleotides and di-nucleotides were used to encode miRNAs into vectors, while various physicochemical parameters were used to encode the targets. The NNA was then applied to these data to produce a predictor. We implemented minimum redundancy maximum relevance (mRMR) and properties forward selection (PFS) to reduce the redundancy of our encoding system, obtaining the 91 most efficient properties. Finally, in the Jackknife cross-validation test, we obtained a positive accuracy of 69.2% and an overall accuracy of 96.0% with all 253 properties, and a positive accuracy of 83.8% and an overall accuracy of 97.2% with the 91 most efficient properties. A web server for predictions is available at http://app3.biosino.org:8080/miRTP/index.jsp.
Full-text · Article · Nov 2010 · Molecular Diversity
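The workflow described in the abstract above (encode sequences as frequency vectors, classify with a nearest neighbor rule, evaluate with a Jackknife test) can be sketched as follows. This is a minimal illustration under stated assumptions: it covers only the single-nucleotide and di-nucleotide frequency part of the encoding, uses plain Euclidean distance, and the sequences and labels are invented toy data, not taken from the paper.

```python
from itertools import product
import math

ALPHABET = "ACGU"
DINUCS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def encode(seq):
    """Return 4 single-nucleotide and 16 di-nucleotide frequencies."""
    mono = [seq.count(base) / len(seq) for base in ALPHABET]
    n_pairs = len(seq) - 1
    counts = {d: 0 for d in DINUCS}
    for i in range(n_pairs):  # overlapping di-nucleotides
        counts[seq[i:i + 2]] += 1
    return mono + [counts[d] / n_pairs for d in DINUCS]


def nearest_label(query, vectors, labels):
    """Label of the training vector closest to the query (1-NN)."""
    best = min(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return labels[best]


def jackknife(vectors, labels):
    """Leave each sample out in turn and predict it from the rest."""
    predictions = []
    for i in range(len(vectors)):
        rest_v = vectors[:i] + vectors[i + 1:]
        rest_l = labels[:i] + labels[i + 1:]
        predictions.append(nearest_label(vectors[i], rest_v, rest_l))
    return predictions


def accuracies(labels, predictions):
    """Return (accuracy on positives only, overall accuracy)."""
    hits = [t == p for t, p in zip(labels, predictions)]
    positive_hits = [h for h, t in zip(hits, labels) if t == 1]
    return sum(positive_hits) / len(positive_hits), sum(hits) / len(hits)


# Hypothetical toy data: 1 = true pair, 0 = false pair.
sequences = ["AAAAUAAA", "AAUAAUAA", "GGGCGGGG", "GGGGGCGG"]
labels = [1, 1, 0, 0]
vectors = [encode(s) for s in sequences]
predictions = jackknife(vectors, labels)
positive_acc, overall_acc = accuracies(labels, predictions)
```

Reporting the accuracy on positives separately matters when, as in the abstract, negatives greatly outnumber positives and the overall accuracy alone would look high even for a weak predictor.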
ABSTRACT: A transcription factor (TF) is a protein that binds DNA at specific sites to help regulate transcription from DNA to RNA. The mechanism of transcriptional regulation can be much better understood if the category of a transcription factor is known. We introduce a system that automatically categorizes transcription factors using their primary structures. A feature analysis strategy called "mRMR" (Minimum Redundancy, Maximum Relevance) is used to analyze the contribution of the TF properties to the TF classification. mRMR is coupled with forward feature selection to choose an optimized feature subset for the classification. The TF properties comprise the amino acid composition and the physicochemical characteristics of the proteins, and together they generate over a hundred features/parameters. We feed all the features/parameters into a classifier, the nearest neighbor algorithm (NNA), for the classification. The classification accuracy is 93.81%, evaluated by a Jackknife test. Feature analysis using the mRMR algorithm shows that secondary structure, amino acid composition and hydrophobicity are the most relevant features for classification. A free online classifier is available at http://app3.biosino.org/132dvc/tf/.
Full-text · Article · Jul 2010 · Protein and Peptide Letters
ABSTRACT: After Darwin's theory of natural selection emerged, Kimura proposed the theory of neutral evolution in 1968, which considered neutral mutations and random genetic drift to be the major evolutionary forces. In the following years, various methods were developed to test whether natural selection has occurred. With the improvement of DNA sequencing technologies, large amounts of DNA sequence polymorphism data are now available, providing abundant material for testing the natural selection theory. Since natural selection leaves its footprint on the genome, we are able to infer the adaptive evolutionary history of a population. On the other hand, population demographic history and other evolutionary forces may affect DNA polymorphism patterns in a similar way, which may interfere with the tests. In this paper, we summarize some basic concepts of neutrality tests and briefly introduce some classical methods, with a focus on several recently developed methods.
ABSTRACT: In this paper, amino acid compositions are combined with some protein sequence properties (physicochemical properties) to predict protein structural classes. We predict protein structural classes using a mathematical model that combines the nearest neighbor algorithm (NNA), mRMR (minimum redundancy, maximum relevance), and a forward feature search strategy. Jackknife cross-validation is used to evaluate the prediction accuracy. As a result, the prediction success rate improves to 68.8%, which is better than the 62.2% obtained when using only amino acid compositions. We therefore conclude that the physicochemical properties are factors that contribute to protein folding, while the most strongly contributing features are found to be the amino acid composition. We expect that prediction accuracy will improve further as more sequence information comes to light. A web server for predicting the protein structural classes is available at http://app3.biosino.org:8080/liwenjin/index.jsp.
Preview · Article · Nov 2008 · Molecular Diversity