ABSTRACT: Background
Musical abilities such as recognising music and singing performance serve as means for communication and are instruments in sexual selection. Specific regions of the brain have been found to be activated by musical stimuli, but these have rarely been extended to the discovery of genes and molecules associated with musical ability.
A total of 1008 individuals from 73 families were enrolled and a pitch-production accuracy test was applied to determine musical ability. To identify genetic loci and variants that contribute to musical ability, we conducted family-based linkage and association analyses, and incorporated the results with data from exome sequencing and array comparative genomic hybridisation analyses.
We found significant evidence of linkage at 4q23 with the nearest marker D4S2986 (LOD=3.1), whose supporting interval overlaps a previous study in Finnish families, and identified an intergenic single nucleotide polymorphism (SNP) (rs1251078, p=8.4×10⁻¹⁷) near UGT8, a gene highly expressed in the central nervous system and known to act in brain organisation. In addition, a non-synonymous SNP in UGT8 was revealed to be highly associated with musical ability (rs4148254, p=8.0×10⁻¹⁷), and a 6.2 kb copy number loss near UGT8 showed a plausible association with musical ability (p=2.9×10⁻⁶).
This study provides new insight into the genetics of musical ability, exemplifying a methodology to assign functional significance to synonymous and non-coding alleles by integrating multiple experimental methods.
Journal of Medical Genetics 11/2012; 49(12). DOI:10.1136/jmedgenet-2012-101209
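The linkage evidence above is reported as a LOD score: the base-10 logarithm of the likelihood ratio for linkage versus no linkage, so LOD=3.1 corresponds to odds of roughly 10^3.1 ≈ 1,260 to 1. The abstract does not describe the study's family-based linkage model or software; the sketch below is only the textbook two-point case for phase-known, fully informative meioses, included to make the quantity concrete. The function names are illustrative.

```python
import math
from typing import Tuple


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for N phase-known, fully informative meioses
    with R recombinants: log10[theta^R * (1 - theta)^(N - R) / 0.5^N]."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need meioses > 0 and 0 <= recombinants <= meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction theta must lie in [0, 0.5]")
    likelihood = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    if likelihood == 0.0:
        return float("-inf")  # e.g. theta = 0 although recombinants were observed
    return math.log10(likelihood / 0.5 ** meioses)


def max_lod(recombinants: int, meioses: int) -> Tuple[float, float]:
    """Maximise over theta; the maximum-likelihood estimate is R/N, capped at 0.5."""
    if meioses <= 0:
        raise ValueError("need meioses > 0")
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, two_point_lod(recombinants, meioses, theta_hat)
```

For example, ten meioses with no recombinants give `max_lod(0, 10)` = (0.0, 10·log10 2), a LOD of about 3.01, which is close to the conventional threshold of 3.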
ABSTRACT: By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
ABSTRACT: Molecular recognition plays a fundamental role in all biological processes, which is why great efforts have been made to understand and predict protein-ligand interactions. Finding a molecule that can potentially bind to a target protein is particularly important in drug discovery and remains an expensive and time-consuming task. In silico tools are frequently used to screen molecular libraries for new lead compounds, and if the protein structure is known, various protein-ligand docking programs can be used. The aim of a docking procedure is to predict the correct poses of a ligand in the binding site of the protein and to score them according to the strength of interaction within a reasonable time frame. The purpose of our study was to present a novel consensus approach that predicts both the structure of a protein-ligand complex and its binding affinity. As input, our method uses the results of seven widely used ligand-docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock). We evaluated it on an extensive benchmark dataset of 1300 protein-ligand pairs from the refined PDBbind database, for which both structural and affinity data were available, and independently compared its scoring and posing ability with previously proposed methods. In most cases, our method correctly docks approximately 20% more pairs than the individual docking programs do on average, and over 10% more pairs than the best single program. The RMSD of the predicted complex conformation relative to the native one is reduced by 0.5 Å. Finally, we were able to increase the Pearson correlation between the predicted and experimental binding affinities to up to 0.5.
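A consensus over several docking programs can be illustrated by simple voting on pose agreement. The sketch below is not the authors' published scoring scheme: it assumes each program contributes one top-ranked pose, given as ligand atom coordinates in a shared receptor frame with atoms in the same order, and it uses a hypothetical 2 Å RMSD cutoff to decide whether two programs "agree".

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
Pose = List[Coord]  # ligand atom coordinates, same atom order in every pose


def pose_rmsd(a: Pose, b: Pose) -> float:
    """RMSD between two poses in the same receptor frame (no superposition)."""
    if len(a) != len(b) or not a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def consensus_pose(poses: Dict[str, Pose], cutoff: float = 2.0) -> Tuple[str, int]:
    """Return (program, votes) for the pose supported by the most other programs.

    Another program 'votes' for a candidate pose if its own pose lies within
    `cutoff` angstroms RMSD of it. Ties go to the alphabetically first program.
    """
    if not poses:
        raise ValueError("no poses given")
    best_name, best_votes = "", -1
    for name in sorted(poses):
        votes = sum(1 for other in poses
                    if other != name and pose_rmsd(poses[name], poses[other]) <= cutoff)
        if votes > best_votes:
            best_name, best_votes = name, votes
    return best_name, best_votes
```

Voting rewards poses that several independent programs converge on, which is one plausible reason a consensus can beat the best single program; a real method would also weight programs and combine their affinity scores.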
ABSTRACT: Molecular docking is a widely used method for lead optimization. However, docking tools often fail to predict how a ligand (the smaller molecule, such as a substrate or drug candidate) binds to a receptor (the accepting part of a protein). Here we present HarmonyDOCK, a novel method for assessing the accuracy of docking software and for creating a scoring function that determines a consensus protein-ligand pose among those generated by available docking programs. Conformations of a few hundred protein-ligand complexes with known three-dimensional structures were predicted on a benchmark set by several different docking programs. On the basis of the derived ranking, a reference point and a lower score limit were determined for subsequent investigations. The methodology focuses on the top-ranked poses, on the assumption that these are the most accurate conformations of the docked molecules. We found that some docking programs perform considerably better than others, yet in all cases a proper selection among the generated decoys, as performed by HarmonyDOCK, is needed for a successful docking procedure.
Journal of computational biology: a journal of computational molecular cell biology 11/2010; 21(3). DOI:10.1089/cmb.2009.0111
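Ranking docking programs on a benchmark, as described above, is commonly done by a success rate: the fraction of complexes whose top-ranked pose lies within a fixed RMSD of the crystal pose. The 2 Å threshold below is a widespread convention in docking benchmarks and is an assumption here; the abstract does not state HarmonyDOCK's exact criterion, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def success_rates(top_pose_rmsd: Dict[str, Sequence[float]],
                  threshold: float = 2.0) -> Dict[str, float]:
    """Per program: fraction of benchmark complexes whose top-ranked pose
    lies within `threshold` angstroms RMSD of the crystal pose."""
    rates = {}
    for program, rmsds in top_pose_rmsd.items():
        if not rmsds:
            raise ValueError(f"no benchmark results for {program}")
        rates[program] = sum(1 for r in rmsds if r <= threshold) / len(rmsds)
    return rates


def rank_programs(top_pose_rmsd: Dict[str, Sequence[float]],
                  threshold: float = 2.0) -> List[str]:
    """Programs ordered from highest to lowest success rate (ties broken by name)."""
    rates = success_rates(top_pose_rmsd, threshold)
    return sorted(rates, key=lambda p: (-rates[p], p))
```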
ABSTRACT: We apply the random forest supervised machine learning algorithm to flexible docking results from five typical virtual high-throughput screening (HTS) studies. Our approach aims at: i) reducing the number of compounds to be tested experimentally against a given protein target, and ii) extending the results of flexible docking experiments performed on only a subset of a chemical library in order to select promising inhibitors from the whole dataset. The random forest (RF) method is applied and tested here on compounds from the MDL Drug Data Report (MDDR). The recall values for the five selected, diverse protein targets exceed 90%, and performance reaches 100%. Combined with flexible docking, this machine learning method can find 60% of the active compounds for most protein targets by docking only 10% of the screened ligands. Our in silico approach is therefore able to scan very large databases rapidly to predict the biological activity of small-molecule inhibitors, and it provides an effective alternative to more computationally demanding methods in virtual HTS.
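The figure of merit quoted above (finding 60% of the actives after docking 10% of the ligands) is a recall at a fixed screened fraction. A minimal sketch, assuming a ranking score in which higher means more promising (for example, a classifier's predicted probability of activity); the function name is illustrative and not taken from the paper.

```python
from typing import Sequence


def recall_at_fraction(scores: Sequence[float], is_active: Sequence[bool],
                       fraction: float) -> float:
    """Share of all known actives that fall in the top `fraction` of the
    library when compounds are ranked by score (higher = more promising)."""
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("no actives in the library")
    n_selected = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = sum(1 for i in ranked[:n_selected] if is_active[i])
    return found / total_actives
```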
ABSTRACT: The 'omics' revolution is producing a flood of data, all of which must be annotated to become useful. Sequences of proteins of unknown function can be annotated with a putative function by comparing them with proteins of known function, typically using BLAST or similar software. Structural genomics now also delivers three-dimensional structures of proteins of unknown function. We present software that can be used when sequence comparisons fail to determine the function of a protein whose structure is known. The software, called 3D-Fun, is implemented as a server that runs at several European institutes and is freely available to everyone at all of these sites. The 3D-Fun servers accept protein coordinates in the standard PDB format and compare them with all known protein structures by 3D structural superposition using the 3D-Hit software. If structural hits are found among proteins of known function, these are listed together with their function and some key comparison statistics. This is conceptually very similar in 3D to what BLAST does in 1D. In addition, the superposition results are displayed using interactive graphics. Currently, the 3D-Fun system predicts only enzyme function, but an expanded version with Gene Ontology predictions will be available soon. The server can be accessed at http://3dfun.bioinfo.pl/ or at http://3dfun.cmbi.ru.nl/.
Nucleic Acids Research 08/2008; 36(Web Server issue):W303-7. DOI:10.1093/nar/gkn308
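Because 3D-Fun takes coordinates in the standard PDB format, a first step in any Cα-based structure comparison is extracting the Cα atoms. A minimal sketch, assuming well-formed fixed-column ATOM records; it is not 3D-Fun's own parser, and it deliberately ignores HETATM records, insertion codes and multi-model files.

```python
from typing import List, Optional, Tuple

CaRecord = Tuple[str, int, Tuple[float, float, float]]  # (chain, residue number, xyz)


def read_ca_coords(pdb_text: str, chain: Optional[str] = None) -> List[CaRecord]:
    """Extract (chain ID, residue number, (x, y, z)) for every C-alpha atom.

    Uses the fixed-column layout of PDB ATOM records (1-based, inclusive):
    atom name 13-16, alternate location 17, chain ID 22, residue number 23-26,
    x 31-38, y 39-46, z 47-54.
    """
    result = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):  # skips HETATM, e.g. calcium ions named CA
            continue
        if line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):  # keep only the first alternate location
            continue
        chain_id = line[21]
        if chain is not None and chain_id != chain:
            continue
        res_seq = int(line[22:26])
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        result.append((chain_id, res_seq, (x, y, z)))
    return result
```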
ABSTRACT: In many cases, some information about active molecules is already available at the beginning of a high-throughput screening experiment. Active compounds (such as substrate analogues, natural products and inhibitors of related proteins) are often identified in low-throughput validation studies on a biochemical target. Additional structural information is sometimes also available from crystallographic studies of protein-ligand complexes. In addition, the structural or sequence similarity of various protein targets opens a new route to drug discovery: co-crystallized compounds from homologous proteins can be used to design leads for a new target that has no co-crystallized ligands. In this paper we evaluate how far such an approach can be taken in a real drug discovery campaign, using the severe acute respiratory syndrome (SARS) coronavirus as an example. Our method is able to construct small molecules as plausible inhibitors solely on the basis of the set of ligands from crystallized complexes of a protein target and of other proteins from its structurally homologous family. The accuracy and sensitivity of the method are estimated here by subsequently applying an electronic high-throughput screening flexible docking algorithm. The best-performing ligands are then used for a very restrictive similarity search for potential inhibitors of the SARS protease among the one million compounds in the Ligand.Info small-molecule meta-database. The selected molecules can be passed on for further experimental validation.
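A "very restrictive" similarity search can be pictured as a high threshold on fingerprint similarity between the best-performing ligands and every library entry. The sketch assumes binary fingerprints stored as sets of on-bit indices and the Tanimoto coefficient with a hypothetical cut-off of 0.85; the abstract does not specify the fingerprint type or threshold used with Ligand.Info.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(queries: Iterable[Fingerprint],
                      library: Dict[str, Fingerprint],
                      threshold: float = 0.85) -> List[Tuple[str, float]]:
    """Library entries whose best Tanimoto score against any query reaches
    `threshold`, sorted from most to least similar (ties by name)."""
    queries = list(queries)
    hits = []
    for name, fp in library.items():
        best = max((tanimoto(q, fp) for q in queries), default=0.0)
        if best >= threshold:
            hits.append((name, best))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits
```

Taking the maximum over several query ligands keeps any library compound that closely resembles at least one of them; raising the threshold trades recall for a shorter, more conservative hit list.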
ABSTRACT: A structure-based in silico virtual drug discovery procedure was assessed, with the severe acute respiratory syndrome coronavirus main protease serving as a case study. First, potential compounds were extracted from protein-ligand complexes selected from the Protein Data Bank on the basis of structural similarity to the target protein. Next, the set of compounds was ranked by docking score using an electronic high-throughput screening flexible docking procedure to select the most promising molecules. The set of best-performing compounds was then used for a similarity search over the 1 million entries in the Ligand.Info Meta-Database. Selected molecules closely related in structure to 2-methyl-2,4-pentanediol may provide candidate lead compounds for the development of novel allosteric severe acute respiratory syndrome protease inhibitors.
Chemical Biology & Drug Design 05/2007; 69(4):269-79. DOI:10.1111/j.1747-0285.2007.00475.x
ABSTRACT: In many cases, some information about active molecules is already available at the beginning of an HTS campaign. Known active compounds (such as substrate analogues, natural products, inhibitors of a related protein or ligands published by a pharmaceutical company) are often identified in low-throughput validation studies of the biochemical target. In this study we evaluate the effectiveness of a support vector machine trained on such compounds and used to classify a collection of compounds with unknown activity. This approach aims to reduce the number of compounds to be tested against the given target. Our method predicts the biological activity of chemical compounds based only on two-dimensional topological atom-pair (AP) descriptors. The supervised support vector machine (SVM) is trained on compounds from the MDL Drug Data Report (MDDR) known to be active against a specific protein target. For detailed analysis, five different biological targets were selected: cyclooxygenase-2, dihydrofolate reductase, thrombin, HIV reverse transcriptase and the estrogen receptor (antagonists). The accuracy of compound identification was estimated using recall and precision. The sensitivities for all protein targets exceeded 80%, and the classification performance reached 100% for selected targets. In another application of the method, we addressed the absence of an initial set of active compounds for a selected protein target at the beginning of an HTS campaign. In such a case, virtual high-throughput screening (vHTS) is usually applied using a flexible docking procedure. However, a vHTS experiment typically yields a large percentage of false positives that must be verified by costly and time-consuming experimental follow-up assays.
Subsequently applying our machine learning method improved both the speed (because not every compound in the database had to be docked) and the accuracy of the HTS hit lists, as measured by the enrichment factor.
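The metrics named in this abstract can be made concrete. The enrichment factor compares the hit rate in a selected top fraction of a ranked list with the hit rate of the whole library, while precision and recall (sensitivity) summarise a yes/no classification. A sketch using these standard definitions; the function names are illustrative, and the ranking score is assumed to be higher for more promising compounds.

```python
from typing import Sequence, Tuple


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float) -> float:
    """EF = (actives in top fraction / compounds in top fraction)
          / (actives in library / compounds in library)."""
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and labels must be non-empty and equally long")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("no actives in the library")
    n_selected = max(1, round(fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_selected] if is_active[i])
    return (hits / n_selected) / (total_actives / n)


def precision_recall(predicted: Sequence[bool],
                     actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For instance, if 3 of a library's 5 actives appear in the top 10 of 100 ranked compounds, the enrichment factor at 10% is (3/10)/(5/100) = 6, i.e. six times better than picking compounds at random.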
ABSTRACT: The severe acute respiratory syndrome coronavirus helicase ATPase catalytic domain was modelled using the protein structure prediction Meta Server and the 3D Jury method for model selection, which identified the PDB structures 1JPR, 1UAA and 1W36 as suitable templates for building a full-atom 3D model. This model was then used to design small molecules expected to block the ATPase catalytic pocket and thus inhibit the enzymatic activity. Binding sites for various functional groups were identified in a series of molecular dynamics (MD) calculations. Their positions in the catalytic pocket were used as constraints in a search of the Cambridge Structural Database for molecules carrying the pharmacophores that interacted most strongly with the enzyme at the desired positions. Subsequent MD simulations and binding-energy calculations for the designed molecules, compared against ATP, identified the most promising candidates as likely inhibitors: molecules possessing two phosphonic acid moieties at distal ends.
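Using the positions of functional-group binding sites as search constraints amounts to a 3D pharmacophore query. The following is a simplified, rotation-invariant sketch, assuming each candidate conformation has already been reduced to typed feature points and comparing only pairwise inter-feature distances. It is not the Cambridge Structural Database's search interface; the feature labels (e.g. "phosphonate") and the 1 Å tolerance are hypothetical.

```python
import itertools
import math
from typing import List, Tuple

# A feature is (type label, (x, y, z)); the labels used here are hypothetical.
Feature = Tuple[str, Tuple[float, float, float]]


def matches_pharmacophore(query: List[Feature], molecule: List[Feature],
                          tol: float = 1.0) -> bool:
    """True if the query features can be mapped one-to-one onto molecule
    features with matching types, reproducing every pairwise inter-feature
    distance within `tol` angstroms. Brute force over assignments, so only
    suitable for small queries; distance-only matching also ignores chirality."""
    n = len(query)
    for combo in itertools.permutations(range(len(molecule)), n):
        # combo[i] is the molecule feature assigned to query feature i
        if any(molecule[j][0] != query[i][0] for i, j in enumerate(combo)):
            continue
        if all(abs(math.dist(query[p][1], query[q][1])
                   - math.dist(molecule[combo[p]][1], molecule[combo[q]][1])) <= tol
               for p, q in itertools.combinations(range(n), 2)):
            return True
    return False
```

Comparing distances rather than absolute positions means no prior alignment to the pocket is needed, at the cost of not distinguishing mirror-image arrangements.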
ABSTRACT: The number of protein structures from structural genomics centers in the Protein Data Bank (PDB) is increasing dramatically. Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to infer function successfully using structural similarity alone.
Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties based on structure-function relationships. The assignments were made for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that four hypothetical proteins (PDB accession codes 1VH0, 1NS5, 1O6D, and 1TO0), to which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferases.
We suggest that structure-based prediction of an EC number should use different similarity-score cutoffs for different protein folds. Moreover, performing the annotation with two different algorithms can reduce the rate of false-positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures whose functions are marked as "unknown" in the PDB file.
http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
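The two recommendations in this abstract (fold-dependent score cutoffs, and agreement between two algorithms) can be combined into a simple decision rule. A sketch, assuming each algorithm reports its best structural hit as an (EC number, score) pair on its own scale, with higher scores meaning greater similarity; the fold names and cutoff values used with it are placeholders, not values from PDB-UF.

```python
from typing import Dict, Optional, Tuple

Hit = Tuple[str, float]  # (EC number of best structural match, similarity score)


def assign_ec(fold: str,
              hit_a: Optional[Hit],
              hit_b: Optional[Hit],
              fold_cutoffs: Dict[str, Tuple[float, float]],
              default_cutoffs: Tuple[float, float]) -> Optional[str]:
    """Accept an EC prediction only if both algorithms clear the cutoffs for
    this fold and agree on the EC number; otherwise leave it unassigned."""
    if hit_a is None or hit_b is None:
        return None
    # One cutoff per algorithm, because their scores are on different scales.
    cutoff_a, cutoff_b = fold_cutoffs.get(fold, default_cutoffs)
    (ec_a, score_a), (ec_b, score_b) = hit_a, hit_b
    if score_a >= cutoff_a and score_b >= cutoff_b and ec_a == ec_b:
        return ec_a
    return None
```

Requiring agreement trades some sensitivity for fewer false positives, which matches the motivation given above for using two algorithms.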
ABSTRACT: Ligand.Info is a compilation of various publicly available databases of small molecules. The Meta-Database contains over 1 million entries in total. The compound records contain calculated three-dimensional coordinates and, in some cases, information about biological activity; some molecules also carry information about FDA drug approval status or anti-HIV activity. The Meta-Database can be downloaded from the http://Ligand.Info web page. It can also be screened using a Java-based tool that interactively clusters sets of molecules on the user side and automatically downloads similar molecules from the server. The application requires Java Runtime Environment 1.4 or higher, which can be downloaded automatically from Sun Microsystems or Apple Computer and installed the first time Ligand.Info is used on a desktop system that supports Java (MS Windows, Mac OS, Solaris, and Linux). The Ligand.Info Meta-Database can be used for virtual high-throughput screening of new potential drugs. In the presented examples, using a known antiviral drug as the query, the system found other antiviral drugs and inhibitors.
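The abstract does not say which clustering algorithm the Ligand.Info tool uses. As an assumed stand-in, here is greedy leader clustering on the Tanimoto similarity of binary fingerprints (sets of on-bit indices), a simple single-pass method of the kind that suits interactive, client-side use. The 0.7 default threshold is arbitrary.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(fps: Dict[str, Fingerprint],
                   threshold: float = 0.7) -> List[List[str]]:
    """Greedy single-pass leader clustering. Molecules are visited in name
    order; each joins the first cluster whose leader is at least `threshold`
    similar, otherwise it becomes the leader of a new cluster."""
    clusters: List[List[str]] = []  # clusters[k][0] is the cluster leader
    for name in sorted(fps):
        for members in clusters:
            if tanimoto(fps[members[0]], fps[name]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Leader clustering needs only one pass and no full similarity matrix, but its result depends on the visiting order; sorting by name here simply makes it deterministic.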
ABSTRACT: Cytokinins are plant hormones involved in essential processes of plant growth and development. They bind to receptors known as CRE1/WOL/AHK4, AHK2, and AHK3, which possess histidine kinase activity. Recently, the CHASE (cyclases/histidine kinases associated sensory extracellular) sensor domain was identified in these proteins, but little is known about its structure and its interaction with ligands. Distant homology detection methods developed in our laboratory, together with molecular phylogeny, enabled us to predict that the CHASE domain has a structure similar to that of the photoactive yellow protein-like sensor domain. We have identified the active-site pocket and the amino acids involved in receptor-ligand interactions. We also show that the fold evolution of cytokinin receptors is very important for a full understanding of the signal transduction mechanism in plants.
ABSTRACT: Meta-BASIC (http://basic.bioinfo.pl) is a novel, sensitive approach for recognizing distant similarity between proteins based on consensus alignments of meta profiles. Specifically, Meta-BASIC compares sequence profiles combined with predicted secondary structure, using several scoring systems and alignment algorithms. In our benchmarking tests, Meta-BASIC outperforms many individual servers, including fold recognition servers, and it can compete with meta predictors that base their strength on the structural comparison of models. In addition, Meta-BASIC, which enables detection of very distant relationships even if the tertiary structure of the reference protein is not known, has a high-throughput capability. This new method is applied to 860 PfamA protein families of unknown function (DUF) and provides many novel structure-function assignments, available online at http://basic.bioinfo.pl/duf.pl. Detailed discussion is provided for two of the most interesting assignments: DUF271 and DUF431 are predicted to be a nucleotide-diphospho-sugar transferase and an alpha/beta-knot SAM-dependent RNA methyltransferase, respectively.
Nucleic Acids Research 08/2004; 32(Web Server issue):W576-81. DOI:10.1093/nar/gkh370
ABSTRACT: We present a simple method for fast and accurate comparison of proteins using their structures. The algorithm is based on structural alignment of segments of Cα chains (with a size of 99 or 199 residues). The method is optimized for both speed and accuracy. We test it on 97 representative proteins, using a similarity measure based on the SCOP classification, and compare our algorithm with the LGscore2 automatic method. Our method matches the accuracy of the LGscore2 algorithm while processing the whole test set much faster. A second test, using the ToolShop structure prediction evaluation program, shows that our tool is on average slightly less sensitive than the DALI server. Both algorithms give a similar number of correct models; however, the final alignment quality is better for DALI. Our method is implemented under the name 3D-Hit as a web server at http://3dhit.bioinfo.pl/, free for academic use, with a weekly updated database of 5000 structures from the Protein Data Bank with non-homologous sequences.
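The abstract does not detail 3D-Hit's internals, but comparing Cα segments generally requires the RMSD after optimal superposition. A self-contained sketch of that building block using Horn's quaternion method: the minimal RMSD follows from the largest eigenvalue of a 4×4 "key" matrix built from the cross-covariance of the two centred coordinate sets. To avoid external dependencies, the eigenvalue is found here by shifted power iteration; this is an illustrative substitute, and production code would normally call a linear-algebra library's symmetric eigensolver or use the QCP algorithm.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _centered(coords: Sequence[Vec]) -> List[Vec]:
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def superposition_rmsd(a: Sequence[Vec], b: Sequence[Vec], iterations: int = 500) -> float:
    """Minimal RMSD between two equal-length coordinate sets (e.g. C-alpha
    segments) after optimal translation and rotation (Horn's quaternion method)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    a, b = _centered(a), _centered(b)
    # Cross-covariance S[i][j] = sum over atoms of a[i] * b[j]
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g_a = sum(x * x + y * y + z * z for x, y, z in a)
    g_b = sum(x * x + y * y + z * z for x, y, z in b)
    # All eigenvalues lie in [-(g_a + g_b)/2, (g_a + g_b)/2], so this shift makes
    # them non-negative and power iteration converges to the largest one.
    shift = (g_a + g_b) / 2.0
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        w = [sum(key[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate input, e.g. all points coincide
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unshifted matrix gives its largest eigenvalue.
    kv = [sum(key[i][j] * v[j] for j in range(4)) for i in range(4)]
    lam = sum(x * y for x, y in zip(v, kv))
    msd = (g_a + g_b - 2.0 * lam) / len(a)
    return math.sqrt(max(msd, 0.0))  # clamp tiny negative rounding errors
```

As a usage check, a coordinate set compared with a rotated and translated copy of itself gives an RMSD of essentially zero, while `[(1,0,0), (-1,0,0)]` against `[(2,0,0), (-2,0,0)]` gives 1.0, since no rotation can remove the difference in scale.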