Ovidiu Ivanciuc

University of Texas Medical Branch at Galveston, Galveston, TX, United States

Publications (90) · 112.51 Total impact

  • Ovidiu Ivanciuc
    ABSTRACT: Chemical and molecular graphs have fundamental applications in chemoinformatics, quantitative structure-property relationships (QSPR), quantitative structure-activity relationships (QSAR), virtual screening of chemical libraries, and computational drug design. Chemoinformatics applications of graphs include chemical structure representation and coding, database search and retrieval, and physicochemical property prediction. QSPR, QSAR and virtual screening are based on the structure-property principle, which states that the physicochemical and biological properties of chemical compounds can be predicted from their chemical structure. Such structure-property correlations are usually developed from topological indices and fingerprints computed from the molecular graph and from molecular descriptors computed from the three-dimensional chemical structure. We present here a selection of the most important graph descriptors and topological indices, including molecular matrices, graph spectra, spectral moments, graph polynomials, and vertex topological indices. These graph descriptors are used to define several topological indices based on molecular connectivity, graph distance, reciprocal distance, distance-degree, distance-valency, spectra, polynomials, and information theory concepts. The molecular descriptors and topological indices can be developed with a more general approach, based on molecular graph operators, which define a family of graph indices related by a common formula. Graph descriptors and topological indices for molecules containing heteroatoms and multiple bonds are computed with weighting schemes based on atomic properties, such as the atomic number, covalent radius, or electronegativity. The correlation in QSPR and QSAR models can be improved by optimizing some parameters in the formula of topological indices, as demonstrated for structural descriptors based on atomic connectivity and graph distance.
    Current Computer-Aided Drug Design 05/2013; · 1.54 Impact Factor
  • Ovidiu Ivanciuc
    Current Computer-Aided Drug Design 05/2013; · 1.54 Impact Factor
  • Ovidiu Ivanciuc, Teodora Ivanciuc, Douglas J Klein
    ABSTRACT: Usual quantitative structure-activity relationship (QSAR) models are computed from unstructured input data, by using a vector of molecular descriptors for each chemical in the dataset. An alternative is to consider the structural relationships between the chemical structures, such as molecular similarity, presence of certain substructures, or chemical transformations between compounds. We defined a class of network-QSAR models based on molecular networks induced by a sequence of substitution reactions on a chemical structure that generates a partially ordered set (or poset) oriented graph that may be used to predict various molecular properties with quantitative superstructure-activity relationships (QSSAR). The network-QSAR interpolation models defined on poset graphs, namely average poset, cluster expansion, and spline poset, were tested with success for the prediction of several physicochemical properties for diverse chemicals. We introduce the flow network QSAR, a new poset regression model in which the dataset of chemicals, represented as a reaction poset, is transformed into an oriented network of electrical resistances in which the current flow results in a potential at each node. The molecular property considered in the QSAR model is represented as the electrical potential, and the value of this potential at a particular node is determined by the electrical resistances assigned to each edge and by a system of batteries. Each node with a known value for the molecular property is attached to a battery that sets the potential on that node to the value of the respective molecular property, and no external battery is attached to nodes from the prediction set, representing chemicals for which the values of the molecular property are not known or are intended to be predicted. The flow network QSAR algorithm determines the values of the molecular property for the prediction set of molecules by applying Ohm's law and Kirchhoff's current law to the poset network of electrical resistances. Several applications of the flow network QSAR are demonstrated.
    Current Computer-Aided Drug Design 05/2013; · 1.54 Impact Factor
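The flow network idea in the abstract above (known property values act as fixed potentials, and unknown nodes settle where Kirchhoff's current law holds) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `flow_network_predict`, the edge format, the example resistances, and the use of simple iterative relaxation are all assumptions made for illustration, and the edges are treated as plain undirected resistors.

```python
def flow_network_predict(edges, known, tol=1e-10, max_iter=10000):
    """Hypothetical sketch: estimate potentials on a resistor network.

    edges: list of (node_a, node_b, resistance) tuples (assumed format).
    known: dict mapping node -> known property value (the "battery" nodes).
    Returns a dict mapping every node that appears in an edge to its potential.
    """
    # Build an adjacency list of conductances (Ohm's law: I = g * dV, g = 1/R).
    neighbors = {}
    for a, b, resistance in edges:
        g = 1.0 / resistance
        neighbors.setdefault(a, []).append((b, g))
        neighbors.setdefault(b, []).append((a, g))

    # Start unknown nodes at the mean of the known values.
    start = sum(known.values()) / len(known)
    potential = {n: known.get(n, start) for n in neighbors}
    free = [n for n in neighbors if n not in known]

    # Relaxation: Kirchhoff's current law at a node with no battery means its
    # potential equals the conductance-weighted average of its neighbors.
    for _ in range(max_iter):
        largest_change = 0.0
        for n in free:
            total_g = sum(g for _, g in neighbors[n])
            new_v = sum(g * potential[m] for m, g in neighbors[n]) / total_g
            largest_change = max(largest_change, abs(new_v - potential[n]))
            potential[n] = new_v
        if largest_change < tol:
            break
    return potential


# Example: a four-node diamond with made-up resistances and property values.
diamond = [(0, 1, 1.0), (1, 3, 3.0), (0, 2, 1.0), (2, 3, 1.0)]
predicted = flow_network_predict(diamond, {0: 1.0, 3: 3.0})
print(predicted[1], predicted[2])
```

The sketch assumes at least one known node and that every unknown node is connected, through some path, to a known one; otherwise the potentials are not determined.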
  • ABSTRACT: Allergenic proteins must cross-link specific IgE molecules, bound to the surface of mast cells and basophils, to stimulate an immune response. A structural understanding of the allergen-IgE interface is needed to predict cross-reactivities between allergens and to design hypoallergenic proteins. However, there are fewer than 90 experimentally determined structures available for the approximately 1500 sequences of allergens and isoallergens catalogued in the Structural Database of Allergenic Proteins (SDAP). To provide reliable structural data for the remaining proteins, we previously produced over 500 3D-models using an automated procedure, with strict controls at template choice and model quality evaluation. Here we assessed how well the fold and residue surface exposure of 10 of these models correlated with recently published experimental 3D structures determined by X-ray crystallography or NMR. We also discuss the impact of intrinsically disordered regions on the structural comparison and epitope prediction. Overall, for seven allergens with sequence identities to the original templates higher than 27%, the backbone root-mean-square deviations were less than 2 Å between the models and the subsequently determined experimental structures for ordered regions. Further, the surface exposure of known IgE epitopes on the models of three major allergens, from peanut (Ara h 1), latex (Hev b 2) and soy (Gly m 4), was very similar to the experimentally determined structures. For three remaining allergens with lower sequence identities to the modeling templates, the 3D folds were correctly identified. However, the accuracy of those models is not sufficient for a reliable epitope mapping.
    Proteins Structure Function and Bioinformatics 12/2012; · 3.34 Impact Factor
  • ABSTRACT: Many concerns have been raised about the potential allergenicity of novel, recombinant proteins introduced into food crops. Guidelines, proposed by WHO/FAO and EFSA, include the use of bioinformatics screening to assess the risk of potential allergenicity or cross-reactivities of all proteins introduced, for example, to improve nutritional value or promote crop resistance. However, there are no universally accepted standards that can be used to encode data on the biology of allergens to facilitate using data from multiple databases in this screening. Therefore, we developed AllerML, a markup language for allergens, to assist in the automated exchange of information between databases and in the integration of the bioinformatics tools that are used to investigate allergenicity and cross-reactivity. As proof of concept, AllerML was implemented using the Structural Database of Allergenic Proteins (SDAP; http://fermi.utmb.edu/SDAP/) database. General implementation of AllerML will promote automatic flow of validated data that will aid in allergy research and regulatory analysis.
    Regulatory Toxicology and Pharmacology 03/2011; 60(1):151-60. · 2.13 Impact Factor
  • ABSTRACT: The complex network induced by a sequence of substitution reactions on a chemical structure generates a partially ordered set (or poset) oriented graph. Such a poset can be used to develop network-QSAR models to predict various molecular properties with quantitative superstructure-activity relationships (QSSARs). These novel network-QSAR models look beyond simple molecular structure and chemical descriptors, and predict molecular properties from the topology of a poset network and from the embedding of a chemical compound into a reaction network. We demonstrate this novel quantitative structure-activity relationship (QSAR) approach for the prediction of chromatographic retention properties of polychlorinated biphenyls (PCBs). PCBs have become worldwide pollutants due to their presence in the environment. Exposure to PCBs can permanently damage the nervous, reproductive, and immune systems. PCBs are known carcinogens and have been linked with the development of various forms of cancer including skin and liver. To predict the chromatographic properties for PCBs we generate the substitution reaction poset, which is a formal chlorosubstitution network which progresses from biphenyl to decachlorobiphenyl. Three network-QSAR models are compared, namely poset-average, splinoid poset, and cluster expansion QSSAR models, to estimate the chromatographic properties in different conditions (of column, temperature, or detector) for all 209 PCB congeners. Excellent results are obtained for all QSSAR chromatographic models. Based on the poset reaction diagram, all three of these QSSAR models reflect in distinct ways the topology of the network describing the interconversion of chemical species. QSSAR equations based on poset reaction networks add a supramolecular dimension to QSAR models.
    Current Bioinformatics 02/2011; 6(1):25-34. · 2.02 Impact Factor
  • ABSTRACT: ChemInform is a weekly Abstracting Service, delivering concise information at a glance that was extracted from about 100 leading journals. To access a ChemInform Abstract of an article which was published elsewhere, please select a “Full Text” option. The original article is trackable via the “References” option.
    ChemInform 01/2010; 27(45).
  • ChemInform 01/2010; 32(32).
  • O. IVANCIUC, T. IVANCIUC, A. T. BALABAN
    ChemInform 01/2010; 29(45).
  • ABSTRACT: Recent progress in the biochemical classification and structural determination of allergens and allergen-antibody complexes has enhanced our understanding of the molecular determinants of allergenicity. Databases of allergens and their epitopes have facilitated the clustering of allergens according to their sequences and, more recently, their structures. Groups of similar sequences are identified for allergenic proteins from diverse sources, and all allergens are classified into a limited number of protein structural families. A gallery of experimental structures selected from the protein classes with the largest number of allergens demonstrates the structural diversity of the allergen universe. Further comparison of these structures and identification of areas that are different from innocuous proteins within the same protein family can be used to identify features specific to known allergens. Experimental and computational results related to the determination of IgE binding surfaces and methods to define allergen-specific motifs are highlighted.
    Bioinformatics and Biology Insights 01/2010; 4:113-25.
  • Ovidiu Ivanciuc, Douglas J. Klein
    ChemInform 01/2010; 33(17).
  • Ovidiu Ivanciuc
    ABSTRACT: Computer-assisted drug design is used to increase the chances of finding valuable drug candidates, by applying a wide range of computational methods, such as machine learning, structure-activity relationships, quantitative structure-activity relationships, molecular mechanics, quantum mechanics, molecular dynamics, and drug-protein docking. Machine learning is an important field of artificial intelligence, and includes a diversity of methods and algorithms that extract rules and functions from large datasets. The most important algorithms are linear discriminant analysis, artificial neural networks, decision trees, lazy learning, k-nearest neighbors, Bayesian methods, Gaussian processes, support vector machines, and kernel algorithms. This special issue presents a representative selection of machine learning applications for the virtual screening of chemical libraries. Machine learning is a rich and dynamic field, with new methods proposed constantly, which makes it difficult to estimate the quality of predictions expected from a particular algorithm. Schwaighofer et al. explore the theoretical and practical aspects of estimating the confidence (error bars) of predictions obtained with quantitative structure-activity relationships based on three prevalent nonlinear regression methods, namely support vector regression, Gaussian processes, and decision trees. This practical aspect of estimating biological activities is currently overlooked in many structure-activity models, but the algorithms presented in this paper demonstrate an efficient approach in computing confidence levels for activity predictions. Naive Bayesian classifiers are robust and efficient algorithms for the rapid virtual screening of large compound libraries. Klon presents a substantial and comprehensive review of Bayesian classifiers that are currently used in drug design and discovery. Bayesian models have consistently been shown to be tolerant of noisy training data, often outperforming more elaborate machine learning algorithms, and may provide reliable predictions even when trained with limited amounts of experimental data. Alternatively, Bayesian classifiers have been used as an effective post-processing technique to integrate sets of predictions obtained with other machine learning methods. Ligand-protein docking is an effective approach in selecting promising inhibitors, but its main drawback is the large computation time necessary to screen large chemical libraries. Plewczynski et al. propose a hybrid method in which a fast machine learning algorithm, random forest, is coupled with ligand-protein docking to obtain a virtual screening procedure that demonstrates in practical applications both speed and reliable predictions. The random forest machine learning is trained with predictions obtained from ligand-protein docking and scoring, and thus the virtual screening procedure may be applied even when trained only with a limited number of experimental data.
    Combinatorial Chemistry & High Throughput Screening 05/2009; 12(5):451-452. · 2.00 Impact Factor
  • Ovidiu Ivanciuc
    ABSTRACT: Computer-assisted drug design is used to increase the chances of finding valuable drug candidates, by applying a wide range of computational methods, such as machine learning, structure-activity relationships, quantitative structure-activity relationships, molecular mechanics, quantum mechanics, molecular dynamics, and drug-protein docking. Machine learning is an important field of artificial intelligence, and includes a diversity of methods and algorithms that extract rules and functions from large datasets. The most important algorithms are linear discriminant analysis, artificial neural networks, decision trees, lazy learning, k-nearest neighbors, Bayesian methods, Gaussian processes, support vector machines, and kernel algorithms. This special issue presents a representative selection of machine learning applications for the virtual screening of chemical libraries. In the opening paper, Melville, Burke and Hirst review recent applications of machine learning techniques in ranking chemical libraries based on their biological activity against a particular protein target. Applications of ligand-based similarity searching and structure-based docking are critically evaluated, with an accent on the major algorithms, such as decision trees, naïve Bayesian classifiers, artificial neural networks, and support vector machines. Chen et al. examine the technical aspects of ligand-based virtual screening, such as available software, molecular descriptors, and performance measures. The procedures reviewed include binary kernel discrimination, k-nearest neighbors, linear discriminant analysis, logistic regression, and probabilistic neural networks. The detailed comparison of various studies is especially valuable in providing an estimate of the level of success that may be expected in virtual screening. The comparison of various machine learning techniques is further explored by Plewczynski, Spieser and Koch in a large-scale evaluation of the screening success. Based on the biological targets explored in the literature, it was found that there is no machine learning approach that consistently provides the best results. Through careful tuning of parameters, most chemical libraries may be modeled with existing algorithms. The study found that a promising class of methods is represented by fusion (or ensemble) classifiers, which combine predictions from several models and are thus able to outperform single classifiers.
    Combinatorial Chemistry & High Throughput Screening 04/2009; 12(4):330-331. · 2.00 Impact Factor
  • ABSTRACT: In many countries regulatory agencies have adopted safety guidelines, based on bioinformatics rules from the WHO/FAO and EFSA recommendations, to prevent potentially allergenic novel foods or agricultural products from reaching consumers. We created the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to combine data that had previously been available only as flat files on Web pages or in the literature. SDAP was designed to be user friendly, to be of maximum use to regulatory agencies, clinicians, as well as to scientists interested in assessing the potential allergenic risk of a protein. We developed methods, unique to SDAP, to compare the physicochemical properties of discrete areas of allergenic proteins to known IgE epitopes. We developed a new similarity measure, the property distance (PD) value, that can be used to detect related segments in allergens with clinically observed cross-reactivity. We have now expanded this work to obtain experimental validation of the PD index as a quantitative predictor of IgE cross-reactivity, by designing peptide variants with predetermined PD scores relative to known IgE epitopes. In complementary work we show how sequence motifs characteristic of allergenic proteins in protein families can be used as fingerprints for allergenicity.
    Regulatory Toxicology and Pharmacology 01/2009; 54(3 Suppl):S11-9. · 2.13 Impact Factor
  • Ovidiu Ivanciuc
    ABSTRACT: Developing machine learning methods to predict peptide-protein binding affinity has become an important approach in proteomics. A diversity of linear and nonlinear machine learning algorithms is applied in quantitative structure-activity relationships (QSAR) to generate predictive models for ligand binding to a biological receptor. QSAR represent regression models that define quantitative correlations between the chemical structure of molecules and their physical, chemical, or biological properties. A QSAR equation predicts a molecular property from a set of molecular descriptors representing the input data to a machine learning algorithm, such as linear regression, partial least squares, artificial neural networks, or support vector machines. Here we present a QSAR comparative study for peptides binding to the human amphiphysin-1 SH3 domain, based on five machine learning methods, namely partial least squares, radial basis function artificial neural networks, support vector machines, Gaussian processes, k-nearest neighbors, and the decision trees REPTree and M5P, as implemented in the machine learning software Weka. The peptide structure was encoded with five amino acid scales, namely the Miyazawa-Jernigan (MJ) substitution matrix, G. Schneider's principal component (GSPC) scale, Lv's DPPS scale, Clementi's GRID scale, and Wold's z scale. The machine learning models were trained with a dataset of 200 peptides, and the QSAR models were tested for a prediction dataset of 684 peptides. The best predictions were obtained with the decision tree M5P for all five amino acid scales, namely z scale q² = 0.543, MJ scale q² = 0.553, GSPC scale q² = 0.557, GRID scale q² = 0.558, and DPPS scale q² = 0.599. These results show that M5P decision trees give predictive QSAR for peptide-protein binding affinity, and should be considered as valuable candidates for other peptide QSAR. Also, the new DPPS scale has clear advantages compared to the previous amino acid descriptors. The study provides support to QSAR approaches based on a large-scale evaluation of machine learning algorithms and diverse classes of structural descriptors.
    Current Proteomics 01/2009; 6(4).
  • ABSTRACT: Reaction networks are viewed as derived from ordinary molecular structures related in reactant-product pairs so as to manifest a chemical super-structure. Such super-structures then are candidates for applications in a general combinatoric chemistry. Notable additional characterization of a reaction super-structure occurs when such reaction graphs are directed, as for example when there is progressive substitution (or addition) on a fixed molecular skeleton. Such a set of partially ordered entities is in mathematics termed a poset, which further manifests a number of special properties, as then might be utilized in different applications. Focus on the overall "super-structural" poset goes beyond ordinary molecular structure in attending to how a structure fits into a (reaction) network, and thereby brings an extra "dimension" to conventional stereochemical theory. The possibility that different molecular properties vary smoothly along chains of interconnections in such a super-structure is a natural assumption for a novel approach to molecular property and bioactivity correlations. Different manners to interpolate/extrapolate on a poset network yield quantitative super-structure/activity relationships (QSSARs), with some numerical fits, e.g., for properties of polychlorinated biphenyls (PCBs) seemingly being quite reasonable. There seems to be promise for combinatoric posetic ideas.
    Combinatorial Chemistry & High Throughput Screening 12/2008; 11(9):723-33. · 2.00 Impact Factor
  • ABSTRACT: Similarities in the sequence and structure of allergens can explain clinically observed cross-reactivities. Distinguishing sequences that bind IgE in patient sera can be used to identify potentially allergenic protein sequences and aid in the design of hypo-allergenic proteins. The property distance index PD, incorporated in our Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/), may identify potentially cross-reactive segments of proteins, based on their similarity to known IgE epitopes. We sought to obtain experimental validation of the PD index as a quantitative predictor of IgE cross-reactivity, by designing peptide variants with predetermined PD scores relative to three linear IgE epitopes of Jun a 1, the dominant allergen from mountain cedar pollen. For each of the three epitopes, 60 peptides were designed with increasing PD values (decreasing physicochemical similarity) to the starting sequence. The peptides synthesized on a derivatized cellulose membrane were probed with sera from patients who were allergic to Jun a 1, and the experimental data were interpreted with a PD classification method. Peptides with low PD values relative to a given epitope were more likely to bind IgE from the sera than were those with PD values larger than 6. Control sequences, with PD values between 18 and 20 to all three epitopes, did not bind patient IgE, thus validating our procedure for identifying negative control peptides. The PD index is a statistically validated method to detect discrete regions of proteins that have a high probability of cross-reacting with IgE from allergic patients.
    Molecular Immunology 11/2008; 46(5):873-83. · 2.65 Impact Factor
  • ABSTRACT: The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver MotifMate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen-specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif-based approach can be used to assess the potential allergenicity of novel proteins.
    Molecular Immunology 11/2008; 46(4):559-68. · 2.65 Impact Factor
  • ABSTRACT: Similarities in sequences and 3D structures of allergenic proteins provide vital clues to identify clinically relevant immunoglobulin E (IgE) cross-reactivities. However, experimental 3D structures are available in the Protein Data Bank for only 5% (45/829) of all allergens catalogued in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP). Here, an automated procedure was used to prepare 3D-models of all allergens where there was no experimentally determined 3D structure or high identity (95%) to another protein of known 3D structure. After a final selection by quality criteria, 433 reliable 3D models were retained and are available from our SDAP Website. The new 3D models extensively enhance our knowledge of allergen structures. As an example of their use, experimentally derived "continuous IgE epitopes" were mapped on 3 experimentally determined structures and 13 of our 3D-models of allergenic proteins. Large portions of these continuous sequences are not entirely on the surface and therefore cannot interact with IgE or other proteins. Only the surface exposed residues are constituents of "conformational IgE epitopes" which are not in all cases continuous in sequence. The surface exposed parts of the experimentally determined continuous IgE epitopes showed a distinct statistical distribution as compared to their presence in typical protein-protein interfaces. The amino acids Ala, Ser, Asn, Gly and particularly Lys have a high propensity to occur in IgE binding sites. The 3D-models will facilitate further analysis of the common properties of IgE binding sites of allergenic proteins.
    Molecular Immunology 08/2008; 45(14):3740-7. · 2.65 Impact Factor
  • Ovidiu Ivanciuc
    Molecular Drug Properties - Measurement and Prediction, 03/2008: pages 85-109; ISBN: 9783527621286

Publication Stats

1k Citations
112.51 Total Impact Points

Institutions

  • 2001–2013
    • University of Texas Medical Branch at Galveston
      • Department of Biochemistry and Molecular Biology
      • Sealy Center for Structural Biology and Molecular Biophysics
      Galveston, TX, United States
  • 2010
    • New York Structural Biology Center
      New York City, New York, United States
  • 2000–2010
    • University of Nice-Sophia Antipolis
      Nice, Provence-Alpes-Côte d'Azur, France
    • Texas A&M University - Galveston
      Galveston, Texas, United States
  • 2007
    • Howard University
      Washington, District of Columbia, United States
  • 2004
    • Galveston College
      Galveston, Texas, United States
  • 1992–2000
    • Polytechnic University of Bucharest
      • Department of Inorganic Chemistry
      Bucharest, Bucuresti, Romania
    • Ruhr-Universität Bochum
      Bochum, North Rhine-Westphalia, Germany