Jay Vyas

PhD, Bioinformatics, UCHC; MS, Computer Science, Rensselaer Polytechnic Institute; BS, Mathematics, University of Arizona.

Publications

  • ABSTRACT: The problem of formatting data so that it conforms to the required input for scientific data processing tools pervades scientific computing. The CONNecticut Joint University Research Group (CONNJUR) has developed a data translation tool based on a pipeline architecture that partially solves this problem. The CONNJUR Spectrum Translator supports data format translation for experiments that use Nuclear Magnetic Resonance to determine the structure of large protein molecules.
    Computing in Science and Engineering 05/2012; 15(1):76-83. DOI:10.1109/MCSE.2012.60 · 1.25 Impact Factor
  • ABSTRACT: Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for more traditional languages in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming. We present a multi-language source-code repository for software integration and algorithm development, which generally focuses on the fields of machine learning, data processing, and bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org).
    Information Technology: New Generations (ITNG), 2012 Ninth International Conference on; 01/2012
  • ABSTRACT: Many bioinformatic databases and applications focus on a limited domain of knowledge federating links to information in other databases. This segregated data structure likely limits our ability to investigate and understand complex biological systems. To facilitate research, therefore, we have built HIVToolbox, which integrates much of the knowledge about HIV proteins and allows virologists and structural biologists to access sequence, structure, and functional relationships in an intuitive web application. HIV-1 integrase protein was used as a case study to show the utility of this application. We show how data integration facilitates identification of new questions and hypotheses much more rapidly and conveniently than current approaches using isolated repositories. Several new hypotheses for integrase were created as an example, and we experimentally confirmed a predicted CK2 phosphorylation site. Weblink: [http://hivtoolbox.bio-toolkit.com].
    PLoS ONE 05/2011; 6(5):e20122. DOI:10.1371/journal.pone.0020122 · 3.53 Impact Factor
  • ABSTRACT: This poster presents the release of the CONNJUR Spectrum Translator: an open source software tool which converts time-domain NMR data between Varian, Bruker, nmrPipe and RNMRTK formats. It is envisioned that the open source nature of CONNJUR will facilitate the addition of other available file formats.
    52nd Experimental Nuclear Magnetic Resonance Conference; 04/2011
  • ABSTRACT: The CONNecticut Joint University Research (CONNJUR) team is a group of biochemical and software engineering researchers at multiple institutions. The vision of the team is to develop a comprehensive application that integrates a variety of existing analysis tools with workflow and data management to support the process of protein structure determination using Nuclear Magnetic Resonance (NMR). The use of multiple disparate tools and lack of data management, currently the norm in NMR data processing, provides strong motivation for such an integrated environment. This manuscript briefly describes the domain of NMR as used for protein structure determination and explains the formation of the CONNJUR team and its operation in developing the CONNJUR application. The manuscript also describes the evolution of the CONNJUR application through four prototypes and describes the challenges faced while developing the CONNJUR application and how those challenges were met.
  • ABSTRACT: NMR spectroscopists are hindered by the lack of standardization for spectral data among the file formats for various NMR data processing tools. This lack of standardization is cumbersome as researchers must perform their own file conversion in order to switch between processing tools and also restricts the combination of tools employed if no conversion option is available. The CONNJUR Spectrum Translator introduces a new, extensible architecture for spectrum translation and introduces two key algorithmic improvements. The first is translation of NMR spectral data (time and frequency domain) to a single in-memory data model to allow addition of new file formats with two converter modules, a reader and a writer, instead of writing a separate converter to each existing format. Secondly, the use of layout descriptors allows a single fid data translation engine to be used for all formats. For the end user, sophisticated metadata readers allow conversion of the majority of files with minimum user configuration. The open source code is freely available at http://connjur.sourceforge.net for inspection and extension.
    Journal of Biomolecular NMR 03/2011; 50(1):83-9. DOI:10.1007/s10858-011-9497-1 · 3.31 Impact Factor
  • ABSTRACT: γ-Type small, acid-soluble spore proteins (SASP) are the most abundant proteins in spores of at least some members of the bacterial order Bacillales, yet they remain an enigma from both functional and phylogenetic perspectives. Current work has shown that the γ-type SASP or their coding genes (sspE genes) are present in most spore-forming members of Bacillales, including at least some members of the Paenibacillus genus, although they are apparently absent from Clostridiales species. We have applied a new method of searching for sspE genes, which now appear to also be absent from a clade of Bacillales species that includes Alicyclobacillus acidocaldarius and Bacillus tusciae. In addition, no γ-type SASP were found in A. acidocaldarius spores, although several of the DNA-binding α/β-type SASP were present. These findings have elucidated the phylogenetic origin of the sspE gene, and this may help in determining the precise function of γ-type SASP.
    Journal of bacteriology 02/2011; 193(8):1884-92. DOI:10.1128/JB.00018-11 · 2.69 Impact Factor
  • ABSTRACT: Protein-protein interactions are important to understanding cell functions; however, our theoretical understanding is limited. There is a general discontinuity between the well-accepted physical and chemical forces that drive protein-protein interactions and the large collections of identified protein-protein interactions in various databases. Minimotifs are short functional peptide sequences that provide a basis to bridge this gap in knowledge. However, there is no systematic way to study minimotifs in the context of protein-protein interactions or vice versa. Here we have engineered a set of algorithms that can be used to identify minimotifs in known protein-protein interactions and implemented this for use by scientists in Minimotif Miner. By globally testing these algorithms on verified data and on 100 individual proteins as test cases, we demonstrate the utility of these new computation tools. This tool also can be used to reduce false-positive predictions in the discovery of novel minimotifs. The statistical significance of these algorithms is demonstrated by an ROC analysis (P = 0.001).
    Proteins Structure Function and Bioinformatics 01/2011; 79(1):153-64. DOI:10.1002/prot.22868 · 3.34 Impact Factor
  • ABSTRACT: A major problem patients encounter when reading about health related issues is document interpretation, which limits reading comprehension and therefore negatively impacts health care. Currently, searching for medical definitions from an external source is time consuming, distracting, and negatively impacts reading comprehension and memory of the material. SciReader was built as a Java application with a Flex-based front-end client. The dictionary used by SciReader was built by consolidating data from several sources and generating new definitions with a standardized syntax. The application was evaluated by measuring the percentage of words defined in different documents. A survey was used to test the perceived effect of SciReader on reading time and comprehension. We present SciReader, a web-application that simplifies document interpretation by allowing users to instantaneously view medical, English, and scientific definitions as they read any document. This tool reveals the definitions of any selected word in a small frame at the top of the application. SciReader relies on a dictionary of ~750,000 unique Biomedical and English word definitions. Evaluation of the application shows that it maps ~98% of words in several different types of documents and that most users tested in a survey indicate that the application decreases reading time and increases comprehension. SciReader is a web application useful for reading medical and scientific documents. The program makes jargon-laden content more accessible to patients, educators, health care professionals, and the general public.
    BMC Medical Informatics and Decision Making 01/2011; 11:4. DOI:10.1186/1472-6947-11-4 · 1.50 Impact Factor
  • XXIVth International Conference on Magnetic Resonance in Biological Systems; 08/2010
  • Progress in Nuclear Magnetic Resonance Spectroscopy 05/2010; 56(4):329-45. DOI:10.1016/j.pnmrs.2010.02.002 · 8.71 Impact Factor
  • ABSTRACT: Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context.
    BMC Bioinformatics 01/2010; 11:328. DOI:10.1186/1471-2105-11-328 · 2.67 Impact Factor
  • ABSTRACT: Residue conservation is an important, established method for inferring protein function, modularity and specificity. It is important to recognize that it is the 3D spatial orientation of residues that drives sequence conservation. Considering this, we have built a new computational tool, VENN, that allows researchers to interactively and graphically titrate sequence homology onto surface representations of protein structures. Our proposed titration strategies reveal critical details that are not readily identified using other existing tools. Analyses of a bZIP transcription factor and receptor recognition of Fibroblast Growth Factor using VENN revealed key specificity determinants. Weblink: http://sbtools.uchc.edu/venn/.
    Nucleic Acids Research 09/2009; 37(18):e124. DOI:10.1093/nar/gkp616 · 8.81 Impact Factor
  • ABSTRACT: One of the most important developments in bioinformatics over the past few decades has been the observation that short linear peptide sequences (minimotifs) mediate many classes of cellular functions such as protein-protein interactions, molecular trafficking and post-translational modifications. As both the creators and curators of a database which catalogues minimotifs, Minimotif Miner, the authors have a unique perspective on the commonalities of the many functional roles of minimotifs. There is an obvious usefulness in standardizing functional annotations both in allowing for the facile exchange of data between various bioinformatics resources, as well as the internal clustering of sets of related data elements. With these two purposes in mind, the authors provide a proposed syntax for minimotif semantics primarily useful for functional annotation. Herein, we present a structured syntax of minimotifs and their functional annotation. A syntax-based model of minimotif function with established minimotif sequence definitions was implemented using a relational database management system (RDBMS). To assess the usefulness of our standardized semantics, a series of database queries and stored procedures were used to classify SH3 domain binding minimotifs into 10 groups spanning 700 unique binding sequences. Our derived minimotif syntax is currently being used to normalize minimotif covalent chemistry and functional definitions within the MnM database. Analysis of SH3 binding minimotif data spanning many different studies within our database reveals unique attributes and frequencies which can be used to classify different types of binding minimotifs. Implementation of the syntax in the relational database enables the application of many different analysis protocols of minimotif data and is an important tool that will help to better understand specificity of minimotif-driven molecular interactions with proteins.
    BMC Genomics 09/2009; 10:360. DOI:10.1186/1471-2164-10-360 · 4.04 Impact Factor
  • ABSTRACT: Minimotif Miner (MnM) consists of a minimotif database and a web-based application that enables prediction of motif-based functions in user-supplied protein queries. We have revised MnM by expanding the database more than 10-fold to approximately 5000 motifs and standardized the motif function definitions. The web-application user interface has been redeveloped with new features including improved navigation, screencast-driven help, support for alias names and expanded SNP analysis. A sample analysis of prion shows how MnM 2 can be used. Weblink: http://mnm.engr.uconn.edu, weblink for version 1 is http://sms.engr.uconn.edu.
    Nucleic Acids Research 11/2008; 37(Database issue):D185-90. DOI:10.1093/nar/gkn865 · 8.81 Impact Factor
  • ABSTRACT: Short functional peptide motifs cooperate in many molecular functions including protein interactions, protein trafficking, and posttranslational modifications. Viruses exploit these motifs as a principal mechanism for hijacking cells and many motifs are necessary for the viral life-cycle. A virus can accommodate many short motifs in its small genome size providing a plethora of ways for the virus to acquire host molecular machinery. Host enzymes that act on motifs such as kinases, proteases, and lipidation enzymes, as well as protein interaction domains, are commonly mutated in human disease, suggesting that the short peptide motif targets of these enzymes may also be mutated in disease; however, this is not observed. How can we explain why viruses have evolved to be so dependent on motifs, yet these motifs, in general, do not seem to be as necessary for human viability? We propose that short motifs are used at the system level. This system architecture allows viruses to exploit a motif, whereas the viability of the host is not affected by mutation of a single motif.
    Frontiers in Bioscience 02/2008; 13:6455-71. DOI:10.2741/3166 · 4.25 Impact Factor