Publications (14)

  • Article: Distributed chemical computing using ChemStar: an open source Java remote method invocation architecture applied to large scale molecular data from PubChem.
    M. Karthikeyan, S. Krishnan
    ABSTRACT: We present the application of a Java remote method invocation (RMI) based open source architecture to distributed chemical computing. This architecture was previously employed for distributed data harvesting of chemical information from the Internet via the Google application programming interface (API; ChemXtreme). Because it is open source and flexible, the underlying server/client framework can be quickly adapted to virtually any computational task that can be parallelized. Here, we present the server/client communication framework as well as an application to distributed computing of chemical properties on a large scale (currently the size of PubChem, about 18 million compounds), using both the Marvin toolkit and the open source JOELib package. As an application, the log P and TPSA values computed by the two packages for this set of compounds were compared for agreement. Outliers were found to be mostly non-druglike compounds, and the differences could usually be explained by differences in the underlying algorithms. ChemStar is the first open source distributed chemical computing environment built on Java RMI, and its "plug-in architecture" makes it easy to adapt to user demands. The complete source code, the calculated properties, and links to PubChem resources are available on the Internet via a graphical user interface at http://moltable.ncl.res.in/chemstar/.
    Journal of Chemical Information and Modeling 05/2008; 48(4):691-703. · 4.68 Impact Factor
  • Article: Harvesting Chemical Information from the Internet Using a Distributed Approach: ChemXtreme
    M. Karthikeyan, S. Krishnan, Anil Kumar Pandey
    ABSTRACT: The Internet is a comprehensive but largely unstructured resource of chemical information. It provides a wealth of scientific information, such as experimental data, and its meaningful exploration requires suitable automated data mining and analysis tools. The Java-based software presented here, ChemXtreme, was developed for harvesting chemical information from the Internet using the Google API in combination with a distributed client/server text analysis architecture based on Java RMI. It is the first, and so far the only, open source toolkit for automated retrieval of structured data from the Internet. ChemXtreme employs a “search the search engine” strategy, in which the URLs returned by the search engine are analyzed further via textual pattern analysis. This process resembles manual analysis of the hit list, in which relevant data are captured and, through human intervention, mined into a format suitable for further analysis. ChemXtreme, on the other hand, automatically transforms chemical information into a structured format suitable for database storage and further analysis, and also provides links to the original information source. The query data retrieved from the search engine by the server are encoded, encrypted, and compressed, and then sent to all participating active clients in the network for parsing. Relevant information that the clients identify on the retrieved Web sites is sent back to the server, verified, and added to the database for data mining and further analysis. Distributing the URL analysis across the client/server architecture scales very favorably and produces only minimal overhead.
    02/2006;
  • Article: Encoding and Decoding Graphical Chemical Structures as Two-Dimensional (PDF417) Barcodes
    M. Karthikeyan
    ABSTRACT: A wide range of molecular representations exists today, ranging from human-readable structural diagrams through line notations such as Wiswesser Line Notation (WLN) and SMILES to several dozen computer-readable file formats. Still, these formats are not the method of choice for entering molecular structures into computer systems, because they cannot be read easily and faultlessly by optical recognition. The present study explores a two-dimensional (PDF417) barcode representation of molecular structures in SMILES format that enables users to read and input molecular structures into computer systems in a fully automated fashion. A Lempel-Ziv-Welch (LZW) compressed version of SMILES is suggested for cases where the size of the structure exceeds the storage capacity of PDF417 barcodes. Alternatively, the compact ACS format may be employed as a structural representation. Input via barcodes is fast, fully automatic, and practically error free, because the 2D barcodes used employ error correction. A Web application interface was developed that interprets these barcodes and exports them as optimized 3D chemical structures. Applications of this representation range from automated storage systems to Web-based tracking of molecular samples. The National Chemical Laboratory, Pune, employs 2D barcode-encoded structures for in-house repository management, where the barcodes can also be used to query the database in similarity or substructure searches based on the query structure.
    04/2005;
  • Article: PharmTree 2.1.
    Muthukumarasamy Karthikeyan
    Journal of Chemical Information and Computer Sciences. 01/2003; 43:2194-2195.
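The ChemStar and ChemXtreme abstracts above describe a server that hands work to participating clients over Java RMI, with a "plug-in architecture" for the computation itself, but they do not give the interfaces. The Java sketch below only illustrates that general pattern under stated assumptions: the names `PropertyPlugin`, `WorkServer`, `InMemoryServer`, and `AromaticCarbonCount` are hypothetical, the toy plug-in (counting aromatic `c` symbols) stands in for a real Marvin or JOELib descriptor, and the server is called in-process rather than exported and looked up through an RMI registry.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PluginDispatchSketch {

    /** Hypothetical plug-in contract: one numeric property computed per SMILES string. */
    public interface PropertyPlugin {
        String name();
        double compute(String smiles);
    }

    /** Hypothetical remote contract that a client would obtain from an RMI registry. */
    public interface WorkServer extends Remote {
        List<String> nextBatch(int maxSize) throws RemoteException;
        void submit(Map<String, Double> results) throws RemoteException;
    }

    /** Toy plug-in (not a real descriptor): counts aromatic-carbon symbols 'c'. */
    static class AromaticCarbonCount implements PropertyPlugin {
        public String name() { return "aromaticC"; }
        public double compute(String smiles) {
            return smiles.chars().filter(ch -> ch == 'c').count();
        }
    }

    /** In-process stand-in for the server: hands out batches and collects results. */
    static class InMemoryServer implements WorkServer {
        private final List<String> queue;
        final Map<String, Double> collected = new LinkedHashMap<>();

        InMemoryServer(List<String> work) { queue = new ArrayList<>(work); }

        public synchronized List<String> nextBatch(int maxSize) {
            int n = Math.min(maxSize, queue.size());
            List<String> batch = new ArrayList<>(queue.subList(0, n));
            queue.subList(0, n).clear(); // remove the handed-out items from the queue
            return batch;
        }

        public synchronized void submit(Map<String, Double> results) {
            collected.putAll(results);
        }
    }

    /** Client loop: pull a batch, run the plug-in on each item, push results back. */
    static void runClient(WorkServer server, PropertyPlugin plugin, int batchSize)
            throws RemoteException {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        List<String> batch;
        while (!(batch = server.nextBatch(batchSize)).isEmpty()) {
            Map<String, Double> results = new LinkedHashMap<>();
            for (String smiles : batch) {
                results.put(smiles, plugin.compute(smiles));
            }
            server.submit(results);
        }
    }

    public static void main(String[] args) throws RemoteException {
        InMemoryServer server = new InMemoryServer(List.of("c1ccccc1", "CCO", "c1ccncc1"));
        runClient(server, new AromaticCarbonCount(), 2);
        System.out.println(server.collected);
    }
}
```

In a networked deployment the server object would instead be exported (for example with `UnicastRemoteObject.exportObject`) and bound in an RMI registry, so clients on other machines could call `nextBatch` and `submit`. Keeping the work queue on the server is one simple way to let any number of clients share the load.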
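The barcode abstract above suggests an LZW-compressed SMILES string for structures that do not fit into a PDF417 barcode, but it does not spell out the scheme. As an illustration under assumptions, the following is plain textbook LZW in Java with a 256-entry initial byte dictionary and unbounded integer codes; the class name `LzwSmiles` is hypothetical, and packing the codes into bytes for the barcode is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal LZW codec for ASCII SMILES strings (textbook LZW, not the paper's exact scheme). */
public class LzwSmiles {

    /** Emits one code for each longest prefix already in the dictionary. */
    public static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        int nextCode = 256;
        String w = "";
        List<Integer> out = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c > 255) throw new IllegalArgumentException("Non-byte character: " + c);
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc; // keep extending the current match
            } else {
                out.add(dict.get(w));
                dict.put(wc, nextCode++); // learn the new string
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));
        return out;
    }

    /** Rebuilds the same dictionary on the fly while decoding. */
    public static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) return "";
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(i, String.valueOf((char) i));
        int nextCode = 256;
        String w = dict.get(codes.get(0));
        if (w == null) throw new IllegalArgumentException("Bad first code: " + codes.get(0));
        StringBuilder out = new StringBuilder(w);
        for (int k = 1; k < codes.size(); k++) {
            int code = codes.get(k);
            String entry;
            if (dict.containsKey(code)) {
                entry = dict.get(code);
            } else if (code == nextCode) {
                entry = w + w.charAt(0); // code refers to the entry being defined right now
            } else {
                throw new IllegalArgumentException("Bad LZW code: " + code);
            }
            out.append(entry);
            dict.put(nextCode++, w + entry.charAt(0));
            w = entry;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String smiles = "CC(=O)Oc1ccccc1C(=O)O"; // aspirin
        List<Integer> codes = compress(smiles);
        System.out.println(smiles.length() + " chars -> " + codes.size() + " codes");
        System.out.println(decompress(codes).equals(smiles));
    }
}
```

For aspirin (21 characters) this sketch emits 16 codes. Repeated fragments such as ring and carbonyl patterns drive the saving, but codes above 255 need more than 8 bits each, so the actual byte savings depend on the code width chosen when the result is packed for the barcode.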