Muthukumarasamy Karthikeyan
Research interests
-
Interests: chemoinformatics, bioinformatics
Publications
-
3.88 Impact points
Distributed chemical computing using ChemStar: an open source java remote method invocation architecture applied to large scale molecular data from PubChem.
Journal of chemical information and modeling. 05/2008; 48(4):691-703.
We present the application of a Java remote method invocation (RMI) based open source architecture to distributed chemical computing. This architecture was previously employed for distributed data harvesting of chemical information from the Internet via the Google application programming interface (API; ChemXtreme). Due to its open source character and its flexibility, the underlying server/client framework can be quickly adapted to virtually every computational task that can be parallelized. Here, we present the server/client communication framework as well as an application to distributed computing of chemical properties on a large scale (currently the size of PubChem; about 18 million compounds), using both the Marvin toolkit and the open source JOELib package. As an application, for this set of compounds, the agreement of log P and TPSA between the packages was compared. Outliers were found to be mostly non-druglike compounds, and differences could usually be explained by differences in the underlying algorithms. ChemStar is the first open source distributed chemical computing environment built on Java RMI, which is also easily adaptable to user demands due to its "plug-in architecture". The complete source code as well as calculated properties along with links to PubChem resources are available on the Internet via a graphical user interface at http://moltable.ncl.res.in/chemstar/.
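The server/client idea in this abstract can be illustrated with a minimal sketch. It is not ChemStar and does not use Java RMI: a Python thread pool stands in for the remote clients, and `toy_property` is a hypothetical placeholder for a real calculator such as a log P or TPSA routine. All function names and the batch size are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def toy_property(smiles):
    """Hypothetical stand-in for a real property calculator.

    It only counts alphabetic characters in the SMILES string, which is
    NOT a chemically meaningful descriptor (for example, "Cl" counts as 2).
    """
    return sum(1 for ch in smiles if ch.isalpha())


def process_batch(batch):
    """Work done by one 'client': compute the property for each compound."""
    return [(smiles, toy_property(smiles)) for smiles in batch]


def run_distributed(compounds, batch_size=2, workers=3):
    """'Server' side: hand batches to workers and merge results in input order."""
    batches = chunked(compounds, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the batches were submitted.
        results = list(pool.map(process_batch, batches))
    return [row for batch_result in results for row in batch_result]


compounds = ["CCO", "c1ccccc1", "CC(=O)O", "CN", "O"]
table = run_distributed(compounds)
```

Because each batch is independent, the same pattern applies to any property calculation that can be parallelized, which is the point the abstract makes about the framework.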
-
Photoinduced Electron Transfer (PET) Promoted Carboannulation Strategy: Arene Radical Cation in Carbon-Carbon Bond Formation Reaction (1998)
carboannulation, spirocyclization
-
Encoding and decoding of graphical chemical structure as commercial barcodes
Dept. Scientific & Industrial Research
-
Encoding and Decoding of chemical structures as Commercial Barcodes
Most chemical and pharmaceutical companies maintain large in-house chemical structure databases with research details of every chemical sample prepared, analyzed and used. With the availability of modern automated instruments, it is possible to create large numbers of chemical samples, especially in combinatorial chemistry and high-throughput screening, so inventories are quite large. There is a need for tools that handle chemical information automatically. This project completion report describes in detail the methodology, technical details and applications of encoding and decoding chemical structures as commercial barcodes. The report highlights the milestones achieved in implementing an encoding strategy for barcoding chemical structures for inventory and other research applications. In the first phase, the commercial barcode standards Code 39 and Code 128 were used for encoding. The advanced 2D barcoding technique PDF417, with security features, was also implemented to accommodate complicated molecular structures. The following sections present complete details of the program objectives, concept, implementation, testing and applications, along with references wherever applicable. In recent times barcodes have become part of both the commercial and industrial sectors. Barcodes have been used in a wide variety of applications as a source of information. Typically, barcodes are used at point-of-sale terminals in merchandising for pricing and inventory control. Barcodes are also used in personnel access control systems, mailing systems, and in manufacturing for work-in-process and inventory control systems.
Barcodes are also used for security purposes such as military applications and, very recently, for issuing international visa forms used in travel. The barcodes themselves represent numeric or alphanumeric characters by a series of adjacent stripes of various widths, e.g. the Universal Product Code. Considering the advantages of well-established commercial barcodes and the challenges of retrieving specific chemical data efficiently, it was proposed to integrate barcode automation technology with chemical informatics to enhance research activities in chemistry and related areas. The final objective of this project is to develop a tool that encodes chemical structures as commercial barcodes and retrieves chemical structures from large chemical structure databases through barcode input instead of conventional GUI (graphical user interface) based structure input (Figure 1). Figure 1: Retrieval of chemical structure through barcode scanning. Representing a graphical chemical structure in a computer-readable format is a challenging task, because the pictorial image that is easily understood by a chemist must also be read by a computer for interpretation and further computing. Unlike raster graphics programs, chemical structure drawing programs are specially designed to understand graphical lines (edges) as chemical bonds and points of connectivity (vertices) as atoms with chemical significance, along with their interconnectivity. Chemical structures are generally created through a GUI in which drawing templates are predefined as icons and can be selected for drawing, while the program internally interprets the user's input with chemical significance. Details of chemical structure representation and its utility are described in the subsequent sections. In this project, various commercial barcode formats were investigated for their compatibility with encoding chemical structures.
Additionally, the generated barcodes must be easy to read and interpret with simple barcode readers. Among the various commercial barcodes, the linear barcodes Code 39 and Code 128 were investigated for this purpose. The advanced 2D barcode PDF417 was also explored for encoding large and complicated chemical structures. Department of Scientific and Industrial Research (DSIR) NISSAT, New Delhi
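The compatibility question raised in this report can be made concrete with a small sketch. The report excerpt does not say how structures were mapped to barcode characters, so the example assumes a SMILES string as the text to encode. The 43-character alphabet and the optional modulo-43 check character are standard Code 39 features; the function names are hypothetical.

```python
# The 43 data characters of Code 39, in the order used for the optional
# modulo-43 check character (0-9 -> 0..9, A-Z -> 10..35, then - . space $ / + %).
CODE39_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def unsupported_chars(text):
    """Return the sorted characters of `text` that plain Code 39 cannot encode."""
    return sorted(set(text) - set(CODE39_ALPHABET))


def mod43_check_char(text):
    """Compute the optional Code 39 modulo-43 check character for `text`."""
    bad = unsupported_chars(text)
    if bad:
        raise ValueError(f"not encodable in plain Code 39: {bad}")
    total = sum(CODE39_ALPHABET.index(ch) for ch in text)
    return CODE39_ALPHABET[total % 43]


# Ethanol contains only uppercase letters, so it fits plain Code 39.
ethanol_ok = unsupported_chars("CCO") == []
# Acetic acid uses '(', ')' and '=', which plain Code 39 lacks.
acetic_acid_problems = unsupported_chars("CC(=O)O")
# Aromatic SMILES rely on lowercase letters, which plain Code 39 also lacks.
benzene_problems = unsupported_chars("c1ccccc1")
```

Under this assumption, most SMILES strings need either an escaping scheme or a symbology with a larger character set, such as Code 128 or PDF417, which is consistent with the report exploring all three formats.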
-
3.88 Impact points
Harvesting chemical information from the Internet using a distributed approach: ChemXtreme.
Journal of chemical information and modeling. 46(2):452-61.
The Internet is a comprehensive resource of chemical information which is at the same time largely unstructured. It provides a wealth of scientific information such as experimental data and requires a suitable automated data mining and analysis tool for its meaningful exploration. The Java based software presented here, ChemXtreme, is developed for harvesting chemical information from the Internet employing the Google API in combination with a distributed client/server text analysis architecture based on Java RMI. It represents the first and until now the only toolkit for automated structured data retrieval from the Internet which is itself open source. ChemXtreme employs the "search the search engine" strategy, where the URLs returned from the search engine are analyzed further via textual pattern analysis. This process resembles the manual analysis of the hit list, where relevant data are captured and, by means of human intervention, are mined into a format suitable for further analysis. ChemXtreme on the other hand transforms chemical information automatically into a structured format suitable for storage in databases and further analysis and also provides links to the original information source. The query data retrieved from the search engine by the server is encoded, encrypted, and compressed and then sent to all the participating active clients in the network for parsing. Relevant information identified by the clients on the retrieved Web sites is sent back to the server, verified, and added to the database for data mining and further analysis.
The distributed further analysis of URLs in a client/server architecture scales very favorably, thus producing only minimal overhead.
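Two steps from this abstract lend themselves to a short sketch: packaging a batch of URLs for transfer to clients, and textual pattern analysis on retrieved text. This is not ChemXtreme code. The payload format (JSON, zlib, Base64) and the melting point pattern are assumptions chosen for illustration, and the encryption step mentioned in the abstract is omitted because the Python standard library provides no cipher.

```python
import base64
import json
import re
import zlib


def pack_urls(urls):
    """Server side: serialize, compress, and text-encode a batch of URLs.

    Encryption is deliberately left out of this sketch.
    """
    raw = json.dumps(urls).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def unpack_urls(payload):
    """Client side: reverse the steps of pack_urls."""
    raw = zlib.decompress(base64.b64decode(payload))
    return json.loads(raw.decode("utf-8"))


# Hypothetical pattern: the words "melting point", up to 20 non-digit
# characters, a number, and a Celsius unit written as "°C" or "deg C".
MP_PATTERN = re.compile(
    r"melting\s+point\D{0,20}?(-?\d+(?:\.\d+)?)\s*(?:°\s*C|deg(?:rees)?\s*C)",
    re.IGNORECASE,
)


def extract_melting_points(text):
    """Return every melting point value (in Celsius) matched in `text`."""
    return [float(value) for value in MP_PATTERN.findall(text)]


urls = ["http://example.org/a", "http://example.org/b"]
round_trip = unpack_urls(pack_urls(urls))
found = extract_melting_points("Benzoic acid, melting point: 122.4 deg C (lit.)")
```

A real harvester would need many such patterns plus validation on the server, as the abstract notes that results are verified before being added to the database.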
-
Text-based chemical information locator from the internet using commercial barcodes
Abstracts of Papers of the American Chemical Society, v.223, 72-CINF (2002).
-
3.88 Impact points
General melting point prediction based on a diverse compound data set and artificial neural networks.
Journal of chemical information and modeling. 45(3):581-90.
We report the development of a robust and general model for the prediction of melting points. It is based on a diverse data set of 4173 compounds and employs a large number of 2D and 3D descriptors to capture molecular physicochemical and other graph-based properties. Dimensionality reduction is performed by principal component analysis, while a fully connected feed-forward back-propagation artificial neural network is employed for model generation. The melting point is a fundamental physicochemical property of a molecule that is controlled by both single-molecule properties and intermolecular interactions due to packing in the solid state. Thus, it is difficult to predict, and previously only melting point models for clearly defined and smaller compound sets have been developed. Here we derive the first general model that covers a comparatively large and relevant part of organic chemical space. The final model is based on 2D descriptors, which are found to contain more relevant information than the 3D descriptors calculated. Internal random validation of the model achieves a correlation coefficient of R² = 0.661 with an average absolute error of 37.6 °C. The model is internally consistent with a correlation coefficient of the test set of Q² = 0.658 (average absolute error 38.2 °C) and a correlation coefficient of the internal validation set of Q² = 0.645 (average absolute error 39.8 °C). Additional validation was performed on an external drug data set consisting of 277 compounds.
On this external data set a correlation coefficient of Q² = 0.662 (average absolute error 32.6 °C) was achieved, showing the ability of the model to generalize. Compared to an earlier model for the prediction of melting points of druglike compounds, our model exhibits slightly improved performance, despite the much larger chemical space covered. The remaining model error is due to molecular properties that are not captured using single-molecule based descriptors, namely both inter- and intramolecular interactions and crystal packing, for which examples of and reasons for outliers are given.
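The two statistics quoted in this abstract can be computed as in the following sketch. The abstract does not state which definition of the correlation statistic was used, so the squared Pearson correlation is an assumption here; the function names and the sample values are hypothetical and are not data from the paper.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: this is one common reading of "correlation coefficient";
    the paper may use a different definition for R² and Q².
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return (cov * cov) / (var_o * var_p)


# Made-up melting points in °C: every prediction is exactly 10 °C too high.
observed = [100.0, 150.0, 200.0]
predicted = [110.0, 160.0, 210.0]
mae = mean_absolute_error(observed, predicted)
r2 = squared_correlation(observed, predicted)
```

The made-up example shows why both numbers are reported together: a constant offset leaves the squared correlation at 1 while the average absolute error is still 10 °C.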
-
3.88 Impact points
Encoding and decoding graphical chemical structures as two-dimensional (PDF417) barcodes.
Journal of chemical information and modeling. 45(3):572-80.
A wide range of molecular representations exist today, ranging from human-readable structural diagrams over line notations such as Wiswesser Line Notation (WLN) and SMILES to several dozen computer-readable file formats. Still, to encode molecular structures in a computer-readable way for inputting structures in computer systems those formats are not the method of choice since they are not easily and faultlessly readable via optical recognition. In the present study a two-dimensional (PDF417) barcode representation of molecular structures in SMILES format is explored that enables the user to read and input molecular structures into computer systems in a fully automated fashion. A Lempel-Ziv-Welch (LZW) based compressed version of SMILES is suggested for cases where the size of the structure exceeds the storage capacity of PDF417 barcodes. Alternatively, the compact ACS format may be employed as a structural representation. Input via barcodes is fast, fully automatic, and practically error free because the 2D barcodes used employ error correction. A Web application interface is developed which is able to interpret these barcodes and export them as optimized 3D chemical structures. Applications of this representation range from keeping automated storage systems to Web-based tracking systems of molecular samples. The National Chemical Laboratory, Pune, employs 2D barcode encoded structures for in-house repository management, where barcodes can also be used for querying the database for similar or substructures of the query structure.
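The LZW idea mentioned in this abstract can be sketched with the textbook algorithm. This is not the paper's implementation: it assumes ASCII SMILES input, starts from a 256-entry dictionary, and returns a list of integer codes without packing them into bytes for a barcode. The function names are hypothetical.

```python
def lzw_compress(text):
    """Encode `text` as a list of integer codes using textbook LZW.

    Assumes every character has a code point below 256 (true for SMILES).
    """
    dictionary = {chr(i): i for i in range(256)}
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the longest known phrase
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original text from codes produced by lzw_compress."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # The code refers to the phrase that is being defined right now.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        pieces.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


# A 20-carbon chain: highly repetitive, so LZW needs far fewer codes.
smiles = "C" * 20
codes = lzw_compress(smiles)
restored = lzw_decompress(codes)
```

Repetitive SMILES such as long chains compress well, while short or irregular strings may not shrink at all, so a practical encoder would compress only when the plain SMILES does not fit.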