Alain B Tchagang
National Research Council Canada | NRC · Institute for Information Technology (IIT)
Ph.D. Biomedical Engineering
Publications: 90
Reads: 9,922
Citations: 531
Introduction
Alain B Tchagang is a senior research scientist at the Digital Technologies Research Center (DTRC) of the National Research Council, Canada.
His current focus is on Machine Learning, Artificial Intelligence, Signal Processing, Bioinformatics, and Computational Physics.
Publications (90)
In modern materials discovery, materials are now efficiently screened using machine learning (ML) techniques with target-specific properties for meeting various engineering applications. However, a major challenge that persists with deep generative ML approach is the issue related to lattice reconstruction at the decoding phase, leading to the gene...
Background: Drug discovery and development is the extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate needs to satisfy multiple properties affecting absorption, distribution, metabolism, exc...
Background: Drug discovery is a time-consuming and expensive process. Artificial intelligence (AI) methodologies have been adopted to cut costs and speed up the drug development process, serving as promising in silico approaches to efficiently design novel drug candidates targeting various health conditions. Most existing AI-driven drug discovery s...
Background: Drug discovery and development is an extremely costly and time-consuming process of identifying new molecules as therapeutics that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate needs to satisfy multiple properties affecting absorption, distributi...
Ni‐CeO2 nanoparticles (NPs) are promising nanocatalysts for water splitting and water gas shift reactions due to the ability of ceria to temporarily donate oxygen to the catalytic reaction and accept oxygen after the reaction is completed. Therefore, elucidating how different properties of the Ni‐Ceria NPs relate to the activity and selectivity of...
Reinforcement learning (RL) methods have helped to define the state of the art in the field of modern artificial intelligence, mostly after the breakthrough involving AlphaGo and the discovery of novel algorithms. In this work, we present an RL method, based on Q‐learning, for the structural determination of adsorbate@substrate models in silico, whe...
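For readers unfamiliar with Q-learning, the sketch below shows the generic tabular update rule, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. It is not the method of the paper above, whose abstract is truncated here; the state and action labels, the function name `q_update`, and the default values of `alpha` and `gamma` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    q       : mapping from (state, action) pairs to estimated values
    actions : the actions available in next_state (assumed non-empty)
    alpha   : learning rate (illustrative default)
    gamma   : discount factor (illustrative default)
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: unseen (state, action) pairs default to a value of 0.0.
q_table = defaultdict(float)
q_table[("s1", "x")] = 2.0
new_value = q_update(q_table, "s0", "x", reward=1.0, next_state="s1",
                     actions=["x", "y"], alpha=0.5, gamma=0.5)
```

In a structure-search setting, a state might encode a candidate geometry and the reward might come from a computed energy, but that mapping is an assumption and is not taken from the abstract.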
This paper (i) explores the internal structure of two quantum mechanics datasets (QM7b, QM9), composed of several thousands of organic molecules and described in terms of electronic properties, and (ii) further explores an inverse design approach to molecular design consisting of using machine learning methods to approximate the atomic composition...
Since the form of the exact functional in density functional theory is unknown, we must rely on density functional approximations (DFAs). In the past, very promising results have been reported by combining semi-local DFAs with exact, i.e. Hartree–Fock, exchange. However, the spin-state energy ordering and the predictions of global minima structures...
Machine learning (ML) techniques emerged as viable means for novel materials discovery and target property determination. At the vanguard of discoverable energy materials are perovskite crystalline materials, which are known for their robust design space and multifunctionality. Previous efforts for simulating the discovery of novel perovskites via...
Structural elucidation of chemical compounds is challenging experimentally, and theoretical chemistry methods have added important insight into molecules, nanoparticles, alloys, and materials geometries and properties. However, finding the optimum structures is a bottleneck due to the huge search space, and global search algorithms have been used s...
Drug design and optimization are challenging tasks that call for strategic and efficient exploration of the extremely vast search space. Multiple fragmentation strategies have been proposed in the literature to mitigate the complexity of the molecular search space. From an optimization standpoint, drug design can be considered as a multi-objective...
Finding the optimum material with improved properties for a given application is challenging because data acquisition in materials science and chemistry is time consuming and expensive. Therefore, dealing with small datasets is a reality in chemistry, whether the data are obtained from synthesis or computational experiments. In this work, we propos...
In computational material sciences, Machine Learning (ML) techniques are now competitive alternatives that can be used in determining target properties conventionally resolved by ab initio quantum mechanical simulations or experimental synthesization. The successes realized with ML-based techniques often rely on the quality of the design architectu...
Genetic algorithms (GAs) are stochastic global search methods inspired by biological evolution. They have been used extensively in chemistry and materials science coupled with theoretical methods, ranging from force‐fields to high‐throughput first‐principles methods. The methodology allows an accurate and automated structural determination for mole...
Finding the optimum structures of non-stoichiometric or berthollide materials, such as (1D, 2D, 3D) materials or nanoparticles (0D), is challenging due to the huge chemical/structural search space. Computational methods...
The design of a new therapeutic agent is a time-consuming and expensive process. The rise of machine intelligence provides a grand opportunity of expeditiously discovering novel drug candidates through smart search in the vast molecular structural space. In this paper, we propose a new approach called adversarial deep evolutionary learning (ADEL) t...
Drug discovery is a challenging process with a huge molecular space to be explored and numerous pharmacological properties to be appropriately considered. Among various drug design protocols, fragment-based drug design is an effective way of constraining the search space and better utilizing biologically active compounds. Motivated by fragment-base...
Adsorbate interactions with substrates (e.g. surfaces and nanoparticles) are fundamental for several technologies, such as functional materials, supramolecular chemistry, and solvent interactions. However, modeling these kinds of systems in silico, such as finding the optimum adsorption geometry and energy, is challenging, due to the huge number of...
In this paper, a prototypical deep evolutionary learning (DEL) process is proposed to integrate deep generative model and multi-objective evolutionary computation for molecular design. Our approach enables (1) evolutionary operations in the latent space of the generative model, rather than the structural space, to generate promising novel molecul...
Active learning (AL) has been successfully applied in materials science for the global optimization of clusters and defects in materials. Many important chemistry problems require the structural elucidation of molecules as a first step to the mechanistic elucidation of complex heterogeneous catalysis phenomena. Theoretical methods coupled with glob...
In catalysis, an accurate structural elucidation of molecules, atomic clusters, nanoparticles and solid surfaces is required to understand chemical processes. Therefore, an efficient and automatic structure determination for these systems is of great benefit since it requires a global search within huge chemical spaces. In this work, we propose a n...
Machine learning (ML) methods have recently been widely employed to tackle several problems in quantum mechanics and materials science. Their main objective is to develop surrogate models that can be used to bypass the costly Schrödinger equations and their approximations such as the density functional theory. However, most approaches so far have f...
The increased penetration of renewable energy sources (RES) and electric vehicles (EVs) is resulting in significant challenges to the stability, reliability, and resiliency of the electrical grid due to the intermittent nature of RES and the uncertainty of charging demands of EVs. There is a potential for significant economic returns to use vehicle-to...
The study of materials in the nanoscale regime has important applications for catalytic reactions, the energy industry and medicine. We performed exploratory density functional theory calculations for molybdenum disulphide (MoS2) and calcium carbonate (CaCO3) nanoparticles (NPs), the former being developed as a hydrogenation and coke-prevention cat...
In this paper, we propose a deep evolutionary learning (DEL) process that integrates fragment-based deep generative model and multi-objective evolutionary computation for molecular design. Our approach enables (1) evolutionary operations in the latent space of the generative model, rather than the structural space, to generate novel promising molec...
The availability of huge molecular databases generated by first principles quantum mechanics (QM) computations opens new venues for data science to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with machine learning (ML) known as QM-ML models have been successful in delivering the accuracy of QM at the spee...
The energy stored in electric vehicles (EVs) would be made available to commercial buildings to actively manage energy consumption and costs in the near future. These concepts known as vehicle-to-building (V2B) and vehicle-to-grid (V2G) technologies have the potential to provide storage capacity to benefit both EV and building owners respectively,...
Exact calculation of electronic properties of molecules is a fundamental step for intelligent and rational compounds and materials design. The intrinsically graph-like and non-vectorial nature of molecular data generates a unique and challenging machine learning problem. In this paper we embrace a learning from scratch approach where the quantum me...
The availability of BIG molecular databases derived from quantum mechanics computations represents an opportunity for computational intelligence practitioners to develop new tools with the same accuracy but much lower computational complexity compared to the costly Schrödinger equation. In this study, unsupervised and supervised learning methods are app...
In machine learning and molecular design, there exist two approaches: discriminative and generative. In the discriminative approach dubbed forward design, the goal is to map a set of features/molecules to their respective electronic properties. In the generative approach dubbed inverse design, a set of electronic properties is given and the goal...
High-throughput approximations of quantum mechanics calculations and combinatorial experiments have been traditionally used to reduce the search space of possible molecules, drugs and materials. However, the interplay of structural and chemical degrees of freedom introduces enormous complexity, which the current state-of-the-art tools are not yet d...
Identification of biologically significant subspace clusters (biclusters and triclusters) of genes from microarray experimental data is a very daunting task that emerged, especially with the development of high throughput technologies. Several methods and applications of subspace clustering (biclustering and triclustering) in DNA microarray data anal...
Targeted therapy is a treatment that targets the cancer's specific genes, proteins, or the tissue environment that contributes to cancer growth and survival. Identification of therapeutics targets is a very challenging problem in bioinformatics. An integrative and iterative approach for the identification of drug-gene modules (i.e., groups of genes...
Background: Phenotypic studies in Triticeae have shown that low temperature-induced protective mechanisms are developmentally regulated and involve dynamic acclimation processes. Understanding these mechanisms is important for breeding cold-resistant wheat cultivars. In this study, we combined three computational techniques for the analysis of gene...
In the past decades, many high-throughput studies have been performed to investigate molecular mechanisms underlying epithelial ovarian cancer (EOC), to improve treatments and to develop early detection and staging biomarkers. EOC is still a deadly disease due in part to a lack of screening tools and to the absence of subtype and stage-specific tar...
Genome sequencing efforts for the Triticum aestivum genome produce massive amounts of contigs, preliminary assemblies and putative genes/proteins, nevertheless their annotation is still in its infancy. Given the much larger percentage of annotated genes in other previously sequenced plant genomes such as Arabidopsis thaliana and Oryza sativa and th...
Fusarium head blight (FHB) limits wheat yield and compromises grain quality. We investigated differentially expressed genes after FHB challenge. FHB-susceptible and -resistant common wheat (Triticum aestivum) cultivars were challenged with the toxigenic fungus Fusarium graminearum and gene expression was analyzed using 61K Affymetrix wheat microarr...
Download: http://www.biomedcentral.com/content/pdf/s12864-015-1496-2.pdf
While the gargantuan multi-nation effort of sequencing T. aestivum gets close to completion, the annotation process for the vast number of wheat genes and proteins is in its infancy. Previous experimental studies carried out on model plant organisms such as A. thaliana and O....
Understanding the relationships between transcription factors (TFs) and genes in plants under abiotic stress responses, tolerance and adaptation to adverse environments is very important in developing resilient crop varieties. While experimental methods to characterize stress responsive TFs and their targets are highly accurate, identification and...
Clustering data analysis that is robust to noise and is able to extract the most reliable information from sequential data comprises ranking all of the measurement values across a third dimension of a 3D dataset in a selected one of an increasing order or a decreasing order and producing a three dimensional array of ranked values therefrom. It...
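The ranking step described above can be sketched as follows. This is a minimal illustration and not the procedure from the source: the nested-list layout `data[i][j][k]` (with `k` as the third dimension), the 1-based ranks, the order-of-appearance tie handling, and the function name `rank_along_third_axis` are all assumptions.

```python
def rank_along_third_axis(data, descending=False):
    """Replace each value by its rank along the third dimension.

    data       : nested lists indexed as data[i][j][k]; k is the third dimension
    descending : if True, the largest value receives rank 1
    Returns a new 3D nested list of 1-based integer ranks; data is not modified.
    Tied values are ranked in their order of appearance (an assumption).
    """
    ranked = []
    for plane in data:
        ranked_plane = []
        for series in plane:
            # Indices of the series sorted by value in the requested order.
            order = sorted(range(len(series)), key=lambda k: series[k],
                           reverse=descending)
            ranks = [0] * len(series)
            for rank, k in enumerate(order, start=1):
                ranks[k] = rank
            ranked_plane.append(ranks)
        ranked.append(ranked_plane)
    return ranked


# Hypothetical 1 x 2 x 3 dataset: two series measured at three points.
example = [[[0.5, 2.0, 1.0], [3.0, 1.0, 2.0]]]
increasing = rank_along_third_axis(example)
decreasing = rank_along_third_axis(example, descending=True)
```

Working on ranks rather than raw values is what gives rank-based methods their tolerance to noise in the measured magnitudes, since only the ordering along the third dimension is retained.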
In this chapter, different methods and applications of biclustering algorithms to DNA microarray data analysis that have been developed in recent years are discussed and compared. Identification of biologically significant clusters of genes from microarray experimental data is a very daunting task that emerged, especially with the development of high...
Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden...
Gene Ontology analysis of whole seed Brassica napus clusters. GO analysis of the 11 clusters in whole seed development of Brassica napus.
Statistic of differences between inner and outer cotyledons. The x-axis corresponds to the combination of time points, the y-axis to the number of genes.
Expression profile of NPR1 in different samples. The x-axis corresponds to the time point experiments, the y-axis to the expression level in Log2. Each curve corresponds to a sample.
Vast amounts of data in various forms have been accumulated through many years of functional genomic research throughout the world. It is a challenge to discover and disseminate knowledge hidden in these data. Many computational methods have been developed to solve this problem. Taking analysis of the microarray data as an example, we spent the past...
GOAL jar file and GOAL user manual (GOAL-1.0.zip).
GOAL (GO biological process and KEGG pathway association analysis) and GOSt [19] detail results of six human genes: IFNA4 IL12B IL2RB STAT1 STAT2 IRF9. They were provided to us by one of the reviewers.
Quick video tutorial of GOAL (GOAL_Quick_Video_Tutorial.zip) (HTML, JavaScript & Shockwave Flash files). Requires a web browser to view. This tutorial is also accessible via the GOAL project website at the following URL: http://bioinfo.iit.nrc.ca/GOAL/demo/GOAL-Demo.htm
Modern high throughput experimental techniques such as DNA microarrays often result in large lists of genes. Computational biology tools such as clustering are then used to group together genes based on their similarity in expression profiles. Genes in each group are probably functionally related. The functional relevance among the genes in each gr...
We studied the defense mechanism of Arabidopsis thaliana subjected to Salicylic Acid (SA) treatment for 0, 1, and 8 hours using a broader application of the frequent itemset approach. Four genotypes of the plant were used in this study, Columbia wild type, mutant npr1-3, double mutant tga1 tga4 and triple mutant tga2 tga5 tga6. We defined the major...
Time series gene expression data analysis is used widely to study the dynamics of various cell processes. Most of the time series data available today consist of few time points only, thus making the application of standard clustering techniques difficult. We developed two new algorithms that are capable of extracting biological patterns from short...
Supplementary materials. The following additional data are available with the online version of this paper. Additional file 1 contains the pseudo codes for ASTRO and MiMeSR.
The increasing demand for canola (Brassica napus) for both food (e.g. vegetable oil) and non-food (e.g. biofuel) applications presents significant socio-economic benefits. While genetic engineering offers great potential to speed up the process of canola improvement, such an effort relies on a good understanding of the molecular mechanisms underlyi...
In this paper, we propose a new framework for assessing the biological significance of the outputs of any biclustering algorithm. The framework relies on the p-value computed by a Fisher's exact test on a 2x2 contingency table derived from gene ontology (GO) enrichment level and chromatin immunoprecipitation (ChIP) data enrichment level. We illustr...
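As an illustration of the test named in the abstract above, the sketch below computes a one-sided Fisher's exact test p-value for a 2x2 contingency table using only the Python standard library. The abstract does not say how the GO and ChIP enrichment levels populate the table or which alternative hypothesis is used, so the generic cell layout `[[a, b], [c, d]]`, the enrichment (greater) alternative, and the function name are assumptions.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Returns the probability, under the hypergeometric null with all margins
    fixed, of observing a top-left count of at least a.
    """
    row1 = a + b          # total of the first row
    col1 = a + c          # total of the first column
    n = a + b + c + d     # grand total
    denom = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities over all tables at least as extreme.
    numer = sum(comb(row1, k) * comb(n - row1, col1 - k)
                for k in range(a, upper + 1))
    return numer / denom


# Hypothetical counts: rows and columns each split 8 genes into two groups of 4.
p_value = fisher_exact_greater(3, 1, 1, 3)   # equals 17/70
```

A small p-value would indicate that the overlap counted in the top-left cell is larger than expected by chance given the margins. For production use, an established statistics library is preferable to this hand-rolled version.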
One reason that ovarian cancer is such a deadly disease is that it is not usually diagnosed until it has reached an advanced stage. In this study, we developed a novel algorithm for group biomarker identification using gene expression data. Group biomarkers consist of coregulated genes across normal and different stage diseased tissues. Unlike...
Shown by Cheng and Church to be an NP-hard problem, biclustering is more complex than classical one-dimensional clustering, particularly requiring multiple computing platforms for large and distributed datasets. In this study, we proposed an extension of the robust biclustering algorithm (RoBA) that is capable of perfor...
In this paper, we propose group-biomarkers as an alternative to the traditional single biomarkers used to date for the detection of ovarian cancer. Group-biomarkers are a set of genes that are used simultaneously for the diagnosis of early-stage and/or recurrent cancer. We describe a procedure for identifying such group-biomarkers from a data set o...
Convergences and divergences among related organisms (S. cerevisiae and C. albicans, for example) or the same organisms (healthy and diseased tissues, for example) can often be traced to the differential expression of specific groups of genes. Yet, algorithms to characterize such differences and similarities using gene expression data are not well developed....
In this paper, we describe an approach for finding all order-preserving gene biclusters from a set of DNA microarray experimental data that combines the algorithm that finds biclusters with constant values on columns, developed in one of our previous studies, with an adaptive gene expression level quantization procedure. All the biclusters discovered...
Uncovering genetic pathways is equivalent to finding clusters of genes with expression levels that evolve coherently under subsets of conditions. This can be done by applying a biclustering procedure to gene expression data. Given a microarray data set with M genes and N conditions, we define a bicluster with coherent evolution as a subset of genes...
Biclustering algorithms refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray exp...
Finding clusters of genes with expression levels that evolve coherently under subsets of conditions can help uncover genetic pathways. This can be done by applying a biclustering procedure to gene expression data. Given a microarray data set with M genes and N conditions, we define a bicluster with coherent evolution as a subset of genes with expre...
The NIH/NCI estimates that one out of 57 women will develop ovarian cancer during their lifetime. Ovarian cancer is 90 percent curable when detected early. Unfortunately, many cases of ovarian cancer are not diagnosed until advanced stages because most women do not develop noticeable symptoms. This paper presents an exhaustive identification of all...