Achraf El Allali

Mohammed VI Polytechnic University · African Genome Center (AGC)

PhD
bioinformatics.um6p.ma

About

26 Publications
19,793 Reads
92 Citations
Introduction
My main research interest lies in facilitating the integration of experimental and computational research, in particular in computational genomics and metagenomics. I am interested in developing algorithms and tools to analyze the large amounts of data produced by sequencing projects. As computation becomes the most expensive part of the sequence analysis pipeline, efficient algorithms become crucial.
Additional affiliations
July 2013 - September 2019
King Saud University
Position
  • Assistant Professor
January 2012 - July 2012
University of South Carolina
Position
  • PostDoc Position
Education
January 2006 - December 2012
University of South Carolina
Field of study
  • Computer Science and Engineering
January 2005 - December 2005
University of South Carolina
Field of study
  • Computer Science and Engineering
January 2002 - December 2004
University of South Carolina
Field of study
  • Computer Engineering

Publications (26)
Article
Full-text available
Accurate gene prediction in metagenomics fragments is a computationally challenging task due to the short read length and the incomplete, fragmented nature of the data. Most gene-prediction programs are based on extracting a large number of features and then applying statistical approaches or supervised classification approaches to predict genes. In o...
Article
Full-text available
Ribonucleic acid (RNA) modifications are post-transcriptional chemical composition changes that have a fundamental role in regulating the main aspects of RNA function. Large datasets have recently become available thanks to developments in deep sequencing and large-scale profiling. This availability of transcriptomic datasets has led to i...
Article
Full-text available
Over the past decade, the problem of finding an efficient gene-targeting marker set or signature for plant trait characterization has remained challenging. Many databases focusing on pathway mining have been released with one major deficiency: they fail to develop marker sets that target only genes controlling a specific pathway or cer...
Article
Full-text available
The development of reliable methods for identification of robust biomarkers for complex diseases is critical for disease diagnosis and prognosis efforts. Integrating multi-omics data with protein-protein interaction (PPI) networks to investigate diseases may help better understand disease characteristics at the molecular level. In this study, we de...
Article
Full-text available
Hepatitis C virus (HCV) infection is a serious disease that threatens human health. Despite consistent efforts to inhibit the virus, it has infected more than 58 million people, with 300,000 deaths per year. The HCV nonstructural protein NS5A plays a critical role in the viral life cycle, as it is a major contributor to the viral replication and assembly pro...
Article
Full-text available
Human immunodeficiency virus (HIV) infection is a major problem for humanity because HIV is constantly changing and developing resistance to current drugs. This necessitates the development of new anti-HIV drugs that take new approaches to combat an ever-evolving virus. One of the promising alternatives to combination antiretroviral therapy (cART)...
Article
The outbreak of the SARS-CoV-2 virus in late 2019 and the spread of the COVID-19 pandemic have caused severe health and socioeconomic damage worldwide. Despite the significant research effort to develop vaccines, antiviral treatments, and repurposed therapeutics to effectively contain the catastrophe, there are no available effective vaccines or an...
Article
Full-text available
Transfer RNAs (tRNAs) are intermediate-sized non-coding RNAs found in all organisms that help translate messenger RNA into protein. Recently, the number of sequenced plant genomes has increased dramatically. The availability of this extensive data greatly accelerates the study of tRNAs on a large scale. Here, 8,768,261 scaffolds/chromosomes contain...
Article
Discovered in Pseudomonas stutzeri, phosphite dehydrogenase (PTDH) is an enzyme that catalyzes the oxidation of phosphite to phosphate while simultaneously reducing NAD+ to NADH. Despite several investigations into the mechanism of reaction and cofactor regeneration, only a few studies have focused on improving the activity and stability of PTDH. I...
Article
Full-text available
Soil salinity is a significant abiotic stress that severely limits global crop production. Chickpea (Cicer arietinum L.) is an important grain legume that plays a substantial role in nutritional food security, especially in the developing world. This study used a chickpea population collected from the International Center for Agricultural Research i...
Article
Full-text available
Recently, Cicer species have experienced increased research interest due to their economic importance, especially in genetics, genomics, and crop improvement. The Cicer arietinum, Cicer reticulatum, and Cicer echinospermum genomes have been sequenced and provide valuable resources for trait improvement. Since the publication of the chickpea draft...
Article
Full-text available
The goal of this research was to develop a new genetic database of simple sequence repeat (SSR) primers for faba and classify them according to their target genes and respective biological processes. Approximately 75,605 and 148,196 previously published genomic and transcriptomic faba sequences, respectively, have been used to detect possible S...
Article
Full-text available
Genomic structural variations are significant causes of genome diversity and complex diseases. With advances in sequencing technologies, many algorithms have been designed to identify structural differences using next-generation sequencing (NGS) data. Due to repetitions in the human genome and the short reads produced by NGS, the discovery of struc...
Article
Full-text available
Background: Due to the technological progress in Next Generation Sequencing (NGS), the amount of genomic data that is produced daily has seen a tremendous increase. This increase has shifted the bottleneck of genomic projects from sequencing to computation and specifically storing, managing and analyzing the large amount of NGS data. Compression t...
Conference Paper
Full-text available
The development of next generation sequencing facilitates the study of metagenomics. Computational gene prediction aims to find the location of genes in a given DNA sequence. Gene prediction in metagenomics is a challenging task because of the short and fragmented nature of the data. Our previous framework minimum redundancy maximum relevance-suppo...
Article
Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases. Consequently, many researchers have applied these approaches to discover the genetic/genomic causes of common complex and rare human diseases, generating multiomics big data that span the continuum of genomics, pr...
Article
Full-text available
Background: Finding accurate genome structural variations (SVs) is important for understanding phenotype diversity and complex diseases. Limited research using classification to find SVs from next-generation sequencing is available. Additionally, the existing algorithms are mainly dependent on an analysis of the alignment signatures of paired-end r...
Article
Full-text available
Background: Computational approaches, specifically machine-learning techniques, play an important role in many metagenomic analysis algorithms, such as gene prediction. Due to the large feature space, current de novo gene prediction algorithms use different combinations of classification algorithms to distinguish between coding and non-coding sequen...
Article
Full-text available
Learning management systems (LMS) provide students and instructors with an environment to virtually access and manage all aspects of their courses. Students can access course material and submit their work online, while instructors can organize their courses, add content, evaluate students, grade their work and communicate with them. Blackboard is...
Article
Full-text available
Background: Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incomplete open reading frames (ORFs...
Conference Paper
Full-text available
Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incomplete open reading frames (ORFs) directly f...
Article
Full-text available
A number of competing methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information theory measure based on...
Conference Paper
Full-text available
Several methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. The approach described herein measures mutual informat...
Chapter
An important impediment in the adoption of mobile computing as a dominant paradigm in many industries has been the human “overhead” required to manage the configuration, tuning and optimization of mobile resources in wireless networks. The mere fact that mobile devices and resources are in motion—where each is denoted as a station on a link-layer w...

Questions (2)
Question
I have a very large feature space. What is the best way to do feature selection when not all the data can fit in memory? Is it better to do feature selection on subsets of the features at a time? If so, what would be the best algorithm for such a scenario?
Thanks
Question
It is known that NGS data has sequencing errors, which affect downstream analyses. Do we have to consider dealing with sequencing errors when designing analysis algorithms, or will the 3rd generation sequencers solve the problem?

Projects (5)
Project
Recently, the number of sequenced plant genomes has increased dramatically. The goal of this project is to create plant biological databases by analyzing this large amount of data and predicting different biological markers. The created databases are robust and include built-in tools, statistical information, and visualization.
Project
Genome-wide identification of TEs in plant genomes.
Archived project
Predict SVs using NGS data. The main goal of the project is to improve prediction of the worst-performing SV type, which is deletions.