Simon Orozco Arias

Center for Research in Agricultural Genomics | CRAG · Department of Plant Genetics

Ph.D

About

59
Publications
27,640
Reads
792
Citations
Introduction
Systems and Computing Engineer, specialist in bioinformatics and deep learning, and Ph.D. at Universidad de Caldas, Colombia. He has worked on research in supercomputing, bioinformatics, machine learning, genomics, and transposable elements.
Additional affiliations
March 2016 - February 2018
Centro de Bioinformática y Biología Computacional BIOS
Position
  • HPC Cluster Management


Publications (59)
Article
Full-text available
Understanding the evolution of chromatin conformation among species is fundamental to elucidate the architecture and plasticity of genomes. Nonrandom interactions of linearly distant loci regulate gene function in species-specific patterns, affecting genome function, evolution, and, ultimately, speciation. Yet, data from nonmodel organisms are scar...
Article
Full-text available
Coffea arabica, an allotetraploid hybrid of Coffea eugenioides and Coffea canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid proge...
Preprint
Full-text available
The number of species with high quality genome sequences continues to increase, in part due to scaling up of multiple large scale biodiversity sequencing projects. While the need to annotate genic sequences in these genomes is widely acknowledged, the parallel need to annotate transposable element sequences that have been shown to alter genome arch...
Article
Full-text available
Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, Machine Learning approaches (ML) that leverage computer resources, such as GPUs (Graphical Processing Unit) and multiple CPU (Central...
Preprint
Full-text available
Coffea arabica, an allotetraploid hybrid of C. eugenioides and C. canephora, is the source of approximately 60% of coffee products worldwide. Cultivated accessions have undergone several population bottlenecks resulting in low genetic diversity. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives o...
Article
LTR retrotransposons (LTR-RT) are major components of plant genomes. These transposable elements participate in the structure and evolution of genes and genomes through their mobility and their copy number amplification. For example, they are commonly used as evolutionary markers in genetic, genomic, and cytogenetic approaches. However, the plant r...
Article
Artificial intelligence is revolutionizing all fields that affect people's lives and health. One of the most critical applications is in the study of tumors. This is the case for glioblastoma (GBM), whose behavior needs to be understood to develop effective therapies. Due to advances in single-cell RNA sequencing (scRNA-seq), it is possible to...
Article
Full-text available
Human endogenous retroviruses (HERVs) are LTR retrotransposons that are present in the human genome. Among them, members of the HERV-K (HML-2) group are suspected to play a role in the development of different types of cancer, including lung, ovarian, and prostate cancer, as well as leukemia. Acute myeloid leukemia (AML) is an important disease tha...
Article
Full-text available
Artificial intelligence (AI) is one of the components recognized for its potential to transform the way we live today radically. It makes it possible for machines to learn from experience, adjust to new contributions and perform tasks like human beings. The business field is the focus of this research. This article proposes implementing an incident...
Article
Full-text available
A common task in bioinformatics is to compare DNA sequences to identify similarities between organisms at the sequence level. One approach to such comparison is the dot plot, a 2-dimensional graphical representation used to analyze DNA or protein alignments. Dot-plot alignment software existed before the sequencing revolution, and now there is an ongoi...
Article
Full-text available
Transposable elements (TEs) are mobile elements found in the majority of eukaryotic genomes. TEs deeply impact the structure and evolution of chromosomes and can induce mutations affecting coding genes. In plants, the major group of TEs is long terminal repeat retrotransposons (LTR-RTs). They are classified into superfamilies (Gypsy, Copia) and sub...
Article
Full-text available
LTR-retrotransposons are the most abundant repeat sequences in plant genomes and play an important role in evolution and biodiversity. Their characterization is of great importance to understand their dynamics. However, the identification and classification of these elements remains a challenge today. Moreover, current software can be relatively sl...
Article
Full-text available
The emergence of COVID-19 as a global pandemic forced researchers worldwide in various disciplines to investigate and propose efficient strategies and/or technologies to prevent COVID-19 from further spreading. One of the main challenges to be overcome is the fast and efficient detection of COVID-19 using deep learning approaches and medical images...
Article
Full-text available
Coffee leaf rust is the most damaging disease for coffee cultivation around the world. It is caused by a fungal pathogen, Hemileia vastatrix (Hva), belonging to the phylum Basidiomycota. Coffee leaf rust causes significant yield losses and increases costs related to its control, with evaluated losses of USD 1–2 billion annually. It attacks both the...
Article
Full-text available
Transposable elements are mobile sequences that can move and insert themselves into chromosomes, activating under internal or external stimuli, giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task to understand key aspects of organisms such as phenotype...
Preprint
Full-text available
There is a need to develop affordable and reliable diagnostic tools that help contain the spread of COVID-19. Machine Learning (ML) algorithms have been proposed to design decision-support systems that assess chest X-ray images, which have proven useful for detecting and evaluating disease progression. Many research articles are pu...
Chapter
COVID-19 caused by the SARS-CoV-2 virus has affected healthcare and people's lifestyles worldwide since 2019. Among the available diagnostic tools, reverse transcription-polymerase chain reaction has proven highly accurate. However, the need for a specialized laboratory makes these tests expensive and time-consuming between sample collection and re...
Preprint
Full-text available
Transposable elements (TEs) are mobile genetic elements found in the majority of eukaryotic genomes. Because of their mobility in the host genome, TEs can deeply impact the structure and evolution of chromosomes and can induce mutations affecting coding genes. In response to these potential threats, host genomes use various processes to repress the...
Chapter
Transposable elements are mobile sequences in all eukaryotic genomes. LTR (Long Terminal Repeat) retrotransposons are the most abundant elements in plant genomes where they play a fundamental role in evolution, gene function and genetic diversity. It is therefore important to develop bioinformatic tools to identify them in sequenced genomes and to...
Article
Full-text available
This work presents a framework for coffee maturity classification from multispectral image data based on Convolutional Neural Networks (CNNs). The system leverages the use of multispectral image acquisition systems that generate large amounts of data, by taking advantage of the ability of CNNs to extract meaningful patterns from very high-dimension...
Article
Full-text available
Recent advances in artificial intelligence with traditional machine learning algorithms and deep learning architectures solve complex classification problems. This work presents the performance of different artificial intelligence models to classify two-phase flow patterns, showing the best alternatives for this specific classification problem usin...
Conference Paper
Transposable elements (TEs) are specific structures of the genome of species, which can move from one location to another. For that reason, they can cause mutations or changes that can be negative, such as the appearance of diseases, or beneficial, such as participating in fundamental roles in the evolution of genomes and genetic diversity. Long Te...
Article
Full-text available
Skin cancer is one of the most severe diseases, and medical imaging is among the main tools for cancer diagnosis. The images provide information on the evolutionary stage, size, and location of tumor lesions. This paper focuses on the classification of skin lesion images considering a framework of four experiments to analyze the classification perf...
Article
Full-text available
In recent years, the traditional approach to spatial image steganalysis has shifted to deep learning (DL) techniques, which have improved the detection accuracy while combining feature extraction and classification in a single model, usually a convolutional neural network (CNN). The main contribution from researchers in this area is new architectur...
Article
Full-text available
The COVID-19 global pandemic affects health care and lifestyles worldwide, and its early detection is critical to controlling the spread of cases and mortality. The current leading diagnostic test is reverse transcription polymerase chain reaction (RT-PCR); because the turnaround time and cost of these tests are high, other fast and accessible diagnostic tools are needed....
Chapter
Full-text available
Electromyographic (EMG) signals provide information about muscle activity. In hand movements, each gesture’s execution involves the activation of different combinations of the forearm muscles, which generate distinct electrical patterns. Furthermore, the analysis of muscle activation patterns represented by EMG signals allows recognizing these gest...
Article
Full-text available
Every day more plant genomes are available in public databases and additional massive sequencing projects (i.e., that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive seq...
Article
Full-text available
Caffeine is the most consumed alkaloid stimulant in the world. It is synthesized through the activity of three known N‐methyltransferase proteins. Here we are reporting on the 422‐Mb chromosome‐level assembly of the Coffea humblotiana genome, a wild and endangered, naturally caffeine‐free, species from the Comoro archipelago. We predicted 32,874 ge...
Article
Full-text available
In recent years, Deep Learning techniques applied to steganalysis have surpassed the traditional two-stage approach by unifying feature extraction and classification in a single model, the Convolutional Neural Network (CNN). Several CNN architectures have been proposed to solve this task, improving steganographic images’ detection accuracy, but it...
Article
Full-text available
Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represent a major challenge in the era of massive plant genome sequencing. In addition to their involvement in genome size variation, LTR retrotra...
Article
Full-text available
Advances in Deep Learning (DL) have provided alternative approaches to various complex problems, including the domain of spatial image steganalysis using Convolutional Neural Networks (CNN). Several CNN architectures have been developed in recent years, which have improved the detection accuracy of steganographic images. This work presents a novel...
Article
Full-text available
Cervical cancer forms in the cells lining the cervix and the lower part of the uterus. Because of cost and the limited availability of services for detecting this type of cancer, many women lack access to a prompt and accurate diagnosis, which delays the start of treatment. To solve this prob...
Article
Full-text available
Transposable elements (TEs) are non-static genomic units capable of moving indistinctly from one chromosomal location to another. Their insertion polymorphisms may cause beneficial mutations, such as the creation of new gene function, or deleterious ones in eukaryotes, e.g., different types of cancer in humans. A particular type of TE called LTR-retrotr...
Chapter
Steganography is the process of hiding messages inside an object known as a carrier. The idea is establishing a covert communication channel where messages go unnoticed by observers having access to that channel. Steganalysis is dedicated to the detection of such hidden messages; these messages can be embedded in several different types of digital...
Article
Full-text available
Because of the promising results obtained by machine learning (ML) approaches in several fields, the use of ML to solve problems in bioinformatics is becoming more common every day. In genomics, a current issue is to detect and classify transposable elements (TEs) because of the tedious tasks involved in bioinformatics methods. Thus, ML was recentl...
Article
Full-text available
Cancer classification is a topic of major interest in medicine since it allows accurate and efficient diagnosis and facilitates a successful outcome in medical treatments. Previous studies have classified human tumors using a large-scale RNA profiling and supervised Machine Learning (ML) algorithms to construct a molecular-based classification of c...
Preprint
Full-text available
Electromyographic (EMG) signals provide information about a person's muscle activity. For hand movements, in particular, the execution of each gesture involves the activation of different combinations of the forearm muscles, which generate distinct electrical patterns. Conversely, the analysis of these muscle activation patterns, represented by EMG...
Article
Full-text available
Background Transposable elements (TEs) constitute the most common repeated sequences in eukaryotic genomes. Recent studies demonstrated their deep impact on species diversity, adaptation to the environment and diseases. Although there are many conventional bioinformatics algorithms for detecting and classifying TEs, none have achieved reliable resu...
Article
Full-text available
Transposable elements (TEs) are genomic units able to move within the genome of virtually all organisms. Due to their natural repetitive numbers and their high structural diversity, the identification and classification of TEs remain a challenge in sequenced genomes. Although TEs were initially regarded as "junk DNA", it has been demonstrated that...
Article
Full-text available
Bacterial infections are a major global concern, since they can lead to public health problems. To address this issue, bioinformatics contributes extensively to the analysis and interpretation of in silico data by enabling the genetic characterization of different individuals/strains, such as in bacteria. However, the growing volume of metagenomic da...
Article
Full-text available
A graphical alignment, or "dot plot", is a method for visually representing the analysis of genomic data, commonly used to compare the similarity of two biological sequences. The DOTTER program, developed in 1995, is the most widely used tool for this type of task. The main problem with this software lies in the long time of...
Article
Full-text available
Human endogenous retroviruses (HERVs) make up approximately 8% of the human genome; in particular, they are overexpressed in some cells and tissues of breast carcinoma, which is the most common cancer and the second leading cause of cancer death in women worldwide. Recent research shows that the HERV-K retrovirus family is the d...
Article
Full-text available
The co-occurrence of plant species is a fundamental aspect of plant ecology that contributes to understanding ecological processes, including the establishment of ecological communities and its applications in biological conservation. A priori algorithms can be used to measure the co-occurrence of species in a spatial distribution given by coordina...
Poster
Full-text available
Figure 4. Detailed analysis of four potential cases of horizontal transfer identified in 69 sequenced plant genomes. A. Tree of the 69 plant genomes positioned. Potential cases of horizontal transfer analyzed here are represented by colored lines connecting species, with the lineage involved, the BLAST score and the nucleotide identity percentage...
Article
Full-text available
One particular class of Transposable Elements (TEs), called Long Terminal Repeat (LTR) retrotransposons, comprises the most abundant mobile elements in plant genomes. Their copy number can vary from several hundred to a few million copies per genome, deeply affecting genome organization and function. The detailed classification of LTR ret...
Article
Full-text available
Centromeric regions of plants are generally composed of large arrays of satellites from a specific lineage of Gypsy LTR-retrotransposons, called Centromeric Retrotransposons. Repeated sequences interact with a specific H3 histone, playing a crucial role in kinetochore formation. To study the structure and composition of centromeric regions in th...
Book
Full-text available
Never before have so many sequencing data been available, together with constantly updated technologies that make it possible to study hundreds of species massively and simultaneously for different purposes, notably studies of molecular taxonomy, evolution, and the production of compounds p...
Article
Full-text available
High-performance computing technologies have become very useful tools, used by companies and research centers to run processes and analyze massive data in real time, and have become basic necessities for most research processes. However, one of the pri...
Article
Full-text available
The use of supercomputing in science is more necessary every day, due to the large amount of data that researchers have to analyze to obtain significant results. Many-core technologies such as Intel Xeon Phi were developed as an alternative to accelerate these studies, and their use in supercomputers and especially in bioinformatics is more common nowad...
Poster
Full-text available
Introduction: Transposable elements (TE) are major genome components. They represent the main portion of plant genomes, reaching up to 85% of the genome. They are able to move from one genetic locus to another. TEs were traditionally classified into two main classes according to their replication mode: Class I or retrotransposons and Class II, also...
Poster
Full-text available
Introduction: Data mining has been applied in the field of biological sciences, since the latter produces a great amount of complex and noisy data of unseen proportion. Therefore, ecological data analysis represents an opportunity to explore and comprehend biological systems with bioinformatics tools. The knowledge of occupancy, spatial auto-correl...
Conference Paper
Bioinformatics is now one of the most important fields of modern sciences grouping different fields of research such as Biology, Genomics, Genetics and Molecular evolution. These fields generate a large amount of information via the utilization of the new generations of sequencing techniques (NGS). This amount of data requires the development of a...
Conference Paper
New sequencing technologies have been rapidly increasing the size of current genomes while reducing their cost. These data need to be processed with efficient and innovative tools using high performance computing (HPC), but to take advantage of today's supercomputers, parallel programming techniques and strategies have to be used. Plant...
Article
Full-text available
Bioinformatics is the science that combines biology with information technology. This union arises from the problems posed by phenomena as complex as genetics, the simulation of drug effects, disease prediction, and so on. All of these situations involve large amounts of information and many variables, which gives rise to...
