
Martin A Smith
- PhD MSc BSc
- Professor at Université de Montréal
About
91
Publications
26,142
Reads
4,834
Citations
Additional affiliations
July 2017 - July 2017
August 2012 - present
June 2013 - present
Publications (91)
Accurate prediction of RNA secondary structures is essential for understanding the evolutionary conservation and functional roles of long noncoding RNAs (lncRNAs) across diverse species. In this study, we benchmarked two leading tools for predicting evolutionarily conserved RNA secondary structures (ECSs), SISSIz and R-scape, using two distinct exp...
Convolutional Neural Networks (CNNs) have been central to the Deep Learning revolution and played a key role in initiating the new age of Artificial Intelligence. However, in recent years newer architectures such as Transformers have dominated both research and practical applications. While CNNs still play critical roles in many of the newer develo...
The persistence of replication-competent HIV-1 in people living with HIV (PLWH) is a barrier to a cure for HIV. Early-phase studies of clinical interventions to deplete the intact persistent HIV-1 reservoir are ongoing. However, the ability to distinguish intact proviruses is limited by sequence variation and the predominance of defective proviruse...
Nanopore sequencing depends on the FAST5 file format, which does not allow efficient parallel analysis. Here we introduce SLOW5, an alternative format engineered for efficient parallelization and acceleration of nanopore data analysis. Using the example of DNA methylation profiling of a human genome, analysis runtime is reduced from more than two w...
DNA replication timing and three-dimensional (3D) genome organization are associated with distinct epigenome patterns across large domains. However, whether alterations in the epigenome, in particular cancer-related DNA hypomethylation, affect higher-order levels of genome architecture is still unclear. Here, using Repli-Seq, single-cell Repli-Seq...
Nanopore sequencing is an emerging genomic technology with great potential. However, the storage and analysis of nanopore sequencing data have become major bottlenecks preventing more widespread adoption in research and clinical genomics. Here, we elucidate an inherent limitation in the file format used to store raw nanopore data, known as FAST5, t...
Nanopore sequencing is an emerging genomic technology with great potential. However, the storage and analysis of nanopore sequencing data have become major bottlenecks preventing more widespread adoption in research and clinical genomics. Here, we elucidate an inherent limitation in the file format used to store raw nanopore data – known as FAST5 –...
Background
Hepatitis C (HCV) and many other RNA viruses exist as rapidly mutating quasi-species populations in a single infected host. High throughput characterization of full genome, within-host variants is still not possible despite advances in next generation sequencing. This limitation constrains viral genomic studies that depend on accurate id...
Current methods for dengue virus (DENV) genome amplification amplify parts of the genome in at least 5 overlapping segments and then combine the output to characterize a full genome. This process is laborious, costly and requires at least 10 primers per serotype, thus increasing the likelihood of PCR bias. We introduce an assay to amplify near ful...
DNA replication timing and three-dimensional (3D) genome organisation occur across large domains associated with distinct epigenome patterns to functionally compartmentalise genome regulation. However, it is still unclear if alterations in the epigenome, in particular cancer-related DNA hypomethylation, can directly result in alterations to cancer...
Nanopore sequencing enables direct measurement of RNA molecules without conversion to cDNA, thus opening the gates to a new era for RNA biology. However, the lack of molecular barcoding of direct RNA nanopore sequencing data sets severely affects the applicability of this technology to biological samples, where RNA availability is often limited. He...
Background:
Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a comp...
Non-coding RNA has a proven ability to direct and regulate chromatin modifications by acting as scaffolds between DNA and histone-modifying complexes. However, it is unknown if ncRNA plays any role in DNA replication and epigenome maintenance, including histone eviction and re-instalment of histone-modifications after genome duplication. Isolation...
Background
The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often first choice for police and military work, as well as protection, disability assistance, and search-and-rescue. Yet, GSDs are well known to be susceptible to a range of genetic diseases that can interfer...
Nanopore sequencing has enabled sequencing of native RNA molecules without conversion to cDNA, thus opening the gates to a new era for the unbiased study of RNA biology. However, a formal barcoding protocol for direct sequencing of native RNA molecules is currently lacking, limiting the efficient processing of multiple samples in the same flowcell....
Familial Adult Myoclonic Epilepsy (FAME) is characterised by cortical myoclonic tremor usually from the second decade of life and overt myoclonic or generalised tonic-clonic seizures. Four independent loci have been implicated in FAME on chromosomes (chr) 2, 3, 5 and 8. Using whole genome sequencing and repeat primed PCR, we provide evidence that c...
The epitranscriptomics field has undergone an enormous expansion in the last few years; however, a major limitation is the lack of generic methods to map RNA modifications transcriptome-wide. Here, we show that using direct RNA sequencing, N6-methyladenosine (m6A) RNA modifications can be detected with high accuracy, in the form of systematic error...
Nanopore sequencing has the potential to revolutionise genomics by realising portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these applications requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. For instance, comparing raw nanopore signals...
Motivation:
The management of raw nanopore sequencing data poses a challenge that must be overcome to facilitate the creation of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualisation, and signal processing....
High-throughput single-cell RNA sequencing is a powerful technique but only generates short reads from one end of a cDNA template, limiting the reconstruction of highly diverse sequences such as antigen receptors. To overcome this limitation, we combined targeted capture and long-read sequencing of T-cell-receptor (TCR) and B-cell-receptor (BCR) mR...
In hypersaline environments, Nanohaloarchaeota (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaeota [DPANN] superphylum) are thought to be free-living microorganisms. We report cultivation of 2 strains of Antarctic Nanohaloarchaeota and show that they require the haloarchaeon Halorubrum lacusprofundi for growth. By per...
The human brain is one of the last frontiers of biomedical research. Genome-wide association studies (GWAS) have succeeded in identifying thousands of haplotype blocks associated with a range of neuropsychiatric traits, including disorders such as schizophrenia, Alzheimer’s and Parkinson’s disease. However, the majority of single nucleotide polymor...
Chirality is a property describing any object that is inequivalent to its mirror image. Due to its 5′–3′ directionality, a DNA sequence is distinct from a mirrored sequence arranged in reverse nucleotide-order, and is therefore chiral. A given sequence and its opposing chiral partner sequence share many properties, such as nucleotide composition an...
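The mirroring described in this abstract can be illustrated with a minimal sketch. It assumes, as the abstract states, that the mirrored sequence is simply the original written in reverse nucleotide order (no complementing). The function names `mirror` and `is_achiral` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def mirror(seq: str) -> str:
    """Return the sequence written in reverse nucleotide order (its mirror image)."""
    return seq[::-1]


def is_achiral(seq: str) -> bool:
    """A sequence that reads the same in both directions equals its own mirror."""
    return seq == mirror(seq)


seq = "ACGTTG"
mirrored = mirror(seq)

# The mirrored partner is a different sequence with the same nucleotide composition.
same_composition = Counter(seq) == Counter(mirrored)
```

Here `mirrored` differs from `seq` while `same_composition` is true, which matches the abstract's point that a sequence and its chiral partner share nucleotide composition.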
The advent of Nanopore sequencing has realised portable genomic research and applications. However, state of the art long read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimisation and reference g...
The management of raw nanopore sequencing data poses a challenge that must be overcome to accelerate the development of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualisation, and signal processing. Its modul...
The human brain is one of the last frontiers of biomedical research. Genome-wide association studies (GWAS) have succeeded in identifying thousands of haplotype blocks associated with a range of neuropsychiatric traits, including disorders such as schizophrenia, Alzheimer's and Parkinson's disease. However, the majority of single nucleotide polymor...
The field of epitranscriptomics has undergone an enormous expansion in the last few years; however, a major limitation is the lack of generic methods to map RNA modifications transcriptome-wide. Here we show that using Oxford Nanopore Technologies, N6-methyladenosine (m6A) RNA modifications can be detected with high accuracy, in the form of systema...
High-throughput single-cell RNA-Sequencing is a powerful technique for gene expression profiling of complex and heterogeneous cellular populations such as the immune system. However, these methods only provide short-read sequence from one end of a cDNA template, making them poorly suited to the investigation of gene-regulatory events such as mRNA s...
fast5_fetcher is a tool for fetching nanopore fast5 files to save time and simplify downstream analysis. fast5_fetcher does the heavy lifting for you.
Chirality is a geometric property describing any object that is inequivalent to a mirror image of itself. Due to its 5′–3′ directionality, a DNA sequence is distinct from a mirrored sequence arranged in reverse nucleotide order, and is therefore chiral. A given sequence and its opposing chiral partner sequence share many properties, such as nucleotid...
The advent of nanopore sequencing has realised portable genomic research and applications. However, state of the art long read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimization and reference g...
The complexity of microbial communities, combined with technical biases in next-generation sequencing, poses a challenge to metagenomic analysis. Here, we develop a set of internal DNA standards, termed "sequins" (sequencing spike-ins), that together constitute a synthetic community of artificial microbial genomes. Sequins are added to environmental...
The diversity of processed transcripts in eukaryotic genomes poses a challenge for the classification of their biological functions. Sparse sequence conservation in non-coding sequences and the unreliable nature of RNA structure predictions further exacerbate this conundrum. Here, we describe a computational method, DotAligner, for the unsupervised...
Protein-coding RNAs represent only a small fraction of the transcriptional output in higher eukaryotes. The remaining RNA species encompass a broad range of molecular functions and regulatory roles, a consequence of the structural polyvalence of RNA polymers. Although several classes of small noncoding RNAs are relatively well characterized, the acce...
RNA modifications have historically been considered fine-tuning chemo-structural features of infrastructural RNAs, such as rRNAs, tRNAs and snoRNAs. This view has changed dramatically in recent years, to a large extent as a result of systematic efforts to map and quantify various RNA modifications in a transcriptome-wide manner, revealing th...
RNA has the intrinsic property to base pair, forming complex structures fundamental to its diverse functions. Here, we develop PARIS, a method based on reversible psoralen crosslinking for global mapping of RNA duplexes with near base-pair resolution in living cells. PARIS analysis in three human and mouse cell types reveals frequent long-range str...
The initial steps in the analysis of Next Generation Sequencing (NGS) data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly due to evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands e...
Evolutionarily conserved RNA secondary structures are a robust indicator of purifying selection and, consequently, molecular function. Evaluating their genome-wide occurrence through comparative genomics has consistently been plagued by high false-positive rates and divergent predictions. We present a novel benchmarking pipeline aimed at calibratin...
The ability to sequence genomes and characterize their products has begun to reveal the central role for regulatory RNAs in biology, especially in complex organisms. It is now evident that the human genome contains not only protein-coding genes, but also tens of thousands of non-protein coding genes that express small and long ncRNAs (non-coding RN...
181 genes are oppositely regulated by mPINC knockdown and overexpression. This list includes all genes that are significantly altered in the opposite direction between mPINC knockdown (undifferentiated or differentiated) and mPINC overexpression (p>0.01, FC>1.4).
(XLSX)
GO analysis of genes altered by mPINC knockdown reveals GO terms associated with development and differentiation. This spreadsheet includes lists of genes identified by microarray analysis that are associated with each GO term (adjusted p value>0.05) by DAVID GO analysis.
(XLSX)
mPINC is expressed at varying levels in alveolar cells of the midpregnant gland. (A) In situ hybridization shows that some alveolar cells have higher levels of mPINC1.0 (large arrow) while some alveolar cells appear to express less (small arrow) (10×). (B) mPINC1.6 is also expressed at higher levels in some alveolar clusters (large arrow) than in o...
436 gene probes are significantly changed in mPINC knockdown cells. This list includes all genes that are significantly altered following mPINC knockdown (p<0.01, FC>1.8 for either siPINC1.0/1.6 or siPINC relative to siNEG).
(XLSX)
Pregnancy-induced noncoding RNA (PINC) and retinoblastoma-associated protein 46 (RbAp46) are upregulated in alveolar cells of the mammary gland during pregnancy and persist in alveolar cells that remain in the regressed lobules following involution. The cells that survive involution are thought to function as alveolar progenitor cells that rapidly...
The human mitochondrial genome comprises a distinct genetic system transcribed as precursor polycistronic transcripts that are subsequently cleaved to generate individual mRNAs, tRNAs, and rRNAs. Here, we provide a comprehensive analysis of the human mitochondrial transcriptome across multiple cell lines and tissues. Using directional deep sequenci...
The identification of cancer-associated long noncoding RNAs (lncRNAs) and the investigation of their molecular and biological functions are important to understand the molecular biology of cancer and its progression. Although the functions of lncRNAs and the mechanisms regulating their expression are largely unknown, recent studies are beginning to...
Long noncoding RNAs (lncRNAs) are increasingly recognized to play major regulatory roles in development and disease. To identify novel regulators in breast biology, we identified differentially regulated lncRNAs during mouse mammary development. Among the highest and most differentially expressed was a transcript (Zfas1) antisense to the 5' end of...
We have previously shown that the Leishmania genome possesses two widespread families of extinct retroposons termed Short Interspersed DEgenerated Retroposons (SIDER1/2) that play a role in post-transcriptional regulation. Moreover, we have demonstrated that SIDER2 retroposons promote mRNA degradation. Here we provide new insights into the mechanism...
Alignment and phylogeny of 1416 LmSIDERs. Detailed representation of the data presented in Figure 1. The project can be opened and modified with the freely available JALVIEW software [8].
Comparative genomic distribution of SIDERs in three Leishmania species. Complete distribution of SIDERs in all chromosomes of L. major, L. infantum, and L. braziliensis. Same description as in Figure 4.
Selectivity scatter-plot of initial SIDER profiles. Unaligned input sequences were scanned with the initial HMM profile of two SIDER subgroups using the hmms global alignment command from HMMER-1.8.5. The bit-scores for each sequence are plotted in the bidimensional grid.
Linkage between LmSIDER1 and LmDIRE sequences. All LmDIREs identified in the L. major genome are represented by large black lines whereby the numbers correspond to the chromosome location and the order of appearance on the chromosome. The corresponding LmSIDER1 sequences are shown above the DIRE line by a thin red line.
All SIDER positions identified with refined HMM profiles in three Leishmania species. Microsoft Excel spreadsheet containing all hits scoring over 0 bits with the refined HMM profiles reported in this work. Separate tabs for individual profiles.
Gene Ontology term enrichment of SIDER1- and SIDER2-associated genes in all three Leishmania species. Microsoft Excel spreadsheet displaying all results with P-values < 0.05 obtained with AmiGO term enrichment from GeneDB website (see text for details). Each tab encloses the data for one particular SIDER subgroup.
We have recently identified two large families of extinct transposable elements termed Short Interspersed DEgenerated Retroposons (SIDERs) in the parasitic protozoan Leishmania major. The characterization of SIDER elements was limited to the SIDER2 subfamily, although members of both subfamilies have been shown to play a role in the regulation of g...
Genes differentially expressed in Leishmania infantum intracellular amastigotes. This Table lists all the Leishmania infantum genes that are differentially expressed in intracellular amastigotes as determined by DNA microarray studies.
Genes differentially expressed in Leishmania major promastigotes. This Table lists all the Leishmania major genes that are differentially expressed in promastigotes as determined by DNA microarray studies.
Genes differentially expressed in Leishmania infantum promastigotes. This Table lists all the Leishmania infantum genes that are differentially expressed in promastigotes as determined by DNA microarray studies.
Genes differentially expressed in Leishmania major intracellular amastigotes. This Table lists all the Leishmania major genes that are differentially expressed in lesion-derived amastigotes as determined by DNA microarray studies.
Comparison of expression levels obtained by quantitative real-time PCR between different Leishmania species or experimental models of infection. qRT-PCR analysis was performed on selected differentially expressed genes as determined by microarray experiments. The same RNA used for the microarray analysis was also used for qRT-PCR. (A) Expression va...
Primers used for quantitative real-time PCR expression analysis. Table lists the sequences of the primers used for quantitative real-time PCR expression analysis to validate DNA microarray studies.
Polyadenylated ESTs. PDF document of all 218 polyadenylated EST accession IDs used to build scanning matrices in this work.
Over-represented hexamers. A Microsoft Word document containing the 10 highest scoring hexamers identified with YMF and FindExplanator programs. An alignment of 223 sequences was used to compare regions encompassing the [-125; +125] of genomic poly(A) sites to the [-800; -126] and [+126; +800] regions.
PRED-A-TERM program. The prediction algorithm described in this manuscript has been implemented as a Java program that can be used to scan query sequences on any operating system. Once extracted, detailed usage instructions can be viewed in the README.txt file.
PSSM scanning results for different matrix sizes and scanning distances. A Microsoft Excel spreadsheet containing all sensitivity results for single-matrix scanning approaches.
Leishmania parasites cause a diverse spectrum of diseases in humans ranging from spontaneously healing skin lesions (e.g., L. major) to life-threatening visceral diseases (e.g., L. infantum). The high conservation in gene content and genome organization between Leishmania major and Leishmania infantum contrasts their distinct pathophysiologies, sug...
Leishmania and other members of the Trypanosomatidae family diverged early on in eukaryotic evolution and consequently display unique cellular properties. Their apparent lack of transcriptional regulation is compensated by complex post-transcriptional control mechanisms, including the processing of polycistronic transcripts by means of coupled tran...
Trypanosomatids are unicellular protists that include the human pathogens Leishmania spp. (leishmaniasis), Trypanosoma brucei (sleeping sickness), and Trypanosoma cruzi (Chagas disease). Analysis of their recently completed genomes confirmed the presence of non-long-terminal repeat retrotransposons, also called retroposons. Using the 79-bp signatur...
Alignment of 1,013 LmSIDER2
The alignment, saved in PHYLIP format, was performed as described in Materials and Methods with the introduction of gaps (-) to maximize the alignments. The LmSIDER names indicate the chromosomal localization followed by the model number.
(1.6 MB DOC)
Differential Gene Expression of L. major SIDER2-Containing Transcripts Analyzed by DNA Microarrays
(77 KB PDF)
Distribution of Genes and Retroposons on the 36 L. major Chromosomes
The central scale bars showing the size of the chromosomes (kb) separate features located on different strands. The position of protein-encoding genes and retroposons is indicated by vertical bars with the color code shown on the right margin. Protein-encoding genes and DIREs are...
LmSIDER Sequences Annotated in the L. major Genome
The chromosome localization (“chr”), genomic coordinates (“start” and “end”), strand localization (“str”), family (“fam”), and name (“name”) of the annotated LmSIDERs are indicated. The first column (“ID”) shows the name of each LmSIDER annotated in the database (version 4.0 of the assembly) hosted...
Primers Used for the Generation of the LUC-Expressing Vectors
(59 KB PDF)
Alignment of the Core Sequence of 1,013 LmSIDER2
This alignment was generated from the one presented in Figure S1 by deleting all positions showing a gap for at least 50% of the aligned sequences.
(553 KB DOC)
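The column-filtering step described for this core alignment can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `drop_gappy_columns`, its parameters, and the toy alignment are hypothetical, and the alignment is assumed to be a list of equal-length strings with `-` as the gap character.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Remove every column whose gap fraction is at least max_gap_fraction.

    alignment: list of equal-length strings (one aligned sequence per entry).
    Returns a new list of strings containing only the retained columns.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    # Keep a column only if fewer than max_gap_fraction of the sequences have a gap there.
    keep = [
        i
        for i in range(length)
        if sum(row[i] == gap for row in alignment) / n_seqs < max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in alignment]


# Toy alignment (hypothetical data): columns 1 and 2 are gapped in half of the sequences.
toy = ["AC-GT", "A--GT", "ACTG-", "A-TGT"]
core = drop_gappy_columns(toy)
```

With the default threshold, a column gapped in exactly half of the sequences is removed, which follows the "at least 50%" wording of the caption.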
Distribution of Genes and Retroposons on the 11 T. brucei Megachromosomes
The central scale bars showing the size of the chromosomes (kb) separate features located on different strands. The position of protein-encoding genes and retroposons is indicated by vertical bars with the color code shown in the right margin. Protein-encoding genes and ingi...