About
101 Publications
19,511 Reads
1,291 Citations
Publications (101)
High-income countries have a wealth of genomics expertise that can be rapidly activated to deal with disease threats. African countries should invest in a federated data-management system for genomics epidemiology to deal with such threats better.
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analys...
While plant genome analysis is gaining speed worldwide, few plant genomes have been sequenced and analyzed on the African continent. Yet, this information holds the potential to transform diverse industries as it unlocks medicinally and industrially relevant biosynthesis pathways for bioprospecting. Considering that South Africa is home to the high...
Genetic evolution of Rift Valley fever virus (RVFV) in Africa has been shaped mainly by environmental changes such as abnormal rainfall patterns and climate change that have occurred over the last few decades. These gradual environmental changes are believed to have effected gene migration from macro (geographical) to micro (reassortment) levels. Pr...
An operon is a set of adjacent genes which are transcribed into a single messenger RNA. Operons allow prokaryotes to efficiently circumvent environmental stresses. It is estimated that about 60% of the Mycobacterium tuberculosis genome is arranged into operons, which makes them interesting drug targets in the face of emerging drug resistance. We th...
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galax...
The growing demands on protein producers and the dwindling available resources have made Hermetia illucens (the black soldier fly, BSF) an economically important species. Insights into the genome of this insect will allow for more robust breeding protocols and more efficient production of BSF as a replacement for animal feed protein. The use...
While the reduction in the cost of WGS is making sequencing more affordable in lower- and middle-income countries (LMICs), public health laboratories in these countries seldom have access to bioinformaticians and system support engineers adept at using the Linux command line and complex bioinformatics software. The COMBAT-TB Workbench provides an o...
Whole Genome Sequencing (WGS) is a powerful method for detecting drug resistance, genetic diversity and transmission dynamics of Mycobacterium tuberculosis. Implementation of WGS in public health microbiology laboratories is impeded by a lack of user-friendly, automated and semi-automated pipelines. We present the COMBAT-TB workbench, a modular, e...
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to m...
Rooibos (Aspalathus linearis), widely known as a herbal tea, is endemic to the Cape Floristic Region of South Africa (SA). It produces a wide range of phenolic compounds that have been associated with diverse health-promoting properties of the plant. The species comprises several growth forms that differ in their morphology and biochemical composit...
Next-generation sequencing (NGS) technologies have revolutionized biological research by generating genomic data that were once unaffordable with traditional first-generation sequencing technologies. These sequencing methodologies provide an opportunity for in-depth analyses of host and pathogen genomes as they are able to sequence millions of templa...
Motivation:
Recent advancements in genomic technologies have enabled high-throughput, cost-effective generation of 'omics' data from M. tuberculosis (M.tb) isolates, which are then shared via a number of heterogeneous, publicly available biological data resources. Albeit useful, fragmented curation negatively impacts the researcher's ability to leve...
The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Health and Heredity in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines ada...
Background:
The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and...
Background:
Drought is the most disastrous abiotic stress that severely affects agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits requires identification and in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tol...
Sequencing, assembly, and annotation of environmental virome samples are challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing...
Annotation table for all proteins discovered in this study comparing MetaVir annotations with those derived from the nr and RefSeq databases from NCBI, considering Pfam.
TEM images of a selection of phage morphologies identified in material from the sampling site. Siphovirus and myovirus morphologies were observed in particular abundance.
Multiple sequence alignment of circoviral replication associated proteins (MSA conducted using MAFFT).
Satellite image of the sampling site (Cu-site) showing its position in relation to nearby copper mines (Hope and Gorob). Inset depicts an enlargement of the site showing the trench dug alongside the road with heaps of sampled copper-laden material, as well as images of the sampled material (including material with a green copper patina).
Protein annotation workflow.
Assembly results for the simulated virome datasets SIM1, SIM2, and SIM3 for all contigs larger than 200 nt.
Assembly workflow.
Multiple sequence alignment of circoviral capsid-like proteins (MSA conducted using MAFFT).
Annotation table for the Thalassomonas phage-like genome described in this study (contig_13).
[This corrects the article DOI: 10.1371/journal.pgen.1005954.].
Background
The National Institutes of Health (USA) has committed 5 years of funding to the Bioinformatics Network of the Human Heredity and Health in Africa initiative. This pan-African network aims to develop capacity for bioinformatics research, in order to provide support to human health genomics research programs ongoing on the continent. Over...
Author
We describe the genome assembly of Asian seabass (Lates calcarifer), a marine teleost with aquaculture relevance. Though >500 eukaryotic genome sequences are available in public repositories, the majority are highly fragmented with incomplete assemblies, which explains why considerable effort and resources are often spent to improve their q...
Observed k-mer distribution and modeling results.
k-mer frequency counting analysis was done for the Illumina genomic reads. Jellyfish [74] was used with the following parameters and commands: jellyfish count -m 21 -s 100000000 -t 5 -o output -C InputFile (counting 21-mer frequencies), jellyfish merge -o output.jf output_* (merging multiple output...
Pipeline for Asian seabass population data analyses.
The flowchart outlines the steps taken for analyses of Asian seabass genome sequence information from 62 fishes collected from 13 regions across its geographic range.
(TIF)
Asian seabass tandem repeat consensus sequences.
(DOCX)
The number of contigs in the primary Asian seabass genome assembly (v1; 3,917 contigs) compared to those of published fish genome assemblies (see S23 Table for more details).
(TIF)
Evaluation of the Asian seabass scaffolded genome assembly (v2) by mapping Illumina PE Genome reads to assembly for linear insert size libraries in the size range of 500 bp (A) and 750 bp (B). The 80X Illumina paired-end HiSeq genome sequence data was mapped to the PacBio-based assembled genome using the CLC Genomics Workbench version 8.5.1 mapping...
A screenshot of the Asian seabass genome assembly (v1) showing a location wherein a ~15 kb region missed by short reads has been captured using long reads from PacBio sequencing.
(TIF)
The Asian seabass genome assembly contains a more continuous cluster of MHC-class I genes compared to the well-assembled G. aculeatus genome.
The L. calcarifer MHC-class I genes were found to be located on eight contigs/scaffolds, four of which were placed onto linkage group 3 (LG3). Four of these eight contigs/scaffolds were also >1Mb in length. T...
Tandem repeats with highest coverage of 23-mer HiSeq reads.
(XLSX)
Summary of transposable elements identified in the Asian seabass genome.
(XLSX)
Placement of genome sequences ≥40kb on assembled Maptigs.
(XLSX)
Inventory of 247 overlaps between ends of neighbouring contigs that were closed during scaffolding of the Asian seabass genome.
(XLSX)
Read statistics for B chromosome-derived sequences.
(XLSX)
Assembly statistics for B chromosome-derived fragments.
(XLSX)
Functional annotation of Asian seabass protein-coding genes.
The number of genes in the top ten entries for A) InterPro, B) KEGG pathways and C) GO.
(TIF)
PCA plots of Asian seabass populations using SNPs.
A total of 64,634 SNPs were used for PCA analyses. The results at different percentages of explained variation are shown in A) and B).
(TIF)
De novo assembly of Single Molecule Restriction Maps.
(XLSX)
Statistics of sequence placement (only sequences ≥40kb) on assembled Maptigs.
(XLSX)
Inventory of potentially misassembled sequences identified by the optical map data.
(XLSX)
Summary of synteny blocks shared between L. calcarifer and D. labrax.
(XLSX)
Summary of synteny blocks shared between L. calcarifer and G. aculeatus.
(XLSX)
Sample collection details for Asian seabass whole genome resequencing effort.
(XLSX)
Summary of sequenced fish genomes.
(XLSX)
Comparison of GC content of the Asian seabass genome assembly (v2) with a few selected fish genomes (A), with representatives from the different classes of vertebrates (B) and comparison of GC content with genome size of selected fishes (C). The GC content of genomes of interest was calculated using a 20 kb sliding window (BedTools utilities [145]). In a...
Maximum likelihood (ML) tree constructed using 123,594 SNPs from Lates calcarifer with Indian region (red), S-E Asia/Philippines (green) and Australia/Papua New Guinea (blue).
(TIF)
Map of the tropical Asia Pacific region showing the sampling locations for Asian seabass across its native range.
India-Western coast (orange), India-Eastern coast (brown), Cambodia (red), Thailand-Eastern Coast (purple), Vietnam (pink), Singapore (black), Philippines (yellow), Indonesia-South Jakarta (green), Indonesia-Kalimantan (dark green), Ind...
RepeatMasker output file tabulating the masking results for vertebrate repeat sequences (A)* and for Asian seabass-specific repeat sequences (B)^.
(XLSX)
In silico enzyme selection for optical mapping.
(XLSX)
Three evaluated MapCards for optical mapping.
(XLSX)
Whole genome MapCard collection summary for optical mapping.
(XLSX)
Details of Asian seabass whole genome resequencing effort.
(XLSX)
Truss morphometric analyses of Asian seabass individuals from the three regions.
(XLSX)
Functions over-represented in duplicated genes of the Asian seabass.
(XLSX)
Cross-validation error analyses to identify the number of Ks which explain variation in the Asian seabass species complex.
Cross-validation was used to find the number of clusters/populations (K) that best explains the observed variation. The best model was obtained at K = 3, with the lowest error level.
(TIF)
The Asian seabass genome assembly (v2; blue bars) anchored to the 24 linkage groups (white bars) using 772 markers [21].
Regions indicated in red represent positions of contigs/scaffolds containing Lca_217 (peri-centromeric sequences).
(TIF)
Truss morphometric analyses of Asian seabass individuals collected from three regions.
Purple and green lines represent truss measurements with blue circles indicating the landmark regions. The descriptions for the landmarks are, 1—tip of the snout, 2—point on dorsal surface of fish that is exactly perpendicular to the base of pectoral fin, 3—anter...
Metrics of the Asian seabass genome assemblies.
(XLSX)
Percentage of repeats in the Asian seabass genome obtained by various tools.
(XLSX)
Details of microsatellites identified in the Asian seabass genome assembly (v2).
(XLSX)
Asian seabass repeat libraries.
(XLSX)
Statistics of tRNAs identified in the Asian seabass genome assembly.
(XLSX)
Chromosome-level assembly of the Asian seabass genome (v3).
(XLSX)
Summary statistics of synteny analyses.
(XLSX)