Peter van Heusden

University of the Western Cape (UWC) · South African National Bioinformatics Institute (SANBI)

About

98
Publications
15,547
Reads
1,098
Citations

Publications (98)
Article
Full-text available
Genetic evolution of Rift Valley fever virus (RVFV) in Africa has been shaped mainly by environmental changes such as abnormal rainfall patterns and climate change that has occurred over the last few decades. These gradual environmental changes are believed to have effected gene migration from macro (geographical) to micro (reassortment) levels. Pr...
Preprint
An operon is a set of adjacent genes which are transcribed into a single messenger RNA. Operons allow prokaryotes to efficiently circumvent environmental stresses. It is estimated that about 60% of the Mycobacterium tuberculosis genome is arranged into operons, which makes them interesting drug targets in the face of emerging drug resistance. We th...
Preprint
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analys...
Article
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galax...
Article
Full-text available
The growing demands on protein producers and the dwindling available resources have made Hermetia illucens (the black soldier fly, BSF) an economically important species. Insights into the genome of this insect will better allow for robust breeding protocols, and more efficient production to be used as a replacement of animal feed protein. The use...
Article
Full-text available
While the reduction in the cost of WGS is making sequencing more affordable in lower- and middle-income countries (LMICs), public health laboratories in these countries seldom have access to bioinformaticians and system support engineers adept at using the Linux command line and complex bioinformatics software. The COMBAT-TB Workbench provides an o...
Preprint
Full-text available
Whole Genome Sequencing (WGS) is a powerful method for detecting drug resistance, genetic diversity and transmission dynamics of Mycobacterium tuberculosis. Implementation of WGS in public health microbiology laboratories is impeded by a lack of user-friendly, automated and semi-automated pipelines. We present the COMBAT-TB workbench, a modular, e...
Preprint
Full-text available
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to m...
Article
Full-text available
Rooibos (Aspalathus linearis), widely known as a herbal tea, is endemic to the Cape Floristic Region of South Africa (SA). It produces a wide range of phenolic compounds that have been associated with diverse health promoting properties of the plant. The species comprises several growth forms that differ in their morphology and biochemical composit...
Article
Full-text available
Next-generation sequencing (NGS) technologies have revolutionized biological research by generating genomic data that were once unaffordable by traditional first-generation sequencing technologies. These sequencing methodologies provide an opportunity for in-depth analyses of host and pathogen genomes as they are able to sequence millions of templa...
Article
Full-text available
Motivation: Recent advancements in genomic technologies have enabled high-throughput, cost-effective generation of 'omics' data from M. tuberculosis (M.tb) isolates, which then get shared via a number of heterogeneous publicly available biological data resources. Albeit useful, fragmented curation negatively impacts the researcher's ability to leve...
Article
Full-text available
The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Health and Heredity in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines ada...
Article
Full-text available
Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and...
Article
Full-text available
Background: Drought is the most disastrous abiotic stress that severely affects agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits requires identification and an in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tol...
Article
Full-text available
Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing...
Data
Annotation table for all proteins discovered in this study comparing MetaVir annotations with those derived from the nr and RefSeq databases from NCBI, considering Pfam.
Data
TEM images of a selection of phage morphologies identified in material from the sampling site. Siphovirus and myovirus morphologies were observed in particular abundance.
Data
Multiple sequence alignment of circoviral replication associated proteins (MSA conducted using MAFFT).
Data
Satellite image of the sampling site (Cu-site) showing its position in relation to nearby copper mines (Hope and Gorob). Inset depicts an enlargement of the site showing the trench dug alongside the road with heaps of sampled copper laden material, as well as images of the sampled material (including material with a green copper patina).
Data
Assembly results for the simulated virome datasets SIM1, SIM2, and SIM3 for all contigs larger than 200 nt.
Data
Multiple sequence alignment of circoviral capsid-like proteins (MSA conducted using MAFFT).
Data
Annotation table for the Thalassomonas phage-like genome described in this study (contig_13).
Article
Full-text available
Background: The National Institutes of Health (USA) has committed 5 years of funding to the Bioinformatics Network of the Human Heredity and Health in Africa initiative. This pan-African network aims to develop capacity for bioinformatics research, in order to provide support to human health genomics research programs ongoing on the continent. Over...
Article
Full-text available
We describe the genome assembly of Asian seabass (Lates calcarifer), a marine teleost with aquaculture relevance. Though >500 eukaryotic genome sequences are available in public repositories, the majority are highly fragmented with incomplete assemblies, which explains why considerable effort and resources are often spent to improve their q...
Data
Observed k-mer distribution and modeling results. k-mer frequency counting analysis was done for the Illumina genomic reads. Jellyfish [74] was used with the following parameters and commands: jellyfish count -m 21 -s 100000000 -t 5 -o output -C InputFile (counting 21-mer frequencies), jellyfish merge -o output.jf output_* (merging multiple output...
Data
Pipeline for Asian seabass population data analyses. The flowchart outlines the steps taken for analyses of Asian seabass genome sequence information from 62 fishes collected from 13 regions across its geographic range. (TIF)
Data
Asian seabass tandem repeat consensus sequences. (DOCX)
Data
The number of contigs in the primary Asian seabass genome assembly (v1; 3,917 contigs) compared to those of published fish genome assemblies (see S23 Table for more details). (TIF)
Data
Evaluation of the Asian seabass scaffolded genome assembly (v2) by mapping Illumina PE Genome reads to assembly for linear insert size libraries in the size range of 500 bp (A) and 750 bp (B). The 80X Illumina paired-end HiSeq genome sequence data was mapped to the PacBio-based assembled genome using the CLC Genomics Workbench version 8.5.1 mapping...
Data
A screenshot of the Asian seabass genome assembly (v1) showing a location wherein a ~15 kb region missed by short reads has been captured using long reads from PacBio sequencing. (TIF)
Data
The Asian seabass genome assembly contains a more continuous cluster of MHC-class I genes compared to the well-assembled G. aculeatus genome. The L. calcarifer MHC-class I genes were found to be located on eight contigs/scaffolds, four of which were placed onto linkage group 3 (LG3). Four of these eight contigs/scaffolds were also >1Mb in length. T...
Data
Tandem repeats with highest coverage of 23-mer HiSeq reads. (XLSX)
Data
Summary of transposable elements identified in the Asian seabass genome. (XLSX)
Data
Placement of genome sequences ≥40kb on assembled Maptigs. (XLSX)
Data
Inventory of 247 overlaps between ends of neighbouring contigs that were closed during scaffolding of the Asian seabass genome. (XLSX)
Data
Read statistics for B chromosome-derived sequences. (XLSX)
Data
Assembly of B chromosome-derived fragment statistics. (XLSX)
Data
Functional annotation of Asian seabass protein-coding genes. The number of genes in top ten entries for A) Interpro B) KEGG pathways and C) GO. (TIF)
Data
PCA plots of Asian seabass populations using SNPs. A total of 64,634 SNPs were used for PCA analyses. The results at different percentages of explained variation are shown in A) and B). (TIF)
Data
De novo assembly of Single Molecule Restriction Maps. (XLSX)
Data
Statistics of sequence placement (only sequences ≥40kb) on assembled Maptigs. (XLSX)
Data
Inventory of potentially misassembled sequences identified by the optical map data. (XLSX)
Data
Summary of synteny blocks shared between L. calcarifer and D. labrax. (XLSX)
Data
Summary of synteny blocks shared between L. calcarifer and G. aculeatus. (XLSX)
Data
Sample collection details for Asian seabass whole genome resequencing effort. (XLSX)
Data
Summary of sequenced fish genomes. (XLSX)
Data
Comparison of GC content of the Asian seabass genome assembly (v2) with a few selected fish genomes (A), with representatives from the different classes of vertebrates (B) and comparison of GC content with genome size of selected fishes (C). The GC content of the genomes of interest was calculated using a 20 kb sliding window (BedTools utilities [145]). In a...
Data
Maximum likelihood (ML) tree constructed using 123,594 SNPs from Lates calcarifer with Indian region (red), S-E Asia/Philippines (green) and Australia/Papua New Guinea (blue). (TIF)
Data
Map of the tropical Asia Pacific region showing the sampling locations for Asian seabass across its native range. India-Western coast (orange), India-Eastern coast (brown), Cambodia (red), Thailand-Eastern Coast (purple), Vietnam (pink), Singapore (black), Philippines (yellow), Indonesia-South Jakarta (green), Indonesia-Kalimantan (dark green), Ind...
Data
RepeatMasker output file tabulating the masking results for vertebrate repeat sequences (A)* and for Asian seabass-specific repeat sequences (B)^. (XLSX)
Data
In silico enzyme selection for optical mapping. (XLSX)
Data
Three evaluated MapCards for optical mapping. (XLSX)
Data
Whole genome MapCard collection summary for optical mapping. (XLSX)
Data
Details of Asian seabass whole genome resequencing effort. (XLSX)
Data
Truss morphometric analyses of Asian seabass individuals from the three regions. (XLSX)
Data
Functions over-represented in duplicated genes of the Asian seabass. (XLSX)
Data
Cross-validation error analyses to identify the number of Ks which explain variation in the Asian seabass species complex. Cross-validation methodology was used to find the number of Ks (clusters/populations) that best explains the observed variation. The best model was obtained at K = 3, with the lowest error level. (TIF)
Data
The Asian seabass genome assembly (v2; blue bars) anchored to the 24 linkage groups (white bars) using 772 markers [21]. Regions indicated in red represent positions of contig/scaffold containing Lca_217 (peri-centromeric sequences). (TIF)