Marghoob Mohiyuddin's research while affiliated with Roche Sequencing and Life Science and other places

Publications (16)

Article
e14074 Background: Accurate and comprehensive interpretation of genomic variants has become a bottleneck in clinical sequencing applications due to the accelerated implementation of precision oncology and the rapid growth of relevant biomedical findings. We are therefore motivated to build Ephesus, a framework enabling curation of clinical evidence...
Article
Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 milli...
Article
Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyse 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By inve...
Article
Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous wo...
Article
VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously repo...
Conference Paper
Background / Purpose: Currently there is a lack of a comprehensive simulation validation framework for next-generation sequencing (NGS) analysis. Multiple agreed-upon validation datasets are critical for the development of new secondary analysis methods, and read simulation is a bottleneck when simulating high-coverage data. The Genome in a Bottle cons...
Conference Paper
Background / Purpose: Structural variations (SVs) are large genomic rearrangements, including deletions, insertions, inversions, duplications and translocations. SV detection is a key challenge with next-generation sequencing reads since SVs are generally much larger than the read length. Accuracy of SV detection varies significantly by type, region and s...

Citations

... An analysis of 3000 rice genomes identified over 63 million structural variants and classified each SV as an insertion, deletion, inversion, or duplication (Fuentes et al., 2019). Based on these designations, researchers determined that deletions were the most prevalent form of SV, followed by insertions and duplications; inversions were the least common SV type identified in this study. ...
... [24] with the following parameters: --report-genotype-likelihood-max, --min-base-quality 10, --min-mapping-quality 20, --genotype-qualities, --use-mapping-quality, --no-mnps, --no-complex, --max-complex-gap 50, --min-alternate-fraction 0.1, --min-repeat-entropy 1, --no-partial-observations, --min-coverage 10, --max-coverage 500, and --pooled-continuous. Structural variants were called on individual fish using Parliament2 v2.0 [25], which consists of an ensemble of callers, including Breakdancer v1.4.3 [26], BreakSeq2 v2.2 [27], CNVnator v0.3.3 [28], Delly v0.7.2 [29], Lumpy v0.2.13 [30], and Manta v1.4.0 [31]. For variant frequencies and densities, structural variant call-sets of the full cohort of 80 fish and the 32 fish of the training and test sets were merged using SURVIVOR v1.0.3 [32]. ...
... Compared with many previous successful CNV calling methods based on bulk tissue sequencing data [223-233], CNV detection from scRNA-seq is challenging due to several technical limitations, including low and non-uniform genome coverage, amplification biases [234,235] and prevalent monoallelic detection due to transcriptional stochasticity [234,236-238]. The monoallelic bias is more pronounced for lowly expressed genes than for highly expressed genes. ...
... Over the past few years, multiple methods have been proposed and implemented to simulate SVs (e.g., VarSim [67] and SURVIVOR [68]) and to simulate read data (e.g., Nano-Sim [69] and PBsim [70]) [71]. While these methods are helpful for a rapid, early understanding of the utility of an SV caller, they often under-represent the complexity of SVs, either at the level of the allele itself or in the regions where they tend to occur (e.g., repetitive regions). ...