Hannes Hauswedell
deCODE genetics, Inc. · Statistics department
Dr. rer. nat. / PhD Bioinformatics
21 Publications
3,586 Reads
460 Citations
Additional affiliations
January 2013 - present
Publications (21)
Motivation: Local alignments of query sequences in large databases represent a core part of metagenomic studies and facilitate homology search. Following the development of NCBI BLAST, many applications aimed to provide faster and equally sensitive local alignment frameworks. Most applications focus on protein alignments, while only a few also facilit...
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts wit...
This chapter integrates the results of Chaps. 2 and 3 into a new library design as well as covering questions relating to SeqAn3 as a project and its interactions with other libraries and applications.
Sequence analysis is a domain in bioinformatics which encompasses all computer-aided studies of biological sequence data. This data is produced from molecules such as DNA and RNA, which store a cell’s genetic information, and proteins, which are the “machines” of a cell and provide a myriad of functions including signalling, metabolism and immune r...
Sequence alignment is an arrangement of two or more sequences that visualises which regions are conserved between the set of sequences and which regions differ. Typically, one assumes that the compared sequences are of common evolutionary descent and that mutation events have introduced changes between them, but differences may also be the result o...
The last chapter of this book contains the conclusion with a summary of the previous discussion sections.
The C++ programming language is a general-purpose programming language created by Bjarne Stroustrup in the early 1980s. The original intent was to extend the C programming language with features for object orientation similar to those of the programming language Simula, which was popular at the time. Current versions of C++ combine elements of procedural, funct...
The search module offers data structures and algorithms for efficiently finding exact and approximate matches between the so-called query sequences and a text (also called subject sequence(s) or the reference). Query and subject may each be a single sequence or a collection of sequences, and typically the total subject size is much larger than the...
The reading and writing of files is a crucial part in almost all bioinformatics pipelines. In contrast to other computer-aided sciences that often deal with computationally expensive problems on a small set of input data (e.g. molecular dynamics), sequence analysis in bioinformatics is especially data-intensive. This chapter covers low-level stream...
Based on the analysis of SeqAn2 in Chap. 2 and the ambitious plans developed for SeqAn3 in the previous chapter, the first thing to establish is that SeqAn3 will have to be a new library, not a mere improvement on SeqAn2. While many of its designs will be inspired by SeqAn2, the fundamental shifts in the employed programming techniques mandate star...
This chapter gives a brief overview of the SeqAn library, its important design goals and programming principles, as well as an analysis of how far these were achieved. I will discuss all aspects that I deem necessary for understanding the design and development process of SeqAn3.
This chapter introduces lambda3, a new version of the LAMBDA local alignment application. Background information on homology search, local alignment computation and prior research in this area is given. This includes the author’s contributions to this domain as well as an analysis of the current applications of other authors. But the main purpose o...
SeqAn3 uses and recommends using many containers and views provided by the standard library (or indirectly through SeqAn3’s STD module). This chapter introduces many general-purpose (“non-biological”) ranges that are not (yet) part of the standard library as well as “biological” or “bioinformatical” ranges specific to SeqAn3 alphabets.
We describe the analysis of whole genome sequencing (WGS) of 150,119 individuals from the UK biobank (UKB). This yielded a set of high quality variants, including 585,040,410 SNPs, representing 7.0% of all possible human SNPs, and 58,707,036 indels. The large set of variants allows us to characterize selection based on sequence variation within a p...
This thesis introduces SeqAn3, a new software library built with Modern C++ to solve problems from the domain of sequence analysis in bioinformatics. It discusses previous versions of the library in detail and explains the importance of high-performance programming languages such as C++. Complexity in the design of the library and of the programming...
Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome Venter et al. (2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, ow...
Motivation: Next-generation sequencing technologies produce unprecedented amounts of data, leading to completely new research fields. One of these is metagenomics, the study of large-size DNA samples containing a multitude of diverse organisms. A key problem in metagenomics is to functionally and taxonomically classify the sequenced DNA, to which e...