Marcel Ehrhardt’s research while affiliated with Freie Universität Berlin and other places


Publications (5)


Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading
  • Article

October 2018 · 149 Reads · 43 Citations

Bioinformatics

Stefan Budach · Pascal Costanza · [...]

Motivation: Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments that is applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting the SIMD (single instruction, multiple data) instructions of modern processors. We then extended the module with two layers of thread-level parallelization, where we (a) distribute many independent alignments across multiple threads and (b) inherently parallelize a single alignment computation using a work-stealing approach that produces a dynamic wavefront progressing along the minor diagonal.

Results: We evaluated our alignment vectorization and parallelization on different processors and use cases, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors. The AVX512-BW (Byte and Word) instruction set, available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than with our previous sequential alignment module.

Availability and implementation: The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2 and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module to further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms.

Supplementary information: Supplementary data are available at Bioinformatics online.
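The minor-diagonal wavefront rests on a simple dependency argument: in the standard dynamic programming recurrence, cell $(i, j)$ reads only $(i-1, j-1)$, $(i-1, j)$ and $(i, j-1)$, so all cells with the same $i + j$ are independent of one another. The following is a minimal Python sketch of that idea, assuming global alignment with a hypothetical linear gap scheme (`MATCH`, `MISMATCH`, `GAP`). It is not SeqAn's API and uses neither SIMD nor threads; it only shows that filling the matrix anti-diagonal by anti-diagonal yields the same score as the usual row-major order.

```python
# Minimal sketch (not SeqAn's API): global alignment score with linear gap
# costs, filled either row by row or anti-diagonal by anti-diagonal.
from typing import List

MATCH, MISMATCH, GAP = 1, -1, -1  # hypothetical scoring scheme


def cell(H: List[List[int]], a: str, b: str, i: int, j: int) -> int:
    """Needleman-Wunsch recurrence for cell (i, j)."""
    if i == 0:
        return j * GAP
    if j == 0:
        return i * GAP
    diag = H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
    return max(diag, H[i - 1][j] + GAP, H[i][j - 1] + GAP)


def nw_row_major(a: str, b: str) -> int:
    """Reference order: fill the matrix row by row."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            H[i][j] = cell(H, a, b, i, j)
    return H[len(a)][len(b)]


def nw_wavefront(a: str, b: str) -> int:
    """Fill the matrix one anti-diagonal (i + j == d) at a time.

    Cells on anti-diagonal d read only anti-diagonals d - 1 and d - 2, so the
    iterations of the inner loop are independent of each other.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):
        for i in range(max(0, d - m), min(n, d) + 1):
            H[i][d - i] = cell(H, a, b, i, d - i)
    return H[n][m]


if __name__ == "__main__":
    print(nw_wavefront("GATTACA", "GCATGCU") == nw_row_major("GATTACA", "GCATGCU"))  # prints True
```

In a parallel implementation, the inner loop of `nw_wavefront` is the part that could be handed to separate threads; the inter-sequence layout described in the abstract instead places cells from many different alignments side by side in one vector register.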


The SeqAn C++ template library for efficient sequence analysis: A resource for programmers

September 2017 · 354 Reads · 104 Citations

Journal of Biotechnology

Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, a widening gap opened between state-of-the-art algorithmic techniques and the algorithmic components actually used in widespread tools. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008).

Results: The SeqAn library has matured considerably since its first publication 9 years ago. In this article, we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools.

Conclusions: We anticipate that SeqAn will continue to be a valuable resource, especially since it has started to actively support various hardware acceleration techniques in a systematic manner.


Delta-Fast Tries: Local Searches in Bounded Universes with Linear Space

July 2017 · 13 Reads · 2 Citations

Lecture Notes in Computer Science

Let $w \in \mathbb{N}$ and $U = \{0, 1, \dots, 2^w - 1\}$ be a bounded universe of $w$-bit integers. We present a dynamic data structure for predecessor searching in $U$. Our structure needs $O(\log \log \Delta)$ time for queries and $O(\log \log \Delta)$ expected time for updates, where $\Delta$ is the difference between the query element and its nearest neighbor in the structure. Our data structure requires linear space. This improves a result by Bose et al. [CGTA, 46(2), pp. 181–189].
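To pin down what a query returns and what $\Delta$ measures, here is a minimal Python reference, assuming the stored keys are kept in a sorted list. It answers predecessor queries by binary search in $O(\log n)$ time, so it illustrates only the semantics, not the delta-fast trie or its $O(\log \log \Delta)$ bound; all function names are hypothetical.

```python
# Reference semantics only (hypothetical helper names): predecessor queries
# over w-bit keys and the distance Delta that appears in the query bound.
from bisect import bisect_left, bisect_right
from typing import List, Optional


def in_universe(x: int, w: int) -> bool:
    """True if x lies in U = {0, 1, ..., 2**w - 1}."""
    return 0 <= x < (1 << w)


def predecessor(keys: List[int], q: int) -> Optional[int]:
    """Largest stored key <= q, or None if there is none (keys sorted)."""
    i = bisect_right(keys, q)
    return keys[i - 1] if i > 0 else None


def successor(keys: List[int], q: int) -> Optional[int]:
    """Smallest stored key >= q, or None if there is none (keys sorted)."""
    i = bisect_left(keys, q)
    return keys[i] if i < len(keys) else None


def delta(keys: List[int], q: int) -> int:
    """Distance between q and its nearest stored neighbor (keys non-empty)."""
    candidates = [k for k in (predecessor(keys, q), successor(keys, q)) if k is not None]
    return min(abs(q - k) for k in candidates)


if __name__ == "__main__":
    w = 8
    keys = sorted([3, 10, 17, 250])
    assert all(in_universe(k, w) for k in keys)
    q = 12
    print(predecessor(keys, q), delta(keys, q))  # prints: 10 2
```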


EPR-Dictionaries: A Practical and Fast Data Structure for Constant Time Searches in Unidirectional and Bidirectional FM Indices

April 2017 · 39 Reads · 16 Citations

Lecture Notes in Computer Science

The unidirectional FM index was introduced by Ferragina and Manzini in 2000 and allows searching for a pattern in the index in one direction. The bidirectional FM index (2FM) was introduced by Lam et al. in 2009. It allows searching for a pattern by extending an infix of the pattern arbitrarily to the left or right. If $\sigma$ is the size of the alphabet, then the method of Lam et al. can conduct one step in $\mathcal{O}(\sigma)$ time while needing $\mathcal{O}(\sigma \cdot n)$ space, using constant-time rank queries on bit vectors. Schnattinger and colleagues improved this time to $\mathcal{O}(\log \sigma)$ while using $\mathcal{O}(\log \sigma \cdot n)$ bits of space for both the FM and the 2FM index. This is achieved by the use of binary wavelet trees.
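The one-directional search can be sketched as classic backward search. The Python below is a minimal illustration, assuming a text terminated by a unique sentinel `$` that sorts before every other character. It builds the BWT by naive suffix sorting and answers rank queries with a single lookup in a per-character prefix-count table, which trades memory (one counter per character per position) for constant-time rank; it uses neither a wavelet tree nor an EPR-dictionary. The class and method names are hypothetical.

```python
# Minimal FM-index sketch (hypothetical names): naive BWT construction,
# constant-time rank from a per-character prefix-count table, backward search.
from typing import Dict, List


def bwt(text: str) -> str:
    """BWT via naive suffix sorting; text must end with a unique, smallest '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)  # text[-1] == '$' for i == 0


class FMIndex:
    def __init__(self, text: str) -> None:
        assert text.endswith("$") and text.count("$") == 1 and min(text) == "$"
        self.L = bwt(text)
        alphabet = sorted(set(text))
        # C[c]: number of characters in the text that are smaller than c.
        self.C: Dict[str, int] = {}
        total = 0
        for c in alphabet:
            self.C[c] = total
            total += text.count(c)
        # occ[c][i]: occurrences of c in L[0:i]; one row per character,
        # so rank is a single lookup at the cost of sigma * (n + 1) counters.
        self.occ: Dict[str, List[int]] = {}
        for c in alphabet:
            row = [0]
            for ch in self.L:
                row.append(row[-1] + int(ch == c))
            self.occ[c] = row

    def rank(self, c: str, i: int) -> int:
        return self.occ[c][i]

    def count(self, pattern: str) -> int:
        """Number of occurrences of pattern, matched right to left."""
        b, e = 0, len(self.L)  # half-open suffix-array interval [b, e)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            b = self.C[c] + self.rank(c, b)
            e = self.C[c] + self.rank(c, e)
            if b >= e:
                return 0
        return e - b


if __name__ == "__main__":
    index = FMIndex("mississippi$")
    print(index.count("ssi"), index.count("ppi"), index.count("sss"))  # prints: 2 1 0
```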


Constant-time and space-efficient unidirectional and bidirectional FM-indices using EPR-dictionaries
  • Article
  • Full-text available

August 2016 · 127 Reads · 3 Citations

We introduce a new method for conducting an exact search in a uni- and bidirectional FM index in $\mathcal{O}(1)$ time per step while using $\mathcal{O}(\log \sigma \cdot n) + o(\log \sigma \cdot \sigma \cdot n)$ bits of space. This is done by replacing the wavelet tree with a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). To our knowledge, this is the first constant-time method for a search step in 2FM indices and a space improvement for FM indices. We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition, we compared our implementation with other freely available implementations of bidirectional indices and show that ours is approximately 2.7 to 4.6 times faster. This will have a large impact on many bioinformatics applications that rely on practical implementations of (2)FM indices, e.g., for read mapping.

Fig. 2: Plot of the runtime for EPR for different alphabets and the runtime of WT divided by log σ.
Fig. 5: 2-level dictionary. Blocks and superblocks are allocated for each character (only one shown).
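The 2-level dictionary in the Fig. 5 caption stores, per character, counts at superblock and block boundaries. The Python below is a simplified sketch of that layout, assuming hypothetical block sizes (`block`, `blocks_per_super`): `rank(c, i)` adds the absolute count stored for the enclosing superblock, the count stored for the enclosing block relative to that superblock, and a scan of at most `block - 1` characters inside the block. Judging from the space bound in the abstract, the actual EPR-dictionary keeps the text bit-packed and resolves the in-block part with word-level operations; this sketch substitutes a plain scan, so it is not the paper's implementation.

```python
# Simplified two-level rank dictionary (hypothetical names and block sizes):
# per character, absolute counts at superblock starts and counts relative to
# the superblock at block starts; the rest is a short in-block scan.
from typing import Dict, List


class TwoLevelRank:
    def __init__(self, text: str, block: int = 4, blocks_per_super: int = 4) -> None:
        self.text = text
        self.B = block
        self.S = block * blocks_per_super  # superblock length, a multiple of B
        alphabet = sorted(set(text))
        self.sup: Dict[str, List[int]] = {c: [] for c in alphabet}
        self.blk: Dict[str, List[int]] = {c: [] for c in alphabet}
        counts = {c: 0 for c in alphabet}    # occurrences in text[0:pos]
        sup_base = {c: 0 for c in alphabet}  # counts at the current superblock start
        for pos in range(len(text) + 1):
            if pos % self.S == 0:
                for c in alphabet:
                    self.sup[c].append(counts[c])
                    sup_base[c] = counts[c]
            if pos % self.B == 0:
                for c in alphabet:
                    self.blk[c].append(counts[c] - sup_base[c])
            if pos < len(text):
                counts[text[pos]] += 1

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in text[0:i], for 0 <= i <= len(text)."""
        if c not in self.sup:
            return 0
        start = (i // self.B) * self.B
        stored = self.sup[c][i // self.S] + self.blk[c][i // self.B]
        return stored + self.text[start:i].count(c)  # at most B - 1 characters


if __name__ == "__main__":
    r = TwoLevelRank("mississippi$")
    print(r.rank("s", 7), r.rank("i", 12))  # prints: 4 4
```

Because every superblock boundary is also a block boundary, the relative block counters stay small, which is what lets them be stored more compactly than absolute counts.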


Citations (4)


... Consequently, a number of parallelized implementations have been developed for computing pairwise sequence alignments on a variety of architectures including CPUs [2][3][4][5][6][7], GPUs [8][9][10][11][12][13][14][15][16][17][18][19], and FPGAs [20][21][22][23]. They typically target two types of application scenarios: ...

Reference:

CUDASW++4.0: ultra-fast GPU-based Smith–Waterman protein sequence database search
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading
  • Citing Article
  • October 2018

Bioinformatics

... Software Baselines: We compared the throughput of DP-HLS Kernels #1-7, #11-12, #15 with state-of-the-art parallel CPU implementations, using SeqAn3 [76] (v3.3.0), a widely-used, multi-threaded bioinformatics library, as the baseline. For the Two-Piece affine (#5) and protein sequence alignment (#15) kernels, we used Minimap2 [48] (v2.28) and the command-line version of EMBOSS Water [44] (v6.6.0) as our software baselines, respectively. ...

The SeqAn C++ template library for efficient sequence analysis: A resource for programmers
  • Citing Article
  • September 2017

Journal of Biotechnology

... Exact occurrences of a search pattern $P$ in $T$ are represented in the bidirectional FM-index by two intervals: an interval $[b, e[$ over $SA$ and an interval $[b_r, e_r[$ over $SA^r$, such that all suffixes $T_{SA[i]}$ for $b \le i < e$ have $P$ as their prefix, while suffixes $T^r_{SA^r[i]}$ for $b_r \le i < e_r$ are prefixed by $P^r$, the reverse of $P$. For example, for search pattern $P$ = "ATG", $SA[3, 5[$ refers to the suffixes of $T$ prefixed by $P$, while $SA^r[9, 11[$ refers to suffixes of $T^r$ prefixed by $P^r$ = "GTA". Patterns are matched character by character: given a pattern $P$ and its intervals $[b, e[$ and $[b_r, e_r[$, the intervals $[b', e'[$ and $[b_r', e_r'[$ of the extended pattern $cP$ (extendBackward) or $Pc$ (extendForward) can be found in $O(1)$ time [49]. In other words, the key functionality of a bidirectional FM-index entails that a partial match can be extended with a character either to the left or to the right. ...
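The interval bookkeeping in this excerpt can be checked with a small Python sketch. It assumes a text without `$`, indexes `T$` together with the reverse text `T^r$`, and implements extendBackward with an $\mathcal{O}(\sigma)$ per-step update in the spirit of the approach attributed to Lam et al. above: the forward interval is updated by one backward-search step, and the reverse interval is shifted by the number of occurrences of $xP$ with $x < c$, read off the same rank table. Naive suffix sorting and the loop over smaller characters make this a teaching sketch, not the constant-time method the excerpt cites; all names are hypothetical.

```python
# Bidirectional FM-index sketch (hypothetical names): forward index on T$,
# reverse suffix array on T^r$, and an extendBackward step that keeps both
# half-open intervals [b, e) and [br, er) synchronized.
from typing import Dict, List, Optional, Tuple

Intervals = Tuple[int, int, int, int]


def suffix_array(s: str) -> List[int]:
    return sorted(range(len(s)), key=lambda i: s[i:])


def sa_interval(s: str, sa: List[int], p: str) -> Tuple[int, int]:
    """Naive check: interval of suffixes of s starting with p, or (0, 0)."""
    hits = [k for k, i in enumerate(sa) if s.startswith(p, i)]
    return (hits[0], hits[-1] + 1) if hits else (0, 0)


class BiFM:
    def __init__(self, text: str) -> None:
        assert "$" not in text
        self.T = text + "$"
        self.Tr = text[::-1] + "$"
        self.sa = suffix_array(self.T)
        self.sar = suffix_array(self.Tr)  # only used for checking
        L = "".join(self.T[i - 1] for i in self.sa)
        self.alphabet = sorted(set(self.T))
        self.C: Dict[str, int] = {}
        total = 0
        for c in self.alphabet:
            self.C[c] = total
            total += self.T.count(c)
        self.occ: Dict[str, List[int]] = {c: [0] for c in self.alphabet}
        for ch in L:
            for c in self.alphabet:
                self.occ[c].append(self.occ[c][-1] + int(ch == c))

    def rank(self, c: str, i: int) -> int:
        return self.occ[c][i]

    def extend_backward(self, iv: Intervals, c: str) -> Optional[Intervals]:
        """Intervals of cP, given the intervals of P; None if cP does not occur."""
        b, e, br, er = iv
        if c not in self.C or c == "$":
            return None
        nb = self.C[c] + self.rank(c, b)
        ne = self.C[c] + self.rank(c, e)
        if nb >= ne:
            return None
        # In the reverse interval of P^r, suffixes are grouped by the character
        # preceding P in T ('$' if P starts T); occurrences xP with x < c come first.
        smaller = sum(self.rank(x, e) - self.rank(x, b) for x in self.alphabet if x < c)
        nbr = br + smaller
        return nb, ne, nbr, nbr + (ne - nb)

    def search(self, p: str) -> Optional[Intervals]:
        n = len(self.T)
        iv: Intervals = (0, n, 0, n)  # intervals of the empty pattern
        for c in reversed(p):
            nxt = self.extend_backward(iv, c)
            if nxt is None:
                return None
            iv = nxt
        return iv


if __name__ == "__main__":
    idx = BiFM("mississippi")
    b, e, br, er = idx.search("ssi")
    print((b, e) == sa_interval(idx.T, idx.sa, "ssi"),
          (br, er) == sa_interval(idx.Tr, idx.sar, "iss"))
```

extendForward would be the mirror image: a backward-search step on the BWT of `T^r$`, with the forward interval shifted instead.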

EPR-Dictionaries: A Practical and Fast Data Structure for Constant Time Searches in Unidirectional and Bidirectional FM Indices
  • Citing Conference Paper
  • April 2017

Lecture Notes in Computer Science

... Note that many variants of the BWT-index have been suggested including BWT-index for graphs [367,368], BWT-index of alignments [526,527], bi-directional BWT-index [528,529,530], relative BWT-index [531], or dynamic BWT-index [532]. ...

Constant-time and space-efficient unidirectional and bidirectional FM-indices using EPR-dictionaries