Martin Dahlö

Uppsala University | UU · Science for Life Laboratory


Publications: 33
Reads: 2,175
Citations: 244


Publications (33)
Article
Full-text available
Background: Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered “data hierarchy”. We introduce the HASTE Toolkit, a p...
Preprint
Full-text available
This paper introduces the HASTE Toolkit, a cloud-native software toolkit capable of partitioning data streams in order to prioritize usage of limited resources. This in turn enables more efficient data-intensive experiments. We propose a model that introduces automated, autonomous decision making in data pipelines, such that a stream of data can b...
Article
Full-text available
Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for app...
Article
Full-text available
Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex wor...
Article
Full-text available
Scientific workflows are becoming increasingly popular as a way to automate complex scientific computations consisting of multiple programs. One of the main motivations behind this development is increased robustness and reproducibility of computational analyses. Chaining together multiple programs using plain scripts, as is often the first step in...
Preprint
Full-text available
Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation and aid reproducibility of analyses. Many contemporary workflow tools are specialized and not designed for highly complex wor...
Article
Full-text available
Background: Next-Generation Sequencing (NGS) has transformed the life sciences and many research groups are newly dependent upon computer clusters to store and analyse large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPP...
Article
Full-text available
Background: Silver-based products have been marketed as an alternative to antibiotics, and their consumption has increased. Bacteria may, however, develop resistance to silver. Aim: To study the presence of genes encoding silver resistance (silE, silP, silS) over time in three clinically important Enterobacteriaceae genera. Methods: Using poly...
Article
Full-text available
With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processin...
Article
Full-text available
One of the foundations of the scientific method is to be able to reproduce experiments and corroborate the results of research that has been done before. However, with the increasing complexities of new technologies and techniques, coupled with the specialisation of experiments, reproducing research findings has become a growing challenge. Clearly,...
Article
Full-text available
One of the foundations of the scientific method is to be able to reproduce experiments and corroborate the results of research that has been done before. However, with the increasing complexities of new technologies and techniques, coupled with the specialisation of experiments, reproducing research findings has become a growing challenge. Clearl...
Article
Full-text available
Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. H...
Article
As sequencing gets cheaper, more and more researchers turn to this technology for answers to their questions. The large amounts of generated data will have to be stored somewhere, and the tools to analyse it will have to be updated constantly. UPPNEX tries to solve these problems through high performance computing, large scale and high availability s...
Article
Full-text available
Alterations of cellular gene expression following an adenovirus type 2 infection of human primary cells were studied by using superior sensitive cDNA sequencing. In total, 3791 cellular genes were identified as differentially expressed more than 2-fold. Genes involved in DNA replication, RNA transcription and cell cycle regulation were very abunda...


Projects (2)
Project
FOSTER (2014-2016 and 2017-2019) facilitates implementation of the EC's Open Science agenda through training all key actors in the academic ecosystem (https://www.fosteropenscience.eu).
Project
The goal is to enable agile workflow design by building up a library of reusable analysis components that can be connected ad hoc into workflows supporting dynamic scheduling, in order to enable highly complex machine learning workflows in drug discovery.