Samuel Lampa

Karolinska University Hospital | Karolinska · Department of Clinical Microbiology

PhD
Bioinformatician

About

Publications: 61
Reads: 24,059
Citations: 871
Introduction
Bioinformatician in Clinical Microbiology at the Karolinska University Hospital. PhD alumnus of the Pharmaceutical Bioinformatics group (http://pharmb.io) at the Dept. of Pharmaceutical Biosciences, Uppsala University, where I researched how to handle large and complex data and processing in pharmaceutical biosciences, using scientific workflow management systems, HPC and cloud computing, and semantic technologies, the latter especially for querying and data integration.
Additional affiliations
July 2014 - September 2018
Uppsala University
Position
  • PhD Student
Description
  • Information about and link to thesis: https://bionics.it/posts/phdthesis
Education
September 2004 - May 2010
Uppsala University
Field of study
  • Molecular Biotechnology Engineering

Publications (61)
Article
Full-text available
Background The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex wor...
Article
Full-text available
Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, making it possible to compare and search for compounds with similar profiles. Most contemporary methods and implementations however lack valid...
Article
Full-text available
Background Biological sciences are characterised not only by an increasing amount but also the extreme complexity of its data. This stresses the need for efficient ways of integrating these data in a coherent description of biological systems. In many cases, biological data needs organization before integration. This is not seldom a collaborative e...
Article
Full-text available
Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict water–octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and eval...
Preprint
Containers are gaining popularity in life science research as they encompass all dependencies of provisioned tools and simplify software installations for end users, as well as offering a form of isolation between processes. Scientific workflows are ideal to chain containers into data analysis pipelines to sustain reproducible science. In this ma...
Article
Full-text available
Containers are gaining popularity in life science research as they provide a solution for encompassing dependencies of provisioned tools, simplify software installations for end users and offer a form of isolation between processes. Scientific workflows are ideal for chaining containers into data analysis pipelines to aid in creating reproducible a...
Preprint
Containers are gaining popularity in life science research as they provide a solution for encompassing dependencies of provisioned tools, simplify software installations for end users and offer a form of isolation between processes. Scientific workflows are ideal for chaining containers into data analysis pipelines to aid in creating reproducible a...
Article
Full-text available
The reproducibility of experiments has been a long-standing impediment for further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted utilization for data collection, pre-processing, analysis and inference. This article provides an in-depth coverage on the reproducibility of computa...
Article
Full-text available
Scientific workflows are becoming increasingly popular as a way to automate complex scientific computations consisting of multiple programs. One of the main motivations behind this development is increased robustness and reproducibility of computational analyses. Chaining together multiple programs using plain scripts, as is often the first step in...
Article
Full-text available
The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance o...
Article
Full-text available
Background Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological and many other applied biological domains. Its computationally-intensive nature has driven requirements for open data fo...
Preprint
Full-text available
Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological and many other applied biological domains. Its computationally-intensive nature has driven requirements for open data f...
Preprint
Full-text available
Containers are gaining popularity in life science research as they encompass all dependencies of provisioned tools and simplify software installations for end users, as well as offering a form of isolation between processes. Scientific workflows are ideal to chain containers into data analysis pipelines to sustain reproducible science. In this ma...
Preprint
Full-text available
Background The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation and aid reproducibility of analyses. Many contemporary workflow tools are specialized and not designed for highly complex wor...
Article
Full-text available
Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. T...
Article
Full-text available
Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud co...
Poster
Full-text available
Poster presented at the Swedish e-Science Academy 2016 in Lund, 12-13 October 2016, about our work on developing a flexible workflow solution with dynamic scheduling, encapsulated components, and declarative, separate workflow definition, to enable agile workflow development in drug discovery.
Article
Full-text available
The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million...
Article
Full-text available
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on large scale. Workflow systems can be useful to simplify construction of analysis pipelines that...
Research
Full-text available
Briefly presents our work on automating machine learning computations in drug discovery using Spotify's Luigi workflow tool. We found some limitations in Luigi given the complex and dynamic nature of explorative scientific workflows, and have created a library on top of Luigi, called Scientific Luigi (SciLuigi) to overcome these limit...
Article
Full-text available
Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. H...
Article
Full-text available
Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have already been used for more than a decade in molecular sciences, but have not found widespread use yet. The semantic web technology Resource Description Framework (RDF) and related methods prove to be sufficiently versatile to change that situa...
Data
SPARQL query to extract IC50 QSAR training data.
Data
Bioclipse Scripting Language script that queries Bio2RDF for HIV proteins.
Data
Bioclipse Scripting Language script that queries DBPedia for small molecules.
Data
Notation3 file of methanol using the CDK data model.
Data
Notation3 file showing a QSAR descriptor calculation output.
Data
Bioclipse Scripting Language script to search NMR spectra in a database.
Data
SPARQL query to create a proteochemometrics data set for ion channel proteins.
Data
Bioclipse Scripting Language script demonstrating how Prolog code can be run in Bioclipse.
Data
Prolog script defining spectral similarity which allows searching the NMRShiftDB RDF data for matching spectra.
Article
Full-text available
The huge amounts of data produced in high-throughput techniques in the life sciences and the need for integration of heterogeneous data from disparate sources in new fields such as Systems Biology and translational drug development require better approaches to data integration. The semantic web is anticipated to provide solutions through new format...

Questions (3)
Question
I'm wondering whether, in the central dogma process of biology (DNA-[transcription]->RNA-[translation]->Protein), it ever happens, in any organism, that the translation of a specific mRNA strand into protein starts before its transcription is finished?
That is, is the mRNA strand ever simultaneously connected to the transcription and the translation machinery (on each side of the nucleus wall, I assume)?
Question
I was wondering if anyone is using the Java-based http://processing.org/ library for doing molecular biology simulations, and if so, what are your experiences with it? E.g., what are its strengths / weaknesses compared to other simulation techniques?
Question
What is a suitable ABM framework for learning? That is, something where you can quickly get up to speed and play around with it, to get a feel for how ABM works.
It has to be suited for biological simulations, since that is what I would like to use it for in the end.
I have seen Flame, Breve, Spade, MASON, Swarm etc., but it is hard to know beforehand what it is like to work with the respective tools, what the learning curve is, etc.
