Samuel Lampa
Karolinska University Hospital · Department of Clinical Microbiology
PhD
Bioinformatician
About
61
Publications
24,059
Reads
871
Citations
Introduction
Bioinformatician in Clinical Microbiology at Karolinska University Hospital. PhD alumnus of the Pharmaceutical Bioinformatics group (http://pharmb.io) at the Dept. of Pharmaceutical Biosciences, Uppsala University, where I researched the handling and processing of large, complex data in the pharmaceutical biosciences using approaches such as scientific workflow management systems, HPC and cloud computing, and semantic technologies, especially for querying and data integration.
Education
September 2004 - May 2010
Publications (61)
Background
The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex wor...
Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, allowing to compare and search for compounds with similar profiles. Most contemporary methods and implementations however lack valid...
Background
Biological sciences are characterised not only by an increasing amount but also the extreme complexity of its data. This stresses the need for efficient ways of integrating these data in a coherent description of biological systems. In many cases, biological data needs organization before integration. This is not seldom a collaborative e...
Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict water–octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and eval...
Containers are gaining popularity in life science research as they encompass all dependencies of provisioned tools and simplify software installations for end users, as well as offering a form of isolation between processes. Scientific workflows are ideal for chaining containers into data analysis pipelines to sustain reproducible science. In this ma...
Containers are gaining popularity in life science research as they provide a solution for encompassing dependencies of provisioned tools, simplify software installations for end users and offer a form of isolation between processes. Scientific workflows are ideal for chaining containers into data analysis pipelines to aid in creating reproducible a...
The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted use for data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computa...
Scientific workflows are becoming increasingly popular as a way to automate complex scientific computations consisting of multiple programs. One of the main motivations behind this development is increased robustness and reproducibility of computational analyses. Chaining together multiple programs using plain scripts, as is often the first step in...
The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance o...
Background
Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological and many other applied biological domains. Its computationally-intensive nature has driven requirements for open data fo...
Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological and many other applied biological domains. Its computationally-intensive nature has driven requirements for open data f...
Background
The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation and aid reproducibility of analyses. Many contemporary workflow tools are specialized and not designed for highly complex wor...
Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. T...
Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud co...
Poster presented at the Swedish e-Science Academy 2016 in Lund, 12-13 October 2016, about our work on developing a flexible workflow solution with dynamic scheduling, encapsulated components, and declarative, separate workflow definition, to enable agile workflow development in drug discovery.
The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million...
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks at large scale. Workflow systems can be useful for simplifying the construction of analysis pipelines that...
Briefly presents our work on automating machine learning computations in drug discovery using Spotify's Luigi workflow tool. We found some limitations with Luigi for use with the complex and dynamic nature of explorative scientific workflows, and have created a library on top of Luigi, called Scientific Luigi (SciLuigi), to overcome these limit...
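To make the workflow idea above concrete, here is a minimal, hedged sketch in plain Python of the core pattern such tools implement: tasks declare their upstream dependencies explicitly, and a small scheduler runs each task exactly once, after everything it requires has completed. This is a conceptual illustration only, not Luigi's or SciLuigi's actual API, and the task names are made up.

```python
# Conceptual sketch (not Luigi's or SciLuigi's actual API): tasks declare
# their upstream dependencies, and a tiny scheduler runs each task once,
# after all of its requirements have completed.

class Task:
    def __init__(self, name, requires=()):
        self.name = name
        self.requires = list(requires)
        self.done = False

    def run(self, log):
        log.append(self.name)  # stand-in for the task's real work
        self.done = True

def build(task, log):
    """Depth-first execution: run dependencies before the task itself."""
    if task.done:
        return
    for dep in task.requires:
        build(dep, log)
    task.run(log)

# A minimal (hypothetical) three-step pipeline: fetch -> train -> evaluate.
fetch = Task("fetch_data")
train = Task("train_model", requires=[fetch])
evaluate = Task("evaluate", requires=[train])

order = []
build(evaluate, order)
print(order)  # ['fetch_data', 'train_model', 'evaluate']
```

Real workflow systems add to this skeleton things like file-based targets (a task is "done" when its output file exists), parallel scheduling, and resumption after failure.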
Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. H...
Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have been used for more than a decade in the molecular sciences, but have not yet found widespread use. The semantic web technology Resource Description Framework (RDF) and related methods are proving sufficiently versatile to change that situa...
Notation3 file representing methoxymethane.
SPARQL query to extract IC50 QSAR training data.
Bioclipse Scripting Language script that queries Bio2RDF for HIV proteins.
Bioclipse Scripting Language script that queries DBPedia for small molecules.
Notation3 file of methanol using the CDK data model.
Notation3 file showing a QSAR descriptor calculation output.
Notation3 file with an NMR spectrum.
Bioclipse Scripting Language script to search NMR spectra in a database.
SPARQL query to create a proteochemometrics data set for ion channel proteins.
Bioclipse Scripting Language script demonstrating how Prolog code can be run in Bioclipse.
Prolog script defining spectral similarity which allows searching the NMRShiftDB RDF data for matching spectra.
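The spectral-search idea behind the Prolog script above can be re-expressed as a short, hedged Python sketch: a spectrum is a list of peak shifts, similarity is the fraction of query peaks that match a reference peak within a tolerance, and search means ranking a database by that score. The peak values, tolerance, and compound names below are invented for illustration; the actual NMRShiftDB search operates on RDF data, not Python lists.

```python
# Hedged sketch: peak-matching spectral similarity in plain Python,
# analogous in spirit to the Prolog spectral search. All data below is
# hypothetical; real NMRShiftDB queries run against RDF, not lists.

def similarity(query, reference, tol=0.5):
    """Fraction of query peaks (ppm shifts) matching a reference peak
    within +/- tol ppm."""
    if not query:
        return 0.0
    hits = sum(1 for q in query
               if any(abs(q - r) <= tol for r in reference))
    return hits / len(query)

# Tiny in-memory "database" of spectra (made-up peak lists).
spectra = {
    "compound_A": [12.1, 77.2, 128.4],
    "compound_B": [14.0, 22.7, 31.9],
}

query = [12.3, 128.1]
best = max(spectra, key=lambda name: similarity(query, spectra[name]))
print(best)  # 'compound_A'
```

Declaring the similarity measure separately from the search, as the Prolog version does, keeps the matching rule easy to swap out.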
The huge amounts of data produced in high-throughput techniques in the life sciences and the need for integration of heterogeneous data from disparate sources in new fields such as Systems Biology and translational drug development require better approaches to data integration. The semantic web is anticipated to provide solutions through new format...
Questions
Questions (3)
I'm wondering whether, in the central dogma of biology (DNA -[transcription]-> RNA -[translation]-> protein), it ever happens, in any organism, that translation of a specific mRNA strand into protein starts before its transcription is finished?
That is, that the mRNA strand is simultaneously connected to the transcription and the translation machinery (on each side of the nuclear envelope, I assume)?
I was wondering if anyone is using the Java-based http://processing.org/ library for doing molecular biology simulations, and if so, what are your experiences with it? E.g., what are its strengths and weaknesses compared to other simulation techniques?
What is a suitable ABM framework for learning? That is, something where you can quickly get up to speed and play around with it, to get a feel for how ABM works.
It has to be suited for biological simulations, since that is what I would like to use it for in the end.
I have come across Flame, Breve, Spade, MASON, Swarm, etc., but it is hard to know beforehand what it is like to work in each tool, what the learning curve is, etc.