Daniel Blankenberg

Lerner Research Institute / Cleveland Clinic Lerner College of Medicine · Genomic Medicine Institute

PhD

About

78
Publications
16,353
Reads
9,389
Citations

Publications (78)
Article
Cell division, wherein 1 cell divides into 2 daughter cells, is fundamental to all living organisms. Cytokinesis, the final step in cell division, begins with the formation of an actomyosin contractile ring, positioned midway between the segregated chromosomes. Constriction of the ring with concomitant membrane deposition in a specified spatiotempo...
Article
Full-text available
Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new experiments. We present a method for interpreting new transcriptomic datasets through instant comparison to public datasets without high-performance computing requirements. We apply Principal Component Analysis on 536 studi...
Preprint
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analys...
Article
Full-text available
Background Computational methods based on initial screening and prediction of peptides for desired functions have proven to be effective alternatives to lengthy and expensive biochemical experimental methods traditionally utilized in peptide research, thus saving time and effort. However, for many researchers, the lack of expertise in utilizing pro...
Article
Full-text available
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galax...
Article
Full-text available
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely acc...
Article
Measurement of traffic emissions has gained a lot of interest in recent times due to its contribution to urban pollution. This paper reports the outcome from an unmanned aerial vehicle (UAV) based measurement of PM concentration near an urban roadway at Kolkata, India. A total of 54 flights were carried out for simultaneous measurements of PM1, PM2...
Preprint
Purpose Large copy number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype-phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require the in...
Preprint
Cell division, wherein one cell divides into two daughter cells, is fundamental to all living organisms. Cytokinesis, the final step in cell division, begins with the formation of an actomyosin contractile ring, positioned midway between the segregated chromosomes. Constriction of the ring with concomitant membrane deposition in a spatiotemporal ma...
Article
Low-cost sensors have the potential to revolutionize air pollution research by providing high spatial resolution data. However, data accuracy from the low-cost sensors remains a major concern. Therefore, it is necessary to evaluate the performance of the low-cost sensors. The current study evaluated the performance of two such low-cost PM (particul...
Article
Full-text available
Literature exploration in PubMed on a large number of biomedical entities (e.g., genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among...
Preprint
Full-text available
Division of one cell into two daughter cells is fundamental in all living organisms. Cytokinesis, the final step of cell division, begins with the formation of an actomyosin contractile ring, positioned midway between the segregated chromosomes. Constriction of the ring with concomitant membrane deposition in a spatiotemporal precision generates a...
Preprint
Full-text available
Computational methods based on initial screening and prediction of peptides for desired functions have been proven effective alternatives to the lengthy and expensive methods traditionally utilized in peptide research, thus saving time and effort. However, for many researchers, the lack of expertise in utilizing programming libraries and the lack o...
Article
Modern biology continues to become increasingly computational. Datasets are becoming progressively larger, more complex, and more abundant. The computational savviness necessary to analyze these data creates an ongoing obstacle for experimental biologists. Galaxy (galaxyproject.org) provides access to computational biology tools in a web-based inte...
Preprint
Full-text available
A growing number of biomedical methods and protocols are being disseminated as open-source software packages. When put in concert with other packages, they can execute in-depth and comprehensive computational pipelines. Therefore, their integration with other software packages plays a prominent role in their adoption in addition to their availabili...
Article
Full-text available
Background The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more...
Preprint
Full-text available
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to m...
Article
Full-text available
The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, an...
Preprint
Full-text available
Literature exploration in PubMed on a large number of biomedical entities (e.g., genes, diseases, experiments) can be time-consuming and challenging, especially when comparing many entities to one another. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among a set of entities bas...
Article
Full-text available
The Galaxy project is a community-driven effort to promote openness and reproducibility of data analyses in biomedical research. Our original publication made the grave mistake of omitting many key participants of this effort. Here we are attempting to correct this error. We are including a new, greatly expanded version of Table 1. Again, we regret o...
Article
Full-text available
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV...
Preprint
Full-text available
Background The vast ecosystem of single-cell RNA-seq tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more toward...
Article
Full-text available
Galaxy (https://galaxyproject.org) is a web-based computational workbench used by tens of thousands of scientists across the world to analyze large biomedical datasets. Since 2005, the Galaxy project has fostered a global community focused on achieving accessible, reproducible, and collaborative research. Together, this community develops the Galax...
Article
Full-text available
Serine palmitoyltransferase (SPT) long-chain base subunit 1 (SPTLC1) is 1 of the 2 main catalytic subunits of the SPT complex, which catalyzes the first and rate-limiting step of sphingolipid biosynthesis. Here, we show that Sptlc1 deletion in adult bone marrow (BM) cells results in defective myeloid differentiation. In chimeric mice from noncompet...
Conference Paper
You've written software, published the code, and described it in a paper. Now, how do you make your software stand out and actually get used? This tutorial introduces two technologies that can make it easy to deploy by researchers around the world and greatly increase your software's reach. Bioconda (https://bioconda.github.io/) is a platform for p...
Preprint
Interoperability of datasets, tools, and resources is essential to modern scientific investigation and analysis. The necessity to gather disparate datasets together, perform analysis with a collection of discrete tools, and visualize the results remains a standard approach for exploring and making sense across scientific research domains. Here, we...
Article
Full-text available
The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance o...
Preprint
Full-text available
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is yet to be defined. In this study, we manually curated 1235 SVs which can ultimately be used to evaluate SV...
Article
Full-text available
Software Containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced...
Article
Full-text available
Gut and oral microbiota perturbations have been observed in obese adults and adolescents; less is known about their influence on weight gain in young children. Here we analyzed the gut and oral microbiota of 226 two-year-olds with 16S rRNA gene sequencing. Weight and length were measured at seven time points and used to identify children with rapid...
Article
Full-text available
Software Containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced...
Article
Full-text available
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three k...
Article
Full-text available
Research in population genetics and evolutionary biology has always provided a computational backbone for life sciences as a whole. Today evolutionary and population biology reasoning are essential for interpretation of large complex datasets that are characteristic of all domains of today’s life sciences ranging from cancer biology to microbial ec...
Preprint
Full-text available
We present Bioconda (https://bioconda.github.io), a distribution of bioinformatics software for the lightweight, multi-platform and language-agnostic package manager Conda. Currently, Bioconda offers a collection of over 3000 software packages, which is continuously maintained, updated, and extended by a growing global community of more than 200 co...
Preprint
Full-text available
Gut and oral microbiome perturbations have been observed in obese adults and adolescents. Less is known about how weight gain in early childhood is influenced by gut, and particularly oral, microbiomes. Here we analyze the relationships among weight gain and gut and oral microbiomes in 226 two-year-olds who were followed during the first two years...
Article
Full-text available
High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has le...
Article
Complex biomedical analyses require the use of multiple software tools in concert and remain challenging for much of the biomedical research community. We introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource that currently supports the streamlined interaction of 20 bioinformatics tools and data resources...
Code
Command-line utilities to assist in developing tools for the Galaxy Project. http://galaxyproject.org
Article
Full-text available
The availability of high-throughput sequencing has created enormous possibilities for scientific discovery. However, the massive amount of data being generated has resulted in a severe informatics bottleneck. A large number of tools exist for analyzing next-generation sequencing (NGS) data, yet often there remains a disconnect between these researc...
Article
Full-text available
The manifestation of mitochondrial DNA (mtDNA) diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted owing to a lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may...
Article
Full-text available
The extraordinary throughput of next-generation sequencing (NGS) technology is outpacing our ability to analyze and interpret the data. This chapter will focus on practical informatics methods, strategies, and software tools for transforming NGS data into usable information through the use of a web-based platform, Galaxy. The Galaxy interface is ex...
Article
Full-text available
The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently....
Article
Full-text available
Polymorphism discovery is a routine application of next-generation sequencing technology where multiple samples are sent to a service provider for library preparation, subsequent sequencing, and bioinformatic analyses. The decreasing cost and advances in multiplexing approaches have made it possible to analyze hundreds of samples at a reasonable co...
Article
Full-text available
The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web. Unfortunately, it also revoked the freedom to easily select the most appropriate tools. To address this, we have developed Galaxy ToolShed.
Article
Full-text available
Whole genome sequencing (WGS) allows researchers to pinpoint genetic differences between individuals and significantly shortcuts the costly and time-consuming part of forward genetic analysis in model organism systems. Currently, the most effort-intensive part of WGS is the bioinformatic analysis of the relatively short reads generated by second ge...
Article
Full-text available
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
Data
Table S1 - overview of Mouse ENCODE data. Snapshot of data generated by the Mouse ENCODE Consortium and released through University of California Santa Cruz (UCSC) browser. Vertical axis: cell lines and ex vivo cells and tissues. The originating cell type is shown in parentheses next to each line. For mouse embryonic or fetal tissues, the developme...
Article
Innovations in biomedical research technologies continue to provide experimental biologists with novel and increasingly large genomic and high-throughput data resources to be analyzed. As creating and obtaining data has become easier, the key decision faced by many researchers is a practical one: where and how should an analysis be performed? Datas...
Article
Full-text available
Here we describe a set of tools implemented within the Galaxy platform designed to make analysis of multiple genome alignments truly accessible for biologists. These tools are available through both a web-based graphical user interface and a command-line interface. This open-source toolset was implemented in Python and has been integrated into the...
Article
Full-text available
Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we desc...
Article
Full-text available
Here, we describe a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps. Availability and Implementation: This open-source toolset was implemented in Python and has been in...