
Daniel Blankenberg
- PhD
- Professor (Assistant) at Lerner Research Institute / Cleveland Clinic Lerner College of Medicine
97
Publications
23,990
Reads
12,984
Citations
Introduction
Current institution
Lerner Research Institute / Cleveland Clinic Lerner College of Medicine
Current position
- Professor (Assistant)
Publications (97)
The continually decreasing cost of next-generation sequencing has recently led to a significant increase in the number of microbiome-related studies, providing invaluable information for understanding host-microbiome interactions and their relation to diseases. A common approach in metagenomics consists of determining the composition of samples in...
Biomarkers play a central role in medicine's gradual progress towards proactive, personalized precision diagnostics and interventions. However, finding biomarkers that provide very early indicators of a change in health status, particularly for multi-factorial diseases, has been challenging. Discovery of such biomarkers stands to benefit significan...
Clinical trials are necessary for assessing the safety and efficacy of treatments. However, trial timelines are severely delayed with minimal success due to a multitude of factors, including imperfect trial site selection, cohort recruitment challenges, lack of efficacy, absence of reliable biomarkers, etc. Each of these factors possesses a unique...
Hypomyelinating leukodystrophy (HLD) is an autosomal recessive disorder characterized by defective central nervous system myelination. Exome sequencing of two siblings with severe cognitive and motor impairment and progressive hypomyelination characteristic of HLD revealed homozygosity for a missense single-nucleotide variant (SNV) in EPRS1 (c.4444...
Despite the recent advancements by deep learning methods such as AlphaFold2, in silico protein structure prediction remains a challenging problem in biomedical research. With the rapid evolution of quantum computing, it is natural to ask whether quantum computers can offer some meaningful benefits for approaching this problem. Yet, identifying spec...
T helper 17 (TH17) cells are implicated in autoimmune diseases, and several metabolic processes are shown to be important for their development and function. In this study, we report an essential role for sphingolipids synthesized through the de novo pathway in TH17 cell development. Deficiency of SPTLC1, a major subunit of serine palmitoyl tr...
Viral helicases are promising targets for the development of antiviral therapies. Given their vital function of unwinding double-stranded nucleic acids, inhibiting them blocks the viral replication cycle. Previous studies have elucidated key structural details of these helicases, including the location of substrate binding sites, flexible domains,...
A new series of small-molecule hLDHA inhibitors based on a central thiazole scaffold was designed using an in silico approach. Molecular docking analysis of the designed molecules with hLDHA (PDB ID: 1I10) demonstrates that Ala 29, Val 30, Arg 98, Gln 99, Gly 96, and Thr 94 interacted strongly with the compounds. Compounds 8a, 8b, and 8d showe...
Motivation: Pathogenic copy number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype-phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that requ...
We present Genomics to Notebook (g2nb), an environment that combines the JupyterLab notebook system with widely-used bioinformatics platforms. Galaxy, GenePattern, and the JavaScript versions of IGV and Cytoscape are currently available within g2nb. The analyses and visualizations within those platforms are presented as cells in a notebook, making...
As the availability of genomic data and analysis tools from large-scale cancer initiatives continues to increase, with single-cell studies adding new dimensions to the potential scientific insights, the need has become more urgent for a software environment that supports the rapid pace of cancer data science. The Jupyter Notebook environment ha...
Bacillus anthracis Ser/Thr protein kinase PrkC is necessary for phenotypic memory and spore germination, and the loss of PrkC-dependent phosphorylation events affects spore development. During sporulation, Bacillus sp. can store 3-phosphoglycerate (3-PGA) that will be required at the onset of germination, when ATP will be necessary. The Phosphogl...
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analys...
Transcriptional and post-transcriptional mechanisms diversify the proteome beyond gene number, while maintaining a sequence relationship between original and altered proteins. A new mechanism breaks this paradigm, generating novel proteins by translating alternative open reading frames (Alt-ORFs) within canonical host mRNAs. Uniquely, ‘alt-proteins...
Cell division, wherein 1 cell divides into 2 daughter cells, is fundamental to all living organisms. Cytokinesis, the final step in cell division, begins with the formation of an actomyosin contractile ring, positioned midway between the segregated chromosomes. Constriction of the ring with concomitant membrane deposition in a specified spatiotempo...
Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new experiments. We present a method for interpreting new transcriptomic datasets through instant comparison to public datasets without high-performance computing requirements. We apply Principal Component Analysis on 536 studi...
Background: Computational methods based on initial screening and prediction of peptides for desired functions have proven to be effective alternatives to lengthy and expensive biochemical experimental methods traditionally utilized in peptide research, thus saving time and effort. However, for many researchers, the lack of expertise in utilizing pro...
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galax...
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely acc...
Measurement of traffic emissions has gained a lot of interest in recent times due to its contribution to urban pollution. This paper reports the outcome from an unmanned aerial vehicle (UAV) based measurement of PM concentration near an urban roadway at Kolkata, India. A total of 54 flights were carried out for simultaneous measurements of PM1, PM2...
Purpose: Large copy number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype-phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require the in...
Cell division, wherein one cell divides into two daughter cells, is fundamental to all living organisms. Cytokinesis, the final step in cell division, begins with the formation of an actomyosin contractile ring, positioned midway between the segregated chromosomes. Constriction of the ring with concomitant membrane deposition in a spatiotemporal ma...
Low-cost sensors have the potential to revolutionize air pollution research by providing high spatial resolution data. However, data accuracy from low-cost sensors remains a major concern. Therefore, it is necessary to evaluate the performance of these sensors. The current study evaluated the performance of two such low-cost PM (particul...
Literature exploration in PubMed on a large number of biomedical entities (e.g., genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among...
Division of one cell into two daughter cells is fundamental in all living organisms. Cytokinesis, the final step of cell division, begins with the formation of an actomyosin contractile ring, positioned midway between the segregated chromosomes. Constriction of the ring with concomitant membrane deposition in a spatiotemporal precision generates a...
Computational methods based on initial screening and prediction of peptides for desired functions have been proven effective alternatives to the lengthy and expensive methods traditionally utilized in peptide research, thus saving time and effort. However, for many researchers, the lack of expertise in utilizing programming libraries and the lack o...
Modern biology continues to become increasingly computational. Datasets are becoming progressively larger, more complex, and more abundant. The computational savviness necessary to analyze these data creates an ongoing obstacle for experimental biologists. Galaxy (galaxyproject.org) provides access to computational biology tools in a web-based in...
A growing number of biomedical methods and protocols are being disseminated as open-source software packages. When put in concert with other packages, they can execute in-depth and comprehensive computational pipelines. Therefore, their integration with other software packages plays a prominent role in their adoption in addition to their availabili...
Background: The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more...
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to m...
The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, an...
Literature exploration in PubMed on a large number of biomedical entities (e.g., genes, diseases, experiments) can be time-consuming and challenging, particularly when comparing many entities to one another. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among a set of entities bas...
The Galaxy project is a community-driven effort to promote openness and reproducibility of data analyses in biomedical research. Our original publication made the grave mistake of omitting many key participants in this effort. Here we are attempting to correct this error. We are including a new, greatly expanded version of Table 1. Again, we regret o...
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV...
Background: The vast ecosystem of single-cell RNA-seq tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more toward...
Galaxy (https://galaxyproject.org) is a web-based computational workbench used by tens of thousands of scientists across the world to analyze large biomedical datasets. Since 2005, the Galaxy project has fostered a global community focused on achieving accessible, reproducible, and collaborative research. Together, this community develops the Galax...
Serine palmitoyltransferase (SPT) long-chain base subunit 1 (SPTLC1) is 1 of the 2 main catalytic subunits of the SPT complex, which catalyzes the first and rate-limiting step of sphingolipid biosynthesis. Here, we show that Sptlc1 deletion in adult bone marrow (BM) cells results in defective myeloid differentiation. In chimeric mice from noncompet...
You've written software, published the code, and described it in a paper. Now, how do you make your software stand out and actually get used? This tutorial introduces two technologies that can make it easy to deploy by researchers around the world and greatly increase your software's reach. Bioconda (https://bioconda.github.io/) is a platform for p...
Interoperability of datasets, tools, and resources is essential to modern scientific investigation and analysis. The necessity to gather disparate datasets together, perform analysis with a collection of discrete tools, and visualize the results remains a standard approach for exploring and making sense across scientific research domains. Here, we...
The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance o...
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is yet to be defined. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV...
Software Containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced...
Gut and oral microbiota perturbations have been observed in obese adults and adolescents; less is known about their influence on weight gain in young children. Here we analyzed the gut and oral microbiota of 226 two-year-olds with 16S rRNA gene sequencing. Weight and length were measured at seven time points and used to identify children with rapid...
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three k...
Research in population genetics and evolutionary biology has always provided a computational backbone for life sciences as a whole. Today evolutionary and population biology reasoning are essential for interpretation of large complex datasets that are characteristic of all domains of today’s life sciences ranging from cancer biology to microbial ec...
We present Bioconda (https://bioconda.github.io), a distribution of bioinformatics software for the lightweight, multi-platform and language-agnostic package manager Conda. Currently, Bioconda offers a collection of over 3000 software packages, which is continuously maintained, updated, and extended by a growing global community of more than 200 co...
Gut and oral microbiome perturbations have been observed in obese adults and adolescents. Less is known about how weight gain in early childhood is influenced by gut, and particularly oral, microbiomes. Here we analyze the relationships among weight gain and gut and oral microbiomes in 226 two-year-olds who were followed during the first two years...
High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has le...
Complex biomedical analyses require the use of multiple software tools in concert and remain challenging for much of the biomedical research community. We introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource that currently supports the streamlined interaction of 20 bioinformatics tools and data resources...
Command-line utilities to assist in developing tools for the Galaxy Project. http://galaxyproject.org
The availability of high-throughput sequencing has created enormous possibilities for scientific discovery. However, the massive amount of data being generated has resulted in a severe informatics bottleneck. A large number of tools exist for analyzing next-generation sequencing (NGS) data, yet often there remains a disconnect between these researc...
Significance: The frequency of intraindividual mitochondrial DNA (mtDNA) polymorphisms (heteroplasmies) can change dramatically from mother to child owing to the mitochondrial bottleneck at oogenesis. For deleterious heteroplasmies such a change may transform alleles that are benign at low frequency in a mother into disease-causing alleles when at a h...
The extraordinary throughput of next-generation sequencing (NGS) technology is outpacing our ability to analyze and interpret the data. This chapter will focus on practical informatics methods, strategies, and software tools for transforming NGS data into usable information through the use of a web-based platform, Galaxy. The Galaxy interface is ex...
The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently....
Polymorphism discovery is a routine application of next-generation sequencing technology where multiple samples are sent to a service provider for library preparation, subsequent sequencing, and bioinformatic analyses. The decreasing cost and advances in multiplexing approaches have made it possible to analyze hundreds of samples at a reasonable co...
The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web. Unfortunately, it also revoked the freedom to easily select the most appropriate tools. To address this, we have developed Galaxy ToolShed.
Whole genome sequencing (WGS) allows researchers to pinpoint genetic differences between individuals and significantly shortcuts the costly and time-consuming part of forward genetic analysis in model organism systems. Currently, the most effort-intensive part of WGS is the bioinformatic analysis of the relatively short reads generated by second ge...
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
Table S1 - overview of Mouse ENCODE data. Snapshot of data generated by the Mouse ENCODE Consortium and released through University of California Santa Cruz (UCSC) browser. Vertical axis: cell lines and ex vivo cells and tissues. The originating cell type is shown in parentheses next to each line. For mouse embryonic or fetal tissues, the developme...
Innovations in biomedical research technologies continue to provide experimental biologists with novel and increasingly large genomic and high-throughput data resources to be analyzed. As creating and obtaining data has become easier, the key decision faced by many researchers is a practical one: where and how should an analysis be performed? Datas...
Here we describe a set of tools implemented within the Galaxy platform designed to make analysis of multiple genome alignments truly accessible for biologists. These tools are available through both a web-based graphical user interface and a command-line interface. This open-source toolset was implemented in Python and has been integrated into the...
Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we desc...
Here, we describe a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next-generation sequencing data taken from a sequencing machine all the way through the quality filtering steps. Availability and Implementation: This open-source toolset was implemented in Python and has been in...
High-throughput data production has revolutionized molecular biology. However, massive increases in data generation capacity require analysis approaches that are more sophisticated, and often very computationally intensive. Thus, making sense of high-throughput data requires informatics support. Galaxy (http://galaxyproject.org) is a software syste...
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the...
While most experimental biologists know where to download genomic data, few have a concrete plan on how to analyze it. This situation can be corrected by: (1) providing unified portals serving genomic data and (2) building Web applications to allow flexible retrieval and on-the-fly analyses of the data. Powerful resources, such as the UCSC Genome B...
Comparison of sequence identity between the pair-wise 'Muscle' alignments and the multiple sequence alignments provided by PFAM. The data provides a comparison of sequence identity between the pair-wise 'Muscle' alignments and PFAM alignments.
The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2(ENCODE), that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a...
The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function--the basis of transfer of annotations in databases--must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the G...
Accessing and analyzing the exponentially expanding genomic sequence and functional data pose a challenge for biomedical researchers. Here we describe an interactive system, Galaxy, that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queri...