Jens Hoefkens’s research while affiliated with Genedata and other places


Publications (5)


Example of a bidirectional data exploration workflow using the tranSMART/Analyst platform. The platform is designed to improve understanding of large amounts of raw (unstructured) data by combining and comparing it with structured data sets (which fit seamlessly into a relational database and are readily searchable with simple search algorithms). Left: The advanced workflow tab in the tranSMART dataset explorer contains a button (red rectangle) that makes clinical data available for statistical analyses in Analyst through an easy drag-and-drop concept. 1: The patient cohorts to be examined are selected and further inclusion criteria can be defined. 2: Multi-omics, high-dimensional data (e.g. microarray or NGS data) are selected and transferred by a single mouse click into the Analyst GUI. 3: For curation purposes, imported data can be run through preprocessing and quality control steps. 4: Filtered data are selected for in-depth analyses such as PCA, correlation, network analyses, logistic regression, ANOVA, time-series analysis, clustering, annotation analyses, pathway mapping, and partial least squares (PLS) analysis. Third-party tools such as R scripts can also be integrated. 5: The results of the analysis can be saved directly back into tranSMART via a menu button. At any point, it is possible to pull more clinical/annotation data from tranSMART (and other sources such as GEO or ArrayExpress) into Analyst and vice versa. 6: Analyzed and curated data are available for further storage, sharing, and analyses in tranSMART.
Example of a data-sharing and big-data analytics value chain in translational medicine. The collection of large volumes of structured phenotypic data and its integration with the abundant omics data adds new dimensions and challenges for the management, analysis, and visualization of this information. Clinical electronic data capture systems (EDCs, such as OpenClinica or REDCap) may feed patient data into tranSMART for data integration. An in-depth analysis of the data can then be performed in Genedata Analyst, an established system for the integrated analysis of high-dimensional omics data in the context of low-dimensional (clinical) sample information, often used in translational research projects. It enables scientists to efficiently analyze experiments by applying rigorous statistical algorithms combined with intuitive, interactive data visualization tools. Leveraging a built-in scripting engine, Analyst standardizes and automates complex and time-consuming data analysis processes. Via a flexible application programming interface (API), the Analyst platform also makes it possible to use popular open source tools such as the R/Bioconductor environment for downstream analyses (Gentleman et al., 2004). Overall, the platform can reduce the time needed to import, export, integrate, and analyze complex data from days to minutes. (ETL = Extract, Transform, and Load process for loading raw source data into tranSMART.) The APIs have been integrated into tranSMART and are freely available to the research community. The statistical analysis software itself is commercial software available for licensing.
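The numbered workflow steps in the first caption (cohort selection, transfer of omics data, quality control, analysis) can be illustrated with a minimal Python sketch. It does not use the tranSMART or Genedata Analyst APIs, which are not described here; the records, the function names, and the mean-difference statistic are all hypothetical stand-ins chosen for illustration.

```python
from statistics import mean

# Hypothetical clinical records, standing in for data selected in a cohort explorer.
clinical = [
    {"sample": "S1", "arm": "treated", "age": 54},
    {"sample": "S2", "arm": "treated", "age": 61},
    {"sample": "S3", "arm": "control", "age": 58},
    {"sample": "S4", "arm": "control", "age": 17},  # fails the age criterion
    {"sample": "S5", "arm": "control", "age": 66},
]

# Hypothetical omics matrix: feature -> {sample: value}; None marks a missing value.
omics = {
    "GENE_A": {"S1": 8.1, "S2": 7.9, "S3": 5.0, "S4": 5.2, "S5": 5.4},
    "GENE_B": {"S1": 3.0, "S2": None, "S3": 3.1, "S4": 2.9, "S5": 3.2},
    "GENE_C": {"S1": 6.0, "S2": 6.2, "S3": 6.1, "S4": 6.0, "S5": 5.9},
}


def select_cohort(records, arm, min_age=18):
    """Step 1: pick samples by study arm plus an extra inclusion criterion."""
    return [r["sample"] for r in records if r["arm"] == arm and r["age"] >= min_age]


def quality_filter(matrix, samples):
    """Step 3: keep only features measured in every selected sample."""
    return {
        feature: values
        for feature, values in matrix.items()
        if all(values.get(s) is not None for s in samples)
    }


def mean_difference(matrix, group_a, group_b):
    """Step 4 (simplified): per-feature difference of group means."""
    return {
        feature: mean(values[s] for s in group_a) - mean(values[s] for s in group_b)
        for feature, values in matrix.items()
    }


treated = select_cohort(clinical, "treated")
control = select_cohort(clinical, "control")
clean = quality_filter(omics, treated + control)
results = mean_difference(clean, treated, control)

for feature, delta in sorted(results.items()):
    print(f"{feature}: treated - control = {delta:.2f}")
```

In a real deployment the analysis step would be one of the methods named in the caption (PCA, ANOVA, clustering, and so on), and the results would be written back to the data warehouse rather than printed.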
A collaborative approach to develop a multi-Omics data analytics platform for translational research
Article · Full-text available · December 2014 · 1,200 Reads · 35 Citations · Applied & Translational Genomics

Jens Hoefkens

The integration and analysis of large datasets in translational research has become an increasingly challenging problem. We propose a collaborative approach that integrates established data management platforms with existing analytical systems to fill the gap in the value chain between data collection and data exploitation. In particular, our proposal ensures data security and supports widely distributed teams of researchers. As a successful example of such an approach, we describe the implementation of a single unified platform that combines the capabilities of the knowledge management platform tranSMART and the data analysis system Genedata Analyst™. The combined end-to-end platform helps to quickly find, enter, integrate, analyze, extract, and share patient- and drug-related data in the context of translational R&D projects.


MC13-0089 Epigenome-wide discovery of ovarian and breast cancer specific DNA methylation markers

October 2013 · 41 Reads · European Journal of Cancer

H. Lempiainen · D. Mertens · [...]

Breast and ovarian cancers pose huge and unsolved challenges to the medical profession. Breast cancer is the most common cancer in women in the EU: more than 332,000 women are diagnosed with breast cancer each year and a woman dies every 6 minutes from this disease. Ovarian cancer, whilst far less common than breast cancer, is often diagnosed when the disease is at an advanced stage and has spread to other areas of the body. More than 60% of ovarian cancer patients die within the first 5 years after diagnosis. Implementation of successful screening programs has dramatically reduced the number of women dying from cervical cancer. Similarly, the EU FP7 consortium EpiFemCare aims to reduce the number of women diagnosed with late-stage breast or ovarian cancer by 50%, reduce the number of women who receive unnecessary long-term chemotherapy by 50%, and reduce the number of women dying from these cancers by 20%. EpiFemCare will establish and clinically validate a series of blood tests based upon DNA methylation technology that will facilitate both early detection and prediction of therapeutic outcome. The project consists of three phases: (1) Epigenome-wide discovery of ovarian/breast cancer-specific DNA methylation markers. (2) Development of serum-based assays for cancer-specific markers. (3) Validation of the serum test performance in thousands of serial samples from prospective clinical trials. In phase 1, Illumina Infinium HumanMethylation450 BeadChip array technology is used to assess the methylation status of ~485,000 sites in cancer and control tissues. In parallel, Reduced Representation Bisulfite Sequencing (RRBS) is used to identify and confirm cancer-specific methylated circulating DNA in matching serum samples.
Using Genedata Expressionist® for Genomic Profiling, we have established an automated bioinformatics pipeline for the detection of cancer-specific differentially methylated regions (DMRs) that are most likely to fulfill the strict specificity criteria of a serum-based test. The most promising DMRs are taken forward for the serum-based clinical assay development and validation.
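The general idea behind detecting differentially methylated regions can be sketched in a few lines of Python. This is not the Genedata Expressionist pipeline, whose details the abstract does not give; it is a simplified illustration in which the beta values, the `min_delta` and `max_gap` thresholds, and the function names are all assumptions.

```python
from statistics import mean

# Hypothetical beta values (fraction methylated, 0..1) per CpG site.
# Each entry: (chromosome, position, cancer_betas, control_betas)
sites = [
    ("chr1", 1000, [0.80, 0.85, 0.90], [0.10, 0.15, 0.12]),
    ("chr1", 1150, [0.75, 0.70, 0.82], [0.20, 0.18, 0.22]),
    ("chr1", 1300, [0.50, 0.52, 0.48], [0.49, 0.51, 0.50]),
    ("chr1", 9000, [0.88, 0.91, 0.86], [0.05, 0.08, 0.07]),
    ("chr2", 500, [0.30, 0.28, 0.33], [0.31, 0.29, 0.30]),
]


def differential_sites(sites, min_delta=0.3):
    """Keep sites whose mean beta differs between groups by at least min_delta."""
    hits = []
    for chrom, pos, cancer, control in sites:
        delta = mean(cancer) - mean(control)
        if abs(delta) >= min_delta:
            hits.append((chrom, pos, delta))
    return hits


def merge_into_regions(hits, max_gap=500):
    """Merge nearby hits on the same chromosome, with the same direction, into regions."""
    regions = []
    for chrom, pos, delta in sorted(hits):
        last = regions[-1] if regions else None
        if (
            last is not None
            and last["chrom"] == chrom
            and pos - last["end"] <= max_gap
            and (delta > 0) == (last["deltas"][-1] > 0)
        ):
            last["end"] = pos
            last["deltas"].append(delta)
        else:
            regions.append({"chrom": chrom, "start": pos, "end": pos, "deltas": [delta]})
    return regions


hits = differential_sites(sites)
regions = merge_into_regions(hits)

for region in regions:
    print(
        f'{region["chrom"]}:{region["start"]}-{region["end"]} '
        f'sites={len(region["deltas"])} mean_delta={mean(region["deltas"]):.2f}'
    )
```

A production pipeline would add statistical testing with multiple-testing correction and the specificity filtering against serum background that the abstract mentions; the fixed threshold here only conveys the shape of the computation.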



Use of Integrated Data Analysis to Gain an Advantage in Biomarker Discovery and Personalized Medicine

October 2011 · 3 Reads · 1 Citation · American Laboratory

Commercial software packages, including Genedata Expressionist® from Genedata USA (Lexington, MA), are available that support the management and integrated analysis of large and diverse biological data. Such tools address both the inherent data complexity and the strict divisions among research, development, and clinical trials within various organizations. Genedata scientists were part of the data analysis team that demonstrated that the combination of omics and conventional toxicology does indeed prove to be a useful tool for mechanistic investigations and the identification of putative biomarkers. Similar consortia are under way, and their results support the thesis that integrating transcriptome, proteome, metabolome, and genome profiling data brings a true benefit for, among other applications, toxicity prediction and patient stratification.


Discovery of Genetic Interactions Across Diverse Oryza sativa Sequencing Data Sets: a Workflow-Based Approach

The interrogation of genome, transcriptome, and epigenome information is key to understanding genetic regulatory networks in crop species. While applications such as RNA-seq, ChIP-seq, and bisulfite sequencing can yield valuable information about such networks, it can be challenging for a scientist, or even a biostatistician, to bring all of this information together in a meaningful way. Many-to-one relationships across data types and variability in how data are distributed can make interpretation of the data difficult. Here we present a workflow-based approach for analyzing and integrating data across these diverse data types. Our approach has been designed not only to overcome these challenges, but also to give researchers easier access to their data. By “mapping” data types to a common context, diverse sequence data, as well as raw data from other platforms such as microarray and quantitative PCR, can be efficiently integrated into an analysis. To demonstrate this, we created a workflow to analyze a previously published Oryza sativa data set containing mRNA, siRNA, and bisulfite sequencing reads (Chodavarapu et al., 2012). This information was used to build a correlation network of genetic interactions, which is informed by all of the data types. Despite broad variation between the individual strains, an integrated analysis utilizing all three data types revealed important differences between the groups. The method and results presented here highlight the power of integrated analysis across data types, and the importance of algorithms that can efficiently place large volumes of diverse data types into a shared context.
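The idea of mapping several data types to a common context and then building a correlation network can be sketched as follows. This is an illustrative Python example, not the authors' workflow: the profiles are invented, the common context is assumed to be a gene identifier measured across the same samples, and the Pearson threshold `min_abs_r` is an arbitrary choice.

```python
from itertools import combinations
from math import sqrt

# Hypothetical profiles keyed by (data type, gene); each list holds one value
# per sample, with samples in the same order for every profile.
profiles = {
    ("mRNA", "GeneA"): [1.0, 2.0, 3.0, 4.0],
    ("mRNA", "GeneB"): [2.1, 3.9, 6.2, 8.0],
    ("siRNA", "GeneA"): [4.0, 3.1, 2.0, 0.9],
    ("methylation", "GeneB"): [0.5, 0.1, 0.6, 0.2],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_network(profiles, min_abs_r=0.9):
    """Return edges (node_a, node_b, r) for strongly correlated profile pairs."""
    edges = []
    for (node_a, values_a), (node_b, values_b) in combinations(profiles.items(), 2):
        r = pearson(values_a, values_b)
        if abs(r) >= min_abs_r:
            edges.append((node_a, node_b, r))
    return edges


edges = correlation_network(profiles)
for node_a, node_b, r in edges:
    print(f"{node_a} <-> {node_b}: r = {r:.3f}")
```

Because every node carries its data type alongside the gene identifier, edges can connect measurements from different platforms, which is the point of placing them in a shared context.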

Citations (1)


... Reinforcement learning, on the other hand, involves training models through trial and error, using feedback from their predictions to improve performance over time. These approaches enhance model robustness and accuracy by leveraging additional information from unlabeled data, thus providing a more comprehensive analysis of multi-omics datasets (Schumacher et al., 2014; Shen et al., 2010; Speicher & Pfeifer, 2015). Different integration strategies are employed in machine learning to combine multi-omics data effectively. ...

Reference:

Integrative Machine Learning Approaches For Multi-Omics Data Analysis In Cancer Research
A collaborative approach to develop a multi-Omics data analytics platform for translational research

Applied & Translational Genomics