Werner Dubitzky

Freelance Data Scientist · Meitingen (Germany)

PhD in Computer Science

About

164
Publications
28,991
Reads
2,202
Citations
Introduction
I am interested in artificial intelligence research, bioinformatics and systems biology. Currently I am working on a soccer outcome prediction algorithm. I have extensive experience in science management and evaluation of research grant applications (mainly for national and European programs).
Additional affiliations
January 2002 - April 2016
Ulster University
Position
  • Professor of Bioinformatics
Description
  • Bioinformatics, data science.
January 2000 - December 2001
German Cancer Research Center
Position
  • Biomedical data mining researcher
June 1993 - September 1999
Ulster University
Position
  • Research Associate/Fellow and Lecturer in Computer Science


Publications (164)
Article
Full-text available
Significance testing has become a mainstay in machine learning, with the p value being firmly embedded in the current research practice. Significance tests are widely believed to lend scientific rigor to the interpretation of empirical findings; however, their problems have received only scant attention in the machine learning literature so far. He...
Article
Full-text available
How well can machine learning predict the outcome of a soccer game, given the most commonly and freely available match data? To help answer this question and to facilitate machine learning research in soccer, we have developed the Open International Soccer Database. Version v1.0 of the Database contains essential information from 216,743 league soc...
Article
Full-text available
The task of the 2017 Soccer Prediction Challenge was to use machine learning to predict the outcome of future soccer matches based on a data set describing the match outcomes of 216,743 past soccer matches. One of the goals of the Challenge was to gauge where the limits of predictability lie with this type of commonly available data. Another goal w...
Article
Full-text available
In a recent crowdsourcing project, 29 teams analyzed the same data set to address the following question: “Are football (soccer) referees more likely to give red cards to players with dark skin tone than to players with light skin tone?” The major finding was that the results of the individual teams varied widely, from no effect to highly significa...
Research
Full-text available
The Machine Learning journal invites submissions of original contributions to machine learning research for soccer analytics. Soccer is the biggest global sport and is a fast-growing multibillion dollar industry. The annual revenue of European football clubs alone is estimated at $27bn. Data science and analytics are being more frequently employe...
Article
Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern systems biology investigations into mechanisms underlying gene regulation. A key task in this area is the automated inference or reverse-engineering of dynamic mechanistic GRN models from gene expression time-course data. Besides a lack of suitable d...
Article
Full-text available
Modelling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern systems biology investigations. An important and unsolved problem in this area is the automated inference (reverse-engineering) of dynamic mechanistic GRN models from gene-expression time-course data. The conventional single-stage algorithm determines...
Article
Systems medicine is the application of systems biology concepts, methods, and tools to medical research and practice. It aims to integrate data and knowledge from different disciplines into biomedical models and simulations for the understanding, prevention, cure, and management of complex diseases. Complex diseases arise from the interactions amon...
Article
Full-text available
Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern systems biology investigations into mechanisms underlying gene regulation. A key challenge in this area is the automated inference (reverse-engineering) of dynamic, mechanistic GRN models from gene expression time-course data. Common mathematical for...
Article
Full-text available
Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern computational biology investigations into gene regulation. A key challenge in this area is the automated inference (reverse-engineering) of dynamic, mechanistic GRN models from time-course gene expression data. Common mathematical formalisms used to...
Article
Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern systems biology investigations into mechanisms underlying gene regulation. An important and unsolved problem in this area is the automated inference (reverse-engineering) of dynamic, mechanistic GRN models from time-course gene expression data. The c...
Article
Full-text available
Multiscale simulations model phenomena across natural scales using monolithic or component-based code, running on local or distributed resources. In this work, we investigate the performance of distributed multiscale computing of component-based models, guided by six multiscale applications with different characteristics and from several discipline...
Article
Full-text available
The eight papers in this special issue focus on data mining in bioinformatics, biomedicine, and healthcare informatics. Four of the papers in this special issue were selected from papers presented at the 2012 IEEE Conference on Bioinformatics and Biomedicine and the other four came from open solicitation with a wide range of authors.
Technical Report
Full-text available
Multiscale modeling and simulation is concerned with the development, analysis and use of models that are composed of two or more single-scale sub-models. A single-scale sub-model is a model that represents a process or system associated with a specific spatial, temporal or organizational scale. The main advantages of multiscale over conventional s...
Book
Systems biology refers to the quantitative analysis of the dynamic interactions among several components of a biological system and aims to understand the behavior of the system as a whole. Systems biology involves the development and application of systems theory concepts for the study of complex biological systems through iteration over mathemati...
Conference Paper
Reverse-engineering of quantitative, dynamic gene-regulatory network (GRN) models from time-series gene expression data is becoming important as such data are increasingly generated for research and other purposes. A key problem in the reverse-engineering process is the under-determined nature of these data. Because of this, the reverse-engineered...
Article
Full-text available
Performance prediction or forecasting sporting outcomes involves a great deal of insight into the particular area one is dealing with, and a considerable amount of intuition about the factors that bear on such outcomes and performances. The mathematical Theory of Evidence offers representation formalisms which grant experts a high degree of freedom...
Book
Systems biology refers to the quantitative analysis of the dynamic interactions among several components of a biological system and aims to understand the behavior of the system as a whole. Systems biology involves the development and application of systems theory concepts for the study of complex biological systems through iteration over mathemati...
Book
Complex systems modeling and simulation approaches are being adopted in a growing number of sectors, including finance, economics, biology, astronomy, and many more. Technologies ranging from distributed computing to specialized hardware are explored and developed to address the computational requirements arising in complex systems simulations. The...
Article
Full-text available
Systems biology has developed considerably in the past decade combining the different disciplines of mathematical modelling, computational simulation and biological experimentation facilitating the quantitative analysis of biological systems. This is often severely hampered by the lack of time-resolved data which ultimately leads to problems in val...
Chapter
Full-text available
Creative information exploration refers to a novel framework for exploring large volumes of heterogeneous information. In particular, creative information exploration seeks to discover new, surprising and valuable relationships in data that would not be revealed by conventional information retrieval, data mining and data analysis technologies. Whil...
Conference Paper
Full-text available
The bile acid and xenobiotic system describes a biological network or system that facilitates detoxification and removal from the body of harmful xenobiotic and endobiotic compounds. While life scientists have developed a relatively comprehensive understanding of this system, many mechanistic details are yet to be discovered. Critical mechanisms ar...
Article
This chapter is concerned with modeling and simulating the dynamics of gene regulatory networks (GRNs). It explains the process of reverse-engineering GRNs from time-series gene expression data sets. The idea is to discover an optimal set of parameters for a computational model of the network that is able to adequately simulate the behavior describ...
Article
Full-text available
The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. High-throughput experiments (e.g., microarrays) are generating an overwhelming amount of data of biological systems at the molecular and cellular level. To adequately organize, analyze, and interpret this de...
Article
Characterization of the kinetic and conformational properties of channel proteins is a crucial element in the integrative study of congenital cardiac diseases. The proteins of the ion channels of cardiomyocytes represent an important family of biological components determining the physiology of the heart. Some computational studies aiming to unders...
Book
Systems biology has risen as a direct result of the limitation of conventional (reductionistic) biology to understand complex phenomena emerging as a result of dynamic and multiscale biological interactions. By applying mathematical and computational models, systems biologists integrate the elementary processes into a coherent description that allo...
Conference Paper
Full-text available
The interplay of potassium ion channels in modulating the action potential (AP) of the human left ventricle has not been well elucidated due to the precise nature of some underlying ionic components that are not fully characterized. In the current study, we have developed an allosteric conformation model for rapid delayed rectifier (IKr) and incorp...
Article
Full-text available
A gene-regulatory network (GRN) refers to DNA segments that interact through their RNA and protein products and thereby govern the rates at which genes are transcribed. Creating accurate dynamic models of GRNs is gaining importance in biomedical research and development. To improve our understanding of continuous deterministic modeling methods empl...
Article
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contai...
Article
Full-text available
Conference Paper
Full-text available
Bile acids represent essential but also toxic biological reagents whose concentrations within the body require critical maintenance. Many of the genetic factors that dictate bile acid concentration also govern the detoxification and removal from the body of many drugs and foreign compounds. These overlapping biological processes define a network te...
Chapter
Introduction; Data mining; Grid computing; Data mining grid – mining grid data; Conclusions; Summary of Chapters in this Volume; References
Book
Based around eleven international real life case studies and including contributions from leading experts in the field this groundbreaking book explores the need for the grid-enabling of data mining applications and provides a comprehensive study of the technology, techniques and management skills necessary to create them. This book provides a simu...
Conference Paper
The aim of the QosCosGrid project is to bring supercomputer-like performance and structure to cross-cluster computations. To support parallel complex systems simulations, QosCosGrid provides six reusable templates that may be instantiated with simulation-specific code to help with developing parallel applications using the ProActive Java library. T...
Conference Paper
Full-text available
The P-found system is designed to allow scientists to share, analyze and compare protein folding and unfolding simulations across large, distributed simulation data sets. Although a useful way of exploring folding and unfolding events in proteins, simulations are time-consuming, computationally expensive and data intensive. Thus when sharing this...
Article
Full-text available
In girls, a plateau in parathyroid hormone (PTH) was observed at a 25-hydroxyvitamin D (25(OH)D) concentration of approximately 60 nmol/l. In boys, there was no plateau in PTH concentrations as 25(OH)D concentration increased. A 25(OH)D threshold of 60 nmol/l appears to have implications for bone health outcomes in both girls and boys. Our objectiv...
Article
Full-text available
As modern data mining applications increase in complexity, so too do their demands for resources. Grid computing is one of several emerging networked computing paradigms promising to meet the requirements of heterogeneous, large-scale, and distributed data mining applications. Despite this promise, there are still too many issues to be resolved bef...
Conference Paper
Genetic defects in the KCNH2 gene are a primary cause of instable cardiac ventricular repolarization. The aim of this work has been to assess the functional implication of the stoichiometric properties of the mutated KCNH2 protein complex. We have developed both homotetrameric and heterotetrameric kinetic models based on the heterologous expression...
Conference Paper
Full-text available
The alpha-subunit of the rapid delayed rectifier IKr has been identified to be composed of multiple function domains. However, much less is known about the electrophysiological consequences of the interaction properties in the assembled channel protein. In this paper, we present a detailed conformational kinetic model through characteriz...
Conference Paper
Full-text available
Modern distributed applications require coallocation of massive amounts of resources. Grid level allocation systems must efficiently decide where these applications can be executed. To this end, the resource requests are described as labeled graphs, which must be matched with equivalent labeled graphs of available resources. The coallocation probl...
Conference Paper
Models of gene regulatory networks encapsulate important features of cell behaviour, and understanding gene regulatory networks is important for a wide range of biomedical applications. Network models may be constructed using reverse-engineering techniques based on evolutionary algorithms. This optimisation process can be very computationally inten...
Conference Paper
Full-text available
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data and a data warehouse. Here we demonstrate how grid technologies can support multiple, distributed P-f...
Conference Paper
Full-text available
The ultimate goal of grid technologies is to materialize the vision of grids as virtual supercomputers of unprecedented power, through utilization of geographically disperse distributively owned resources. Despite the overwhelming success of grids in running pleasantly parallel tasks, there still exists a large set of demanding applications conside...
Article
Full-text available
Despite recent concerns about the high prevalence of sub-clinical vitamin D deficiency in adolescents, relatively few studies have investigated the underlying reasons. The objective of the present study was to investigate the prevalence and predictors of vitamin D inadequacy among a large representative sample of adolescents living in Northern Irel...
Article
The DataMiningGrid system has been designed to meet the requirements of modern and distributed data mining scenarios. Based on the Globus Toolkit and other open technology and standards, the DataMiningGrid system provides tools and services facilitating the grid-enabling of data mining applications without any intervention on the application side....
Article
Full-text available
The effects of subclinical vitamin D deficiency on bone mineral density (BMD) and bone turnover in adolescents, especially in boys, are unclear. We aimed to investigate the relations of different stages of vitamin D status and BMD and bone turnover in a representative sample of adolescent boys and girls. BMD was measured by dual-energy X-ray absorp...
Article
Src family kinases (SFKs) interact with a number of cellular receptors. They participate in diverse signaling pathways and cellular functions. Most of the receptors involved in SFK signaling are characterized by similar modes of regulation. This computational study discusses a general kinetic model of SFK-receptor interaction. The analysis of the m...
Conference Paper
Full-text available
Grids are becoming mission-critical components in research and industry, offering sophisticated solutions in leveraging large-scale computing and storage resources. Grid resources are usually shared among multiple organizations in an opportunistic manner. However, an opportunistic or "best effort" quality-of-service scheme may be inadequate in sit...
Article
Complex systems are defined as systems with many interdependent parts which give rise to non-linear and emergent properties. Supercomputers constitute the de facto technology to deliver the required computational performance. However, supercomputers involve considerable costs, which many organizations cannot afford. The working assumption of this p...
Article
Src family tyrosine kinases play a key role in many cellular signalling networks, but due to the high complexity of these networks their precise function remains elusive. Many factors involved in Src regulation, such as specific kinases and phosphatases, are still unknown. Mathematical models have been constructed to improve the understanding of th...
Chapter
Full-text available
Introduction; Grids for Toxicology and Drug Discovery; Example OpenMolGRID; Summary and Outlook; Acknowledgments; References
Article
A knowledge base management system (KBMS) realises a combination of techniques found in database management systems and knowledge-based systems. At the data model and knowledge representation level, many systems of this kind constitute a marriage of the relational data model and the rule-based reasoning. Experience has shown that either approach is...
Article
Full-text available
Software tools that model and simulate the dynamics of biological processes and systems are becoming increasingly important. Some of these tools offer sophisticated graphical user interfaces (GUIs), which greatly enhance their acceptance by users. Such GUIs are based on symbolic or graphical notations used to describe, interact and communicate the...
Article
A central aim of systems biology is to elucidate the complex dynamic structure of biological systems within which functioning and control occur. The success of this endeavour requires a dialogue between the two quite distinct disciplines of life science and systems theory, and so drives the need for graphical notations which facilitate this dialogu...
Article
Full-text available
The challenges involved, applications developed, and lessons learned from efforts in bringing data mining into modern grid computing and Web services environments are explored. Grid environments refer to persistent computing environments that enable software applications to integrate instruments, displays, and information resources that are mana...
Book
More than ever before, research and development in genomics and proteomics depends on the analysis and interpretation of large amounts of data generated by high-throughput techniques. With the advance of computational systems biology, this situation will become even more manifest as scientists will generate truly large-scale data sets by simulating...