ABSTRACT: BACKGROUND: Sequencing of the human genome and the subsequent analyses have produced immense volumes of data. The technological advances have opened new windows into genomics beyond the DNA sequence. In parallel, clinical practice generates large amounts of data. This represents an underused data source that has much greater potential in translational research than is currently realized. This research aims at implementing a translational medicine informatics platform to integrate clinical data (disease diagnosis, disease activity and treatment) of Rheumatoid Arthritis (RA) patients from Karolinska University Hospital and their research database (biobanks, genotype variants and serology) at the Center for Molecular Medicine, Karolinska Institutet. METHODS: Requirements engineering methods were utilized to identify user requirements. Unified Modeling Language and data modeling methods were used to model the universe of discourse and data sources. Oracle 11g was used as the database management system, and the clinical development center (CDC) was used as the application interface. Patient data were anonymized, and we employed authorization and security methods to protect the system. RESULTS: We developed a user requirement matrix, which provided a framework for evaluating three translational informatics systems. The implementation of the CDC successfully integrated the biological research database (15172 DNA, serum and synovial samples, 1436 cell samples and 65 SNPs per patient) and the clinical database (5652 clinical visits) for the cohort of 379 patients, which presents three profiles. Basic functionalities provided by the translational medicine platform are research data management, development of bioinformatics workflow and analysis, sub-cohort selection, and re-use of clinical data in research settings. Finally, the system allowed researchers to extract subsets of attributes from cohorts according to specific biological, clinical, or statistical features. CONCLUSIONS: Research and clinical database integration is a real challenge and a road-block in translational research. Through this research we addressed the challenges and demonstrated the usefulness of CDC. We adhered to ethical regulations pertaining to patient data, and we determined that the existing software solutions cannot meet the translational research needs at hand. We used RA as a test case since we have ample data on an active, longitudinal cohort.
Journal of Translational Medicine 04/2013; 11(1):85. · 3.46 Impact Factor
ABSTRACT: European funding under Framework 7 (FP7) for the virtual physiological human (VPH) project has been in place now for 5 years. The VPH Network of Excellence (NoE) has been set up to help develop common standards, open source software, freely accessible data and model repositories, and various training and dissemination activities for the project. It is also working to coordinate the many clinically targeted projects that have been funded under the FP7 calls. An initial vision for the VPH was defined by the FP6 STEP project in 2006. In 2010, we wrote an assessment of the accomplishments of the first two years of the VPH in which we considered the biomedical science, healthcare and information and communications technology challenges facing the project (Hunter et al. 2010 Phil. Trans. R. Soc. A 368, 2595–2614 (doi:10.1098/rsta.2010.0048)). We proposed that a not-for-profit professional umbrella organization, the VPH Institute, should be established as a means of sustaining the VPH vision beyond the time-frame of the NoE. Here, we update and extend this assessment and in particular address the following issues raised in response to Hunter et al.: (i) a vision for the VPH updated in the light of progress made so far, (ii) biomedical science and healthcare challenges that the VPH initiative can address while also providing innovation opportunities for the European industry, and (iii) external changes needed in regulatory policy and business models to realize the full potential that the VPH has to offer to industry, clinics and society generally.
Interface focus: a theme supplement of Journal of the Royal Society interface 02/2013; Interface Focus(3). · 2.21 Impact Factor
ABSTRACT: The proper identification of differentially methylated CpGs is central in most epigenetic studies. The Illumina HumanMethylation450 BeadChip is widely used to quantify DNA methylation; nevertheless, the design of an appropriate analysis pipeline faces severe challenges due to the convolution of biological and technical variability and the presence of a signal bias between Infinium I and II probe design types. Despite recent attempts to investigate how to analyze DNA methylation data with such an array design, it has not been possible to perform a comprehensive comparison between different bioinformatics pipelines due to the lack of appropriate data sets having both large sample size and sufficient number of technical replicates. Here we perform such a comparative analysis, targeting the problems of reducing the technical variability, eliminating the probe design bias and reducing the batch effect by exploiting two unpublished data sets, which included technical replicates and were profiled for DNA methylation either on peripheral blood, monocytes or muscle biopsies. We evaluated the performance of different analysis pipelines and demonstrated that: (1) it is critical to correct for the probe design type, since the amplitude of the measured methylation change depends on the underlying chemistry; (2) the effect of different normalization schemes is mixed, and the most effective methods in our hands were quantile normalization and Beta Mixture Quantile dilation (BMIQ); (3) it is beneficial to correct for batch effects. In conclusion, our comparative analysis using a comprehensive data set suggests an efficient pipeline for proper identification of differentially methylated CpGs using the Illumina 450K arrays.
Epigenetics: official journal of the DNA Methylation Society 02/2013; 8(3). · 4.58 Impact Factor
ABSTRACT: Medicine and pediatrics are changing and healthcare is moving from being reactive to becoming preventive. Despite rapid developments of new technologies for molecular profiling and systems analysis of diseases, significant hurdles remain. Here we use the clinical setting of congenital heart block (CHB) to uncover and illustrate key informatics challenges impeding development of a systems medicine approach emphasizing prevention and prediction of disease. We find that there is a paucity of useful bioinformatics tools enabling integrative analysis of different databases of molecular information and clinical sources in a disease context such as CHB, contrasting with the current emphasis on developing bioinformatics tools for the analysis of individual data-types. Moreover, informatics solutions for managing data, such as i2b2 or STRIDE, require serious software engineering support for maintenance and import of data beyond the capabilities of clinicians working with CHB. Hence, there is an urgent unmet need for user-friendly tools facilitating integrative analysis and management of omics data and clinical information. Pediatrics represents an untapped potential to execute such a systems medicine program in close collaboration with clinicians and families who are keen to do what is needed for their children to prevent and predict disease and nurture wellness.
Pediatric Research (2013); doi:10.1038/pr.2013.19.
ABSTRACT: Autoimmune rheumatic diseases are complex disorders, whose etiopathology is attributed to a crosstalk between genetic predisposition and environmental factors. Both variants of autoimmune susceptibility genes and environment are involved in the generation of aberrant epigenetic profiles in a cell-specific manner, which ultimately result in dysregulation of expression. Furthermore, changes in miRNA expression profiles also cause gene dysregulation associated with aberrant phenotypes. In rheumatoid arthritis, several cell types are involved in the destruction of the joints, synovial fibroblasts being among the most important. In this study we performed DNA methylation and miRNA expression screening of a set of rheumatoid arthritis synovial fibroblasts and compared the results with those obtained from osteoarthritis patients with a normal phenotype. DNA methylation screening allowed us to identify changes in novel key target genes like IL6R, CAPN8 and DPP4, as well as several HOX genes. A significant proportion of genes undergoing DNA methylation changes were inversely correlated with expression. miRNA screening revealed the existence of subsets of miRNAs that underwent changes in expression. Integrated analysis highlighted sets of miRNAs that are controlled by DNA methylation, and genes that are regulated by DNA methylation and are targeted by miRNAs with a potential use as clinical markers. Our study enabled the identification of novel dysregulated targets in rheumatoid arthritis synovial fibroblasts and generated a new workflow for the integrated analysis of miRNA and epigenetic control.
Journal of Autoimmunity 01/2013; · 8.15 Impact Factor
ABSTRACT: MOTIVATION: The Illumina Infinium 450k DNA Methylation Beadchip is a prime candidate technology for Epigenome-Wide Association Studies (EWAS). However, a difficulty associated with these beadarrays is that probes come in two different designs, characterised by widely different DNA methylation distributions and dynamic range, which may bias downstream analyses. A key statistical issue is therefore how best to adjust for the two different probe designs. RESULTS: Here we propose a novel model-based intra-array normalisation strategy for 450k data, called BMIQ (Beta MIxture Quantile dilation), to adjust the beta-values of type2 design probes into a statistical distribution characteristic of type1 probes. The strategy involves application of a 3-state beta-mixture model to assign probes to methylation states, subsequent transformation of probabilities into quantiles and finally a methylation dependent dilation transformation to preserve the monotonicity and continuity of the data. We validate our method on cell-line data, fresh frozen and paraffin embedded tumour tissue samples and demonstrate that BMIQ compares favourably to two competing methods. Specifically, we show that BMIQ improves the robustness of the normalisation procedure, reduces the technical variation and bias of type2 probe values, and successfully eliminates the type1 enrichment bias caused by the lower dynamic range of type2 probes. BMIQ will be useful as a preprocessing step for any study using the Illumina Infinium 450k platform. AVAILABILITY: BMIQ is freely available from code.google.com/p/bmiq/. CONTACT: email@example.com.
ABSTRACT: We propose a method to identify all the nodes that are relevant to compute
all the conditional probability distributions for a given set of nodes. Our
method is simple, efficient, consistent, and does not require learning a
Bayesian network first. Therefore, our method can be applied to
high-dimensional databases, e.g. gene expression databases.
ABSTRACT: In the past years, we have witnessed unprecedented attention for the study of epigenetic alterations in the context of a variety of complex disorders, including autoimmune diseases. This is in part due to the observation that genetics is insufficient to entirely explain the predisposition to their pathogenesis. The environmental influence is well illustrated by the existence of partial concordance for susceptibility to disease in monozygotic twins. In connection with this, epigenetic mechanisms regulate gene expression and are sensitive to external stimuli, bridging the gap between environmental and genetic factors. There is now considerable evidence of the existence of epigenetic alterations, particularly DNA methylation alterations, in diseases like systemic lupus erythematosus or rheumatoid arthritis. Most of the studies were initially performed by using candidate-gene approaches, although the increasing availability of high-throughput methods is providing better methods for the screening of epigenetic alterations in these diseases.
6th European Workshop on Immune-Mediated Inflammatory Diseases; 11/2011 · 3.46 Impact Factor
ABSTRACT: Transcription factor-induced lineage reprogramming or transdifferentiation experiments are essential for understanding the plasticity of differentiated cells. These experiments helped to define the specific role of transcription factors in conferring cell identity and played a key role in the development of the regenerative medicine field. We here investigated the acquisition of DNA methylation changes during C/EBPα-induced pre-B cell to macrophage transdifferentiation. Unexpectedly, cell lineage conversion occurred without significant changes in DNA methylation not only in key B cell- and macrophage-specific genes but also throughout the entire set of genes differentially methylated between the two parental cell types. In contrast, active and repressive histone modification marks changed according to the expression levels of these genes. We also demonstrated that C/EBPα and RNA Pol II are associated with the methylated promoters of macrophage-specific genes in reprogrammed macrophages without inducing methylation changes. Our findings not only provide insights about the extent and hierarchy of epigenetic events in pre-B cell to macrophage transdifferentiation but also show an important difference to reprogramming towards pluripotency where promoter DNA demethylation plays a pivotal role.
Nucleic Acids Research 11/2011; 40(5):1954-68. · 8.28 Impact Factor
ABSTRACT: One of the most well described cellular processes is the cell cycle, governing cell division. Mathematical models of this gene-protein network are therefore a good test case for assessing to what extent we can dissect the relationship between model parameters and system dynamics. Here we combine two strategies to enable an exploration of parameter space in relation to model output. A simplified, piecewise linear approximation of the original model is combined with a sensitivity analysis of the same system, to obtain and validate analytical expressions describing the dynamical role of different model parameters.
We considered two different output responses to parameter perturbations. One was qualitative and described whether the system was still working, i.e. whether there were oscillations. We call parameters that correspond to such qualitative change in system response essential. The other response pattern was quantitative and measured changes in cell size, corresponding to perturbations of modulatory parameters. Analytical predictions from the simplified model concerning the impact of different parameters were compared to a sensitivity analysis of the original model, thus evaluating the predictions from the simplified model. The comparison showed that the predictions on essential and modulatory parameters were satisfactory for small perturbations, but more discrepancies were seen for larger perturbations. Furthermore, for this particular cell cycle model, we found that most parameters were either essential or modulatory. Essential parameters required large perturbations for identification, whereas modulatory parameters were more easily identified with small perturbations. Finally, we used the simplified model to make predictions on critical combinations of parameter perturbations.
The parameter characterizations of the simplified model are largely consistent with the original model, and the simplified model can give predictions on critical combinations of parameter perturbations. We believe that the distinction between essential and modulatory perturbation responses will be of use for sensitivity analysis, in discussions of robustness, and during the model simplification process.
BMC Systems Biology 08/2011; 5:123. · 2.98 Impact Factor
ABSTRACT: We propose an innovative, integrated, cost-effective health system to combat major non-communicable diseases (NCDs), including cardiovascular, chronic respiratory, metabolic, rheumatologic and neurologic disorders and cancers, which together are the predominant health problem of the 21st century. This proposed holistic strategy involves comprehensive patient-centered integrated care and multi-scale, multi-modal and multi-level systems approaches to tackle NCDs as a common group of diseases. Rather than studying each disease individually, it will take into account their intertwined gene-environment, socio-economic interactions and co-morbidities that lead to individual-specific complex phenotypes. It will implement a road map for predictive, preventive, personalized and participatory (P4) medicine based on a robust and extensive knowledge management infrastructure that contains individual patient information. It will be supported by strategic partnerships involving all stakeholders, including general practitioners associated with patient-centered care. This systems medicine strategy, which will take a holistic approach to disease, is designed to allow the results to be used globally, taking into account the needs and specificities of local economies and health systems.
Genome Medicine 07/2011; 3(7):43. · 3.40 Impact Factor
ABSTRACT: Mathematical models are increasingly used in life sciences. However, contrary to other disciplines, biological models are typically over-parametrized and loosely constrained by scarce experimental data and prior knowledge. Recent efforts on analysis of complex models have focused on isolated aspects without considering an integrated approach, ranging from model building to derivation of predictive experiments and refutation or validation of robust model behaviours. Here, we develop such an integrative workflow, a sequence of actions expanding upon current efforts with the purpose of setting the stage for a methodology facilitating an extraction of core behaviours and competing mechanistic hypotheses residing within underdetermined models. To this end, we make use of optimization search algorithms, statistical (machine-learning) classification techniques and cluster-based analysis of the state variables' dynamics and their corresponding parameter sets. We apply the workflow to a mathematical model of fat accumulation in the arterial wall (atherogenesis), a complex phenomenon with limited quantitative understanding, thus leading to a model plagued with inherent uncertainty. We find that the mathematical atherogenesis model can still be understood in terms of a few key behaviours despite the large number of parameters. This result enabled us to derive distinct mechanistic predictions from the model despite the lack of confidence in the model parameters. We conclude that building integrative workflows enables investigators to embrace modelling of complex biological processes despite uncertainty in parameters.
Interface focus: a theme supplement of Journal of the Royal Society interface 06/2011; 1(3):438-49. · 2.21 Impact Factor
ABSTRACT: New technologies to generate high-dimensional data provide unprecedented opportunities for unbiased identification of biomarkers that can be used to optimize pre-operative planning, with the goal of avoiding costly post-operative complications and prolonged hospitalization. To identify such markers, we studied the global gene expression profiles of three organs central to the metabolic and inflammatory homeostasis isolated from coronary artery disease (CAD) patients during coronary artery bypass grafting (CABG) surgery. A total of 198 whole-genome expression profiles of liver, skeletal muscle and visceral fat from 66 CAD patients of the Stockholm Atherosclerosis Gene Expression (STAGE) cohort were analyzed. Of ~50,000 mRNAs measured in each patient, the mRNA levels of the anti-inflammatory gene, dual-specificity phosphatase-1 (DUSP1) correlated independently with post-operative stay, discriminating patients with normal (≤8 days) from those with prolonged (>8 days) hospitalization (p<0.004). To validate DUSP1 as a marker of risk for post-operative complications, we prospectively analyzed 181 patients undergoing CABG at Tartu University Hospital for DUSP1 protein levels in pre-operative blood samples. The pre-operative plasma levels of DUSP1 clearly discriminated patients with normal from those with prolonged hospitalization (p = 2×10⁻¹³; odds ratio = 5.1, p<0.0001; receiver operating characteristic area under the curve = 0.80). Taken together, these results indicate that blood levels of the anti-inflammatory protein DUSP1 can be used as a biomarker for post-operative complications leading to prolonged hospitalization after CABG and therefore merit further testing in longitudinal studies of patients eligible for CABG.
International Journal of Molecular Medicine 03/2011; 27(6):851-7. · 1.96 Impact Factor
ABSTRACT: The stability of atherosclerotic plaques determines the risk for rupture, which may lead to thrombus formation and potentially severe clinical complications such as myocardial infarction and stroke. Although the rate of plaque formation may be important for plaque stability, this process is not well understood. We took advantage of the atmospheric (14)C-declination curve (a result of the atomic bomb tests in the 1950s and 1960s) to determine the average biological age of carotid plaques.
The cores of carotid plaques were dissected from 29 well-characterized, symptomatic patients with carotid stenosis and analyzed for (14)C content by accelerator mass spectrometry. The average plaque age (i.e. formation time) was 9.6±3.3 years. All but two plaques had formed within 5-15 years before surgery. Plaque age was not associated with the chronological ages of the patients but was inversely related to plasma insulin levels (p = 0.0014). Most plaques were echo-lucent rather than echo-rich (2.24±0.97, range 1-5). However, plaques in the lowest tercile of plaque age (most recently formed) were characterized by further instability with a higher content of lipids and macrophages (67.8±12.4 vs. 50.4±6.2, p = 0.00005; 57.6±26.1 vs. 39.8±25.7, p<0.0005, respectively), less collagen (45.3±6.1 vs. 51.1±9.8, p<0.05), and fewer smooth muscle cells (130±31 vs. 141±21, p<0.05) than plaques in the highest tercile. Microarray analysis of plaques in the lowest tercile also showed increased activity of genes involved in immune responses and oxidative phosphorylation.
Our results show, for the first time, that plaque age, as judged by relative incorporation of (14)C, can improve our understanding of carotid plaque stability and therefore risk for clinical complications. Our results also suggest that levels of plasma insulin might be involved in determining carotid plaque age.
PLoS ONE 01/2011; 6(4):e18248. · 3.73 Impact Factor
ABSTRACT: Parkinson's disease (PD) is a common, adult-onset, neuro-degenerative disorder characterized by the degeneration of cardinal motor signs mainly due to the loss of dopaminergic neurons in the substantia nigra. To date, researchers still have limited understanding of the key molecular events that provoke neurodegeneration in this disease. Here, we present ParkDB, the first queryable database dedicated to gene expression in PD. ParkDB contains a complete set of re-analyzed, curated and annotated microarray datasets. This resource enables scientists to identify and compare expression signatures involved in PD and dopaminergic neuron differentiation under different biological conditions and across species. Database URL: http://www2.cancer.ucl.ac.uk/Parkinson_Db2/
Database The Journal of Biological Databases and Curation 01/2011; 2011:bar007. · 4.20 Impact Factor
ABSTRACT: BACKGROUND AND OBJECTIVES: Anti-citrullinated protein antibodies (ACPA) with different fine specificities are exclusively found in sera and synovial fluid of rheumatoid arthritis (RA) patients. The presence and the levels of all known ACPA are predominantly associated with HLA-DRB1*04 alleles. Although they are commonly controlled by HLA-DRB1 and uniquely identify post-translationally modified citrullinated (cit) epitopes, ACPAs have distinctive fine specificities and display a low degree of cross-reactivity. Hence, different pathways may selectively regulate specific anti-citrulline immunity in RA. Here, the authors examined whether non-HLA-DRB1 risk alleles influence the levels of antibodies against cyclic citrullinated peptide (CCP) and four additional citrullinated RA-associated antigens in search for shared and distinctive pathways. MATERIAL AND METHODS: Sera from 384 RA patients with an established disease were analysed for the presence of anti-CCP antibodies and reactivity towards cit-fibrinogen, cit-α-enolase, cit-type-II collagen and cit-vimentin. Genotyping for HLA-DRB1 and 64 additional RA-associated single-nucleotide polymorphisms (SNPs) was performed. Models of linear regression and contingency tables were used to calculate the association between genes and antibody presence and levels. RESULTS: Two SNPs in HLA-DQ and HLA-DRA regions (rs6457617 and rs6457620, respectively) influenced both the CCP levels as well as the other ACPAs, whereas HLA-DPB2 (rs2064476) only influenced anti-CCP levels but not other fine specificities. Outside the HLA region, PTPN22 (rs2476601) and TRAF1 (rs3761847) were found to have an effect on anti-CCP levels as well as all other fine specificities. Several other genes selectively governed the titres of two or even one single fine specificity; for example, CIITA (rs4781003) for cit-fibrinogen, CD40 for cit-α-enolase (rs4810485), CLEC4A (rs1133104) for cit-collagen and OLIG3, TNFAIP3 (rs10499194) for cit-vimentin. These results will be replicated in a bigger cohort of approximately 2000 RA patients. CONCLUSIONS: Genes with close association to the immune system, yet outside the HLA-DRB1 region, were found to influence the levels of ACPAs. Interestingly, several SNPs affected the overall antibody levels, that is, both against CCP and all four citrullinated antigens. In contrast, other SNPs specifically influenced the antibody levels towards one or two specificities. These data suggest that both common and unique pathways may control anticitrulline immunity in RA.
Annals of The Rheumatic Diseases - ANN RHEUM DIS. 01/2011; 70(2).
ABSTRACT: European funding under framework 7 (FP7) for the virtual physiological human (VPH) project has been in place now for nearly 2 years. The VPH network of excellence (NoE) is helping in the development of common standards, open-source software, freely accessible data and model repositories, and various training and dissemination activities for the project. It is also helping to coordinate the many clinically targeted projects that have been funded under the FP7 calls. An initial vision for the VPH was defined by framework 6 strategy for a European physiome (STEP) project in 2006. It is now time to assess the accomplishments of the last 2 years and update the STEP vision for the VPH. We consider the biomedical science, healthcare and information and communications technology challenges facing the project and we propose the VPH Institute as a means of sustaining the vision of VPH beyond the time frame of the NoE.
Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences 06/2010; 368(1920):2595-614. · 2.89 Impact Factor
ABSTRACT: Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.
ABSTRACT: Hepatocyte nuclear factor-4alpha (HNF4A) is a transcription factor that influences plasma triglyceride metabolism via an as yet unknown mechanism. In this study, we searched for the critical protein that mediates this effect using different human model systems.
Up- and downregulation of HNF4A in human hepatoma Huh7 and HepG2 cells was associated with marked changes in the secretion of triglyceride-rich lipoproteins (TRLs). Short interfering RNA (siRNA) inhibition of HNF4A influenced the expression of several genes, including acyl-CoA:diacylglycerol acyltransferase 1 (DGAT1). siRNA knockdown of DGAT1 reduced DGAT1 activity and decreased the secretion of TRLs. No additive effects of combined siRNA inhibition of HNF4A and DGAT1 were found on the secretion of TRLs, whereas the increase in TRL secretion induced by HNF4A overexpression was largely abolished by DGAT1 siRNA inhibition. A putative binding site for HNF4A was defined by in silico and in vitro methods. HNF4A and DGAT1 expressions were analyzed in 80 human liver samples, and significant relationships were observed between HNF4A and DGAT1 mRNA levels (r(2)=0.50, P<0.0001) and between DGAT1 mRNA levels and plasma triglyceride concentration (r(2)=0.09, P<0.01).
This study identified DGAT1 as an important protein that participates in the effect of HNF4A on hepatic secretion of TRLs.