ABSTRACT: Several pathogenic viruses such as hepatitis B and human immunodeficiency viruses may integrate into the host genome. These virus/host integrations are detectable using paired-end next generation sequencing. However, the low number of expected true virus integrations may be difficult to distinguish from the noise of many false positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline, termed Vy-PER (Virus integration detection bY Paired End Reads), outperforms existing similar tools in speed and accuracy. We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by the recently reported virus integrations at genomic rearrangement sites and association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage) and only our method detects singleton human-phiX174-chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low virus integration levels as well as non-integrated viruses.
ABSTRACT: Cancer proteomics provides a powerful approach to identify biomarkers for personalized medicine. In particular, biomarkers for early detection, prognosis and therapeutic intervention of bone cancers, especially osteosarcomas, are lacking. Initially, we compared two-dimensional gel electrophoresis (2-DE)-based protein expression patterns between cell lines of fetal osteoblasts, osteosarcoma and pulmonary metastasis derived from osteosarcoma. Two independent statistical analyses by means of PDQuest® and SameSpot® software revealed a common set of 34 differentially expressed protein spots (p < 0.05). Seventeen proteins were identified by mass spectrometry and subjected to Ingenuity Pathway Analysis, resulting in one high-ranked network associated with Gene Expression, Cell Death and Cell-To-Cell Signaling and Interaction. Ran/TC4-binding protein (RANBP1) and Cathepsin D (CTSD) were further validated by Western Blot in cell lines, and the latter also showed higher expression differences in cytospins and in clinical samples using tissue microarrays comprising osteosarcomas, metastases, other bone malignancies, and control tissues. The results show that protein expression patterns distinguish fetal osteoblasts from osteosarcomas, pulmonary metastases, and other bone diseases with relevant sensitivities between 55.56% and 100% at ≥87.50% specificity. In particular, CTSD was validated in clinical material and could thus serve as a new biomarker for bone malignancies and potentially guide individualized treatment regimes.
ABSTRACT: Standard analysis methods for genome wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power. To this end, we have developed a novel variable selection method called relative recurrency variable importance metric (r2VIM). This method incorporates recurrency and variance estimation to assist in optimal threshold selection. For this study, we specifically address how this method performs in data with almost completely epistatic effects (i.e. no marginal effects). Our results show that with appropriate parameter settings, r2VIM can identify interaction effects when the marginal effects are virtually nonexistent. It also outperforms logistic regression, which has essentially no power under this type of model when the number of potential features (genetic variants) is large. (All Supplementary Data can be found here: http://research.nhgri.nih.gov/manuscripts/Bailey-Wilson/r2VIM_epi/).
Pacific Symposium on Biocomputing 01/2015; 20:195-206. DOI:10.1142/9789814644730_0020
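The recurrency idea in the r2VIM abstract can be illustrated with a minimal sketch. The abstract does not spell out the selection rule, so the rule below is an assumption: each Random Forest run's importance scores are divided by the absolute value of that run's most negative score (taken as a noise estimate), and a variable is kept only if its relative importance reaches a chosen factor in every run. The function name `r2vim_select`, the `factor` parameter, and the importance values are hypothetical.

```python
def r2vim_select(runs, factor=1.0):
    """Select variables whose relative importance is >= factor in every run.

    runs: list of dicts mapping variable name -> importance score from one
          Random Forest run (e.g. permutation importance).
    Assumption: the most negative importance in a run estimates the noise level.
    """
    min_relative = {}
    for run in runs:
        lowest = min(run.values())
        if lowest >= 0:
            # Without a negative score there is no noise estimate to scale by.
            raise ValueError("each run needs at least one negative importance")
        scale = abs(lowest)
        for variable, importance in run.items():
            relative = importance / scale
            previous = min_relative.get(variable, relative)
            # Track the worst (smallest) relative importance across runs.
            min_relative[variable] = min(previous, relative)
    return sorted(v for v, r in min_relative.items() if r >= factor)


# Hypothetical importance scores from two runs on the same data.
example_runs = [
    {"snp1": 0.050, "snp2": 0.004, "snp3": -0.010, "snp4": 0.030},
    {"snp1": 0.040, "snp2": 0.015, "snp3": -0.008, "snp4": -0.002},
]
print(r2vim_select(example_runs, factor=1.0))
```

Raising `factor` makes the selection stricter, which is one way a threshold could trade false positives against power.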
ABSTRACT: Background and purpose:
The aim of this study was to determine the impact of functional pathways of single nucleotide polymorphisms (SNPs) involved in the ROS pathway, DNA repair, or TGFB1 signaling on acute or late normal tissue toxicity as well as individual radiosensitivity.
Materials and methods:
Patients receiving breast-conserving surgery and radiotherapy were examined either for erythema (n = 83), fibrosis (n = 123), or individual radiosensitivity (n = 123). The 17 SNPs analyzed are involved in the ROS pathway (GSTP1, SOD2, NQO1, NOS3, XDH), DNA repair (XRCC1, XRCC3, XRCC6, ERCC2, LIG4, ATM) or TGFB signaling (SKIL, EP300, APC, AXIN1, TGFB1). Associations with biological and clinical endpoints were studied for single SNPs but especially for combinations of SNPs assuming that a SNP is either beneficial or deleterious and needs to be weighted.
Results:
With one exception, no significant association was seen between a single SNP and the three endpoints studied. Nor were significant associations observed when applying a multi-SNP model that assumed each SNP to be deleterious. In contrast, significant associations were obtained when SNPs were assumed to be either beneficial or deleterious. These associations became stronger when each SNP was weighted individually. Detailed analysis revealed that both erythema and individual radiosensitivity especially depend on SNPs affecting DNA repair and TGFB1 signaling, while SNPs in the ROS pathway were of minor importance.
Conclusion:
Functional pathways of SNPs may be used to form a risk score that predicts acute and late radiation-induced toxicity and also helps to unravel the underlying biological mechanisms.
Strahlentherapie und Onkologie 08/2014; 191(1). DOI:10.1007/s00066-014-0741-y · 2.91 Impact Factor
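A weighted multi-SNP risk score of the kind described in this abstract can be sketched as a signed, weighted sum of risk-allele counts. The abstract does not give the weights or the exact scoring formula, so everything below is an assumption for illustration: the function name `risk_score`, the three gene labels picked from the abstract's list, and all numeric weights are hypothetical. A negative weight stands for a SNP treated as beneficial, a positive weight for one treated as deleterious.

```python
def risk_score(allele_counts, weights):
    """Signed, weighted sum of risk-allele counts (0, 1 or 2 per SNP).

    allele_counts: dict mapping SNP label -> number of risk alleles carried.
    weights:       dict mapping SNP label -> weight; the sign encodes whether
                   the SNP is treated as deleterious (+) or beneficial (-).
    """
    if set(allele_counts) != set(weights):
        raise ValueError("allele_counts and weights must cover the same SNPs")
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {snp}: {count}")
    return sum(weights[snp] * allele_counts[snp] for snp in weights)


# Hypothetical patient genotype and hypothetical weights.
patient = {"XRCC1": 2, "TGFB1": 1, "SOD2": 0}
weights = {"XRCC1": 1.5, "TGFB1": -1.0, "SOD2": 0.5}
print(risk_score(patient, weights))
```

Setting every weight to +1 corresponds to the model in which each SNP is assumed to be deleterious, and restricting weights to +1 or -1 corresponds to the unweighted beneficial/deleterious model.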
ABSTRACT: Two-point linkage analyses of whole genome sequence data are a promising approach to identify rare variants that segregate with complex diseases in large pedigrees because, in theory, the causal variants have been genotyped. We used whole genome sequence data and simulated traits provided by Genetic Analysis Workshop 18 to evaluate the proportion of false-positive findings in a binary trait using classic two-point parametric linkage analysis. False-positive genome-wide significant log of odds (LOD) scores were identified in more than 80% of 50 replicates for a binary phenotype generated by dichotomizing a quantitative trait that was simulated with a polygenic component (that was not based on any of the provided whole genome sequence genotypes). In contrast, when the trait was truly nongenetic (created by randomly assigning affected-unaffected status), the number of false-positive results was well controlled. These results suggest that when using two-point linkage analyses on whole genome sequence data, one should carefully examine regions yielding significant two-point LOD scores with multipoint analysis and that a more stringent significance threshold may be needed.
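For readers unfamiliar with LOD scores, the following minimal sketch computes a two-point LOD score in the simplest textbook setting: phase-known, fully informative meioses, where the likelihood depends only on the number of recombinants. This is a deliberate simplification and not the pedigree-based parametric analysis used in the study; the function name `two_point_lod` and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD score: log10 of L(theta) / L(theta = 0.5).

    Assumes phase-known, fully informative meioses, so that
    L(theta) = theta**r * (1 - theta)**(n - r).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    r, n = recombinants, meioses
    log_l_theta = r * math.log10(theta) + (n - r) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)  # free recombination, no linkage
    return log_l_theta - log_l_null


# Hypothetical example: 1 recombinant among 10 informative meioses.
print(round(two_point_lod(1, 10, 0.1), 3))
```

In practice the score is maximised over a grid of `theta` values, and the maximum is compared with a genome-wide significance threshold.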
ABSTRACT: A dozen genes/regions have been confirmed as genetic risk factors for oral clefts in human association and linkage studies, and animal models argue even more genes may be involved. Genomic sequencing studies should identify specific causal variants and may reveal additional genes as influencing risk to oral clefts, which have a complex and heterogeneous etiology. We conducted a whole exome sequencing (WES) study to search for potentially causal variants using affected relatives drawn from multiplex cleft families. Two or three affected 2°, 3° and higher degree relatives from 55 multiplex families were sequenced. We examined rare single nucleotide variants (SNVs) shared by affected relatives in 348 recognized candidate genes. Exact probabilities that affected relatives would share these rare variants were calculated given pedigree structures and corrected for the number of variants tested. Five novel and potentially damaging SNVs shared by affected distant relatives were found, and confirmed by Sanger sequencing. One damaging SNV in CDH1, shared by three affected second cousins from a single family, attained statistical significance (p=0.02 after correcting for multiple tests). Family-based designs such as the one used in this WES study offer important advantages for identifying genes likely to be causing complex and heterogeneous disorders.
ABSTRACT: Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios.
We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented.
The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a "risk machine", will share properties from the statistical machine that it is derived from.
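The counterfactual argument behind effect size estimates from a learning machine can be sketched as follows: for each observation, predict the outcome probability with a binary predictor set to 1 and again with it set to 0, then average the differences. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `counterfactual_risk_difference`, the feature names, and the toy probability model standing in for a fitted machine (such as a random forest in probability mode) are all hypothetical.

```python
def counterfactual_risk_difference(predict_proba, rows, feature):
    """Average change in predicted probability when `feature` is set to 1
    versus 0, holding every other predictor at its observed value.

    predict_proba: callable taking a dict of predictors, returning P(outcome).
    rows:          list of dicts, one per observation.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    differences = []
    for row in rows:
        exposed = dict(row)
        exposed[feature] = 1
        unexposed = dict(row)
        unexposed[feature] = 0
        differences.append(predict_proba(exposed) - predict_proba(unexposed))
    return sum(differences) / len(differences)


def toy_machine(x):
    # Stand-in for a fitted probability machine; the coefficients are made up.
    return 0.2 + 0.3 * x["exposure"] + 0.1 * x["covariate"]


observations = [
    {"exposure": 0, "covariate": 1},
    {"exposure": 1, "covariate": 0},
    {"exposure": 1, "covariate": 1},
]
print(counterfactual_risk_difference(toy_machine, observations, "exposure"))
```

Because the function only needs a callable that returns probabilities, the same code applies to any machine whose predictions are consistent probability estimates.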
ABSTRACT: Chromosomal aneuploidy has been identified as a prognostic factor in the majority of sporadic carcinomas. However, it is not known how chromosomal aneuploidy affects chromosome-specific protein expression in particular, and the cellular proteome equilibrium in general.
The aim was to detect chromosomal aneuploidy-associated expression changes in cell clones carrying trisomies found in colorectal cancer.
We used microcell-mediated chromosomal transfer to generate three artificial trisomic cell clones of the karyotypically stable, diploid, yet mismatch-deficient, colorectal cancer cell line DLD1 - each of them harboring one extra copy of either chromosome 3, 7 or 13. Protein expression differences were assessed by two-dimensional gel electrophoresis and mass spectrometry, compared to whole-genome gene expression data, and evaluated by PANTHER classification system and Ingenuity Pathway Analysis (IPA).
In total, 79 differentially expressed proteins were identified between the trisomic clones and the parental cell line. Up-regulation of PCNA and HMGB1 as well as down-regulation of IDH3A and PSMB3 were revealed as trisomy-associated alterations involved in regulating genome stability.
These results show that trisomies affect the expression of genes and proteins that are not necessarily located on the trisomic chromosome, but reflect a pathway-related alteration of the cellular equilibrium.
ABSTRACT: A large-scale RNAi screen was performed for 8 different melanoma cell lines using a pooled whole genome lentiviral shRNA library. shRNAs affecting proliferation of transduced melanoma cells were negatively selected during 10 days of culture. Overall, 617 shRNAs were identified by microarray hybridization. Pathway analyses identified mitogen-activated protein kinase (MAPK) pathway members such as ERK1/2, JNK1/2 and MAP3K7 and protein kinase Cβ (PKCβ) as candidate genes. Knockdown of PKCβ most consistently reduced cellular proliferation, colony formation and migratory capacity of melanoma cells and was selected for further validation. PKCβ showed enhanced expression in human primary melanomas and distant metastases as compared with benign melanocytic nevi. Moreover, treatment of melanoma cells with PKCβ-specific inhibitor enzastaurin reduced melanoma cell growth but had only small effects on benign fibroblasts. Finally, PKCβ-shRNA significantly reduced lung colonisation capacity of stably transduced melanoma cells in mice. Taken together, the present study identified new candidate genes for melanoma cell growth and proliferation. PKCβ seems to play an important role in these processes and might serve as a new target for treatment of metastatic melanoma.
ABSTRACT: To examine the association of polymorphisms in ATM (codon 158), GSTP1 (codon 105), SOD2 (codon 16), TGFB1 (position -509), XPD (codon 751), and XRCC1 (codon 399) with the risk of severe erythema after breast conserving radiotherapy.
Retrospective analysis of 83 breast cancer patients treated with breast conserving radiotherapy. A total dose of 50.4 Gy was administered, applying 1.8 Gy/fraction within 42 days. Erythema was evaluated according to the Radiation Therapy Oncology Group (RTOG) score. DNA was extracted from blood samples and polymorphisms were determined using either the Polymerase Chain Reaction based Restriction-Fragment-Length-Polymorphism (PCR-RFLP) technique or Matrix-Assisted-Laser-Desorption/Ionization-Time-Of-Flight-Mass-Spectrometry (MALDI-TOF). Relative excess heterozygosity (REH) was investigated to check compatibility of genotype frequencies with Hardy-Weinberg equilibrium (HWE). In addition, p-values from the standard exact HWE lack of fit test were calculated using 100,000 permutations. HWE analyses were performed using R.
Fifty-six percent (46/83) of all patients developed erythema of grade 2 or 3, with this risk being higher for patients with large breast volume (odds ratio, OR = 2.55, 95% confidence interval, CI: 1.03-6.31, p = 0.041). No significant association between SNPs and risk of erythema was found when all patients were considered. However, in patients with small breast volume the TGFB1 SNP was associated with erythema (p = 0.028), whereas the SNP in XPD showed an association in patients with large breast volume (p = 0.046). A risk score based on all risk alleles was neither significant in all patients nor in patients with small or large breast volume. Risk alleles of most SNPs were different compared to a previously identified risk profile for fibrosis.
The genetic risk profile for erythema appears to be different for patients with small and large breast volume. This risk profile seems to be specific for erythema as compared to a risk profile for fibrosis.
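The relative excess heterozygosity (REH) mentioned in the methods can be illustrated with a short sketch. The abstract does not define REH, so the definition used here is an assumption: the observed heterozygote frequency divided by twice the geometric mean of the two homozygote frequencies, which equals 1 under Hardy-Weinberg equilibrium. The study itself ran its HWE analyses in R; the Python function `relative_excess_heterozygosity` and the genotype counts below are hypothetical.

```python
import math


def relative_excess_heterozygosity(n_aa, n_ab, n_bb):
    """REH = f_AB / (2 * sqrt(f_AA * f_BB)), from genotype counts.

    Values above 1 indicate an excess of heterozygotes relative to HWE,
    values below 1 a deficit. Requires both homozygote counts to be positive.
    """
    if n_aa <= 0 or n_bb <= 0 or n_ab < 0:
        raise ValueError("homozygote counts must be positive, n_ab non-negative")
    total = n_aa + n_ab + n_bb
    f_aa, f_ab, f_bb = n_aa / total, n_ab / total, n_bb / total
    return f_ab / (2 * math.sqrt(f_aa * f_bb))


# Hypothetical genotype counts that satisfy HWE exactly (p = 0.6, q = 0.4).
print(relative_excess_heterozygosity(36, 48, 16))
```

A confidence interval for REH, or the permutation-based exact test named in the abstract, would be needed to judge whether a deviation from 1 is compatible with HWE; neither is shown here.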
ABSTRACT: The mechanisms underlying the transformation from chronic Helicobacter pylori gastritis to gastric extranodal marginal zone lymphoma (MALT lymphoma) are poorly understood. This study aims to identify microRNAs that might be involved in the process of neoplastic transformation. We generated microRNA signatures by RT-PCR in 68 gastric biopsy samples representing normal mucosa, gastritis, suspicious lymphoid infiltrates, and overt MALT lymphoma according to Wotherspoon criteria. Analyses revealed a total of 41 microRNAs that were significantly upregulated (n = 33) or downregulated (n = 8) in succession from normal mucosa to gastritis and to MALT lymphoma. While some of these merely reflect the presence of lymphocytes (e.g. miR-566 and miR-212) or H. pylori infection (e.g. miR-155 and let-7f), a distinct set of five microRNAs (miR-150, miR-550, miR-124a, miR-518b and miR-539) was shown to be differentially expressed in gastritis as opposed to MALT lymphoma. This differential expression might therefore indicate a central role of these microRNAs in the process of malignant transformation.
Archiv für Pathologische Anatomie und Physiologie und für Klinische Medicin 03/2012; 460(4):371-7. DOI:10.1007/s00428-012-1215-1 · 2.65 Impact Factor
ABSTRACT: One major expectation from the transcriptome in humans is to characterize the biological basis of associations identified by genome-wide association studies. So far, few cis expression quantitative trait loci (eQTLs) have been reliably related to disease susceptibility. Trans-regulating mechanisms may play a more prominent role in disease susceptibility. We analyzed 12,808 genes detected in at least 5% of circulating monocyte samples from a population-based sample of 1,490 European unrelated subjects. We applied a method for extracting expression patterns, independent component analysis, to identify sets of co-regulated genes. These patterns were then related to 675,350 SNPs to identify major trans-acting regulators. We detected three genomic regions significantly associated with co-regulated gene modules. Association of these loci with multiple expression traits was replicated in Cardiogenics, an independent study in which expression profiles of monocytes were available in 758 subjects. The locus 12q13 (lead SNP rs11171739), previously identified as a type 1 diabetes locus, was associated with a pattern including two cis eQTLs, RPS26 and SUOX, and 5 trans eQTLs, one of which (MADCAM1) is a potential candidate for mediating T1D susceptibility. The locus 12q24 (lead SNP rs653178), which has demonstrated extensive disease pleiotropy, including type 1 diabetes, hypertension, and celiac disease, was associated with a pattern strongly correlated with blood pressure level. The strongest trans eQTL in this pattern was CRIP1, a known marker of cellular proliferation in cancer. The locus 12q15 (lead SNP rs11177644) was associated with a pattern driven by two cis eQTLs, LYZ and YEATS4, and including 34 trans eQTLs, several of them tumor-related genes.
This study shows that a method exploiting the structure of co-expressions among genes can help identify genomic regions involved in trans regulation of sets of genes and can provide clues for understanding the mechanisms linking genome-wide association loci to disease.
ABSTRACT: DNA aneuploidy has been identified as a prognostic factor in the majority of epithelial malignancies. We aimed at identifying ploidy-associated protein expression in endometrial cancer of different prognostic subgroups. Comparison of gel electrophoresis-based protein expression patterns between normal endometrium (n = 5), diploid (n = 7), and aneuploid (n = 7) endometrial carcinoma detected 121 ploidy-associated protein forms, 42 differentially expressed between normal endometrium and diploid endometrioid carcinomas, 37 between diploid and aneuploid endometrioid carcinomas, and 41 between diploid endometrioid and aneuploid uterine papillary serous cancer. Proteins were identified by mass spectrometry and evaluated by Ingenuity Pathway Analysis. Targets were confirmed by liquid chromatography/mass spectrometry. Mass spectrometry identified 41 distinct polypeptides and pathway analysis resulted in high-ranked networks with vimentin and NF-κB as central nodes. These results identify ploidy-associated protein expression differences that overrule histopathology-associated expression differences and emphasize particular protein networks in genomic stability of endometrial cancer.
Cellular and Molecular Life Sciences CMLS 07/2011; 69(2):325-33. DOI:10.1007/s00018-011-0752-0 · 5.81 Impact Factor
ABSTRACT: In humans, the fraction of X-linked genes with higher expression in females has been estimated to be 5% from microarray studies, a proportion lower than the 25% of genes thought to escape X inactivation. We analyzed 715 X-linked transcripts in circulating monocytes from 1,467 subjects and found an excess of female-biased transcripts on the X compared to autosomes (9.4% vs 5.5%, p<2×10⁻⁵). Among the genes not previously known to escape inactivation, the most significant one was EFHC2, for which 20% of the variability was explained by sex. We also investigated cis expression quantitative trait loci (eQTLs) by analyzing 15,703 X-linked SNPs. The frequency and magnitude of X-linked cis eQTLs were quite similar in males and females. A few genes exhibited a stronger genetic effect in females than in males (ARSD, DCX, POLA1 and ITM2A). These genes deserve further investigation since they may contribute to sex differences in pathophysiology.
ABSTRACT: eQTL analyses are important to improve the understanding of genetic association results. We performed a genome-wide association and global gene expression study to identify functionally relevant variants affecting the risk of coronary artery disease (CAD).
In a genome-wide association analysis of 2078 CAD cases and 2953 control subjects, we identified 950 single-nucleotide polymorphisms (SNPs) that were associated with CAD at P<10⁻³. Subsequent in silico and wet-laboratory replication stages and a final meta-analysis of 21 428 CAD cases and 38 361 control subjects revealed a novel association signal at chromosome 10q23.31 within the LIPA (lysosomal acid lipase A) gene (P=3.7×10⁻⁸; odds ratio, 1.1; 95% confidence interval, 1.07 to 1.14). The association of this locus with global gene expression was assessed by genome-wide expression analyses in the monocyte transcriptome of 1494 individuals. The results showed a strong association of this locus with expression of the LIPA transcript (P=1.3×10⁻⁹⁶). An assessment of LIPA SNPs and transcript with cardiovascular phenotypes revealed an association of LIPA transcript levels with impaired endothelial function (P=4.4×10⁻³).
The use of data on genetic variants and the addition of data on global monocytic gene expression led to the identification of the novel functional CAD susceptibility locus LIPA, located on chromosome 10q23.31. The respective eSNPs associated with CAD strongly affect LIPA gene expression level, which was related to endothelial dysfunction, a precursor of CAD.