Mouse (m) 11β-hydroxysteroid dehydrogenase type 2 (11βHSD2) was homology-modeled, and its structure and ligand-receptor interactions were analyzed. The modeled m11βHSD2 showed significant 3D similarities to the human (h) 11βHSD1 and 2 structures. The contact energy profiles of the m11βHSD2 model were in good agreement with those of the h11βHSD1 and 2 structures. The secondary structure of the m11βHSD2 model exhibited a central 6-stranded all-parallel β-sheet sandwich-like structure, flanked on both sides by 3-helices. Ramachandran plots revealed that only 1.1% of the amino acid residues of m11βHSD2 were in the disfavored region. Further, molecular surface and electrostatic analyses of the ligand-binding site showed that the m11βHSD2 model was almost identical to the h11βHSD2 model. Furthermore, docking simulation and ligand-receptor interaction analyses revealed the similarity of the ligand-receptor bound conformation between the m11βHSD2 and h11βHSD2 models. These results indicate that the m11βHSD2 model was successfully evaluated and analyzed. To the best of our knowledge, this is the first report of an m11βHSD2 model with detailed analyses, and our data indicate that the mouse model can be applied to the human model to target 11βHSD2 for the development of anticancer drugs.
Gene expression profiling has provided insights into different cancer types and revealed tissue-specific expression signatures. Alterations in microRNA expression contribute to the pathogenesis of many types of human diseases. Few studies have integrated all levels of gene expression, miRNA and methylation to uncover correlations between these data types. We performed an integrated profiling to discover instances of miRNAs associated with a gene expression and DNA methylation signature across multiple cancer types. Using data from The Cancer Genome Atlas (TCGA), we revealed a concordant gene expression and methylation signature associated with the microRNA hsa-miR-142 across the same samples. In all cancer types examined, we found a signature of co-expression of a gene set R and methylated sites M, which correlate positively (M+) or negatively (M-) with the expression of hsa-miR-142. The set R consistently contains many genes, such as TRAF3IP3, NCKAP1L, CD53, LAPTM5, PTPRC, EVI2B, DOCK2, LCP2, CYBB and FYB. The signature is preserved across glioblastoma, ovarian, breast, colon, kidney, lung, uterine and rectum cancer. There is 28% overlap of methylation sites in M between glioblastoma (GBM) and ovarian cancer. There is 60% overlap of genes in R between GBM and ovarian (P = 1.3e(-11)). Most of the genes in R are known to be expressed in lymphocytes and haematopoietic stem cells, while M reflects membrane proteins involved in cell-cell adhesion functions. We speculate that the hsa-miR-142 associated signature may signal haematopoietic-specific processes and an accumulation of methylation events triggering a progressive loss of cell-cell adhesion. We also observed that GBM samples belonging to the proneural subtype tend to have underexpressed hsa-miR-142 and R genes, hypomethylated M+ and hypermethylated M-, while the mesenchymal samples have the opposite profile.
We identified gene expression signatures predicting responsiveness to a Kinesin-5 (KIF11) inhibitor (Kinesin-5i) in cultured colon tumor cell lines. Genes predicting resistance to Kinesin-5i were enriched for those from chromosome 20q, a region of frequent amplification in a number of tumor types. siRNAs targeting genes in this chromosomal region identified AURKA, TPX2 and MYBL2 as genes whose disruption enhances response to Kinesin-5i. Taken together, our results show functional interaction between these genes, and suggest that their overexpression is involved in resistance to Kinesin-5i. Furthermore, our results suggest that patients whose tumors overexpress AURKA due to amplification of 20q will more likely resist treatment with Kinesin-5 inhibitor, and that inactivation of AURKA may sensitize these patients to treatment.
This study aimed to discriminate carcinogens on the basis of hepatic transcript profiling in rats administered a variety of carcinogens and non-carcinogens. We conducted 28-day toxicity tests in male F344 rats with 47 carcinogens and 26 non-carcinogens, and then periodically investigated the hepatic gene expression profiles using custom microarrays. By hierarchical cluster analysis based on significantly altered genes, carcinogens were clustered into three major groups (Group 1 to 3). The formation of these groups was affected neither by the gene sets used nor by the administration period, indicating that the grouping of carcinogens was independent of the conditions of both statistical analysis and toxicity testing. The seventeen carcinogens belonging to Group 1 were mainly rat hepatocarcinogens, most of them mutagenic. Group 2 was formed by three subgroups, which were composed of 23 carcinogens exhibiting distinct properties in terms of genotoxicity and target tissues, namely nonmutagenic hepatocarcinogens, and mutagenic and nonmutagenic carcinogens that both target other tissues. Group 3 contained 6 carcinogens including 4 estrogenic substances, suggesting that it represents a group of estrogenic carcinogens. Gene network analyses revealed that the significantly altered genes in Group 1 included Bax, Tnfrsf6, Btg2, Mgmt and Abcb1b, suggesting that the p53-mediated signaling pathway is involved in early pathologic alterations associated with preceding mutagenic carcinogenesis. Thus, the common transcriptional signatures for each group might reflect the early molecular events of carcinogenesis and hence would enable us to identify biomarker genes, and then to develop a new assay for carcinogenesis prediction.
We have previously shown the hepatic gene expression profiles of carcinogens in 28-day toxicity tests were clustered into three major groups (Group-1 to 3). Here, we developed a new prediction method for Group-1 carcinogens which consist mainly of genotoxic rat hepatocarcinogens. The prediction formula was generated by a support vector machine using 5 selected genes as the predictive genes and predictive score was introduced to judge carcinogenicity. It correctly predicted the carcinogenicity of all 17 Group-1 chemicals and 22 of 24 non-carcinogens regardless of genotoxicity. In the dose-response study, the prediction score was altered from negative to positive as the dose increased, indicating that the characteristic gene expression profile emerged over a range of carcinogen-specific doses. We conclude that the prediction formula can quantitatively predict the carcinogenicity of Group-1 carcinogens. The same method may be applied to other groups of carcinogens to build a total system for prediction of carcinogenicity.
The present paper aims at demonstrating clinically oriented applications of the multiscale four dimensional in vivo tumor growth simulation model previously developed by our research group. To this end the effect of weekend radiotherapy treatment gaps and p53 gene status on two virtual glioblastoma tumors differing only in p53 gene status is investigated in silico. Tumor response predictions concerning two rather extreme dose fractionation schedules (daily dose of 4.5 Gy administered in 3 equal fractions) namely HART (Hyperfractionated Accelerated Radiotherapy weekend less) 54 Gy and CHART (Continuous HART) 54 Gy are presented and compared. The model predictions suggest that, for the same p53 status, HART 54 Gy and CHART 54 Gy have almost the same long term effects on locoregional tumor control. However, no data have been located in the literature concerning a comparison of HART and CHART radiotherapy schedules for glioblastoma. As non small cell lung carcinoma (NSCLC) may also be a fast growing and radiosensitive tumor, a comparison of the model predictions with the outcome of clinical studies concerning the response of NSCLC to HART 54 Gy and CHART 54 Gy is made. The model predictions are in accordance with corresponding clinical observations, thus strengthening the potential of the model.
Breast tumors have been described by molecular subtypes characterized by pervasively different gene expression profiles. The subtypes are associated with different clinical parameters and origin of precursor cells. However, the biological pathways and chromosomal aberrations that differ between the subgroups are less well characterized. The molecular subtypes are associated with different risk of metastatic recurrence of the disease. Nevertheless, the performance of these overall patterns to predict outcome is far from optimal, suggesting that biological mechanisms that extend beyond the subgroups impact metastasis.
We have scrutinized publicly available gene expression datasets and identified molecular subtypes in 1,394 breast tumors with outcome data. By analysis of chromosomal regions and pathways using "Gene set enrichment analysis" followed by a meta-analysis, we identified comprehensive mechanistic differences between the subgroups. Furthermore, the same approach was used to investigate mechanisms related to metastasis within the subgroups. A striking finding is that the molecular subtypes account for the majority of biological mechanisms associated with metastasis. However, some mechanisms, aside from the subtypes, were identified in a training set of 1,239 tumors and confirmed by survival analysis in two independent validation datasets from the same type of platform and consisting of very comparable node-negative patients that did not receive adjuvant medical therapy. The results show that high expression of 5q14 genes and low levels of TNFR2 pathway genes were associated with poor survival in basal-like cancers. Furthermore, low expression of 5q33 genes and interleukin-12 pathway genes were associated with poor outcome exclusively in ERBB2-like tumors.
The identified regions, genes, and pathways may be potential drug targets in future individualized treatment strategies.
Aiming to find key genes and events, we analyze a large data set on diffuse large B-cell lymphoma (DLBCL) gene expression (248 patients, 12196 spots). Applying the loess normalization method to these raw data yields improved survival predictions, in particular for the clinically important group of patients with medium survival time. Furthermore, we identify a simplified prognosis predictor, which stratifies different risk groups about as well as complex signatures do.
We identify specific, activated B cell-like (ABC) and germinal center B cell-like (GCB) distinguishing genes. These include early (e.g. CDKN3) and late (e.g. CDKN2C) cell cycle genes.
Independently from previous classification by marker genes we confirm a clear binary class distinction between the ABC and GCB subgroups. An earlier suggested third entity is not supported. A key regulatory network, distinguishing marked over-expression in ABC from that in GCB, is built by: ASB13, BCL2, BCL6, BCL7A, CCND2, COL3A1, CTGF, FN1, FOXP1, IGHM, IRF4, LMO2, LRMP, MAPK10, MME, MYBL1, NEIL1 and SH3BP5. It predicts and supports the aggressive behaviour of the ABC subgroup. These results help to understand target interactions, improve subgroup diagnosis, risk prognosis as well as therapy in the ABC and GCB DLBCL subgroups.
Lung cancer is the second most commonly occurring non-cutaneous cancer in the United States with the highest mortality rate among both men and women. In this study, we utilized three lung cancer microarray datasets generated by previous researchers to identify differentially expressed genes and altered signaling pathways, and to assess the involvement of the Hedgehog (Hh) pathway. The three datasets contain the expression levels of tens of thousands of genes in normal lung tissues and squamous cell lung carcinoma. The datasets were combined and analyzed. The dysregulated genes and altered signaling pathways were identified using statistical methods. We then performed Fisher’s exact test to assess the significance of the association between Hh pathway downstream genes and squamous cell lung carcinoma.
A total of 395 genes were found to be commonly differentially expressed in squamous cell lung carcinoma. The genes encoding the fibrous structural protein keratins and the cell cycle dependent genes encoding cyclin-dependent kinases were significantly up-regulated, while those encoding LIM domains were down-regulated. Over 100 signaling pathways were implicated in squamous cell lung carcinoma, including the cell cycle regulation pathway, p53 tumor-suppressor pathway, IL-8 signaling, Wnt-β-catenin pathway, mTOR signaling and EGF signaling. In addition, 37 out of 223 downstream molecules of the Hh pathway were altered. The P-value from the Fisher’s exact test indicates that Hh signaling is implicated in squamous cell lung carcinoma.
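The enrichment test mentioned above can be illustrated with a minimal sketch of a one-sided Fisher's exact test, computed as a hypergeometric tail using only the Python standard library. The abstract does not give the contingency table, so the background size of 20,000 genes is hypothetical, and treating the 395 differentially expressed genes as the altered set is an assumption made only for illustration; the function name is likewise illustrative.

```python
from math import comb

def fisher_exact_greater(k, n_set, n_hits, n_total):
    """One-sided Fisher's exact test for enrichment (hypergeometric upper tail).

    k       : pathway genes that are altered
    n_set   : size of the pathway gene set
    n_hits  : number of altered genes overall
    n_total : number of genes in the background
    Returns P(X >= k) when n_hits genes are drawn without replacement.
    """
    upper = min(n_set, n_hits)
    denom = comb(n_total, n_hits)
    tail = sum(
        comb(n_set, i) * comb(n_total - n_set, n_hits - i)
        for i in range(k, upper + 1)
    )
    return tail / denom

# 37 of 223 Hh downstream genes altered (from the abstract);
# 395 altered genes and a 20,000-gene background are assumptions.
p_value = fisher_exact_greater(k=37, n_set=223, n_hits=395, n_total=20000)
print(f"one-sided P = {p_value:.3g}")
```

Under these assumed totals, only about 4 altered pathway genes would be expected by chance, so the resulting P-value depends strongly on the background chosen.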
Numerous genes were altered and multiple pathways were dysfunctional in squamous cell lung carcinoma. Many of the altered genes have been implicated in different types of carcinoma while some are organ-specific. Hh signaling is implicated in squamous cell lung cancer, opening the door for exploring new cancer therapeutic treatment using GLI antagonist GANT 61.
Many types of tumors exhibit characteristic chromosomal losses or gains, as well as local amplifications and deletions. Within any given tumor type, sample-specific amplifications and deletions are also observed. Typically, a region that is aberrant in more tumors, or whose copy number change is stronger, would be considered a more promising candidate to be biologically relevant to cancer. We sought an intuitive method to define such aberrations and prioritize them. We define V, the "volume" associated with an aberration, as the product of three factors: (a) the fraction of patients with the aberration, (b) the aberration's length and (c) its amplitude. Our algorithm compares the values of V derived from the real data to a null distribution obtained by permutations, and yields the statistical significance (p-value) of the measured value of V. We detected genetic locations that were significantly aberrant, and combined them with chromosomal arm status (gain/loss) to create a succinct fingerprint of the tumor genome. This genomic fingerprint is used to visualize the tumors, highlighting events that are co-occurring or mutually exclusive. We applied the method to three different public array CGH datasets of Medulloblastoma and Neuroblastoma, and demonstrated its ability to detect chromosomal regions that were known to be altered in the tested cancer types, as well as to suggest new genomic locations to be tested. We identified a potential new subtype of Medulloblastoma, which is analogous to Neuroblastoma type 1.
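The volume score and its permutation test can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data layout (one list of log-ratios per sample), the amplitude threshold of 0.3, the use of the mean log-ratio as the amplitude, and the permutation scheme (shuffling probe positions independently within each sample) are all assumptions.

```python
import random

def volume(data, start, end, threshold=0.3):
    """V = fraction of aberrant samples * region length * mean amplitude.

    data: list of per-sample lists of log-ratios (hypothetical layout).
    A sample counts as aberrant if its mean over [start, end) exceeds threshold.
    """
    length = end - start
    means = [sum(row[start:end]) / length for row in data]
    aberrant = [m for m in means if m > threshold]
    if not aberrant:
        return 0.0
    fraction = len(aberrant) / len(data)
    amplitude = sum(aberrant) / len(aberrant)
    return fraction * length * amplitude

def permutation_p_value(data, start, end, n_perm=200, seed=0):
    """Compare the observed V with V from probe-shuffled copies of the data."""
    rng = random.Random(seed)
    observed = volume(data, start, end)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for row in data:
            perm = row[:]
            rng.shuffle(perm)  # destroys positional structure within a sample
            shuffled.append(perm)
        if volume(shuffled, start, end) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)

# Toy data: three of four samples carry a gain over probes 2-4.
gain = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
flat = [0.0] * 10
toy = [gain[:], gain[:], gain[:], flat[:]]
v_obs = volume(toy, 2, 5)
p_obs = permutation_p_value(toy, 2, 5)
```

Multiplying the three factors means that a short, strong, recurrent aberration and a long, weak one can reach similar scores, which is why the comparison with a permutation null is needed to judge significance.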
This paper concerns a new method for identifying aberrant signal transduction pathways (STPs) in cancer using case/control gene expression-level datasets, and applying that method and an existing method to an ovarian carcinoma dataset. Both methods identify STPs that are plausibly linked to all cancers based on current knowledge. Thus, the paper is most appropriate for the cancer informatics community. Our hypothesis is that STPs that are altered in tumorous tissue can be identified by applying a new Bayesian network (BN)-based method (causal analysis of STP aberration (CASA)) and an existing method (signaling pathway impact analysis (SPIA)) to the cancer genome atlas (TCGA) gene expression-level datasets. To test this hypothesis, we analyzed 20 cancer-related STPs and 6 randomly chosen STPs using the 591 cases in the TCGA ovarian carcinoma dataset, and the 102 controls in all 5 TCGA cancer datasets. We identified all the genes related to each of the 26 pathways, and developed separate gene expression datasets for each pathway. The results of the two methods were highly correlated. Furthermore, many of the STPs that ranked highest according to both methods are plausibly linked to all cancers based on current knowledge. Finally, CASA ranked the cancer-related STPs over the randomly selected STPs at a significance level below 0.05 (P = 0.047), but SPIA did not (P = 0.083).
Colorectal cancer (CRC) is one of the most frequently occurring cancers in Japan, and thus a wide range of methods have been deployed to study the molecular mechanisms of CRC. In this study, we performed a comprehensive analysis of CRC, incorporating copy number aberration (CNA) and gene expression data. For the last four years, we have been collecting data from CRC cases and organizing the information as an "omics" study by integrating many kinds of analysis into a single comprehensive investigation. In our previous studies, we had experienced difficulty in finding genes related to CRC, as we observed higher noise levels in the expression data than in the data for other cancers. Because chromosomal aberrations are often observed in CRC, here we have performed a combination of CNA analysis and expression analysis in order to identify new genes responsible for CRC. This study was performed as part of the Clinical Omics Database Project at Tokyo Medical and Dental University. The purpose of this study was to investigate the mechanism of genetic instability in CRC by this combination of expression analysis and CNA analysis, and to establish a new method for the diagnosis and treatment of CRC.
Comprehensive gene expression analysis was performed on 79 CRC cases using an Affymetrix Gene Chip, and comprehensive CNA analysis was performed using an Affymetrix DNA Sty array. To avoid the contamination of cancer tissue with normal cells, laser micro-dissection was performed before DNA/RNA extraction. Data analysis was performed using original software written in the R language.
We observed a high percentage of CNA in colorectal cancer, including copy number gains at 7, 8q, 13 and 20q, and copy number losses at 8p, 17p and 18. Gene expression analysis provided many candidates for CRC-related genes, but their association with CRC did not reach the level of statistical significance. The combination of CNA and gene expression analysis, together with the clinical information, suggested UGT2B28, LOC440995, CXCL6, SULT1B1, RALBP1, TYMS, RAB12, RNMT, ARHGDIB, S100A2, ABHD2, OIT3 and ABHD12 as genes that are possibly associated with CRC. Some of these genes have already been reported as being related to CRC. TYMS has been reported as being associated with resistance to the anti-cancer drug 5-fluorouracil, and we observed a copy number increase for this gene. RALBP1, ARHGDIB and S100A2 have been reported as oncogenes, and we observed copy number increases in each. ARHGDIB has been reported as a metastasis-related gene, and our data also showed copy number increases of this gene in cases with metastasis.
The combination of CNA analysis and gene expression analysis was a more effective method for finding genes associated with the clinicopathological classification of CRC than either analysis alone. Using this combination of methods, we were able to detect genes that have already been associated with CRC. We also identified additional candidate genes that may be new markers or targets for this form of cancer.
Array-based comparative genomic hybridization (aCGH) allows measuring DNA copy number at the whole genome scale. In cancer studies, one may be interested in identifying DNA copy number aberrations (CNAs) associated with certain clinicopathological characteristics such as cancer metastasis. We proposed to define test regions based on copy number pattern profiles across multiple samples, using either smoothed log(2)-ratio or discrete data of copy number gain/loss calls. Association test performed on the refined test regions instead of the probes has improved power due to reduced number of tests. We also compared three types of measurement of copy number levels, normalized log(2)-ratio, smoothed log(2)-ratio, and copy number gain or loss calls in statistical hypothesis testing. The relative strengths and weaknesses of the proposed method were demonstrated using both simulation studies and real data analysis of a liver cancer study.
In order to identify somatic focal copy number aberrations (CNAs) in cancer specimens and to distinguish them from germ-line copy number variations (CNVs), we developed the software package FocalCall. FocalCall enables user-defined size cutoffs to recognize focal aberrations and builds on established array comparative genomic hybridization segmentation and calling algorithms. To distinguish CNAs from CNVs, the algorithm uses matched patient normal signals as references or, if this is not available, a list with known CNVs in a population. Furthermore, FocalCall differentiates between homozygous and heterozygous deletions as well as between gains and amplifications and is applicable to high-resolution array and sequencing data.
AVAILABILITY AND IMPLEMENTATION
FocalCall is available as an R-package from: https://github.com/OscarKrijgsman/focalCall . The R-package will be available in Bioconductor.org as of release 3.0.
Existing methods for estimating copy number variations in array comparative genomic hybridization (aCGH) data are limited to estimations of the gain/loss of chromosome regions for single sample analysis. We propose the linear-median method for estimating shared copy numbers in DNA sequences across multiple samples, demonstrate its operating characteristics through simulations and applications to real cancer data, and compare it to two existing methods.
Our proposed linear-median method has the power to estimate common changes that appear at isolated single probe positions or very short regions. Such changes are hard to detect by current methods. This new method shows a higher rate of true positives and a lower rate of false positives. The linear-median method is non-parametric and hence is more robust in estimating copy number. Additionally the linear-median method is easily computable for practical aCGH data sets compared to other copy number estimation methods.
In this paper we develop a Bayesian analysis to estimate the disease prevalence, the sensitivity and specificity of three cervical cancer screening tests (cervical cytology, visual inspection with acetic acid and Hybrid Capture II) in the presence of a covariate and in the absence of a gold standard. We use Metropolis-Hastings algorithm to obtain the posterior summaries of interest. The estimated prevalence of cervical lesions was 6.4% (a 95% credible interval [95% CI] was 3.9, 9.3). The sensitivity of cervical cytology (with a result of ≥ ASC-US) was 53.6% (95% CI: 42.1, 65.0) compared with 52.9% (95% CI: 43.5, 62.5) for visual inspection with acetic acid and 90.3% (95% CI: 76.2, 98.7) for Hybrid Capture II (with result of >1 relative light units). The specificity of cervical cytology was 97.0% (95% CI: 95.5, 98.4) and the specificities for visual inspection with acetic acid and Hybrid Capture II were 93.0% (95% CI: 91.0, 94.7) and 88.7% (95% CI: 85.9, 91.4), respectively. The Bayesian model with covariates suggests that the sensitivity and the specificity of the visual inspection with acetic acid tend to increase as the age of the women increases.
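The Metropolis-Hastings step can be illustrated with a deliberately simplified sketch. It is not the latent-class model with three tests and a covariate described above: it samples a single prevalence parameter from a binomial likelihood with a uniform prior, and the counts (64 positives among 1,000 women), step size, starting value and function names are all hypothetical.

```python
import math
import random

def log_posterior(p, positives, n):
    """Log posterior of a proportion under a uniform prior (up to a constant)."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return positives * math.log(p) + (n - positives) * math.log(1.0 - p)

def metropolis_hastings(positives, n, n_iter=20000, burn_in=2000,
                        step=0.01, seed=1):
    """Random-walk Metropolis-Hastings sampler for one proportion."""
    rng = random.Random(seed)
    current = 0.1  # arbitrary starting value
    current_lp = log_posterior(current, positives, n)
    samples = []
    for i in range(n_iter):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_posterior(proposal, positives, n)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            samples.append(current)
    return samples

def summarize(samples):
    """Posterior mean and equal-tailed 95% credible interval."""
    ordered = sorted(samples)
    lo = ordered[int(0.025 * len(ordered))]
    hi = ordered[int(0.975 * len(ordered))]
    return sum(ordered) / len(ordered), lo, hi

draws = metropolis_hastings(positives=64, n=1000)
post_mean, ci_low, ci_high = summarize(draws)
print(f"prevalence {post_mean:.3f} (95% CI {ci_low:.3f}, {ci_high:.3f})")
```

In the full model, each sensitivity, specificity and regression coefficient would be updated in the same accept-or-reject fashion, and the posterior summaries reported above correspond to the mean and interval computed by a function like `summarize`.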
The Bayesian method proposed here is a useful alternative for estimating measures of performance of diagnostic tests in the presence of covariates and when a gold standard is not available. An advantage of the method is that the number of parameters to be estimated is not limited by the number of observations, as happens with several frequentist approaches. However, it is important to point out that the Bayesian analysis requires informative priors in order for the parameters to be identifiable. The method can be easily extended to the analysis of other medical data sets.
In the post-genomic era, computational identification of cell adhesion molecules (CAMs) becomes important in defining new targets for diagnosis and treatment of various diseases including cancer. Lack of a comprehensive CAM-specific database restricts our ability to identify and characterize novel CAMs. Therefore, we developed a comprehensive mammalian cell adhesion molecule (MCAM) database. The current version is an interactive Web-based database, which provides the resources needed to search mouse, human and rat-specific CAMs and their sequence information and characteristics such as gene functions and virtual gene expression patterns in normal and tumor tissues as well as cell lines. Moreover, the MCAM database can be used for various bioinformatics and biological analyses including identifying CAMs involved in cell-cell interactions and homing of lymphocytes, hematopoietic stem cells and malignant cells to specific organs using data from high-throughput experiments. Furthermore, the database can also be used for training and testing existing transmembrane (TM) topology prediction methods specifically for CAM sequences. The database is freely available online at http://app1.unmc.edu/mcam.
In genome-wide association studies (GWAS), regression analysis has been most commonly used to establish an association between a phenotype and genetic variants, such as single nucleotide polymorphism (SNP). However, most applications of regression analysis have been restricted to the investigation of single marker because of the large computational burden. Thus, there have been limited applications of regression analysis to multiple SNPs, including gene-gene interaction (GGI) in large-scale GWAS data. In order to overcome this limitation, we propose CARAT-GxG, a GPU computing system-oriented toolkit, for performing regression analysis with GGI using CUDA (compute unified device architecture). Compared to other methods, CARAT-GxG achieved almost 700-fold execution speed and delivered highly reliable results through our GPU-specific optimization techniques. In addition, it was possible to achieve almost-linear speed acceleration with the application of a GPU computing system, which is implemented by the TORQUE Resource Manager. We expect that CARAT-GxG will enable large-scale regression analysis with GGI for GWAS data.
The main problem for health professionals and patients in accessing information is that this information is very often distributed over many medical records and locations. This problem is particularly acute in cancerology because patients may be treated for many years and undergo a variety of examinations. Recent advances in technology make it feasible to gain access to medical records anywhere and anytime, allowing the physician or the patient to gather information from an "ephemeral electronic patient record". However, this easy access to data is accompanied by the requirement for improved security (confidentiality, traceability, integrity, ...) and this issue needs to be addressed. In this paper we propose and discuss a decentralised approach based on recent advances in information sharing and protection: Grid technologies and watermarking methodologies. The potential impact of these technologies for oncology is illustrated by the examples of two experimental cases: a cancer surveillance network and a radiotherapy treatment plan. It is expected that the proposed approach will constitute the basis of a future secure "google-like" access to medical records.
When confronted with a small sample, feature-selection algorithms often fail to find good feature sets, a problem exacerbated for high-dimensional data and large feature sets. The problem is compounded by the fact that, if one obtains a feature set with a low error estimate, the estimate is unreliable because training-data-based error estimators typically perform poorly on small samples, exhibiting optimistic bias or high variance. One way around the problem is to limit the number of features being considered, restrict feature sets to sizes such that all feature sets can be examined by exhaustive search, and report a list of the best performing feature sets. If the list is short, then it greatly restricts the possible feature sets to be considered as candidates; however, one can expect the lowest error estimates obtained to be optimistically biased so that there may not be a close-to-optimal feature set on the list. This paper provides a power analysis of this methodology; in particular, it examines the kind of results one should expect to obtain relative to the length of the list and the number of discriminating features among those considered. Two measures are employed. The first is the probability that there is at least one feature set on the list whose true classification error is within some given tolerance of the best feature set, and the second is the expected number of feature sets on the list whose true errors are within the given tolerance of the best feature set. These values are plotted as functions of the list length to generate power curves. The results show that, if the number of discriminating features is not too small (that is, the prior biological knowledge is not too poor), then one should expect, with high probability, to find good feature sets.
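The two power measures can be estimated by simulation, as in the following minimal sketch. It assumes that each feature set has a known true error and that the small-sample error estimate equals the true error plus Gaussian noise; this noise model, the error values and the function name are hypothetical and are not taken from the paper.

```python
import random

def power_curve_point(true_errors, list_length, tolerance,
                      noise_sd=0.05, n_trials=500, seed=0):
    """Monte Carlo estimate of the two measures for one list length.

    Returns (probability that the list holds at least one feature set whose
    true error is within `tolerance` of the best, expected number of such
    feature sets on the list).
    """
    rng = random.Random(seed)
    best = min(true_errors)
    trials_with_hit = 0
    total_good = 0
    for _ in range(n_trials):
        # noisy error estimates stand in for small-sample estimators
        estimates = [e + rng.gauss(0.0, noise_sd) for e in true_errors]
        ranked = sorted(range(len(true_errors)), key=lambda i: estimates[i])
        shortlist = ranked[:list_length]
        good = sum(1 for i in shortlist if true_errors[i] - best <= tolerance)
        trials_with_hit += good > 0
        total_good += good
    return trials_with_hit / n_trials, total_good / n_trials

# Hypothetical true errors for five candidate feature sets.
errors = [0.10, 0.12, 0.30, 0.35, 0.40]
curve = [power_curve_point(errors, k, tolerance=0.05) for k in range(1, 6)]
for k, (prob, expected) in enumerate(curve, start=1):
    print(k, round(prob, 3), round(expected, 3))
```

Evaluating the function over a range of list lengths gives one point per length, which is the form of the power curves described above.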
Availability: companion website at http://gsp.tamu.edu/Publications/supplementary/zhao09a/
The analysis of expression and CGH arrays plays a central role in the study of complex diseases, especially cancer, including finding markers for early diagnosis and prognosis, choosing an optimal therapy, or increasing our understanding of cancer development and metastasis. Asterias (http://www.asterias.info) is an integrated collection of freely-accessible web tools for the analysis of gene expression and aCGH data. Most of the tools use parallel computing (via MPI) and run on a server with 60 CPUs for computation; compared to a desktop or server-based but not parallelized application, parallelization provides speed ups of factors up to 50. Most of our applications allow the user to obtain additional information for user-selected genes (chromosomal location, PubMed ids, Gene Ontology terms, etc.) by using clickable links in tables and/or figures. Our tools include: normalization of expression and aCGH data (DNMAD); converting between different types of gene/clone and protein identifiers (IDconverter/IDClight); filtering and imputation (preP); finding differentially expressed genes related to patient class and survival data (Pomelo II); searching for models of class prediction (Tnasas); using random forests to search for minimal models for class prediction or for large subsets of genes with predictive capacity (GeneSrF); searching for molecular signatures and predictive genes with survival data (SignS); detecting regions of genomic DNA gain or loss (ADaCGH). The capability to send results between different applications, access to additional functional information, and parallelized computation make our suite unique and exploit features only available to web-based applications.
Array-Comparative Genomic Hybridization (aCGH) is a powerful high throughput technology for detecting chromosomal copy number aberrations (CNAs) in cancer, aiming at identifying related critical genes from the affected genomic regions. However, advancing from a dataset with thousands of tabular lines to a few candidate genes can be an onerous and time-consuming process. To expedite the aCGH data analysis process, we have developed a user-friendly aCGH data viewer (aCGHViewer) as a conduit between the aCGH data tables and a genome browser. The data from a given aCGH analysis are displayed in a genomic view comprised of individual chromosome panels which can be rapidly scanned for interesting features. A chromosome panel containing a feature of interest can be selected to launch a detail window for that single chromosome. Selecting a data point of interest in the detail window launches a query to the UCSC or NCBI genome browser to allow the user to explore the gene content in the chromosomal region. Additionally, aCGHViewer can display aCGH and expression array data concurrently to visually correlate the two. aCGHViewer is a stand alone Java visualization application that should be used in conjunction with separate statistical programs. It operates on all major computer platforms and is freely available at http://falcon.roswellpark.org/aCGHview/.
Array comparative genomic hybridization (aCGH) is a high-throughput lab technique to measure genome-wide chromosomal copy numbers. Data from aCGH experiments require extensive pre-processing, which consists of three steps: normalization, segmentation and calling. Each of these pre-processing steps yields a different data set: normalized data, segmented data, and called data. Publications using aCGH base their findings on data from all stages of the pre-processing. Hence, there is no consensus on which should be used for further down-stream analysis. This consensus is, however, important for correct reporting of findings and for comparison of results from different studies. We discuss several issues that should be taken into account when deciding on which data are to be used. We express the belief that called data are best used, but would welcome opposing views.
The Overlay Tool has been developed to combine high throughput data derived from various microarray platforms. This tool analyzes high-resolution correlations between gene expression changes and either copy number abnormalities (CNAs) or loss of heterozygosity events detected using array comparative genomic hybridization (aCGH). Using an overlay analysis which is designed to be performed using data from multiple microarray platforms on a single biological sample, the Overlay Tool identifies potentially important genes whose expression profiles are changed as a result of losses, gains and amplifications in the cancer genome. In addition, the Overlay Tool will incorporate loss of heterozygosity (LOH) probability data into this overlay procedure. To facilitate this analysis, we developed an application which computationally combines two or more high throughput datasets (e.g. aCGH/expression) into a single categorized dataset for visualization and interrogation using a gene-centric approach. As such, data from virtually any microarray platform can be incorporated without the need to remap entire datasets individually. The resultant categorized (overlay) data set can be conveniently viewed using our in-house visualization tool, aCGHViewer (Shankar et al. 2006), which serves as a conduit to public databases such as UCSC and NCBI, to rapidly investigate genes of interest.
Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
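The permutation idea behind PACE can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nearest-class-mean classifier, the leave-one-out error estimate, the one-dimensional toy data, and the names `loo_error` and `pace_p_value` are all assumptions made for the example.

```python
import random
from statistics import mean


def loo_error(values, labels):
    """Leave-one-out error of a nearest-class-mean classifier on 1-D data.

    The classifier is a stand-in (assumption); PACE itself is described as
    applicable to any classification model.
    """
    errors = 0
    for i, (x, y) in enumerate(zip(values, labels)):
        rest = [(v, l) for j, (v, l) in enumerate(zip(values, labels)) if j != i]
        centroids = {
            cls: mean(v for v, l in rest if l == cls)
            for cls in {l for _, l in rest}
        }
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        errors += predicted != y
    return errors / len(values)


def pace_p_value(values, labels, n_permutations=200, seed=0):
    """Return (ACE, permutation p-value).

    The p-value is the share of label permutations whose error is at least
    as low as the achieved classification error (ACE) on the true labels.
    """
    rng = random.Random(seed)
    ace = loo_error(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # randomly reassign class labels
        if loo_error(values, shuffled) <= ace:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return ace, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: two well-separated groups of four samples each.
values = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
ace, p = pace_p_value(values, labels)
```

A small p-value indicates that the achieved error is unlikely under random labelings; as the abstract notes, it says nothing about confounding in the experimental design.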
To explore the advantages of using artificial neural networks (ANNs) to recognize patterns in colposcopy images and to classify those images.
Cross-sectional, descriptive, and analytical study with a quantitative approach and an emphasis on diagnosis. The training, test, and validation sets were composed of images collected from patients who underwent colposcopy. These images were provided by a gynecology clinic located in the city of Criciúma (Brazil). The image database (n = 170) was divided; 48 images were used for the training process, 58 images were used for the tests, and 64 images were used for the validation. A hybrid neural network based on Kohonen self-organizing maps and multilayer perceptron (MLP) networks was used.
After 126 cycles, the validation was performed. The best results reached an accuracy of 72.15%, a sensitivity of 69.78%, and a specificity of 68%.
Although the preliminary results show only moderate performance, the present approach is an innovative and promising technique that should be explored further.
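For reference, the three reported measures follow from a 2x2 confusion matrix as sketched below. The counts used here are hypothetical and chosen only to sum to the 64 validation images; they are not the study's data, and the function name `diagnostic_metrics` is an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # all correct calls / all images
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical counts (not from the study): 40 abnormal and 24 normal images.
metrics = diagnostic_metrics(tp=30, fp=6, tn=18, fn=10)
```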
The antitumor drug paclitaxel stabilizes microtubules and reduces their dynamicity, promoting mitotic arrest and eventually apoptosis. Upon assembly of the alpha/beta-tubulin heterodimer, GTP becomes bound to both the alpha and beta-tubulin monomers. During microtubule assembly, the GTP bound to beta-tubulin is hydrolyzed to GDP, eventually reaching steady-state equilibrium between free tubulin dimers and those polymerized into microtubules. Tubulin-binding drugs such as paclitaxel interact with beta-tubulin, resulting in the disruption of this equilibrium. In spite of several crystal structures of tubulin, there is little biochemical insight into the mechanism by which anti-tubulin drugs target microtubules and alter their normal behavior. The mechanism of drug action is further complicated because altered beta-tubulin isotype expression and/or mutations in tubulin genes may lead to drug resistance, as has been described in the literature. Because altered beta-tubulin isotype expression and mutations within beta-tubulin can both lead to resistance, we examined the properties of altered residues within the taxane, colchicine and Vinca binding sites. The amount of data now available allows us to investigate common patterns that lead to microtubule disruption and may provide a guide to the rational design of novel compounds that can inhibit microtubule dynamics for specific tubulin isotypes or, indeed, resistant cell lines. Because of the vast amount of data published to date, we will only provide a broad overview of the mutational results and how these correlate with differences between tubulin isotypes. We also note that clinical studies describe a number of predictive factors for the response to anti-tubulin drugs and attempt to develop an understanding of the features within tubulin that may help explain how they may affect both microtubule assembly and stability.
The reproducibility of mass spectrometry (MS) data collected using surface enhanced laser desorption/ionization-time of flight (SELDI-TOF) has been questioned. This investigation was designed to test the reproducibility of SELDI data collected over time by multiple users and instruments. Five laboratories prepared arrays once every week for six weeks. Spectra were collected on separate instruments in the individual laboratories. Additionally, all of the arrays produced each week were rescanned on a single instrument in one laboratory. Lab-to-lab and array-to-array variability in alignment parameters was larger than the variability attributable to running samples during different weeks. The coefficient of variation (CV) in spectrum intensity ranged from 25% at baseline, to 80% in the matrix noise region, to about 50% during the exponential drop from the maximum matrix noise. Before normalization, the median CV of the peak heights was 72%; it was reduced to about 20% after normalization. Additionally, for the spectra from a common instrument, the CV ranged from 5% at baseline, to 50% in the matrix noise region, to 20% during the drop from the maximum matrix noise. Normalization reduced the variability in peak heights to about 18%. With proper processing methods, SELDI instruments produce spectra containing large numbers of reproducibly located peaks, with consistent heights.
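The effect of normalization on the CV of a peak height can be illustrated with a small sketch. The abstract does not say which normalization was applied; scaling each spectrum by its total intensity is assumed here purely for illustration, and the three replicate spectra are invented toy values.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def normalize_by_total(spectrum):
    """Divide every intensity by the spectrum total (assumed normalization)."""
    total = sum(spectrum)
    return [x / total for x in spectrum]


# Hypothetical replicate spectra that differ only by an overall scale factor.
spectra = [
    [5, 25, 10, 10],
    [10, 50, 20, 20],
    [20, 100, 40, 40],
]
peak = 1  # index of the peak whose height is compared across replicates

cv_before = coefficient_of_variation([s[peak] for s in spectra])
cv_after = coefficient_of_variation(
    [normalize_by_total(s)[peak] for s in spectra]
)
```

In this idealized case the scale differences account for all of the variability, so the CV collapses to zero after normalization; real spectra retain residual variation, as the reported figures show.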
The aberrantly expressed signal transducer and activator of transcription 3 (STAT3) predicts poor prognosis, primarily in estrogen receptor positive (ER(+)) breast cancers. Activated STAT3 is overexpressed in luminal A subtype cells. The mechanisms contributing to the prognosis and/or subtype relevant features of STAT3 in ER(+) breast cancers are through multiple interacting regulatory pathways, including STAT3-MYC, STAT3-ERα, and STAT3-MYC-ERα interactions, as well as the direct action of activated STAT3. These data predict malignant events, treatment responses and a novel enhancer of tamoxifen resistance. The inferred crosstalk between ERα and STAT3 in regulating their shared target gene-METAP2 is partially validated in the luminal B breast cancer cell line-MCF7. Taken together, we identify a poor prognosis relevant gene set within the STAT3 network and a robust one in a subset of patients. VEGFA, ABL1, LYN, IGF2R and STAT3 are suggested therapeutic targets for further study based upon the degree of differential expression in our model.
Primary hepatocellular carcinoma (HCC) is currently the fifth most common malignancy and the third most common cause of cancer mortality worldwide. Because of its high prevalence in developing nations, numerous efforts have been made in the molecular characterization of primary HCC. However, a better understanding of the pathology of HCC requires software-assisted network modeling and analysis. In this paper, the author presents his first attempt at exploring the biological implications of gene co-expression in HCC using actor-semiotic network modeling and analysis. The network was first constructed by integrating inter-actor relationships, e.g. gene co-expression, microRNA-to-gene, and protein interactions, with semiotic relationships, e.g. gene-to-Gene Ontology Process. Topological features that are highly discriminative of the HCC phenotype were identified by visual inspection. Finally, the author devised a graph signature-based analysis method to supplement the network exploration.
Microarray technology is a powerful tool, which has been applied to further the understanding of gene expression changes in disease. Array technology has been applied to the diagnosis and prognosis of Acute Myelogenous Leukemia (AML). Arrays have also been used extensively in elucidating the mechanism of and predicting therapeutic response in AML, as well as to further define the mechanism of AML pathogenesis. In this review, we discuss the major paradigms of gene expression array analysis, and provide insights into the use of software tools to annotate the array dataset and elucidate deregulated pathways and gene interaction networks. We present the application of gene expression array technology to questions in acute myelogenous leukemia; specifically, disease diagnosis, treatment and prognosis, and disease pathogenesis. Finally, we discuss several new and emerging array technologies, and how they can be further utilized to improve our understanding of AML.
The 18,352 pancreatic ductal adenocarcinoma (PDAC) cases from the Surveillance Epidemiology and End Results (SEER) database were analyzed using the Kaplan-Meier method for the following variables: race, gender, marital status, year of diagnosis, age at diagnosis, pancreatic subsite, T-stage, N-stage, M-stage, tumor size, tumor grade, performed surgery, and radiation therapy. Because the T-stage variable did not satisfy the proportional hazards assumption, the cases were divided into cases with T1- and T2-stages (localized tumor) and cases with T3- and T4-stages (extended tumor). For estimating survival and conditional survival probabilities in each group, a multivariate Cox regression model adjusted for the remaining covariates was developed. Testing the reproducibility of model parameters and generalizability of these models showed that the models are well calibrated and have concordance indexes equal to 0.702 and 0.712, respectively. Based on these models, a prognostic estimator of survival for patients diagnosed with PDAC was developed and implemented as a computerized web-based tool.
In this study, we introduce and use Efficiency Analysis to compare differences in the apparent internal and external consistency of competing normalization methods and tests for identifying differentially expressed genes. Using publicly available data, two lung adenocarcinoma datasets were analyzed using caGEDA (http://bioinformatics2.pitt.edu/GE2/GEDA.html) to measure the degree of differential expression of genes existing between two populations. The datasets were randomly split into at least two subsets, each analyzed for differentially expressed genes between the two sample groups, and the gene lists compared for overlapping genes. Efficiency Analysis is an intuitive method that compares the differences in the percentage of overlap of genes from two or more data subsets, found by the same test over a range of testing methods. Tests that yield consistent gene lists across independently analyzed splits are preferred to those that yield less consistent inferences. For example, a method that exhibits 50% overlap in the 100 top genes from two studies should be preferred to a method that exhibits 5% overlap in the top 100 genes. The same procedure was performed using all available normalization and transformation methods that are available through caGEDA. The 'best' test was then further evaluated using internal cross-validation to estimate generalizable sample classification errors using a Naïve Bayes classification algorithm. A novel test, termed D1 (a derivative of the J5 test) was found to be the most consistent, and to exhibit the lowest overall classification error, and highest sensitivity and specificity. The D1 test relaxes the assumption that few genes are differentially expressed. Efficiency Analysis can be misleading if the tests exhibit a bias in any particular dimension (e.g. expression intensity); we therefore explored intensity-scaled and segmented J5 tests using data in which all genes are scaled to share the same intensity distribution range. 
Efficiency Analysis correctly predicted the 'best' test and normalization method using the Beer dataset and also performed well with the Bhattacharjee dataset based on both efficiency and classification accuracy criteria.
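The overlap measure at the core of Efficiency Analysis can be sketched as below. The gene names, the scores, and the helper names `top_genes` and `overlap_percent` are hypothetical, and the ranking rule (larger score means stronger evidence of differential expression) is an assumption; caGEDA's actual tests are not reproduced here.

```python
def top_genes(scores, n):
    """Return the set of n genes with the largest test scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def overlap_percent(scores_a, scores_b, n):
    """Percentage of the top-n genes shared by two independently analyzed splits."""
    shared = top_genes(scores_a, n) & top_genes(scores_b, n)
    return 100.0 * len(shared) / n


# Hypothetical scores from one test applied to two random splits of a dataset.
split_a = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 0.1}
split_b = {"g1": 4.5, "g2": 0.2, "g3": 3.5, "g4": 3.9}
efficiency = overlap_percent(split_a, split_b, n=2)
```

Repeating this over many random splits and over each candidate test or normalization gives the comparison described above: the method with the higher overlap is preferred.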
The accurate prognosis for patients with resectable pancreatic adenocarcinomas requires the incorporation of more factors than those included in AJCC TNM system.
We identified 218 patients diagnosed with stage I and II pancreatic adenocarcinoma at NewYork-Presbyterian Hospital/Columbia University Medical Center (1999 to 2009). Tumor and clinical characteristics were retrieved and associations with survival were assessed by univariate Cox analysis. A multivariable model was constructed and a prognostic score was calculated; the prognostic strength of our model was assessed with the concordance index.
Our cohort had a median age of 67 years and consisted of 49% men; the median follow-up time was 14.3 months and the 5-year survival 3.6%. Age, tumor differentiation and size, alkaline phosphatase, albumin and CA 19-9 were the independent factors of the final multivariable model; patients were thus classified into low (n = 14, median survival = 53.7 months), intermediate (n = 124, median survival = 19.7 months) and high risk groups (n = 80, median survival = 12.3 months). The prognostic classification of our model remained significant after adjusting for adjuvant chemotherapy and the concordance index was 0.73 compared to 0.59 of the TNM system.
Our prognostic model was accurate in stratifying patients by risk and could be incorporated into clinical decisions.
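The concordance index used to compare the model with the TNM system can be computed as in the sketch below, which follows Harrell's pairwise definition. The abstract does not state which estimator was used, so this choice is an assumption, and the follow-up data are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs ranked correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (event flag 1). The pair is concordant when that
    patient also has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (months), event flags (0 = censored), and scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
c_good = concordance_index(times, events, [4, 3, 2, 1])  # risk falls with time
c_bad = concordance_index(times, events, [1, 2, 3, 4])   # risk rises with time
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which places the reported 0.73 and 0.59 in context.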
Historically, breast cancer classification has relied on prognostic subtypes. Thus, unlike hematopoietic cancers, breast tumor classification lacks phylogenetic rationale. The feasibility of phylogenetic classification of breast tumors has recently been demonstrated based on estrogen receptor (ER), androgen receptor (AR), vitamin D receptor (VDR) and Keratin 5 expression. Four hormonal states (HR0-3) comprising 11 cellular subtypes of breast cells have been proposed. This classification scheme has been shown to have relevance to clinical prognosis. We examine the implications of such phylogenetic classification on DNA methylation of both breast tumors and normal breast tissues by applying recently developed deconvolution algorithms to three DNA methylation data sets archived on Gene Expression Omnibus. We propose that breast tumors arising from a particular cell-of-origin essentially magnify the epigenetic state of their original cell type. We demonstrate that DNA methylation of tumors manifests patterns consistent with cell-specific epigenetic states, that these states correspond roughly to previously posited normal breast cell types, and that estimates of proportions of the underlying cell types are predictive of tumor phenotypes. Taken together, these findings suggest that the epigenetics of breast tumors is ultimately based on the underlying phylogeny of normal breast tissue.
Reverse phase protein arrays (RPPA) measure the relative expression levels of a protein in many samples simultaneously. Observed signal from these arrays is a combination of true signal, additive background, and multiplicative spatial effects. Background subtraction alone is not sufficient to remove all nonbiological trends from the data. We developed a surface adjustment that uses information from positive control spots to correct for spatial trends on the array beyond additive background. This method uses a generalized additive model to estimate a smoothed surface from positive controls. When positive controls are printed in a dilution series, a nested surface adjustment performs an intensity-based correction. When applicable, surface adjustment is able to remove spatial trends and increase within slide replicate agreement better than background subtraction alone as demonstrated on two sets of arrays. This work demonstrates the importance of including positive control spots on the array.
Philadelphia positive malignant disorders are a clinically divergent group of leukemias. These include chronic myeloid leukemia (CML) and de novo acute Philadelphia positive (Ph(+)) leukemia of both myeloid and lymphoid origin. Recent whole genome screening of Ph(+)ALL in both children and adults identified an almost obligatory cryptic loss of Ikaros, which is required for normal B cell maturation. Although similar losses were found in lymphoid blast crisis, the genetic background of the transformation in CML is still poorly defined. We used Significance Analysis of Microarrays (SAM) to analyze array comparative genomic hybridization (aCGH) data from 30 CML patients (10 each of chronic phase, myeloid and lymphoid blast stage), 10 Ph(+)ALL adult patients and 10 disease free controls and were able to: (a) discriminate between the genomes of lymphoid and myeloid blast cells and (b) identify differences in the genome profile of de novo Ph(+)ALL and lymphoid blast transformation of CML (BC/L). Furthermore, we were able to distinguish a subgroup of Ph(+)ALL characterized by gains in chromosome 9 and recurrent losses at several other genome sites, offering genetic evidence for the clinical heterogeneity. The significance of these results is that they not only offer clues regarding the pathogenesis of Ph(+) disorders and highlight the potential clinical implications of a set of probes but also demonstrate what SAM can offer for the analysis of genome data.
Integrative cancer biology research relies on a variety of data-driven computational modeling and simulation methods and techniques geared towards gaining new insights into the complexity of biological processes that are of critical importance for cancer research. These include the dynamics of gene-protein interaction networks, the percolation of sub-cellular perturbations across scales and the impact they may have on tumorigenesis in both experiments and clinics. Such innovative 'systems' research will greatly benefit from enabling Information Technology that is currently under development, including an online collaborative environment, a Semantic Web based computing platform that hosts data and model repositories as well as high-performance computing access. Here, we present one of the National Cancer Institute's recently established Integrative Cancer Biology Programs, i.e. the Center for the Development of a Virtual Tumor, CViT, which is charged with building a cancer modeling community, developing the aforementioned enabling technologies and fostering multi-scale cancer modeling and simulation.
Recently, several research groups have published methods for the determination of proteomic expression profiling by mass spectrometry without the use of exogenously added stable isotopes or stable isotope dilution theory. These so-called label-free methods have the advantage of allowing data on each sample to be acquired independently from all other samples, to which they can later be compared in silico for the purpose of measuring changes in protein expression between various biological states. We developed label-free software based on direct measurement of peptide ion current area (PICA) and compared it to two other methods: a simpler label-free method known as spectral counting and the isotope coded affinity tag (ICAT) method. Data analysis by these methods of a standard mixture containing proteins of known, but varying, concentrations showed that they performed similarly, with a mean squared error of 0.09. Additionally, complex bacterial protein mixtures spiked with known concentrations of standard proteins were analyzed using the PICA label-free method. These results indicated that the PICA method detected all levels of standard spiked proteins at the 90% confidence level in this complex biological sample. This finding confirms that label-free methods based on direct measurement of the area under a single ion current trace performed as well as the standard ICAT method. Because label-free methods allow experimental designs well beyond pair-wise comparison, methods such as our PICA method are well suited for proteomic expression profiling of large numbers of samples, as is needed in clinical analysis.
Cisplatin is a DNA-damaging anti-cancer agent that is widely used to treat a range of tumour types. Despite its clinical success, cisplatin treatment is still associated with a number of dose-limiting toxic side effects. The purpose of this study was to clarify the molecular events that are important in the anti-tumour activity of cisplatin, using gene expression profiling techniques. Currently, our incomplete understanding of this drug's mechanism of action hinders the development of more efficient and less harmful cisplatin-based chemotherapeutics. In this study the effect of cisplatin on gene expression in human foreskin fibroblasts has been investigated using human 19K oligonucleotide microarrays. In addition its clinically inactive isomer, transplatin, was also tested. Dualfluor microarray experiments comparing treated and untreated cells were performed in quadruplicate. Cisplatin treatment was shown to significantly up- or down-regulate a consistent subset of genes. Many of these genes responded similarly to treatment with transplatin, the therapeutically inactive isomer of cisplatin. However, a smaller proportion of these transcripts underwent differential expression changes in response to the two isomers. Some of these genes may constitute part of the DNA damage response induced by cisplatin that is critical for its anti-tumour activity. Ultimately, the identification of gene expression responses unique to clinically active compounds, like cisplatin, could thus greatly benefit the design and development of improved chemotherapeutics.
A computational approach for estimating the overall, population, and individual cancer hazard rates was developed. The population rates characterize a risk of getting cancer of a specific site/type, occurring within an age-specific group of individuals from a specified population during a distinct time period. The individual rates characterize an analogous risk but only for the individuals susceptible to cancer. The approach uses a novel regularization and anchoring technique to solve an identifiability problem that occurs while determining the age, period, and cohort (APC) effects. These effects are used to estimate the overall rate, and to estimate the population and individual cancer hazard rates. To estimate the APC effects, as well as the population and individual rates, a new web-based computing tool, called the CancerHazard@Age, was developed. The tool uses data on the past and current history of cancer incidences collected during a long time period from the surveillance databases. The utility of the tool was demonstrated using data on the female lung cancers diagnosed during 1975-2009 in nine geographic areas within the USA. The developed tool can be applied equally well to process data on other cancer sites. The data obtained by this tool can be used to develop novel carcinogenic models and strategies for cancer prevention and treatment, as well as to project future cancer burden.
Mathematical modeling of cancer development is aimed at assessing the risk factors leading to cancer. Aging is a common risk factor for all adult cancers. The risk of getting cancer in aging is presented by a hazard function that can be estimated from the observed incidence rates collected in cancer registries. Recent analyses of the SEER database show that the cancer hazard function initially increases with the age, and then it turns over and falls at the end of the lifetime. Such behavior of the hazard function is poorly modeled by the exponential or compound exponential-linear functions mainly utilized for the modeling. In this work, for mathematical modeling of cancer hazards, we proposed to use the Weibull-like function, derived from the Armitage-Doll multistage concept of carcinogenesis and an assumption that the number of clones at age t developed from mutated cells follows the Poisson distribution. This function is characterized by three parameters, two of which (r and λ) are the conventional parameters of the Weibull probability distribution function, and an additional parameter (C(0)) that adjusts the model to the observational data. The biological meanings of these parameters are: r, the number of stages in carcinogenesis; λ, the average number of clones developed from the mutated cells during the first year of carcinogenesis; and C(0), a data adjustment parameter that characterizes the fraction of the age-specific population that will get this cancer in their lifetime. To test the validity of the proposed model, nonlinear regression analysis was performed for the lung cancer (LC) data collected in the SEER 9 database for white men and women during 1975-2004. The results suggest that: (i) modeling can be improved by the use of another parameter, A, the age at the beginning of carcinogenesis; and (ii) in white men and women, the processes of LC carcinogenesis vary by A and C(0), while the corresponding values of r and λ are nearly the same.
Overall, the proposed Weibull-like model provides an excellent fit of the estimates of the LC hazard function in aging. It is expected that the Weibull-like model can be applicable to fit estimates of hazard functions of other adult cancers as well.
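The rise-and-fall shape of such a hazard function can be sketched numerically. The abstract does not give the formula, so the form below, a Weibull density in (t − A) scaled by C(0), is only one plausible reading, and all parameter values are invented for illustration rather than taken from the fitted SEER results.

```python
import math


def weibull_like_hazard(t, r, lam, c0=1.0, a=0.0):
    """Assumed form: c0 * r * lam * (t - a)**(r - 1) * exp(-lam * (t - a)**r).

    r is the number of stages, lam the clone rate parameter, c0 the data
    adjustment factor, and a the age at which carcinogenesis begins. The
    hazard is taken to be zero before age a.
    """
    if t <= a:
        return 0.0
    s = t - a
    return c0 * r * lam * s ** (r - 1) * math.exp(-lam * s ** r)


def peak_age(r, lam, a=0.0):
    """Age at which the assumed hazard is largest (mode of the Weibull density)."""
    return a + ((r - 1) / (r * lam)) ** (1.0 / r)


# Hypothetical parameters, chosen only so that the peak falls near age 60.
r, lam = 5, 1e-9
h30 = weibull_like_hazard(30, r, lam)
h60 = weibull_like_hazard(60, r, lam)
h90 = weibull_like_hazard(90, r, lam)
mode = peak_age(r, lam)
```

Under these assumptions the hazard increases up to the peak age and declines afterwards, which is the turnover behavior that exponential forms cannot reproduce.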
Searching PubMed for citations related to a specific cancer center or group of authors can be labor-intensive. We have created a tool, PubMed QUEST, to aid in the rapid searching of PubMed for publications of interest. It was designed by taking into account the needs of entire cancer centers as well as individual investigators. The experience of using the tool by our institution's cancer center administration and investigators has been favorable and we believe it could easily be adapted to other institutions. Use of the tool has identified limitations of automated searches for publications based on an author's name, especially for common names. These limitations could likely be solved if the PubMed database assigned a unique identifier to each author.
Computed tomography (CT) imaging plays an important role in cancer detection and quantitative assessment in clinical trials. High-resolution imaging studies on large cohorts of patients generate vast data sets, which are infeasible to analyze through manual interpretation.
In this article we describe a comprehensive architecture for computer-aided detection (CAD) and surveillance on lung nodules in CT images. Central to this architecture are the analytic components: an automated nodule detection system, nodule tracking capabilities and volume measurement, which are integrated within a data management system that includes mechanisms for receiving and archiving images, a database for storing quantitative nodule measurements and visualization, and reporting tools.
We describe two studies to evaluate CAD technology within this architecture, and the potential application in large clinical trials. The first study involves performance assessment of an automated nodule detection system and its ability to increase radiologist sensitivity when used to provide a second opinion. The second study investigates nodule volume measurements on CT made using a semi-automated technique and shows that volumetric analysis yields significantly different tumor response classifications than a 2D diameter approach. These studies demonstrate the potential of automated CAD tools to assist in quantitative image analysis for clinical trials.
In the field of computer-aided mammographic mass detection, many different features and classifiers have been tested. Frequently, the relevant features and optimal topology for the artificial neural network (ANN)-based approaches at the classification stage are unknown, and thus determined by trial-and-error experiments. In this study, we analyzed a classifier that evolves ANNs using genetic algorithms (GAs), which combines feature selection with the learning task. The classifier named "Phased Searching with NEAT in a Time-Scaled Framework" was analyzed using a dataset with 800 malignant and 800 normal tissue regions in a 10-fold cross-validation framework. The classification performance measured by the area under a receiver operating characteristic (ROC) curve was 0.856 ± 0.029. The result was also compared with four other well-established classifiers that include fixed-topology ANNs, support vector machines (SVMs), linear discriminant analysis (LDA), and bagged decision trees. The results show that Phased Searching outperformed the LDA and bagged decision tree classifiers, and was only significantly outperformed by SVM. Furthermore, the Phased Searching method required fewer features and discarded superfluous structure or topology, thus incurring a lower feature computational and training and validation time requirement. Analyses performed on the network complexities evolved by Phased Searching indicate that it can evolve optimal network topologies based on its complexification and simplification parameter selection process. From the results, the study also concluded that the three classifiers - SVM, fixed-topology ANN, and Phased Searching with NeuroEvolution of Augmenting Topologies (NEAT) in a Time-Scaled Framework - are performing comparably well in our mammographic mass detection scheme.
The computational aspects of the problem in this paper involve, firstly, selective mapping of methylated DNA clones according to methylation level and, secondly, extracting motif information from all the mapped elements in the absence of prior probability distribution. Our novel implementation of algorithms to map and maximize expectation in this setting has generated data that appear to be distinct for each lymphoma subtype examined. A “clone” represents a polymerase chain reaction (PCR) product (on average ~500 bp) which belongs to a microarray of 8544 such sequences preserving CpG-rich islands (CGIs) [ 1 ]. Accumulating evidence indicates that cancers including lymphomas demonstrate hypermethylation of CGIs “silencing” an increasing number of tumor suppressor (TS) genes which can lead to tumorigenesis.
Algorithms are available on request from the authors