Article

Eisen, M.B., Spellman, P.T., Brown, P.O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863−14868

Abstract

A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.

Article
The Agglomerative Hierarchical Clustering (AHC) algorithm is widely used in real-world applications. As data volumes continue to grow, efficient scale-out techniques for AHC are becoming increasingly important. In this paper, we propose a Partition-based distributed Agglomerative Hierarchical Clustering (PACk) algorithm using novel distance-based partitioning and distance-aware merging techniques. We have developed an efficient implementation of PACk on Spark. Compared to the state-of-the-art distributed AHC algorithm, PACk achieves 2X to 19X (median=9X) speedup across a variety of synthetic and real-world datasets.
Article
Determining the number of clusters in high-dimensional real-life datasets and interpreting the final outcome are among the challenging problems in data science. Discovering the number of classes in cancer and microarray data plays a vital role in the treatment and diagnosis of cancers and other related diseases. Nonnegative matrix factorization (NMF) plays a paramount role as an efficient data exploratory tool for extracting basis features inherent in massive data. Some algorithms that incorporate sparsity constraints in the nonconvex NMF optimization problem have been applied in the past to analyze microarray datasets. However, to the best of our knowledge, none of these algorithms uses the block coordinate descent method, which is known for providing closed-form solutions. In this paper, we apply an algorithm developed based on columnwise partitioning and rank-one matrix approximation. We test this algorithm on two well-known cancer datasets: leukemia and multiple myeloma. The numerical results indicate that the proposed algorithm performs significantly better than related state-of-the-art methods. In particular, it is shown that this method is capable of robust clustering and discovering larger cancer classes in which the cluster splits are stable.
Article
Background: Cd accumulation in plant cells results in dramatic problems including oxidative stress and inhibition of vital enzymes. It also affects mineral uptake by disrupting membrane permeability. Interactions between Cd and other plant nutrient elements change the nutritional contents of crops and reduce their yield. Methods and results: In the present study, Cd stress in Brachypodium distachyon led to the upregulation of some heavy metal transport genes (influx or efflux) encoding cation-efflux proteins, heavy metal-associated proteins and NRAMP proteins. The Arabidopsis orthologs of the differentially expressed B. distachyon genes (DEGs) under Cd toxicity were identified, which showed that Bradi4g26905 is an ortholog of AtALY1-2. Detailed co-expression network and gene ontology analyses found the potential involvement of the mRNA surveillance pathway in Cd tolerance in B. distachyon. These genes were shown to be downregulated by sulfur (S) deficiency. Conclusions: This is the first transcriptomic study investigating the effect of Cd toxicity in B. distachyon, a model plant for genomic studies in Poaceae (Gramineae) species. The results are expected to provide valuable information for more comprehensive research related to heavy metal toxicity in plants.
Article
Barley production is essential in Egypt. In the present study, 15 different six-rowed Egyptian barley cultivars were studied. To differentiate between the different cultivars under study in terms of morphological characteristics and ISSR, molecular characterization reactions were carried out. Moreover, four cultivars (Giza 123, Giza 126, Giza 136, and Giza 138) were selected for further studies using scanning electron microscopy (SEM). Computational analysis of the DNA barcoding sequences of the two plastid markers rbcL and matK was executed, and the results were deposited in the NCBI database. The morphological traits showed low statistical significance among the different cultivars under study via the data collected from two seasons, suggesting that the mean field performance of these Egyptian cultivars may be equal under these conditions. The results showed that the phylogenetic tree was divided into four groups, one of which contained the most closely related genotypes in the genetic distance, including Giza 124, Giza 130, Giza 138, Giza 136, and Giza 137, which converge in the indicative uses of farmers. The seed coat of the studied cultivars was “rugose”. The elevation folding of the rugose pattern ranged from 11 ± 1.73 µm (Giza 126) to 14.67 ± 2.43 µm (Giza 123), suggesting variation in seed quality and its uses in feed and the food industry. According to the similarity matrix of ISSR analysis, the highest similarity value (93%) was recorded between Giza 133 and Giza 132, as well as between Giza 2000 and Giza 126. On the other hand, the lowest similarity value (80%) was recorded between Giza 130 and (Giza 133 and Giza 132), indicating that these cultivars were distantly related. Polymorphism information content (PIC) ranged from 0.26 for the primer ISSR UBC 835 to 0.37 for the primers ISSR UBC 814 and ISSR UBC 840. The current study showed that the matK gene is more mutable than the rbcL gene among the tested cultivars.
Article
Chikungunya virus (CHIKV) epidemics around the world have created public health concern with the unavailability of effective drugs and vaccines. This emphasizes the need for molecular understanding of host-virus interactions for developing effective targeted antivirals. Microarray analysis was carried out using CHIKV strain (Prototype and Indian) infected Vero cells and two host isozymes, MAPK activated protein kinase 2 (MK2) and MAPK activated protein kinase 3 (MK3) were selected for further analysis. The substrate spectrum of both enzymes is indistinguishable and covers proteins involved in cytokine production, endocytosis, reorganization of the cytoskeleton, cell migration, cell cycle control, chromatin remodeling and transcriptional regulation. Gene silencing and drug treatment were performed in vitro and in vivo to unravel the role of MK2/MK3 in CHIKV infection. Gene silencing of MK2 and MK3 abrogated around 58% of CHIKV progeny release from the host cell, and treatment with an MK2 activation inhibitor (CMPD1) demonstrated 68% inhibition of viral infection, suggesting a major role of MAPKAPKs during late CHIKV infection in vitro. Further, it was observed that the inhibition in viral infection is primarily due to the abrogation of lamellipodium formation through modulation of factors involved in the actin cytoskeleton remodeling pathway. Moreover, CHIKV-infected C57BL/6 mice demonstrated reduction in the viral copy number, lessened disease score and better survivability after CMPD1 treatment. In addition, reduction in expression of key pro-inflammatory mediators such as CXCL13, RAGE, FGF, MMP9 and increase in HGF (a CHIKV infection recovery marker) was observed, indicating the effectiveness of the drug against CHIKV. Taken together, it can be proposed that MK2 and MK3 are crucial host factors for CHIKV infection and can be considered important targets for developing effective anti-CHIKV strategies.
Article
Callogenesis, the process during which explants derived from differentiated plant tissues are subjected to a trans-differentiation step characterized by the proliferation of a mass of cells, is fundamental to indirect organogenesis and the establishment of cell suspension cultures. Therefore, understanding how callogenesis takes place is helpful to plant tissue culture, as well as to plant biotechnology and bioprocess engineering. The common herbaceous plant stinging nettle (Urtica dioica L.) is a species producing cellulosic fibres (the bast fibres) and a whole array of phytochemicals for pharmacological, nutraceutical and cosmeceutical use. Thus, it is of interest as a potential multipurpose plant. In this study, callogenesis in internode explants of a nettle fibre clone (clone 13) was studied using RNA-Seq to understand which gene ontologies predominate at different time points. Callogenesis was induced with the plant growth regulators α-naphthaleneacetic acid (NAA) and 6-benzyl aminopurine (BAP) after having determined their optimal concentrations. The process was studied over a period of 34 days, a time point at which a well-visible callus mass developed on the explants. The bioinformatic analysis of the transcriptomic dataset revealed specific gene ontologies characterizing each of the four time points investigated (0, 1, 10 and 34 days). The results show that, while the advanced stage of callogenesis is characterized by the iron deficiency response triggered by the high levels of reactive oxygen species accumulated by the proliferating cell mass, the intermediate and early phases are dominated by ontologies related to the immune response and cell wall loosening, respectively.
Article
A composite index is a powerful and widely used tool for providing an overall measure of a subject by summarizing a group of measurements (component indices) of different aspects of the subject. It is widely used in economics, finance, policy evaluation, performance ranking, and many other fields. Effective construction of a composite index has been studied extensively. The most widely used approach is to use a linear combination of the component indices, where the combination weights are determined by optimizing an objective function. To maximize the overall variation of the resulting composite index, the combination weights can be obtained through Principal Component Analysis. In this paper, we propose to incorporate expert opinions into the construction of the composite index. It is noted that expert opinion often provides useful information in assessing which of the component indices are more important for the overall measure of the subject. We consider the case in which a group of experts has been consulted, each providing a set of importance scores for the component indices, along with a set of confidence scores that reflects the expert’s own confidence in his/her assessment. In addition, the constructor of the composite index can also provide an assessment of the expertise level of each expert. We use linear combinations to construct the composite index, where the combination weights are determined by maximizing the sum of the resulting composite index variation and the negative weighted sum of squared deviations between the combination weights used and the experts’ scores. A data-driven approach is used to find the optimal balance between the two sources of information. Theoretical properties of the procedure are investigated. Simulation examples and an economic application on constructing a science and technology development index are carried out to illustrate the proposed method.
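The PCA baseline mentioned in this abstract (weights chosen to maximize the variance of the linear combination) can be sketched as follows. This is only an illustration of that baseline, not the expert-opinion procedure the paper proposes; the data in `scores`, the function names, and the use of power iteration to find the first principal component are all assumptions made for the example.

```python
def variance(values):
    """Sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def pca_weights(rows, iterations=200):
    """Unit-norm weights from the first principal component (power iteration)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    centered = [[r[k] - means[k] for k in range(p)] for r in rows]
    # Sample covariance matrix of the component indices.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    w = [1.0] * p
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w


def composite_index(rows, weights):
    """Linear combination of the component indices for each subject."""
    return [sum(wk * xk for wk, xk in zip(weights, r)) for r in rows]


# Hypothetical data: four subjects, three component indices each.
scores = [(1.0, 2.0, 0.5), (2.0, 4.1, 0.4), (3.0, 6.2, 0.6), (4.0, 7.9, 0.5)]
weights = pca_weights(scores)
index = composite_index(scores, weights)
```

Among all unit-norm weight vectors, the first principal component gives the composite index with the largest variance, so `variance(index)` is at least as large as the variance of any single component index in `scores`.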
Article
The psychosocial genomics paradigm first proposed by Ernest Rossi established an epistemological shift in our application of hypnosis. We present original experimental research conducted within this paradigm that highlights the mind–gene relationship and, in particular, the positive health effects associated with hypnosis and mind–body integrated psychotherapy. We document that these approaches can stimulate epigenetic modifications and the expression of genes related to anti-inflammatory processes. These strategies strengthen the immune system and reduce oxidative stress both in normal and in oncological participants.
Article
The amino acid response (AAR) and unfolded protein response (UPR) pathways converge on eIF2α phosphorylation, which is catalyzed by Gcn2 and Perk, respectively, under different stresses. This close interconnection makes it difficult to specify different functions of AAR and UPR. Here, we generated a zebrafish model in which loss of threonyl-tRNA synthetase (Tars) induces angiogenesis dependent on Tars aminoacylation activity. Comparative transcriptome analysis of the tars-mutant and wild-type embryos with/without Gcn2- or Perk-inhibition reveals that only Gcn2-mediated AAR is activated in the tars-mutants, whereas Perk functions predominantly in normal development. Mechanistic analysis shows that, while a considerable amount of eIF2α is normally phosphorylated by Perk, the loss of Tars causes an accumulation of uncharged tRNAThr, which in turn activates Gcn2, leading to phosphorylation of an extra amount of eIF2α. The partial switchover of kinases for eIF2α largely overwhelms the functions of Perk in normal development. Interestingly, although inhibition of Gcn2 and Perk in this stress condition both can reduce the eIF2α phosphorylation levels, their functional consequences in the regulation of target genes and in the rescue of the angiogenic phenotypes are dramatically different. Indeed, genetic and pharmacological manipulations of these pathways validate that the Gcn2-mediated AAR, but not the Perk-mediated UPR, is required for tars-deficiency induced angiogenesis. Thus, the interconnected AAR and UPR pathways differentially regulate angiogenesis through selective functions and mutual competitions, reflecting the specificity and efficiency of multiple stress response pathways that evolve integrally to enable an organism to sense/respond precisely to various types of stresses.
Article
Human pancreatic ductal adenocarcinoma (PDAC) harboring one KRAS mutant allele often displays increasing genomic loss of the remaining wild-type (WT) allele (known as LOH at KRAS) as tumors progress to metastasis, yet the molecular ramification of this WT allelic loss is unknown. In this study, we showed that the restoration of WT KRAS expression in human PDAC cell lines with LOH at KRAS significantly attenuated the malignancy of PDAC cells both in vitro and in vivo, demonstrating a tumor-suppressive role of the WT KRAS allele. Through RNA-Seq, we identified the HIPPO signaling pathway to be positively regulated by WT KRAS in PDAC cells. In accordance with this observation, PDAC cells with LOH at KRAS exhibited increased nuclear localization and activation of transcriptional co-activator YAP1. Mechanistically, we discovered that WT KRAS expression sequestered YAP1 from the nucleus, through enhanced 14-3-3zeta interaction with phosphorylated YAP1 at S127. Consistently, expression of a constitutively-active YAP1 mutant in PDAC cells bypassed the growth inhibitory effects of WT KRAS. In patient samples, we found that the YAP1-activation genes were significantly upregulated in tumors with LOH at KRAS, and YAP1 nuclear localization predicted poor survival for PDAC patients. Collectively, our results reveal that the WT allelic loss leads to functional activation of YAP1 and enhanced tumor malignancy, which explains the selection advantage of the tumor cells with LOH at KRAS during pancreatic cancer clonal evolution and progression to metastasis, and should be taken into consideration in future therapeutic strategies targeting KRAS.
Article
Background: Metabolic disturbance is closely correlated with intrahepatic cholangiocarcinoma (IHCC), and we aimed to identify metabolic gene markers for the prognosis of IHCC. Methods: We obtained expression and clinical data from 141 patients with IHCC from public databases. Prognostic metabolic genes were selected using univariate Cox regression analysis. Unsupervised cluster analysis was applied to identify IHCC subtypes, and CIBERSORT was used for immune infiltration analysis of different subtypes. Then, the metabolic gene signature was screened using multivariate Cox regression analysis and the LASSO algorithm. The prognostic potential and regulatory network of the metabolic gene signature were further investigated. Results: We screened 228 prognosis-related metabolic genes. Based on their expression levels, IHCC samples were divided into two subtypes, which showed significant differences in survival and immune cell infiltration. After LASSO analysis, eight metabolic genes including CYP19A1, SCD5, ACOT8, SRD5A3, MOGAT2, PFKFB3, PPARGC1B, and RPL17 were identified as the optimal genes for the prognosis signature. The prognostic model had excellent predictive abilities, with areas under the receiver-operating characteristic curves over 0.8. A nomogram model was also established based on two independent prognostic clinical factors (pathologic stage and prognostic model), and the generated calibration curves and c-indexes determined its excellent accuracy and discriminative ability to predict 1- and 5-year survival status (c-indexes > 0.7). Finally, we found that miR-26a-5p, miR-27a-3p, and miR-27b-3p were the upstream regulators that mediate the involvement of gene signatures in metabolic pathways. Conclusion: We developed an eight-gene metabolic signature to predict IHCC prognosis and proposed potential upstream regulatory axes of the gene signature.
Article
Financial inclusion is strongly differentiated by age groups and countries and the pandemic has highlighted the increased gaps and inequalities but also the weaknesses of the system, in terms of flexibility, access and facilities of the customer-bank relationship and also from the perspective of the financial education of young generations and vulnerable people, active in the labor market. Based on the available data provided by the Global Findex database, and some findings after more than one year of COVID-19 crisis we outlined the main aspects of financial digitization, by categories of people and countries. At the same time, we identified the challenges and problems during the pandemic that significantly adjusted the consumption pattern of citizens and increased the need for on-line access for financial transactions. Starting from the analysis of the inequality of access to financial instruments in the last years, from the informational asymmetry in financial education and the challenges of the pandemic period, we underlined the main coordinates of changing the model of sustainable financial inclusion—based on five pillars—access, education, support tools, CSR and resilience. The research results highlight the need for convergence in providing opportunities to consider financial inclusion as a public good and an active tool to increase consumers’ satisfaction and the quality of life of individuals.
Article
DHX15 is a downstream substrate for Akt1, which is involved in key cellular processes affecting vascular biology. Here, we explored the vascular regulatory function of DHX15. Homozygous DHX15 gene deficiency was lethal in mouse and zebrafish embryos. DHX15−/− zebrafish also showed downregulation of VEGF-C and reduced formation of lymphatic structures during development. DHX15+/− mice displayed lower vascular density and impaired lymphatic function postnatally. RNAseq and proteome analysis of DHX15-silenced endothelial cells revealed differential expression of genes involved in ATP biosynthesis. The validation of these results demonstrated a lower activity of Complex I in the mitochondrial membrane of endothelial cells, resulting in lower intracellular ATP production and lower oxygen consumption. After injection of syngeneic LLC1 tumor cells, DHX15+/− mice showed partially inhibited primary tumor growth and reduced lung metastasis. Our results revealed an important role of DHX15 in vascular physiology and pave a new way to explore its potential use as a therapeutic target for metastasis treatment.
Chapter
Cluster analysis is a procedure for grouping cases (objects of investigation) in a data set. For this purpose, the first step is to determine the similarity or dissimilarity (distance) between the cases by a suitable measure. The second step searches for the fusion algorithm which combines the individual cases successively into groups (clusters). The goal is to combine such cases into groups which are similar with respect to the considered segmentation variables (homogenous groups). At the same time, the groups should be as dissimilar as possible. The procedures of cluster analysis can handle variables with metric, non-metric as well as mixed scales. The focus of the chapter is on hierarchical agglomerative clustering methods, with the single-linkage method and Ward’s method presented in detail. Finally, k-means clustering and two-step cluster analysis, two partitioning cluster methods, are also explained. These methods offer particular advantages when working with large amounts of data.
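The two steps described in this chapter abstract (first measure the distance between cases, then fuse cases successively into clusters) can be illustrated with a minimal pure-Python sketch of single-linkage agglomerative clustering. The function name `single_linkage`, the choice of Euclidean distance, and the sample `cases` are assumptions made for the illustration and are not taken from the chapter.

```python
from itertools import combinations
from math import dist


def single_linkage(points, n_clusters):
    """Agglomerative clustering with single linkage; returns lists of case indices."""
    # Step 1: Euclidean distance between every pair of cases, keyed by index pair.
    d = {(i, j): dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)}

    # Every case starts as its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(d[min(i, j), max(i, j)] for i in a for j in b)

    # Step 2: fuse the two nearest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return clusters


# Hypothetical cases: three points near the origin, two near (5, 5).
cases = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
result = single_linkage(cases, 2)
```

Swapping the `min` inside `linkage` for `max` would give complete linkage instead. Ward's method, which the chapter also covers, uses a variance-based fusion criterion and would need a different update rule.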
Chapter
Discovering protein complexes in vivo is of vital importance to understand the evolution and function of biological systems. Proteomics analysis has evolved as a state-of-the-art technique in elucidating the above information. A combination of liquid chromatography (LC) and liquid chromatography coupled to shotgun mass spectrometry (LC-MS) provides the most exhaustive information in this regard. However, a significant amount of computational effort is required for the meaningful interpretation of the generated datasets. In this chapter we describe in detail the state-of-the-art pipeline to discover soluble protein complexes and provide practical advice focusing on typical situations a biologist faces while analyzing such proteomics datasets. Furthermore, we briefly describe two commonly used software packages to solve the described problem: Weka for training protein-protein interactions (PPIs) using machine learning (ML) and Cytoscape for clustering the interaction network.
Chapter
The present chapter focuses on the interactive and explorative aspects of bioinformatics resources that have been recently released in glycobiology. The comparative analysis of data in a field where knowledge is scattered, incomplete, and disconnected from main biology requires efficient visualization, integration, and interactive tools that are currently only partially implemented. This overview highlights converging efforts toward building a consistent picture of protein glycosylation.
Article
Purpose: This article will briefly review the origins and evolution of functional genomics, first describing the experimental technology, and then some of the approaches applied to data analysis and visualization. It will emphasize application of functional genomics to radiation biology, using examples from the author’s work to illustrate several key types of analysis. It concludes with a look at non-coding RNA, alternative reading of the genome, and single-cell transcriptomics, some of the innovative areas that may help to shape future research in radiation biology and oncology. Conclusions: Transcriptomic approaches have provided insight into many areas of radiation biology and medicine, and innovations in technology and data analysis approaches promise continued contributions to radiation science in the future.
Article
The Aldehyde dehydrogenase (ALDH) superfamily comprises a group of enzymes involved in the scavenging of toxic aldehyde molecules by converting them into their corresponding non-toxic carboxylic acids. A genome-wide study in potato identified a total of 22 ALDH genes grouped into ten families that are distributed unevenly across all 12 chromosomes. Based on the evolutionary analysis of ALDH proteins from different plant species, ALDH2 and ALDH3 were found to be the most abundant families in the plant, while ALDH18 was found to be the most distantly related one. Gene expression analysis revealed that the expression of StALDH genes is highly tissue-specific and divergent in various abiotic, biotic, and hormonal treatments. Structural modelling and functional analysis of selected StALDH members revealed conservation of their secondary structures and cofactor binding sites. Taken together, our findings provide comprehensive information on the ALDH gene family in potato that will help in developing a framework for further functional studies.
Article
Key points: Analysis of data from RHC and TTE of HF patients using a closed-loop model of the cardiovascular system identifies key parameters representing hemodynamic cardiovascular function in HFrEF and HFpEF patients. Analyzing optimized parameters representing cardiovascular function using machine learning shows mechanistic differences between HFpEF groups that are not seen analyzing clinical data alone. HFpEF groups presented here can be subdivided into 3 subgroups: HFpEF1 described as "HFrEF-like HFpEF", HFpEF2 as "pure HFpEF", and a third group of HFpEF patients that do not consistently cluster. Focusing purely on cardiac function consistently captures the underlying dysfunction in HFrEF, whereas HFpEF is better characterized by dysfunction in the entire cardiovascular system. Our methodology reveals that elevated left ventricular systolic and diastolic volumes are potential biomarkers for identifying HFrEF-like HFpEF patients.
Abstract: To phenotype mechanistic differences between heart failure with reduced (HFrEF) and preserved (HFpEF) ejection fraction, a closed-loop model of the cardiovascular system coupled with patient-specific transthoracic echocardiography (TTE) and right heart catheterization (RHC) data was used to identify key parameters representing hemodynamics. Thirty-one patient records (10 HFrEF, 21 HFpEF) were obtained from the Cardiovascular Health Improvement Project database at the University of Michigan. Model simulations were tuned to match RHC and TTE pressure, volume, and cardiac output measurements in each patient. The underlying physiological model parameters were plotted against model-based norms and compared between HFrEF and HFpEF. Our results confirm the main mechanistic parameter driving HFrEF is reduced left ventricular (LV) contractility, whereas HFpEF exhibits a heterogeneous phenotype. Conducting principal component analysis, k-means clustering, and hierarchical clustering on the optimized parameters reveals (i) a group of HFrEF-like HFpEF patients (HFpEF1), (ii) a "pure" HFpEF group (HFpEF2), and (iii) a group of HFpEF patients that do not consistently cluster (NCC). These subgroups cannot be distinguished from the clinical data alone. Increased LV active contractility (p-value < 0.001) and LV passive stiffness (p-value < 0.001) at rest are observed when comparing HFpEF2 to HFpEF1. Analyzing the clinical data of each subgroup reveals that elevated systolic and diastolic LV volumes seen in both HFrEF and HFpEF1 may be used as a biomarker to identify HFrEF-like HFpEF patients. These results suggest that modeling of the cardiovascular system and optimizing to standard clinical data can designate subgroups of HFpEF as separate phenotypes, possibly elucidating patient-specific treatment strategies.
Article
Conditional overexpression of histone reader Tripartite motif containing protein 24 (TRIM24) in mouse mammary epithelia (Trim24COE) drives spontaneous development of mammary carcinosarcoma tumors, lacking ER, PR and HER2. Human carcinosarcomas or metaplastic breast cancers (MpBC) are a rare, chemorefractory subclass of triple-negative breast cancers (TNBC). Comparison of Trim24COE metaplastic carcinosarcoma morphology, TRIM24 protein levels and a derived Trim24COE gene signature reveals strong correlation with human MpBC tumors and MpBC patient-derived xenograft (PDX) models. Global and single-cell tumor profiling reveal Met as a direct oncogenic target of TRIM24, leading to aberrant PI3K/mTOR activation. Here, we find that pharmacological inhibition of these pathways in primary Trim24COE tumor cells and TRIM24-PROTAC treatment of MpBC TNBC PDX tumorspheres decreased cellular viability, suggesting potential in therapeutically targeting TRIM24 and its regulated pathways in TRIM24-expressing TNBC. Human metaplastic breast cancers (MpBC) are a rare, aggressive subclass of triple-negative breast cancers. Here, the authors show over-expression of histone reader TRIM24 is sufficient to generate tumors with a molecular signature of metabolic dysfunction and EMT in a mouse model of human MpBC.
Article
Genome-wide association studies (GWAS) have identified loci for kidney disease, but the causal variants, genes, and pathways remain unknown. Here we identify two kidney disease genes, Dipeptidase 1 (DPEP1) and Charged Multivesicular Body Protein 1A (CHMP1A), via the triangulation of kidney function GWAS, human kidney expression, and methylation quantitative trait loci. Using single-cell chromatin accessibility and genome editing, we fine map the region that controls the expression of both genes. Mouse genetic models demonstrate the causal roles of both genes in kidney disease. Cellular studies indicate that both Dpep1 and Chmp1a are important regulators of a single pathway, ferroptosis, and lead to kidney disease development via altering cellular iron trafficking.
Article
Background: Sepsis is a dysregulated host response to pathogens. Delay in sepsis diagnosis has become a primary cause of patient death. This study determines some factors to prevent septic shock in its early stage, contributing to the early treatment of sepsis. Methods: The sequencing data (RNA- and miRNA-sequencing) of patients with septic shock were obtained from the NCBI GEO database. After re-annotation, we obtained lncRNA, miRNA, and mRNA information. Then, we evaluated the immune characteristics of the samples based on the ssGSEA algorithm. We used the WGCNA algorithm to obtain genes significantly related to immunity and screened for important related factors by constructing a ceRNA regulatory network. Results: After re-annotation, we obtained 1708 lncRNAs, 129 miRNAs, and 17,326 mRNAs. Also, through the ssGSEA algorithm, we obtained 5 important immune cells. Finally, we constructed a ceRNA regulation network associated with SS pathways. Conclusion: We identified 5 immune cells with significant changes in the early stage of septic shock. We also constructed a ceRNA network, which will help us explore the pathogenesis of septic shock.
Article
Clustering of tumor samples can help identify cancer types and discover new cancer subtypes, which is essential for effective cancer treatment. Although many traditional clustering methods have been proposed for tumor sample clustering, advanced algorithms with better performance are still needed. Low-rank subspace clustering has become a popular approach in recent years. In this paper, we propose a novel one-step robust low-rank subspace segmentation method (ORLRS) for clustering tumor samples. For a gene expression data set, we seek its lowest rank representation matrix and the noise matrix. By imposing the discrete constraint on the low-rank matrix, without performing spectral clustering, ORLRS learns the cluster indicators of subspaces directly, i.e., performing the clustering task in one step. To improve the robustness of the method, the capped norm is adopted to remove the extreme data outliers in the noise matrix. Furthermore, we develop an efficient algorithm to solve the ORLRS problem. Experiments on several tumor gene expression data sets demonstrate the effectiveness of ORLRS.
Article
Label distribution learning (LDL) is a new machine learning paradigm to solve label ambiguity and has drawn increasing attention in recent years. The importance of all labels needs to be considered under the LDL settings. A series of approaches have been proposed to deal with the LDL problem by considering the correlation of labels or instances. However, none of them focuses on finding interpretable bases to reduce the dimensions of the feature space. Inspired by the semi-nonnegative matrix factorization (semi-NMF) method, we propose a new LDL learning framework to deal with the problem through learning nonnegative components. The key insight is to explore the bases, each of which represents a class, through the label distribution and to transform the input matrix into a coefficient matrix of the space constructed by the bases. Consequently, a maximum entropy model can be adopted to learn the label distribution from the coefficient matrix. Experimental results on real-world datasets comparing our method with several state-of-the-art methods validate the performance of our approach.
Article
Full-text available
Quantum dots are nanoparticles with very promising biomedical applications. However, before these applications can be authorized, a complete toxicological assessment of quantum dots is needed. This work studied the effects of cadmium-selenium quantum dots on the transcriptome of T98G human glioblastoma cells. It was found that 72-h exposure to 40 µg/mL (a dose that reduces cell viability by less than 10%) alters the transcriptome of these cells in biological processes and molecular pathways that mainly concern neuroinflammation and hormonal control of the hypothalamus via the gonadotropin-releasing hormone receptor. The biological significance of the neuroinflammation alterations is still to be determined because, unlike in studies performed with other nanomaterials, the expression of the genes encoding pro-inflammatory interleukins is down-regulated rather than up-regulated. The alterations in hormonal control of the hypothalamus raise a new concern about a potential adverse effect of quantum dots on fertility. In any case, more studies are needed to clarify the biological relevance of these findings, and especially to assess the real risk of toxicity from quantum dot exposure in physiologically relevant scenarios.
Article
Full-text available
In the human brain, long non-coding RNAs (lncRNAs) are widely expressed in an exquisitely temporally and spatially regulated manner, thus suggesting their contribution to normal brain development and their probable involvement in the molecular pathology of neurodevelopmental disorders (NDD). Bypassing the classic protein-centric conception of disease mechanisms, some studies have been conducted to identify and characterize the putative roles of non-coding sequences in the genetic pathogenesis and diagnosis of complex diseases. However, their involvement in NDD, and more specifically in intellectual disability (ID), is still poorly documented and only a few genomic alterations affecting the lncRNAs function and/or expression have been causally linked to the disease endophenotype. Considering that a significant fraction of patients still lacks a genetic or molecular explanation, we expect that a deeper investigation of the non-coding genome will unravel novel pathogenic mechanisms, opening new translational opportunities. Here, we present evidence of the possible involvement of many lncRNAs in the etiology of different forms of ID and NDD, grouping the candidate disease-genes in the most frequently affected cellular processes in which ID-risk genes were previously collected. We also illustrate new approaches for the identification and prioritization of NDD-risk lncRNAs, together with the current strategies to exploit them in diagnosis.
Article
Full-text available
Traditional approaches to genome-wide marker discovery often follow a common top-down strategy, where a large scale ‘omics’ investigation is followed by the analysis of functional pathways involved, to narrow down the list of identified putative biomarkers, and to deconvolute gene expression networks, or to obtain an insight into genetic alterations observed in cancer. We set out to investigate whether a reverse approach would allow full or partial reconstruction of the transcriptional programs and biological pathways specific to a given cancer and whether the full or substantially expanded list of putative markers could thus be identified by starting with the partial knowledge of a few disease-specific markers. To this end, we used 10 well-documented differentially expressed markers of colorectal cancer (CRC), analyzed their transcription factor networks and biological pathways, and predicted the existence of 193 new putative markers. Incredibly, the use of a validation marker set of 10 other completely different known CRC markers and the same procedure resulted in a very similar set of 143 predicted markers. Of these, 138 were identical to those found using the training set, confirming our main hypothesis that a much-expanded set of disease markers can be predicted by starting with just a small subset of validated markers. Further to this, we validated the expression of 42 out of 138 top-ranked predicted markers experimentally using qPCR in surgically removed CRC tissues. We showed that 41 out of 42 mRNAs tested have significantly altered levels of mRNA expression in surgically excised CRC tissues. Of the markers tested, 36 have been reported to be associated with aspects of CRC in the past, whilst only limited published evidence exists for another three genes (BCL2, PDGFRB and TSC2), and no published evidence directly linking genes to CRC was found for CCNA1, SHC1 and TGFB3. 
Whilst we used CRC to test and validate our marker discovery strategy, the reported procedures apply more generally to cancer marker discovery.
Article
Full-text available
Diagnosis of latent tuberculosis infection (LTBI) using biomarkers, in order to identify the risk of progression to active TB and thereby guide preventive therapy, has been the main bottleneck in the eradication of tuberculosis. We compared two assays for the diagnosis of LTBI, transcript signatures and the interferon gamma release assay (IGRA), among household contacts (HHCs) in a high tuberculosis-burden population. HHCs of active TB cases were recruited for our study; these were confirmed to be clinically negative for active TB disease. Eighty HHCs were screened by IGRA using QuantiFERON-TB Gold Plus (QFT-Plus) to identify LTBI and uninfected cohorts; further, quantitative transcript levels for six selected genes (TNFRSF10C, ASUN, NEMF, FCGR1B, GBP1, and GBP5) were determined. Machine learning (ML) was used to construct models of different gene combinations, with a view to identifying hidden but significant underlying patterns in their transcript levels. Forty-three HHCs were found to be IGRA positive (LTBI) and thirty-seven were IGRA negative (uninfected). FCGR1B, GBP1, and GBP5 transcripts differentiated LTBI from uninfected HHCs using the Livak method. ML and ROC (Receiver Operator Characteristic) analysis validated this transcript signature as having a specificity of 72.7%. In this study, we compared a quantitative transcript signature with IGRA to assess the diagnostic ability of the two for detection of LTBI cases among HHCs of a high-TB-burden population; we concluded that a three-gene (FCGR1B, GBP1, and GBP5) transcript signature can be used as a biomarker for rapid screening. IMPORTANCE The study compares the potential of a transcript signature and IGRA to diagnose LTBI. It is the first study of its kind to screen household contacts (HHCs) in a high-TB-burden area of India. A transcript signature (FCGR1B, GBP1, and GBP5) is identified as a potential biomarker for LTBI.
These results can lead to the development of a point-of-care (POC)-like device for LTBI screening in a high-TB-burden area.
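For readers unfamiliar with the Livak method named in the abstract above, the following is a minimal sketch of the standard 2^(-ΔΔCt) calculation for relative transcript levels. The function name and the Ct values are hypothetical and are not taken from the study; the sketch also assumes roughly 100% amplification efficiency for both target and reference genes, which is the usual precondition for this formula.

```python
def livak_fold_change(ct_target_sample, ct_ref_sample,
                      ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) (Livak) method.

    Assumes near-100% PCR efficiency for target and reference genes.
    """
    # Normalize the target gene to the reference gene within each group.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized sample against the normalized control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (illustrative only, not data from the study).
fold = livak_fold_change(24.0, 18.0, 26.0, 18.0)
print(fold)  # 4.0: the target is four-fold higher in the sample
```

A lower Ct means more starting template, so a ΔΔCt of -2 corresponds to a four-fold increase.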
Article
Methylation, that is, the transfer or synthesis of a –CH3 group onto a target molecule, is a pervasive biochemical modification found in organisms from bacteria to humans. In mammals, a complex metabolic pathway powered by the essential nutrients vitamin B9 and B12, methionine and choline, synthesizes S-adenosylmethionine, the methyl donor in the methylation of nucleic acids, proteins, fatty acids, and small molecules by over 200 substrate-specific methyltransferases described so far in humans. Methylations not only play a key role in scenarios for the origin and evolution of life, but they remain essential for the development and physiology of organisms alive today, and methylation deficiencies contribute to the etiology of many pathologies. The methylation of histones and DNA is important for circadian rhythms in many organisms, and global inhibition of methyl metabolism similarly affects biological rhythms in prokaryotes and eukaryotes. These observations, together with various pieces of evidence scattered in the literature on circadian gene expression and metabolism, indicate a close mutual interdependence between biological rhythms and methyl metabolism that may originate from prebiotic chemistry. This perspective first proposes an abiogenetic scenario for rhythmic methylations and then outlines mammalian methyl metabolism, before reanalyzing previously published data to draw a tentative map of its profound connections with the circadian clock.
Article
Full-text available
Behavioral neuroscience underwent a technology-driven revolution with the emergence of machine-vision and machine-learning technologies. These technological advances facilitated the generation of high-resolution, high-throughput capture and analysis of complex behaviors. Therefore, behavioral neuroscience is becoming a data-rich field. While behavioral researchers use advanced computational tools to analyze the resulting datasets, the search for robust and standardized analysis tools is still ongoing. At the same time, the field of genomics exploded with a plethora of technologies which enabled the generation of massive datasets. This growth of genomics data drove the emergence of powerful computational approaches to analyze these data. Here, we discuss the composition of a large behavioral dataset, and the differences and similarities between behavioral and genomics data. We then give examples of genomics-related tools that might be of use for behavioral analysis and discuss concepts that might emerge when considering the two fields together.
Article
We aimed to identify long noncoding RNAs involved in the genomic instability of papillary thyroid carcinoma (PTC). Expression profiles of RNA-seq and gene mutation profiles were downloaded from the Cancer Genome Atlas (TCGA) database, and differentially expressed lncRNAs (DElncRNAs) and DE messenger RNAs (DEmRNAs) were determined. We constructed an lncRNA-mRNA network, analyzed mRNA enrichment, and compared the immune cell proportions and tumor mutation burdens between the low- and high-risk groups using a prognostic model. We identified 95 DElncRNAs and 421 DEmRNAs and constructed a network comprising 33 lncRNAs and 201 mRNAs. The mRNAs in the network were enriched in 36 Gene Ontology biological processes and 7 Kyoto Encyclopedia of Genes and Genomes pathways. A five-lncRNA prognostic model was constructed; the AUCs of the training, validation, and entire sets were 0.955, 0.805, and 0.901, respectively. The proportions of six types of tumor-infiltrating immune cells and neuroblastoma RAS viral (V-ras) oncogene homolog expression differed significantly between the low- and high-risk groups. LncRNAs may be involved in the genomic instability of PTC via cytokine-cytokine receptor interactions, cell adhesion molecules, and chemokine signaling pathways. Our five-lncRNA prognostic model may enable the prognostic evaluation of PTC patients. Highlights The 5-lncRNA prognostic model may be an independent prognostic factor for PTC. Six TIICs are associated with the prognosis of PTC. NRAS gene mutation plays a vital role in the progression of PTC. The model included the lncRNAs WARS2-IT1, LINC00536, ATP13A4-AS1, LINC01561, and FENDRR. © 2022 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
Article
Full-text available
Glioblastoma multiforme (GBM) is a central nervous system (CNS) tumor characterized by rapid malignant cellular growth. It is one of the most prevalent primary brain tumors, particularly in adult males. Even with combination therapy comprising surgery, chemotherapy, and adjuvant therapies, average survival is 14.6 months. Glioma stem cells (GSCs) have key roles in tumorigenesis, progression, and resistance to chemotherapy and radiotherapy. In our study, the gene expression dataset GSE45117 was first retrieved and differentially expressed genes (DEGs) were identified. Co-expression network analysis was applied to the DEGs to find significant modules. The most significant module resulting from the co-expression analysis was the turquoise module. The turquoise module related to the tumor cells, hypoxia, normoxic treatments of glioblastoma tumor (GBT), and GSCs was screened. Sixty-one genes in the turquoise module that were common to the co-expression analysis and the protein–protein interaction (PPI) network were selected. Moreover, the GO and KEGG pathway enrichment results were studied. Twenty common hub genes were screened with the NetworkAnalyst web tool, based on the PPI network built from the STRING database. After survival analysis via the Kaplan–Meier (KM) plotter using The Cancer Genome Atlas (TCGA) database, we identified the five most significant hub genes strongly related to the progression of GBM. We further observed that these five hub genes were also up-regulated in another GBM gene expression dataset. The protein–protein interaction (PPI) network of the turquoise module genes was constructed and a KEGG pathway enrichment study of the turquoise module genes was performed. The VEGF signaling pathway was emphasized because of its strong link with GBM. A gene–disease association network was further constructed to show the relationship between the progression of GBM and other related brain neoplasms.
All hub genes assessed through this study would be potential markers for the prognosis and diagnosis of GBM.
Article
Full-text available
Studies over the past decade have generated a wealth of molecular data that can be leveraged to better understand cancer risk, progression, and outcomes. However, understanding the progression risk and differentiating long- and short-term survivors cannot be achieved by analyzing data from a single modality due to the heterogeneity of disease. Using a scientifically developed and tested deep-learning approach that leverages aggregate information collected from multiple repositories with multiple modalities (e.g., mRNA, DNA Methylation, miRNA) could lead to a more accurate and robust prediction of disease progression. Here, we propose an autoencoder based multimodal data fusion system, in which a fusion encoder flexibly integrates collective information available through multiple studies with partially coupled data. Our results on a fully controlled simulation-based study have shown that inferring the missing data through the proposed data fusion pipeline allows a predictor that is superior to other baseline predictors with missing modalities. Results have further shown that short- and long-term survivors of glioblastoma multiforme, acute myeloid leukemia, and pancreatic adenocarcinoma can be successfully differentiated with an AUC of 0.94, 0.75, and 0.96, respectively.
Article
Full-text available
Several studies have worked on co-clustering analysis of spatio-temporal data. However, most of them search for co-clusters with similar values and are unable to identify co-clusters with coherent trends, defined as exhibiting similar tendencies in the attributes. In this study, we present the Bregman co-clustering algorithm with minimum sum-squared residue (BCC_MSSR), which uses the residue to quantify coherent trends and enables the identification of co-clusters with coherent trends in geo-referenced time series. Dutch monthly temperatures over 20 years at 28 stations were used as the case study dataset. Station-clusters, month-clusters, and co-clusters in the BCC_MSSR results were shown and compared with co-clusters of similar values. A total of 112 co-clusters with different temperature variations were identified, 16 representative co-clusters were illustrated, and seven types of coherent temperature trends were summarized: (1) increasing; (2) decreasing; (3) first increasing and then decreasing; (4) first decreasing and then increasing; (5) first increasing, then decreasing, and finally increasing; (6) first decreasing, then increasing, and finally decreasing; and (7) first decreasing, then increasing, decreasing, and finally increasing. Comparisons with co-clusters of similar values show that BCC_MSSR explored coherent spatio-temporal patterns in regions and certain time periods. However, the selection of a suitable co-clustering method depends on the objective of the specific task.
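The abstract above does not give the residue formula, so the following is only a sketch under an assumption: it uses one common definition from minimum sum-squared residue co-clustering, in which the residue of an entry is the entry minus its row mean, minus its column mean, plus the overall mean of the co-cluster. The function name and the toy blocks are hypothetical. Under this definition, a block whose rows follow the same trend up to a constant offset has zero residue, which is what "coherent trend" is taken to mean here.

```python
def sum_squared_residue(block):
    """Sum of squared residues of one co-cluster (list of equal-length rows).

    Residue of entry (i, j) is assumed to be
    a_ij - row_mean_i - col_mean_j + overall_mean.
    """
    n_rows = len(block)
    n_cols = len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(block[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            h = block[i][j] - row_means[i] - col_means[j] + overall
            total += h * h
    return total


# Two stations with the same rising trend, offset by a constant.
coherent = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# One station rising while the other falls.
incoherent = [[1.0, 2.0, 3.0], [6.0, 5.0, 4.0]]

print(sum_squared_residue(coherent))    # 0.0
print(sum_squared_residue(incoherent))  # 4.0
```

Note that the coherent block has very different values in its two rows, so a similarity-of-values criterion would not group them, while the residue criterion does.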
Conference Paper
Integrative analysis of multi-omics data is important for biomedical applications, as it is required for a comprehensive understanding of biological function. Integrating multi-omics data serves multiple purposes, such as, an integrated data model, dimensionality reduction of omic features, patient clustering, etc. For oncological data, patient clustering is synonymous to cancer subtype prediction. However, there is a gap in combining some of the widely used integrative analyses to build more powerful tools. To bridge the gap, we propose a multi-level integration algorithm to identify representative integrative subspace and use it for cancer subtype prediction. The three integrative approaches we implement on multi-omics features are, (1) multivariate multiple (linear) regression of the features from a cohort of patients/samples, (2) network construction using different omics features, and (3) fusion of sample similarity networks across the features. We use a type of multilayer network, called heterogeneous network, as a data model to transition between a network-free (NF) regression model and a network-based (NB) model, which uses correlation networks. The heterogeneous networks consist of intra- and inter-layer graphs. Our proposed heterogeneous correlation network model, HCNM, is central to our algorithm for gene-ranking, integrative subspace identification, and tumor-specific subtypes prediction. The genes of our representative integrative subspace have been enriched with gene-ontology and found to exhibit significant gene-disease association (GDA) scores. The subspace in genes which is less than 5% of the total gene-set of each genomic feature is used with NB fusion integrative model to predict sample subtypes. 
As the identified integrative subspace data of multi-omics is less prone to noise, bias, and outliers, our experiments show that the subtypes in our results agree with previous benchmark studies and exhibit better classification between poor and good survival of patient cohorts. Clinical relevance: Finding significant cancer-specific genes and subtypes of cancer is vital for early prognosis and personalized treatment, and therefore improves the survival probability of a patient.
Article
Full-text available
The embryonic mouse brain undergoes drastic changes in establishing basic anatomical compartments and laying out major axonal connections of the developing brain. Correlating anatomical changes with gene-expression patterns is an essential step toward understanding the mechanisms regulating brain development. Traditionally, this is done in a cross-sectional manner, but the dynamic nature of development calls for probing gene–neuroanatomy interactions in a combined spatiotemporal domain. Here, we present a four-dimensional (4D) spatiotemporal continuum of the embryonic mouse brain from E10.5 to E15.5 reconstructed from diffusion magnetic resonance microscopy (dMRM) data. This study achieved unprecedented high-definition dMRM at 30- to 35-µm isotropic resolution, and together with computational neuroanatomy techniques, we revealed both morphological and microscopic changes in the developing brain. We transformed selected gene-expression data to this continuum and correlated them with the dMRM-based neuroanatomical changes in embryonic brains. Within the continuum, we identified distinct developmental modes comprising regional clusters that shared developmental trajectories and similar gene-expression profiles. Our results demonstrate how this 4D continuum can be used to examine spatiotemporal gene–neuroanatomical interactions by connecting upstream genetic events with anatomical changes that emerge later in development. This approach would be useful for large-scale analysis of the cooperative roles of key genes in shaping the developing brain.
Article
Full-text available
Brain drain is a phenomenon that, over time, has followed an upward trend. An important component of it is the migration of physicians. For the country of destination, the migration of physicians offers several advantages, whereas the country of origin loses skilled and sometimes highly trained individuals. This process is reflected both in the efficiency of the health system (severe employment shortage) and in the quality of health system services. After Romania's accession to the EU, the migration of doctors intensified, significantly increasing the shortage of physicians. The purpose of this article is to identify the push factors that influence physicians' decision to migrate from Romania. For this, a panel regression analysis was applied, highlighting that physicians' migration is influenced by several factors, such as the number of beds in hospitals, the number of emigrants, the unemployment rate, and income. At the same time, we analyzed the extent to which public policy measures addressing the remuneration of medical staff influenced the propensity towards external mobility of practicing doctors, whether already employed or recent graduates. The results confirm that public policies can be a tool for redistributing the allocation of the labor force on the labor market. Moreover, the results of our analysis highlight that specific measures do not solve the systemic crises facing the health sector. Systemic, multidimensional changes are needed, adapted to the needs for medical services specific to the geographical area and adequate to the health status of the population.
Article
Full-text available
Obesogens such as tributyltin (TBT) are xenobiotic compounds that promote obesity, in part by distorting the normal balance of lipid metabolism. The obesogenic effects of TBT can be observed in directly exposed (F1 and F2 generations) and also subsequent generations (F3 and beyond) that were never exposed. To address the effects of TBT exposure on germ cells, we exposed pregnant transgenic OG2 mouse dams (F0), which specifically express EGFP in germline cells, to an environmentally relevant dose of TBT or DMSO throughout gestation through drinking water. When fed with a high-fat diet, F3 male offspring of TBT-exposed F0 dams (TBT-F3) accumulated much more body fat than did DMSO-F3 males. TBT-F3 males also lost more body fluid and lean compositions than did DMSO-F3 males. Expression of genes involved in transcriptional regulation or mesenchymal differentiation was up-regulated in somatic cells of TBT-F1 (but not TBT-F3) E18.5 fetal testes, and promoter-associated CpG islands were hyper-methylated in TBT-F1 somatic cells. Global mRNA expression of protein-coding genes in F1 or F3 fetal testicular cells was unaffected by F0 exposure to TBT; however, expression of a subset of endogenous retroviruses was significantly affected in F1 and F3. We infer that TBT may directly target testicular somatic cells in F1 testes to irreversibly affect epigenetic suppression of endogenous retroviruses in both germline and somatic cells.
Article
Full-text available
Background Chilling temperature reduces the rate of photosynthesis in plants, and the reduction is more pronounced in association with phosphate (Pi) starvation. Previous studies showed that Pi resupply improves recovery of the rate of photosynthesis in plants much better under the combination of the two stresses than in non-chilled samples. However, the underlying mechanism remains poorly understood. Results In this study, RNA-seq analysis showed that the expression level of 41 photosynthetic genes in plant roots increased under phosphate starvation associated with 4 °C (-P 4 °C) compared to -P 23 °C. Moreover, iron uptake increased significantly in the stem cell niche (SCN) of wild type (WT) roots in -P 4 °C. In contrast, lower iron concentrations were found in the SCN of aluminum activated malate transporter 1 (almt1) and its transcription factor, sensitive to proton rhizotoxicity 1 (stop1), mutants under -P 4 °C. The Fe content examined by ICP-MS analysis in -P 4 °C treated almt1 was 98.5 ng/µg, which was only 17% of that of seedlings grown under -P 23 °C. The average plastid number in almt1 root cells under -P 4 °C was lower than under -P 23 °C. Furthermore, stop1 and almt1 single mutants both exhibited greater primary root elongation than WT under combined stresses. In addition, dark treatment blocked the root elongation phenotype of stop1 and almt1. Conclusions Induction of photosynthetic gene expression and increased iron accumulation in roots are required for plant adjustment to chilling in association with phosphate starvation.
Chapter
For the benefit of graduate students and research scientists of Brassica, some selected techniques with methodology, procedures, and protocols have been included in this chapter. These techniques are standardized and have been developed by experts working on genomics of crucifers’ host-pathosystem. They can be used for identification of sources of resistance with R-genes by screening germplasm from different sources. The procedures and methodology for transfer of resistance through various approaches have also been included for developing and breeding disease resistance cultivars of Brassica against range of pathogens. To get reliable and widely acceptable results in research pursuits is the primary object of any research programs which can be met only through standardized techniques included in this chapter. These techniques are of basic as well as advance nature on molecular aspects of crucifers’ host–pathogen interactions to reveal mechanisms of host resistance and breed high yielding disease resistance cultivars of crucifers against major pathogens of fungal, bacterial, and viral origins. The methodology and protocols of rapid molecular detection, identification, in vitro and in vivo culturing; maintenance, multiplication, single spore isolation, identification of R-genes, and QTLs, screening for sources of resistance are included for demonstration and use in research projects of students and researchers. 
Other pathogen specific and genomic techniques like inoculum preparation, pathotypes/races/strains maintenance, identification of partial/induced resistance/QTL’s development of near-isogenic lines, DNA extraction, use of PCR, qPCR; transfer of R-genes; breeding for multiple disease resistance; mapping of R-genes; cloning and transformation of genes; pyramiding of R-QTLs; marker-assisted selection and breeding; genome-wide association study; introgression of R-genes from wild relatives; virus detection, preservation, and identification, and Brassica species genomics techniques have been included to get reliable and high-throughput research results.
Chapter
To understand the molecular mechanisms of crucifers host resistance, useful protocols have been developed to analyze the host–pathogen interactions through omics approaches like genomics, proteomics, transcriptomics, and bioinformatics. Novel R-genes have been identified, molecularly characterized, and have been fine mapped on the chromosomes of different Brassica species. Techniques have been developed to introgress high yielding and disease resistance genes through traditional as well as transgenic approaches. Protocols for molecular characterization of R-genes in Brassica species, DNA extraction, purification, and quantification, construction of a linkage map and mapping of R-loci and proteome analysis of Brassica-Albugo pathosystem have been developed. The procedures for analysis of molecular and biochemical mechanisms of resistance in Brassica to Alternaria and molecular characterization of Alternaria genes showing fungicidal resistance have been described. Genome wide identification of defensin genes in B. juncea and Camelina sativa; analysis of expression of defensin genes; and identification of distribution of chitinase genes against Alternaria are the techniques described. Brassica-Erysiphe molecular techniques include identification of molecular markers linked powdery mildew R-genes and DNA sequence analysis of Erysiphe isolates. In Brassica–Hyaloperonospora interactions, assessment of small RNA role; R-genes overexpression; and c-DNA-AFLP analysis to reveal gene expression have been developed. Brassica-Leptosphaeria pathosystem has revealed phylogenetic relationship of R-loci; transcriptome analysis; cloning and transformation of Leptosphaeria avirulence genes; identification of QTLs; molecular mapping of R-genes; and identification of NBS-encoding genes in Brassica species. 
Molecular techniques developed with the Brassica-Plasmodiophora system are mapping of R-genes and QTLs; SNP array, mapping, population structure, and linkage disequilibrium analysis; genome-wide association study; and the methodology for transcriptome analysis of Brassica-Sclerotinia pathosystem has been developed. The protocols for TuMV detection; preservation; inoculation; and identification by ELISA have been developed. In Brassica species RNA sequencing and NBS domain and NBS-associated conserved domains protocols have been developed.
Article
Full-text available
There are numerous means to improve the tilapia aquaculture industry, and one is to develop disease resistance through selective breeding using molecular markers. In this study, 11 disease-resistance-associated microsatellite markers, including 3 markers linked to hamp2, 4 linked to hamp1, 1 linked to pgrn2, 2 linked to pgrn1, and 1 linked to piscidin 4 (TP4) genes, were established for tilapia strains farmed in Taiwan after challenge with Streptococcus iniae. The correlation analysis of genotypes and survival revealed a total of 55 genotypes related to survival by the chi-square and Z-test. Although fewer markers were found in the B and N2 strains compared with the A strain, they performed well in terms of disease resistance. This suggests that the difference may be due to the low potency of some genotypes and the combinatorial arrangement between them. Therefore, a predictive model was built from the genotypes of the parental generation, and the mortality rate of different combinations was calculated. The results show the same trend of predicted mortality in the offspring of three new disease-resistant strains as in the challenge experiment. The present approach is a non-lethal method that does not require selection by challenge with bacteria or viruses, and it might increase the feasibility of selective breeding using SSR markers on farms.
Article
The discovery of disease subtypes is an essential step for developing precision medicine, and disease subtyping via omics data has become a popular approach. While promising, subtypes obtained from existing approaches are not necessarily associated with clinical outcomes. With the rich clinical data available alongside the omics data in modern epidemiology cohorts, there is an urgent need for an outcome‐guided clustering algorithm that fully integrates the phenotypic data with the high‐dimensional omics data. Hence, we extended a sparse K‐means method to an outcome‐guided sparse K‐means (GuidedSparseKmeans) method. A unified objective function was proposed, comprising (i) weighted K‐means to perform sample clustering; (ii) lasso regularization to perform gene selection from the high‐dimensional omics data; and (iii) incorporation of a phenotypic variable from the clinical dataset to facilitate biologically meaningful clustering results. By iteratively optimizing the objective function, we simultaneously obtain phenotype‐related sample clustering results and gene selection results. We demonstrated the superior performance of GuidedSparseKmeans by comparing it with existing clustering methods in simulations and in applications to high‐dimensional transcriptomic data of breast cancer and Alzheimer's disease. Our algorithm has been implemented in an R package, which is publicly available on GitHub (https://github.com/LingsongMeng/GuidedSparseKmeans).
Article
Full-text available
Objective Triple-negative breast cancer (TNBC) is an aggressive cancer usually diagnosed in young women, with no effective prognosis prediction model available. The present study was performed to develop a useful prognostic model for predicting overall survival (OS) for TNBC patients. Methods The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) databases were used as training and validation data sets, respectively, from which the gene expression levels and clinical prognostic information of TNBC were collected. Differentially expressed genes (DEGs) between TNBC and non-TNBC (NTNBC) were identified with the thresholds of false discovery rate < 0.05 and |log2 fold change| > 1. DEGs in AmiGO2 and the Kyoto Encyclopedia of Genes and Genomes (KEGG) databases were retained for further study. Univariate, multivariate Cox, and logistic regression analyses were conducted to detect a DEG signature with the threshold of log-rank P < 0.05. Prognosis models based on the mRNA signature and on clinical factors were constructed and compared. Results One five-DEG signature, including CHST4, COCH, CST9, SOX11, and TDGF1, was identified in the DEG prognosis model. Stratified analysis showed that patients aged over 60, with higher pathologic stage (III-IV), or with recurrence had a significantly lower survival rate than those aged below 60, with lower pathologic stage, or without recurrence. Compared with patients with low-risk scores, those with high-risk scores demonstrated a significantly lower survival rate in the subgroup aged over 60 [HR = 3.780 (1.801–7.933), P < 0.0001]. For patients with a higher pathologic stage and recurrence, high-risk scores were correlated with a significantly lower survival rate than low-risk scores. The five-mRNA signature combined with the clinical model (AUC = 0.950) predicted survival better than the clinical model alone (AUC = 0.795) or the five-mRNA signature model alone (AUC = 0.823).
Conclusion Our present study identified a prognostic prediction model (combining a five-mRNA signature with clinical factors) for TNBC patients receiving immunotherapy, which will benefit future research and clinical therapies.
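The DEG thresholds quoted in the Methods (false discovery rate < 0.05 and |log2 fold change| > 1) amount to a simple two-condition filter. The sketch below is a minimal illustration, assuming the FDR-adjusted values and fold changes have already been computed upstream. The `GeneStat` record, the `select_degs` function, the gene names, and the numbers are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

@dataclass
class GeneStat:
    gene: str                # hypothetical gene identifier
    log2_fold_change: float  # TNBC vs. non-TNBC, log2 scale
    fdr: float               # multiple-testing-adjusted p-value

def select_degs(stats, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes passing both thresholds: FDR < cutoff and |log2FC| > cutoff."""
    return [
        s.gene
        for s in stats
        if s.fdr < fdr_cutoff and abs(s.log2_fold_change) > lfc_cutoff
    ]

# Made-up values for illustration only.
example = [
    GeneStat("GENE_A", 2.3, 0.001),   # passes both thresholds
    GeneStat("GENE_B", -1.7, 0.02),   # passes (down-regulated)
    GeneStat("GENE_C", 0.4, 0.0001),  # significant but small effect
    GeneStat("GENE_D", 3.1, 0.20),    # large effect but not significant
]

degs = select_degs(example)
print(degs)  # ['GENE_A', 'GENE_B']
```

Taking the absolute value of the fold change keeps both up- and down-regulated genes, which matches the |log2 fold change| notation in the abstract.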
Article
Full-text available
Myelodysplastic syndrome (MDS) is a clonal hematopoietic stem cell disease characterized by inefficient hematopoiesis and the potential development of acute leukemia. Among the most notable advances in the treatment of MDS is the hypomethylating agent, decitabine (5-aza-2′deoxycytidine). Although decitabine is well known as an effective method for treating MDS patients, only a subset of patients respond and a tolerance often develops, leading to treatment failure. Moreover, decitabine treatment is costly and causes unnecessary toxicity. Therefore, clarifying the mechanism of decitabine resistance is important for improving its therapeutic efficacy. To this end, we established a decitabine-resistant F-36P cell line from the parental F-36P leukemia cell line, and applied a genetic approach employing next-generation sequencing, various experimental techniques, and bioinformatics tools to determine differences in gene expression and relationships among genes. Thirty-eight candidate genes encoding proteins involved in decitabine-resistant-related pathways, including immune checkpoints, the regulation of myeloid cell differentiation, and PI3K-Akt signaling, were identified. Interestingly, two of the candidate genes, AKT3 and FOS, were overexpressed in MDS patients with poor prognoses. On the basis of these results, we are pursuing development of a gene chip for diagnosing decitabine resistance in MDS patients, with the goal of ultimately improving the power to predict treatment strategies and the prognosis of MDS patients.
Article
Full-text available
The gram-negative plant-pathogenic β-proteobacterium Ralstonia pseudosolanacearum strain OE1-1 produces methyl 3-hydroxymyristate as a quorum sensing (QS) signal via the methyltransferase PhcB and senses the chemical through the sensor histidine kinase PhcS. This leads to functionalization of the LysR family transcriptional regulator PhcA, regulating QS-dependent genes responsible for the QS-dependent phenotypes including virulence. The phc operon consists of phcB, phcS, phcR, and phcQ, with the latter two encoding regulator proteins with a receiver domain and a histidine kinase domain and with a receiver domain, respectively. To elucidate the function of PhcR and PhcQ in the regulation of QS-dependent genes, we generated phcR-deletion and phcQ-deletion mutants. Though the QS-dependent phenotypes of the phcR-deletion mutant were largely unchanged, deletion of phcQ led to a significant change in the QS-dependent phenotypes. Transcriptome analysis coupled with quantitative reverse transcription-PCR and RNA-sequencing revealed that phcB, phcK, and phcA in the phcR-deletion and phcQ-deletion mutants were expressed at similar levels as in strain OE1-1. Compared with strain OE1-1, expression of 22.9% and 26.4% of positively and negatively QS-dependent genes, respectively, was significantly altered in the phcR-deletion mutant. However, expression of 96.8% and 66.9% of positively and negatively QS-dependent genes, respectively, was significantly altered in the phcQ-deletion mutant. Furthermore, a strong positive correlation of expression of these QS-dependent genes was observed between the phcQ-deletion and phcA-deletion mutants. Our results indicate that PhcQ mainly contributes to the regulation of QS-dependent genes, in which PhcR is partially involved.
Article
Full-text available
The budding yeast Saccharomyces cerevisiae is a facultative aerobe that responds to changes in oxygen availability (and carbon source) by initiating a biochemically complex program that ensures that energy demands are met under two different physiological states: aerobic growth, supported by oxidative and fermentative pathways, and anaerobic growth, supported solely by fermentative processes. This program includes the differential expression of a large number of genes, many of which are involved in the direct utilization of oxygen. Research over the past decade has defined many of the cis-sites and trans-acting factors that control the transcription of these oxygen-responsive genes. However, the manner in which oxygen is sensed and the subsequent steps involved in the transduction of this signal have not been precisely determined. Heme is known to play a pivotal role in the expression of these genes, acting as a positive modulator for the transcription of the aerobic genes and as a negative modulator for the transcription of the hypoxic genes. Consequently, cellular concentrations of heme, whose biosynthesis is oxygen-dependent, are thought to provide a gauge of oxygen availability and dictate which set of genes will be transcribed. But the precise role of heme in oxygen sensing and the transcriptional regulation of oxygen-responsive genes is presently unclear. Here, we provide an overview of the transcriptional regulation of oxygen-responsive genes, address the functional roles that heme and hemoproteins may play in this regulation, and discuss possible mechanisms of oxygen sensing in this simple eukaryotic organism.
Article
Full-text available
The characteristics of an organism are determined by the genes expressed within it. A method was developed, called serial analysis of gene expression (SAGE), that allows the quantitative and simultaneous analysis of a large number of transcripts. To demonstrate this strategy, short diagnostic sequence tags were isolated from pancreas, concatenated, and cloned. Manual sequencing of 1000 tags revealed a gene expression pattern characteristic of pancreatic function. New pancreatic transcripts corresponding to novel tags were identified. SAGE should provide a broadly applicable means for the quantitative cataloging and comparison of expressed genes in a variety of normal, developmental, and disease states.
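The quantitative step in SAGE comes down to counting how often each tag is observed: a tag's share of all sequenced tags serves as an estimate of the relative abundance of its transcript. The sketch below is a toy illustration of that counting step only. The `tag_abundance` function and the tag labels are hypothetical, and tag extraction from concatemer sequences is not modelled.

```python
from collections import Counter

def tag_abundance(tags):
    """Map each tag to (count, fraction of all observed tags)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.items()}

# Hypothetical tags as they might be read off sequenced concatemers.
observed_tags = ["TAG_A", "TAG_B", "TAG_A", "TAG_C", "TAG_A", "TAG_B"]

profile = tag_abundance(observed_tags)
# Print tags from most to least frequent.
for tag, (count, fraction) in sorted(profile.items(), key=lambda kv: -kv[1][0]):
    print(f"{tag}\t{count}\t{fraction:.2f}")
```

Comparing such profiles between two samples would give the kind of expression comparison the abstract describes.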
Article
Full-text available
A high-capacity system was developed to monitor the expression of many genes in parallel. Microarrays prepared by high-speed robotic printing of complementary DNAs on glass were used for quantitative expression measurements of the corresponding genes. Because of the small format and high density of the arrays, hybridization volumes of 2 microliters could be used that enabled detection of rare transcripts in probe mixtures derived from 2 micrograms of total cellular messenger RNA. Differential expression measurements of 45 Arabidopsis genes were made by means of simultaneous, two-color fluorescence hybridization.
Article
Full-text available
Detecting and determining the relative abundance of diverse individual sequences in complex DNA samples is a recurring experimental challenge in analyzing genomes. We describe a general experimental approach to this problem, using microscopic arrays of DNA fragments on glass substrates for differential hybridization analysis of fluorescently labeled DNA samples. To test the system, 864 physically mapped lambda clones of yeast genomic DNA, together representing >75% of the yeast genome, were arranged into 1.8-cm x 1.8-cm arrays, each containing a total of 1744 elements. The microarrays were characterized by simultaneous hybridization of two different sets of isolated yeast chromosomes labeled with two different fluorophores. A laser fluorescent scanner was used to detect the hybridization signals from the two fluorophores. The results demonstrate the utility of DNA microarrays in the analysis of complex DNA samples. This system should find numerous applications in genome-wide genetic mapping, physical mapping, and gene expression studies.
Article
Full-text available
Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm² DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery.
Article
Full-text available
Genetic and physical maps for the 16 chromosomes of Saccharomyces cerevisiae are presented. The genetic map is the result of 40 years of genetic analysis. The physical map was produced from the results of an international systematic sequencing effort. The data for the maps are accessible electronically from the Saccharomyces Genome Database (SGD: http://genome-www.stanford.edu/Saccharomyces/).
Article
Full-text available
DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used to carry out a comprehensive investigation of the temporal program of gene expression accompanying the metabolic shift from fermentation to respiration. The expression profiles observed for genes with known metabolic functions pointed to features of the metabolic reprogramming that occur during the diauxic shift, and the expression patterns of many previously uncharacterized genes provided clues to their possible functions. The same DNA microarrays were also used to identify genes whose expression was affected by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcriptional activator YAP1. These results demonstrate the feasibility and utility of this approach to genomewide exploration of gene expression patterns.
Article
Full-text available
Diploid cells of budding yeast produce haploid cells through the developmental program of sporulation, which consists of meiosis and spore morphogenesis. DNA microarrays containing nearly every yeast gene were used to assay changes in gene expression during sporulation. At least seven distinct temporal patterns of induction were observed. The transcription factor Ndt80 appeared to be important for induction of a large group of genes at the end of meiotic prophase. Consensus sequences known or proposed to be responsible for temporal regulation could be identified solely from analysis of sequences of coordinately expressed genes. The temporal expression pattern provided clues to potential functions of hundreds of previously uncharacterized genes, some of which have vertebrate homologs that may function during gametogenesis.
Article
Structural features produced during the rifting of continents depend on the layered rheological properties of the crust and lithosphere and, in particular, on the presence of any transitions between brittle and ductile behaviour. Here we use a wax model to explore the gross structural response of continental lithosphere under pure shear extension in the presence of a continuous brittle–ductile transition. The wax models were deformed under various boundary conditions to reflect a variety of different regions, most notably the Basin and Range province of North America. Our experiments show the development of listric normal faults, structures common to regions of continental extension. We also observe the formation of distributed and discrete rifting, and intrusion and occlusion of the upper brittle layer by the ductile lower layer. The factor controlling deformation style in each case appears to be the relative thickness of the brittle and ductile layers, although a relatively high rate of strain generally promotes discrete rifting.
Article
The levels of H2A and H2B mRNAs as a function of cell-cycle stage were determined by hybridization methods. The analysis was extended to H3 and H4 mRNAs by in vitro translation. Cells were partitioned into cell-cycle stages either by centrifugal elutriation or by G1 synchronization with the yeast mating pheromone, alpha factor. The data lead to the following conclusions. First, histone mRNA can be detected in significant quantities only in S-phase cells. Second, the point of maximal accumulation of histone mRNA is not coincident with the point of maximal DNA synthesis; rather, histone mRNA begins accumulating very early in S, reaching a maximum when less than one half of the DNA has replicated. From this point in the cell cycle the histone mRNA levels decrease, reaching basal levels at the end of S. Third, in spite of the fact that the rate of histone mRNA accumulation is not coincident with the rate of DNA synthesis, the two processes are coupled; inhibition of DNA synthesis results in an extremely rapid disappearance of histone mRNA that is much shorter than the normal histone mRNA half-life. Fourth, there is no visible accumulation of mRNA precursors at any cell-cycle stage. We can conclude that, in yeast, histone mRNA levels are tightly and coordinately regulated throughout cell division and that this regulation most likely occurs at both transcriptional and posttranscriptional levels. We also show that the two genetically unlinked H2B genes present in yeast are both expressed at comparable levels and are regulated. The regulation is probably sequence-specific, since genes in close proximity to the histones are not subject to cell-cycle control.
Article
We investigated the regulation of ribosome synthesis in Saccharomyces cerevisiae growing at different rates and in response to a growth stimulus. The ribosome content and the rates of synthesis of ribosomal ribonucleic acid and of ribosomal proteins were compared in cultures growing in minimal medium with either glucose or ethanol as a carbon source. The results demonstrated that ribosome content is proportional to growth rate. Moreover, these steady-state concentrations are regulated at the level of synthesis of ribosomal precursor ribonucleic acid and of ribosomal proteins. When cultures growing on ethanol were enriched with glucose, the rate of ribosomal ribonucleic acid synthesis, measured by pulsing cells with [methyl-3H]methionine, increased by 40% within 5 min, doubled within 15 min, and reached a steady state characteristic of the new growth medium by 30 min. Labeling with [3H]leucine revealed a coordinate increase in the rate of synthesis of 30 or more ribosomal proteins as compared with that of total cellular proteins. Their synthesis was stimulated approximately 2.5-fold within 15 min and nearly 4-fold within 60 min. The data suggest that S. cerevisiae responds to a growth stimulus by preferential stimulation of the synthesis of ribosomal ribonucleic acid and ribosomal proteins.
Article
The rate of ribosomal protein gene (rp-gene) transcription in yeast is accurately adjusted to the cellular requirement for ribosomes under various growth conditions. However, the molecular mechanisms underlying this co-ordinated transcriptional control have not yet been elucidated. Transcriptional activation of rp-genes is mediated through two different multifunctional trans-acting factors, ABF1 and RAP1. In this report, we demonstrate that changes in cellular rp-mRNA levels during varying growth conditions are not paralleled by changes in the in vitro binding capacity of ABF1 or RAP1 for their cognate sequences. In addition, the nutritional upshift response of rp-genes observed after addition of glucose to a culture growing on a non-fermentative carbon source turns out not to be the result of increased expression of the ABF1 and RAP1 genes or of elevated DNA-binding activity of these factors. Therefore, growth rate-dependent transcription regulation of rp-genes is most probably not mediated by changes in the efficiency of binding of ABF1 and RAP1 to the upstream activation sites of these genes, but rather through other alterations in the efficiency of transcription activation. Furthermore, we tested the possibility that cAMP may play a role in elevating rp-gene expression during a nutritional shift-up. We found that the nutritional upshift response occurs normally in several mutants defective in cAMP metabolism.
Article
The Janus family of tyrosine kinases (JAK) plays an essential role in development and in coupling cytokine receptors to downstream intracellular signaling events. A t(9;12)(p24;p13) chromosomal translocation in a T cell childhood acute lymphoblastic leukemia patient was characterized and shown to fuse the 3′ portion of JAK2 to the 5′ region of TEL, a gene encoding a member of the ETS transcription factor family. The TEL-JAK2 fusion protein includes the catalytic domain of JAK2 and the TEL-specific oligomerization domain. TEL-induced oligomerization of TEL-JAK2 resulted in the constitutive activation of its tyrosine kinase activity and conferred cytokine-independent proliferation to the interleukin-3–dependent Ba/F3 hematopoietic cell line.
Article
The human genome encodes approximately 100,000 different genes, and at least partial sequence information for nearly all will be available soon. Sequence information alone, however, is insufficient for a full understanding of gene function, expression, regulation, and splice-site variation. Because cellular processes are governed by the repertoire of expressed genes, and the levels and timing of expression, it is important to have experimental tools for the direct monitoring of large numbers of mRNAs in parallel. We have developed an approach that is based on hybridization to small, high-density arrays containing tens of thousands of synthetic oligonucleotides. The arrays are designed based on sequence information alone and are synthesized in situ using a combination of photolithography and oligonucleotide chemistry. RNAs present at a frequency of 1:300,000 are unambiguously detected, and detection is quantitative over more than three orders of magnitude. This approach provides a way to use directly the growing body of sequence information for highly parallel experimental investigations. Because of the combinatorial nature of the chemistry and the ability to synthesize small arrays containing hundreds of thousands of specifically chosen oligonucleotides, the method is readily scalable to the simultaneous monitoring of tens of thousands of genes.
Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. (1995) Science 270, 467–470.
Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., et al. (1996) Nat. Biotechnol. 14, 1675–1680.
Iyer, V. R., Eisen, M. B., Ross, D. R., Schuler, G., Moore, T., Lee, J. C. F., Trent, J. M., Hudson, J., Boguski, M., Lashkari, D., et al. (1998) Science, in press.
Sokal, R. R. & Michener, C. D. (1958) Univ. Kans. Sci. Bull. 38, 1409–1438.
Velculescu, V. E., Zhang, L., Vogelstein, B. & Kinzler, K. W. (1995) Science 270, 484–487.
Hereford, L. M., Osley, M. A., Ludwig, T. R. 2nd. & McLaughlin, C. S. (1981) Cell 24, 367–375.
  • Kwast