Carl F Schaefer

Bar Ilan University, Ramat Gan, Tel Aviv, Israel

Publications (26) · 129.39 Total impact

  • Cancer genomics & proteomics 01/2014; 11(1):1-12.
  • ABSTRACT: The development and progression of cancer is associated with disruption of biological networks. Historically, studies have identified sets of signature genes involved in events ultimately leading to the development of cancer. Identification of such sets does not indicate which biologic processes are oncogenic drivers and makes it difficult to identify key networks to target for interventions. Using a comprehensive, integrated computational approach, the authors identify the sonic hedgehog (SHH) pathway as the gene network that most significantly distinguishes tumour and tumour-adjacent samples in human hepatocellular carcinoma (HCC). The analysis reveals that the SHH pathway is commonly activated in the tumour samples and its activity most significantly differentiates tumour from the non-tumour samples. The authors experimentally validate these in silico findings in the same biologic material using Western blot analysis. This analysis reveals that the expression levels of SHH, phosphorylated cyclin B1, and CDK7 are much higher in most tumour tissues as compared to normal tissue. It is also shown that siRNA-mediated silencing of SHH gene expression resulted in a significant reduction of cell proliferation in a liver cancer cell line, SNU449, indicating that SHH plays a major role in promoting cell proliferation in liver cancer. The SHH pathway is a key network underpinning HCC aetiology which may guide the development of interventions for this most common form of human liver cancer.
    IET Systems Biology 12/2013; 7(6):243-51. · 1.54 Impact Factor
  • Source
    ABSTRACT: High resolution, system-wide characterizations have demonstrated the capacity to identify genomic regions that undergo genomic aberrations. Such research efforts often aim at associating these regions with disease etiology and outcome. Identifying the corresponding biologic processes that are responsible for disease and its outcome remains challenging. Using novel analytic methods that utilize the structure of biologic networks, we are able to identify the specific networks that are highly significantly, nonrandomly altered by regions of copy number amplification observed in a systems-wide analysis. We demonstrate this method in breast cancer, where the state of a subset of the pathways identified through these regions is shown to be highly associated with disease survival and recurrence.
    PLoS ONE 01/2011; 6(1):e14437. · 3.53 Impact Factor
  • Source
    ABSTRACT: The PathOlogist is a new tool designed to transform large sets of gene expression data into quantitative descriptors of pathway-level behavior. The tool aims to provide a robust alternative to the search for single-gene-to-phenotype associations by accounting for the complexity of molecular interactions. Molecular abundance data is used to calculate two metrics--'activity' and 'consistency'--for each pathway in a set of more than 500 canonical molecular pathways (source: Pathway Interaction Database, http://pid.nci.nih.gov). The tool then allows a detailed exploration of these metrics through integrated visualization of pathway components and structure, hierarchical clustering of pathways and samples, and statistical analyses designed to detect associations between pathway behavior and clinical features. The PathOlogist provides a straightforward means to identify the functional processes, rather than individual molecules, that are altered in disease. The statistical power and biologic significance of this approach are made easily accessible to laboratory researchers and informatics analysts alike. Here we show as an example, how the PathOlogist can be used to establish pathway signatures that robustly differentiate breast cancer cell lines based on response to treatment.
    BMC Bioinformatics 01/2011; 12:133. · 3.02 Impact Factor
  • Source
    ABSTRACT: Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
    Genome Research 09/2009; 19(12):2324-33. · 14.40 Impact Factor
  • Source
    ABSTRACT: Microorganisms have been associated with many types of human diseases; however, a significant number of clinically important microbial pathogens remain to be discovered. We have developed a genome-wide approach, called Digital Karyotyping Microbe Identification (DK-MICROBE), to identify genomic DNA of bacteria and viruses in human disease tissues. This method involves the generation of an experimental DNA tag library through Digital Karyotyping (DK) followed by analysis of the tag sequences for the presence of microbial DNA content using a compiled microbial DNA virtual tag library. To validate this technology and to identify pathogens that may be associated with human cancer pathogenesis, we used DK-MICROBE to determine the presence of microbial DNA in 58 human tumor samples, including brain, ovarian, and colorectal cancers. We detected DNA from Human herpesvirus 6 (HHV-6) in a DK library of a colorectal cancer liver metastasis and in normal tissue from the same patient. DK-MICROBE can identify previously unknown infectious agents in human tumors, and is now available for further applications for the identification of pathogen DNA in human cancer and other diseases.
    BMC Medical Genomics 02/2009; 2:22. · 3.47 Impact Factor
  • Source
    ABSTRACT: The Pathway Interaction Database (PID, http://pid.nci.nih.gov) is a freely available collection of curated and peer-reviewed pathways composed of human molecular signaling and regulatory events and key cellular processes. Created in a collaboration between the US National Cancer Institute and Nature Publishing Group, the database serves as a research tool for the cancer research community and others interested in cellular pathways, such as neuroscientists, developmental biologists and immunologists. PID offers a range of search features to facilitate pathway exploration. Users can browse the predefined set of pathways or create interaction network maps centered on a single molecule or cellular process of interest. In addition, the batch query tool allows users to upload long list(s) of molecules, such as those derived from microarray experiments, and either overlay these molecules onto predefined pathways or visualize the complete molecular connectivity map. Users can also download molecule lists, citation lists and complete database content in extensible markup language (XML) and Biological Pathways Exchange (BioPAX) Level 2 format. The database is updated with new pathway content every month and supplemented by specially commissioned articles on the practical uses of other relevant online tools.
    Nucleic Acids Research 11/2008; 37(Database issue):D674-9. · 8.81 Impact Factor
  • Source
    Sol Efroni, Carl F Schaefer, Kenneth H Buetow
    ABSTRACT: Cancer is recognized to be a family of gene-based diseases whose causes are to be found in disruptions of basic biologic processes. An increasingly deep catalogue of canonical networks details the specific molecular interaction of genes and their products. However, mapping of disease phenotypes to alterations of these networks of interactions is accomplished indirectly and non-systematically. Here we objectively identify pathways associated with malignancy, staging, and outcome in cancer through application of an analytic approach that systematically evaluates differences in the activity and consistency of interactions within canonical biologic processes. Using large collections of publicly accessible genome-wide gene expression, we identify small, common sets of pathways - Trka Receptor, Apoptosis response to DNA Damage, Ceramide, Telomerase, CD40L and Calcineurin - whose differences robustly distinguish diverse tumor types from corresponding normal samples, predict tumor grade, and distinguish phenotypes such as estrogen receptor status and p53 mutation state. Pathways identified through this analysis perform as well or better than phenotypes used in the original studies in predicting cancer outcome. This approach provides a means to use genome-wide characterizations to map key biological processes to important clinical features in disease.
    PLoS ONE 02/2007; 2(5):e425. · 3.53 Impact Factor
  • Source
    ABSTRACT: The Pathway Interaction Database (PID, http://pid.nci.nih.gov) is a freely available collection of curated and peer-reviewed signaling pathways composed of human biomolecular interactions and cellular processes. Created in a collaboration between the U.S. National Cancer Institute and Nature Publishing Group, the database is a research tool for cell biologists, biochemists, computational biologists and bioinformaticians. The PID offers a range of tools to facilitate pathway exploration. Users can browse the pre-defined set of pathways and also create interaction network maps centered on a single molecule of interest or an extensive list of molecules. In addition, users can download complete data sets in extensible markup language (XML) and Biological Pathway Exchange (BioPAX) Level 2 formats. The database is updated every month and supplemented by a concise editorial section that provides synopses of recent noteworthy papers in cell signaling and specially commissioned articles on the practical uses of other relevant online tools. Users can sign up for free email alerts or RSS feeds to receive database updates.
    Nature Precedings. 01/2007;
  • Source
    ABSTRACT: Cancers have been described as wounds that do not heal, suggesting that the two share common features. By comparing microarray data from a model of renal regeneration and repair (RRR) with reported gene expression in renal cell carcinoma (RCC), we asked whether those two processes do, in fact, share molecular features and regulatory mechanisms. The majority (77%) of the genes expressed in RRR and RCC were concordantly regulated, whereas only 23% were discordant (i.e., changed in opposite directions). The orchestrated processes of regeneration, involving cell proliferation and immune response, were reflected in the concordant genes. The discordant gene signature revealed processes (e.g., morphogenesis and glycolysis) and pathways (e.g., hypoxia-inducible factor and insulin-like growth factor-I) that reflect the intrinsic pathologic nature of RCC. This is the first study that compares gene expression patterns in RCC and RRR. It does so, in particular, with relation to the hypothesis that RCC resembles the wound healing processes seen in RRR. However, careful attention to the genes that are regulated in the discordant direction provides new insights into the critical differences between renal carcinogenesis and wound healing. The observations reported here provide a conceptual framework for further efforts to understand the biology and to develop more effective diagnostic biomarkers and therapeutic strategies for renal tumors and renal ischemia.
    Cancer Research 08/2006; 66(14):7216-24. · 8.65 Impact Factor
  • Source
    ABSTRACT: Membrane proteins are responsible for many critical cellular functions and identifying cell surface proteins on different keratinocyte populations by proteomic approaches would improve our understanding of their biological function. The ability to characterize membrane proteins, however, has lagged behind that of soluble proteins both in terms of throughput and protein coverage. In this study, a membrane proteomic investigation of keratinocytes using a two-dimensional liquid chromatography (LC) tandem-mass spectrometry (MS/MS) approach that relies on a buffered methanol-based solubilization, and tryptic digestion of purified plasma membrane is described. A highly enriched plasma membrane fraction was prepared from newborn foreskins using sucrose gradient centrifugation, followed by a single-tube solubilization and tryptic digestion of membrane proteins. This digestate was fractionated by strong cation-exchange chromatography and analyzed using microcapillary reversed-phase LC-MS/MS. In a set of 1306 identified proteins, 866 had a gene ontology (GO) annotation for cellular component, and 496 of these annotated proteins (57.3%) were assigned as known integral membrane proteins or membrane-associated proteins. Included in the identification of a large number of aqueous insoluble integral membrane proteins were many known intercellular adhesion proteins and gap junction proteins. Furthermore, 121 proteins from cholesterol-rich plasma membrane domains (caveolar and lipid rafts) were identified.
    Journal of Investigative Dermatology 11/2004; 123(4):691-9. · 6.19 Impact Factor
  • Source
    ABSTRACT: The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.
    Genome Research 11/2004; 14(10B):2121-7. · 14.40 Impact Factor
  • Carl F Schaefer
    ABSTRACT: Network representations of biological pathways offer a functional view of molecular biology that is different from and complementary to sequence, expression, and structure databases. There is currently available a wide range of digital collections of pathway data, differing in organisms included, functional area covered (e.g., metabolism vs. signaling), detail of modeling, and support for dynamic pathway construction. While it is currently impossible for these databases to communicate with each other, there are several efforts at standardizing a data exchange language for pathway data. Databases that represent pathway data at the level of individual interactions make it possible to combine data from different predefined pathways and to query by network connectivity. Computable representations of pathways provide a basis for various analyses, including detection of broad network patterns, comparison with mRNA or protein abundance, and simulation.
    Annals of the New York Academy of Sciences 06/2004; 1020:77-91. · 4.38 Impact Factor
  • ABSTRACT: A combined, detergent- and organic solvent-based proteomic method for the analysis of detergent-resistant membrane rafts (DRMR) is described. These specialized domains of the plasma membrane contain a distinctive and dynamic protein and/or lipid complement, which can be isolated from most mammalian cells. Lipid rafts are predominantly involved in signal transduction and adapted to mediate and produce different cellular responses. To facilitate a better understanding of their biology and role, DRMR were isolated from Vero cells as a Triton X-100 insoluble fraction. After detergent removal, sonication in 60% buffered methanol was used to extract, solubilize and tryptically digest the resulting protein complement. The peptide digestate was analyzed by microcapillary reversed-phase liquid chromatography-tandem mass spectrometry. Gas-phase fractionation in the mass-to-charge range was employed to broaden the selection of precursor ions and increase the number of identifications in an effort to detect less abundant proteins. A total of 380 proteins were identified including all known lipid raft markers. A total of 91 (24%) proteins were classified as integral alpha-helical membrane proteins, of which 51 (56%) were predicted to have multiple transmembrane domains.
    Electrophoresis 06/2004; 25(9):1307-18. · 3.26 Impact Factor
  • ABSTRACT: Cleavable isotope-coded affinity tag (cICAT) reagents were utilized to identify and quantitate protein expression differences in control and inorganic phosphate-treated murine MC3T3-E1 osteoblast cells. Proteins extracted from control and treated cells were labeled with the light and heavy isotopic versions of cICAT reagents, respectively. The cICAT-labeled samples were combined, proteolytically digested, and the cICAT-derivatized peptides isolated using immobilized avidin chromatography. The cICAT-labeled peptides were resolved into 96 fractions by strong cation-exchange (SCX) liquid chromatography (LC). Analysis of the SCX-LC cICAT peptide fractions by microcapillary reversed-phase LC-tandem mass spectrometry resulted in the identification and quantitation of 7227 unique peptides corresponding to 2501 proteins, or roughly 9% of the proteins currently predicted to be encoded by the mouse genome. A false positive analysis indicated a 98% confidence in the peptide identifications. To corroborate changes in abundance measured by cICAT with those detectable in traditionally prepared cell lysate, we chose to analyze cyclin D1. Cyclin D1 has been previously identified as a phosphate-responsive gene and was likewise identified as a phosphate-responsive protein in the current analysis. The 1.76-fold increase in abundance in cyclin D1 determined from cICAT corresponds well with the 2.41-fold increase as determined by Western blotting. These results demonstrate that quantitative proteomics is capable of providing a quantitative view of thousands of proteins in mammalian cells within a defined set of experiments.
    Electrophoresis 06/2004; 25(9):1342-52. · 3.26 Impact Factor
  • ABSTRACT: In this study, we utilized a multidimensional peptide separation strategy combined with tandem mass spectrometry (MS/MS) for the identification of proteins in human serum. After enzymatically digesting serum with trypsin, the peptides were fractionated using liquid-phase isoelectric focusing (IEF) in a novel ampholyte-free format. Twenty IEF fractions were collected and analyzed by reversed-phase microcapillary liquid chromatography (microLC)-MS/MS. Bioinformatic analysis of the raw MS/MS spectra resulted in the identification of 844 unique peptides, corresponding to 437 proteins. This study demonstrates the efficacy of ampholyte-free peptide autofocusing, which alleviates peptide losses in ampholyte removal strategies. The results show that the separation strategy is effective for high-throughput characterization of proteins from complex proteomic mixtures.
    Electrophoresis 02/2004; 25(1):128-33. · 3.26 Impact Factor
  • Source
    ABSTRACT: Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets requires annotation with controlled vocabularies and, further, metadata standards for data representation. Programmatic access to the processed data should be supported to ensure the maximum possible value is extracted. Confronted with these challenges at the National Cancer Institute Center for Bioinformatics, we decided to develop a robust infrastructure for data management and integration that supports advanced biomedical applications. We have developed an interconnected set of software and services called caCORE. Enterprise Vocabulary Services (EVS) provide controlled vocabulary, dictionary and thesaurus services. The Cancer Data Standards Repository (caDSR) provides a metadata registry for common data elements. Cancer Bioinformatics Infrastructure Objects (caBIO) implements an object-oriented model of the biomedical domain and provides Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. caCORE has been used to develop scientific applications that bring together data from distinct genomic and clinical science sources. caCORE downloads and web interfaces can be accessed from links on the caCORE web site (http://ncicb.nci.nih.gov/core). caBIO software is distributed under an open source license that permits unrestricted academic and commercial use. Vocabulary and metadata content in the EVS and caDSR, respectively, is similarly unrestricted, and is available through web applications and FTP downloads. 
    http://ncicb.nci.nih.gov/core/publications contains links to the caBIO 1.0 class diagram and the caCORE 1.0 Technical Guide, which provide detailed information on the present caCORE architecture, data sources and APIs. Updated information appears on a regular basis on the caCORE web site (http://ncicb.nci.nih.gov/core).
    Bioinformatics 01/2004; 19(18):2404-12. · 5.32 Impact Factor
  • Source
    ABSTRACT: Changes in serum proteins that signal histopathological states, such as cancer, are useful diagnostic and prognostic biomarkers. Unfortunately, the large dynamic concentration range of proteins in serum makes it a challenging proteome to effectively characterize. Typically, methods to deplete highly abundant proteins to decrease this dynamic protein concentration range are employed, yet such depletion results in removal of important low abundant proteins. A multi-dimensional peptide separation strategy utilizing conventional separation techniques combined with tandem mass spectrometry (MS/MS) was employed for a proteome analysis of human serum. Serum proteins were digested with trypsin and resolved into 20 fractions by ampholyte-free liquid phase isoelectric focusing. These 20 peptide fractions were further fractionated by strong cation-exchange chromatography, each of which was analyzed by microcapillary reversed-phase liquid chromatography coupled online with MS/MS analysis. This investigation resulted in the identification of 1444 unique proteins in serum. Proteins from all functional classes, cellular localization, and abundance levels were identified. This study illustrates that a majority of lower abundance proteins identified in serum are present as secreted or shed species by cells as a result of signalling, necrosis, apoptosis, and hemolysis. These findings show that the protein content of serum is quite reflective of the overall profile of the human organism and a conventional multidimensional fractionation strategy combined with MS/MS is entirely capable of characterizing a significant fraction of the serum proteome. We have constructed a publicly available human serum proteomic database (http://bpp.nci.nih.gov) to provide a reference resource to facilitate future investigations of the vast archive of pathophysiological content in serum.
    Clinical Proteomics 01/2004; 1(2):101-225.
  • Source
    ABSTRACT: Motivation: Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets requires annotation with controlled vocabularies and, further, metadata standards for data representation. Programmatic access to the processed data should be supported to ensure the maximum possible value is extracted. Confronted with these challenges at the National Cancer Institute Center for Bioinformatics, we decided to develop a robust infrastructure for data management and integration that supports advanced biomedical applications. Results: We have developed an interconnected set of software and services called caCORE. Enterprise Vocabulary Services (EVS) provide controlled vocabulary, dictionary and thesaurus services. The Cancer Data Standards Repository (caDSR) provides a metadata registry for common data elements. Cancer Bioinformatics Infrastructure Objects (caBIO) implements an object-oriented model of the biomedical domain and provides Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. caCORE has been used to develop scientific applications that bring together data from distinct genomic and clinical science sources. Availability: caCORE downloads and web interfaces can be accessed from links on the caCORE web site (http://ncicb.nci.nih.gov/core). caBIO software is distributed under an open source license that permits unrestricted academic and commercial use. Vocabulary and metadata content in the EVS and caDSR, respectively, is similarly unrestricted, and is available through web applications and FTP downloads.
    Bioinformatics. 01/2003; 19:2404-2412.
  • Source
    ABSTRACT: The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).
    Proceedings of the National Academy of Sciences 12/2002; 99(26):16899-903. · 9.81 Impact Factor

Publication Stats

2k Citations
129.39 Total Impact Points

Institutions

  • 2011
    • Bar Ilan University
      Ramat Gan, Tel Aviv, Israel
  • 2004–2011
    • National Institutes of Health
      • Laboratory of Genetics (LG)
      Bethesda, MD, United States
  • 2008
    • National Cancer Institute (USA)
      Maryland, United States