
The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website


S Bamford, E Dawson, S Forbes, J Clements, R Pettett, A Dogan, A Flanagan, J Teague, PA Futreal*, MR Stratton and R Wooster
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK;
Department of Histopathology, Royal Free and University Medical School, University Street, London WC1E 6JJ, UK;
The Institute of Orthopaedics, UCL, Stanmore, Middlesex HA7 4LP, UK
The discovery of mutations in cancer genes has advanced our understanding of cancer. These results are dispersed across the
scientific literature and with the availability of the human genome sequence will continue to accrue. The COSMIC (Catalogue of
Somatic Mutations in Cancer) database and website have been developed to store somatic mutation data in a single location and
display the data and other information related to human cancer. To populate this resource, data has currently been extracted from
reports in the scientific literature for somatic mutations in four genes, BRAF, HRAS, KRAS2 and NRAS. At present, the database holds
information on 66 634 samples and reports a total of 10 647 mutations. Through the web pages, these data can be queried, displayed
as figures or tables and exported in a number of formats. COSMIC is an ongoing project that will continue to curate somatic mutation
data and release it through the website.
British Journal of Cancer (2004) 91, 355 – 358. doi:10.1038/sj.bjc.6601894
Published online 8 June 2004
© 2004 Cancer Research UK
Keywords: somatic; mutation; database; website
Approximately one in three individuals in Europe and North
America develops one of the approximately 200 different classes of
cancer and it is the cause of death of one in five (Higginson, 1992).
All cancers arise as a result of the acquisition of a series of fixed
DNA sequence abnormalities, each of which ultimately confers
growth advantage upon the clone of cells in which it has occurred
(Vogelstein and Kinzler, 1998). These abnormalities include base
substitutions, deletions, amplifications and rearrangements. The
extent to which each of these mechanisms contributes to cancer
varies markedly between different genes, and probably also
between different cancer types. Identification of the genes that
are mutated in cancer is a central aim of cancer research. Over the
past 25 years, approximately 300 genes have been shown to be
somatically mutated in cancer (Futreal et al, 2004). This work
forms the foundation for understanding the biological abnormalities
within neoplastic cells, provides information on the function
of gene products and sheds light on more complex questions such
as the relationships between genes and biochemical pathways.
Current strategies for the development of new therapeutic and
preventive agents in cancer are increasingly dependent upon
modulation of these critical molecular targets.
The scientific literature is a rich source of mutation data that, in
general, is published in a piecemeal fashion. More comprehensive
data sources do exist, such as Online Mendelian Inheritance in
Man (OMIM, Wheeler et al, 2004), HGVbase (Fredman et al, 2002)
and the Human Gene Mutation Database (HGMD, Stenson et al,
2003). These databases give overviews of the genetics and biology
of many genes and associated diseases (OMIM), genome variants
and associated genotype–phenotype relationships (HGVbase) or
germline mutation data (HGMD). For somatic mutations in cancer,
there are many locus-specific web resources, such as those for p53
(Olivier et al, 2002; Béroud and Soussi, 2003), that cover a single
gene in depth. The value of these various databases should not be
underestimated; however, none of them offer a comprehensive
view of all previously reported somatic mutations in cancer.
Looking to the future, the volume of somatic mutation data will
continue to expand and the scientific community will be better
served if this data is provided in a coherent fashion. A public,
comprehensive, intuitive, accessible and integrated database is
required to maximise the benefit from this rich data set. The
Catalogue of Somatic Mutations in Cancer (COSMIC), (http:// is a database that holds somatic
mutation data and associated information, and can be interrogated
through a series of web pages to provide a graphical or tabular
view of the data along with various export options. To date, the
database has been populated with data from four genes: HRAS, KRAS2, NRAS and BRAF.
Gene selection
The genes that have been selected for curation are taken from the
list of cancer genes assembled in the Cancer Gene Census (Futreal
et al, 2004). In the first instance, data was obtained for four genes
that are known to be somatically mutated in cancer: HRAS (Reddy
et al, 1982), KRAS2 (McCoy et al, 1983), NRAS (Hall et al, 1983)
and BRAF (Davies et al, 2002).
Received 4 March 2004; accepted 1 April 2004; published online 8 June
*Correspondence: Dr PA Futreal; E-mail:
Genetics and Genomics
Data extraction from the literature
PubMed (Wheeler et al, 2004) is broadly searched for references
containing relevant somatic mutation data in cancer (example
search: (ras OR genes, ras) AND human AND mutation). In
the first instance, the abstract is read to identify, and select
for inclusion in the database, papers that are likely to include
somatic mutation information relating to cancer or precancerous
conditions. Primary research papers are read and information
about the samples, mutations and experimental methods
(see Table 1) is extracted and entered into the database. Reviews
are also selected if thought to be specific to a gene of interest.
To avoid duplication of data, reviews are used only to
identify the relevant primary literature and not as a source
of the mutation data. Any references containing incomplete data
(e.g. mutations reported but not fully described) or data of
insufficient quality (e.g. errors identified in the data) are not
fully curated but are added to a list of additional references
containing somatic mutation information. Simple mutations are
fed through Mutation Checker (Stajich et al, 2002) before being
imported to COSMIC, while more complex alterations are
manually annotated.
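The kind of consistency check a simple mutation might pass through before import can be sketched as follows. This is not the Mutation Checker itself, whose interface is not described here. The function name `check_substitution`, the label format and the toy sequence are assumptions made purely for illustration.

```python
import re

# Hypothetical label format for a simple amino-acid substitution,
# e.g. "G2V" = reference residue G at position 2 replaced by V.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def check_substitution(label, protein_sequence):
    """Return True if the label is consistent with the reference protein.

    A label passes when it parses, the position lies inside the sequence,
    the stated reference residue matches the sequence at that position,
    and the new residue differs from the reference residue.
    """
    match = _SUBSTITUTION.match(label)
    if match is None:
        return False
    reference, variant = match.group(1), match.group(3)
    position = int(match.group(2))
    if position < 1 or position > len(protein_sequence):
        return False
    if protein_sequence[position - 1] != reference:
        return False
    return variant != reference


# Toy sequence for illustration only; it is not a real protein.
toy_sequence = "MGAVK"
print(check_substitution("G2V", toy_sequence))  # consistent with the toy sequence
print(check_substitution("A2V", toy_sequence))  # wrong reference residue
```

Labels that fail such a check would, under this sketch, be routed to manual annotation rather than imported automatically.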
The COSMIC database is implemented in an Oracle relational
database and has five sections each containing multiple tables.
Gene information
A static version of each gene is maintained in COSMIC. The
genomic structure of each gene and chromosome location is
derived from Ensembl (Birney et al, 2004) and cDNA sequence and
protein sequence from the RefSeq project (Wheeler et al, 2004).
Other information is held to provide links to web resources such as
Ensembl (Birney et al, 2004), Pfam (Bateman et al, 2004), InterPro
(Mulder et al, 2003) and OMIM (Wheeler et al, 2004).
Table 1 Data entered in COSMIC
Reference: Title; Authors; Journal; Year; Volume; Page start and stop; PubMed ID; Experimental information; Gene
Mutation: Mutation ID; Mutation type; DNA location; DNA change; DNA evidence; Is somatic; RNA label; RNA change; RNA region; RNA location; RNA evidence; Amino-acid label; Amino-acid location; Amino-acid change; Amino-acid evidence; Gene; Sequence; Remark
Experimental information: Primary detection method; Secondary detection method; Confirmation method; Exons/codons screened; Whole gene screened; Remark
Sample: Gene; Experimental information; Sample ID; Mutation status; Normal tissue tested; Site primary; Site subtype 1; Site subtype 2; Histology; Histology subtype 1; Histology subtype 2; Stage; Grade; Source tissue; Loss of heterozygosity; Gender; Age; Other mutations; Ethnicity; Geographical location; Parent tested; Family ID; Remark; Reference; Environmental variables
Gene: Name; Symbol; Other names; Chromosome; Chromosome band; cDNA sequence accession; cDNA sequence version; Ensembl gene start and stop; Swissprot accession; OMIM accession
The section heading for each group of data in COSMIC precedes the colon.
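A relational layout with one table per section of Table 1 can be sketched as follows. This is a minimal illustration that uses Python's built-in `sqlite3` module in place of Oracle. The table names, column names and inserted rows are assumptions derived loosely from Table 1, not the actual COSMIC schema, in which each section contains multiple tables.

```python
import sqlite3

# Assumed, heavily reduced schema: one table per section of Table 1.
SCHEMA = """
CREATE TABLE gene (
    gene_id    INTEGER PRIMARY KEY,
    symbol     TEXT NOT NULL UNIQUE,
    chromosome TEXT
);
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    title        TEXT,
    year         INTEGER,
    pubmed_id    INTEGER
);
CREATE TABLE experiment (
    experiment_id            INTEGER PRIMARY KEY,
    primary_detection_method TEXT,
    whole_gene_screened      INTEGER
);
CREATE TABLE mutation (
    mutation_id       INTEGER PRIMARY KEY,
    gene_id           INTEGER NOT NULL REFERENCES gene(gene_id),
    mutation_type     TEXT,
    amino_acid_change TEXT,
    is_somatic        INTEGER
);
CREATE TABLE sample (
    sample_id     INTEGER PRIMARY KEY,
    gene_id       INTEGER NOT NULL REFERENCES gene(gene_id),
    reference_id  INTEGER REFERENCES reference(reference_id),
    experiment_id INTEGER REFERENCES experiment(experiment_id),
    mutation_id   INTEGER REFERENCES mutation(mutation_id),  -- NULL when no mutation
    site_primary  TEXT,
    histology     TEXT
);
"""

connection = sqlite3.connect(":memory:")
connection.executescript(SCHEMA)

# Illustrative rows only; the values are made up for the sketch.
connection.execute("INSERT INTO gene (gene_id, symbol) VALUES (1, 'BRAF')")
connection.execute(
    "INSERT INTO mutation (mutation_id, gene_id, mutation_type, amino_acid_change, is_somatic) "
    "VALUES (1, 1, 'substitution', 'X1Y', 1)"
)
connection.executemany(
    "INSERT INTO sample (gene_id, mutation_id, site_primary) VALUES (?, ?, ?)",
    [(1, 1, "skin"), (1, None, "skin"), (1, None, "lung")],
)

# Samples screened and samples carrying a mutation, per tissue.
rows = connection.execute(
    "SELECT site_primary, COUNT(*), COUNT(mutation_id) "
    "FROM sample GROUP BY site_primary ORDER BY site_primary"
).fetchall()
print(rows)
```

Keeping the sample row as the point where gene, reference, experiment and mutation meet mirrors the description of the sample data as the core of the database.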
Paper information
The details of the papers that have been curated are maintained in
the paper section and include title, journal, author lists and links to
PubMed. There are currently 1483 papers in COSMIC; 865 of these
have been curated for mutations, while 618 either have no relevant
data or incomplete data that could not accurately be extracted. By
gene, 30, 249, 718 and 303 papers report BRAF, HRAS, KRAS2 and
NRAS mutations, respectively. Of the 865 papers reporting
mutations, 615 report data on only one gene, while 72, 174 and
four contain data on two, three or all four genes, respectively.
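The paper counts above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those quoted in the text.

```python
# Paper counts quoted in the text.
total_papers = 1483
curated_papers = 865
uncurated_papers = 618

# Curated papers grouped by how many of the four genes they report on.
papers_by_gene_count = {1: 615, 2: 72, 3: 174, 4: 4}

# Curated and uncurated papers together account for every paper.
assert curated_papers + uncurated_papers == total_papers

# The breakdown by number of genes accounts for every curated paper.
assert sum(papers_by_gene_count.values()) == curated_papers

# A paper covering several genes is counted once per gene, so the number
# of gene-paper pairs exceeds the number of curated papers.
gene_paper_pairs = sum(
    n_genes * n_papers for n_genes, n_papers in papers_by_gene_count.items()
)
print(gene_paper_pairs)
```

This gives 1297 gene-paper pairs, whereas the per-gene paper counts quoted above sum to 1300. The text does not explain the difference of three.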
Mutation information
COSMIC can accommodate information on base substitutions,
insertions and deletions, translocations and changes in copy
number. For the four genes presently in COSMIC, there are 147
unique mutations (36 for BRAF, 27 for HRAS, 52 for KRAS2 and 32
for NRAS). In the tumours that have been analysed, there are a
total of 10 647 mutations, 736 in BRAF, 477 in HRAS, 8302 in
KRAS2 and 1132 in NRAS.
Tumour classification system
The tissue site and histology data is taken from the curated papers
and entered into COSMIC (this forms the ‘paper definition’).
Tumour classification is a continually evolving field and there is no
standard nomenclature adhered to for the purposes of publication
in the various journals. Identical tissues and histologies can have
different labels depending on the origin and age of the study. To
overcome difficulties caused by these alternate nomenclatures, a
standardised system of definitions has been developed (the
‘COSMIC definitions’) through consultation with experts in the
field. This groups data from the same tissue types and histologies
and can be used to translate the ‘paper definitions’ to ‘COSMIC
definitions’. Every sample has up to eight definitions: primary
tissue, tissue subtype 1, 2 and 3, primary histology and histology
subtypes 1, 2 and 3. If there is no data for any of these definitions,
COSMIC records an entry of NS, not specified. A total of 513 tissue
definitions have been noted in the papers in COSMIC and have
been translated to 372 COSMIC tissue definitions. Likewise, a total
of 1150 histology definitions were found in the papers in COSMIC
that were translated to 425 COSMIC histology definitions. This
unified classification system is offered through the web pages as
a normalised browsing tool.
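The translation from ‘paper definitions’ to ‘COSMIC definitions’ can be pictured as a lookup over the eight definition slots, with NS recorded where nothing is given. The sketch below is illustrative only: the slot names, the function `to_cosmic_definitions` and every entry in the translation table are hypothetical and do not reproduce the actual COSMIC classification.

```python
# The eight definition slots named in the text.
SLOTS = (
    "primary_tissue", "tissue_subtype_1", "tissue_subtype_2", "tissue_subtype_3",
    "primary_histology", "histology_subtype_1", "histology_subtype_2",
    "histology_subtype_3",
)

# Hypothetical translation table: several paper labels map to one COSMIC label.
PAPER_TO_COSMIC = {
    "colon": "large intestine",
    "colorectal": "large intestine",
    "rectum": "large intestine",
}

NOT_SPECIFIED = "NS"


def to_cosmic_definitions(paper_definitions):
    """Translate a sample's paper definitions into COSMIC definitions.

    Missing or empty slots are recorded as NS (not specified). Labels with
    no entry in the translation table are kept unchanged in this sketch.
    """
    cosmic = {}
    for slot in SLOTS:
        label = (paper_definitions.get(slot) or "").strip().lower()
        if not label:
            cosmic[slot] = NOT_SPECIFIED
        else:
            cosmic[slot] = PAPER_TO_COSMIC.get(label, label)
    return cosmic


sample = {"primary_tissue": "Colorectal", "primary_histology": "carcinoma"}
print(to_cosmic_definitions(sample))
```

Because many paper labels collapse onto one COSMIC label, a table of this kind naturally holds fewer distinct values than keys, in line with 513 paper tissue definitions being translated to 372 COSMIC tissue definitions.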
Individual/tumour/sample data
The sample data is taken from the curated papers and linked to the
appropriate gene, paper, classification and when present a
mutation. This forms the core of the COSMIC database. An
individual can have many tumours and each tumour can have
many samples. However, in the COSMIC scheme, each sample is
unique and could be considered as a single experiment. There are
66 634 sample records in COSMIC (5158, 11 876, 35 716 and 13 884
for BRAF, HRAS, KRAS2 and NRAS, respectively). These samples
are derived from 57 444 tumours of which 51 988 were analysed in
one gene, 2353 in two genes, 2930 in three genes and 173 in all four
genes.
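The sample, tumour and mutation totals quoted in the text can be cross-checked, and a crude per-gene rate derived, as sketched below. The helper `mutations_per_100_samples` is an illustrative name. The rate is not a prevalence estimate, since it mixes tissues and the text does not say whether one sample record can carry more than one mutation.

```python
# Per-gene counts quoted in the text.
sample_records = {"BRAF": 5158, "HRAS": 11876, "KRAS2": 35716, "NRAS": 13884}
reported_mutations = {"BRAF": 736, "HRAS": 477, "KRAS2": 8302, "NRAS": 1132}

# Tumours grouped by the number of genes in which they were analysed.
tumours_by_genes_analysed = {1: 51988, 2: 2353, 3: 2930, 4: 173}

# Each breakdown should add up to the total given in the text.
assert sum(sample_records.values()) == 66634
assert sum(reported_mutations.values()) == 10647
assert sum(tumours_by_genes_analysed.values()) == 57444


def mutations_per_100_samples(gene):
    """Reported mutations per 100 sample records for one gene."""
    return 100 * reported_mutations[gene] / sample_records[gene]


for gene in sample_records:
    print(f"{gene}: {mutations_per_100_samples(gene):.1f}")
```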
A series of web pages provides query tools to interrogate COSMIC
and produces graphical (Figure 1) and tabular (Table 2) displays of
the data. Currently the output is provided at the amino-acid level
based on the protein structure of each gene.
Browse by gene
Immediate access to the data is provided through the Browse by
Gene link. This gives an instant overview of the mutation data for
one or more genes and gives links to display data for individual
Browse by tissue
More complex queries can be constructed using the Browse by
Tissue link. The user has the option to select one or more tissues,
Figure 1 The initial output from COSMIC is a graphical view of the mutations distributed along the linear amino-acid sequence of the gene. The scale bar
incorporates a zoom function to generate a more detailed view of the protein to the point where individual amino acids are named (when there are fewer
than 31 amino acids displayed). When a Pfam or InterPro domain is present, a link is provided to these resources (adjacent to the Domain label) while links
to the papers that were curated are positioned beneath the mutations (in red) with an option of either viewing the papers that have data for a particular
location in the protein or all of the papers for the selected gene.
Table 2 Mutation Details from COSMIC
Details for BRAF
Tissue                              Mutated (% of All Samples)  All Samples  Mutation Data
NS                                  0                           3            More Details
adrenal gland                       0                           2            More Details
autonomic ganglia                   0                           27           More Details
bile duct                           16 (23%)                    70           More Details
bladder                             0                           37           More Details
bone                                1 (3%)                      31           More Details
brain                               4 (7%)                      56           More Details
breast                              1 (1%)                      78           More Details
cervix                              0                           49           More Details
endometrium                         0                           5            More Details
eye                                 0                           31           More Details
haematopoietic and lymphoid tissue  4 (1%)                      322          More Details
head neck                           6 (4%)                      152          More Details
kidney                              0                           12           More Details
large intestine                     148 (13%)                   1135         More Details
larynx                              0                           25           More Details
liver                               1 (3%)                      32           More Details
lung                                15 (2%)                     829          More Details
mouth                               0                           13           More Details
ovary                               57 (20%)                    282          More Details
pancreas                            5 (4%)                      114          More Details
pharynx                             3 (6%)                      51           More Details
placenta                            0                           1            More Details
pleura                              0                           3            More Details
prostate                            0                           43           More Details
skin                                282 (61%)                   460          More Details
small intestine                     0                           1            More Details
soft tissue                         5 (2%)                      211          More Details
stomach                             7 (2%)                      407          More Details
testis                              0                           7            More Details
thyroid                             181 (27%)                   669          More Details
The mutations from COSMIC are presented by tissue and where selected by
histology with a figure for the number of samples analysed for each tissue (All
Samples) and the number of mutations reported (Mutated). The ‘More Details’
column gives further navigation options to view data for the selected tissue, view data
for the same tissue in other genes or provide more details on the mutations for the
selected tissue.
then one or more histologies, and finally one or more genes. If only
one tissue or histology is selected, it is possible to select one or
more tissue or histology subtypes before making a gene selection.
All of the tissues present in the COSMIC classification scheme are
available from the first page; however, subsequent pages only show
the relevant options and not the entire list of options, for example
having selected eye, the tissue subtype options are retina and uveal
Data display
After querying the database, the results are displayed as a figure
(Figure 1) and as a series of tables (Table 2) for each gene that was
selected. The figure shows the linear amino-acid sequence derived
from the gene with the mutations positioned along its length.
Further information and links are provided as appropriate to the
protein sequence. The table gives a summary of the mutations
stratified by tissue and histology. The depth of the stratification
relates to the depth of the original query. If only tissue was
selected, the data will be stratified by tissue; however, if tissue,
subtissue, histology and subhistology are selected, the data will be
broken down further. Links from this table reload the figure to
display a subset of the data and provide more details of the specific
mutations. Two other tables provide a summary of the statistics in
COSMIC for the selected gene and a summary of the mutations
shown in the figure.
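The way the summary table deepens with the query can be sketched as a grouping whose key grows with the number of levels selected. The function `stratify`, the record layout and the example records are assumptions for illustration and do not describe how the COSMIC website is implemented.

```python
from collections import defaultdict


def stratify(samples, levels):
    """Summarise sample records at the depth given by `levels`.

    `samples` is an iterable of dicts holding the keys named in `levels`
    plus a boolean "mutated" flag. The result maps each key tuple to
    (mutated, all_samples, percent_mutated).
    """
    counts = defaultdict(lambda: [0, 0])
    for sample in samples:
        key = tuple(sample[level] for level in levels)
        counts[key][1] += 1
        if sample["mutated"]:
            counts[key][0] += 1
    return {
        key: (mutated, total, round(100 * mutated / total))
        for key, (mutated, total) in sorted(counts.items())
    }


# Made-up records for illustration.
records = [
    {"tissue": "skin", "histology": "malignant melanoma", "mutated": True},
    {"tissue": "skin", "histology": "malignant melanoma", "mutated": True},
    {"tissue": "skin", "histology": "naevus", "mutated": False},
    {"tissue": "lung", "histology": "carcinoma", "mutated": False},
]

# A tissue-only query yields one row per tissue ...
print(stratify(records, ["tissue"]))
# ... while adding histology breaks each tissue down further.
print(stratify(records, ["tissue", "histology"]))
```

The percentage is the mutated count divided by all samples in the stratum, which is consistent with the figures in Table 2 (for example, 282 of 460 skin samples is 61%).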
Exports and downloads
Having displayed the results from a query, the data can be
formatted in simple text, Excel or HTML that can be downloaded
from the COSMIC site. The cDNA and protein sequences are
available through the Additional Info. link on the COSMIC home
page as is the Classification Scheme.
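Exporting a result table as simple text or HTML can be sketched with the Python standard library. The functions `to_text` and `to_html` are hypothetical names and say nothing about how the COSMIC site produces its downloads. The two example rows are the skin and thyroid counts for BRAF from Table 2.

```python
import csv
import html
import io

# Example rows taken from Table 2: (tissue, mutated, all samples).
HEADER = ("Tissue", "Mutated", "All Samples")
ROWS = [("skin", 282, 460), ("thyroid", 181, 669)]


def to_text(header, rows):
    """Render rows as tab-delimited text, which spreadsheets can also open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_html(header, rows):
    """Render rows as a minimal HTML table, escaping every cell."""
    def cells(values, tag):
        return "".join(f"<{tag}>{html.escape(str(v))}</{tag}>" for v in values)

    lines = ["<table>", f"<tr>{cells(header, 'th')}</tr>"]
    lines += [f"<tr>{cells(row, 'td')}</tr>" for row in rows]
    lines.append("</table>")
    return "\n".join(lines)


print(to_text(HEADER, ROWS))
print(to_html(HEADER, ROWS))
```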
There is a continuing effort to enter additional somatic mutation
data into COSMIC. In order to keep the data in COSMIC up to date,
we regularly monitor the literature for new reports of
mutations in the genes that exist in COSMIC. In addition, further
cancer genes will be taken from the Cancer Gene Census (Futreal
et al, 2004) and curated. The COSMIC website will be developed
further to make use of the underlying data. This will include a
DNA view of the mutations and methods to display insertions and
deletions. In addition, we will display other data that has already
been captured such as the patient sex and age for the samples and
the experimental methods used to screen for the mutations. There
are, however, limitations to this data, as we can only collect data that
is described in the original work. Even with this caveat, the data
provides a direct summary of the somatic mutation literature.
Considering the data set as a whole, it will be possible to analyse, in
greater detail, the wider aspects of the biology underlying the
genetic changes that take place in cancer.
We thank Frances Martin and the Institute of Cancer Research and
The Wellcome Trust for funding this work.
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S,
Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats
C, Eddy SR (2004) The Pfam protein families database. Nucleic Acids Res
32: D138 – D141
Béroud C, Soussi T (2003) The UMD-p53 database: new mutations and
analysis tools. Hum Mutat 21: 176 – 181
Birney E, Andrews D, Bevan P, Caccamo M, Cameron G, Chen Y, Clarke L,
Coates G, Cox T, Cuff J, Curwen V, Cutts T, Down T, Durbin R, Eyras E,
Fernandez-Suarez XM, Gane P, Gibbins B, Gilbert J, Hammond M, Hotz
H, Iyer V, Kahari A, Jekosch K, Kasprzyk A, Keefe D, Keenan S,
Lehvaslaiho H, McVicker G, Melsopp C, Meidl P, Mongin E, Pettett R,
Potter S, Proctor G, Rae M, Searle S, Slater G, Smedley D, Smith J,
Spooner W, Stabenau A, Stalker J, Storey R, Ureta-Vidal A, Woodwark C,
Clamp M, Hubbard T (2004) Ensembl 2004. Nucleic Acids Res 32: D468
Davies H, Bignell GR, Cox C, Stephens P, Edkins S, Clegg S, Teague J,
Woffendin H, Garnett MJ, Bottomley W, Davis N, Dicks E, Ewing R,
Floyd Y, Gray K, Hall S, Hawes R, Hughes J, Kosmidou V, Menzies A,
Mould C, Parker A, Stevens C, Watt S, Hooper S, Wilson R, Jayatilake H,
Gusterson BA, Cooper C, Shipley J, Hargrave D, Pritchard-Jones K,
Maitland N, Chenevix-Trench G, Riggins GJ, Bigner DD, Palmieri G,
Cossu A, Flanagan A, Nicholson A, Ho JW, Leung SY, Yuen ST, Weber
BL, Seigler HF, Darrow TL, Paterson H, Marais R, Marshall CJ, Wooster
R, Stratton MR, Futreal PA (2002) Mutations of the BRAF gene in human
cancer. Nature 417: 949 – 954
Fredman D, Siegfried M, Yuan YP, Bork P, Lehväslaiho H, Brookes AJ
(2002) HGVbase: a human sequence variation database emphasizing data
quality and a broad spectrum of data sources. Nucleic Acids Res 30:
387 – 391
Futreal PA, Down T, Coin L, Marshall M, Rahman N, Wooster R, Timothy
Hubbard T, Bateman A, Stratton MR (2004) A census of human cancer
genes. Nat Rev Cancer 4: 177 – 183
Hall A, Marshall CJ, Spurr NK, Weiss RA (1983) Identification of
transforming gene in two human sarcoma cell lines as a new member
of the ras gene family located on chromosome 1. Nature 303: 396 – 400
Higginson J (1992) Human cancer: epidemiology and environmental
causes. In: Higginson, Muis, Munoz (eds). Cambridge Monographs on
Cancer Research. Cambridge, UK: Cambridge University Press
McCoy MS, Toole JJ, Cunningham JM, Chang EH, Lowy DR, Weinberg RA
(1983) Characterization of a human colon/lung carcinoma oncogene.
Nature 302: 79 – 81
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A,
Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E,
Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D,
Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R,
Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D,
Ponting CP, Selengut JD, Servant F, Sigrist CJA, Vaughan R, Zdobnov EM
(2003) The InterPro Database, 2003 brings increased coverage and new
features. Nucleic Acids Res 31: 315 – 318
Olivier M, Eeles R, Hollstein M, Khan MA, Harris CC, Hainaut P (2002)
The IARC TP53 Database: new online mutation analysis and
recommendations to users. Hum Mutat 19: 607 – 614
Reddy EP, Reynolds RK, Santos E, Barbacid M (1982) A point mutation is
responsible for the acquisition of transforming properties by the T24
human bladder carcinoma oncogene. Nature 300: 149 – 152
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C,
Fuellen G, Gilbert JG, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall
CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E,
Wilkinson MD, Birney E (2002) The Bioperl toolkit: Perl modules for the
life sciences. Genome Res 12: 1611 – 1618
Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS,
Abeysinghe S, Krawczak M, Cooper DN (2003) Human Gene Mutation
Database (HGMD(R)): 2003 update. Hum Mutat 21: 577 – 581
Vogelstein B, Kinzler K (1998) The Genetic Basis of Human Cancer. New
York: McGraw Hill
Wheeler DL, Church DM, Edgar R, Federhen S, Helmberg W,
Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E,
Suzek TO, Tatusova TA, Wagner L (2004) Database resources of the
National Center for Biotechnology Information: update. Nucleic Acids
Res 32: D35 – D40
... Noncoding alterations were excluded and alterations predicted to be germline by the somatic-germline zygosity algorithm were not counted (16). Somatic alterations listed in the Catalogue of Somatic Mutations in Cancer (COSMIC) and truncations in tumor suppressor genes were not counted (17). ...
Full-text available
Background Intra-tumor heterogeneity (ITH) plays a vital role in drug resistance and recurrence of lung cancer. We used a mutant-allele tumor heterogeneity (MATH) algorithm to assess ITH and investigated its association with clinical and molecular features in advanced lung adenocarcinoma.Methods Tissues from 63 patients with advanced lung adenocarcinoma were analyzed by next-generation sequencing (NGS) using a panel targeting 520 cancer-relevant genes. We calculated the MATH values from NGS data and further investigated their correlation with clinical and molecular characteristics.ResultsAmong the 63 patients with advanced lung adenocarcinoma, the median value of MATH was 33.06. Patients with EGFR mutation had higher level of MATH score than those with wild-type EGFR status (P = 0.008). Patients with stage IV disease showed a trend to have a higher MATH score than those with stage III (P = 0.052). MATH was higher in patients with disruptive TP53 mutations than in those with non-disruptive mutations (P = 0.036) or wild-type sequence (P = 0.023), but did not differ between tumors with non-disruptive mutations and wild-type TP53 (P = 0.867). High MATH is associated with mutations in mismatch repair (MMR) pathway (P = 0.026) and base excision repair (BER) pathway (P = 0.008). In addition, MATH was found to have a positive correlation with tumor mutational burden (TMB) (Spearman ρ = 0.354; P = 0.004). In 26 patients harboring EGFR mutation treated with first generation EGFR TKI as single-agent therapy, the objective response rate was higher in the Low-MATH group than in the High-MATH group (75% vs. 21%; P = 0.016) and Low-MATH group showed a significantly longer progression-free survival than High-MATH group (median PFS: 13.7 months vs. 10.1 months; P = 0.024).Conclusions For patients with advanced lung adenocarcinoma, MATH may serve as a clinically practical biomarker to assess intratumor heterogeneity.
... mutations somatiques identifiées et leur impact dans le cancer[240]. Il a été identifié 155 variants de ROBO4 comportant des mutations faux-sens, donc 34 ont été associés à des cancers du poumon[241]. ...
Bien que les progrès en matière de dépistage et de thérapie permettent une prise en charge toujours plus efficace des cancers du sein, ils représentent toujours la première cause de mortalité par cancer chez la femme. Le développement de métastases osseuses fait partie des complications les plus fréquentes dans ces cancers, affectant plus de 70% des patientes à un stade avancé. Les traitements actuels ne sont que palliatifs. Comprendre les mécanismes moléculaires régulant l’ancrage précoce des cellules tumorales à l’os est donc d’un intérêt sanitaire majeur. Une analyse transcriptomique comparative entre la lignée de cancer du sein triple négatif MDA-MB-231 métastasant dans différents organes chez la souris et une sous-population de cette lignée, nommée B02, métastasant spécifiquement à l’os, a mis en évidence une surexpression de ROBO4, un récepteur transmembranaire faisant partie de la famille des récepteurs de guidage axonal Roundabout (ROBO), connue pour jouer un rôle dans l’angiogenèse, le maintien de l’intégrité vasculaire ainsi que dans l’ancrage des cellules souches hématopoïétiques à la moelle osseuse. De précédentes études au laboratoire ont démontré l’existence de deux isoformes de ROBO4 dans les cellules endothéliales : ROBO4s, une forme de 160 kDa décrite dans la littérature, et ROBO4v, nouvellement découverte, de 92 kDa. Les cellules de cancers du sein expriment également ROBO4, mais ne produisent, que ROBO4v. Un séquençage des ARNm totaux a mis en évidence une incroyable diversité de transcrits pouvant expliquer l’origine de ce nouveau variant, impliqué dans les interactions entre cellules tumorales. Dans ce contexte, nous avons démontré un lien entre la présence de ROBO4 dans les cellules tumorales et leur expression d’ICAM-1, impliquée dans les interactions cellulaires et la différenciation des ostéoclastes. Ces résultats permettent ainsi de mieux comprendre le rôle de ROBO4 dans la formation de métastases osseuse.
... Finally, we examined our pipeline's performance in identifying a set of 55 known melanoma somatic driver genes found in the COSMIC database 45 (Supplementary Data 4). We found that for the 43 genes in this group that carried at least one true somatic mutation in our dataset, our pipeline achieves an even higher precision and recall levels, with median values of 1 and 0.95, respectively, further demonstrating its high value. ...
Full-text available
Detection of somatic mutations using patients sequencing data has many clinical applications, including the identification of cancer driver genes, detection of mutational signatures, and estimation of tumor mutational burden (TMB). We have previously developed a tool for detection of somatic mutations using tumor RNA and a matched-normal DNA. Here, we further extend it to detect somatic mutations from RNA sequencing data without a matched-normal sample. This is accomplished via a machine-learning approach that classifies mutations as either somatic or germline based on various features. When applied to RNA-sequencing of >450 melanoma samples high precision and recall are achieved, and both mutational signatures and driver genes are correctly identified. Finally, we show that RNA-based TMB is significantly associated with patient survival, showing similar or higher significance level as compared to DNA-based TMB. Our pipeline can be utilized in many future applications, analyzing novel and existing datasets where only RNA is available.
... The p.Q209R variant occurs at the same codon as the most common GNAQ mutation in cancer, p.Q209L, but the arginine substitution is a rare variant (1.2% of GNAQ mutations) in the COSMIC database (Catalog of Somatic Mutations in Cancer) [18]. Previous studies have indicated that constitutive activation of Gαq by somatic GNAQ mutations such as Q209L or R183Q activates MAPK/ERK and JNK signaling pathways [19,20] and induces serum response element (SRE)-dependent gene transcription [21]. ...
Full-text available
Sturge-Weber syndrome (SWS) is a sporadic, congenital, neuro-cutaneous disorder characterized by a mosaic, capillary malformation. SWS and non-syndromic capillary malformations are both caused by a somatic activating mutation in GNAQ encoding the G protein subunit alpha-q protein. The missense mutation R183Q is the sole GNAQ mutation identified thus far in 90% of SWS-associated or isolated capillary malformations. In this study, we sequenced skin biopsies of capillary malformations from 9 patients. We identified the R183Q mutation in nearly all samples, but one sample exhibited a Q209R mutation. This new mutation occurs at the same residue as the constitutively-activating Q209L mutation, commonly seen in tumors. However, Q209R is a rare variant in this gene. To compare the effect of the Q209R mutation on downstream signaling, we performed reporter assays with a GNAQ-responsive reporter co-transfected with either GNAQ WT, R183Q, Q209L, Q209R, or C9X (representing a null allele). Q209L showed the highest reporter activation, with R183Q and Q209R showing significantly lower activation. To determine whether these mutations had similar or different downstream consequences we performed RNA-seq analysis in microvascular endothelial cells (HMEC-1) electroporated with the same GNAQ variants. The R183 and Q209 missense variants caused extensive dysregulation of a broad range of transcripts compared to the WT or null allele, confirming that these are all activating mutations. However, the missense variants exhibited very few differentially expressed genes (DEGs) when compared to each other. These data suggest that these activating GNAQ mutations differ in magnitude of activation but have similar downstream effects.
... The DLG and ETGE regions are pivotal for the KEAP1-NRF2 interaction. In the case of KEAP1 and CUL3, loss-of-function mutations are found throughout the coding region [12]. Indeed, patients with these mutations have a poor prognosis and increased mortality. ...
Full-text available
The Keap1-Nrf2 system is the master regulator of the cellular response against oxidative and xenobiotic stresses. Constitutive activation of Nrf2 is frequently observed in various types of cancers. Nrf2 hyperactivation induces metabolic reprogramming in cancer cells, which supports the increased energy demand required for rapid proliferation and confers high-level resistance against anticancer radio/chemotherapy. Hence, Nrf2 inhibition has emerged as an attractive therapeutic strategy to counter such acquired resistance in Nrf2-activated tumors. We previously identified Halofuginone (HF) as a promising Nrf2 inhibitor. In this study, we pursued preclinical characterization of HF and found that while HF markedly reduced the viability of cancer cells, it also caused severe hematopoietic and immune cell suppression in a dose-dependent manner. Hence, to overcome this toxicity, we decided to employ a nanomedicine approach to HF. We found that encapsulation of HF into a polymeric micelle (HF micelle; HFm) largely relieved the systemic toxicity exhibited by free HF while maintaining the tumor-suppressive properties of HF. LC-MS/MS analysis revealed that the reduction in the magnitude of adverse effects was the result of the ability to release HF from the HFm core in a slow and sustained manner. These results thus support the contention that HFm will potentially counteract Nrf2-activated cancers in the clinical settings.
... In addition, another database COSMIC (Catalogue of Somatic Mutations in Cancer) stores all somatic mutations related to different human cancers, specifically on mutations of BRAF, KRAS2, HRAS, and NRAS genes. 42 This database reports about 10 647 mutations, while MutaXome currently has about 4181 mutations. Despite this, MutaXome offers information on several 1000 genes that are possibly related to different cancer types. ...
Advancements in the field of cancer research have enabled researchers and clinicians to access a massive amount of data to aid cancer patients and to add to the existing knowledge of research. However, despite the existence of reliable sources for extracting this data, it remains a challenge to accurately comprehend and draw conclusions based on the entirety of available information. Therefore, the current study aimed to design and develop a database for the identified variants of 5 different cancer types using 20 different cancer exomes. The exome data were retrieved from NCBI SRA and an NGS data clean-up protocol was implemented to obtain the best quality reads. The reads which passed the quality checks were then used for calling the variants, which were then processed and filtered. These data were normalized, and the normalized data were used for developing the database. MutaXome, which stands for mutations in cancer exome, was designed in SQL, with the front end in Bootstrap and HTML and the back end in PHP. The normalized data containing the variants, inclusive of single nucleotide polymorphisms (SNPs), were added into MutaXome, which contains detailed information regarding each type of identified variant. This database, available online, serves as a knowledge base for cancer exome variations and could be enriched in prospective studies by linking it to a decision support system.
Both the composition of cell types and their spatial distribution in a tissue play a critical role in cellular function, organ development, and disease progression. For example, intratumor heterogeneity and the distribution of transcriptional and genetic events in single cells drive the genesis and development of cancer. However, it can be challenging to fully characterize the molecular profile of cells in a tissue with high spatial resolution because microscopy has limited ability to extract comprehensive genomic information, and the spatial resolution of genomic techniques tends to be limited by dissection. There is a growing need for tools that can be used to explore the relationship between histological features, gene expression patterns, and spatially correlated genomic alterations in healthy and diseased tissue samples. Here, we present a technique that combines label-free histology with spatially resolved multiomics in unfixed and unstained tissue sections. This approach leverages stimulated Raman scattering microscopy to provide chemical contrast that reveals histological tissue architecture, allowing for high-resolution in situ laser microdissection of regions of interests. These microtissue samples are then processed for DNA and RNA sequencing to identify unique genetic profiles that correspond to distinct anatomical regions. We demonstrate the capabilities of this technique by mapping gene expression and copy number alterations to histologically defined regions in human oral squamous cell carcinoma (OSCC). Our approach provides complementary insights in tumorigenesis and offers an integrative tool for macroscale cancer tissues with spatial multiomics assessments.
The CRISPR-Cas9 technology has revolutionized the scope and pace of biomedical research, enabling the targeting of specific genomic sequences for a wide spectrum of applications. Here we describe assays to functionally interrogate mutations identified in cancer cells utilizing both CRISPR-Cas9 nuclease and base editors. We provide guidelines to interrogate known cancer driver mutations or functionally screen for novel vulnerability mutations with these systems in characterized human cancer cell lines. The proposed platform should be transferable to primary cancer cells, opening up a path for precision oncology on a functional level.
The KRAS and BRAF genes are attractive predictive biomarkers for determining whether a colorectal cancer (CRC) patient will respond to anti-epidermal growth factor receptor treatment. The KRAS and BRAF mutation statuses have to be considered as diagnostic and prognostic factors for the treatment of metastatic CRC patients because the targeted therapy may be expensive as well as toxic. In this work, we develop a YbTixOy electrolyte-insulator-semiconductor (EIS) biosensor to detect the KRAS and BRAF gene mutations in CRC. X-ray diffraction, atomic force microscopy, X-ray photoelectron spectroscopy, and secondary ion mass spectrometry were used to explore the structural features of YbTixOy films prepared under three Ti plasma power conditions. We found that the YbTixOy-based EIS sensor fabricated at the 80 W condition exhibited a higher sensitivity of 69.54 mV/pH (in pH range 2–12), a lower hysteresis voltage of 1 mV (in pH loop of 7→4→7→10→7), and a smaller drift rate of 0.15 mV/h (in pH 7) than those of the other Ti plasma power conditions. Furthermore, we successfully demonstrated that the single-stranded DNA probe-immobilized YbTixOy EIS biosensors, functionalized with 3-aminopropyl triethoxysilane followed by glutaraldehyde, could screen for the KRAS and BRAF gene mutations in CRC. The high sensitivity and rapid detection of the YbTixOy-based EIS biosensor are expected to make it a useful diagnostic tool for clinical examination and screening of CRC.
Myxoid liposarcoma (MLPS) is a lipogenic sarcoma, characterized by myxoid appearance histology and the presence of the FUS-DDIT3 fusion gene. MLPS shows frequent recurrence and poor prognosis after standard treatments, such as surgery. Therefore, novel therapeutic approaches for MLPS are needed. Development of novel treatments requires patient-derived cell lines to study the drug responses and their molecular backgrounds. Presently, only three cell lines of MLPS have been reported, and no line is available from public cell banks. Thus, this study aimed to establish and characterize novel MLPS cell lines. Using surgically resected tumor tissue from two patients with MLPS, two novel lines NCC-MLPS2-C1 and NCC-MLPS3-C1 were established. The presence of FUS-DDIT3 fusion, slow growth, spheroid formation, and invasive capability in these cell lines was confirmed. Growth retardation was monitored for 213 anti-cancer agents using NCC-MLPS2-C1 and NCC-MLPS3-C1 cells, and the results were integrated with the response to treatments in an MLPS cell line, NCC-MLPS1-C1, which was previously established in our laboratory. We found that romidepsin suppressed cell proliferation at considerably low concentrations in all three examined cell lines. NCC-MLPS2-C1 and NCC-MLPS3-C1 cell lines developed here represent a useful tool for basic and preclinical studies of MLPS.
DNA sequences capable of inducing oncogenic transformation of NIH3T3 mouse cells are found in a number of human tumour cell lines. When DNAs of these cell lines are applied to monolayer cultures of the mouse fibroblasts, foci of transformed cells are observed 2-3 weeks later. DNA from cells of such primary foci can be used in turn to induce foci in a second cycle of gene transfer. The human DNA sequences responsible for transformation have been called oncogenes, the best characterized of which is closely related to the Harvey murine sarcoma virus oncogene. Here we present a characterization of an oncogene which we found originally to be present in DNA of the SW480 colon carcinoma cell line. We indicate its structural outlines and demonstrate, in extension of reported results, its presence in an activated form in the genome of several types of human tumour cell lines as well as in biopsy tissue from an adenocarcinoma of the large bowel. We identify this tumour oncogene with c-Ki-ras2, one of two known members of the Kirsten ras family of human proto-oncogenes, extending a series of recent reports which have demonstrated homologies between human oncogenes and those of Harvey and Kirsten murine sarcoma viruses. The c-Ki-ras2 oncogene of several tumour cell lines is shown to be amplified.
The genetic change that leads to the activation of the oncogene in T24 human bladder carcinoma cells is shown to be a single point mutation of guanosine into thymidine. This substitution results in the incorporation of valine instead of glycine as the twelfth amino acid residue of the T24 oncogene-encoded p21 protein. Thus, a single amino acid substitution appears to be sufficient to confer transforming properties on the gene product of the T24 human bladder carcinoma oncogene.
HGVbase (Human Genome Variation database, formerly known as HGBASE) is an academic effort to provide a high quality and non-redundant database of available genomic variation data of all types, mostly comprising single nucleotide polymorphisms (SNPs). Records include neutral polymorphisms as well as disease-related mutations. Online search tools facilitate data interrogation by sequence similarity and keyword queries, and searching by genome coordinates is now being implemented. Downloads are freely available in XML, Fasta, SRS, SQL and tagged-text file formats. Each entry is presented in the context of its surrounding sequence and many records are related to neighboring human genes and affected features therein. Population allele frequencies are included wherever available. Thorough semi-automated data checking ensures internal consistency and addresses common errors in the source information. To keep pace with recent growth in the field, we have developed tools for fully automated annotation. All variants have been uniquely mapped to the draft genome sequence and are referenced to positions in EMBL/GenBank files. Data utility is enhanced by provision of genotyping assays and functional predictions. Recent data structure extensions allow the capture of haplotype and genotype information, and a new initiative (along with BiSC and HUGO-MDI) aims to create a central repository for the broad collection of clinical mutations and associated disease phenotypes of interest.
Cancers arise owing to the accumulation of mutations in critical genes that alter normal programmes of cell proliferation, differentiation and death. As the first stage of a systematic genome-wide screen for these genes, we have prioritized for analysis signalling pathways in which at least one gene is mutated in human cancer. The RAS-RAF-MEK-ERK-MAP kinase pathway mediates cellular responses to growth signals. RAS is mutated to an oncogenic form in about 15% of human cancer. The three RAF genes code for cytoplasmic serine/threonine kinases that are regulated by binding RAS. Here we report BRAF somatic missense mutations in 66% of malignant melanomas and at lower frequency in a wide range of human cancers. All mutations are within the kinase domain, with a single substitution (V599E) accounting for 80%. Mutated BRAF proteins have elevated kinase activity and are transforming in NIH3T3 cells. Furthermore, RAS function is not required for the growth of cancer cell lines with the V599E mutation. As BRAF is a serine/threonine kinase that is commonly activated by somatic point mutation in human cancer, it may provide new therapeutic opportunities in malignant melanoma.
The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort. [Supplemental material is available online. Bioperl is available as open-source software free of charge and is licensed under the Perl Artistic License.]
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver and anonymous FTP.
The tumor suppressor gene TP53 (p53) is the most extensively studied gene involved in human cancers. More than 1,400 publications have reported mutations of this gene in 150 cancer types for a total of 14,971 mutations. To exploit this huge bulk of data, specific analytic tools were highly warranted. We therefore developed a locus-specific database software called UMD-p53. This database compiles all somatic and germline mutations as well as polymorphisms of the TP53 gene which have been reported in the published literature since 1989, or unpublished data submitted to the database curators. The database is available online. In this paper, we describe recent developments of the UMD-p53 database. These developments include new fields and routines. For example, the analysis of putative acceptor or donor splice sites is now automated and gives new insight into the causal role of "silent mutations." Other routines have also been created, such as the prescreening module, the UV module, and the cancer distribution module. These improvements will help users not only in molecular epidemiology and pharmacogenetic studies but also in patient-based studies. To achieve these purposes, we have designed a procedure to check and validate data in order to reach the highest quality data.
A molecular clone containing part of the transforming gene from two human sarcoma cell lines, HT1080 and RD, has been obtained and shown to represent a new member of the human ras gene family. The transforming gene has undergone no major rearrangements and has not been amplified in either sarcoma cell line. The major transcript from the gene is 2,200 nucleotides long and is present at the same levels in both normal fibroblasts and tumour cells. The same gene is also activated in HL60, a promyelocytic leukaemia line and in SK-N-SH, a neuroblastoma line. The gene, N-ras, is located on chromosome 1.
Mutations in the tumor suppressor gene TP53 are frequent in most human cancers. Comparison of the mutation patterns in different cancers may reveal clues on the natural history of the disease. Over the past 10 years, several databases of TP53 mutations have been developed. The most extensive of these databases is maintained and developed at the International Agency for Research on Cancer. The database compiles all mutations (somatic and inherited), as well as polymorphisms, that have been reported in the published literature since 1989. The IARC TP53 mutation dataset is the largest dataset available on the variations of any human gene. The database is available online. In this paper, we describe recent developments of the database. These developments include restructuring of the database, which is now patient-centered, with more detailed annotations on the patient (carcinogen exposure, virus infection, genetic background). In addition, a new on-line application to retrieve somatic mutation data and analyze mutation patterns is now available. We also discuss limitations on the use of the database and provide recommendations to users.