Cancer Epidemiology

  • James Leigh added an answer:
    What is the OR for occupational bladder cancer?

    How many cases of occupational bladder cancer are required in a case-control study of risk factors for bladder cancer?

    James Leigh

    The size of the study (number of cases and controls) will depend on which causal effects you wish to detect, and on the degree of increased risk you wish to detect at a given alpha level and power. There are numerous references to methods in RG answers, including mine.

    I assume you will be looking at things like aniline dyes, HIV, PAH, tobacco, alcohol, radiation therapy to other organs, general radiation and other chemicals.

    You will also need some a priori estimates of likely effects, based on the literature, for the power calculations.
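
    To make the dependence on effect size, alpha and power concrete, here is a minimal sketch of one common sample-size formula for an unmatched case-control study (the normal-approximation formula without continuity correction). The function name, the default values and the inputs used in the example call are assumptions for illustration only; the odds ratio and the exposure prevalence among controls would have to come from the literature.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(odds_ratio, p_exposed_controls, alpha=0.05, power=0.80,
                 controls_per_case=1.0):
    """Cases required to detect `odds_ratio` in an unmatched case-control study."""
    if odds_ratio == 1.0:
        raise ValueError("an odds ratio of 1 cannot be detected at any sample size")
    p0 = p_exposed_controls
    # Exposure prevalence among cases implied by the target odds ratio.
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    r = controls_per_case
    # Average exposure prevalence, weighted by the case:control ratio.
    p_bar = (p1 + r * p0) / (1.0 + r)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt((1.0 + 1.0 / r) * p_bar * (1.0 - p_bar))
        + z_beta * sqrt(p1 * (1.0 - p1) + p0 * (1.0 - p0) / r)
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Hypothetical inputs: OR of 2, 20% of controls exposed, one control per case.
print(cases_needed(odds_ratio=2.0, p_exposed_controls=0.20))
```

    Smaller odds ratios, rarer exposures or higher power all push the required number of cases up, which is why the a priori estimates matter so much.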

  • Kunal Suradkar added an answer:
    How can I compare survival across three time periods with most recent follow up available for first time period?

    I have three time periods starting from year 1970. All of them have follow up till 2015. As you can imagine the first time period has longer survival curve than the other two survival curves. When I run the log rank test, I get a significant value. But I feel that the value is affected by the fact that the first time period has longer follow up and survival. I hope to find a significantly increasing survival trend between the three time periods. Which test should I use for the same in SPSS?

    Kunal Suradkar

     Thank you so much for your answer. That is also indeed a great suggestion.

  • Rajaraman Swaminathan added an answer:
    How can I calculate disease free survival at 5 years follow-up?

    I have a group of postoperative oncologic patients at 5 years follow-up which I divided into four subgroups according to the alive status (dead or alive) and recurrence (disease-free, recurrent). What are the subgroups when calculating the disease-free survival at 5 years of follow-up? Any theoretical literature background on how the calculation is performed is appreciated. Thanks.

    Rajaraman Swaminathan

    The minimum requirement for calculating DFS is good documentation of disease recurrence: local, distant or both (counted as events or failures) or none (no event). There may also be information on deaths of patients not due to cancer. You have to decide whether to treat these as "events" as well; technically, they should be treated as censored.

    Data on all patients whether alive or dead or with or without disease at the end of follow up must be included. A substantial proportion of your patients must have a potential follow up of 5 years.

    Your data should be in case-listing form in a spread-sheet with variables in columns and patients in rows. The important variables are related to the outcomes:

    (a) disease status (event-local recurrence or distant recurrence or both (all coded as 1) vs. none (coded as zero));

    (b) time to event - in months or years (date of event minus date of starting treatment or diagnosis for those with an event; for those without an event, date of last follow-up minus date of starting treatment or diagnosis).

    All other variables are only co-variates.

    Any life-table method of survival estimation is preferable: Kaplan-Meier or actuarial.
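
    As a rough illustration of the Kaplan-Meier option, the sketch below takes the two outcome variables described above (event coded 1/0 and time to event or last follow-up) and returns the estimated disease-free survival at a chosen time point. The function names and the eight example patients are hypothetical; in practice a statistical package would also give confidence intervals.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time per patient (e.g. months from start of treatment)
    events -- 1 = recurrence (event), 0 = censored at last follow-up
    """
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    leaving_at = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


def survival_at(curve, horizon):
    """Read the step function: estimated survival at `horizon`."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s


# Hypothetical data: months of follow-up and recurrence indicator.
months = [6, 12, 20, 30, 45, 60, 60, 72]
recurred = [1, 0, 1, 0, 1, 0, 0, 0]
print(survival_at(kaplan_meier(months, recurred), horizon=60))
```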

    Please refer to

    for further guidance

    There is also a paper in IJROBP 2008 on breast cancer by Shanta et al. Please refer to it.

  • Tulsi Bhandari added an answer:
    What is the scope of the emerging field of "cancer pharmacoepidemiology" in India?
    Future scope of pharmacoepidemiology in India.
    Tulsi Bhandari

    Thanks Tangnasa. It's my great pleasure. Are you pursuing it?

  • Puriya Gharavi-Kouchebagh added an answer:
    Does someone know how to use Oncomine?

    Hi, I want to see if my gene is more upregulated or downregulated in cancer, in which cancer type it is overexpressed and in which suppressed. Does someone know how to get this info from Oncomine databank?

    Puriya Gharavi-Kouchebagh

    Dear Dr. Sparago

    Visit this site:

    and this one:



  • Alexander Zlotnik added an answer:
    Are there any suggestions for creating a nomogram for the prediction of cancer patient survival?

    Is there anyone that can give me some suggestions? I normally use Stata for my analyses. Thank you.

    Alexander Zlotnik

    Dear Sandro,

    Stata-based nomogram generators for logistic regression models and survival models (developed with Cox regressions) can be downloaded on this webpage:

    This page also includes several methodological notes, applications and frequently asked questions about nomogram usage in clinical research.

    The basic usage of these nomogram generators is easy:

    1) execute the regression command (logistic  / logit / stcox)
    You should use the i.var or bx.var syntax for categorical variables.

    2) execute either nomolog (if you used logistic / logit )
    or nomocox (if you used stcox)

    The programs nomolog and nomocox include several options for refining the resulting graphs (such as changing variable ranges, label sizes and numbers). These graphs may also be modified with Stata's Graph Editor.

    You may execute db nomolog or db nomocox to display a dialog box (graphical user interface) for either of these commands with all execution options or, alternatively, you can use the command line.

    On a side note, these nomogram generators support models with two-variable interactions (categorical ## categorical, continuous ## categorical, continuous ## continuous).

    Best regards,

    Alexander Zlotnik

  • Jinsong Qiu added an answer:
    How can I find upregulated genes for cancer genes?

    The process to calculate upregulated genes if given a cancer gene.

    Jinsong Qiu

    Do you mean oncogene-regulated genes? If so, you can search NCBI GEO for RNAi or other expression datasets related to the oncogene. TCGA contains even more RNA-seq data.

  • James Grellier added an answer:
    Do you have any thoughts for a good reference book for molecular cancer epidemiology?

    I've just looked through DeVita's Primer of the Molecular Biology of Cancer, but I'm not sure it works for me.

    Another one is principle for cancer epidemiology, which is really old (around 1991).

    I need a book describing both the key concepts in molecular cancer epidemiology and cutting edge lab and statistical techniques in this field.


    James Grellier

    Hi Wang,

    The link that Jignasa provided seems to have been truncated and doesn't work. The book description is on the IARC website at:

    You should be able to find a copy for around 90 USD:

  • Jun- Jun Yeh added an answer:
    Isolated part-solid ground glass nodule incidentally detected in a 56-year-old non-smoking man: what is your expert opinion on management?

    What should be the next evidence-based approach in clinical management?

    Jun- Jun Yeh

    yes! Rita!

  • Jan Hill-Jordan added an answer:
    Are there readily available datasets for clinicians or medical epidemiologists that are available for analysis?

    In the information age, there is more data out there than can be analyzed in a lifetime! As a faculty member, I have often been asked about readily available data and wanted to compile a list of it. I am sure many of you have encountered the same question from your students. Here is a compilation of medical data available for download. I have personally worked with many of these and have found them very helpful. These are mostly exclusive to the United States, and I would like input about international datasets which are freely accessible as well. Navigating through some of these datasets may take some getting used to:


    National Health and Nutrition Examination Survey (NHANES)


    The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.



    The following link will bring you to many of the data sets outlined below. These are very helpful. Some are self-reported data (NHANES), while others are collected by health care professionals (e.g. NAMCS data). Some data are longitudinal, and others become longitudinal if you incorporate the mortality linkage files. These are excellent for Cox proportional hazards models.


    For other datasets available from the national center for health statistics (NCHS) please see below:

    This page allows you to search the CDC and NCHS sites. NCHS is the Federal Government's principal vital and health statistics agency. NCHS data systems include data on vital events as well as information on health status, lifestyle and exposure to unhealthy influences, the onset and diagnosis of illness and disability, and the use of health care. Some of the NCHS data systems and surveys are ongoing annual systems while others are conducted periodically. NCHS has two major types of data systems: systems based on populations, containing data collected through personal interviews or examinations; and systems based on records, containing data collected from vital and medical records. Data include: National Health Interview Survey, National Immunization Survey, National Survey of Family Growth, National Health Care Survey, National Employer Health Insurance Survey, National Vital Statistics System, and Mortality Data. Research activities include: Aging, AIDS, Classification of Diseases, Data on America's Children, Evaluation of Certificates, Healthy People 2000, International Activities, Minority Health, National Death Index, Nutrition Monitoring, and Public Health Conference on Records and Statistics. The National Center for Health Statistics (NCHS) is a part of the Centers for Disease Control and Prevention, U.S. Department of Health and Human Services. NCHS is located in Hyattsville, Maryland, with offices in Research Triangle Park, North Carolina, and with a CDC-liaison office in Atlanta, Georgia.



    Behavioral Risk Factor Surveillance System


    The name explains the scope of this database. There is wonderful data on physical activity, cardiovascular disease, chronic pulmonary diseases, and other self-reported measures. More than 500,000 interviews were conducted in 2011, making the BRFSS the largest telephone survey in the world. Also in 2011, a new weighting methodology (raking, or iterative proportional fitting) replaced the post-stratification weighting method that had been used with previous BRFSS data sets.



    Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute

    This database works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population.  This is an excellent resource to study risk factors of cancer and longitudinal mortality studies.

    Jan Hill-Jordan

    Here's more information about ICPSR:

    Name:  Inter-university Consortium for Political and Social Research (ICPSR)

    Hosted by the University of Michigan. Includes data available from the following organizations:

    Data Sharing for Demographic Research, with NICHD funded studies

    National Archive of Criminal Justice Data

    Health and Medical Care Archive (HMCA), data archive of the Robert Wood Johnson Foundation

    National Archive of Data on Arts & Culture (NADAC)

    Measures of Effective Teaching Longitudinal Database

    Child Care & Early Education Research Connections

    National Addiction & HIV Data Archive Program (NAHDAP)

    Resource Center for Minority Data (RCMD)

    National Archive of Computerized Data on Aging (NACDA)

    Substance Abuse and Mental Health Services Administration (SAMHSA) Data Archive

    Using “Find and Analyze Data,” you can search for data in a variety of categories with datasets from ICPSR including:

    ·       Census Enumerations: Historical and Contemporary Population Characteristics

    ·       Community and Urban Studies

    ·       Conflict, Aggression, Violence, Wars

    ·       Economic Behavior and Attitudes

    ·       Education

    ·       Elites and Leadership

    ·       Geography and Environment

    ·       Government Structures, Policies, and Capabilities

    ·       Health Care and Facilities

    ·       Instructional Packages

    ·       International Systems: Linkages, Relationships, and Events

    ·       Legal Systems

    ·       Legislative and Deliberative Bodies

    ·       Mass Political Behavior and Attitudes

    ·       Organizational Behavior

    ·       Social Indicators

    ·       Social Institutions and Behavior

    ·       Replication Datasets

    ·       External Data Resources


  • Iñigo Romon added an answer:
    Have you treated a child diagnosed with ALL and whose mother had been diagnosed with APS or other autoimmune disease during/before pregnancy?

    Causes of childhood leukemias remain largely unknown. We found a 2-year-old girl diagnosed with ALL whose mother had been diagnosed with antiphospholipid syndrome (APS) when she was about 26 (4 years before pregnancy), but she had also been diagnosed with deep venous thrombosis 10 years before. We think that maternal APS could be part of the causal network for developing childhood ALL, but other exposures were also found. Have you seen/diagnosed/treated a child with ALL whose mother had been diagnosed with APS or any other autoimmune disease? What do you think about doing a systematic review of this relationship? (It is possible that our specific case will be published as an epidemiological case in a scientific journal.)

    Iñigo Romon

    Hello Miguel,

    I think the causative link would be a very distant one. You should better characterize your hypothesis, particularly because we don't know precisely how APS arises. In your case, is it part of a systemic disease or a single entity in the mother? I could venture that if you found lymphoid microchimeric populations in the child arising from the mother, that could be a first step, but many other questions would have to be answered. What could be the causative link between APS and ALL in a child? How could that work when the child is two years old? How can you account for environmental influences that could cause both diseases in two different people (say, exposure to a virus)? It would be nice to review the topic you mention, but very hard work to link both.

  • Roman Mezencev added an answer:
    I have a question regarding SEER database. Can anyone help me how to compare two survival graphs derived from SEER database?

    SEER database has epidemiological cancer data from US. It has its own software that gets incidence rates from the database and also gives the survival graph from the database. 

    After getting survival graphs for two different study cohorts I am not sure how to statistically compare them and get a p-value for the comparison.

    can anybody please help and give me some suggestions.

    Roman Mezencev

    Babu: In a SEER Survival Session you need to check the box "Case Listing" in the "Parameters" tab. Then you need to define all your analysis variables as required by your project (site, age, sex, etc.) and hit the "Execute" button, and you will get a table with a single line for every individual. The most important columns in the table will be "Survival Months" and "Vital Status Recode". You can use select all, copy and paste to get the data into Excel, save it in whichever format is needed and process it with the statistical software of your choice for the tests that I mentioned above (after coding dead/censored status for each individual against the time between diagnosis and follow-up).
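
    For readers who want to see what the coding and comparison step can look like once the case listing has been exported, here is a minimal sketch. It assumes that the comparison is a two-group log-rank test (one common choice, not necessarily the one meant above) and that the exported "Vital Status Recode" text starts with "Dead" for deaths; both the value labels and the function names are assumptions for illustration.

```python
from math import erfc, sqrt


def to_event(vital_status):
    """Code the exported vital status text as 1 (dead) or 0 (censored)."""
    return 1 if vital_status.strip().lower().startswith("dead") else 0


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e == 1)     # deaths at time t
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1.0 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("no informative event times to compare")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 d.f.
    return chi2, p_value
```

    Each cohort is passed as its list of "Survival Months" plus the coded events; a dedicated survival package would normally be preferred for real analyses.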

  • Mona Ellaithi added an answer:
    Why is oesophageal adenocarcinoma increasing?

    Over the past three decades, adenocarcinoma of the oesophagus has increased in incidence more than any other tumour. The cancer is thought to be the result of gastroesophageal reflux damaging and inflaming the distal oesophagus and causing its squamous mucosa to undergo columnar metaplasia. This Barrett’s mucosa has an increased risk of progressing to dysplasia and adenocarcinoma.

    What factors are responsible for the rapid rise?

    Mona Ellaithi

    The rise of adenocarcinoma of the oesophagus is more common in the Western world. It has been connected in many studies with the type of diet in the Western world. As Olivier mentioned to you, maybe with the rise of obesity and changes in diet we will see adenocarcinoma rise in countries where we used to see SCC more commonly. However, I doubt that will happen with the increase in socioeconomic status. Moreover, based on my observation, oesophageal cancer is more connected with occupation.

  • Mona Ellaithi added an answer:
    Can someone help me with the incidence or prevalence of Burkitt's lymphoma in African countries or Nigeria?

    Need some prevalence data on Burkitt's lymphoma.

    Mona Ellaithi

    Go to GLOBOCAN data published by IARC.

  • Rajshree Dayanand Katke added an answer:
    What is the incidence of huge ovarian tumours in adolescent girls?

    Recently I operated on a young girl of 16 years with a huge mass in the abdomen. CT scan revealed a large ovarian tumour extending all over the abdomen, probably of neoplastic etiology. All tumour markers were in the normal range. A huge right ovarian tumour of 4.5 kg was taken out. Frozen section showed cystadenoma of the ovary on provisional histopathology. How often can one see such huge ovarian tumours in adolescent girls?

    Rajshree Dayanand Katke

    Sir Asharaf, the tumour's histopathology report came back as benign ovarian tumour. The patient went home and is attending school.

  • Julie Cwikel added an answer:
    How can public concern regarding a risk-related problem or issue be measured over time, without conducting a survey?
    Methodology question: I am trying to figure out how to measure the degree to which a risk-related problem has become a public issue. The metric needs to be easily analyzed, accessible, quantitative, and stable over at least five years (from 2015 to 2020) and have high face validity. The measurement should be robust – it does not have to be particularly refined or capable of resolving small differences.

    The application for this metric is that I am developing a project that has to do with factors (such as Peter Sandman’s “outrage”) that determine which risk-related (environmental, health, sustainability) issues become public issues over time. I need a way to measure outcomes and compare them against predictions.

    Social media is the most obvious approach. One metric would be Google hits, which is convenient, free, and cumulative, and which almost certainly will be around in five years in roughly the present form without too much bias from algorithm changes introduced over the period. On the other hand, I am concerned about Twitter because I’m not sure it will be as stable a platform over the time period and I’m not sure how much people tweet about issues as opposed to people and events. Newspaper inches in a journal of record (such as the New York Times), which used to be an old standby, might be completely obsolete by 2020.

    I would be grateful for practical ideas.
    Julie Cwikel

    We once contemplated a study of the salience of public health issues (in our case stress related) that were uploaded and viewed on YouTube. This can be monitored if you have a good tech person. Another metric to think about Tee.

  • Paul A Macmullan added an answer:
    Is there any known anticorrelation between cancer and autoimmunity?
    As a possible pathway to understanding the role the immune system plays in attempting to control cancer growth, an interesting test would be to perform an epidemiological study of those with autoimmune diseases and find the incidence of cancer in that sub-population.

    Are cancer rates lower in those with some form of autoimmunity? Have studies been carried out to test this?
    Paul A Macmullan
    First of all, apologies for the late reply. I was cleaning up my email when I came across this.
    The short answer is yes. In SLE, a disease which predominantly affects women, there is a higher risk of certain cancers such as lymphoma, but the risk of both ovarian and breast cancer is dramatically reduced.
    Why? Well, that's a different story...
  • Gilbert L'Italien added an answer:
    What is the best approach to adjust for smoking as a potential confounder?
    I want to adjust for smoking in an analysis and want to go beyond the simple current, former and never categorization of smokers, which seems to be a bit too crude for my purposes. Of course, the choice of how to parameterize smoking status depends on the outcome of interest (here, my outcome is composite cardiovascular disease). I am thinking that I need to account for dose/duration and perhaps recency of quitting. I am wondering what advice people have and if there are any key references that specifically address this issue.
    Gilbert L'Italien
    Agree that smoking duration is more important than pack years. In my view the deleterious effects of smoking on CVD pertain to atherothrombotic effects and are cumulative. But those effects diminish over time after stopping. So classify smokers as current (smoking during the period of follow-up for CVD events), past (quit smoking before the period of follow-up for CVD events) or recent (quit smoking during the period of follow-up for CVD events). Add another variable that gives the duration of smoking (e.g. 5 years). If you have a large enough sample size, you could also add a variable for age at onset of smoking.
    As an example, I smoked for 6 years, I began smoking at age 18, I am now 63.
    So class me as a past smoker, 6 years duration, age of smoking onset 18. These variables would exert very little effect on my current risk for CVD, which is what you wish (i.e. dose response). A current smoker who smoked for 20 years, started at 18 and is currently 38 should exhibit a much higher risk for CVD than a non-smoker of similar age.
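    The classification above can be sketched as a small helper that derives the three covariates per participant. The function and argument names are hypothetical, the "never" category is an added assumption for people who never smoked, and measuring a current smoker's duration up to the start of follow-up is one possible choice rather than the only one.

```python
def smoking_covariates(age_started, age_quit, follow_up_start_age, follow_up_end_age):
    """Return (status, duration_years, age_at_onset) for one participant.

    age_started -- age when smoking began, or None for never smokers (assumption)
    age_quit    -- age when smoking stopped, or None if still smoking
    """
    if age_started is None:
        return ("never", 0, None)
    if age_quit is None or age_quit > follow_up_end_age:
        # Still smoking throughout follow-up: duration counted up to its start.
        return ("current", max(0, follow_up_start_age - age_started), age_started)
    if age_quit < follow_up_start_age:
        status = "past"    # quit before the period of follow-up
    else:
        status = "recent"  # quit during the period of follow-up
    return (status, age_quit - age_started, age_started)


# The two worked examples from the text (follow-up end ages are made up).
print(smoking_covariates(18, 24, 63, 70))    # smoked 6 years from age 18, now 63
print(smoking_covariates(18, None, 38, 45))  # started at 18, still smoking at 38
```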
  • Gurdeep Singh Mannu added an answer:
    Is the increase in DCIS incidence over the last few decades solely due to the introduction of breast screening?
    To what extent have lifestyle, diet, patient, treatment and tumour factors influenced this change (if at all)?
    Gurdeep Singh Mannu
    Thank you Constantine, that is a most informative contribution. Thank you Narender, however I have found ultrasound is rarely used in some centres for screening purposes in the screening age group (>40) (with mammography preferred) but I agree other factors such as urbanization and lifestyle that you mention may be important.
  • R. Bruce Paton added an answer:
    What is the scientific grounding for sustainability? Is there such a thing as "sustainability science"?
    Writing a book and one chapter is on how managing sustainability seems to be turning into a profession, with full-time managers doing this in business. What do professionals in sustainability need to know? Is there a distinct science underlying the field? If so, what does it involve? Are environmental sciences and studies programs providing adequate grounding in this science?

    Please concentrate on the questions, not the definition of sustainability. There are lots of different definitions of sustainability (my book will offer another one), but for the purpose of this discussion, please assume that "sustainability" means doing business and managing enterprises in a way that works toward the goal that there is minimum impact on the environment, good prospects for the future, no degradation that would compromise the future, and that protects health and a decent life.
    R. Bruce Paton
    For an insightful framing of the scientific questions, look at The Natural Step's "system conditions" for sustainability. The Natural Step seems to be somewhat out of fashion among Americans working in this area, but the framework is immensely valuable conceptually.
  • Mojca Bozic-Mijovski added an answer:
    Determining the cut-off value to consider as risk when calculating an odds ratio
    How do I determine the cut-off values of independent variables (e.g. pack-years of smoking, parity, amount of alcohol, etc.) to consider as risk when analyzing the OR in a case-control study?
    Mojca Bozic-Mijovski
    I absolutely agree that continuous variates should be used as often as possible. However, for researchers like myself (I'm a biochemist with no access to statisticians), calculating an odds ratio is much simpler.
    I just surfed the net trying to find how to plot log odds (for example log odds of developing cardiovascular disease against number of cigarettes smoked daily) and found absolutely nothing useful.
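    For what it is worth, one simple way to get log odds that can be plotted against a continuous exposure is to group the exposure into bands and compute the empirical log odds in each band. The sketch below assumes exactly that; the function name, the band edges and the 0.5 added to each count (a common continuity correction that avoids log of zero in sparse bands) are illustrative choices, not a recommendation of specific cut-offs.

```python
from math import log


def log_odds_by_band(exposures, is_case, band_edges):
    """Empirical log odds of being a case within each exposure band.

    band_edges -- ascending lower edges; [0, 10, 20] gives the bands
                  [0, 10), [10, 20) and [20, infinity).
    Returns a list of (lower_edge, upper_edge, log_odds).
    """
    results = []
    for i, low in enumerate(band_edges):
        high = band_edges[i + 1] if i + 1 < len(band_edges) else float("inf")
        in_band = [y for x, y in zip(exposures, is_case) if low <= x < high]
        cases = sum(1 for y in in_band if y == 1)
        controls = sum(1 for y in in_band if y == 0)
        # 0.5 is added to both counts so that empty cells do not break the log.
        results.append((low, high, log((cases + 0.5) / (controls + 0.5))))
    return results
```

    The third element of each tuple can then be plotted against the band midpoint in any charting tool, including Excel. In a case-control sample the absolute level of the log odds depends on the sampling fractions, so it is the shape of the curve across bands that is informative.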
  • Kota Katanoda added an answer:
    How can I calculate risk-factor-specific incidence rates from overall population incidence rate and prevalence/relative risk of that risk factor?
    I am looking for a standard way to calculate risk-factor-specific incidence rate from overall population incidence rate. The available data are overall population incidence rate of a disease, prevalence of a risk factor, and relative risk of that disease according to the risk factor. What I want to do is to estimate the incidence rate of people with or without that risk factor. My image is to allocate the overall incidence rate into two groups (risk factor holders and non-holders), keeping consistency with the overall incidence rate. Are there any good references?
    Kota Katanoda
    Thanks Freddy! That's exactly what I was looking for.
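    For later readers, the allocation described in the question can be sketched from the identity overall rate = p * rate_exposed + (1 - p) * rate_unexposed with rate_exposed = RR * rate_unexposed. This is only that arithmetic, under the assumption that the prevalence p and the relative risk RR apply to the same population as the overall rate; the function name and the numbers in the example are hypothetical.

```python
def incidence_by_risk_factor(overall_rate, prevalence, relative_risk):
    """Split an overall incidence rate into (rate_exposed, rate_unexposed).

    Solves overall = p * RR * r0 + (1 - p) * r0 for the unexposed rate r0.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    rate_unexposed = overall_rate / (1.0 + prevalence * (relative_risk - 1.0))
    rate_exposed = relative_risk * rate_unexposed
    return rate_exposed, rate_unexposed


# Hypothetical inputs: 100 per 100,000 overall, 25% exposed, RR of 3.
exposed, unexposed = incidence_by_risk_factor(100.0, 0.25, 3.0)
print(exposed, unexposed)
```

    By construction the two rates recombine to the overall rate when weighted by the prevalence, which is the consistency requirement mentioned in the question.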
  • Suman Saurabh added an answer:
    How to calculate the confidence intervals for a Mantel-Haenszel stratified odds ratio?
    I am synthesizing some data from a few observational studies for a later meta-analysis. I calculated a weighted odds ratio using the M-H method, but I am not sure how to calculate the confidence intervals. I am not using any statistical software at the moment, just doing calculations and entering the data into an Excel file. Can I just calculate the standard error by adding all exposed and unexposed groups in the cases and controls, and then use the standard formula for the standard error and so on for the CI? Or is there a specific method for calculating the CI of M-H odds ratios?
    Suman Saurabh
    A specific method for calculating the confidence interval of the Mantel-Haenszel odds ratio was first described in Clayton D. & Hills M. (1993) Statistical Methods in Epidemiology. Oxford University Press, Oxford. It is also reproduced on page 183 of Essential Medical Statistics by Betty Kirkwood and Jonathan A. C. Sterne. The following steps should be followed to calculate the Mantel-Haenszel CI:

    i) The 95% CI ranges from OR/EF to OR*EF, where OR is the Mantel-Haenszel odds ratio and EF is the error factor;

    ii) EF = exp (1.96 * SE), where exp is the exponential function and SE is the standard error of the log of the Mantel-Haenszel odds ratio;

    iii) SE = Sqrt [V/(Q*R)];

    iv) Let a, b, c, d be the values of the four cells of the 2 x 2 table of each stratum (with this notation the simple odds ratio is (a*d)/(b*c)). Thus ai, bi, ci, di are the a, b, c, d values of the i-th stratum, and ni is the sum of ai, bi, ci and di in the i-th stratum;

    v) Using the above notation,
    Q = Sigma (i.e. summation over strata of) [(ai*di) / ni]
    R = Sigma [(bi*ci) / ni]
    V = Sigma [(ai+bi)*(ci+di)*(ai+ci)*(bi+di) / (ni*ni*(ni-1))]
    so that the Mantel-Haenszel odds ratio itself is Q/R.

    You can easily generate the 95% CI in Excel itself provided you carefully translate the above into Excel formulas. If you are still not sure, you can directly consult the books mentioned at the beginning.
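    As a cross-check on a spreadsheet calculation, the steps can be sketched in a few lines of Python. The sketch uses the convention that each stratum is a tuple (a, b, c, d) whose simple odds ratio is (a*d)/(b*c), so that Q sums a*d/n and R sums b*c/n and the Mantel-Haenszel odds ratio is Q/R. The function name and the example counts are hypothetical.

```python
from math import exp, sqrt


def mantel_haenszel_or_ci(strata, z=1.96):
    """Return (OR_MH, lower, upper) from an iterable of (a, b, c, d) strata."""
    q = r = v = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        q += a * d / n
        r += b * c / n
        # Product of the four margins of the 2 x 2 table over n^2 (n - 1).
        v += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    or_mh = q / r
    se_log_or = sqrt(v / (q * r))  # standard error of log(OR_MH)
    error_factor = exp(z * se_log_or)
    return or_mh, or_mh / error_factor, or_mh * error_factor


# Hypothetical counts for two strata.
print(mantel_haenszel_or_ci([(10, 20, 5, 40), (8, 15, 6, 30)]))
```

    Because the interval is built on the log scale, the lower and upper limits are symmetric around the odds ratio when multiplied and divided by the error factor rather than added and subtracted.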
  • Rute Barbosa Marques added an answer:
    Intermittent Androgen Deprivation (IAD) therapy in prostate cancer patients: Do you know the degree of implementation in your country or area?
    At least since 2008 IAD has been considered an option that may be offered to men with metastatic prostate cancer but I have no idea about actual application of this strategy. Any feedback or publications would be welcomed.
    Rute Barbosa Marques
    You may find this article interesting.
      ABSTRACT: To compare intermittent treatment (IT) versus continuous treatment (CT) using cyproterone acetate (CPA) in bone metastatic prostate cancer patients, we conducted an open-label, multicenter randomized trial. Continuous androgen deprivation therapy is the standard treatment in metastatic prostate cancer. Intermittent treatment might maintain efficacy while toxicity and costs are reduced. Patients received CPA 100 mg tid in the prephase. Patients with a PSA decline of ≥90 % or PSA <4 ng/ml were randomized. If patients were progressive, LHRH analogues were added. Primary end point was time to PSA progression. A total of 366 patients were recruited; 258 reached a good response after 3 or 6 months and were randomized. A total of 131 patients randomized to IT and 127 to CT. Patients on IT had an average of 1.7 episodes on CPA, before LHRH analogues were started. The mean time without treatment in IT was 463 days versus 422 days on treatment. There were statistical significant differences between IT and CT in 3 of the 5 functional scales of EORTC QLQ C 30; however, the clinical relevance of this finding appears modest. Symptom and potency scales showed significant advantages for IT. There were no differences in time to PSA progression on CPA, time to PSA and/or clinical progression on LHRH analogues and time to cancer-specific and overall survival. IT by CPA is associated with less symptoms and modest advantages in QOL domains. There were no differences in time to PSA progression, clinical progression or survival.
      World Journal of Urology 11/2013; 32(5). DOI:10.1007/s00345-013-1206-0
  • Tom Nadarzynski asked a question:
    Would targeted (risk-based) HPV vaccination for men who have sex with men be considered in the absence of gender-neutral HPV vaccination?
    Could that be considered in countries with high coverage in females due to the low anticipated benefit and insufficient cost-effectiveness? There are MSM-targeted hepatitis vaccination programmes already in place in some countries such as the UK.

About Cancer Epidemiology

A forum for discussions about the epidemiology of cancer.