Questions related to Cancer Epidemiology
The BEIR VII Phase 2 Report presents the lifetime attributable cancer risk for incidence and mortality:
BEIR VII Report: https://nap.nationalacademies.org/download/11340 (you can choose "Download as guest"), Tables 12D-1, 12D-2 and 12D-3, pages 311-312.
The values for mortality from lung cancer are slightly greater than the values for incidence. How is this possible? Thanks for any answers.
A few days ago a colleague of mine made me think about the impact the COVID-19 crisis will have on cohort studies. Those focused on causes of mortality and on the elderly, in particular, will be deeply affected by the number of deaths due to the pandemic.
Is this something manageable?
How big is this matter in your mind?
How can this be handled?
At fifteen minutes past eight in the morning on August 6, 1945, Japanese time, an atomic bomb was detonated above Hiroshima. Most of the city was destroyed, and by the end of that year 90,000–166,000 inhabitants had died as a result of the blast and its short-term effects. Epidemiological studies have documented increased burdens of malignant disease among survivors, including those exposed in utero, as well as risks for some noncancer diseases. The psychosocial effects and consequences are less well studied but remain substantial to this day. August 6, 2020 will be an opportunity for global remembrance of this human catastrophe.
DISCUSSION TOPIC: What as human beings and as scientists have we learned?
Death from metastasis or recurrence is very common. Many researchers have linked the tendency of some tumors to reappear to the presence of occult or dormant cancer cells that may have a phenotype allowing them to remain "hidden" from the immune response. However, these cells and the mechanisms they use to evade the immune response remain largely unknown. It is critical to understand these cells better and to be able to detect them, because they are of great therapeutic importance.
In the information age, there is more data out there than can be analyzed in a lifetime! As a faculty member, I have often been asked about readily available data, and I wanted to compile a list of it. I am sure many of you have encountered the same question from your students. Here is a compilation of medical data available for download. I have personally worked with many of these datasets and have found them very helpful. They are mostly limited to the United States, and I would welcome input about international datasets that are also freely accessible. Navigating some of these datasets may take some getting used to:
National Health and Nutrition Examination Survey (NHANES)
The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.
The following link will bring you to many of the datasets outlined below, which are very helpful. Some contain self-reported data (NHANES), while others are collected by health care professionals (e.g., NAMCS data). Some datasets are longitudinal, and others become longitudinal once you incorporate the mortality linkage files. These are excellent for Cox proportional hazards models.
For other datasets available from the National Center for Health Statistics (NCHS), please see below:
This page allows you to search the CDC and NCHS sites. NCHS is the Federal Government's principal vital and health statistics agency. NCHS data systems include data on vital events as well as information on health status, lifestyle and exposure to unhealthy influences, the onset and diagnosis of illness and disability, and the use of health care. Some of the NCHS data systems and surveys are ongoing annual systems while others are conducted periodically. NCHS has two major types of data systems: systems based on populations, containing data collected through personal interviews or examinations; and systems based on records, containing data collected from vital and medical records. Data include: National Health Interview Survey, National Immunization Survey, National Survey of Family Growth, National Health Care Survey, National Employer Health Insurance Survey, National Vital Statistics System, and Mortality Data. Research activities include: Aging, AIDS, Classification of Diseases, Data on America's Children, Evaluation of Certificates, Healthy People 2000, International Activities, Minority Health, National Death Index, Nutrition Monitoring, and Public Health Conference on Records and Statistics. The National Center for Health Statistics (NCHS) is a part of the Centers for Disease Control and Prevention, U.S. Department of Health and Human Services. NCHS is located in Hyattsville, Maryland, with offices in Research Triangle Park, North Carolina, and with a CDC-liaison office in Atlanta, Georgia.
Behavioral Risk Factor Surveillance System
The name explains the scope of this database. There is wonderful data on physical activity, cardiovascular disease, chronic pulmonary diseases, and other self-reported topics. More than 500,000 interviews were conducted in 2011, making the BRFSS the largest telephone survey in the world. Also in 2011, a new weighting methodology, raking (iterative proportional fitting), replaced the post-stratification weighting method that had been used with previous BRFSS data sets.
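To make the raking idea concrete, here is a minimal Python sketch of iterative proportional fitting on a toy sample. It is an illustration under assumed inputs (hypothetical variables `sex` and `age` with made-up population totals), not the BRFSS production weighting procedure, which rakes on many more margins.

```python
def rake(weights, rows, margins, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking) of survey weights.

    weights: starting (design) weights, one per respondent
    rows:    list of dicts mapping each raking variable to its category
    margins: {variable: {category: target population total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for wi, row in zip(w, rows):
                totals[row[var]] = totals.get(row[var], 0.0) + wi
            # Scale every respondent so this variable's margins match the targets.
            for i, row in enumerate(rows):
                factor = targets[row[var]] / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        # Stop once a full pass over all margins no longer changes the weights.
        if max_change < tol:
            break
    return w


rows = [
    {"sex": "F", "age": "18-44"},
    {"sex": "F", "age": "45+"},
    {"sex": "M", "age": "18-44"},
    {"sex": "M", "age": "45+"},
]
margins = {"sex": {"F": 60, "M": 40}, "age": {"18-44": 30, "45+": 70}}
raked = rake([1.0] * len(rows), rows, margins)
print([round(x, 6) for x in raked])  # [18.0, 42.0, 12.0, 28.0]
```

Each pass adjusts the weights to one margin at a time; unlike post-stratification, raking only needs the marginal totals, not the full cross-classification.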
Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute
This database works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population. This is an excellent resource to study risk factors of cancer and longitudinal mortality studies.
We are currently designing a nested case-control study based on retrospective registry healthcare data available over a 4-year period. We have identified all cases, and are now looking into the selection of controls.
We want to use incidence density (risk set) sampling, so selected controls should remain eligible as controls for future cases and can be selected more than once.
How exactly should we select controls from a retrospective registry dataset? We believe that if we select a control for a case occurring at time x, we should add all registry data available for this control up to time x and delete all registry data for this control after x. However, this control should still be available as a control for a case occurring at time x+y. If we select this control again for another case at time x+y, should we simply add all available registry data between x and x+y to that control's existing data? Or should we end up with two separate rows in our dataset by duplicating the control, adding data up to time x for the first duplicate and up to time x+y for the second?
We believe the former makes more sense but cannot find guidance on this in the literature.
Any advice in this would be greatly appreciated!
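A minimal Python sketch of incidence density (risk set) sampling, assuming each subject is summarized by a hypothetical `(id, exit_time, is_case)` tuple where `exit_time` is the event or censoring time. In the common conditional-logistic-regression setup, each case and its sampled controls form one matched set with its own index date, so a subject sampled twice appears as two rows, each with registry data truncated at that set's index date; the sketch follows that convention.

```python
import random


def risk_set_sample(subjects, m, seed=0):
    """Incidence density sampling of up to m controls per case.

    subjects: list of (subject_id, exit_time, is_case); exit_time is the
              event time for cases and the end of follow-up otherwise.
    Returns rows of (set_id, subject_id, index_time, is_case_row).
    """
    rng = random.Random(seed)
    cases = sorted((s for s in subjects if s[2]), key=lambda s: s[1])
    rows = []
    for set_id, (case_id, t, _) in enumerate(cases):
        # Risk set: everyone still under follow-up at t (future cases included),
        # except the index case itself.
        at_risk = [sid for sid, exit_t, _ in subjects if exit_t >= t and sid != case_id]
        controls = rng.sample(at_risk, min(m, len(at_risk)))
        rows.append((set_id, case_id, t, 1))
        for cid in controls:
            # Registry data for this row would be restricted to records up to t.
            rows.append((set_id, cid, t, 0))
    return rows


subjects = [("A", 2, True), ("B", 5, True), ("C", 10, False), ("D", 1, False)]
for row in risk_set_sample(subjects, m=2):
    print(row)
```

Here C is sampled for both A's set (index time 2) and B's set (index time 5), B serves as a control for A before becoming a case itself, and D, censored at time 1, is never eligible.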
Over the past three decades, adenocarcinoma of the oesophagus has increased in incidence more than any other tumour. The cancer is thought to be the result of gastroesophageal reflux damaging and inflaming the distal oesophagus and causing its squamous mucosa to undergo columnar metaplasia. This Barrett’s mucosa has an increased risk of progressing to dysplasia and adenocarcinoma.
What factors are responsible for the rapid rise?
With some evidence emerging of noncommunicable disease (NCD) onset at earlier ages in low- and middle-income countries (LMICs) than in high-income countries (HICs), and some evidence of possible shifts in socioeconomic status (SES) gradients in both NCDs and NCD risk factor profiles, are these coalescing into transitions in the traditional SES gradients/patterns for these conditions and risks in both lower- and higher-income countries?
See, for example, Global Health Watch 4: An Alternative World Health Report, or Remais et al., IJE 2012: https://academic.oup.com/ije/article-lookup/doi/10.1093/ije/dys135
When I think about genetic diseases like cystic fibrosis, sickle cell anemia, diabetes, and various forms of cancer, I don't imagine that they can be easily treated by surgery, because the issue is that the wrong protein(s) are written into the patient's genetic code. I imagine that if one were to compare the protein expression map of a diseased individual with that of a normal individual, there would be statistical differences, most of them insignificant, but one upregulated or downregulated protein in a large pathway would carry a critical mutation.
So what would be the most efficient way to identify the culprit protein and the gene that codes for it? I consider myself more proficient in bioinformatics than in experimental wet-lab biochemistry. But I'm probably overthinking it, as scientists usually collect a crude sample, filter it as much as they can, send it to a sequencing lab for analysis, and then collect and analyze the results, right?
Once the malformed protein and its pathway are discovered, perhaps from biostatistics of control and diseased groups, how does one fix a gene in which harmful insertions, deletions, or frameshifts occurred? I know from reading research papers that viruses often leave fragments of their DNA in their host, even if the host's immune system manages to suppress or eliminate them, so perhaps the least harmful virus available could serve as a vector to fix the point mutation. But how do you know that it will do what you calculate?
Perhaps CRISPR is the best way to go about it. I imagine that correcting genetic diseases might still be difficult, given that biochemists often work with cell lysates in vitro, whereas a cure in vivo must not lyse the cells.
Can drugs treat genetic diseases permanently after a few doses? Many drugs inhibit some protein pathway but are gradually eliminated from the body; is there any evidence that some drugs can change protein pathways permanently after a certain number of doses? Perhaps they can if they change the behavior of immune cells, and those cells then change other cells' pathways; then there might be a way to treat genetic diseases.
I can sort of imagine solutions on computers. But those solutions have to work in the complicated matrix which is the chambers of the human body.
How do you think we scientists can treat genetic diseases?
I need help merging two SAS data sets:
(1) data set 1 (matched): cases & controls matched 1:2 on age groups, chemo, and radiation
(2) data set 2 (main): main data set containing all patients and their characteristics.
Data set 1 (matched) looks like this:
caseid controlid agegrp chemo rad num
0001 00052 45+ 1 1 1
0001 00082 45+ 1 1 2
0002 00045 25-30 1 0 1
0002 00036 25-30 1 0 2
Data set 2 (main) looks like this:
id stage er pr status
0001 1 0 1 0
0002 2 1 0 1
How do I merge data set 1 with data set 2 to incorporate the other characteristics of the patients?
Thank you for your help!
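The usual logic is to reshape the matched file to one row per subject (keeping the case ID as a matched-set identifier) and then join the patient characteristics by ID. A minimal Python sketch of that logic follows, using the example rows above; in SAS the same steps are typically a DATA step that outputs one row per subject, then PROC SORT and a MERGE with a BY id statement (or a PROC SQL left join). Note that the IDs must have the same type and format in both files (the example case IDs have four digits, the control IDs five), otherwise rows silently fail to match.

```python
matched = [
    {"caseid": "0001", "controlid": "00052", "agegrp": "45+", "chemo": 1, "rad": 1},
    {"caseid": "0001", "controlid": "00082", "agegrp": "45+", "chemo": 1, "rad": 1},
    {"caseid": "0002", "controlid": "00045", "agegrp": "25-30", "chemo": 1, "rad": 0},
    {"caseid": "0002", "controlid": "00036", "agegrp": "25-30", "chemo": 1, "rad": 0},
]
main = {
    "0001": {"stage": 1, "er": 0, "pr": 1, "status": 0},
    "0002": {"stage": 2, "er": 1, "pr": 0, "status": 1},
}


def to_long(matched):
    """One row per subject: each case once per matched set, plus each control."""
    rows, seen_cases = [], set()
    for r in matched:
        keys = {"setid": r["caseid"], "agegrp": r["agegrp"], "chemo": r["chemo"], "rad": r["rad"]}
        if r["caseid"] not in seen_cases:
            seen_cases.add(r["caseid"])
            rows.append({**keys, "id": r["caseid"], "case": 1})
        rows.append({**keys, "id": r["controlid"], "case": 0})
    return rows


def attach(rows, main):
    """Left join: add characteristics from `main` by id (missing ids stay unmatched)."""
    return [{**r, **main.get(r["id"], {})} for r in rows]


for row in attach(to_long(matched), main):
    print(row)
```

The `setid` column keeps each case with its two controls, which is what a conditional logistic regression needs as its stratum variable.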
I am analyzing the incidence of cancer among Iraqi Kurds in northern Iraq and abroad, and I am looking for resources or expertise in this area. Any advice would be greatly appreciated.
When reading articles that use the Charlson Comorbidity Score (Charlson ME et al. J Chron Dis. 1987;40:373–383) to describe the comorbidity of a cohort of patients with cancer, for instance head and neck cancer, I have the impression that some authors include the primary tumor in the calculation (Charlson's category "any tumor", 2 points) and others do not (counting it only when the patient has another type of cancer at the same time or within the last 5 years). I see the same problem with Charlson's category "solid metastatic tumor" (6 points): some authors seem to include the primary tumor in the calculation if it is metastatic (M+), others do not.
What is correct?
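The two conventions can be made explicit in code. This Python sketch uses only a subset of the Charlson (1987) weights (1 for conditions such as myocardial infarction or congestive heart failure, 2 for any tumor, leukemia or lymphoma, 3 for moderate or severe liver disease, 6 for metastatic solid tumor), and the hypothetical flag `count_index_tumor` switches between excluding and including the primary tumor. Letting a metastatic tumor replace rather than add to "any tumor" is an assumption here, although many coding adaptations apply such a hierarchy.

```python
# Subset of Charlson (1987) weights; not the full index.
WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "chronic_pulmonary_disease": 1,
    "any_tumor": 2,
    "leukemia": 2,
    "lymphoma": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
}


def charlson_score(conditions, count_index_tumor=False, index_tumor_metastatic=False):
    """Sum Charlson weights, optionally scoring the index (primary) cancer too."""
    present = set(conditions)
    if count_index_tumor:
        present.add("metastatic_solid_tumor" if index_tumor_metastatic else "any_tumor")
    # Assumed hierarchy: a metastatic solid tumor supersedes "any tumor".
    if "metastatic_solid_tumor" in present:
        present.discard("any_tumor")
    return sum(WEIGHTS[c] for c in present)


comorbid = ["congestive_heart_failure"]
print(charlson_score(comorbid))                          # index tumor excluded: 1
print(charlson_score(comorbid, count_index_tumor=True))  # M0 index tumor included: 3
print(charlson_score(comorbid, True, True))              # M+ index tumor included: 7
```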
I have three time periods starting from 1970, all with follow-up until 2015. As you can imagine, the first time period has a longer survival curve than the other two because its follow-up is longer. When I run the log-rank test, I get a significant result, but I suspect it is affected by the first period's longer follow-up and survival. I hope to find a significantly increasing survival trend across the three time periods. Which test should I use for this in SPSS?
I have a group of postoperative oncologic patients at 5 years of follow-up, whom I divided into four subgroups according to vital status (dead or alive) and recurrence (disease-free or recurrent). How should these subgroups be treated when calculating disease-free survival at 5 years of follow-up? Any theoretical literature on how the calculation is performed would be appreciated. Thanks.
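One common way to keep the longest-followed period from dominating is to censor every period at the same horizon (for example 5 or 10 years after diagnosis) before comparing them. A minimal Python sketch of that truncation plus a Kaplan-Meier estimate at the horizon is below; the function name and toy data are illustrative, and the trend test itself would still be run on the truncated data in SPSS or elsewhere.

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier survival at `horizon` after administrative censoring.

    times:  follow-up time per patient (same unit as horizon)
    events: 1 if death observed at that time, 0 if censored
    Follow-up beyond `horizon` is cut back to `horizon` and treated as censored,
    so periods with longer follow-up gain no extra information.
    """
    data = sorted(
        (min(t, horizon), e if t <= horizon else 0) for t, e in zip(times, events)
    )
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied times; deaths are counted before censorings leave the risk set.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
        n_at_risk -= removed
    return surv


times, events = [1, 2, 3, 4, 12], [1, 0, 1, 0, 1]
print(km_survival_at(times, events, horizon=10))  # death at 12 is censored at 10
```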
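Under the usual definition, disease-free survival ends at recurrence or at death from any cause, whichever comes first, so only patients alive and disease-free at last contact are censored. Some trials also count second primary cancers as events, so the study's own definition should be checked. A minimal Python sketch mapping the four subgroups to a (time, event) pair, with hypothetical argument names:

```python
def dfs_time_event(followup, alive, recurred, recurrence_time=None, death_time=None):
    """Return (time, event) for disease-free survival.

    event = 1 for recurrence or death from any cause (first event counts),
    event = 0 (censored) only for patients alive and disease-free.
    """
    if recurred:
        return recurrence_time, 1   # alive or dead: the recurrence came first
    if not alive:
        return death_time, 1        # died without recurrence: still a DFS event
    return followup, 0              # alive and disease-free: censored


print(dfs_time_event(60, alive=True, recurred=False))                     # (60, 0)
print(dfs_time_event(60, alive=True, recurred=True, recurrence_time=18))  # (18, 1)
print(dfs_time_event(60, alive=False, recurred=False, death_time=40))     # (40, 1)
print(dfs_time_event(60, alive=False, recurred=True, recurrence_time=12, death_time=30))  # (12, 1)
```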
I've just looked through DeVita's Primer of the Molecular Biology of Cancer, but I'm not sure it works for me.
Another option is a book on the principles of cancer epidemiology, but it is really old (around 1991).
I need a book describing both the key concepts in molecular cancer epidemiology and cutting edge lab and statistical techniques in this field.
The SEER database has epidemiological cancer data from the US. It has its own software (SEER*Stat) that computes incidence rates from the database and also produces survival graphs.
After getting survival graphs for two different study cohorts, I am not sure how to compare them statistically and get a p-value for the comparison.
Can anybody please help and give me some suggestions?
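If only the survival estimates and their standard errors are available from SEER*Stat, one simple option is a z-test comparing the two cohorts at a single time point (for example 5-year survival), treating the cohorts as independent. This is a sketch under that assumption, not a whole-curve comparison; a log-rank test would instead need case-level data.

```python
import math


def compare_survival_at_time(s1, se1, s2, se2):
    """Two-sided z-test for the difference between two independent
    survival estimates taken at the same time point."""
    z = (s1 - s2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = compare_survival_at_time(0.60, 0.03, 0.50, 0.04)
print(round(z, 3), round(p, 4))  # 2.0 0.0455
```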
Recently I operated on a 16-year-old girl with a huge abdominal mass. A CT scan revealed a large ovarian tumour extending throughout the abdomen, probably of neoplastic etiology. All tumour markers were within normal range. A huge right ovarian tumour weighing 4.5 kg was removed. Frozen section showed an ovarian cystadenoma, provisionally benign; the final histopathology is awaited. How often does one see such huge ovarian tumours in adolescent girls?
Methodology question: I am trying to figure out how to measure the degree to which a risk-related problem has become a public issue. The metric needs to be easily analyzed, accessible, quantitative, and stable over at least five years (from 2015 to 2020) and have high face validity. The measurement should be robust – it does not have to be particularly refined or capable of resolving small differences.
The application for this metric is a project I am developing on the factors (such as Peter Sandman’s “outrage”) that determine which risk-related (environmental, health, sustainability) issues become public issues over time. I need a way to measure outcomes and compare them against predictions.
Social media is the most obvious approach. One metric would be Google hits, which is convenient, free, and cumulative, and which almost certainly will be around in five years in roughly the present form without too much bias from algorithm changes introduced over the period. On the other hand, I am concerned about Twitter because I’m not sure it will be as stable a platform over the time period and I’m not sure how much people tweet about issues as opposed to people and events. Newspaper inches in a journal of record (such as the New York Times), which used to be an old standby, might be completely obsolete by 2020.
I would be grateful for practical ideas.
I want to adjust for smoking in an analysis and want to go beyond the simple current, former and never categorization of smokers, which seems to be a bit too crude for my purposes. Of course, the choice of how to parameterize smoking status depends on the outcome of interest (here, my outcome is composite cardiovascular disease). I am thinking that I need to account for dose/duration and perhaps recency of quitting. I am wondering what advice people have and if there are any key references that specifically address this issue.
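One common way to go beyond current/former/never is to carry cumulative dose as pack-years (packs per day × years smoked, with 20 cigarettes per pack), together with duration and years since quitting for former smokers. A minimal Python sketch, with hypothetical field names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    status: str                      # "never", "former" or "current"
    cigs_per_day: float = 0.0
    years_smoked: float = 0.0
    years_since_quit: Optional[float] = None  # only meaningful for former smokers


def pack_years(h: SmokingHistory) -> float:
    """Cumulative exposure: (cigarettes per day / 20) x years smoked."""
    return h.cigs_per_day / 20 * h.years_smoked


def smoking_covariates(h: SmokingHistory) -> dict:
    """Dose, duration and recency variables for a regression model."""
    return {
        "ever": int(h.status != "never"),
        "current": int(h.status == "current"),
        "pack_years": pack_years(h),
        "years_smoked": h.years_smoked,
        # 0 for current smokers; None (to be handled explicitly) for never smokers.
        "years_since_quit": 0.0 if h.status == "current" else h.years_since_quit,
    }


print(smoking_covariates(SmokingHistory("former", 10, 30, years_since_quit=5)))
```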
Writing a book and one chapter is on how managing sustainability seems to be turning into a profession, with full-time managers doing this in business. What do professionals in sustainability need to know? Is there a distinct science underlying the field? If so, what does it involve? Are environmental sciences and studies programs providing adequate grounding in this science?
Please concentrate on the questions, not the definition of sustainability. There are lots of different definitions of sustainability (my book will offer another one), but for the purpose of this discussion, please assume that "sustainability" means doing business and managing enterprises in a way that works toward the goal that there is minimum impact on the environment, good prospects for the future, no degradation that would compromise the future, and that protects health and a decent life.
To what extent have lifestyle, diet, patient, treatment and tumour factors influenced this change (if at all)?
I am looking for a standard way to calculate risk-factor-specific incidence rate from overall population incidence rate. The available data are overall population incidence rate of a disease, prevalence of a risk factor, and relative risk of that disease according to the risk factor. What I want to do is to estimate the incidence rate of people with or without that risk factor. My image is to allocate the overall incidence rate into two groups (risk factor holders and non-holders), keeping consistency with the overall incidence rate. Are there any good references?
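If the relative risk is taken as a rate ratio and the prevalence as the share of person-time exposed, the overall rate is a weighted average, I = (1 - p) I0 + p RR I0, which can be solved for the rate without the risk factor, I0 = I / (1 - p + p RR), and the rate with it, I1 = RR I0. This sketch assumes the RR is unconfounded and applies to the whole population; the function name is illustrative.

```python
def stratum_rates(overall_rate, prevalence, rr):
    """Split an overall incidence rate into rates for people without (i0)
    and with (i1) a risk factor, keeping the weighted average equal to the
    overall rate: (1 - p) * i0 + p * i1 == overall_rate."""
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be between 0 and 1")
    i0 = overall_rate / (1 - prevalence + prevalence * rr)
    return i0, rr * i0


i0, i1 = stratum_rates(overall_rate=100.0, prevalence=0.25, rr=3.0)  # per 100,000
print(round(i0, 2), round(i1, 2))  # 66.67 200.0
```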
I am synthesizing data from a few observational studies for a later meta-analysis. I calculated a weighted odds ratio using the Mantel-Haenszel (M-H) method, but I am not sure how to calculate the confidence intervals. I am not using any statistical software at the moment, just doing calculations and entering the data into an Excel file. Can I just calculate the standard error by adding all exposed and unexposed groups in the cases and controls, and then use the standard formula for the standard error and so on for the CI? Or is there a specific method for calculating the CI of M-H odds ratios?
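A specific variance does exist: the widely used Robins-Breslow-Greenland (RBG) estimator for the log of the Mantel-Haenszel odds ratio. Collapsing all strata into one 2×2 table and using the ordinary formula is generally not equivalent, because it ignores the stratification. Below is a minimal Python sketch of the RBG interval; each stratum is a hypothetical (a, b, c, d) tuple of exposed cases, unexposed cases, exposed controls and unexposed controls, and the same running sums can be laid out as Excel columns.

```python
import math


def mh_odds_ratio_ci(strata, z=1.96):
    """Mantel-Haenszel OR with Robins-Breslow-Greenland 95% CI.

    strata: iterable of (a, b, c, d) with
            a = exposed cases, b = unexposed cases,
            c = exposed controls, d = unexposed controls.
    """
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n
        r, s = a * d / n, b * c / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    var_log_or = (
        sum_pr / (2 * sum_r ** 2)
        + sum_ps_qr / (2 * sum_r * sum_s)
        + sum_qs / (2 * sum_s ** 2)
    )
    se = math.sqrt(var_log_or)
    log_or = math.log(or_mh)
    return or_mh, math.exp(log_or - z * se), math.exp(log_or + z * se)


print(mh_odds_ratio_ci([(10, 20, 5, 40), (8, 15, 6, 30)]))
```

With a single stratum the RBG variance reduces to the familiar Woolf formula 1/a + 1/b + 1/c + 1/d, which is a handy check on a spreadsheet implementation.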
At least since 2008, intermittent androgen deprivation (IAD) has been considered an option that may be offered to men with metastatic prostate cancer, but I have no idea about the actual application of this strategy. Any feedback or publications would be welcome.
(Upper limit of CI - lower limit of CI) / 3.92 is the formula I used. My question is that this gives me the standard error of the regular odds ratio as opposed to the log odds ratio. Should I just take the natural log of the standard errors I get from the above formula, or should I first convert the confidence limits to their natural logs and then calculate the SE?
I am trying to examine the association between a polymorphism and cancer incidence in a meta-analysis. However, for some studies we do not have complete genotype information. We noted the allele distributions from the publications and tried to contact the authors to obtain complete genotype data, but some have not responded. So we have done a primary analysis including only studies with complete genotype data. Now I wish to do a sensitivity analysis that includes the omitted studies and performs the comparisons with the available data. Is this approach correct, or unnecessary?
How should one determine cutoff values for independent variables (e.g., pack-years of smoking, parity, amount of alcohol) to define risk categories when analyzing ORs in a case-control study?
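Because ratio measures have confidence intervals that are symmetric on the log scale, the usual approach is to take logs of the limits first: SE of ln(OR) = (ln(upper) - ln(lower)) / 3.92, where 3.92 = 2 × 1.96 for a 95% interval. A minimal Python sketch:

```python
import math


def se_log_ratio_from_ci(lower, upper, z=1.96):
    """Standard error of ln(OR) (or ln(RR)) recovered from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Example: OR = 2.0 reported with 95% CI 0.75 to 5.33
se = se_log_ratio_from_ci(0.75, 5.33)
print(round(se, 3))
```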
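For the sensitivity analysis, studies that report only allele distributions can still enter an allele-based (per-allele) comparison, because allele counts follow directly from genotype counts whenever those are available; genotype-based models (dominant, recessive, homozygote) cannot be reconstructed from allele counts alone. The per-allele comparison treats a person's two alleles as independent, which relies on Hardy-Weinberg equilibrium. A minimal Python sketch, with hypothetical counts and `A`/`a` as the common/variant alleles:

```python
def allele_counts(n_AA, n_Aa, n_aa):
    """Convert genotype counts to (common A, variant a) allele counts;
    each person contributes two alleles."""
    return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa


def allelic_odds_ratio(case_A, case_a, ctrl_A, ctrl_a):
    """Odds of carrying the variant allele in cases vs controls."""
    return (case_a / case_A) / (ctrl_a / ctrl_A)


# Study with full genotypes: convert first.
case_alleles = allele_counts(30, 50, 20)   # (110, 90)
ctrl_alleles = allele_counts(50, 40, 10)   # (140, 60)
print(round(allelic_odds_ratio(*case_alleles, *ctrl_alleles), 3))

# Study reporting only allele counts: use them directly.
print(round(allelic_odds_ratio(120, 80, 150, 50), 3))
```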
As a possible pathway to understanding the role the immune system plays in attempting to control cancer growth, an interesting test would be to perform an epidemiological study of people with autoimmune diseases and determine the incidence of cancer in that subpopulation.
Are cancer rates lower in people with some form of autoimmunity? Have studies been carried out to test this?
What do you think about the USPHS task force recommendations on mammography and prostate cancer screening?
We are planning to evaluate the completeness of our cancer registry data using 3 data sources. Is there anyone who would like to help us with capture-recapture analysis?
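With three sources, completeness is usually estimated with log-linear models fitted to the seven observable overlap cells, which allow for dependence between pairs of sources. The basic two-source building block is Chapman's version of the Lincoln-Petersen estimator, which assumes independent sources and a closed population; a minimal Python sketch with made-up counts:

```python
def chapman_estimate(n1, n2, m):
    """Two-source capture-recapture (Chapman) estimate of total cases.

    n1, n2: cases found by source 1 and source 2
    m:      cases found by both sources
    Assumes independent sources and a closed population.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def completeness(n1, n2, m):
    """Share of the estimated total actually ascertained by either source."""
    observed = n1 + n2 - m
    return observed / chapman_estimate(n1, n2, m)


n_hat = chapman_estimate(100, 80, 60)
print(round(n_hat, 1), round(completeness(100, 80, 60), 3))  # 133.1 0.901
```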