
Epidemiological Statistics - Science topic

Questions related to Epidemiological Statistics
  • asked a question related to Epidemiological Statistics
Question
41 answers
Is it necessary to test the questionnaire before starting the study? and how to do this?
Relevant answer
All questionnaires undergo pilot testing as part of their validation.
  • asked a question related to Epidemiological Statistics
Question
7 answers
The Newcastle-Ottawa Scale is good for cohort and case-control observational studies, but I am doing a meta-analysis with both randomized and non-randomized clinical trials. Some of the non-randomized trials have single arms (without a comparison group), and I don't think the Newcastle-Ottawa Scale can be used here.
Relevant answer
Answer
I think the best option in this case is the Cochrane Collaboration tool for assessing risk of bias for RCTs, and the Risk Of Bias In Non-randomized Studies – of Interventions (ROBINS-I) tool for non-randomized studies.
  • asked a question related to Epidemiological Statistics
Question
5 answers
How do I calculate R_0, R_t, beta (the average number of contacts per person per unit time) and gamma? I am trying to define an SIR model; my data contain infected, death and recovered counts. My country publishes this information, but I do not know how to calculate R_t from the variables mentioned above. Thank you.
Relevant answer
Answer
”Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China”
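In the simple SIR model the question describes, R_0 = beta/gamma and R_t = R_0 × S(t)/N. A minimal sketch with hypothetical parameter values (beta, gamma and the population size are assumptions here; in practice they must be estimated from your own incidence data, e.g. gamma as the reciprocal of the mean infectious period):

```python
# Hypothetical parameters -- beta and gamma must be estimated from your data.
beta, gamma = 0.30, 0.10       # assumed contact rate and recovery rate per day
N = 1_000_000                  # assumed population size
S, I, R = N - 100.0, 100.0, 0.0
dt, days = 0.1, 180

r0 = beta / gamma              # basic reproduction number R_0 = beta / gamma
rt = []                        # effective reproduction number over time
for _ in range(int(days / dt)):
    rt.append(r0 * S / N)      # R_t = R_0 * S(t) / N in the simple SIR model
    # One Euler step of the SIR equations
    new_inf = beta * S * I / N * dt
    new_rec = gamma * I * dt
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec

print(r0, rt[0], rt[-1])       # R_t falls as susceptibles are depleted
```

This is only a sketch of the mechanics; fitting beta and gamma to observed case counts is the harder estimation problem.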
  • asked a question related to Epidemiological Statistics
Question
4 answers
I am a bit stuck with the following problem. I am currently performing a meta-analysis of observational studies using CMA, where my outcome variable is performance on a cognitive test as a function of adherence to an exposure variable (on a scale of 0 to 9). Normally, the results are presented either as mean differences between tertiles according to the level of adherence (low, middle or high) or as a regression coefficient per additional unit of the exposure variable.
My main question is: how can I pool both types of studies into the same meta-analysis? I have found a similar question on risk estimates that suggests estimating the linear trend of the categorical results. I don't have access to raw data; I only know sample sizes, mean differences and confidence intervals. Is it possible to do the same in this case? If so, how should I do it?
I was thinking of just including each tertile comparison as a subgroup of the same study and leaving the continuous variables as they are. But I am not sure whether this rough approach is acceptable.
Thanks.
Relevant answer
Answer
Maybe you can refer to Cochrane hand book. The chapter 9.4.6 described the problem.
9.4.6  Combining dichotomous and continuous outcomes
Occasionally authors encounter a situation where data for the same outcome are presented in some studies as dichotomous data and in other studies as continuous data. For example, scores on depression scales can be reported as means or as the percentage of patients who were depressed at some point after an intervention (i.e. with a score above a specified cut-point). This type of information is often easier to understand and more helpful when it is dichotomized. However, deciding on a cut-point may be arbitrary and information is lost when continuous data are transformed to dichotomous data.
There are several options for handling combinations of dichotomous and continuous data. Generally, it is useful to summarize results from all the relevant, valid studies in a similar way, but this is not always possible. It may be possible to collect missing data from investigators so that this can be done. If not, it may be useful to summarize the data in three ways: by entering the means and standard deviations as continuous outcomes, by entering the counts as dichotomous outcomes and by entering all of the data in text form as ‘Other data’ outcomes.
There are statistical approaches available which will re-express odds ratios as standardized mean differences (and vice versa), allowing dichotomous and continuous data to be pooled together. Based on an assumption that the underlying continuous measurements in each intervention group follow a logistic distribution (which is a symmetrical distribution similar in shape to the normal distribution but with more data in the distributional tails), and that the variability of the outcomes is the same in both treated and control participants, the odds ratios can be re-expressed as a standardized mean difference according to the following simple formula (Chinn 2000):
SMD = (√3 / π) × ln(OR) ≈ 0.5513 × ln(OR)
The standard error of the log odds ratio can be converted to the standard error of a standardized mean difference by multiplying by the same constant (√3/π = 0.5513). Alternatively, standardized mean differences can be re-expressed as log odds ratios by multiplying by π/√3 = 1.814. Once standardized mean differences (or log odds ratios) and their standard errors have been computed for all studies in the meta-analysis, they can be combined using the generic inverse-variance method in RevMan. Standard errors can be computed for all studies by entering the data in RevMan as dichotomous and continuous outcome type data, as appropriate, and converting the confidence intervals for the resulting log odds ratios and standardized mean differences into standard errors (see Chapter 7, Section 7.7.7.2).
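The Chinn (2000) conversion quoted above is easy to script. A minimal sketch, with an arbitrary example odds ratio of 2.0 and SE of 0.20:

```python
import math

C = math.sqrt(3) / math.pi   # ~0.5513, the constant from Chinn (2000)

def log_or_to_smd(log_or, se_log_or):
    """Re-express a log odds ratio (and its SE) as a standardized mean difference."""
    return log_or * C, se_log_or * C

def smd_to_log_or(smd, se_smd):
    """Reverse conversion: multiply by pi/sqrt(3) ~ 1.814."""
    return smd / C, se_smd / C

smd, se_smd = log_or_to_smd(math.log(2.0), 0.20)   # example OR = 2.0
```

The SMD/SE pairs produced this way feed directly into a generic inverse-variance meta-analysis.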
  • asked a question related to Epidemiological Statistics
Question
20 answers
Studies have shown that as many as 50% of submissions are declined directly by editors after being submitted. If the paper receives a “yay” instead of a “nay,” the journal sends it to reviewers. How do journals select competent reviewers?
Common sense says that more experience and a higher rank translate to better reviewing skills. However, a PLOS Medicine study in 2007 showed no such relationship. The authors examined 2,856 reviews by 308 reviewers for Annals of Emergency Medicine, a revered journal that for over 15 years has rated the quality of each review using a numerical scoring system. The results showed that experience, academic rank, and formal training in epidemiology or statistics did not significantly predict subsequent performance of higher-quality reviews. It also suggested that, in general, younger reviewers submitted stronger reviews.
So what? When presented with the opportunity, any physician can and would produce a scrupulous review of a manuscript — right? Wrong.
Flashback to 1998, when Annals of Emergency Medicine cleverly put together a fictitious manuscript riddled with errors and distributed it to 203 reviewers for evaluation. The errors were divided into major and minor categories. The major errors included such blunders as faulty or plainly unscientific methods, as well as blatantly erroneous data analyses. Minor errors consisted of failure to observe or report negative effects on study participants, incorrect statistical analysis, and fabricated references — just to mention a few. According to the authors, the majority of peer reviewers failed to identify two-thirds of the major errors in the manuscript. Forty-one percent of reviewers indicated that the manuscript should be accepted for publication.
What about consistency? In 1982, two authors took twelve papers that had been published by prestigious psychology journals within the previous three years and resubmitted them to the respective journals. The names of the authors for the resubmitted papers, and the names of their affiliations, were all changed to fictitious ones. Three manuscripts were recognized as being duplicates. Of the nine remaining papers, eight were rejected for “methodological inconsistencies,” not for lack of originality. One paper was accepted again.
Last week, I received an email from a well-respected medical journal. The editor wanted my help reviewing a manuscript that was being considered for publication. Noticing the request was addressed to “Dr. Spencer,” I shot back a quick reply saying there’d been a mistake. I’m not a doctor; I’m a medical student.
Hours later, I got this response:
Thank you for your email. We would very much appreciate your review of this manuscript. Your degree information has been corrected.
The peer review process clearly has flaws. It’s no wonder so many publications are retracted every year, or that each issue of a journal includes several corrections of previously published articles. Without universal standards, manuscript reviews remain subjective, imperfect, and inconsistent at best. The time has come to re-examine the quality of this system.
Meanwhile, those who rely on medical journals for practice guidelines should be educated on the flawed nature of this system and should be encouraged to judge publications on their merit rather than the apparent prestige of a journal.
Now, if you’ll please excuse me, I have a manuscript to review.
Robert Spencer is a medical student.
Relevant answer
Answer
More worrisome is when reviewers delay and reject research they intend to plagiarize, or steal and publish under their name:
  • asked a question related to Epidemiological Statistics
Question
3 answers
What is the justification for 1:4 ratio in Case Control study?
What are the epidemiological and statistical reasons for using a 1:4 ratio in a case-control study? How can we choose four controls per case?
Relevant answer
Answer
Thank you so much Surendra Karki for your quick response & very valuable information.
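One common statistical justification is diminishing returns in precision: relative to an (unattainable) design with unlimited controls per case, a 1:m matched design has roughly m/(m+1) of the maximum efficiency, so adding a fifth or sixth control buys little. A quick illustration of that classic rule of thumb:

```python
# Approximate efficiency of a 1:m case-control design relative to a design
# with unlimited controls per case: m / (m + 1).
efficiencies = {m: m / (m + 1) for m in range(1, 8)}
for m, eff in efficiencies.items():
    print(f"1:{m} design -> {eff:.0%} of maximum efficiency")
# 1:4 already reaches 80%, while 1:5 adds only ~3 more percentage points.
```

This is why 1:4 is often quoted as the practical ceiling: beyond it, the cost of recruiting extra controls usually outweighs the small gain in power.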
  • asked a question related to Epidemiological Statistics
Question
5 answers
Dear All,
I need to estimate sensitivity, specificity, PPV and NPV for clustered data using GEE, programming in SAS. I will use PROC GENMOD with dist=binomial link=log. However, it is not clear to me how the model should be specified. For example, how should I tell SAS to calculate, for sensitivity, the probability of true positives over true positives + false negatives?
Is anybody there who can help me?
Thanks in advance
Federica
Relevant answer
Answer
Did you find a way to do this? I am also trying to calculate sensitivity/specificity on a dataset with repeated measurements and planned on using PROC GENMOD as well. However, I cannot find that anyone else has fitted a model like that.
Best, Kathrine
  • asked a question related to Epidemiological Statistics
Question
11 answers
Hello. I am working on heart disease prediction using data mining techniques. For that I need a dataset with more than 1,000 patient records, so could anyone please send me a link? Thank you.
Relevant answer
Answer
Dear Shohana,
You may find data set from UCI Machine Learning Repository
  • asked a question related to Epidemiological Statistics
Question
8 answers
I'm conducting a meta-analysis on hypoglycemic risk associated with diabetic drugs. Some studies report only the incidence rate of hypoglycemic events and the number of patients. Data are in the following format: 3.2 hypos/patient/year with 100 patients on drug A versus 2.1 hypos/patient/year with 100 patients on drug B; the event can occur more than once in each patient. Is there any way of estimating the standard error when neither confidence intervals nor standard deviations are given? Thanks
Relevant answer
Answer
I'm not clear on what information you have, but if you have an incidence rate (i.e., cases per person-time) then you can assume a Poisson distribution. Suppose you have 150 cases over 20,000 person-years. Your incidence rate (IR) would be cases/(total person-time): 150/20,000, or 75 cases per 10,000 person-years. The variance for the IR is cases/((total person-time)^2): 150/(20,000^2), or 3.75 × 10^-7. Your standard error is the square root of this: sqrt(3.75 × 10^-7), or 0.00061237. Thus your 95% CI for the IR (0.0075) would range from 0.0063 to 0.0087. So if you only have the incidence rate and total person-time, you can work your way back to an SE.
If you know the number of incident cases, then the SE for a Poisson random variable is sqrt(cases). (For me, this is the easier approach.) For the above problem we have 150 cases (ignore the person-time for the moment). The 95% CI for the number of cases is 150 - 1.96 × sqrt(150), or 126, as the lower bound, and 150 + 1.96 × sqrt(150), or 174, for the upper bound. From here, you can get the 95% CI for the incidence rate by dividing the lower and upper bounds for the cases by the person-time: 126/20,000 = 0.0063 and 174/20,000 = 0.0087. These agree well with the previous IR estimates.
Paul
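Both routes in the answer above can be checked with a few lines, using the same worked numbers (150 cases over 20,000 person-years):

```python
import math

cases, person_time = 150, 20_000
ir = cases / person_time                    # 0.0075

# Route 1: variance of the IR = cases / person_time^2
se_ir = math.sqrt(cases / person_time**2)   # ~0.00061237
ci_ir = (ir - 1.96 * se_ir, ir + 1.96 * se_ir)

# Route 2: SE of the Poisson count = sqrt(cases), then divide by person-time
lo = (cases - 1.96 * math.sqrt(cases)) / person_time
hi = (cases + 1.96 * math.sqrt(cases)) / person_time
# Both routes give (0.0063, 0.0087) to four decimal places.
```

Algebraically the two routes are identical, which is why the intervals agree.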
  • asked a question related to Epidemiological Statistics
Question
3 answers
Annals of Internal Medicine published today a very interesting paper introducing the "E-value" as a way of assessing the robustness of relative risks, odds ratios, hazard ratios, etc., which may change how we interpret and present these statistics. I'd like to hear the opinion of statisticians.
If anyone wants to play with the E-value, I've attached a calculator.
Relevant answer
Answer
The article is not open for reading, so it is hard to comment.
Usually "E-value" means "expected value", so there seems to be a risk of semantic confusion.
And... no one ever accused me of being a statistician. But all these kinds of tools are dangerous in the hands of the unwary. There are ALWAYS built-in assumptions, such as the underlying distributions being Gaussian, or some other silliness.
  • asked a question related to Epidemiological Statistics
Question
6 answers
Does it even make sense to study other outcomes in a matched nested case-control study? In these types of studies, do we always have to use case/control status as our outcome?
Relevant answer
Answer
No, you cannot do that. In a case-control study your outcome will be binary, that is, diseased and non-diseased. Also, the variable you are matching on cannot be studied.
  • asked a question related to Epidemiological Statistics
Question
4 answers
If I do a meta-analysis of incidence rates from observational studies, can I also include the incidence rate of a control group from an RCT? Could the control group of an RCT be treated as a comparable cohort? (I am not interested in effect size.)
Relevant answer
Answer
Hi Mari
There is an article that discusses this issue in detail and may help you. The article is titled 'Should Meta-Analyses of Interventions Include Observational Studies in Addition to Randomized Controlled Trials? A Critical Examination of Underlying Principles'.
Good luck!
  • asked a question related to Epidemiological Statistics
Question
4 answers
When prevalence is not known and it is difficult to obtain a mean and standard deviation, how do we calculate the sample size in such cases? Does it matter whether the study is descriptive, analytical or empirical?
Relevant answer
Answer
You can always guess the standard deviation by asking someone who knows the area to tell you what would be an unexpectedly high and an unexpectedly low value. Divide this range by four. 
I got interested in this and used to ask, say, obstetricians what would be a high and a low fetal heart rate. They produced more or less a range of four SD. 
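The range/4 rule plugs straight into a standard sample-size formula for estimating a mean. A sketch with made-up numbers (the elicited low/high fetal heart rates and the precision target E are assumptions for illustration):

```python
import math

# Hypothetical expert-elicited extremes
low, high = 110, 170              # "unexpectedly low" / "unexpectedly high"
sd_guess = (high - low) / 4       # the range/4 rule of thumb -> 15.0

# n to estimate the mean within +/- E at 95% confidence: n = (z * sd / E)^2
z, E = 1.96, 3.0                  # arbitrary precision of +/- 3 units
n = math.ceil((z * sd_guess / E) ** 2)   # 97 subjects
```

The guessed SD is only as good as the elicited range, so it is worth checking the final n against a couple of alternative range guesses.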
  • asked a question related to Epidemiological Statistics
Question
2 answers
I am planning to perform a meta-analysis of randomized studies. However, I would like to exclude the possibility of Type I and Type II errors due to repeated hypothesis testing. Does anyone have experience with trial sequential analysis (TSA)? Is it compatible with statistical software other than RevMan?
Relevant answer
Answer
The metacumbounds package in Stata by Miladinovic is indeed a good and easy choice.
Another choice would be to use the Trial Sequential Analysis software by the Copenhagen Trial Unit.
  • asked a question related to Epidemiological Statistics
Question
8 answers
I am planning to perform a meta-analysis of side effects for a medication. The outcome is the number of people who experienced the side effect of interest out of the total exposed. The duration of follow-up (weeks to years) and the sample sizes of the studies (14 to 30,000) vary. The incidence is in the range of 0.3%. Do I need to transform the data, and which form of transformation would be best? Secondly, which measure would be most appropriate for the meta-analysis?
Relevant answer
Answer
Hi Muhammad
The Stata procedure metaprop will undertake a fixed- or random-effects meta-analysis for you. It will generate the pooled incidence rate, 95% CI, a forest plot, and a test of heterogeneity.
If you wish to do the calculations yourself, then you could try the Freeman-Tukey double arcsine transformation (Freeman, M. F., and Tukey, J. W., 1950).
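If you do compute it yourself, the Freeman-Tukey double arcsine transform and its variance are short formulas; a sketch (back-transforming the pooled value for reporting, e.g. via Miller's formula, is a separate step):

```python
import math

def ft_double_arcsine(events, n):
    """Freeman-Tukey (1950) double arcsine transform of a proportion events/n.

    Returns (t, variance); the variance ~ 1/(n + 0.5) is roughly constant
    across studies, which is what stabilizes rare-event proportions.
    """
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)

t, var = ft_double_arcsine(3, 1000)   # e.g. a study with ~0.3% incidence
```

Each study's (t, variance) pair can then be pooled with the usual inverse-variance machinery.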
  • asked a question related to Epidemiological Statistics
Question
1 answer
I was trying to do a meta-analysis of the effect of ART on congenital anomalies, with subgroup analyses. I am getting data for the subgroups (ART, PI, NRTI, NNRTI) from one study. Can I take these and put them together in RevMan? When I did, the same study appeared twice in the pooled analysis. Does anyone have information on how to enter this type of data in RevMan or Stata?
Relevant answer
Answer
There are several epidemiologists specifically involved in doing these kinds of analyses in ART, from Columbia and Harvard, to whom I could refer you if you are interested. I can ask them the question for you, if you prefer, and then, if they are interested in advising, put you directly in touch with them. To me, the key issue in what you are asking is to ensure that you do not misrepresent the meta-analysis as coming from two studies when in fact they are two papers/studies coming from one common set of data/main study. You should be looking at independent studies with equivalent data that would allow for such comparisons. I cannot speak to entering these types of data (data is a plural word, by the way) in RevMan or Stata.
  • asked a question related to Epidemiological Statistics
Question
12 answers
At the moment I am critically appraising a meta-analysis; the researchers used a random-effects meta-analysis with the inverse-variance method to obtain odds ratios.
Relevant answer
Answer
Hi Rosa,
To teach this to students, I demonstrate how the OR overestimates the RR by calculating the inflation of the OR relative to a prospective RR using a series of 2 x 2 tables with hypothetical data.  For example, given a study with n = 1000 (500 E and 500 not E and a constant RR = 1.5), when the overall outcome incidence (OI) = 1% the OR inflation = 0.4%; when OI = 2%, OR inf. = 0.8%; OI = 5%, OR inf. = 2.1%; OI = 10%, OR inf. = 4.5%; OI = 20%, OR inf. = 10.5%.   If our operational definition of 'too much error' is >=10%, using these numbers we would start to worry at around 20% outcome incidence.  But while this approach helps demonstrate the mechanism underlying the inflated OR and gets students to practice calculating these estimates, it glosses over all the complexities to do with the choice of design, sampling methods and specific biases raised by Greenland in his classic papers (Am J Epidemiol. 1982 Sep;116(3):547-53;  Am J Epidemiol. 1986 Dec;124(6):869-83) and by other authors.  I think for all these reasons there is no one answer to 'how rare is rare?'.
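The inflation figures in the answer above come from a simple 2 × 2 calculation. A sketch reproducing them, assuming equal-sized exposed and unexposed groups and a fixed RR of 1.5:

```python
def or_inflation(rr, overall_incidence):
    """Relative inflation of the OR over the RR, assuming equal group sizes."""
    p0 = 2 * overall_incidence / (1 + rr)   # risk in the unexposed group
    p1 = rr * p0                            # risk in the exposed group
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return odds_ratio / rr - 1

for oi in (0.01, 0.02, 0.05, 0.10, 0.20):
    print(f"incidence {oi:.0%}: OR inflated by {or_inflation(1.5, oi):.1%}")
# At 20% overall incidence the OR overstates the RR by about 10.5%.
```

The printed values match the series in the answer (0.4%, 0.8%, 2.1%, 4.5%, 10.5%), which is the mechanism behind the usual "rare disease assumption".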
  • asked a question related to Epidemiological Statistics
Question
3 answers
My PICOT is as follows: for nurses at a jail setting, how does implementing a policy of EBP guidelines on TDM influence serum drug lab testing?
I plan on comparing the percentage of completed serum drug lab testing pre-intervention to the percentage completed post-intervention.
Relevant answer
Answer
Hi,
You have pre-intervention tests and post-intervention tests (if they come from the same individuals, then this is a matched-pair sample). If the sample size is low (<30) or the distribution is not normal, then the Wilcoxon signed-rank test for paired samples should be done. If the sample is large and the distribution is normal, then the t-test for matched paired samples should be done.
I understand that you are not randomizing two groups to intervention and control, right? Because that would call for different tests than those suggested above.
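As an illustration of the first suggestion, a Wilcoxon signed-rank test on made-up paired completion percentages (SciPy; all numbers below are hypothetical):

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired completion rates (%) for the same units,
# measured before and after the intervention.
pre  = np.array([40, 55, 60, 35, 50, 45, 65, 30, 52, 48])
post = np.array([58, 70, 72, 50, 66, 55, 80, 45, 60, 62])

stat, p = wilcoxon(pre, post)   # small p suggests a real pre/post shift
```

If the percentages come from pooled counts rather than matched units, a test of two proportions (e.g. chi-squared) would be the alternative.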
  • asked a question related to Epidemiological Statistics
Question
8 answers
1. Is it correct to mix all study types (randomized controlled trials, cohort, and case-control studies) into a single meta-analysis to get an overall RR, and then stratify the analysis by study type (randomized controlled trials vs. cohort/case-control studies)? If yes, is the overall RR reliable for the final report, taking into account both heterogeneity and publication bias?
Would you agree with me that the advantages of including both RCTs and observational studies in a final meta-analysis could outweigh the disadvantages in many situations?
2. In our study, to explore the source of heterogeneity, we performed subgroup analyses according to the types of study design. We also identified the studies which had both the largest variance (wide intervals) and the extreme outlier weight in each clinical outcome group. We then conducted a leave-one-out sensitivity analysis to assess the impact of individual studies, and thus the average RR was estimated in the absence of each study and heterogeneity was quantified using both the I2 and τ2 statistics. What do you think about the "leave-one-out sensitivity analysis"? Do you agree with the following statement?
"In meta-analyses, the rationale of deleting studies should be clearly stated and medically or methodologically sound, not only based on variance."
3. Is the following statement correct?
"if heterogeneity is observed, the common estimate is not of much meaning and should not be cited."
Relevant answer
Answer
1. No, you cannot combine different types of studies in a single meta-analysis, because of the risk of bias of the studies (Cochrane Handbook). In my case, I pool the results from RCTs and from observational studies separately and compare the estimates. Then I discuss the trend of the effects. It is expected that the estimates from observational studies will be more overestimated than those from RCTs, especially because of selection bias.
2. I do not usually use leave-one-out sensitivity analysis. For purposes of subgroup analysis, you have to define the variables that you think could impact the results in advance, in the protocol of your systematic review. By definition, subgroup analyses are exploratory analyses, and you have to have a clinical or methodological reason for doing them.
3. I do not totally agree with your statement. High heterogeneity means that the estimate of the effect is not the same across the population. If you identify a variable that is responsible for this (by subgroup analysis or meta-regression), you will have a new hypothesis, and a primary study has to be conducted to corroborate your meta-analysis findings.
If you cannot find any clinical, demographic or methodological variable able to explain your heterogeneity, maybe it is better not to pool the results.
  • asked a question related to Epidemiological Statistics
Question
5 answers
I am completing a systematic review of interventions for my MA thesis. The main focus of my thesis was going to be a meta-analysis. I have 5 articles that meet my inclusion criteria; only one study contains the data necessary for a meta-analysis. Two of these studies use multi-level modelling and do not provide group standard deviations (SD). My concern with using standard errors (SE) rather than SD, or even converting the SE to SD myself, is confounding variables. Is there a way to include these in a meta-analysis without original data or SDs? Because I don't know specifically what the original researchers statistically controlled for across studies, would it make sense to include them, or would it be better to use a narrative synthesis method to summarize the data rather than comparing overall effect sizes? I know a meta-analysis is only as good as the data included, and I have to make sure that I am contributing to my field in a meaningful way rather than inflating effects or circulating inaccurate information. Any information and/or articles you could give me on pooling effects across studies or using SE in meta-analysis would be appreciated.
Relevant answer
Answer
Hi Priscilla,
In terms of confounding, it makes no difference whether you use the standard deviation (SD) or the standard error (SE). I am not sure what you mean by standard estimate?
You can easily convert between them: SD = SE × square root of the sample size.
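In code form (n is the group's sample size):

```python
import math

def sd_from_se(se, n):
    """SD = SE * sqrt(n); the inverse of SE = SD / sqrt(n)."""
    return se * math.sqrt(n)

sd_from_se(0.5, 100)   # -> 5.0
```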
  • asked a question related to Epidemiological Statistics
Question
16 answers
Recently a collaborator came with a project which aims to validate, in his setting, some prediction models for digestive bleeding: the Rockall score, Glasgow-Blatchford score, AIMS65 score, Charlson comorbidity index, and the Child-Pugh and MELD classifications. His question was how to estimate a reasonable sample size. I took a look at the original papers, and their validation sets ranged from 197 subjects with a 41% outcome rate to 32,504 subjects with a 2% outcome rate. It was not exactly reassuring to see that in the largest validation set there were significant coefficients as low as 0.3 with an SE of 0.18. I also looked around for guidance and found the following comments in Steyerberg's book, in the section on sample size for validation studies: "modest sample sizes are required for model validation"; "to have reasonable power, we need at least 100 events and at least 100 non-events in external validation studies, but preferably more (>250 events). With lower numbers the uncertainty in performance measures is large." But the text also presents several simulation results showing that power depends a lot on the coefficients and SEs, which could, even with these numbers of outcomes, lead to power as low as 50%. Taking this rule of thumb, and expecting a 4% outcome rate in a validation cohort, it would be necessary to include 2,500 to 6,250 subjects. Pretty scary, and a very wide range, which does not help much at the planning stage. I found a logistic regression sample size formula, but it did not help much, as it allows only two predictors at a time, and permuting the predictor coefficients and SEs in the formula gave an N ranging from a few hundred to dozens of thousands. http://www.dartmouth.edu/~eugened/power-samplesize.php
I would very much like to be comfortable recommending a sample size of 2,500, taking Steyerberg's rule of thumb as the basis. I would like to hear from those who have experience with this issue.
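For what it's worth, the arithmetic behind the 2,500-6,250 range is just the expected-events calculation; with a 4% outcome rate, events (not non-events) are the binding constraint:

```python
import math

def n_for_events(events_needed, outcome_rate):
    """Subjects needed so the expected number of events reaches the target."""
    return math.ceil(events_needed / outcome_rate)

lower = n_for_events(100, 0.04)   # Steyerberg's minimum of 100 events -> 2500
upper = n_for_events(250, 0.04)   # his "preferably >250 events" -> 6250
```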
Relevant answer
  • asked a question related to Epidemiological Statistics
Question
2 answers
I would like to calculate the Fleiss kappa for a number of nominal fields that were audited from patients' charts. I have a situation where charts were audited by 2 or 3 raters. Is anyone aware of a way to calculate the Fleiss kappa when the number of raters differs? I've been trialing a SAS macro by Chen called MKAPPA that accounts for "missing" observations, but I don't think this really addresses my needs. Any suggestions are greatly appreciated!
Relevant answer
Answer
Dear Michelle,
I recommend that you take a look at the following papers. The problem is a combinatorial one, which has been approached by several researchers.
I do not understand why you think that Chen's SAS macro MKAPPA does not address your problem. To me it seems to be exactly what you are looking for.
BTW: I also found this SAS macro, but I have to admit that I never used it before. The author also gives a lot of good literature references on this topic.
Good luck!
Best regards, Ann
  • asked a question related to Epidemiological Statistics
Question
5 answers
I am facing difficulties in calculating the sample size for a cluster randomized controlled trial with three groups (one control and two different intervention groups). Do you have any advice or recommendations? Are there assumptions about what the ICC (rho), coefficient of variation, cluster number and cluster size should be? Which statistical software is appropriate for calculating the sample size for a cluster randomized controlled trial?
Relevant answer
Answer
For general orientation on this - see
This software will allow you to work out the 'cost' of different designs and will  allow you to see the gains (which can be very important) from taking into account a pre-intervention measure; it is easy to use
Optimal Design for Longitudinal and Multilevel Research Documentation for the “Optimal Design” Software
For a more bespoke approach this software enables a 'sandpit' approach so you can mimic your actual problem as closely as possible
the manual is a very good primer on the underlying concepts.
And as a UK academic you can get a free copy of MLwiN to undertake the analysis http://www.bristol.ac.uk/cmm/software/mlwin/download/
  • asked a question related to Epidemiological Statistics
Question
2 answers
Recently, we completed a TST survey among the diabetic population in some Malaysian primary care clinics. We used 2 tuberculin units (2 TU of RT23) during the TST survey. I would like to know how the use of different tuberculin units (2 TU, 5 TU, 10 TU) may affect TST results and how the results should be interpreted. Is there any evidence in the literature to substantiate that 2 TU reduces false positives in a high-burden country with wide BCG coverage?
Relevant answer
Answer
In Spain only the 2 TU dose is used, and there does not seem to be any basis in the literature for using another. In cases where this dose has been used in people previously vaccinated with BCG, there is a high probability of a false positive. Therefore it is useful, and it is what we do, to confirm with an IGRA test.
I hope this reply serves you.
Best regards.
  • asked a question related to Epidemiological Statistics
Question
4 answers
I am combining multiple studies on mortality in CKD across different categories of 25(OH)D. One study reported a hazard ratio per 1 standard deviation increase in 25(OH)D, and some reported plain hazard ratios (HR, or relative risk). What is the difference between HR/SD and HR? Is there any possibility of converting HR/SD to HR? Assuming both types of study measure mortality, can both be combined together? Thank you.
Relevant answer
Answer
Dear Mehmet,
Thank you for your response. I did it, but the log of HR/SD became negative.
In the paper, the HR per SD = 0.67 (0.43-1.05)
and the SD of 25(OH)D is 13.
Would you please explain why my calculation, i.e. the logarithm of the HR per SD, resulted in a negative number?
Thanks
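A negative logarithm here is expected rather than an error: any HR below 1 has a negative log. Using the numbers quoted above, the conversion from HR per SD to HR per unit of 25(OH)D divides the log by the SD and then exponentiates:

```python
import math

hr_per_sd, sd = 0.67, 13.0
log_hr_per_unit = math.log(hr_per_sd) / sd   # negative, since 0.67 < 1
hr_per_unit = math.exp(log_hr_per_unit)      # ~0.97 per unit of 25(OH)D
```

The same division by SD should be applied to the log-scale confidence limits before exponentiating them back.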
  • asked a question related to Epidemiological Statistics
Question
33 answers
In epidemiological studies, we often find the terms predictor and risk factor. What is the difference between them, and when do we use each? Is it also related to the statistical analysis that we use?
Relevant answer
Answer
Agree with all; to add in simple terms: risk is preconceived, especially in case-control or even in cohort studies, and what we do is re-ascertain the association using the RR; predictor terminology becomes more relevant in a descriptive or longitudinal study in which several factors are assessed, confounding is adjusted for, and population-specific predictors may be identified. However, I agree that the causality assessment sometimes makes these terms interchangeable, depending on the study design and objectives.
  • asked a question related to Epidemiological Statistics
Question
9 answers
Dear RG members. 
I'm planning to conduct a systematic review about the choice of surgical approach for the treatment of a surgical condition.
My problem is that the majority of the studies are observational, and they report results like this: "Patients with the condition were admitted to the ER. Intervention X was done in 15 of the 45 patients, while intervention Y was done in 30 of the 45 patients." These could be taken as comparison groups, but I don't want to compare them in a meta-analysis, since the data are from observational research.
Instead of conducting a meta-analysis of comparisons, I want to do a proportion meta-analysis by taking the proportion of outcome A in intervention X and the proportion of outcome A in intervention Y and pooling these proportions across studies instead of comparing them. Do you believe this is possible?
For example, if one of my outcomes is "re-operation", I would say something like: re-operation for bleeding was done in approximately 56% (95% CI XX-XX) of the patients with intervention X. On the other hand, re-operation for bleeding was done in approximately 30% of patients with intervention Y. (No comparison.)
Do you think it is better to perform a classic meta-analysis using a random-effects model, comparing outcomes between groups and across studies, or do you believe my approach is a good one?
Thanks. 
Ramiro
Relevant answer
Answer
I guess you are using Review Manager software, because it lacks this option.
I advise you to try these packages:
  1. Open Meta[analyst]: http://www.cebm.brown.edu/openmeta/
  2. StatsDirect
  3. MetaXL add on MS Excel
You can do such analysis on Open Meta[analyst] software.
Select "One Group" > "Proportions". It is very easy!
Good Luck
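If you prefer to script it, here is a minimal sketch of fixed-effect, inverse-variance pooling of proportions on the logit scale; the event counts below are hypothetical, just to illustrate the mechanics:

```python
import math

# hypothetical (events, total) per study for one intervention arm
studies = [(15, 45), (20, 60), (9, 30)]

logits, weights = [], []
for events, n in studies:
    p = events / n
    logits.append(math.log(p / (1 - p)))             # logit transform
    weights.append(1 / (1/events + 1/(n - events)))  # inverse variance

pooled_logit = sum(w * lg for w, lg in zip(logits, weights)) / sum(weights)
pooled_p = 1 / (1 + math.exp(-pooled_logit))         # back-transform
se = math.sqrt(1 / sum(weights))
lo = 1 / (1 + math.exp(-(pooled_logit - 1.96 * se)))
hi = 1 / (1 + math.exp(-(pooled_logit + 1.96 * se)))
print(f"pooled proportion {pooled_p:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A random-effects version would add a between-study variance (e.g. DerSimonian-Laird) to each weight; tools such as Open Meta[analyst] do this for you.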
  • asked a question related to Epidemiological Statistics
Question
9 answers
We will be conducting a project to assess the risk factors for developing type 2 diabetes in a young population, through statistical analysis of data generated from around 95 volunteers in a cohort study. It will involve data such as blood analysis, BMI, diet, age, etc.
What statistical tests in SPSS would be useful to include, and why?
Relevant answer
Answer
I also agree with Adrian's suggestions.  But given that your sample size is only 95, I fear that you are going to have too few 'events' to support the number of explanatory variables you are considering--e.g., if 10% of your subjects develop diabetes, you'll only have 9-10 events.  To avoid over-fitting, you should have 15-20 events per variable.  See Mike Babyak's nice article (link below) for more details.  HTH.
  • asked a question related to Epidemiological Statistics
Question
12 answers
I am a PhD student trying to see if there is a statistically significant difference after my educational intervention to minimize self-medication with antibiotics among students. I have 100 pharmacy students and 160 dental students. How can I calculate the required sample size to detect a significant difference?
thank you 
Relevant answer
Answer
Hi Khalid
Assuming that you cannot follow each student up, so that in fact you have two cross-sectional studies, your sample size of 260 would allow you to show that an effect size of 0.25 could be shown to be statistically significant with 80% power and an alpha of 0.05. In other words, a difference between pre and post of 0.25 standard deviations would be statistically significant.
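As a cross-check, the minimum detectable effect size for two independent groups of 100 and 160 (alpha = 0.05, power = 80%) can be computed with statsmodels. Note that this two-independent-groups reading gives a somewhat larger detectable effect than 0.25, so the exact figure depends on how the pre/post comparison is set up:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# solve for the effect size detectable with n1 = 100, n2 = 160
es = analysis.solve_power(effect_size=None, nobs1=100,
                          alpha=0.05, power=0.80, ratio=1.6)
print(f"minimum detectable effect size d = {es:.2f}")  # roughly 0.36
```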
  • asked a question related to Epidemiological Statistics
Question
4 answers
Please guide me: how can I use, or even convert, Mean (%) to Mean (mm)?
Relevant answer
Answer
You cannot. Here are some options:
- Simply describe the results without doing a formal meta-analysis.
- If the studies are randomised, you can calculate the effect size, and adjust for dispersion (standard deviation).
- Contact the authors for the information you require.
Sorry!
Norman.
  • asked a question related to Epidemiological Statistics
Question
6 answers
I wish to demonstrate the ordinal property of a set of items that are supposed to form a Guttman scale. I was not able to find any useful suggestion on the internet about how to compute that analysis in Stata 13 (nor by "paper & pencil").
Could anyone help me?
Thanks for any suggestion.
Relevant answer
Answer
What you are probably referring to by “scalogram analysis” is, if I recall correctly, also known as Guttman or optimal scaling (I think Kristian is also right in calling MCA a synonym). The general purpose of such optimal scaling is to construct a one-dimensional continuum for the concept of interest underlying your set of items. I am not aware of procedures for specific scalogram analysis implemented in Stata, but there are commands with a very similar goal. For instance, the MSP module (commands trace, msp, loevh; sj 11-1 st0216) performs Mokken Scale Analysis, which aims to evaluate response patterns to scales designed to be cumulative. The resulting scale satisfies the assumptions of item response analysis that are also inherent in Guttman scaling (i.e., unidimensionality, monotonicity). You end up with a set of items that is valid in the sense of an ordinal measure, and you get various indices of the degree of scalability. An advantage is that you can also analyze polytomous, not just dichotomous, items.
Maybe this helps you to go on.
  • asked a question related to Epidemiological Statistics
Question
2 answers
We assess regulatory cell markers in CD4+ T cells in patients by flow cytometry and would like to compare patients with "mild disease" versus "severe disease". My question is which statistical test I should apply and why. We compare frequencies of marker-positive cells among the CD4+ T cell population. Patient numbers are small; in one analysis we only have 13 patients in the "mild disease" group and 3 in the "severe disease" group (for other markers we have 20 versus 10, or 30 versus 15). The results range from 7-34% for the mild disease group and 32-39% for the severe disease group. We used Student's t-test throughout the results, but during the revision process one reviewer pointed out that he/she would use a non-parametric test for small n (without specifying this statement further).
My question is: which test would you use? Do you think there is a cut-off for n below which you would prefer non-parametric tests (n < 10, for example)? I already consulted a biostatistician at our institution; he said he was not convinced the data were not normally distributed, but suggested using non-parametric tests below a cut-off of 10. As this biostatistician was not at all familiar with immunological data, I would like to hear other opinions. What would you do?
Thank you for your help!
Relevant answer
Answer
That's a reviewer comment I have encountered several times. I bet 100 bucks this reviewer is not a statistician, because the statement does not make sense.
Your biostatistician is probably right to be concerned about the properties of the data. Cell frequencies may not have a symmetric distribution, so it is questionable what a test of the expected difference should reveal. It might make more sense to test the ratio instead. This, in turn, is done by simply testing the expected difference of the logarithms (i.e. a t-test on the log cell frequencies). That would make some sense. Just using some other ("non-parametric") test will only change the hypothesis you are testing, and then the question remains what hypothesis you are testing and why. This is not as clear as most people think. Although it might be technically OK to perform, say, a Mann-Whitney test, and it will provide a technically valid p-value, it is not at all clear to what hypothesis this p-value refers, nor whether that is a hypothesis you actually aim to test.
Unfortunately, you are in a bad position, because it will be next to impossible for you to (successfully) explain to the reviewer that, and why, he is wrong. And even if you did, he would likely not be willing to accept it. So this will either end in a longer battle and a possibly rejected publication, or in a more or less unquestioned fulfilment of the reviewer's wishes...
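For illustration, both suggestions (a t-test on the logs versus a Mann-Whitney test) are one-liners in scipy; the percentages below are made up to match the ranges quoted in the question:

```python
import numpy as np
from scipy import stats

# hypothetical marker frequencies (%), within the ranges given above
mild = np.array([7, 10, 12, 15, 18, 20, 22, 25, 27, 29, 31, 33, 34])
severe = np.array([32, 36, 39])

# t-test on the logs: tests the ratio of (geometric) means
t_log, p_log = stats.ttest_ind(np.log(mild), np.log(severe))

# Mann-Whitney U: tests a different, rank-based hypothesis
u, p_mw = stats.mannwhitneyu(mild, severe, alternative="two-sided")

print(f"t-test on logs: p = {p_log:.3f}; Mann-Whitney: p = {p_mw:.3f}")
```

The two p-values refer to different hypotheses, which is precisely the point of the answer above.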
  • asked a question related to Epidemiological Statistics
Question
4 answers
I have censored survival data. The Gaussian distribution appears to be an excellent fit. How can I compute an R-square statistic? The model has no covariates; it only estimated mu and sigma.
Relevant answer
Answer
The function OXS in the 'survAUC' package (R) might do the trick. It implements the coefficient proposed by O'Quigley and colleagues (2005). Other measures are included as well.
  • asked a question related to Epidemiological Statistics
Question
1 answer
I am attempting to track down reliable data on the BMI/BSA (anthropometrics) of the German population. So far, my search has not turned up suitable data. Ideally, I would be interested in data from the last 10 years.
Thank you
Relevant answer
Answer
The KiGGS study by the RKI should be a good source for children and adolescents. The RKI data collection on the adult population will also help. The RKI offers free access to survey data.
Good luck!
  • asked a question related to Epidemiological Statistics
Question
5 answers
Cystic fibrosis is increasing in our area and we need to establish a registry.
  • asked a question related to Epidemiological Statistics
Question
3 answers
Hi,
Does anybody know if there is a command in STATA that allows you to explore the relative differences in the distribution of continuous variables between groups?
I have found reference on the internet to a command called 'reldist' that was written back in 2008 but I don't think it is available via ssc download. Does anybody know if this has been superseded by anything else?
Many thanks
Elaine
Relevant answer
Answer
Have a look at Roger Newson's package -somersd-.
  • asked a question related to Epidemiological Statistics
Question
4 answers
Today there are many screening tests for a variety of diseases. Most of these tests are valid and reliable, and sensitivity, specificity, positive predictive value, negative predictive value, and other diagnostic accuracy criteria have been calculated for them. Now I want to know: how can one compare the diagnostic accuracy of two or more screening tests?
For example, we have three tests A, B, and C, and the profile of each test is as follows (each profile has been extracted from a separate article):
Test A:
Sensitivity 87%, specificity 79%, positive predictive value 75%, and negative predictive value 64%
Test B:
Sensitivity 84%, specificity 81%, positive predictive value 70%, and negative predictive value 71%
Test C:
Sensitivity 90%, specificity 85%, positive predictive value 82%, and negative predictive value of 73%
Now I would like to statistically compare the sensitivity, specificity, and other indicators of these tests and determine whether the differences between them are statistically significant.
Please suggest me related books and articles.
Please guide me. Thanks a lot
Relevant answer
Answer
Since those are proportions you can compare them statistically using comparison tests for two proportions.
AUCs are also an important indicator about diagnostic accuracy.
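For instance, if the underlying counts are available (the numbers of diseased subjects below are hypothetical), two sensitivities from independent studies can be compared with a two-proportion z-test:

```python
from statsmodels.stats.proportion import proportions_ztest

# hypothetical: test A detected 87/100 diseased, test C detected 90/100
count = [87, 90]
nobs = [100, 100]

z, p = proportions_ztest(count, nobs)
print(f"z = {z:.2f}, p = {p:.3f}")  # sensitivities 87% vs 90%
```

Note that a paired design (the same patients receiving both tests) would call for McNemar's test instead, and overall accuracy is usually compared via AUCs.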
  • asked a question related to Epidemiological Statistics
Question
4 answers
I am working with human plasma and PBMCs. I measure immunoglobulins in plasma and interleukins in stimulated PBMCs. I am unsure which measure of central tendency to use in the analysis, because I have seen some articles use the mean and others the median. Which is optimal?
Relevant answer
Answer
Which measure of central tendency you use depends on the question you are trying to answer. The shape of the distribution may give a totally different answer, and parts of the distribution may have different meanings, e.g., values greater than or less than some reference value.
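A quick illustration of why the shape matters: for a right-skewed (e.g. log-normal) sample, which is common for cytokine data, the mean sits well above the median:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed sample

print(f"mean   = {np.mean(data):.2f}")    # pulled up by the long tail
print(f"median = {np.median(data):.2f}")  # close to exp(0) = 1
```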
  • asked a question related to Epidemiological Statistics
Question
89 answers
I am currently writing a systematic review, and the majority, if not all, of my studies are descriptive. I looked for quality assessment tools and found that the
QAT is widely used: http://www.nccmt.ca/registry/view/eng/14.html, but it is geared toward intervention rather than descriptive studies.
I also came across circum which seems appropriate but I haven't seen any review that used circum before http://circum.com/index.cgi?en:appr
Do you think I should be using QAT? what other tools would you suggest?
Thank you
Mohamad
Relevant answer
Answer
Below are tools that I have previously used, are easy to use, and some of them are recommended by Cochrane:
For cross-sectional/survey studies: the NIH Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies: http://www.nhlbi.nih.gov/health-pro/guidelines/in-develop/cardiovascular-risk-reduction/tools/cohort
For intervention studies: the EPHPP tool http://www.ephpp.ca/tools.html (first two files)
For risk of bias assessment for interventions: the EPOC criteria https://www.biomedcentral.com/content/supplementary/2046-4053-3-103-S2.pdf
  • asked a question related to Epidemiological Statistics
Question
11 answers
My research concerns the distribution of disease samples using spatial analysis and statistical analysis. At the end of the work, I want to predict future hotspot clusters using the hotspot clusters detected in the statistical analysis.
Relevant answer
Answer
Thank you, but my problem is to identify the trajectory of a hotspot cluster over time, and the main problem is to predict the next step (predict x, y, and value).
With statistical analysis we can track clusters retrospectively, but I want to find a prospective analysis (prediction of the future).
  • asked a question related to Epidemiological Statistics
Question
3 answers
Dear Colleagues,
Our research group is preparing a meta-analysis of neonatal imitation.  We are hereby requesting any unpublished data, regardless of the size of the sample or the pattern of results.  
To accomplish the meta-analysis, we don't necessarily need full datasets, but we do require a description of the methodology, the resulting means and standard deviations, summary statistics, effect sizes, and/or test statistics.
If you have an unpublished neonatal imitation study that you are willing to contribute, can you please send it to Jonathan Redshaw: j.redshaw@uq.edu.au.  
Please also include a citation for the unpublished work so that we can acknowledge the source.
 
With sincere thanks,
Siobhan
Relevant answer
Dear Athanasios,
Our preliminary analyses indicate this area of research may have been influenced by publication bias. We wish to analyse and compare the results of both published and unpublished studies to establish whether there are any differences in the data.
Regards,
Siobhan
  • asked a question related to Epidemiological Statistics
Question
14 answers
What is the best clinical interpretation of an odds ratio of 0.5? Does it make sense to speak of a 200% risk (or odds) reduction for exposed people? This OR was extracted from a hospital-based case-control study.
Relevant answer
Answer
I agree with what has been written so far; anyway, please pay attention to the confidence interval before concluding that there is a protective effect. If the confidence interval contains 1, there is no statistically significant difference between exposed and unexposed!
  • asked a question related to Epidemiological Statistics
Question
4 answers
I have repeated-measures data and an outcome variable measured at a single time point. I want to see the effect of the measurement at each time point on the outcome variable, and also which time point has more influence on the outcome after adjustment for some covariates. It would be a great help if anyone could tell me the appropriate statistical method for this analysis.
Relevant answer
Answer
@Andrew, I think she is asking about a different meaning of covariance, in the context of parameters in statistical studies in biological and social sciences.  The use of the same word, covariance, in relativistic physics is an unfortunate ambiguity, and in fact one I hold in disdain.  It is very unclear in relativity and should be replaced by another word.
@Farzana, sorry, I don't actually have an answer for your question, but thought I'd try to get this thread pointed in the right direction.  : )
  • asked a question related to Epidemiological Statistics
Question
1 answer
I have 240 human plasma samples divided into 8 treatment groups (30 in each) in which I want to measure endothelial activation and other markers by Bioplex, but I don't have enough money to run all the samples. Which is the better approach: 1. pool samples from the different treatment groups, or 2. randomly select representative samples from each group?
Another question: could I run only one replicate of each sample?
Relevant answer
Answer
Hi Stefanie,
I would randomly select a subset of your patients and I think you should do duplicates, if not triplicates. You can ask Carmen Fernandez. She has a lot of experience in Bioplex and I'm pretty sure she will be happy to help.
Maria
  • asked a question related to Epidemiological Statistics
Question
62 answers
An odds ratio of 3.25 with confidence interval 0.975 to 10.901, p = 0.055. Is it significant or not? Please see the figure in the attachment.
Relevant answer
Answer
The null value for an odds ratio is 1.0, so a 95% CI of 0.975 to 10.901 includes the null value and therefore indicates that the OR is not statistically significantly different from 1.0, i.e. not significant. The p-value of 0.055 also indicates that the OR is not statistically significantly different from 1.0. So the answer is that your OR of 3.25 is not significant. As pointed out earlier, your sample size may be too small to have enough power to detect a statistically significant result if one exists.
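As a sanity check, the reported p-value can be recovered from the CI itself, because the CI is symmetric on the log-odds scale; a sketch using scipy:

```python
import math
from scipy.stats import norm

or_, lo, hi = 3.25, 0.975, 10.901

log_or = math.log(or_)
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE of log-OR from the CI
z = log_or / se
p = 2 * norm.sf(abs(z))
print(f"z = {z:.2f}, p = {p:.3f}")  # ~0.055, matching the reported value
```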
  • asked a question related to Epidemiological Statistics
Question
2 answers
Earwax composition among Caucasians, among Ethnic Indians of Canada and South America, among Asians and among Indigenous Africans.
Relevant answer
Answer
Hello Bamgboye, I am not an expert in this field, but I have read about it at times and am sharing this with you.
An international team of researchers studied the genes of people from 33 populations across the world and found ethnicity effects. Yoshiura et al (2006) indicate that human earwax comes in wet and dry types: dry earwax is frequent in East Asians, whereas wet earwax is common in other populations. In their research they identified the specific genes responsible for earwax and investigated why the different types exist. Yoshiura and colleagues are very specific as to which genes and variations are responsible; they suggest that a specific gene alters the shape of a channel that controls the flow of molecules, directly affecting earwax type.
They also found that in East Asians a mutation of the gene prevents the molecule that makes earwax wet from entering the mix. The Yoshiura et al (2006) study suggests that dry earwax is seen in up to 95% of East Asians, but in no more than 3% of people of European and African origin. Populations in Southern Asia, the Pacific Islands, Central Asia, Asia Minor, and Native North Americans and those of Asian ancestry fall in the middle, with dry wax incidence ranging from 30 to 50 percent.
  • asked a question related to Epidemiological Statistics
Question
3 answers
Actually, we are conducting a validation study of a Chinese-version questionnaire. We would like to perform a Pearson correlation test to examine (1) its concurrent reliability between telephone interview and clinician interview, and (2) its test-retest reliability by telephone interview at times 1 and 2.
The sample is 200, including 80 respondents classified as “cases”, 65 as “subthreshold cases”, and 55 as “non-cases” (in SPSS, we would code the variable “diagnosis” as 0 = non-case, 1 = subthreshold case, 2 = case). However, in the real-world population the prevalences of cases and subthreshold cases are 10% and 20% respectively, so cases and subthreshold cases are oversampled in our sample.
The study that we replicated described that “The completed clinician interviews were weighted to adjust for the oversampling of positives (i.e. case and subthreshold case)” and then described in the Statistic method that “The Taylor series linearization method was used to adjust estimates of statistical significance for the effects of weighting.”
I just wonder how to perform the weighting and Taylor series linearization in SPSS.
Relevant answer
Answer
You can also correct for over-/under-sampling by calculating fractional weights for your cases. Here's a brief overview: http://www-01.ibm.com/support/docview.wss?uid=swg21477404
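Using the numbers from the question above, the fractional weight for each stratum is simply its population proportion divided by its sample proportion, so that the weighted sample reproduces the population prevalences:

```python
# sample counts and assumed population proportions, from the question above
sample = {"case": 80, "subthreshold": 65, "non-case": 55}
population = {"case": 0.10, "subthreshold": 0.20, "non-case": 0.70}

n = sum(sample.values())  # 200
weights = {g: population[g] / (sample[g] / n) for g in sample}

for g, w in weights.items():
    print(f"{g}: weight = {w:.3f}")   # case 0.250, subthreshold 0.615, non-case 2.545

# the weighted prevalences now match the population exactly:
for g in sample:
    print(f"{g}: {weights[g] * sample[g] / n:.2f}")
```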
  • asked a question related to Epidemiological Statistics
Question
6 answers
In a systematic review, if the outcome information in the articles is not uniform (e.g. some report means, some medians, and some proportions), how can it be summarized in a forest plot?
Relevant answer
Answer
No problem mixing apples and oranges (or mangoes): you will get a... fruit salad. Provided that is the only option available to you, and you recognise and document it as such, it is better than separate piles of apples, oranges, and mangoes.
  • asked a question related to Epidemiological Statistics
Question
6 answers
MOOSE: Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group (2000)
PRISMA: Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement (2009)
Relevant answer
Answer
Besides using those guidelines for reporting a meta-analysis, you also need an instrument to evaluate each article in your meta-analysis. Multiple instruments could be used, and you should choose one appropriate for your study; I am sorry I cannot give you more specific advice, because I do not know what kinds of articles are in your study.
I used the 9-star Newcastle-Ottawa Scale in my study. You can see the article via the link:
Hope this helps!
  • asked a question related to Epidemiological Statistics
Question
2 answers
I have data on the monthly number of patients with a disease X admitted to tertiary care hospitals in Pakistan. The data are zero-inflated and show seasonality (one cycle per year) but no trend. I want to evaluate the association between the number of cases and climate variables. Could you please guide me on which analysis I should use?
Relevant answer
Answer
Greetings Tariq,
I would try a zero-inflated Poisson regression model with time-lagged predictors, a negative binomial model, or even ARIMAX/SARIMAX. You can see what the predictions look like for each model and whether they help you estimate the months with zero events. Another option, which I have used, is a Poisson model with time-lagged predictors (climate variables from +/- months) and the variable month included as a random effect (this is continuous from the start of the study or from 1-12 each year; please try both). First use a bivariate correlogram to find the time lag with the most significant seasonal correlation, then use that lag in a mixed-effects Poisson model with the fixed effect of the climate variable and the random effect of study month. This helps control overdispersion in the data and aids estimation of the zero counts in many months.
I had a similar issue with cases of legionellosis and rainfall; the negative binomial and the mixed-effects Poisson gave similar fit, but the Poisson was much easier to interpret. At the end of the day, you must ask yourself: is this truly a Poisson process? For this you must consider the outcome and the denominator.
Good luck!
Alex
  • asked a question related to Epidemiological Statistics
Question
13 answers
Dear researchers, I want to analyse the association between a disease (absence or presence: dependent variable), a SNP (independent variable), and other parameters using binary logistic regression (SPSS). Please, how can I adjust for age and sex? Thanks.
Relevant answer
Answer
Your questions reveal you are new to statistical modeling, including logistic regression. That's all right; everyone has to start somewhere.
To make it simple, I will start explaining how you interpret the results with only SNP and sex as covariates.
You have told us that you have coded women as “1” and men as “0”, and you have probably coded the presence of SNP as “1” and the absence of SNP as “0”.
In your logistic regression you want to predict the odds ratio of disease (coded "1") depending on the presence of SNP, without and with control for sex:
Without control for sex, your model simply includes the SNP variable as the only explanatory variable. If the presence of SNP predicts the disease, you will find an odds ratio higher than 1, with an associated p-value less than 0.05 (or a 95% confidence interval not including 1).
Now you want to control for sex. You do that simply by adding the sex variable to your model, so that both SNP and sex are explanatory variables. BUT before you do that, it would be a good idea first to see whether you get what you expect when you include sex as the only explanatory variable. If women have a higher risk of disease than men, then again you will find an odds ratio higher than 1, with an associated p-value less than 0.05 (or a 95% confidence interval not including 1).
Now you are ready to see what happens when you include both SNP and sex as explanatory variables. Let's imagine you find an odds ratio for SNP of 7 and an odds ratio for women of 2.
Then the interpretation is this:
  • The odds of disease for men without SNP serves as the reference and is set to "1". This is because, with the coding you have chosen, men without SNP automatically become the reference group (sex=0 and SNP=0).
  • The odds ratio for women without SNP compared to men without SNP is "2" (that is, the odds ratio found for sex).
  • The odds ratio for men with SNP compared with men without SNP is 7 (that is, the odds ratio found for SNP).
  • The odds ratio for women with SNP compared with women without SNP is also 7 (that is, the odds ratio found for SNP).
  • Finally, the odds ratio for women with SNP compared with men without SNP is 14 (that is, the odds ratio for sex multiplied by the odds ratio for SNP, 2x7).
The above interpretation is based on the assumption that there is no interaction between SNP and sex; that is, you assume the effect of SNP is multiplicative and the same for women and men. This might not be true. To check for interaction, you will need to also include an interaction term between SNP and sex in your model. In SPSS there is probably a smart way to do that, but to make it quite clear, you can create such an interaction term yourself: generate a new variable which is simply the product of the SNP and sex variables, e.g. SNP_sex = SNP x sex. This new variable takes the value "1" whenever a woman has the SNP and "0" for everyone else.
Now you repeat the previous model but now you also include the new variable “SNP_sex” in the model.
If the odds ratio for this interaction term is significant, the interpretation of your data changes. Let's say you get the following results:
Odds ratio for SNP: 8 (p=0.03)
Odds ratio for sex: 1.8 (p=0.02)
Odds ratio for SNP_sex: 0.8 (p=0.04)
Then the interpretation is this:
The effect of SNP for men is an increase in the odds by a factor of 8, whereas the effect of SNP for women is only 8 x 0.8 = 6.4, and the odds ratio of disease for a woman with SNP compared with a man without SNP is 8 x 1.8 x 0.8 = 11.52.
If the odds ratio for the interaction term is insignificant, interpret your data using the previous model, without the interaction term.
In the example above, the odds ratio for the interaction term was less than 1 (OR<1), which means that the combined effect of sex and SNP is not fully multiplicative, i.e. it is sub-multiplicative. You could also find that the effect is supra-multiplicative, i.e. OR>1.
Regards Kim
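The whole recipe (main effects first, then the interaction check) can be sketched with statsmodels on simulated data; all numbers below are made-up illustrations, generated with no true interaction, so the interaction term will typically come out insignificant:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000
df = pd.DataFrame({"sex": rng.integers(0, 2, n),   # 1 = woman, 0 = man
                   "snp": rng.integers(0, 2, n)})  # 1 = SNP present
# simulated truth: OR 7 for SNP, OR 2 for women, no interaction
eta = -2 + np.log(7) * df["snp"] + np.log(2) * df["sex"]
df["disease"] = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(int)

main = smf.logit("disease ~ snp + sex", data=df).fit(disp=0)
print(np.exp(main.params))        # ORs land near 7 (snp) and 2 (sex)

inter = smf.logit("disease ~ snp * sex", data=df).fit(disp=0)
print(inter.pvalues["snp:sex"])   # p-value for the interaction term
```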
  • asked a question related to Epidemiological Statistics
Question
3 answers
Can you suggest any databases and/or sociological studies of the life courses of thalidomide survivors in Italy? Thank you.
Relevant answer
Answer
Dear Giulia, I'm afraid I cannot be of help; I have no references on this.
How are you? May I still hope for a contribution from you later on?
Ciao
Giovanna Artioli
  • asked a question related to Epidemiological Statistics
Question
4 answers
How many cases of complications of influenza A(H1N1), in the form of meningitis or meningoencephalitis, have occurred worldwide?
Relevant answer
Were the articles useful to you?!
  • asked a question related to Epidemiological Statistics
Question
3 answers
Does anyone know if there is a population attributable risk equation for continuous outcomes?
Relevant answer
Answer
Thank you so much, James!!!
  • asked a question related to Epidemiological Statistics
Question
7 answers
I am studying long term epidemiology trends of dengue in Barbados. I am looking at ways and means to define endemicity and possibly quantify the burden of dengue so as to be able to gauge the changes in epidemiological characteristics over time.
Relevant answer
Answer
Dear Alok,
This response should help you to best define what, why, and how to approach your work and expectations.
Dengue endemicity can be defined as the quality and/or intensity of dengue being endemic (incidence, prevalence, nature and extent, geographical coverage of control programs) and can be categorized into hyper-, meso-, hypo-, and holoendemicity in a particular setting. Thus:
 It can help to appreciate the diversity of dengue.
 It can also help to understand the risk factors affecting the patterns of the vector-human interface.
 It can be useful in assessing local or imported transmission dynamics.
It can further help you to interpret local trends and burden, including the cost-effectiveness and performance of existing programs and interventions.
Thanks
Dr Tambo
Disease Surveillance Specialist & Pharmacologist
  • asked a question related to Epidemiological Statistics
Question
13 answers
I've been doing meta-analyses on medication use and/or exposure during pregnancy for a while. Because of ethical barriers to RCTs, the best evidence in this field usually comes from prospective cohort and case-control studies, both of which may have different types of biases.
My question is: would you combine the case-control and cohort data in the same meta-analysis, or would you prefer to analyse them separately? Is combining both in the same meta-analysis definitely unacceptable, or is it a choice?
I'd very much appreciate your suggestions.
Relevant answer
Answer
Have you looked online at the Cochrane guidance for the analysis of non-randomised study designs? It covers most of what has been discussed, and more.
  • asked a question related to Epidemiological Statistics
Question
8 answers
Dear all,
We have observed and catalogued clinical pneumonia patients into three groups: patients infected with bacterium P, patients infected with bacterium S, and patients infected with both P and S. The prognosis for the patients with combined P and S infection is much gloomier than for the other two groups, which suggests a synergistic effect of the two pathogens. Which statistical method should I use to analyse these three groups of samples?
Thank you. 
Relevant answer
Answer
One conceptual approach could be to take all patients combined and fit, for each quantitative clinical response variable Y, the regression model Y = alpha + beta1*x1 + beta2*x2 + beta3*x1*x2, where x1 and x2 are measures of infection by pathogens 1 and 2 respectively. The beta3 term would measure the degree of synergy between pathogens 1 and 2.
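A sketch of this interaction model on simulated data (group sizes, effect sizes, and the clinical response are all hypothetical, and it assumes a comparison group without either pathogen is also available):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 600
df = pd.DataFrame({"p": rng.integers(0, 2, n),   # pathogen P present
                   "s": rng.integers(0, 2, n)})  # pathogen S present
# simulated severity score with a true synergy (interaction) of +1.0
df["y"] = (1 + 0.5 * df["p"] + 0.5 * df["s"]
           + 1.0 * df["p"] * df["s"] + rng.normal(0, 1, n))

fit = smf.ols("y ~ p * s", data=df).fit()
print(fit.params["p:s"], fit.pvalues["p:s"])  # beta3, the synergy term
```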
  • asked a question related to Epidemiological Statistics
Question
3 answers
Does anyone know the correct data format for the poprisktime command in Stata for age-period-cohort analysis? Could someone please help me with an example?
  • asked a question related to Epidemiological Statistics
Question
49 answers
I am reviewing studies to calculate the pooled prevalence of a disease in the country. I need to calculate the pooled prevalence and plot forest plots for the overall prevalence and for each subgroup. Please suggest the type of review I should use (Methodology, Flexible, etc.) and any reference link I can read. Thanks.
Relevant answer
Answer
Hi, there is no appropriate reference; I myself wrote to the authors of other articles and figured out the method. But for a detailed explanation, if you know RevMan, these are the steps:
1) Enter your studies in RevMan
2) Go to Data and analysis
3) Click on add outcome
4) Click on the Generic Inverse Variance option
5) Enter the name of your outcome. Click next
6) The statistical method should be chosen as Inverse Variance
7) You can choose the model you wish to use. Click on next 
8) Click on the option for which CI you wish to use (90 or 95%)
9) Customize your graph labels, since I am analysing for prevalence data, I prefer that the graph labels do not read anything.
10) Please adjust the scale of the graph at this point. Click next
11) Choose the option to enter the study data. Click finish
12) Add in the study data. For this you will need your study references ready, and the prevalence data with the standard error calculated separately for each study. The formula to be used is SQRT(p*(1-p)/n).
I hope these steps help you all out !!! 
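Outside RevMan, the same calculation - a per-study SE from SQRT(p*(1-p)/n) followed by fixed-effect inverse-variance pooling - can be sketched as follows (the prevalences and sample sizes are made up):

```python
import math

# (prevalence, n) per study - illustrative numbers only
studies = [(0.12, 200), (0.18, 150), (0.15, 500)]

weights, weighted = [], []
for p, n in studies:
    se = math.sqrt(p * (1 - p) / n)   # step 12's formula
    w = 1 / se**2                     # inverse-variance weight
    weights.append(w)
    weighted.append(w * p)

pooled = sum(weighted) / sum(weights)          # fixed-effect pooled prevalence
pooled_se = math.sqrt(1 / sum(weights))
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(round(pooled, 3), [round(x, 3) for x in ci])
```

The pooled value always lies between the smallest and largest study prevalences, weighted towards the larger studies.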
  • asked a question related to Epidemiological Statistics
Question
13 answers
Specifically, my review is a systematic review of the construct of financial well-being. It is systematic because I want to follow all the guidelines for systematic reviews (e.g. PRISMA), but my aim is not to evaluate a clinical intervention's efficacy but to synthesize how previous studies defined and operationalized the construct of financial well-being.
Registries such as PROSPERO or Cochrane accept only reviews that have clinical outcomes.
Relevant answer
Answer
While registration may not be required, it is best academic practice. If you have trouble registering your review, I suggest trying to get your protocol published; this can enforce similar academic rigour to registration. I think the Centre for Reviews and Dissemination is also where the Systematic Reviews journal was born, so check out the journal and start a dialogue with the editorial staff. I would expect that financial well-being will need a bit of explanation in your protocol linking it to health, but you don't need a direct connection - PROSPERO has systematic reviews of educational interventions in it. The problem here might be that your review doesn't appear to have an outcome; it is a review of methods to define and operationalize a construct, so it may not be a systematic review in the traditional Populations, Interventions, Comparators, Outcomes, Study Design (PICOS) sense that CRD and Cochrane subscribe to. A polite conversation with the PROSPERO administrator should help you figure out what you need to do.
  • asked a question related to Epidemiological Statistics
Question
7 answers
Due to the different sampling weights, I have concerns. I have included some background info for context and questions are at the bottom.
ISSUE--Developing a database: Individual-level collision records data will be merged with community characteristics gathered from several data sources (Census, population-based surveys, other individual-level data).
Population-based survey data have different population sampling weights.
All 4 datasets will be linked to the collision data using a geographic variable contained within each dataset.
DATASETS---
Individual-level Collision Data (linking variable=drilled down geographic variable which can be modified for ease of linking to different geographic levels)
+ Census data
+ County-level population-based survey #1 (weighted at the county level & has zip code)
+ County-level population-based survey #2 (weighted at the county level & has zip code)
QUESTIONS:
1) If the appropriate sampling weights are applied to the population surveys, is it allowable to link them to the collision data and other datasets?
2) The population survey data are representative of the County. However, we would like to present data at a more granular level (ex: census tract, planning area, etc.).
  • Is there a methodological approach to accomplish this?
Thank you!
Relevant answer
Answer
You can use a multilevel model to analyze the point data at level 1 and the county data at level 2 simultaneously - it is possible to use survey weights in such models, see
It is important to use such an approach - one reason is that there will be many more points than counties, and the standard errors of the effects of variables at each level will need to be calculated appropriately.
Inferring from counties to census tracts: there is a large literature on 'small area estimation'  
for ONE  way - see David Martin's grid-based approach
  • asked a question related to Epidemiological Statistics
Question
25 answers
I have been documenting strange step-like changes in deaths in a number of countries and would like others to check and see if these observations can be replicated using small-area death statistics. Attached is a paper documenting the parallel effects of these events on medical admissions to hospital and it gives an idea of the sort of analysis which could be required.
If needed, many of the supporting studies can be accessed at www.hcaf.biz in the 'Emergency Admissions' web page - which also contains the published studies on deaths.
Much appreciated if you can assist.
Relevant answer
Answer
Dear Arthur, I have demonstrated that the increase in both deaths and medical admissions associated with these infectious-like events is diagnosis specific. Hence while analysis of all-cause mortality picks up the timing of the events the obvious next step is to analyse trends for specific conditions/diagnoses. I have done this in a preliminary way in several of the papers listed in the Emergency Admissions folder at www.hcaf.biz which can be downloaded. Have a quick read. Cheers Rod
The attached book chapter may be helpful. Type in 299174 to open a printable version. I have purchased the right to distribute this chapter.
  • asked a question related to Epidemiological Statistics
Question
2 answers
Is it possible to make an estimation of Hepatitis B Virus prevalence in a society depending on the data collected during blood donation and sample examination ??
Relevant answer
Answer
Thanks Ernest, but can we extrapolate the results somehow?
  • asked a question related to Epidemiological Statistics
Question
1 answer
I am conducting a meta-analysis and I plan to only include quasi-experimental studies if the treatment and comparison groups are closely matched on particular baseline characteristics.  I have seen "closely matched" defined many different ways in various meta-analyses.  I am looking for an efficient way to consistently determine the equivalence of groups when coding studies.  Thanks in advance for any suggestions you can provide!
Relevant answer
Answer
Hi Angela
The determination of "good matches" is typically done by comparing the standardized differences and variance ratios of each baseline covariate between the treatment and control groups. We look for standardized differences as close to zero as possible (and variance ratios as close to one as possible). A review of QQ-plots is also a recommended approach; however, you won't get a numerical value out of it.
The presentation of these values is similar to what you'd find in a typical "Table 1" in a journal article showing the baseline characteristics between groups. Here, you would add some columns to show the values for the matched groups, and the std diff and VR values, to demonstrate that there is balance in the covariates.
I am attaching a couple of papers that discuss both the numerical approach (std diffs and VR) and the QQ Plots.
I hope this helps
Ariel
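A minimal sketch of the two numerical balance diagnostics described above (the covariate values are simulated; the |std diff| < 0.1 and VR-near-1 thresholds are common rules of thumb, not universal standards):

```python
import numpy as np

def standardized_difference(t, c):
    """Standardized difference of a covariate between treated (t) and control (c)."""
    pooled_sd = np.sqrt((t.var(ddof=1) + c.var(ddof=1)) / 2)
    return (t.mean() - c.mean()) / pooled_sd

def variance_ratio(t, c):
    return t.var(ddof=1) / c.var(ddof=1)

rng = np.random.default_rng(1)
age_treated = rng.normal(50, 10, 200)   # hypothetical matched groups
age_control = rng.normal(51, 10, 200)

d = standardized_difference(age_treated, age_control)
vr = variance_ratio(age_treated, age_control)
# Rule of thumb: |std diff| < 0.1 and VR near 1 suggest good balance
print(round(d, 3), round(vr, 3))
```

Applying these two functions to every baseline covariate gives exactly the extra "Table 1" columns mentioned in the answer.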
  • asked a question related to Epidemiological Statistics
Question
3 answers
In a meta-analysis of 3 RCTs with unlucky random baseline (±SD) and end (±SD) PRO psychometric scores, Cochrane handbook section 17.7 discourages using the mean difference. The result is non-significant when the end-of-study outcome measure is used alone, but a significant benefit is shown if the change in the measure is used, due to unlucky randomisation (all three trials had a worse baseline in the treatment arm, thus reducing the end score despite significant improvement). Handbook section 16.1.3.2 advises imputing an SD for the change, but worst-case assumptions for the correlation still just approximate the average of the baseline/end SDs.
This issue was also sent to Gotzsche and Glasziou, as the benefit is 'obvious', but the conclusion is negative!
Relevant answer
Answer
Not sure if any of these thoughts cover what you're looking for:
There are (at least) 3 options for analysing the data: (1) just use outcomes at end of study, (2) use change from baseline, or (3) use outcomes at end, adjusting for baseline. The third option is more powerful and makes fewer assumptions.
In the context of meta-analysis, we want everything to be somehow on the same scale. So maybe you want all included estimates to be estimates of the difference between the means adjusting for baseline. Both (1) and (2) approximate this in the long run, so just use the one that's closest / that has the information presented.
Final thought is that if it really is "unlucky randomisation" then in the long-run, there'll be studies that are unlucky in the other direction. So maybe in the context of meta-analysis, this is just a bit of expected heterogeneity, and nothing to be scared of?
But these thoughts may not touch on what you were trying to ask. Maybe you could spell it out a bit more for me if I haven't grasped what you meant.
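The three options can be compared on simulated trial data; this sketch fits the ANCOVA by plain least squares, and all the numbers (true effect, baseline correlation, noise) are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
group = rng.integers(0, 2, n)                    # 0 = control, 1 = treatment
baseline = rng.normal(50, 10, n)
effect = 5.0                                     # true treatment effect
end = 0.7 * baseline + effect * group + rng.normal(0, 5, n)

# (1) final scores only: difference in end-of-study means
est1 = end[group == 1].mean() - end[group == 0].mean()

# (2) change from baseline: difference in mean change
change = end - baseline
est2 = change[group == 1].mean() - change[group == 0].mean()

# (3) ANCOVA: end score regressed on group, adjusting for baseline
X = np.column_stack([np.ones(n), group, baseline])
beta, *_ = np.linalg.lstsq(X, end, rcond=None)
est3 = beta[1]

print([round(e, 2) for e in (est1, est2, est3)])
```

All three estimate the same quantity in expectation under randomisation, but the ANCOVA estimate (3) typically has the smallest sampling error, which is the "more powerful" point in the answer.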
  • asked a question related to Epidemiological Statistics
Question
6 answers
This is a question for all the statisticians out there! I am conducting an interrater reliability study. I have decided that using the Intraclass Correlation Coefficient (ICC) would be the most appropriate statistical procedure, given that I have more than 2 raters involved (k=4). I have also calculated my appropriate sample size. My question relates to the type of model. I have chosen absolute agreement over consistency, as I want to measure the ratings with absolute agreement to test the reliability of utilizing a proposed assessment rubric. I am not really sure if it is a two-way mixed or two-way random model, given that I have 'randomly' selected and invited raters from different disciplines; however, all four raters are fixed and will rate the same subjects drawn randomly from a larger population pool. The raters do not know the subjects. I am inclined to think this is a two-way mixed model rather than a two-way random model. What is the general consensus? I would greatly appreciate any feedback.
Relevant answer
Answer
Thanks Saeed for that information.
Regards,
Cherie Tsingos
  • asked a question related to Epidemiological Statistics
Question
5 answers
The model includes one binary outcome (0/1; incidence rate ~1.2%), one main exposure, and 13 covariates. The whole model is significant and the goodness of fit is OK. However, the model diagnostics are quite questionable (see Fig). Almost all the observations in the category y=1 had residuals larger than 3. I wonder whether this is due to the low event rate of the outcome; can anyone give me some advice as to how to deal with this problem? If the purpose of the model is to explain an exposure-disease relationship rather than to predict, can I just ignore the model diagnostic results?
Relevant answer
Answer
Hi,
If your outcome is in binary form, then you should go for a binary logistic regression model.
A binary logistic regression model is a good and appropriate choice for your data set. From it you can identify the influential variables (those with a p-value less than 0.05, i.e., significant). You then assess goodness of fit from the chi-square p-value: if that p-value is > 0.05, you would conclude that your chosen variables fit the model well.
Kindly see the following links:
Thanks.
All the best to you.
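For illustration, a binary logistic regression can be fitted from scratch by Newton-Raphson (IRLS); this sketch uses synthetic data with known coefficients, and in practice you would use a statistics package and also check goodness of fit:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(0, 1, n)
logit = -1.0 + 1.2 * x                      # true intercept and slope
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

# Newton-Raphson (IRLS) for logistic regression
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))         # fitted probabilities
    W = p * (1 - p)                         # IRLS weights
    grad = X.T @ (y - p)                    # score vector
    hess = X.T @ (X * W[:, None])           # observed information
    beta = beta + np.linalg.solve(hess, grad)

print([round(b, 2) for b in beta])          # estimates of (intercept, slope)
```

The fitted coefficients should recover roughly (-1.0, 1.2); their Wald p-values are what the answer's p < 0.05 criterion refers to.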
  • asked a question related to Epidemiological Statistics
Question
3 answers
Hi, I am looking for recent articles on future projections of the HIV/TB co-infection burden in low-income settings (esp. sub-Saharan Africa): epidemiological studies, mathematical modelling, etc.
Thanks.
Relevant answer
Answer
The WHO Country profiles report is really comprehensive: http://www.who.int/tb/publications/global_report/gtbr14_annex2_country_profiles.pdf
  • asked a question related to Epidemiological Statistics
Question
13 answers
I am running a survival analysis using SAS. I have a very large sample size (>19,000) and I am finding very narrow CIs - for example, 1.066 (1.062-1.069). The model is also weighted because it is a complex survey with mortality follow-up. What would explain the narrow CI?
Relevant answer
Answer
As already said, the width of the CI is (also) a function of the sample size. Larger sample size -> smaller CI. Huge samples -> tiny CIs.
The width of the CI is (an estimate of) the precision obtained to estimate the parameter based on the sampling variation. Note that this precision can be arbitrarily high with arbitrarily large sample sizes - and that it does not indicate any "correctness" of the estimate. The "precision" measured by the CI is a sample-statistical feature and it must not be mistaken as "likely range of the 'true' parameter value"! (as such it is often wrongly described in stats textbooks) It is easily possible to get a very highly precise estimate that is grossly "wrong", simply because the underlying model was "wrong" (for instance missing a non-linear relationship, missing a relevant predictor or interaction) or the experiment was inadequate (for instance by non-random sampling, introducing biases, "wrong" read-outs etc.).
You first have to discuss the adequacy of the experiment, then discuss different sensible models, and then you may decide on a favorable model (for scientific, not statistical, reasons), assuming it is adequate or the best available for your purpose (you may also fit several models and see how this impacts the estimation of your parameter of interest). Then you can take the widths of the CIs as an indication of the statistical precision of the parameter estimates and discuss their location (i.e. the effect sizes) in the context of your scientific question (e.g.: is an OR of 1.066, or up to 1.069, somehow relevant?).
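The pure sample-size effect on CI width is easy to see: with a known standard deviation, the width of a 95% CI for a mean shrinks as 1/sqrt(n), so quadrupling n halves it (a generic illustration, not tied to the weighted survival model in the question):

```python
import math

def ci_width(sd, n, z=1.96):
    """Width of a 95% CI for a mean with known standard deviation sd."""
    return 2 * z * sd / math.sqrt(n)

w_small = ci_width(sd=1.0, n=1_000)
w_large = ci_width(sd=1.0, n=19_000)
print(round(w_small, 4), round(w_large, 4))
# With n = 19,000 the CI is already very narrow - precision, not correctness
```

This is why huge samples produce tiny CIs regardless of whether the underlying model is right.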
  • asked a question related to Epidemiological Statistics
Question
10 answers
I want to understand the physical significance of the product of D(ΔI) (the diffusion term, with D the diffusion coefficient) and S, the number of susceptibles.
Δ = the Laplace operator
Thanks
Relevant answer
Answer
I think Prof. James Leigh explained it nicely. I agree with the answer.
  • asked a question related to Epidemiological Statistics
Question
18 answers
I am doing a cross-sectional study, using surveys, about the prevalence of tobacco chewing. I will study the prevalence and its characteristics in the population. The survey includes sociodemographic factors and behavioural questions about risk factors. How are we going to calculate the reliability and validity for this type of questionnaire? I have adapted it from state forms for surveillance.
Relevant answer
Answer
There are many different forms of reliability and validity, so you need to specify which types of these measures you are interested in.
  • asked a question related to Epidemiological Statistics
Question
3 answers
Interested in improved glycemic control RCTs and Meta-Analysis looking at MDI or glargine injections versus CSII pump administration
  • asked a question related to Epidemiological Statistics
Question
11 answers
Hi,
I have data from a randomized controlled trial which assessed the effect of drug A versus B on stroke. Now I want to select some specific subgroup (eg. the patients with cancer) and see what the risk factors for death in this specific subgroup would be.
Please note, the trial has heterogeneous groups, e.g. some patients with cancer, some with heart disease, some with COPD, etc. Now my research question is: what are the risk factors for death in patients with cancer? That means I want to use the data for only one subgroup, but my interest is not the interventions in the trial protocol (i.e., drugs A and B). Similarly, the outcome (death) is not the one in the trial protocol (stroke).
The drug (A or B) may be one of the risk factors for death in the subgroup patients. I did some preliminary analysis for this subgroup stratified by drugs A and B, and found their baseline characteristics were not significantly different, which suggests the randomization was well preserved in this subgroup.
Now I wonder: is it feasible to do such a project? Of course, it has limitations to its validity because we use trial data (rather than cohort data). However, is it feasible and clinically meaningful to run a project that chooses a subgroup and assesses the risk factors for death in that specific subgroup? I am not interested in the effect of the drug (A or B) on stroke (as in the trial protocol) in this subgroup.
Many many thanks for any advice and suggestion in advance!!
Regards,
GL
Relevant answer
Answer
As you know, drug development is a process. It includes hypothesis generation and confirmation. The confirmatory step expects to see a well-designed and executed clinical trial with prior statement of the null and alternative hypotheses, control of the type one error rate, etc. Your proposal is fine for hypothesis generation but not confirmation. If you see something of interest (if you stare at the sphinx long enough he will wink at you) you need to decide if you want to go on to a next step. It might be an attempt to see if similar findings can be found in other data sets without doing a new randomized trial. This might be from a post hoc examination of a different clinical trial or from some other form of observational study. You could immediately consider a high-powered large clinical trial, but the risk is usually considered very high - confirmation based on your current level of data is a rather long shot.
So, if you see something it should be considered a lead, a new hypothesis, as if it was fished out of an observational data set that needs confirmation.  You could follow with a carefully designed observational study (see Rosenbaum's book on Observational Studies) and conclude with a randomized clinical trial.  It is great to have leads.  It is a great deal of work to have leads.
Alan
  • asked a question related to Epidemiological Statistics
Question
6 answers
I carried out a Dersimonian-Laird random-effects meta-analysis using Cochrane (RevMan) software to create a summary effect estimate of weighted mean difference. I notice that the distribution of study effects (n=20) is negatively skewed. Do you recommend using the Permutation Method for non-parametric RE meta-analysis (e.g. Stata)? How skewed do the estimates have to be to require this form of RE meta-analysis. Thanks.
Relevant answer
Answer
So, I think skewness is not important in your research. Normality is not an assumption for every type of meta-analysis. CMA software can help you better assess the data.
  • asked a question related to Epidemiological Statistics
Question
7 answers
Assume you have 4 treatment groups (A, B, C, D) with five subjects per group. Should one expect your two-sample t-test results and conclusion for A and B to be similar to the one-way ANOVA F-test for the A, B, C and D treatment groups?
Relevant answer
Answer
@ Sandro: ANOVA can be done with 5 subjects per group with no problem. What may be difficult is to check that its assumptions are correct (or more exactly, to detect that they aren't...), but if they are, there is no problem to use ANOVA with such sample sizes. Or are you thinking about something special?
  • asked a question related to Epidemiological Statistics
Question
8 answers
Besides the type of epidemic, how do we determine the overall severity (i.e. mild, moderate, severe)?
Relevant answer
Answer
You can also use a software called "Spectrum" from UNAIDS. When you enter certain parameters, it can produce some prediction about HIV magnitudes.
  • asked a question related to Epidemiological Statistics
Question
11 answers
I want to enter data which I have gathered through my study tool into Epi Info. I am designing the data entry form at the moment but am unsure whether I should include the answers as they are or just enter the codes pre-assigned to the responses. Also, is it possible to assign codes to the choices at a later point, or would I even need to assign codes at all?
Relevant answer
Answer
You are much better pre-coding your data on the data collection form. I would start by pre-coding the categories you want, and then do a pilot to see if any other common categories occur. 
You can run into trouble coding after the event.
1. It takes far longer than coding at the time
2. It is best to code "live", when you can follow-up or clarify incomplete or vague responses. After the data are collected you have a much reduced opportunity to do so.
For example, I worked with data where occupation included "Works for the city corporation" and "Operative". These responses were uncodable (both the street sweeper and the Lord Mayor work for the city corporation, and there was no way of knowing what the operative operated.) Pre-coding occupation would have meant that the interviewer could have clarified the person's job at the time of data collection.
A pilot is very useful to see if there are categories you didn't anticipate. But I would sooner run a pilot than have to hand-code an "other" category afterwards.
Think forward to your final report. What categories will you be using to report on your data? Use these categories, and don't be tempted to collect excessive detail or data you don't need.
  • asked a question related to Epidemiological Statistics
Question
17 answers
Meta-analyses, epidemiology, public health
Relevant answer
Answer
Very nice answer from Roger, especially about meta-regression (what you might also read as "mixed-effects models" in the meta-analysis literature). To add to Roger's response, you can think of the fixed-effect model as providing statistical inference only about the studies included in your analysis; you can use it to ask what the (weighted) average effect is among your studies. You can use the random-effects model to test similar things (the average), but it now provides inference about the larger population of studies from which your studies are drawn. The random-effects model also estimates the heterogeneity among the effects (tau squared). Significant heterogeneity (from the Q statistic and p value) is often cause for doing a meta-regression to explain that heterogeneity.
Viechtbauer (2010) has a very nice explanation and step by step walk through of these analyses tailored for R, but you may find the information useful regardless of your statistical software:
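For concreteness, the fixed-effect average, Cochran's Q, and the DerSimonian-Laird tau-squared described above can be computed by hand (the effect estimates and standard errors below are invented):

```python
import math

# (effect estimate, standard error) per study - illustrative values
studies = [(0.30, 0.10), (0.10, 0.15), (0.50, 0.12), (0.25, 0.20)]
y = [e for e, _ in studies]
w = [1 / se**2 for _, se in studies]          # fixed-effect (inverse-variance) weights

fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# Cochran's Q and the DerSimonian-Laird estimate of tau^2
Q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
df = len(studies) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / c)                 # truncated at zero

w_re = [1 / (se**2 + tau2) for _, se in studies]   # random-effects weights
random_eff = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
print(round(fixed, 3), round(random_eff, 3), round(tau2, 4))
```

When tau2 = 0 the two models coincide; larger tau2 pulls the random-effects weights toward equality across studies.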
  • asked a question related to Epidemiological Statistics
Question
4 answers
How can we get number needed to harm from pooled odds ratio in meta analysis? 
Relevant answer
Answer
Thank you for the help. When I run analysis it comes as OR, because I used that effect measure. However, when i try to change it to risk difference it becomes blank. 
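As background to the conversion itself: an odds ratio yields an NNH only after assuming a baseline (control-group) event rate; here is a sketch of the standard algebra, with a hypothetical OR and baseline risk:

```python
def nnh_from_or(odds_ratio, baseline_risk):
    """NNH implied by an odds ratio at a given assumed control event rate."""
    r0 = baseline_risk
    # convert the OR back to a risk in the exposed group
    r1 = odds_ratio * r0 / (1 - r0 + odds_ratio * r0)
    risk_difference = r1 - r0
    return 1 / risk_difference

# Hypothetical pooled OR = 2.0 with an assumed 10% control event rate
print(round(nnh_from_or(2.0, 0.10), 1))
```

Because the answer depends on the assumed baseline risk, NNH from a pooled OR is usually reported for a range of plausible control event rates.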
  • asked a question related to Epidemiological Statistics
Question
17 answers
1. If we get a significant odds ratio of very high magnitude, say more than 10 or 15, can we say that there is a very strong association?
2. If we do not get a significant association in an unadjusted model but do get one in the adjusted model, does it mean there is still an association?
Relevant answer
Answer
1. such a high OR is very rarely seen but I would say that, yes, it means a very strong association. As an epidemiologist, I would like to have more information and have a close look at the sample, confidence interval, etc.
2. This situation of course exists and looks better than the other way round. However, in my experience it is not that frequent - usually the unadjusted estimate is close to significance - and I'd put some effort into understanding why the significance has changed; in other terms, which factor(s) modified the significance once taken into account. This analysis sometimes reveals some very interesting patterns.
  • asked a question related to Epidemiological Statistics
Question
4 answers
Sudan is a country where consanguineous marriage is very common. Recently the country was divided into two countries, so we do not have a recent population count. Many diseases associated with different patterns of inheritance are common.
I would like to measure the percentage of consanguinity in the country. That is why I am looking for a biostatistical or epidemiological method with which I can answer the question of what the percentage of consanguinity in Sudan is.
Relevant answer
Answer
Hi,
A huge amount of work in early population genetics (around the 1930s) was focused on exactly this problem! The tool you are looking for might be 'F-statistics', and the related concept of the 'probability of identity by descent'. These slides show how you can calculate various measures of consanguinity from a pedigree:
If you want to dig deeper, look at anything written by Sewall Wright. Hope this helps!
Bob
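Wright's path-counting formula, F = Σ (1/2)^(n1+n2+1) (1 + F_A) summed over common ancestors A, can be sketched for simple pedigrees (a minimal illustration; first cousins give the classic F = 1/16):

```python
def inbreeding_coefficient(paths):
    """Wright's path formula: F = sum over common ancestors of
    (1/2)**(n1 + n2 + 1) * (1 + F_A), where n1 and n2 are the numbers of
    generations from each parent up to the common ancestor A, and F_A is
    the ancestor's own inbreeding coefficient."""
    return sum(0.5 ** (n1 + n2 + 1) * (1 + fa) for n1, n2, fa in paths)

# Offspring of first cousins: two common ancestors (the shared grandparents),
# each 2 generations above both parents, assumed themselves non-inbred (F_A = 0)
f_first_cousins = inbreeding_coefficient([(2, 2, 0.0), (2, 2, 0.0)])
print(f_first_cousins)   # 1/16 = 0.0625
```

Population-level consanguinity estimates then aggregate such pedigree-level F values (or use Wright's F-statistics on genetic data).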
  • asked a question related to Epidemiological Statistics
Question
5 answers
Using ILI cumulative summation currently. Is there another method that has proven valuable?
Relevant answer
Answer
Another method is to consider the dynamics of the flu: the season begins when the number of cases grows exponentially.
  • asked a question related to Epidemiological Statistics
Question
4 answers
If we are taking daily samples to study Aspergillus development for surveillance in a high-complexity hospital, which kind of epidemiological and statistical approach do you suggest: a time series or an incidence study? The outcome is a positive culture.
Relevant answer
Answer
Normally, hospital surveillance should include three important aspects:
1. Number of hospitalized cases versus the number of diseased: treated ones
2. Type of morbidity shown by persons from the earliest stage to the last stage
3. Ratio of hospital discharges to deaths.
Keep in mind that hospital-based mortality may be due to other causes as well; autopsy reports clarify the cause of death. If the infection rate is quite high and the mortality rate is low, the population has strong immunity, and hospitalized cases are likely to be cured and regain their normal life. If the infection rate is high and the mortality rate is also high, it means the incubation period of the pathogen is very short and the invasion rate of the parasite is very high; in such cases both mortality and morbidity rates may be higher.
The diseased-to-undiseased ratio, together with hospital-based mortality and mortality outside hospital, can indicate the overall status of the disease.
  • asked a question related to Epidemiological Statistics
Question
8 answers
Lots has been written about quarantine and closing borders domestically (US) but not much else.
Relevant answer
Answer
Hi Brandon,
I think that rather than thinking about "domestic" ethical issues in connection with the Ebola epidemic, we should focus on the epidemic as further evidence of Globalization diminishing the relevance of the nation-state and its imaginary boundaries.  We live in a time of fluid borders and porous nationalities brought on by ease of travel, technological innovations, an aggressive global media and an equally obdurate "global civil society" made up of transnational NGOs.  This denies us the intellectual luxury of viewing global issues such as HIV/AIDS, global warming and, now, Ebola from the narrow lens of a domestic crisis.
Remember how the 2008 "domestic" financial crisis brought on by "domestic" hubris in the U.S.  financial industry ("securitization," toxic loans, etc.) spread like wildfire to the rest of the world?  While I have described the morally challenged financial industry in terms of the weaknesses of domestic regulatory agencies (see "After Shame...." among my RG publications), I definitely do not view the lack of moral fiber in the financial industry as a "domestic ethical issue".  Indeed, it reflects a pandemic of global amorality brought on by the enhanced respectability of "Neoliberalism" as a euphemism for moral decay.
Gwen
  • asked a question related to Epidemiological Statistics
Question
13 answers
I know that this isn't recommended, but I don't know why. Can anyone explain it to me?
Relevant answer
Answer
Hi Valter,
if you have change scores and SD of change scores for all studies then you ARE allowed to perform meta-analysis on change scores and compute a SMD.  What the Cochrane book is referring to in the specific passage that you quoted is that you are not allowed to mix final scores and change scores for computation of SMD.
  • asked a question related to Epidemiological Statistics
Question
33 answers
To perform a test of significance between means of two groups is well known. But if I use the medians is it possible to test for significance between medians of two groups?
Relevant answer
Answer
When you compare medians, you should stop and ask yourself if you are interested in the difference in the medians of the two groups, or in the median difference between the observations in the two groups. The median difference is an interesting measure of effect size. It's harder to interpret the difference in medians, since we tend to extrapolate group differences to individuals, and in the case of medians, this doesn't hold. 
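The distinction can be made concrete with toy data, taking the median of all pairwise differences (a Hodges-Lehmann-style estimator) as one definition of the "median difference":

```python
import numpy as np

x = np.array([1, 2, 3, 10, 11])
y = np.array([2, 3, 4, 5, 6])

# Difference of the two group medians
diff_of_medians = np.median(x) - np.median(y)

# Median of all pairwise differences between the two groups
pairwise = x[:, None] - y[None, :]
median_difference = np.median(pairwise)

print(diff_of_medians, median_difference)
```

Here the difference of medians is -1.0 while the median pairwise difference is 0.0, illustrating that the two quantities answer different questions.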
  • asked a question related to Epidemiological Statistics
Question
5 answers
For the calculation with binary exposures, Levin's formula is available. But how does one calculate it when the study has a multi-categorical exposure and confounding variables?
Relevant answer
Answer
There is no easy way. You may have to look at each category separately to determine effect. Confounding makes it worse. You are entering the territory of a potential Simpson's paradox. See the attached.
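For the binary case the question starts from, Levin's formula is PAF = p(RR-1)/(1+p(RR-1)); the usual multi-category extension sums the excess risk over exposure levels, though neither handles confounding, which is the hard part. A sketch with invented inputs:

```python
def levin_paf(prevalence, relative_risk):
    """Levin's formula for the population attributable fraction (binary exposure)."""
    p, rr = prevalence, relative_risk
    return p * (rr - 1) / (1 + p * (rr - 1))

def categorical_paf(levels):
    """Extension to a multi-category exposure: levels = [(prevalence_i, RR_i)],
    with the unexposed/reference category omitted (its RR = 1)."""
    excess = sum(p * (rr - 1) for p, rr in levels)
    return excess / (1 + excess)

print(round(levin_paf(0.3, 2.0), 4))                  # 30% exposed, RR = 2
print(round(categorical_paf([(0.2, 1.5), (0.1, 3.0)]), 4))
```

Adjusting for confounders generally requires replacing the crude RRs with adjusted estimates, which is where (as noted in the answer) it becomes difficult.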
  • asked a question related to Epidemiological Statistics
Question
3 answers
The data were coded wrongly, mixing the false positives and false negatives (and likewise the true values), and it is impossible to recover the detailed data. Can I still do Cohen's Kappa test?
Relevant answer
Answer
I think, the answer strongly depends upon the formulation of the problem that you need to solve (and that you haven't described here).
Do I get it right that you have effectively a 2*2 contingency table of some binary ("yes/no") test results against the real existence/nonexistence of a feature (and instead of two off-diagonal elements of the table you know their sum only)? And what do you want to know about the test (and what to use it for)?
  • asked a question related to Epidemiological Statistics
Question
13 answers
I am running a multiple regression model and I am realizing there is an interaction between two covariates; with this interaction term, both covariates remain in the model while the R² increases by about 10%. Because the interaction is not with the main exposure of interest, the regression coefficient for the main exposure does not change with the inclusion of this interaction term. Should I still keep it in the model? How best do I explain the importance of this interaction to the association between the exposure of interest and the outcome? Thank you for your answers.
Relevant answer
Answer
I am generally not in favour of introducing interaction terms in a model unless they are pre-specified, because with (say) 5 independent variables there are 10 possible two-way interaction terms, and you end up with a bit of a fishing expedition! So, unless there is a good reason to suspect a significant interaction, I would be inclined not to include these terms.
  • asked a question related to Epidemiological Statistics
Question
16 answers
In Leon Gordis, these methods were explained under validity, which I felt was logically correct, but a few other books cover these measures under diagnostic agreement.
Relevant answer
Answer
A dichotomous diagnosis problem gives rise to a four cell contingency table with three degrees of freedom: TP+FP+TN+FN=N. This means that any three independent measures suffice to completely specify the outcomes. Any further measures used will necessarily be dependent on those first three independent measures.
Specificity and Sensitivity are measures of validity or informedness with respect to the disease rather than accuracy or reliability of the test, and are commonly plotted against each other in an ROC (Receiver Operating Characteristic) curve, allowing optimization of the operating point, e.g. by tuning or setting a threshold. However, what is usually plotted is the mirror image: Sensitivity (True Positive Rate) vs 1-Specificity (False Positive Rate). The best operating point in terms of maximizing informedness is the one that is furthest from the chance diagonal (TPR=FPR). Under the assumption that the (error) cost associated with the set of all negative cases matches that for all positive cases, this is also the minimum-cost solution. Informedness = ΔP = TPR-FPR = Specificity+Sensitivity-1 is the distance of the operating point above the chance line, and gives the probability of an informed decision (as opposed to guessing). Informedness = 0 corresponds to guessing, that is, operating at chance level or being on the chance line. Different cost assumptions change the gradient of the equal-cost lines from the 45 degrees of the chance line, but these isocost lines remain parallel, and the optimum-cost points are those that lie on the isocost line that is tangent to the curve closest to the point (TPR,FPR)=(1,0) or (Sp,Se)=(1,1).
The area under the curve (ROC AUC) shows not only how good this optimum operating point is for the current cost and prevalence conditions, but how much leeway there is for effective operation as they change, which goes to reliability. The Gini coefficient, Gini = 2AUC-1, corresponds to Informedness for the triangular curve joining the operating point to the chance line at (0,0) and (1,1). To the extent that an actual operating-point tuning curve dominates this, and there is a region between the curve and the triangle, this identifies places that will do better than would be expected by chance-level interpolation - the Informedness component of Gini or AUC corresponds to Certainty, and the extra-triangular component corresponds to Consistency (this ROC ConCert approach thus goes beyond evaluating a single operating point with its individual Specificity, Sensitivity, Positive Predictive Value and Negative Predictive Value, and addresses reliability).
An equivalent graphical analysis can be done with PPV and NPV, defining Markedness = PPV+NPV-1. Whereas Informedness reflects the probability of an informed diagnosis of the true condition, Markedness reflects the probability of the true condition affecting or marking the fused set of variables (test outcomes, symptoms, etc.) used for diagnosis. It tells you more about the test than the patient!
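As a minimal illustration (with invented counts, not from any real study), all of these quantities can be computed directly from the four cells of the 2x2 table:

```python
# Sketch: diagnostic measures from a 2x2 contingency table.
# The counts below are made up purely for illustration.
TP, FP, FN, TN = 80, 10, 20, 90

sensitivity = TP / (TP + FN)          # true positive rate (TPR)
specificity = TN / (TN + FP)          # true negative rate
ppv = TP / (TP + FP)                  # positive predictive value
npv = TN / (TN + FN)                  # negative predictive value

informedness = sensitivity + specificity - 1  # probability of an informed diagnosis
markedness = ppv + npv - 1                    # probability the condition marks the test

print(round(informedness, 2), round(markedness, 2))  # 0.7 0.71
```

Note that with three degrees of freedom in the table, any one of these derived measures is fully determined by the others plus N, as the answer above points out.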
  • asked a question related to Epidemiological Statistics
Question
3 answers
I am trying to collect between- and within-person variance data for common environmental and medical biomarkers. Right now, I am focusing on hydroxy-PAHs such as OH-pyrene in urine, cytokines, lipids (cholesterol) in blood, pesticides in various biological media, volatiles (chloroform, BTEX) in breath, etc.
Surprisingly, these data are difficult to find in the literature. Would love some assistance.
Relevant answer
Answer
Something of interest, even if it is a bit late.
George Box and his colleagues performed an experiment where they ran a Gauge R&R study on blood samples and cholesterol levels. George and a few others each gave many blood samples and sent 2-3 samples from each person out to different medical testing labs. The results showed that the cholesterol levels of each participant were consistent within any one lab. However, some labs showed a large positive bias and others a large negative bias. So, according to the results, some labs reported cholesterol levels so high they would require medication, while others reported levels so low they were unhealthy.
Having worked in commercial testing facilities, I know the same thing happens with environmental samples too.
I would try to set up a Gauge R&R study with repeated measures.
  • asked a question related to Epidemiological Statistics
Question
2 answers
Dear all,
I am using Epi Info 7 to create a form for survey data entry.
The survey contains a general section and sections specific for male, female and children.
The form has 8 pages, and I used the following code so that entries for children skip the irrelevant pages.
Page [P1 Demographic]
After
//add code here
IF FormCoding = "Children" THEN
GOTO 7
ELSE
GOTO 2
END-IF
However, when I fill in the form, entries for children do not skip Pages 2-6.
I have tried variations of the code above, and have used similar code on different pages.
However, none of them worked.
The variable "FormCoding" was created as an Option field.
I have also tried creating it with legal values, but that did not seem to work either.
What could be the possible reasons?
Is there any rule which I have accidentally omitted?
Please advise.
Thank you very much in advance.
Relevant answer
Answer
Hi Jose Angel,
Thanks for your response!
I want to create 3 skip patterns like the one I mentioned in my previous message.
I tried using the name of the variable instead of the page number, and removed "ELSE".
For example:
Field N7EmploymentStatus
Click
//add code here
IF FormCode = 2 THEN
GOTO 7
END-IF
End-Click
End-Field
Somehow the code above works, but the following one does not.
Field Forwhat1
After
//add code here
IF FormCode = 2 THEN
GOTO 6
END-IF
End-After
End-Field
The difference between them is that "n7EmploymentStatus" variable is an option (only one correct answer) whereas "Forwhat1" is a free text box.
Any thoughts are very appreciated.
Thank you very much.
Regards,
Ka Keat
  • asked a question related to Epidemiological Statistics
Question
4 answers
I have been using the Cox proportional model option in SPSS to estimate relative risk between two groups (which appears as the Exp B statistic in the output). But the estimates I get don't seem to reflect the survival plots. I tried to do some calculations by hand using the life table outputs. These estimates are different from the SPSS outputs. Has anyone experienced a difference between manually calculated relative risk and SPSS computed RR, and if so how did you reconcile the difference?
Relevant answer
Answer
Yes. If you perform survival analysis using the Kaplan-Meier method in SPSS, you will obtain the curves, but the results are not adjusted for confounding factors. With Cox regression you can adjust for confounders and obtain more accurate relative risk estimates, as well as survival curves estimated after adjustment.
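This difference is worth seeing concretely. As a hedged illustration (with invented counts), a crude risk ratio computed by hand from life-table style counts will generally not match the Exp(B) of a Cox model, since the latter is a covariate-adjusted hazard ratio rather than a crude risk ratio:

```python
# Minimal sketch with invented counts: the crude risk ratio from a 2x2 table
# is not the same quantity as Exp(B) from a Cox model, which is an
# adjusted *hazard* ratio, so hand calculations from life tables and
# SPSS Cox output are expected to differ.
events_exposed, n_exposed = 30, 100
events_unexposed, n_unexposed = 15, 100

risk_exposed = events_exposed / n_exposed
risk_unexposed = events_unexposed / n_unexposed
crude_rr = risk_exposed / risk_unexposed
print(crude_rr)  # 2.0
```

If the two estimates differ substantially in a real analysis, confounding by the covariates in the Cox model is the first thing to suspect.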
  • asked a question related to Epidemiological Statistics
Question
4 answers
How must sample size calculations be adjusted if a Hawthorne effect might be occurring during the study? Is there a publication concerning this problem?
Relevant answer
Answer
The Hawthorne Effect is a design issue, and cannot be "adjusted" for in analysis. You will have to design your study to limit the effect as much as possible, and then note it as a potential limitation of the study.
Aside:
Definition of the Hawthorne Effect: The effect (usually positive or beneficial) of being under study upon the person being studied; their knowledge of the study often influences their behaviour. (Last)
  • asked a question related to Epidemiological Statistics
Question
2 answers
Using observational data where the data cannot be assumed to be missing at random.
Relevant answer
Answer
Thanks Pascal. I haven't been able to determine for sure that the data are NMAR, but I don't think I can rule it out. I haven't come across any papers discussing how to identify whether data are missing at random or NMAR.
  • asked a question related to Epidemiological Statistics
Question
13 answers
I need to show the statistical power of a significant p-value obtained for a survival curve in melanoma.
Relevant answer
Answer
Hi Cecilia,
As has already been suggested, post hoc power analysis is not a good idea. However, there is more to the story: making claims based on a single univariate analysis is fraught with danger. Biological systems are universally multifactorial, so it is highly unlikely that a single factor will account for the difference in survival that you are observing. A Cox regression analysis with all your potential prognostic factors should be run, and this will enable you to give an estimate of the hazard ratio for your factor adjusted for the other factors. Obviously, you need sufficient numbers of patients to conduct a multivariate analysis - at least 10 events per covariate as a minimum.
  • asked a question related to Epidemiological Statistics
Question
7 answers
I am trying to estimate parameters driving the spatial spread of a disease. For the moment, I am using simulated data to check whether the estimation process performs well.
My objective would be to estimate the transmission parameters using a maximum likelihood approach.
So I used the optim() function in R, from which I extracted the Hessian matrix. To derive the confidence intervals, I computed the standard errors by taking the square root of the diagonal elements of the inverse of the Hessian (http://stats.stackexchange.com/questions/27033/in-r-given-an-output-from-optim-with-a-hessian-matrix-how-to-calculate-paramet). My problem is that the confidence intervals derived are too wide (the CIs for my probabilities range from 0 to 1). This problem seems to be relatively common, but I have no clue where it comes from or how to solve it.
Is there anyone to give me some tips?
PS: Here is my code:
fit <- optim(inits, MinusLogLikelihood_function, method = "BFGS", hessian = TRUE)
fisher_info <- solve(fit$hessian)      # invert the observed information matrix
prop_sigma <- sqrt(diag(fisher_info))  # approximate standard errors
upper <- fit$par + 1.96 * prop_sigma
lower <- fit$par - 1.96 * prop_sigma
Relevant answer
Answer
Hi there,
For normal standard errors, assuming the log-likelihood is well approximated by a quadratic function (I think), you can just use:
stderr=sqrt(abs(diag(solve(out1$hessian))))
You can then conduct t-tests for significance.
However, if your parameter is bounded by [0,1] as you describe it might be better to run with a bootstrap. This is quite a simple routine and just involves looping over your fit routine many times using a sub-sample of your data and storing estimated coefficients. The sub-sample is a sample of size n from your original data WITH REPLACEMENT. Each time you draw a new sub-sample you re-estimate the model and store estimated coefficients. Repeat s times.
You can then derive confidence intervals from the resultant coefficient distributions. Several methods are available; just search for bootstrap confidence intervals on the internet.
Whilst there are other methods available (profile likelihood maybe) the bootstrap is very easy to program, relatively fast (compared to say profile likelihood), and is relatively robust most of the time. Bayesian methods would also provide you with more appropriate confidence intervals.
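A minimal sketch of the percentile bootstrap described above, with an invented dataset (a simple proportion stands in for a fitted [0,1]-bounded transmission parameter; in practice you would re-run your optim() fit on each resample):

```python
import random

# Percentile bootstrap CI for a [0,1]-bounded parameter.
# Here the estimator is simply a proportion, as a stand-in for a fitted
# transmission probability; data and seed are invented for illustration.
random.seed(1)
data = [1] * 30 + [0] * 70   # 30 "infections" out of 100 contacts

def estimate(sample):
    # In a real application this would re-fit the model to the resample.
    return sum(sample) / len(sample)

n_boot = 2000
boot = sorted(
    estimate([random.choice(data) for _ in range(len(data))])  # resample WITH replacement
    for _ in range(n_boot)
)
lower, upper = boot[int(0.025 * n_boot)], boot[int(0.975 * n_boot)]
print(lower, upper)  # stays inside [0, 1] by construction
```

Unlike the Wald interval from the inverted Hessian, the percentile interval respects the parameter's natural bounds, which is why it behaves better for probabilities near 0 or 1.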
Hope that helps.
-Daniel
  • asked a question related to Epidemiological Statistics
Question
19 answers
For example, suppose a certain method was chosen for power/sample size calculation for a future experiment, with the intent to use the same method to analyze the experimental data. But the experimental data turned out to violate the assumptions of this method. What is an appropriate strategy to analyze these data?
Relevant answer
Answer
@Igor : How big is the deviation from the assumption? About which variances are we talking by the way? Where does the assumption of linear growth fit in the model? What is the exact model? Is time taken as a predictor variable? In case it is, why use MANOVA (as that's far from the best option for longitudinal data)? Did you check for autocorrelation?
All questions a statistician would ask you before even trying to answer your question.
Did you think about including random factors in the model? Why not using a Generalized Linear Mixed Model, or a Generalized Additive Mixed Model? These can deal more correctly with longitudinal data than MANOVA. GAMMs come the closest to the approach you suggest, but then in a statistically correct manner. They also require quite some data for a stable fit, so I won't say that these are your best option either.
So again: take everything to a statistician and let him/her take a closer look at it. Without the actual data and research questions it's impossible to even come close to a sensible advice for a "best practice".
Your question is the equivalent of:
"I have a headache instead of food poisoning, so which alternative diet would you suggest?".
Without trying to understand the nature of the headache by physical examination, not one doctor will suggest a diet (and rightfully so!). Tons of herb enthusiasts will, though.
  • asked a question related to Epidemiological Statistics
Question
49 answers
If we conducted a control study without determining the sample size and power of study, is it possible to calculate the power of study at the end (after data collection is completed)?
Relevant answer
Answer
No! If you want to look post-hoc, look at the confidence interval instead.
Why would you look at power for a study you have completed? Arguably you would do it because you wanted to know whether or not you could trust a negative result.
The argument would go something like this "I didn't get a statistically significant result, but then for an effect size of x my power was only 50% so this doesn't really tell me very much."
But if you look at the confidence interval you will see the range of values that are consistent with your data, and if this includes an important effect size, then you know that your study was uninformative. Confidence intervals are almost always more informative than significance tests.
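As a small illustration of this point (with invented counts), a Wald 95% CI for a difference in proportions makes it easy to check whether an effect size you would care about is still inside the interval:

```python
import math

# Sketch: instead of post-hoc power, report a confidence interval and check
# whether it still contains an effect size you would care about.
# All counts are invented for illustration.
events_a, n_a = 20, 100   # treatment arm
events_b, n_b = 28, 100   # control arm

p_a, p_b = events_a / n_a, events_b / n_b
diff = p_a - p_b
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
lower, upper = diff - 1.96 * se, diff + 1.96 * se

important_effect = -0.10  # say, a 10-point absolute risk reduction
print(round(lower, 3), round(upper, 3))
# If important_effect lies inside (lower, upper), the study was uninformative
# about it, even though the test was "not significant".
```

Here the interval straddles both zero and the important effect, so the non-significant result rules nothing out, which is exactly the situation the answer above warns about.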
Of course, for a non-significant result, if you calculate power using the effect size seen in your study you are bound to get low power. You then have a beautifully circular argument for resurrecting your hypothesis and concluding that your experiment just wasn't big enough. So never do that.
If you are doing a genuinely post-hoc analysis - that is, trying to use power analysis to make sense of the results of a study you have completed, not to plan the next study - then the basic rules are:
1. Don't do post-hoc power analysis;
2. If you really must do post-hoc power analysis, don't do it yet;
3. If you are forced to do it now and can no longer delay, make sure that you never use the effect size observed in your results.
Even a priori, power analyses are based on a whole load of assumptions about the nature of the response, the variances and the effect size. Always remember to look at power under a range of scenarios, and remember that we tend to be over-optimistic about both effect sizes and variances!
Good luck.
  • asked a question related to Epidemiological Statistics
Question
6 answers
We are in the process of analyzing the data from a case control study to identify the various risk factors for esophageal cancer and want to assess the possible interaction among such factors in determining the risk associated with certain genetic markers.
Relevant answer
Answer
Amir: You can use the i. prefix before categorical variables:
. regress mpg i.foreign##i.rep78
note: 1.foreign#1b.rep78 identifies no observations in the sample
note: 1.foreign#2.rep78 identifies no observations in the sample
note: 1.foreign#5.rep78 omitted because of collinearity
Source | SS df MS Number of obs = 69
-------------+------------------------------ F( 7, 61) = 4.88
Model | 839.550121 7 119.935732 Prob > F = 0.0002
Residual | 1500.65278 61 24.6008652 R-squared = 0.3588
-------------+------------------------------ Adj R-squared = 0.2852
Total | 2340.2029 68 34.4147485 Root MSE = 4.9599
-------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
foreign |
Foreign | -5.666667 3.877352 -1.46 0.149 -13.41991 2.086579
|
rep78 |
2 | -1.875 3.921166 -0.48 0.634 -9.715855 5.965855
3 | -2 3.634773 -0.55 0.584 -9.268178 5.268178
4 | -2.555556 3.877352 -0.66 0.512 -10.3088 5.19769
5 | 11 4.959926 2.22 0.030 1.082015 20.91798
|
foreign#rep78 |
Foreign#1 | 0 (empty)
Foreign#2 | 0 (empty)
Foreign#3 | 10 4.913786 2.04 0.046 .1742775 19.82572
Foreign#4 | 12.11111 4.527772 2.67 0.010 3.057271 21.16495
Foreign#5 | 0 (omitted)
|
_cons | 21 3.507197 5.99 0.000 13.98693 28.01307
-------------------------------------------------------------------------------
The double hash (##) specifies a full factorial interaction: the main effects plus the interaction term.
Cheers,
Robert