Medical Statistics - Science topic

Explore the latest questions and answers in Medical Statistics, and find Medical Statistics experts.
Questions related to Medical Statistics
• asked a question related to Medical Statistics
Question
We're conducting a study designed as follows:
• An observational longitudinal study
• Time period: 5 years
• Myocardial infarction (MI) patients without prior heart failure are recruited (we'll call the number of such patients after 5 years of the study A)
• Exclusion criteria: death during the MI hospitalization, or no follow-up data for 3-6 months after discharge.
• Outcome/endpoint: heart failure post MI (confirmed by an ejection fraction (EF) < 40%)
• These patients will then be followed up for a period of 3 to a maximum of 6 months. If their EF during the 3-6 months after discharge is < 40%, they are considered to have heart failure post MI (we'll call the number of such patients after 5 years of the study B).
• Otherwise they are not considered to have the aforementioned outcome/endpoint.
My questions are as follows:
1. What is A/B best called? Is it cumulative incidence? We're well aware of studies similar to ours, but the one main difference is that they did not limit the follow-up time (i.e., a patient can be considered to have heart failure post MI even 4 years after recruitment). I wonder if this factor limits the ability to calculate cumulative incidence in our study?
2. Is there a more appropriate measure to describe what we're looking to measure? How can we calculate incidence in this study?
3. We also wanted to find factors associated (risk factors?) with heart failure post MI. We collected some data about the MI's characteristics and the patients' comorbidities during the MI hospitalization (when they were first recruited). Can we use a Cox proportional hazards model to calculate the HRs of these factors?
Hi,
The study starts with cohort A, and on follow-up, a patient whose EF is < 40% moves into group B. This shift means that "survival" (remaining event-free in group A) decreases over time, i.e., survival analysis is applicable. Since factors affecting survival are to be examined, the Cox proportional hazards model is applicable. Survival curves are cumulative curves.
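As a minimal sketch of the two quantities discussed here, the code below (pure Python, with invented counts and follow-up times, not the questioner's data) computes the simple cumulative incidence B/A and a Kaplan-Meier style survival estimate that accounts for patients censored during the 3-6 month window:

```python
def cumulative_incidence(events: int, at_risk: int) -> float:
    """Proportion of the cohort developing the outcome: B / A."""
    return events / at_risk

def kaplan_meier(times, observed):
    """Kaplan-Meier survival estimate at the last follow-up time.
    times: follow-up time (months) per patient;
    observed: 1 = heart failure observed (EF < 40%), 0 = censored."""
    surv = 1.0
    at_risk = len(times)
    for t, d in sorted(zip(times, observed)):   # process in time order
        if d:                                   # event: multiply by (1 - 1/n_at_risk)
            surv *= 1 - 1 / at_risk
        at_risk -= 1                            # event or censoring shrinks the risk set
    return surv

print(cumulative_incidence(events=30, at_risk=200))   # 0.15
print(kaplan_meier([3, 4, 5, 6], [1, 0, 1, 0]))
```

With complete follow-up and a fixed window, B/A and 1 minus the Kaplan-Meier estimate at 6 months coincide; they diverge once there is censoring.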
• asked a question related to Medical Statistics
Question
Hi, we are comparing the mortality of two therapies in COVID patients. We have identified 74 patients in our hospital records (details in the attached image). We also have vital signs and lab data for these patients taken at different intervals.
Our idea is using logistic regression with mortality/discharge as endpoint, adjusted by patient status on admission.
Is this sample size enough for this kind of analysis? If not, how do you suggest we analyze this data?
Use one of the following.
A- odds ratio,
B- relative risk,
C- risk differences.
Put group on the rows and outcomes on the columns.
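The three measures suggested above can be computed directly from such a 2x2 table (groups on the rows, outcomes on the columns). A sketch with hypothetical counts, not the questioner's actual data:

```python
def two_by_two(a, b, c, d):
    """a, b = deaths / survivors under therapy 1; c, d = deaths / survivors under therapy 2."""
    risk1, risk2 = a / (a + b), c / (c + d)
    return {
        "odds_ratio":      (a / b) / (c / d),   # cross-product ratio
        "relative_risk":   risk1 / risk2,
        "risk_difference": risk1 - risk2,
    }

# hypothetical split of the 74 patients into two therapy arms of 37 each
print(two_by_two(a=10, b=27, c=15, d=22))
```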
• asked a question related to Medical Statistics
Question
The specific queries are;
1. The baseline value in a study is 47.9±13.6 (mean±SD) and the percentage reduction following the intervention is 69.1±14.9 (mean±SD). How do we get the final post-intervention SD value by calculating the percentage reduction from the baseline SD?
2. The baseline value is 1702.7±876 (mean±SD) and the average decrease after the intervention is 38.27% (95% CI for percent change: -58.59 to -17.94; p=0.0047). How do we calculate the final SD value post-intervention?
Can anybody please guide us on whether it is possible to calculate the SD using any formula, with the above details?
My pre-intervention values in group A and group B are 83.78±38.16 and 216±121.73, and the post-intervention values are 36.20±29.75 and 35.40±14.71. In this scenario, how do we calculate the percentage reduction in values in both group A and group B? Please help.
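The reduction in the means can be computed directly from the figures quoted in the question. Note, however, that the SD of a percent change cannot be recovered from the two SDs alone without the pre/post correlation, so only the reduction in means is sketched here:

```python
def percent_reduction(pre_mean: float, post_mean: float) -> float:
    """Percentage reduction of the mean: (pre - post) / pre * 100."""
    return (pre_mean - post_mean) / pre_mean * 100

print(round(percent_reduction(83.78, 36.20), 1))   # group A: 56.8
print(round(percent_reduction(216.0, 35.40), 1))   # group B: 83.6
```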
• asked a question related to Medical Statistics
Question
Hello,
I have two data sets -
1) 10 years of data on what percentage of the total patients in a hospital emergency room are diabetic
2) 10 years of data on what percentage of the total patients admitted in the same hospital are diabetic.
While I can easily compare the above data in the form of two linear trend lines (X-axis - years, Y-axis - Percentage of total patients being diabetic ), I wanted to ask how to statistically compare the two trend lines?
Thank you
If your interest is in the linear trend, then plot year on the x-axis and % of patients on the y-axis, fit the trend, and identify the slope values.
Rearrange the data by month and you may repeat the same analysis month-wise.
Then you will have 10 slope values for both sets; then attempt a t-test based on the slopes. If you have subject details, maybe you can calculate subject-wise slopes as well.
best!!
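One simple version of the slope comparison described above can be sketched in pure Python: fit a least-squares line to each 10-year percentage series, then compare the two slopes with an approximate z-test built from their standard errors. The percentages below are hypothetical:

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, standard error of the slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid_var = sum((y - (intercept + slope * x)) ** 2
                    for x, y in zip(xs, ys)) / (n - 2)
    return slope, math.sqrt(resid_var / sxx)

years = list(range(10))
er  = [8.0, 8.4, 8.9, 9.1, 9.8, 10.0, 10.6, 10.9, 11.5, 11.8]    # % diabetic, ER
adm = [12.0, 12.1, 12.4, 12.3, 12.7, 12.8, 12.9, 13.2, 13.1, 13.4]  # % diabetic, admitted

b1, se1 = fit_line(years, er)
b2, se2 = fit_line(years, adm)
z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)   # approximately normal under H0
print(b1, b2, round(z, 2))
```

This is equivalent to testing the year-by-series interaction term in a combined regression, which dedicated software would do in one step.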
• asked a question related to Medical Statistics
Question
In my project, I have many variables but a very small sample (non-parametric). I'm trying to prove a link between two variables while correcting for covariates. In short, we are looking at white matter tracts and their links to visuospatial function and quality of life (QoL). We are analyzing 3 tracts, which can either be normal, displaced or ruptured (ordinal) and we have 8 scores for visuospatial abilities. Let's define variables:
• Y1 = tract 1 integrity (3 categories)
• Y2 = tract 2 integrity (3 categories)
• Y3 = tract 3 integrity (3 categories)
• X1 to X8 = visuospatial test scores (quantitative discrete; could be dichotomized as pass/fail)
Others:
• X9 = QoL score (quantitative discrete)
• X10 ... = age, gender, etc. (these covariates are not important right now, I'll add them to my study later)
1) The thing is that while we are looking at the links between the integrity of the first tract and visuospatial function (Y1 and X1-X9), for example, there is a possibility that the second tract (Y2 and/or Y3) is also affected, and thus Y2 might be the one responsible for the deficit and not Y1 (I need to correct for Y2 and Y3 to show that Y1 is responsible alone). I was thinking of logistic regression, but I'm not sure how to treat Y2 and Y3.
2) There is a second part to the project. Each patient undergoes surgery, and we want to see whether the change in the integrity of a tract (Y) correlates with the change in visuospatial capabilities (X) pre and post surgery (so if a white matter tract is repaired, does visuospatial function return, and conversely, if we disrupt a tract in surgery, does visuospatial function decrease?). So it's like repeated measures (2 times), but I still have the same issue with Y2 and Y3 vs Y1 (if Y1 is repaired, but Y2 is not and Y3 is ruptured during surgery, for example)... I was thinking about a generalized estimating equation (mixed model), but am still not sure how to treat my variables.
3) We also want to do 1) and 2) with QoL instead of visuospatial tests (I guess I'll use the same test)
I had the idea of categorizing differently, but my sample will probably be too small for this:
Y1 = tract 1 only
Y2 = tract 2 only
Y3 = tract 3 only
Y4 = tracts 1 and 2 (but how can I capture the difference between tract 1 displaced and tract 2 ruptured vs. both ruptured or both displaced?)
Y5 = tract 2 and 3 (id.)
Y6 = tract 1 and 3 (id.)
Y7 = all tracts
I looked into non-parametric tests (such as Mann-Whitney, Kruskal-Wallis, Pearson, tau, etc.) but none can be applied to my research (many variables interact with each other...)
Sorry for the vague terms; I used simple terms to explain this to someone and forgot to edit them in this post (and English is not my first language, sorry!). I want to find the correlation (or show there is a relationship) between my variables. My hypothesis is that when Y1 is normal, visuospatial tests are normal, and as we go to displaced and ruptured, the visuospatial tests get worse (probably linearly). Then, if we find a correlation between the integrity of the tract and visuospatial skills, I want to show that if we "repair" the tract during surgery, the visuospatial deficits are reversible and the patient will score better on the tests. I have some background in stats, but not in biostats, and neither does the statistician I contacted...
Exactly; in the literature those 3 tracts have some unclear link to visuospatial abilities, which is why we chose them. They should be independent (meaning that when only Y1 is ruptured, for example, there is a decrease in visuospatial score, and the same for Y2 and Y3 taken alone). But if Y1 and Y2 are both ruptured, I want to show that the score is even worse.
• asked a question related to Medical Statistics
Question
When I search articles to find the correlation coefficient between two variables, I often get only regression equations showing the relationship between the two variables.
For example: total length of femur = 32.19 + 0.16 (segment 1)
From this equation, can I calculate the value of the correlation coefficient between total length of femur and segment 1?
Yes, you can. The beta value and correlation value are exactly the same when the standard deviations of both variables are the same.
If we know the standard deviations of both variables and the beta value, we can easily calculate the correlation coefficient. Please check out the example below
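For simple linear regression the conversion is r = b x (SD_x / SD_y). A sketch with the slope from the question and loudly hypothetical SDs (the question does not report them):

```python
def r_from_slope(b: float, sd_x: float, sd_y: float) -> float:
    """Correlation from a simple-regression slope: r = b * SD_x / SD_y."""
    return b * sd_x / sd_y

# b = 0.16 is from the quoted equation; the SDs for segment 1 (x) and
# total femur length (y) below are invented for illustration only.
print(round(r_from_slope(b=0.16, sd_x=8.0, sd_y=2.0), 2))   # 0.64
```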
• asked a question related to Medical Statistics
Question
Hi everyone!
I have an enquiry about statistical analysis. I have looked through many forums and still cannot solve my problem.
I want to compare means of two groups of data but only with two measurements. For example:
Treatment A:
Before treatment
After treatment
Treatment B:
Before treatment
After treatment
In this case, I would like to test whether there is any significant difference between the two groups. But I couldn't use one-way ANOVA since there are only two measurements.
Hope anyone can help me! Thank you very much!
You can run an ANCOVA (analysis of covariance), i.e. a regression:
Post = Pre + Treatment/Group (in that order). The assumption is that there is no 'Pre*Group' interaction, i.e., homogeneity of regression slopes.
If this assumption is not met, you can use the Johnson-Neyman technique to remediate.
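The ANCOVA-as-regression idea can be sketched in pure Python by solving the normal equations for post = b0 + b1*pre + b2*group; b2 is then the baseline-adjusted group effect. The data below are made up so the fit is exact (a real analysis would of course use dedicated software):

```python
def ols(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):
        p = A[i][i]                                  # pivot
        A[i] = [v / p for v in A[i]]; b[i] /= p
        for j in range(k):
            if j != i:                               # eliminate column i elsewhere
                f = A[j][i]
                A[j] = [vj - f * vi for vj, vi in zip(A[j], A[i])]
                b[j] -= f * b[i]
    return b

pre   = [10, 12, 11, 13, 10, 12, 11, 13]             # hypothetical baselines
group = [0, 0, 0, 0, 1, 1, 1, 1]                     # 0 = treatment A, 1 = B
post  = [12, 14, 13, 15, 15, 17, 16, 18]             # post = pre + 2 (A) or pre + 5 (B)
X = [[1, p, g] for p, g in zip(pre, group)]
b0, b1, b2 = ols(X, post)
print(round(b2, 2))   # adjusted group effect: 3.0
```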
• asked a question related to Medical Statistics
Question
We wish to test for both true positive and true negative instances. As tests are done separately for each disease with the device, should we consider them as 3 separate tests, or should it be considered a single test for the purpose of sample size calculation?
If you want to validate the device for all 3 diseases in one step, then you should also consider this in the sample size calculation. You might want to look for multiple co-primary endpoints.
Best Matthias
• asked a question related to Medical Statistics
Question
I'm trying to assess the severity of symptoms among GI patients pre and post intervention over time (post 12+ months). The dataset provided collected 1000 responses at 7 time-points (pre-surgery, 0-1 mo., 2-3 mo., 4-5 mo., 6-8 mo., 9-11 mo., 12+ mo.) from 767 unique patients (who responded only once). The total N for each time point varies between 80-185. Data were collected using a survey with a total score ranging between 0-50. Scores were then categorized into 3 groups: none, moderate, and severe. I was asked to conduct a group x time interaction, but am unsure how to do it with unequal time intervals, different participants at each time point, and data that are not normally distributed. When assessing proportions per group, the results for moderate and severe show a "U" shape. The only additional data given were gender and age.
Currently, I've only been able to do chi-square, or Fisher's exact test as needed. But I would like to do additional statistical tests, and would like to ask what would be the best ones to do?
I have limited stats experience and have access to GraphPad Prism. Any recommendations on how to best review the data would be extremely helpful.
Thank you.
I apologize for the delay of response. Yes, I would encourage you to follow PI's advice. My input is based on my training in statistical science, but by all means, I do not claim to be an expert in your field. Please take information below as a mere suggestion for alternative approach to answer your research question.
Having said that, here is my elaborated version of the previous response: it is my understanding that your interest is in evaluating 'slopes' at each time point by group. For instance, your 'none' group shows a positive slope between 'pre-treatment' and '0-1 month', whereas the other two groups show negative slopes.
You can run linear or generalized linear mixed effect model to quantify time-specific, age- and sex-adjusted slopes at the individual level, and extract those values and save them as data. You would then have individual-level slope information at multiple time points, which in turn multiple comparison tests can be used to evaluate slopes at each time point by group. For instance based on the graph you shared, your 'none' group slope would be statistically different to the other two groups for obvious reasons (positive slope vs. negative slope).
As you take this longitudinal 'piece-wise' regression approach, the benefits are: 1) you can control for auto-correlation for repeated observations; 2) you have a secondary dataset that can be manipulated further (for tables and graphs); 3) you can also make your time variable as continuous and predict individual's outcome based on the formula you built. (This would be a possible solution for your unequal time interval problem).
In my training, to do any kind of meaningful trend analysis, you need more intensive time points. You may consider computing a simple change for the two periods and running a simple binomial hypothesis test, but the problem with this approach would be centering the sample mean, which can be tricky. I hope this information was not overwhelming. Once again, take it as a potential alternative approach. Good luck.
• asked a question related to Medical Statistics
Question
I want to teach about Bayes conditional probability.
I would like to use data about the probability of having a number of symptoms for a number of common diseases.
I am surprised that this is not easily found!
Where can I get some data (symptoms probabilities by disease) to use on my class?
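While waiting for real symptom-by-disease data, a classroom exercise can run on loudly hypothetical numbers; the sketch below applies Bayes' theorem to an invented disease with prevalence 1%, sensitivity 90% of a symptom given disease, and a 5% symptom rate without disease:

```python
def posterior(prior: float, p_symptom_disease: float, p_symptom_healthy: float) -> float:
    """P(disease | symptom) via Bayes' theorem."""
    evidence = p_symptom_disease * prior + p_symptom_healthy * (1 - prior)
    return p_symptom_disease * prior / evidence

# all three inputs are hypothetical teaching values, not real epidemiology
print(round(posterior(0.01, 0.90, 0.05), 3))   # 0.154
```

The punchline for students: even with a sensitive symptom, a rare disease stays fairly unlikely after one positive observation.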
• asked a question related to Medical Statistics
Question
Hi all,
I've been reading up on parametric survival analysis, especially on accelerated failure time models, and I am having trouble wrapping my head around the family of distributions.
I am aware that there are the exponential/Weibull/log-normal/log-logistic distributions. But what I can't seem to find a clear and consistent answer on is: which of the following is actually assumed to follow one of those distributions? Is it the survival time T, the log survival time ln(T), the hazard function h(t), the survival function S(t), or the residuals ε?
Dear Shaun,
Yes, if you have enough data then the empirical density estimates will be very helpful. If you know enough about the process(es) by which T was generated this may be enough on its own. In other cases one might make decisions by comparing various models.
Take care,
• asked a question related to Medical Statistics
Question
I am a bit stuck with the following problem. I am currently performing a meta-analysis of observational studies using CMA, where my outcome variable is performance on a cognitive test as a function of adherence to an exposure variable (on a scale of 0 to 9). Normally, the results are presented either as mean differences between tertiles according to the level of adherence (low, middle or high) or as a regression coefficient per additional unit of the exposure variable.
My main question is: how can I pool both types of studies into the same meta-analysis? I have found a similar question on risk estimates that suggests estimating the linear trend from the categorical results. I don't have access to raw data and I only know sample sizes, mean differences and confidence intervals. Is it possible to do the same in this case? If so, how should I do it?
I was thinking of just including each tertile comparison as a subgroup of the same study, and leave the continuous variables as they are. But I am not sure if this lousy approach is acceptable.
Thanks.
Maybe you can refer to the Cochrane Handbook; Section 9.4.6 describes the problem.
9.4.6  Combining dichotomous and continuous outcomes
Occasionally authors encounter a situation where data for the same outcome are presented in some studies as dichotomous data and in other studies as continuous data. For example, scores on depression scales can be reported as means or as the percentage of patients who were depressed at some point after an intervention (i.e. with a score above a specified cut-point). This type of information is often easier to understand and more helpful when it is dichotomized. However, deciding on a cut-point may be arbitrary and information is lost when continuous data are transformed to dichotomous data.
There are several options for handling combinations of dichotomous and continuous data. Generally, it is useful to summarize results from all the relevant, valid studies in a similar way, but this is not always possible. It may be possible to collect missing data from investigators so that this can be done. If not, it may be useful to summarize the data in three ways: by entering the means and standard deviations as continuous outcomes, by entering the counts as dichotomous outcomes and by entering all of the data in text form as ‘Other data’ outcomes.
There are statistical approaches available which will re-express odds ratios as standardized mean differences (and vice versa), allowing dichotomous and continuous data to be pooled together. Based on an assumption that the underlying continuous measurements in each intervention group follow a logistic distribution (which is a symmetrical distribution similar in shape to the normal distribution but with more data in the distributional tails), and that the variability of the outcomes is the same in both treated and control participants, the odds ratios can be re-expressed as a standardized mean difference according to the following simple formula (Chinn 2000):
SMD = (√3/π) × ln(OR) ≈ 0.5513 × ln(OR)
The standard error of the log odds ratio can be converted to the standard error of a standardized mean difference by multiplying by the same constant (√3/π = 0.5513). Alternatively, standardized mean differences can be re-expressed as log odds ratios by multiplying by π/√3 = 1.814. Once standardized mean differences (or log odds ratios) and their standard errors have been computed for all studies in the meta-analysis, they can be combined using the generic inverse-variance method in RevMan. Standard errors can be computed for all studies by entering the data in RevMan as dichotomous and continuous outcome type data, as appropriate, and converting the confidence intervals for the resulting log odds ratios and standardized mean differences into standard errors (see Chapter 7, Section 7.7.7.2).
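The Chinn (2000) conversion quoted in the handbook excerpt can be sketched in a few lines; the odds ratio and its standard error below are invented for illustration:

```python
import math

C = math.sqrt(3) / math.pi          # the 0.5513 constant from Chinn (2000)

def or_to_smd(odds_ratio: float, se_log_or: float):
    """Re-express an odds ratio (and the SE of its log) as a standardized mean difference."""
    return C * math.log(odds_ratio), C * se_log_or

smd, se = or_to_smd(odds_ratio=2.0, se_log_or=0.30)   # hypothetical study result
print(round(smd, 3), round(se, 3))                    # 0.382 0.165
```

The pair (SMD, SE) can then go straight into a generic inverse-variance meta-analysis alongside the continuous-outcome studies.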
• asked a question related to Medical Statistics
Question
Hello everyone!
I have a theoretical problem with a statistical analysis. I was looking a lot at different fora but I could not find an easy explanation for my problem.
I want to compare means of two groups of data. In a simple case, I would use "t-test". However, in each group, I have few measurements for each individual. First, I wanted to measure a mean for every individual in a group, then compare the means of groups, but I know that it is not a good idea (mean of means/average of averages...).
For example how to compare this data sets:
GROUP A:
Individual 1: 5, 6, 7
Individual 2: 5, 7, 7, 6
Individual 3: 6, 7, 7
GROUP B:
Individual 1: 4, 5, 4
Individual 2: 7, 8, 6
Individual 3: 4, 3, 4, 2
You can imagine two groups of people: A - treated, B - untreated. In each group there are 3 people, and some variable was measured with 3-4 repeats.
As you can see, there are two groups made of a few individuals, for whom a few repeated measurements were made. I would like to compare the two groups using means calculated for individuals, not a simple mean over the whole group.
I have read a lot about pooled data, weighted means etc., but I still do not know how to perform a t-test in this case (or another test).
I hope you can help me!
Dear Chuang Liu,
Kindly explain the idea behind the "nested-test" as mentioned by you.
Also, mention the types of hypothesis that can be tested by this method along with the test statistic, with the probability distribution followed by it, used in the method.
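One simple, commonly used alternative to a nested test is the "summary measures" approach: average within each individual first, then compare the two small groups of individual means with a two-sample t-test. A pure-Python sketch using the example numbers from the question (note this ignores the differing numbers of repeats per individual):

```python
import math
from statistics import mean, stdev

group_a = [[5, 6, 7], [5, 7, 7, 6], [6, 7, 7]]      # 3 treated individuals
group_b = [[4, 5, 4], [7, 8, 6], [4, 3, 4, 2]]      # 3 untreated individuals

def welch_t(xs, ys):
    """Welch two-sample t statistic (unequal variances)."""
    mx, my = mean(xs), mean(ys)
    vx, vy = stdev(xs) ** 2, stdev(ys) ** 2
    return (mx - my) / math.sqrt(vx / len(xs) + vy / len(ys))

means_a = [mean(ind) for ind in group_a]            # one summary number per individual
means_b = [mean(ind) for ind in group_b]
print(round(welch_t(means_a, means_b), 2))
```

With only 3 individuals per group, the degrees of freedom are tiny, so the honest conclusion may simply be that the sample is too small; a mixed model with individual as a random effect is the fuller treatment.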
• asked a question related to Medical Statistics
Question
If the correlation coefficient (r) is 0.6, is it considered a low/moderate/high relationship between variables, especially in medical research?
Statistics Corner: A guide to appropriate use of Correlation coefficient in medical research. M.M Mukaka. Malawi Medical Journal; 24(3): 69-71 September 2012
0.90-1.00: very high; 0.70-0.90: high; 0.50-0.70: moderate; 0.30-0.50: low; 0.00-0.30: negligible
• asked a question related to Medical Statistics
Question
I am a beginner and having a hard time finding statistics reference for obesity/pancreatic cancer.
Thank you
Kindly see the following RG links and PDF attachments.
• asked a question related to Medical Statistics
Question
I ran a network meta-analysis. I am trying to run Egger's and Begg's tests for different subgroups using these commands in Stata:
metabias logrr selogrr, egger by(treatment)
metabias logrr selogrr, begg by(treatment)
but I get this response:
option by(treatment) not allowed
For those who are interested, the Statalist thread can be viewed here:
• asked a question related to Medical Statistics
Question
Dear fellow researchers,
Does anyone hold any useful information (or possibly a reasonable estimate) on the incidence rates (by age groups, if possible) of lactose intolerance and/or lactase deficiency in US?
• asked a question related to Medical Statistics
Question
Very frequently, in papers devoted to parallel clinical trials, I encounter a situation where the calculated SD of the effect in each group is approximately equal to the SD of the effect difference between the two groups.
An example may be found, e.g. in the following paper (Table 2):
Pelubiprofen achieved an efficacy on the VAS scale of 26.2 with SD = 19.5. Celecoxib achieved an efficacy of 21.2 with SD = 20.8. However, the difference is 5.0 with SD = 20.1! I was expecting an SD ~√2 times larger, since the samples are independent and have approximately equal sizes.
Amr Muhammed , yes, to be more precise, the weights will not be exactly 1/2 but will depend on the group sizes. But in the mentioned table (and many other similar papers) this SD is attributed to the difference: difference mean = 5.0, difference SD = 20.1... Probably it is some kind of conventional notation for a pooled sample SD, but it seems too confusing.
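The arithmetic behind this surprise can be checked directly with the SDs quoted above. Assuming roughly equal group sizes, the reported 20.1 is consistent with a pooled SD (about 20.2 with equal weights), not with the SD of the difference in means (about 28.5):

```python
import math

sd1, sd2 = 19.5, 20.8                              # group SDs from the table

sd_of_difference = math.sqrt(sd1**2 + sd2**2)       # what the asker expected (~sqrt(2) larger)
pooled_sd = math.sqrt((sd1**2 + sd2**2) / 2)        # equal-weight pooled SD

print(round(sd_of_difference, 1), round(pooled_sd, 1))   # 28.5 20.2
```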
• asked a question related to Medical Statistics
Question
Please write about all kinds of statistics and more.
Our aim is to reach and help new researchers by sharing medical statistics.
Best regards
The new research methodology and statistics book by our group is at the publication stage, and we look forward to your questions and contributions.
• asked a question related to Medical Statistics
Question
I am looking into factors affecting grip strength in sonographers and am taking it further, but struggling to work out which statistical test to use.
I have compared factors affecting grip strength, considering age, BMI, gender, years scanning, past injury and self-reported estimates of the percentage of obese patients scanned. I am now looking at some of those in relation to each other, e.g. injury vs. the estimated number of obese cases, years of scanning, and age.
I am really struggling with how to decide which statistical test is correct for looking at these factors against each other; any suggestions appreciated.
Which statistical tests should be used when analyzing factors associated with stunting among children (0-24 months)?
1. Birth order
2. Education (parents)
3. Hb (Mother)
4. Breast Feeding
5. Complementary feeding
6. Gender
Regards,
• asked a question related to Medical Statistics
Question
Hello,
I have noted contradictory advice from statisticians on how to model time-varying covariates in a repeated measures mixed effect model. For instance, you may have BMI measured every month as the exposure and a blood biomarker measured at the same time (or maybe different times) every month as the outcome. If you wanted to determine the effect of BMI throughout the follow-up period then how would you do this?
Some statisticians say you don't do anything differently (at least not in Stata coding) because, in long format, Stata can determine which variables are level 1 and level 2 by whether or not they vary by time-point. Other statisticians state that you need to create new variables that capture the between- and within-person effects on the outcome.
Just wondering what you all thought of this.
Thank you,
Spencer
Here is some Stata Code for explanation:
mixed blood bmi || id:, mle /* classic model with entangled effects */
ssc install center                      /* center is a user-written (SSC) command */
bys id: center bmi, prefix(z_) mean(m) /* fixed effect by hand */
egen m_bmi=mean(bmi), by(id) /* random effect by hand */
mixed blood z_bmi m_bmi || id:, mle /* hybrid model by hand */
ssc install xthybrid
xthybrid blood bmi, clusterid(id) family(gaussian) link(identity) /* automated hybrid model */
xthybrid blood bmi, clusterid(id) family(gaussian) link(identity) cre /* Mundlak model */
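The centering step that `center` and `egen` perform in the Stata code above can be illustrated in plain Python: split the time-varying covariate into a person mean (carrying the between-person effect) and a deviation from that mean (carrying the within-person effect). The BMI values are hypothetical:

```python
from statistics import mean

bmi = {                       # monthly BMI measurements per subject id (invented)
    1: [24.0, 24.5, 25.0],
    2: [30.0, 29.5, 29.0],
}

person_mean = {i: mean(v) for i, v in bmi.items()}            # analogue of m_bmi
deviation = {i: [x - person_mean[i] for x in v]               # analogue of z_bmi
             for i, v in bmi.items()}
print(person_mean[1], deviation[1])   # 24.5 [-0.5, 0.0, 0.5]
```

Entering both pieces in the mixed model is what makes the within-person slope interpretable separately from the between-person slope.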
• asked a question related to Medical Statistics
Question
Suppose I have a questionnaire for stress assessment that contains 30 questions; each question has 5 answers (0 - no stress, 1 - mild stress, 2 - moderate, 3 - high stress, 4 - severe stress). The total score of the 30 questions varies from 0 to 120.
How can we categorize the total score (range 0-120) into mild, moderate and severe? Which cutoffs should I take for mild, moderate and severe?
Hello there,
Step #1:
Compute the overall mean score for the items of that variable of interest
Step #2:
Make cutoff points using a calculator ((Maximum - Minimum) / n). How? In the case of a five-point Likert scale, you may, for example, have assigned the value (1) for Strongly Disagree, (2) Slightly Disagree, (3) Neutral, (4) Slightly Agree, (5) Strongly Agree. So in order to make cutoff points, you have to do this simple math, as per the formula stated above: (5 - 1) / 3. Upon calculation you will get the value 1.33. This is the interval value.
* "Highest" refers to the highest score of the given Likert scale (5, in our example)
* "Lowest" refers to the lowest (1)
* "n" refers to the number of CATEGORIES you intend to create
Step #3:
Do the math for the three categories (Low, Mid & High). How? Just add up the score of your interval value to the three category, as in the following:
Low (1.00 - 2.33),
Mid (2.34 - 3.67),
High (3.68 - 5.00)
Step #4:
If you are using SPSS, enter these category-related values by navigating through "Transform", and "Recode into Different Variable". Here you will need to recode the Mean score we've obtained in Step# 1. After you give it a new Name and Label, click on "Old and New Values". A new window will pop up. Select "Range", and then insert the values of the categories given above. You should give a value for each range (i.e. 1 for Range no 1), in the New Value empty field. Then click "Add". Follow the same routine for the other range values of your categories, 2 & 3. Click "Continue", and then "Ok".
Step #5:
Go to the Data Editor page and scroll down to find your new created variable. You'll need to click on Value to make labels. Please add 1 in the Value field, and in the Label field the word "Low", and thus 2 for (Mid), 3 for (High). Click "Add" each time you make a new try, and then "Ok".
That's all.
Cheers,
Ahmed
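The same equal-interval convention applied to the questionnaire in the question gives an interval of (120 - 0) / 3 = 40. A sketch (note this is just the arithmetic convention from the answer, not a clinically validated cut-point):

```python
def cutoffs(lowest: float, highest: float, n_categories: int):
    """Equal-interval category boundaries: (highest - lowest) / n_categories per step."""
    step = (highest - lowest) / n_categories
    return [(lowest + i * step, lowest + (i + 1) * step)
            for i in range(n_categories)]

print(cutoffs(0, 120, 3))   # [(0.0, 40.0), (40.0, 80.0), (80.0, 120.0)]
```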
• asked a question related to Medical Statistics
Question
Can anyone point me to a way to test for the normality of censored data in R? I have heard of the Cramér-von Mises statistic, but the paper is behind a paywall.
• asked a question related to Medical Statistics
Question
I have a survey with dichotomous variables and need to do a factor analysis. I am working with SPSS and am not very familiar with how to write the syntax for that. I have seen some examples, but they are too complicated for me.
Does anyone know of a step-by-step procedures how to do tetrachoric factor analysis?
Or, can anyone help me with that process?
Thank you very much.
Tatyana
Hello Dr. Greg Camilli.
I am working on running an EFA for dichotomous data with 2000 observations. I read your previous comment in the ongoing discussion (which was really helpful).
I am curious whether using "ml" in "fa" is a good approach when dealing with dichotomous data. I am a new learner of IRT and EFA; I just want to make sure that I understand its application correctly.
Tatyana,
I'd use R. You need to install the psych package, as previously noted. Create a data file x of dichotomous responses and then:
out <- tetrachoric(x, y=NULL, correct=TRUE, smooth=TRUE, global=TRUE,
                   weight=NULL, na.rm=TRUE, delete=TRUE)
mla <- fa(out$rho, nfactors=Q, covar=TRUE, rotate="oblimin", fm="ml")
Although it's very simple, this approach is time-consuming with large data sets, and it is not so good in the presence of missing data (not to mention that factoring tetrachorics is a very old technology). A modern approach would be obtained with the software flexMIRT or IRTPRO.
• asked a question related to Medical Statistics
Question
I was wondering if anyone had any resources on how to do a pooled prevalence in R? Is it possible to have a forest plot as a result? Any help would be greatly appreciated.
Thanks
Dearbhla
You can use the meta package in R.
First step: run the metaprop command and store the result in an object (e.g. metaprop_results). The metaprop function needs three arguments: the number of events per study (event), the study sample size (n), and the dataframe in which they are stored (df).
metaprop_results <- metaprop(event = your_events, n = your_n, data = df)
Second step: plot your results as a forest plot.
forest(metaprop_results)
• asked a question related to Medical Statistics
Question
Let's say I want to compare the trend of CRP and the trend of WBC over a period of time of the same patient.
As I understand it, a correlation coefficient compares two variables' paired values at single time points, whereas what I want is to compare the trend of CRP with the trend of WBC.
Which test would be appropriate for that?
The statistical approach to be applied in this research project depends upon the type of trend values available.
If the values of the trends are available, then a test of the significance of the difference between the trend of CRP and the trend of WBC can be applied.
This will show whether there is any significant difference between the trend of CRP and the trend of WBC.
However, if the dependence of one on the other (between the trend of CRP and the trend of WBC) is of interest, then regression analysis can be applied.
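One hedged, concrete possibility for "comparing trends" is to correlate the day-to-day changes of the two markers rather than their raw values, which asks whether CRP and WBC move in the same direction over time. The daily values below are invented:

```python
import math

crp = [120, 95, 80, 60, 45, 30, 22]              # hypothetical daily CRP (mg/L)
wbc = [15.0, 13.5, 12.0, 11.0, 9.5, 8.0, 7.5]    # hypothetical daily WBC (10^9/L)

def pearson(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

d_crp = [b - a for a, b in zip(crp, crp[1:])]    # first differences = daily change
d_wbc = [b - a for a, b in zip(wbc, wbc[1:])]
print(round(pearson(d_crp, d_wbc), 2))           # 0.54
```

Differencing removes the shared downward drift, so the remaining correlation reflects co-movement rather than the fact that both markers simply fall during recovery.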
• asked a question related to Medical Statistics
Question
I am after some suggestions on what statistical analysis I can perform to show a before-and-after effect in a longitudinal electronic healthcare record (EHR). I have N number of EHRs, of varying sizes/time-spans. Each record has a history of recurrent disease records (for the one disease). To see whether a particular drug has had an effect on the disease outcome (duration before the next relapse), I have used time-gap recurrent cox regression.
However, I would now like to see whether the disease outcome (a series of remissions into relapses; good = long durations in between, bad = short durations in between) changes immediately from the first prescription of a particular drug. In my head, I imagine taking all of the records (of varying time-spans, which is very important to remember) and shifting them so that every record overlaps at the point where the drug of interest is first prescribed. The y-axis is disease prevalence or risk, and the x-axis is time. Before the initial drug prescription, disease prevalence/risk should be high; after crossing the initial prescription time, disease prevalence/risk should drop. This would help demonstrate the efficacy of the drug.
Some points to remember: 1) each medical record may be unique in timespan; 2) the first prescription event of a particular drug will happen at different times across the record set; 3) some records may have no medical events before the drug was prescribed (as all the diseases of interest fell after the drug prescription of interest); 4) the number of medical events either before or after the first prescription of the drug may be sparsely populated (making binning by time very difficult) or richly populated.
Is there a name for this kind of analysis? I am using R. Any suggestions are very welcome.
It looks like your data description fits an interrupted time-series analysis.
• asked a question related to Medical Statistics
Question
Hi,
I am looking for a method to compare the likelihood ratios of two non-binary diagnostic tests performed on the same group of patients.
Specifically, I have the results of two different antibody staining results that have 4 possible scores (e.g. 0, 1+, 2+, 3+). I then compared the results to the gold standard, which is a binary result (e.g. FISH +ve, or FISH -ve). Using this information, I was able to generate a likelihood ratio and 95% CI for each score.
I found a paper on how to calculate interactions between two estimates (e.g. risk ratios or LRs) by analyzing the values on a log scale to generate a z-score [Altman and Bland (2003). Interaction revisited: the difference between two estimates]. Would this be valid on my sample? I think there was a mention that the test only works for independent measurements, and should not be used on two estimates from the same patients. The two antibodies are independent tests, but does it make the test invalid if I compared their results on the same set of patients?
Because a patient's underlying biology influences both results, the two estimates are to some extent dependent on each other. Say a subject is in poor health: this will tend to lower both antibody results compared with a healthy subject. I assume the test you used requires independent (random) samples for that reason.
My suggestion is to split your subject sample randomly and use one part for diagnostic test 1 and the other part for test 2, then compare those results as you did. (Be sure to have a large enough sample and to split it randomly, or the effect estimate will be biased.)
I hope this helps you a little.
• asked a question related to Medical Statistics
Question
Hi all
I am new to the field of medical statistics, and I would like a link to videos on how to perform a meta-analysis.
If you are very new to medical statistics, you can register for the course 'Introduction to Systematic Review and Meta-analysis' for free on Coursera. The course is taught by the Bloomberg School of Public Health at Johns Hopkins University. It will help you gain some understanding of how to perform and interpret a meta-analysis.
Thanks.
• asked a question related to Medical Statistics
Question
For disease X there exists no real, modern gold standard. The disease can be diagnosed with 4 diagnostic tests (A,B,C,D), all leading to a yes or no answer (binary).
I have a data set with results of those 4 tests of 100 persons. Not every test has been run in every person. Now I want to calculate the sensitivity of test A.
In order to obtain a relative sensitivity of test A (relative to B, C and D), could I
- sum up all the positive results of tests B, C and D for those cases in which test A was run (the "believed positives", "BP")
- and divide this number by all positive results of test A?
Is that legitimate?
Furthermore, I want to calculate the relative sensitivity of test B in the same way (sum up all the positive results of A, C and D for those cases in which test B was run, and divide this number by all positive results of test B), and likewise for C and D (same procedure).
Is that a legitimate way to compare relative sensitivities in my data set? Is there any literature confirming or strengthening this procedure?
Thank you very much!
Why not use culture or PCR as a gold standard for calculations?
• asked a question related to Medical Statistics
Question
I am comparing two popular shoulder scores (which have different maximum possible values) and would like to produce a conversion formula using a simple linear regression model, but I have found heteroskedasticity (unstandardized residuals vs. the independent variable).
How can I correct for heteroskedasticity without using any transformation approach, since a transformation would affect the final linear regression equation?
Nawaf is predicting one measure from another, using the regression coefficient (b) as a "conversion" factor.  WLS (weighted least squares) should be used given the heteroscedasticity in the error structure.  I hope I have now stated that more clearly.
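For illustration, a minimal weighted-least-squares sketch (the data and weights below are hypothetical; a common choice is weights proportional to 1/variance of each observation, so noisier scores count less):

```python
# Minimal weighted-least-squares (WLS) sketch with hypothetical data.
# Weights down-weight observations with larger error variance.
def wls_fit(xs, ys, ws):
    """Return (intercept, slope) of the weighted least-squares line."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    b = (sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
         / sum(w * (x - mx) ** 2 for w, x in zip(ws, xs)))
    a = my - b * mx
    return a, b

# Hypothetical shoulder-score pairs with per-observation weights:
a, b = wls_fit([10, 20, 30, 40], [25, 44, 67, 82], [1.0, 1.0, 0.5, 0.25])
print(a, b)  # intercept and slope of the weighted conversion line
```

The fitted slope and intercept then serve as the conversion formula, without transforming the scores themselves.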
• asked a question related to Medical Statistics
Question
• Reviewers and Editors operate at the cutting edge of science, at a frontier where fact and fancy/fiction intermix, at the border of the measurable and the immeasurable, on the slippery slope of insight where a nebulous cloud stubbornly refuses to lift sometimes for decades or centuries, and at times battle with conflict of interest if they themselves are active researchers in that field.
• What are the qualities of an ideal reviewer? What is the role of instinct in review? Can any two or three reviewers have the same mental horizon, the same willingness to consider new proposals with equanimity, the same ability to understand the complex mathematical game of medical statistics, the same ability to see through the written lines and the hedging terms used and the claims of originality or being the first to present a view or an investigation, and to judge with impartiality the ultimate objective of the authors who in general fervently wish to place a stake in the field as if buying a piece of real estate?
• Has any editor ever recused herself/himself on the grounds of conflict of interest? Should they?
• How can journals compensate reviewers for their time and effort?
• Since the reviewer-editor combine forms the most significant gate-keeper function for science, this column should produce lively discussion and contribute to a better general understanding for all stakeholders, both authors and reviewers.
• I have been a reviewer also for around 2 decades now for several high profile medical journals (see file). I will also participate in the discussion that will surely follow to hopefully usher in a better future for medical/scientific publishing.
In a scientific sense, transparency is great. Do science without political or social consequences, a pure search for fact/truth. However, few people work that way.
In a social/political sense I am not so certain that this is a good idea. In an open system, the mistakes you make can haunt you forever, as author or as reviewer. It is hard enough reading the published literature; adding all the reviewer comments and author responses is too much. The only thing I can see happening from making drafts and reviews public is that people with an axe to grind will be the ones to sift through them to find trouble.
I agree that sometimes the flaws are more interesting. Yet multiple analyses are "confusing" and an "over-analysis of the data." The more frequent comment is to make the methods shorter, get rid of extraneous material, and keep it simple.
• asked a question related to Medical Statistics
Question
I want to conduct research on malnutrition (stunting, wasting and underweight) in under-five children?
/*(For NFHS-3)
For Underweight (-SD), Stunting (-SD), Wasting (-SD)*/
use IAKR52FL
* HW70-HW72 store z-scores multiplied by 100; codes >=9996 are flagged/implausible
gen haz06=hw70
replace haz06=. if hw70>=9996
gen waz06=hw71
replace waz06=. if hw71>=9996
gen whz06=hw72
replace whz06=. if hw72>=9996
/* For -2SD */
gen below2_haz = ( haz06 < -200)
replace below2_haz=. if haz06==.
gen below2_waz = ( waz06 < -200)
replace below2_waz=. if waz06==.
gen below2_whz = ( whz06 < -200)
replace below2_whz=. if whz06==.
/* For -3SD */
gen below3_haz = ( haz06 < -300)
replace below3_haz=. if haz06==.
gen below3_waz = ( waz06 < -300)
replace below3_waz=. if waz06==.
gen below3_whz = ( whz06 < -300)
replace below3_whz=. if whz06==.
label variable haz06 "Length/height-for-age Z-score (stunting)"
label variable waz06 "Weight-for-age Z-score (Underweight)"
label variable whz06 "Weight-for-length/height Z-score (Wasting)"
label variable below2_haz "Stunting (-2SD)"
label variable below3_haz "Stunting (-3SD)"
label variable below2_waz "Underweight (-2SD)"
label variable below3_waz "Underweight (-3SD)"
label variable below2_whz "Wasting (-2SD)"
label variable below3_whz "Wasting (-3SD)"
save IAKR52FL, replace
• asked a question related to Medical Statistics
Question
We are testing a new diagnostic tool and comparing it to the actual gold standard for this diagnosis.
Briefly, we examined 25 patients with the new diagnostic tool (test A) and the gold standard diagnostic tool (test B). Test A gives a positive result or a negative result (no variability or range in numbers, just "positive" or "negative" as outcome). We then performed test B which also gives a "positive" or "negative" results and which is considered the true result since this is the gold standard diagnostic tool.
All patients having a positive result on test A (n=18), had a positive result on test B (n=18).
Of all patients having a negative result on test A (n=7), 5 were negative on test B but 2 were positive on test B.
Overall, 23 of the 25 patients had the same outcome on test A and test B (92% agreement). Taking test B as the truth, the sensitivity of test A is 18/20 = 90% (and its specificity 5/5 = 100%); the 92% figure is the overall accuracy rather than the sensitivity.
Can you recommend me any more statistics on this data, to draw conclusions? Any idea to look at this data from another perspective? Any help or insight is appreciated.
Thank you
good question I follow
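As a starting point for further statistics, the 2×2 counts in the question give the standard accuracy measures directly (taking the gold-standard test B as the truth); a minimal sketch:

```python
# Counts from the question, taking gold-standard test B as the truth.
tp, fp = 18, 0   # test A positive: B positive / B negative
fn, tn = 2, 5    # test A negative: B positive / B negative

sensitivity = tp / (tp + fn)                 # 18/20 = 0.90
specificity = tn / (tn + fp)                 # 5/5   = 1.00
ppv = tp / (tp + fp)                         # 18/18 = 1.00
npv = tn / (tn + fn)                         # 5/7   ≈ 0.71
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 23/25 = 0.92

print(sensitivity, specificity, ppv, npv, accuracy)
```

Exact (Clopper-Pearson) confidence intervals around these proportions, and Cohen's kappa for agreement, would be natural next steps given the small sample.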
• asked a question related to Medical Statistics
Question
This is my first ever medical statistics/epidemiology questions, so please be patient if I come across as naive, I normally focus on drug and protein chemistry.
I have a huge medical dataset. From this set I have divided up the population by certain characteristics: a particular disease, an age group, and gender. I have four diseases to consider over seven age groups for each gender, thus 56 sub-populations. For each sub-population I determine the prevalence of reported comorbidities. This leaves me with 56 lists of "disease (a name) - prevalence ((0-1])". That is a lot of data. Putting aside any particular hypothesis or descriptive question, how would you go about displaying this amount of data in a report or a publication?
I agree with Michael S. Martin.
• asked a question related to Medical Statistics
Question
What is the best medical statistical software? I have used SPSS but want to try something easier. A friend mentioned MedCalc, but the licence costs $450. Is it worth it?
Using R is the best option, as Carmen suggested; it can do everything, though it is not so easy to learn.
Another option is Epi Info, which is specially designed for medical researchers and is free as well.
I hope it helps.
• asked a question related to Medical Statistics
Question
In modeling time to event data using a proportional hazards regression approach for repeated events, in which some patients have multiple events, the situation is often conditional since a patient can only have a subsequent event if they had a previous event. For example, a cardiac patient having one or more arrhythmias after heart surgery or a metastatic breast cancer patient having multiple recurrences or progressions of their disease after chemotherapy treatment. Are there useful ways of estimating the hazard ratio with reliable standard errors in these kind of recurrent event processes? It would seem that the correlation between the events within each patient or subject should be accounted for in the model.
Dear Dr Zurakowski,
Among the most widely applied approaches in the multiple failure-time framework are extensions of Cox's regression model (all of them proportional hazards models). According to Kelly and Lim (2000), several extensions have been proposed to address the particularities of each case study. For recurrent event analysis, I recommend the Prentice, Williams and Peterson (PWP) model or the Andersen and Gill (AG) model. The PWP model is suggested when the risk of subsequent events is affected by the previous ones, and the AG model applies when all events carry the same risk of occurrence. Read the following paper by Amorim and Cai (2015) to easily understand the implementation of these two models: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339761/
Further information can be found on my researchgate profile: https://www.researchgate.net/profile/Ivo_Sousa-Ferreira
Best regards,
Ivo Sousa-Ferreira.
• asked a question related to Medical Statistics
Question
Let's say recruitment period is X months after which randomization to exercise intervention groups occurs. The outcome is changes in depression. What are some methodological and ethical issues in choosing to defer treatment until after the recruitment period. The other option would be randomization and start of intervention as participants are recruited, but I can think of many reasons why this would be tough to do.
Dear Spencer:
May I say that your question leaves out details that would enable one to provide a cogent response to you.  However, let’s assume that all subjects are already undergoing treatment with standard of care for depression and will remain on such treatment(s) during the study.  If that is the case, then it will depend on what it is you are doing during the intervening period between recruitment and randomization.  Another consideration or question is when you consent subjects and what you tell them this period is to be used for.  There are a number of other points of consideration that come to mind, but in deference to pithiness in this forum I shall stop with the ones I’ve highlighted above.  Good luck
Cheers,
Chuke
• asked a question related to Medical Statistics
Question
I would like to use propensity score matching to measure the treatment effect between a control and a treated group.
Doing it in SPSS 22 after installing the R plugin is easy, but I would like to understand the output and measure the effect.
1. Why Propensity Scores Should Not Be Used for Matching - Gary King
https://gking.harvard.edu/files/gking/files/psnot.pdf (Dec 16, 2016) "We show that propensity score matching (PSM), an enormously popular ... approximates random matching which, we show, increases imbalance ..."
2. An Introduction to Propensity Score Methods for Reducing the ...
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/ (Jun 8, 2011) "Propensity score matching entails forming matched sets of treated and untreated subjects who share a similar value of the propensity score ..."
3. Propensity Score Matching: Definition & Overview - Statistics How To
http://www.statisticshowto.com/propensity-score-matching/ (Feb 3, 2017) "Statistics Definitions > Propensity Score Matching. What is a Propensity Score? A propensity score is the probability that a unit with certain ..."
4. Propensity-Score Matching (PSM)
http://cega.berkeley.edu/assets/cega_events/31/Matching_Methods.ppt "Propensity score matching: match treated and untreated observations on the estimated probability of being treated (propensity score). Most commonly used."
5. Understanding Propensity Scores
http://bayes.cs.ucla.edu/BOOK-09/ch11-3-5-final.pdf "11.3.5 Understanding Propensity Scores. The method of propensity score (Rosenbaum and Rubin 1983), or propensity score matching (PSM), is the most ..."
Dennis Mazur
• asked a question related to Medical Statistics
Question
Effect size is reported in the literature in multiple ways; one common form is the risk ratio. Using the risk ratio from a paper as a reference for sample size calculations in a new study is difficult for me, as it doesn't give me enough inputs for the sample size calculation.
I don't know if there is a formula to convert a risk ratio to Cohen's d, but you can certainly convert an odds ratio to Cohen's d. And I suppose if you have a 2x2 table you can calculate both a risk ratio and an odds ratio, so I recommend using the odds ratio.
Michael Borenstein's book "Introduction to Meta-Analysis" (2010) gives the formulas for converting an odds ratio to Cohen's d and vice versa in chapter 7. I attach the link to that chapter.
Also, there are sites where you can do the calculations automatically; I'll post one link that I found useful.
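The chapter 7 conversion referred to above is the Hasselblad-Hedges logit method, d = ln(OR) × √3 / π; a short sketch:

```python
import math

# Odds ratio <-> Cohen's d conversion (Hasselblad & Hedges logit method,
# as given in Borenstein et al., "Introduction to Meta-Analysis", ch. 7).
def or_to_d(odds_ratio):
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

def d_to_or(d):
    return math.exp(d * math.pi / math.sqrt(3))

print(or_to_d(2.0))  # an OR of 2 corresponds to a small-to-medium d (~0.38)
```

The resulting d can then feed a conventional sample size calculation; note the method assumes an underlying logistic-distributed continuous outcome.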
• asked a question related to Medical Statistics
Question
knowing that the outcome y has two options only
the treatment [0,1]
What T tests do you know?
• asked a question related to Medical Statistics
Question
I'm conducting a meta-analysis on the hypoglycemic risk associated with diabetic drugs. Some studies report only the incidence rate of hypoglycemic events and the number of patients. Data are in the following format: 3.2 hypos/patient/year with 100 patients on drug A versus 2.1 hypos/patient/year with 100 patients on drug B; the event can occur more than once in each patient. Is there any way of estimating the standard error when neither confidence intervals nor standard deviations are given? Thanks
I'm not clear on what information you have, but if you have an incidence rate (i.e., cases per person-time) then you can assume a Poisson distribution.  Suppose you have 150 cases over 20,000 person-years.  Your incidence rate (IR) would be cases/(total pt):  150/20,000 or 75 cases per 10,000 person-years.  The variance for the IR is cases/((total pt)^2):  150/(20000^2) or 3.75 x e-7.  Your standard error the square root of this:  sqrt( 3.75 x e-7) or 0.00061237.   Thus your 95% CI for the IR (0.0075)  would range from 0.0063 to 0.0087.  So if you only have the incidence rate and total person-time, you can work your way back to an SE.
If you know the number of incident cases, then the SE for a Poisson random variable is sqrt(cases). (For me, this is the easier approach.) For the above problem we have 150 cases (ignore the person-time for the moment). The 95% CI for the number of cases is: 150 - 1.96*(sqrt(150)) or 126 as the lower bound, and 150 + 1.96*(sqrt(150)) or 174 for the upper bound. From here, you can get the 95% CI for the incidence rate by dividing the lower and upper bounds for the cases by the person-time: 126/20000 or 0.0063, and 174/20000 or 0.0087. These agree well with the previous IR estimates.
Paul
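The worked example in the answer above can be reproduced in a few lines (stdlib Python, using the first, rate-based approach):

```python
import math

# Reproducing the worked example: 150 cases over 20,000 person-years.
cases, person_time = 150, 20_000

ir = cases / person_time                # 0.0075 cases per person-year
se_ir = math.sqrt(cases) / person_time  # SE of the rate under a Poisson model
lo, hi = ir - 1.96 * se_ir, ir + 1.96 * se_ir

print(f"IR = {ir:.4f}, 95% CI {lo:.4f} to {hi:.4f}")
```

Note sqrt(cases)/person_time equals sqrt(cases/person_time²), the SE formula given in the answer.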
• asked a question related to Medical Statistics
Question
Hello all,
I am conducting a study with a group of 44 students on the effectiveness of some tasks on their writing quality. However, I do not assign a control group and a treatment group, but instead compare the difference between those who follow the treatment and those who do not.
There is a possibility that the sample size of one group will be much larger than the other. I still wonder whether it is possible to conduct the study this way, and what kind of test I should use to calculate the differences.
Of the 44, how many stopped doing the tasks and how many continued? The imbalance may not be so worrisome unless a lot of students discontinued the tasks.
• asked a question related to Medical Statistics
Question
It is said that retrospective studies in a hospital setting do not require a sample size calculation, as only a limited number of samples is available for research. If appropriate numbers are not there, then how can one justify the p-value at the time of interpretation?
I think an optimum sample size is required for every piece of research, whether it is a clinical trial, a prospective study or a retrospective study...
Coming from a clinical quality improvement background, my projects tend to be pilot or exploratory studies. I typically don't run power calculations beforehand. Still, I suggest when presenting results that a limitation might be the sample size.
• asked a question related to Medical Statistics
Question
I have 31 patients with arthritis and 10 coders for ultrasound from different countries. Every coder rated 3-4 patients (ordinal scale 1-3). How can I determine inter-rater reliability? What type of kappa coefficient should I use in this situation? Should I use the ICC or Fleiss' kappa?
Is it applicable to calculate PABAK and the prevalence index for more than 2 coders and ordinal data (1-3)?
Hi Aleksandra, how's it going?
I have the same exact question about the prevalence index and bias index with more than 2 observers and with variables that contain more than 2 categories.
Thanks!
• asked a question related to Medical Statistics
Question
Does  it even make sense to study other outcomes in a matched nested case-control study? In these type of studies, do we always have to use  case /control as our outcome?
No, you cannot do that. In a case-control study your outcome will be binary: diseased and non-diseased. Also, the variable you matched on cannot be studied.
• asked a question related to Medical Statistics
Question
I've been analyzing a few studies, but was struggling to classify these two as case controlled studies or RCTs or something else. The studies are:
It seems like a case-control study because it compares those with colorectal cancer against those without, with the aim of measuring the level of microRNAs (which I assume is the risk factor) to determine the relationship between microRNA expression and colorectal cancer. There is also no intervention introduced. However, my understanding was that case-control studies are typically retrospective, whereas the two studies above seem to have enrolled subjects prospectively.
Could someone let me know whether the two linked studies are Case Controlled Studies or not? Many thanks!
Dear Tang
If I correctly understand the design of the two studies you quoted: the investigators started by identifying cases of colorectal cancer and measuring RNA expression, and at the same time measured RNA expression in a comparison group without colorectal cancer. Thus both the disease status and the assumed risk factor were measured at the same time. This is a clear cross-sectional comparative study. Now, it is assumed that the RNA expression preceded the appearance of cancer, i.e. RNA expression could be a risk factor for colorectal cancer. The analysis of the data will then be like the analysis of a case-control study. No intervention was made by the investigators. This means you should classify both of these studies as case-control studies. In standard epidemiology textbooks you will find support for my view. Keep in mind that in teaching epidemiology we tend to make sharp distinctions among the various designs; in real life, overlaps and modifications exist and are feasible. For example, from cohort studies we can make nested case-control studies, and from cross-sectional studies we can measure incidence rates and also build a case-control sub-design.
With best regards
• asked a question related to Medical Statistics
Question
For G*Power software users: when should chi-square be used, and when the z test, as the test family in an a priori sample size calculation?
Another question:
My primary outcome is nominal data (clinical cure): I count the number of patients cured and then obtain the % of patients cured in each of my two groups (test and standard therapy). Previous trials in this discipline used non-parametric analysis for the same primary endpoint (clinical cure). Can I use chi-square in the a priori sample size calculation, and use it in the post hoc power analysis? (The primary outcome was not normally distributed.)
Are different p-values for chi-squared and z test expected for testing difference in proportions?
I'm trying to test the difference in proportions using the z test method and chi-squared method, but am getting very different answers. Is that normal?
My data:
CI CII
Male 205 102
Female 83 39
Calculating the z score I get 0.25 which should correlate to a p-value of 0.4013. Calculating the chi-squared score I get 0.0626 correlating to a p-value of 0.8025.
I read that the z-score requires some assumptions (probability of success is ~0.5 and n is high). Is this violating those? Or is it just the nature of these different approaches that gives very different answers with the same meaning (no evidence of difference).
I'm certainly open to miscalculations, but I've re-checked. If this behaviour isn't normal I'll recheck again!
Here are my calculations in R.
> r1 <- 205
> r2 <- 102
> n1 <- 288
> n2 <- 141
> (p1 <- r1/n1)
[1] 0.7118056
> (p2 <- r2/n2)
[1] 0.7234043
> (common.proportion <- (r1+r2)/(n1+n2))
[1] 0.7156177
> (se.pooled <- sqrt(common.proportion*(1-common.proportion)*(1/n1+1/n2)))
[1] 0.0463676
> (zscore <- (p1-p2)/se.pooled)
[1] -0.2501466
>
> # chi-squared
> prop.test(c(205,102), c(288,141), correct = FALSE)
2-sample test for equality of proportions without continuity
correction
data: c(205, 102) out of c(288, 141)
X-squared = 0.0626, df = 1, p-value = 0.8025
alternative hypothesis: two.sided
95 percent confidence interval:
-0.10208385 0.07888645
sample estimates:
prop 1 prop 2
0.7118056 0.7234043
A related issue. If I'm presenting confidence intervals around each proportion calculated using the binomial distribution, but then comparing them and presenting a p-value using chi-squared, that seems a bit wrong. I might have overlapping CIs, but then a p-value that's less than 0.05. Am I thinking correctly? – Tom Mar 13 '15 at 4:29
Tom, can you show your math? These two tests should give very similar results for your sample sizes (especially if making the same choice about continuity corrections). – Alexis Mar 13 '15 at 5:19

Thanks, @Alexis. I've added the math in now. – Tom Mar 13 '15 at 7:24
Very simple: both the z test and the contingency-table χ² test are two-tailed tests, but you have got the one-sided p-value for your z test statistic. That is, for H0: p1 − p2 = 0, the p-value = P(|Z| ≥ |z|), but your reported p-value is only P(Z ≤ z).
Notice that 0.4013 × 2 ≈ 0.8025. Easy!
And the square of the Z-score is (−0.2501466)² = 0.06257, which equals the test statistic X-squaredfrom the prop.test() output. – Karl Ove Hufthammer Mar 13 '15 at 18:55
Thank you for that "easy" answer! (Which prompted a good schooling on one-tailed tests. And now @KarlOveHufthammer you're sending me down another schooling. If x-squared is simply the z score squared, why do we even have it? And why isn't it called z-squared? (Obviously, not to be answered here. I have a lot to learn!) – Tom Mar 16 '15 at 6:29
@Tom The reason that it’s called X-squared in the R output, is that the X is really an ASCII interpretation of the greek capital letter chi (Χ) (the lowercase version of this letter looks like this: χ). And it’s a chi-squared test, which is used in lots of other situations. That said, the test(s) should never be used for comparing binomial proportions, as they have terrible statistical properties. See stats.stackexchange.com/questions/82720/… – Karl Ove Hufthammer Mar 16 '15 at 16:42
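Both relationships in this thread -- the doubled one-sided p-value and z² equalling the X-squared statistic -- can be checked numerically (stdlib Python, using the z score computed in the question):

```python
import math

# Numerical check using the z score from the question (z = -0.2501466).
def norm_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

z = -0.2501466
one_sided = norm_cdf(z)            # what the questioner computed: ~0.401
two_sided = 2 * norm_cdf(-abs(z))  # matches prop.test's p-value: ~0.8025
chisq_stat = z ** 2                # equals the X-squared statistic: ~0.0626

print(round(one_sided, 4), round(two_sided, 4), round(chisq_stat, 4))
```

Since z is negative here, the two-sided p-value is exactly twice the one-sided one, and squaring z reproduces the chi-squared statistic from `prop.test`.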

• asked a question related to Medical Statistics
Question
I performed an association study of SNPs with type 2 diabetes and diabetic retinopathy. I am confused about whether I need to apply a Bonferroni correction after multivariate logistic analysis, and whether it is necessary for the haplotype analysis as well. I am getting p-values less than 0.05 but greater than the corrected threshold in both cases. How should I interpret this? Any suggestions?
I wouldn't draw a conclusion based on a p-value at all.
If you designed an experiment to specifically test the diabetes-association of one particular SNP because you had some prior hypothesis why this should play a role, then a large p-value would tell you that your data is insufficient to show that association (your data is inconclusive). If the p-value is small, then this is only the very first step towards an interpretation that should appreciate the rest of the biological knowledge about the role of this SNP in the pathology of diabetes and the estimated effect size.
If you did a screening, p-values (corrected or not) are essentially only a proxy to rate the results according to their "statistical signal-to-noise ratio". A good ratio (= a small p-value) can be the result of a strong signal or the result of a small noise. I would use the results only to generate a ranked list to highlight the "top candidates" (according to thier "statistical signal-to-noise ratio" or p-value). I would see it only as an experiment-internal feature, not to generalize. The aim of a screening is to identify the most promising candidates for further studies. This essentially means: a screening is done to generate hypotheses, not to test them.
And please do not confuse statistical significance (p-values) with effect sizes. A tiny p-value does not (necessarily) indicate a strong association, and a strong association may give a large p-value. These are two different things, only indirectly linked via the model (which cofactors are considered), the sample size (which one can experimentally control) and the variance (which one cannot control).
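For readers who nonetheless need to report corrected p-values, the common adjustments are easy to compute; a minimal stdlib sketch of Bonferroni and the uniformly less conservative Holm step-down:

```python
# Simple multiplicity adjustments for a list of m p-values.
def bonferroni(pvals):
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down: multiply by decreasing factors, enforce monotonicity."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adj[i] = running_max
    return adj

print([round(p, 3) for p in bonferroni([0.01, 0.04, 0.03])])  # [0.03, 0.12, 0.09]
```

Holm controls the same family-wise error rate as Bonferroni but rejects at least as many hypotheses, so it is usually preferred when a correction is required at all.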
• asked a question related to Medical Statistics
Question
If I do a meta-analysis of incidence rates from observational studies, can I also include the incidence rate of the control group of an RCT? Could the control group of an RCT be treated as a comparable cohort? (I am not interested in effect size.)
Hi Mari
There is an article that discusses this issue in detail and may help you. Its title is 'Should Meta-Analyses of Interventions Include Observational Studies in Addition to Randomized Controlled Trials? A Critical Examination of Underlying Principles'.
Good luck!
• asked a question related to Medical Statistics
Question
Hi,
I want to conduct a randomized controlled trial to examine the effect of hypnosis on health-related quality of life, fatigue, anxiety, depression, and insomnia in women with breast cancer during chemotherapy. I did not find any similar previous studies. Is it therefore ok to conduct this study even though I don't have prior evidence? Also, I do not have an effect size with which to calculate the sample size. How can I estimate the sample size in this situation? Finally, is it ok to divide participants into a hypnosis group and a control group, or do I need to adopt another method? I want your valuable suggestions.
Thank you so much.
Hi Saraswati
Peter's advice is excellent. If you decided to power up for a medium effect, then you would need 64 patients in each study arm for a 2-arm trial with 80% power and alpha=0.05. If as Peter suggested, you go for a 3-arm trial, then you would need 69 patients per study arm to allow for (say) two pairwise comparisons and with alpha set to 0.04 to adjust for multiple comparisons. Although choosing a medium effect size might seem a bit arbitrary, a difference of 10 on a 100-point scale (eg the SF-36) has often been found to be of clinical importance. Further, a 100-point scale is likely to have an SD of around 20. This gives an effect size of 0.5 (or medium), which is what we have powered for.
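The per-arm numbers quoted above follow from the standard normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)²/d²; a stdlib sketch (the exact noncentral-t computation used by software such as G*Power gives 64 rather than 63 for d = 0.5):

```python
import math

# Normal-approximation sample size per arm for a two-sample t-test:
#   n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2
# The exact noncentral-t calculation (e.g. G*Power) gives a slightly
# larger answer (64 per arm for d = 0.5, alpha = 0.05, power = 0.80).
def inv_norm_cdf(p, lo=-10.0, hi=10.0):
    """Standard normal quantile via bisection on erf (stdlib only)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def n_per_arm(d, alpha=0.05, power=0.80):
    z_a = inv_norm_cdf(1 - alpha / 2)
    z_b = inv_norm_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)

print(n_per_arm(0.5))  # normal approximation: 63 per arm
```

The approximation is useful for quick planning; for the protocol itself, the exact calculation (or an inflation for expected dropout) is preferable.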
• asked a question related to Medical Statistics
Question
The 2 year overall survival rate and p value of log rank test between two groups are available. Is it possible to calculate the hazard ratio(HR)?
Do you have the table of survival rates over time in that 2-year interval, or just a pair of survival rates 2 years apart?
If it is the former, you can work out the p-value for the log-rank test. If my understanding is correct, computing the log-rank test requires the same information needed for the hazard ratio anyway.
If it is the latter, I would expect that the p-value wouldn't mean anything. You need the observed survival information and then you need to compare it to expectation. You can calculate the expectation simply using the methods in the following:
Medical statistics like this is not my first field, so I may be off base, but given what I know about the hazard ratio's equation, I cannot see that the p-value matters if you have the survival data itself.
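If only the two survival proportions at a fixed time are available, a rough hazard ratio can still be approximated under a proportional-hazards (or exponential) assumption, since the cumulative hazard at time t is -ln S(t). A sketch (the 70% and 50% survival figures are made-up illustrations, not from the question):

```python
from math import log

def hr_from_survival(s_treatment: float, s_control: float) -> float:
    """Approximate HR as the ratio of cumulative hazards -ln S(t).
    Valid only if proportional hazards holds over the interval."""
    return log(s_treatment) / log(s_control)

# Hypothetical 2-year survival of 70% (treatment) vs 50% (control)
print(round(hr_from_survival(0.70, 0.50), 3))  # 0.515
```

Note this gives a point estimate only; a confidence interval needs event counts or the log-rank statistic itself (see the Tierney et al. methods for extracting HRs from published summaries).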
• asked a question related to Medical Statistics
Question
My research is a medical study (retrospective review) of the effect of having a novel medical procedure versus none on the outcome (death/heart attack) over a one-year period. I performed a Cox regression analysis to look for predictors of the outcome, with explanatory variables such as age, gender, etc., and also having the procedure. However, the SPSS output gave the hazard ratio of not having the procedure as 1.8, with confidence intervals. For publishing, I need to express it as the hazard ratio of having the procedure, with CI. Can I calculate it using the inverse of the HR and CI (that is, 1/HR and 1/CI)? Is there any other method?
There are two (or more) options here:
i) Recode your variable: if it is coded 0/1, change it to 1/0. This requires a little extra effort.
ii) As far as I recall, SPSS has an option in the dialog that opens when the "Categorical" button is clicked, indicating which category is the reference (First or Last). You can simply swap it to get the desired HR.
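For completeness, inverting the estimate directly is also legitimate: on the log scale the HR just changes sign, so the inverted CI is (1/upper, 1/lower), with the bounds swapped. A sketch (the HR of 1.8 matches the question; the CI bounds are hypothetical):

```python
def invert_hr(hr: float, ci_low: float, ci_high: float):
    """Flip the reference category: HR' = 1/HR, CI' = (1/upper, 1/lower)."""
    return 1.0 / hr, 1.0 / ci_high, 1.0 / ci_low

# Hypothetical result: HR = 1.8, 95% CI 1.2 to 2.7 for NOT having the procedure
hr, lo, hi = invert_hr(1.8, 1.2, 2.7)
print(round(hr, 3), round(lo, 3), round(hi, 3))  # 0.556 0.37 0.833
```

The key point is that the bounds must be swapped: the reciprocal of the upper limit becomes the new lower limit.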
• asked a question related to Medical Statistics
Question
Hello friends
I'm conducting a case-control study analysing 5 SNPs in different genes, and I have 62 patients with the disease and 68 healthy individuals (dependent variable).
Now I need to obtain the odds ratios for the genotypes and alleles of each SNP, adjusted for age and gender (independent variables).
With this sample size, can I do a logistic regression? If not, what are the alternatives?
Note: gender is matched across the groups
You can use logistic regression, but for obtaining odds ratios it is better to categorize the age variable. If gender is matched across the groups, the gender variable will be non-significant.
• asked a question related to Medical Statistics
Question
Hello.
I want to conduct an impact assessment study from a randomised sample with no baseline data. Having read the literature, I became confused, as I realised that some authors tend to combine the two approaches (ESR and PSM). What is the best model to use?
Thanks.
Thank you.
• asked a question related to Medical Statistics
Question
I have a considerable amount of multivariate data, including clinical parameters (age, gender, obesity) as well as non-clinical data (cytokine expression data, seroconversion data, etc.) for two groups (severe and non-severe) infected with a specific virus.
I want to associate certain factors with disease severity (for example, over-expression of a certain cytokine in obese individuals who succumbed to severe disease). I am familiar with R programming and have made heatmaps of the cytokine data. But I wanted to know whether it's possible to make something such as a circos plot that would enable me to visualize correlations.
Thank you for you time!
Thank you all for your respective inputs!
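Circos/chord diagrams are usually driven by a pairwise correlation (or adjacency) matrix, which can be computed first and then handed to a plotting package (e.g. the circlize package in R, which the questioner could use). A minimal numpy sketch with hypothetical cytokine columns (the names, sample size, and correlation structure are invented for illustration):

```python
import numpy as np

# Hypothetical matrix: rows = patients, columns = cytokines
rng = np.random.default_rng(0)
il6 = rng.normal(10, 2, 50)
tnf = il6 * 0.8 + rng.normal(0, 1, 50)   # correlated with IL-6 by construction
il10 = rng.normal(5, 1, 50)              # independent
data = np.column_stack([il6, tnf, il10])

corr = np.corrcoef(data, rowvar=False)   # 3 x 3 Pearson correlation matrix
print(np.round(corr, 2))
```

The off-diagonal entries of this matrix are exactly the link weights a chord/circos layout would draw between variables.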
• asked a question related to Medical Statistics
Question
While the estimate and 95% Confidence Interval are available, it is unclear what the degrees of freedom would be. For example, with completely made up data, one might want to compare the association between sleep disturbance and depression (e.g., OR=1.2 [1.1, 1.3] k=10) as well as sleep disturbance and anxiety (e.g., OR=1.25 [1.15, 1.35], k=15).
Hi Bruce
I am glad you find it useful
The mvmeta command in Stata is quite straightforward to use. There are also other commands (including an mvmeta) in R.
I attach a link to a review paper on multivariate meta-analysis with guidance on how to perform it in various software packages.
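If a formal multivariate meta-analysis is not feasible, the two pooled ORs can at least be compared with a simple z-test on the log scale, recovering each SE from its 95% CI. A sketch using the made-up ORs from the question (note this ignores the correlation between the two estimates, which mvmeta would properly account for):

```python
from math import log, sqrt, erf

def se_from_ci(lo: float, hi: float) -> float:
    """SE of log(OR) recovered from a 95% CI: CI width is 2 * 1.96 SEs."""
    return (log(hi) - log(lo)) / (2 * 1.96)

def compare_ors(or1, lo1, hi1, or2, lo2, hi2):
    se1, se2 = se_from_ci(lo1, hi1), se_from_ci(lo2, hi2)
    z = (log(or1) - log(or2)) / sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p
    return z, p

# ORs from the question: 1.2 [1.1, 1.3] vs 1.25 [1.15, 1.35]
z, p = compare_ors(1.2, 1.1, 1.3, 1.25, 1.15, 1.35)
print(round(z, 2), round(p, 2))  # roughly -0.69 and 0.49
```

Since the test is on log(OR) rather than the OR itself, no degrees of freedom are needed, which sidesteps the question's concern.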
• asked a question related to Medical Statistics
Question
208 subjects were randomly assigned evenly to a control group and an intervention group; medical costs were recorded at 3, 6, 12, and 24 months. If I want to know the differences between the 2 groups at the different time points, what statistical methods should I use? By the way, the medical costs are not normally distributed. I really need some help with the statistical analysis!
This problem can be solved by using the technique of functional data analysis, where no distributional assumption is required.
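A simpler nonparametric alternative is a Mann-Whitney U test at each time point, with a multiplicity adjustment across the four comparisons (skewed cost data are also often analysed with a gamma GLM on a log link). A sketch with simulated skewed costs (all distributions and group sizes here are hypothetical):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
timepoints = ["3m", "6m", "12m", "24m"]
alpha_adjusted = 0.05 / len(timepoints)   # Bonferroni over 4 time points

for t in timepoints:
    control = rng.lognormal(mean=8.0, sigma=0.6, size=104)       # skewed costs
    intervention = rng.lognormal(mean=7.8, sigma=0.6, size=104)
    stat, p = mannwhitneyu(control, intervention, alternative="two-sided")
    print(t, round(p, 4), "significant" if p < alpha_adjusted else "ns")
```

This treats each time point separately; functional data analysis or a longitudinal model, as suggested above, would instead use all time points jointly.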
• asked a question related to Medical Statistics
Question
Hi, if I want to conduct a randomized controlled trial with a cross-over design and 4 treatment groups, how do I calculate the sample size? What equation can I use? Thank you very much for your help.
Thank you Dianatinasab for the reference.
I wonder whether we can use the formula given in the paper for the sample-size calculation of a cross-over design?
given formula in the paper --> n = 2(a + b)² × variance / (mean of group 1 − mean of group 2)²
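Reading a and b in that formula as the standard-normal deviates for the type I and type II error rates, it can be coded directly. One caveat for a cross-over design: the variance should be the within-subject variance of the difference, which is usually much smaller than the between-subject variance a parallel-group formula assumes. A sketch (the SD and mean difference are illustrative numbers):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd: float, mean_diff: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """n = 2 * (a + b)^2 * variance / (mean1 - mean2)^2,
    with a = z_{1 - alpha/2} and b = z_{power}."""
    a = NormalDist().inv_cdf(1 - alpha / 2)
    b = NormalDist().inv_cdf(power)
    return ceil(2 * (a + b) ** 2 * sd ** 2 / mean_diff ** 2)

# Hypothetical: SD = 20, clinically important difference = 10
print(n_per_group(sd=20, mean_diff=10))  # 63 per group
```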
• asked a question related to Medical Statistics
Question
Hello,
I am looking for a way to compare Brown and Hauenstein's (2005) agreement index a_wg (independent a_wgs). Ideally, this method should also allow comparing mean a_wgs: I have several states and two groups (e.g., women and men) in each state. I now want to test whether people in one group agree, on average, more across all states than people in the other group.
Thanks
Paul
In case anyone comes across the same question: I ended up using standard deviations, which can be compared easily (e.g., using Levene's test). The SD correlates with a_wg and related measures of dispersion above .90 (Roberson, Sturman, & Simons, 2007)
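Comparing the dispersion of two groups, as described, is a one-liner with Levene's test. A sketch with made-up ratings (the values are chosen so one group is tightly clustered and the other is not):

```python
from scipy.stats import levene

# Hypothetical agreement ratings for two groups in one state
women = [4, 4, 5, 4, 4, 5, 4, 4]   # tightly clustered -> high agreement
men = [1, 5, 2, 5, 1, 4, 2, 5]     # dispersed -> low agreement
stat, p = levene(women, men)       # Brown-Forsythe (median-centered) by default
print(round(stat, 2), round(p, 4))
```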
• asked a question related to Medical Statistics
Question
The study is to explore the exposures (risk factors) that related to the effect. Effect itself defined as positive response to a treatment.
Exposures: some risk factors (demographic, family and past history of diseases, clinical conditions, intensity and hemodynamics response during event)
Main effects: cTn elevation
Treatment/intervention: event of long distance cycling
Population study: a sample group of participants join the event
Measurement:
observe risk factors before treatment
measure some marker before and after treatment
measure intensity and hemodynamics response during event
What is a suitable design for this research: an observational study (and if so, which kind: cohort or cross-sectional), or a one-group pre-test/post-test design?
Since you want to study the effect of multiple risk factors and you are measuring before and after cycling, the design is experimental (not randomized and not observational), with one group of cyclists: your comparison group is the same cyclists before the event. For the analysis, if you can dichotomize the cTn elevation and define what is considered illness or not, then a conditional logistic regression would be enough. If not, a repeated-measures analysis, also known as longitudinal data analysis, should be done.
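If the marker is kept continuous rather than dichotomized, the paired pre/post comparison with no normality assumption is the Wilcoxon signed-rank test. A sketch with hypothetical cTn values (the numbers and units are invented, not clinical reference values):

```python
from scipy.stats import wilcoxon

# Hypothetical cTn (ng/L) before and after the event, same 10 riders
pre = [4, 6, 5, 7, 3, 5, 6, 4, 5, 6]
post = [9, 14, 8, 15, 6, 11, 13, 7, 10, 12]
stat, p = wilcoxon(pre, post)   # paired, nonparametric
print(round(p, 4))
```

This only answers "did cTn rise overall"; relating the rise to the risk factors still needs the regression approach described above.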
• asked a question related to Medical Statistics
Question
small pilot RCT, looking at an intervention relative to usual care. n<40.
Thanks for the answers. This isn't for my work in fact, but a student project which is being undertaken over the summer, hence not having many details to share yet. Useful advice, thank you.
• asked a question related to Medical Statistics
Question
2 x 2 between-within design, 45 x 3 trials per condition
Hi,
Basically, there are two approaches to sample size calculation, precision-based (relevant if you are trying to estimate the proportion, mean difference etc) and power-based (relevant if you are trying to test a hypothesis; how small a difference is it important to detect and with what degree of certainty). See file attached for more details.
If you are unable to find previous studies, you can consider conducting a small pilot trial and use the estimates from that to calculate your ideal sample size.
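For reference, the precision-based calculation for a proportion mentioned above is n = z² p(1−p) / d². A sketch:

```python
from math import ceil
from statistics import NormalDist

def n_for_proportion(p: float, margin: float, conf: float = 0.95) -> int:
    """Sample size to estimate a proportion p within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Worst case p = 0.5, +/-5% margin, 95% confidence
print(n_for_proportion(0.5, 0.05))  # 385
```

The p = 0.5 worst case is the usual fallback when no prevalence estimate exists, which connects to the pilot-study advice above.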
Good luck!
• asked a question related to Medical Statistics
Question
When the prevalence is not known and it is difficult to obtain a mean and standard deviation, how do we calculate the sample size? Does this matter for descriptive, analytical, and empirical studies?
You can always guess the standard deviation by asking someone who knows the area what would be an unexpectedly high and an unexpectedly low value. Divide this range by four.
I got interested in this and used to ask, say, obstetricians what would be a high and a low fetal heart rate. They produced more or less a range of four SDs.
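The heuristic above amounts to nothing more than (high − low) / 4; the numbers below are illustrative, not clinical reference values:

```python
def sd_guess(low: float, high: float) -> float:
    """Rough SD estimate: an expert-judged plausible range spans ~4 SDs."""
    return (high - low) / 4

# Hypothetical expert range of 120 to 160 for some measurement
print(sd_guess(120, 160))  # 10.0
```

The guessed SD can then be plugged into any of the usual power or precision formulas to get a working sample size.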
• asked a question related to Medical Statistics
Question
From what I can work out, criterion 10 is looking at statistical significance, e.g., p-values.
Is criterion 11 looking at standard mean difference and confidence intervals?
Most of the RCTs I have looked at report the mean and SD before and after treatment for the two groups and a significance level comparing the groups, but this seems to answer criterion 10, not 11.
I'm struggling to work out what criterion 11 is asking for.
Any advice would be much appreciated.
• asked a question related to Medical Statistics
Question
For publication bias, how can we interpret the Begg-Mazumdar (Kendall's tau), Egger bias, and Harbord bias values on the basis of the results?
e.g
Begg-Mazumdar: Kendall's tau = 0.047 P > 0.9999 (low power)
Egger: bias = -26.99 (95% CI = -85.69 to 31.69)  P = 0.29
Harbord: bias = 8.73 (92.5% CI = -4.86 to 22.33)  P = 0.20
How can we know whether any publication bias exists?
Based on the test results you have provided, you do not have evidence to conclude that publication bias exists. That may be because there is no publication bias or you do not have the statistical power to detect it.
• asked a question related to Medical Statistics
Question
I am working on a meta-analysis of RCTs, and I have to calculate .metabias (several tests, including Egger's) for continuous data (variables such as means and standard deviations). What is the process? Which are the commands used?
Thank you so much.
Thank you so much dear Mr. Weaver.
• asked a question related to Medical Statistics
Question
I have 43 questions in my research. Now I want to convert these questions into 12 variables in SPSS. If you could provide further guidance on calculating the mean and standard deviation, I would be grateful. Thanks in advance.
It is very unlikely that 43 items can be converted into 12 reliable and valid variables. Use exploratory factor analysis (Dimension Reduction / Factor) to determine the number of factors (variables) in your data set. You might also want to analyze your item scores to determine the consistency between items (Scale / Reliability Analysis).
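The consistency check mentioned at the end (SPSS's Scale / Reliability Analysis) is Cronbach's alpha, which is simple enough to compute by hand. A sketch with hypothetical 5-point responses:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix. Classic alpha formula:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical responses: 6 respondents x 3 items of one proposed scale
scores = np.array([[4, 5, 4],
                   [2, 2, 3],
                   [5, 5, 5],
                   [3, 4, 3],
                   [1, 2, 1],
                   [4, 4, 5]])
print(round(cronbach_alpha(scores), 2))  # 0.96
```

Alpha would be computed separately for each candidate subscale suggested by the factor analysis.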
• asked a question related to Medical Statistics
Question
Hi
do you know how to run a chi-square test on non-parametric data, using 2 descriptive variables such as sex and age (>60 vs <=60), in SPSS?
Thanks a lot for all your answers. I used age <60 and >60 only as an example; my question was about any 2 descriptive variables. Thanks again, you were very helpful.
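Outside SPSS, the same crosstab chi-square is a one-liner. A sketch with a hypothetical 2x2 table of sex by age group (the counts are invented):

```python
from scipy.stats import chi2_contingency

# Rows: male / female; columns: age <= 60 / age > 60 (hypothetical counts)
table = [[10, 20],
         [20, 10]]
chi2, p, dof, expected = chi2_contingency(table)  # Yates correction by default for 2x2
print(round(chi2, 2), round(p, 4), dof)  # chi2 = 5.4 with dof = 1
```

SPSS's Crosstabs procedure (Analyze / Descriptive Statistics / Crosstabs, Statistics > Chi-square) produces the same figures.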
• asked a question related to Medical Statistics
Question
In medical and surgical research, the hazard function is often used to estimate the risk of an event across time. However, the assumption of a constant hazard is often not met and is not ideal when analyzing repeated events such as valve replacements, reoperations, reinterventions, multiple episodes of infection, or repeated transplant rejections. I have read some information on modulated renewal theory and the Nelson estimator, which involve calculating inter-failure event times and segments/gaps between events within the same patient. Can anyone suggest a useful and relatively simple way of approaching this in SAS, Stata or R? Thank you!
A simple and efficient approach is to use models for discrete-time, grouped data, as described in
Prentice, R. L. & Gloeckler, L. A. Regression analysis of grouped survival data with application to breast cancer data; Biometrics, 1978, 34, 57-67
You can use marginal (e.g. binomial logistic regression) or random-effect models. We have implemented some models for grouped data in the package aods3 for R (on CRAN). A case study in
This approach has the advantage of accounting for competing risks when the time step is short. Also, when using a complementary log-log link and a small time step, estimates are equivalent to those obtained with continuous time (Cox) models.
• asked a question related to Medical Statistics
Question