Applied Biostatistics - Science topic
Questions related to Applied Biostatistics
This question just occurred to me:
Before testing for a statistically significant difference, shouldn't we first check whether the data follow a normal distribution? Are 3 replicates enough to test the hypothesis of normality? (In our undergraduate statistics course we used the Shapiro-Wilk test, which needs at least 8 samples.) Or, by the central limit theorem, do we not need to test whether the data are normally distributed at all?
Thanks!
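In case it helps, R's shapiro.test() does accept samples as small as n = 3 (its documented range is 3-5000), although its power at n = 3 is essentially nil. A minimal sketch with made-up numbers:
-----
# Hypothetical example: three replicates per group (values made up)
control <- c(4.8, 5.1, 5.0)
treated <- c(6.2, 6.5, 6.1)

# shapiro.test() accepts 3 <= n <= 5000, but with n = 3 it has
# almost no power to detect departures from normality
shapiro.test(control)
shapiro.test(treated)

# The t-test itself, for comparison
t.test(control, treated)
-----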
Assume this is my hypothetical data set (attached figure), in which the thickness of a structure was measured at defined positions (1-3) in 2 groups (control and treated). I emphasize that the structure normally increases and decreases in thickness from position 1 to 3. I would also like to point out that each position has data from 2 individuals (samples).
I would like to check if there is a statistical difference in the distribution of points (thickness) depending on the position. Suggestions were to use the 2-sample Kolmogorov-Smirnov test.
However, my data are not simply one continuous sample, because the position of the measurement matters here (and the test ignores this factor, just ordering all values from smallest to largest and computing the statistic).
In this case, is the 2-sample Kolmogorov-Smirnov test misleading? Is there another type of statistical analysis that could be performed instead?
Thanks in advance!
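For reference, a minimal sketch of the two-sample Kolmogorov-Smirnov test in R, with made-up thickness values; as noted above, the test only compares the two pooled distributions and ignores position:
-----
# Hypothetical thickness values (made up) for one comparison
control <- c(10.2, 11.5, 9.8, 10.9, 11.1)
treated <- c(12.0, 12.8, 11.7, 13.1, 12.4)

# Two-sample Kolmogorov-Smirnov test: compares the two empirical
# distribution functions, with no notion of measurement position
ks.test(control, treated)
-----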
I'm excited to speak at this FREE conference for anyone interested in statistics in clinical research. 👇🏼👇🏼
The Effective Statistician conference features a lineup of scholars and practitioners who will speak about professional & technical issues affecting statisticians in the workplace.
I'll be giving a gentle introduction to structural equation modeling! I hope to see you there.
Sign up here:
Hi,
I have performed an epidemiological survey on insomnia prevalence using the ISI and now want to test internal consistency using Cronbach's alpha. I could not find a reference example for estimating it for each survey question. Assistance from your expertise would be very helpful.
I would appreciate your help in enhancing my knowledge.
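One common way to get Cronbach's alpha, including an "alpha if item dropped" value for each question, is the psych package in R. A sketch, assuming a data frame 'isi' with one column per ISI item and one row per respondent:
-----
library(psych)

# 'isi' is assumed to be a data frame with one column per ISI item
# (7 items, scored 0-4) and one row per respondent
res <- alpha(isi)

res$total$raw_alpha   # overall Cronbach's alpha
res$alpha.drop        # alpha if each item is dropped, one row per question
-----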
Hello, I'm a master's student working with fungi. One of my studies includes the evaluation of mycelium growth efficiency and biomass production of mushroom strains on different culture media and incubation temperatures. In my experiment I'm working with 4 media (PDA, MYPA, YGA and Soy Agar) and 4 temperatures (20, 25, 30 and 35 °C). A two-way ANOVA shows a significant interaction between the two factors (medium and temperature). Now I would like to know whether there is a statistical test that could quantify this interaction effect. I'd be glad if anyone could point me in a direction.
Thanks in advance,
Denis
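One simple way to quantify the interaction, alongside the two-way ANOVA already run, is an effect size such as eta-squared for the interaction term. A sketch in R, assuming a data frame 'dat' with columns 'growth', 'medium' and 'temp' (the names are placeholders):
-----
# Two-way ANOVA with interaction
fit <- aov(growth ~ medium * temp, data = dat)
summary(fit)

# Eta-squared per term: the share of the total sum of squares it explains,
# including the medium:temp interaction
tab <- summary(fit)[[1]]
eta <- tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(eta) <- rownames(tab)
eta
-----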
In many biostatistics books the negative sign of the calculated t value is ignored, yet in a left-tailed t test a minus sign is included in the critical value.
For example, the result of a paired, left-tailed t test:
calculated t value = -2.57
critical value = -1.833 (df = 9; 5% level of significance; the minus sign is included since it is a left-tailed test)
Now, should we accept or reject the null hypothesis?
If we do not ignore the negative sign, i.e. -2.57 < 1.833, the null hypothesis is accepted;
if we ignore the negative sign, i.e. 2.57 > 1.833, the null hypothesis is rejected.
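For reference, the left-tailed critical value and decision rule can be reproduced in R as below (a sketch using the numbers from the example; the calculated t is compared against the signed, i.e. negative, critical value):
-----
# Critical value for a left-tailed paired t test, alpha = 0.05, df = 9
qt(0.05, df = 9)            # approximately -1.833, negative by construction

# Decision rule: reject H0 if the calculated t is smaller (more negative)
# than the critical value
t_calc <- -2.57
t_calc < qt(0.05, df = 9)   # TRUE -> reject H0

# Equivalent decision via the one-sided p-value
pt(t_calc, df = 9)          # p < 0.05 -> reject H0
-----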
Hi,
I have performed an insomnia prevalence study among academics using the ISI, and I have come across the floor and ceiling effect in this cross-sectional survey. I want to estimate the floor and ceiling percentages for each ISI question and for the total score. It would be helpful to see an example of how to calculate them.
I would appreciate your help in enhancing my knowledge.
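A floor or ceiling percentage is simply the share of respondents at the lowest or highest possible score. A minimal R sketch, assuming a data frame 'isi' with the 7 items (scored 0-4) and a 'total' column (0-28); the names are placeholders:
-----
floor_pct   <- function(x, min_score) 100 * mean(x == min_score, na.rm = TRUE)
ceiling_pct <- function(x, max_score) 100 * mean(x == max_score, na.rm = TRUE)

# Per item (columns 1-7 assumed to hold the ISI items)
sapply(isi[, 1:7], floor_pct,   min_score = 0)
sapply(isi[, 1:7], ceiling_pct, max_score = 4)

# Total score
floor_pct(isi$total, 0)
ceiling_pct(isi$total, 28)
-----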
I have the diet composition of a species in an area (10 different components) for two different years. So I have two columns (year 1 and year 2) and 10 rows (the food items), and the cells are filled with proportions. I want to test whether there is a statistical difference in diet between the two years. What test do I use?
Hi,
We received a statistical reviewer comments on our manuscript and one of the comments goes as follows: '... Note that common tests of normality are not powered to detect departures from normality when n is small (eg n<6) and in these cases normality should be support by external information (eg from larger samples sizes in the literature) or non-parametric tests should be used.'
This is basically the same as saying that 'parametric tests cannot be used when n < 6', at least without matching external data that would permit an accurate assumption about the data distribution (and in real life such datasets rarely exist). And this just doesn't seem right: the t-test and ANOVA can be used with small sample sizes as long as their assumptions are satisfied, which according to the reviewer cannot be verified, and so the tests cannot be used...
I see two possible ways of addressing this:
- Argue that parametric tests are applicable and that normality can be assumed from residual plots, tests of homogeneity of variance, etc. This sounds like the more difficult, risky and laborious option.
- Redo all the comparisons with non-parametric tests based on this one comment, which just doesn't seem right and, empirically, would not yield different results. This would apply to the 15-20 comparisons presented in the paper.
Maybe someone else would have other suggestions on the correct way to address this?
For every dataset in the paper, I assess the data distribution by identifying outliers (outliers: > Q3 + 1.5 × IQR or < Q1 - 1.5 × IQR; extreme outliers: > Q3 + 3 × IQR or < Q1 - 3 × IQR), testing the normality assumption with the Shapiro-Wilk test, and visually inspecting the distribution using frequency histograms, density plots and Q-Q (quantile-quantile) plots. Homogeneity of variance was tested using Levene's test.
Datasets are usually n=6 and are exploratory gene expression (qPCR) pairwise comparisons or functional in vivo and in vitro (blood pressure, nerve activity, response magnitude compared to baseline data) repeated measures data between 2-4 experimental groups.
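For what it's worth, the diagnostics described above can be scripted roughly as follows in R (a sketch; 'dat', 'value' and 'group' are placeholder names, with 'group' a factor):
-----
library(car)   # for leveneTest()

# Shapiro-Wilk and Q-Q plot on the raw values
shapiro.test(dat$value)
qqnorm(dat$value); qqline(dat$value)

# Shapiro-Wilk on model residuals is often preferred for group comparisons
fit <- aov(value ~ group, data = dat)
shapiro.test(residuals(fit))

# Homogeneity of variance (Levene's test)
leveneTest(value ~ group, data = dat)
-----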
For example, we are reviewing an article in which the sensitivity of a testing modality is 87% with 50 patients included. How can we calculate its lower and upper limits at the 95% confidence level for a forest plot?
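A standard option is an exact (Clopper-Pearson) binomial confidence interval for the proportion. A sketch in R; the count 44/50 is used purely for illustration and should be replaced by the true-positive count actually reported in the article:
-----
# Exact (Clopper-Pearson) 95% CI for a proportion such as sensitivity
binom.test(44, 50)$conf.int

# Alternative interval from the score test (with continuity correction)
prop.test(44, 50)$conf.int
-----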
I have a dataset of 5 quantitative continuous variables: 4 independent and 1 dependent (see attached). I tried linear multiple regression (using the standard lm function in R), but no statistical significance was obtained. I then decided to try building a nonlinear model using the nls function, but I have relatively little experience with it. Could you help me, please: how do I choose the right "equation" for a nonlinear model? Or am I approaching this wrongly altogether? So far I have only used the standard linear formula inside the "non-linear" model.
I would be very grateful for your help.
If you do not have the opportunity to open the code and see the result, I copy it here:
------
library(XLConnect)

# Read the data: first column is the dependent variable,
# columns 2-5 are the four independent variables
wk <- loadWorkbook("base.xlsx")
db <- readWorksheet(wk, sheet = 1)

DEP   <- as.numeric(db[, 1])
INDEP <- list()
for (i in 1:4) {
  INDEP[[i]] <- as.numeric(db[, i + 1])
}

# Note: this formula is linear in the parameters k0..k4,
# so nls() fits the same model as lm() here
MODEL <- nls(DEP ~ k0 + INDEP[[1]]*k1 + INDEP[[2]]*k2 + INDEP[[3]]*k3 + INDEP[[4]]*k4,
             start = list(k0 = 0, k1 = 0, k2 = 0, k3 = 0, k4 = 0))
SUM <- summary(MODEL)
-----
The result is:
-----
Formula: DEP ~ k0 + INDEP[[1]] * k1 + INDEP[[2]] * k2 + INDEP[[3]] * k3 +
INDEP[[4]] * k4
Parameters:
Estimate Std. Error t value Pr(>|t|)
k0 6.04275 1.30085 4.645 6.41e-06 ***
k1 0.03117 0.01922 1.622 0.107
k2 -0.02274 0.01663 -1.367 0.173
k3 -0.01224 0.01717 -0.713 0.477
k4 -0.01435 0.01541 -0.931 0.353
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.418 on 186 degrees of freedom
Number of iterations to convergence: 1
Achieved convergence tolerance: 2.898e-08
-----
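One point worth noting: the formula passed to nls() above is linear in k0-k4, so it reproduces the lm() fit exactly. A genuinely nonlinear model needs a nonlinear functional form chosen from theory or from the shape of the data. Purely as an illustration (not a recommendation for this particular data set), an exponential term in the first predictor could look like this; starting values usually need tuning:
-----
# For comparison, the linear fit:
# summary(lm(DEP ~ INDEP[[1]] + INDEP[[2]] + INDEP[[3]] + INDEP[[4]]))

# Illustrative nonlinear form: exponential decay in the first predictor
MODEL_NL <- nls(DEP ~ k0 + a * exp(-b * INDEP[[1]]),
                start = list(k0 = mean(DEP), a = 1, b = 0.01))
summary(MODEL_NL)
-----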
Can someone explain Cohen's d in a simple way, please?
Kindly elaborate it for medical students in simple words.
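In short, Cohen's d is the difference between two group means divided by the pooled standard deviation. A small worked example in R with made-up blood pressure values:
-----
# Hypothetical systolic blood pressure in two groups (values made up)
g1 <- c(120, 132, 128, 141, 125, 130)
g2 <- c(118, 122, 119, 127, 121, 124)

# Pooled standard deviation of the two groups
pooled_sd <- sqrt(((length(g1) - 1) * var(g1) + (length(g2) - 1) * var(g2)) /
                  (length(g1) + length(g2) - 2))

# Cohen's d = (mean1 - mean2) / pooled SD
d <- (mean(g1) - mean(g2)) / pooled_sd
d   # Cohen's rules of thumb: ~0.2 small, ~0.5 medium, ~0.8 large
-----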
I have data sets for S vs t and x vs t. The yield coefficient needs to be calculated. What is the procedure to calculate it? Should I take the data for the logarithmic growth phase only?
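For reference, the observed biomass yield coefficient is usually taken as biomass produced per substrate consumed, often estimated over the exponential phase only. A minimal R sketch with made-up values:
-----
# Hypothetical vectors restricted to the exponential (logarithmic) phase
x <- c(0.5, 0.9, 1.6, 2.8, 4.9)       # biomass concentration (g/L), made up
S <- c(20.0, 19.2, 17.8, 15.4, 11.2)  # substrate concentration (g/L), made up

# Overall observed yield: biomass produced / substrate consumed
Yxs_overall <- (tail(x, 1) - x[1]) / (S[1] - tail(S, 1))

# Or as the (negative) slope of x against S over that phase
Yxs_slope <- -coef(lm(x ~ S))["S"]
-----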
Journal of Multidisciplinary Applied Natural Science (abbreviated as J. Multidiscip. Appl. Nat. Sci.) is a double-blind peer-reviewed journal for multidisciplinary research activity on natural sciences and their application on daily life. This journal aims to make significant contributions to applied research and knowledge across the globe through the publication of original, high-quality research articles in the following fields: 1) biology and environmental science 2) chemistry and material sciences 3) physical sciences and 4) mathematical sciences.
We invite researchers working within our scope to join as section editors, according to their interests, or as regional handling editors for their regions. The role of the editor is to help us maintain and improve the Journal's standards and quality by:
- Supporting the Journal through the submission of your own manuscripts where appropriate;
- Encouraging colleagues and peers to submit high-quality manuscripts to the Journal;
- Helping to promote the Journal;
- Attending virtual Editorial Board meetings when possible;
- Being an ambassador for the Journal: building, nurturing, and growing a community around it;
- Increasing awareness of the articles published in the Journal in all relevant communities and amongst colleagues;
- Regularly agreeing to review papers when invited by Associate Editors, and handling them promptly to ensure fast turnaround times;
- Suggesting referees for papers that you are unable to review yourself.
Journal website: https://jmans.pandawainstitute.com
I am new to statistics at this level in ecology. I am trying to compare DNA- and RNA-based libraries with thousands of OTUs. I summarised the taxa to get the most abundant species, but I can only obtain relative abundances. I was thinking of using SIMPER, as suggested in several comments, to test which species differ the most per station between the DNA- and RNA-based libraries. However, I have read mixed opinions about how robust SIMPER is. I was wondering whether manyglm would also be an alternative for my question, or whether you would suggest another approach. Thank you for your help!
I have treated THP-1 and AGS cells for 12, 24 and 48 hours with bacterial toxin concentrations of 0, 5, 10, 20, 40 and 80 µg/ml. Now I want to support my results with statistical methods, but I'm confused about which one to use. Should it be an ANOVA with a post hoc test, or simply a t-test? And should it be one-tailed or two-tailed, paired or unpaired?
Does anybody know an estimation method for calculating the prevalence of a given risk factor in the general population, given that the odds ratio/relative risk, the prevalence of the risk factor among the diseased, and the prevalence of the disease are available?
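One possible back-calculation, sketched in R below, uses the symmetry of the odds ratio to recover the exposure prevalence among the non-diseased and then applies the law of total probability; the numbers are hypothetical:
-----
p_d  <- 0.40   # prevalence of the risk factor among the diseased (hypothetical)
prev <- 0.05   # prevalence of the disease in the population (hypothetical)
OR   <- 3.0    # odds ratio for the risk factor (hypothetical)

# The OR is symmetric, so odds of exposure among the non-diseased
# = odds of exposure among the diseased / OR
odds_nd <- (p_d / (1 - p_d)) / OR
p_nd    <- odds_nd / (1 + odds_nd)

# Overall prevalence of the risk factor (law of total probability)
p_pop <- p_d * prev + p_nd * (1 - prev)
p_pop
-----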
I want to discriminate between type I and type II diabetes using certain factors.
I wish to run a discriminant analysis with type of diabetes as the dependent variable, and I have both categorical and continuous independent factors. My doubt is: can I include categorical independent variables in a discriminant analysis?
This is an anti-tumor efficacy study. There are 2 compounds, and each compound has 3 dose levels. For example:
Group 1: vehicle control group (n=10 mice)
Group 2: drug A treatment group, dose 1 (n=10 mice)
Group 3: drug A treatment group, dose 2 (n=10 mice)
Group 4: drug A treatment group, dose 3 (n=10 mice)
Group 5: drug B treatment group, dose 1 (n=10 mice)
Group 6: drug B treatment group, dose 2 (n=10 mice)
Group 7: drug B treatment group, dose 3 (n=10 mice)
At the end of the study, mice will be euthanized and the tumors weighed, to compare whether the tumor weight of the treatment groups is significantly different from that of the vehicle group. The question is: when we use a one-way ANOVA for the statistics, are all 7 groups analysed together as a whole, or are drug A and drug B compared with the vehicle separately?
Many thanks in advance!
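For reference, in a one-way ANOVA all seven groups would normally enter a single model as levels of one factor, with comparisons against the vehicle handled as post hoc contrasts (e.g. Dunnett-type). A sketch in R, assuming a data frame 'dat' with columns 'weight' (tumour weight) and 'group' (a factor with 7 levels):
-----
library(multcomp)

# Make the vehicle group the reference level
dat$group <- relevel(dat$group, ref = "vehicle")

# All 7 groups enter one model, sharing one residual variance
fit <- aov(weight ~ group, data = dat)
summary(fit)

# Dunnett-type contrasts: every treatment group vs the vehicle control
summary(glht(fit, linfct = mcp(group = "Dunnett")))
-----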
Hello everyone,
Currently I am trying to do k-means clustering on a microarray dataset which consists of 127 columns and 1000 rows. When I plot the result, I get the error "figure margins too large". I then tried this in the R console:
par("mar")                # shows the current margins
par(mar = c(1, 1, 1, 1))  # tried to shrink the margins
But it did not work. Can anyone suggest another way of fixing this problem? (Part of the code is attached below.)
Thanks,
Hasan
--------------------------------------------------------------------------------------------------------------
x <- as.data.frame(x)
km_out <- kmeans(x, centers = 2, nstart = 20)   # k-means with K = 2
km_out$cluster
plot(x, col = (km_out$cluster + 1), main = "K - Means Clustering Results with K=2",
     xlab = "", ylab = "", pch = 20, cex = 2)
# Error in plot.new() : figure margins too large
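A likely cause: plot() on a data frame with 127 columns tries to draw a full scatterplot matrix of every pair of columns, which is what overflows the figure margins. A sketch of two simpler alternatives (plotting two chosen columns, or the first two principal components):
-----
# Plot just two chosen columns, coloured by cluster
plot(x[, 1], x[, 2], col = km_out$cluster + 1,
     main = "K-Means Clustering Results with K=2",
     xlab = "", ylab = "", pch = 20, cex = 2)

# Or project onto the first two principal components first
pc <- prcomp(x, scale. = TRUE)
plot(pc$x[, 1], pc$x[, 2], col = km_out$cluster + 1,
     xlab = "PC1", ylab = "PC2", pch = 20, cex = 2)
-----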
Sometimes we want to conduct a reliability study of a diagnostic modality for a specific disease, but the gold standard for the diagnosis of that disease is an invasive procedure or surgery, which is not justified in normal individuals (the control group). In such a case, is it justified to treat the control group as negative by the gold standard?
For example:
We want to diagnose infantile hypertrophic pyloric stenosis (IHPS) with the help of ultrasound, but the gold standard for its diagnosis is surgery. Suppose we perform ultrasound on 50 infants with projectile vomiting; the sonographic findings of 40 of them are suggestive of IHPS and 10 appear normal, but after surgery (the gold standard) 38 are confirmed as IHPS and 2 are false positives. Now we want to perform ultrasound on 50 normal infants (controls). Is it justified to count all 50 normal infants as true negatives (with 0 false positives) by the gold standard, in order to perform the chi-square statistics?
"Comparison of scoring system 1 versus scoring system 2 predicting in-hospital mortality".
The study is non-interventional and will follow patients from admission to discharge. Please suggest the most suitable design.
Hi,
This seems a bit unusual to me, since I could not find any related paper. This is my situation:
We have CT values (3 replicates each) for the following conditions for gene A:
- Wild type (WT) cells, treated.
- WT cells, untreated,
- Mutated (MUT) cells, treated,
- MUT cells, untreated.
We are interested in studying the effect of a certain mutation (MUT) on the expression pattern of gene A in treated vs. untreated conditions. To do so, I simply calculated ddCT_WT as dCT_WT_treated - dCT_WT_untreated, and similarly for MUT: ddCT_MUT = dCT_MUT_treated - dCT_MUT_untreated. Finally, the log2 fold change in expression (FC) was calculated as log2(ddCT_MUT/ddCT_WT). I am not sure whether this approach makes sense, so I would appreciate your help in better interpreting/representing/analysing my results.
Any reference to similar conditions is highly appreciated!
Thanks!
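For reference only, the widely used Livak (2^-ΔΔCt) convention expresses fold change as 2 raised to minus ΔΔCt rather than as a ratio of ΔΔCt values. A sketch using the quantities named above (taken as mean dCt values):
-----
# Livak method: fold change = 2^(-ddCt), where
# ddCt = dCt_treated - dCt_untreated within one genotype
FC_WT  <- 2^-(dCT_WT_treated  - dCT_WT_untreated)
FC_MUT <- 2^-(dCT_MUT_treated - dCT_MUT_untreated)

# Effect of the mutation on the treatment response, on the log2 scale;
# this equals -(ddCT_MUT - ddCT_WT)
log2(FC_MUT / FC_WT)
-----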
Hello everyone,
I would like to calculate the similarity between two clusterings of the same dataset, with the similarity statistic computed from the co-memberships of the observations. However, I could not implement the code in R. Is there anyone who can help?
Best regards,
Hasan
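If no dedicated package is at hand, the Rand index can be computed directly from the co-memberships in base R. A sketch with made-up cluster labels:
-----
# Rand index from co-memberships: c1 and c2 are two vectors of cluster
# labels for the same observations
rand_index <- function(c1, c2) {
  same1 <- outer(c1, c1, "==")       # co-membership matrix, clustering 1
  same2 <- outer(c2, c2, "==")       # co-membership matrix, clustering 2
  keep  <- upper.tri(same1)          # each pair of observations once
  mean(same1[keep] == same2[keep])   # share of pairs on which they agree
}

# Hypothetical example
c1 <- c(1, 1, 2, 2, 3, 3)
c2 <- c(1, 1, 1, 2, 2, 2)
rand_index(c1, c2)
-----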
Dear colleagues,
I just received a reviewer comment asking: "What is the ordinate unit on each graph?"
Actually, I do not know what the ordinate unit means. What is the ordinate unit? How can I determine or display the ordinate unit in plots in R?
Note: I used the season package in R to create the plots in the submitted manuscript.
I want to calculate the standardized incidence ratio (SIR) of second primary malignancies.
I have a database of patients who were diagnosed with lung cancer as a first cancer and were followed up to find out whether they developed a second primary malignancy. The patients were diagnosed with lung cancer between 1990 and 2013, and the follow-up started in 1990 and finished in 2014.
To calculate the number of expected cases one needs the number of person-years for every age category and also the age-specific incidence rates. I only have the age-specific incidence rates between 1999 and 2014.
How could I calculate the SIR? Should I exclude all the patients who were diagnosed with lung cancer before 1999 and:
- calculate the person-years from the date of first diagnosis (1999 onwards) until the outcome or the end of follow-up, and count the observed cases from 1999 onwards?
Does anyone with experience in this topic have a suggestion?
Thanks in advance!
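For reference, the SIR itself is just observed over expected cases, with the expected count built from person-years and age-specific rates. A minimal R sketch with hypothetical numbers:
-----
# SIR = observed / expected, where expected = sum over age bands of
# (person-years at risk x age-specific incidence rate)
person_years <- c(1200, 3400, 5100, 2600)          # hypothetical, per age band
rates        <- c(0.0002, 0.0008, 0.0021, 0.0035)  # age-specific rates per PY
observed     <- 21                                 # hypothetical

expected <- sum(person_years * rates)
SIR <- observed / expected
SIR
-----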
I have rapid light curve data (ETR for each PAR value) for 24 different specimens of macroalgae. The dataset has three factors: species (species 1 and species 2), pH treatment (treatment 1 and treatment 2) and day (day 1 and day 8 of the experiment).
I have fitted a model defined by Webb 1974 to 8 subsets of the data:
species 1,pH treatment 1, day 1
species 1, pH treatment 1, day 8
species 1, pH treatment 2, day 1...etc.
I have plotted the curves of the data that is predicted by the model. The model also gives the values and standard error of two parameters: alpha (the slope of the curve) and Ek (the light saturation coefficient). I have added an image of the scatterplot + 4 curves predicted by the model for species 1 (so each curve has a different combination of the factors pH treatment and Day).
I was wondering what the best way would be to statistically test whether the 8 curves differ from each other (or, in other words, how to test whether the slopes and Ek values of the models are significantly different). When googling for answers, I found many ways to check which model fits the data better, but not how to test whether the different treatments also cause differences in the rapid light curves.
Any help would be greatly appreciated.
Cheers,
Luna
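In case it is useful, here is a sketch of fitting one common parameterisation of the Webb et al. (1974) model per subset with nls(); please check that it matches the exact formula you used. 'rlc', 'ETR', 'PAR', 'species', 'pH' and 'day' are assumed names:
-----
# One common parameterisation of the Webb et al. (1974) model:
#   ETR = ETRmax * (1 - exp(-alpha * PAR / ETRmax)),   with Ek = ETRmax / alpha
fit_webb <- function(d) {
  nls(ETR ~ ETRmax * (1 - exp(-alpha * PAR / ETRmax)),
      data  = d,
      start = list(ETRmax = max(d$ETR), alpha = 0.2))
}

# One fit per subset (species x pH treatment x day)
fits <- lapply(split(rlc, list(rlc$species, rlc$pH, rlc$day)), fit_webb)
lapply(fits, coef)   # alpha and ETRmax per subset; Ek = ETRmax / alpha
-----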
I was wondering if anyone had any resources on how to calculate a pooled prevalence in R? Is it possible to produce a forest plot as a result? Any help would be greatly appreciated.
Thanks
Dearbhla
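One commonly used option is metaprop() from the meta package, which pools proportions and draws a forest plot. A sketch, assuming a data frame 'dat' with columns 'cases', 'n' and 'study' (names are placeholders):
-----
library(meta)

# One row per study: cases = number with the condition, n = sample size
m <- metaprop(event = cases, n = n, studlab = study, data = dat,
              sm = "PLOGIT", method.tau = "REML")
m          # pooled prevalence with 95% CI
forest(m)  # forest plot of the individual and pooled estimates
-----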
Hello people,
I want to know how to use a GLM to compare the mean number of granivorous birds between "high water level" years and "low water level" years, as shown in the picture provided below. This is an arbitrary data set I made up, but my real data are similar and are not normally distributed. What steps should I follow? Where should I start? Should I use a GLM or something else? Should I first determine whether the data fit a negative binomial or Poisson distribution? If so, how can I do that in R?
I tried the Mann-Whitney U test, but I think I should use something stronger. I would be glad if somebody could explain to me what to do in plain language. Thanks in advance.
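A minimal sketch of the Poisson and negative binomial GLMs in R, with an overdispersion check; 'dat', 'count' and 'water' are placeholder names:
-----
library(MASS)   # for glm.nb()

# Poisson GLM: count of granivorous birds vs water-level category
m_pois <- glm(count ~ water, family = poisson, data = dat)
summary(m_pois)

# Quick overdispersion check: residual deviance much larger than its
# degrees of freedom suggests the Poisson model is too restrictive
deviance(m_pois) / df.residual(m_pois)

# Negative binomial alternative
m_nb <- glm.nb(count ~ water, data = dat)
summary(m_nb)

# Compare the two models, e.g. by AIC
AIC(m_pois, m_nb)
-----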
Hi everyone. I have applied multiple logistic regression to create a model based on my independent parameters (x, y and w). The fitted linear predictor is Z = ax + by + cw - d, and the probability of occurrence of my dependent parameter is given by P = exp(Z)/(exp(Z) + 1); all of the variables are binary.
Now, in order to interpret the output, I have calculated the probability of occurrence of my dependent variable for all possible combinations of the variables, as follows:
1: x=0, y=0, w=0 ------> P=0.74%
2: x=0, y=1, w=0 ------> P=2.3%
3: x=1, y=0, w=0 ------> P=1.35%
4: x=1, y=1, w=0 ------> P=4.14%
5: x=0, y=0, w=1-------> P=1.65%
.
.
8: x=1, y=1, w=1------> P=8.83%
Since the sign of all the coefficients (a, b and c) is positive, the highest probability apparently occurs when x, y and w are all 1. But even in this case the probability only reaches 8.8%. Is this result reasonable?
And how can I interpret the magnitude of each independent parameter? Can I say that, since all the variables are binary and have positive coefficients, a variable with a bigger coefficient has a bigger impact on the probability derived from Z?
Thank you all in advance for your kind replies.
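For the interpretation step, the fitted model can generate the predicted probability for every combination of the binary predictors directly, and exp(coefficient) gives the odds ratio usually used to judge each predictor's magnitude. A sketch, assuming the model was fitted with glm() on a data frame 'dat':
-----
# fit <- glm(outcome ~ x + y + w, family = binomial, data = dat)

# Predicted probabilities for all 8 combinations of the binary predictors
grid <- expand.grid(x = 0:1, y = 0:1, w = 0:1)
grid$P <- predict(fit, newdata = grid, type = "response")
grid

# exp(coef) gives odds ratios: the multiplicative change in the odds when
# a binary predictor switches from 0 to 1
exp(coef(fit))
-----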
I see MS Excel has several trend-line options: linear, logarithmic, polynomial, exponential, and power functions.
What is the basis/logic for selecting these functions for biological data?
For example, I'm interested in understanding the change in abundance of either transcripts or proteins; my data fit a polynomial trend-line.
How can I compare different samples in this case?
Hi guys, I have recently conducted a meta-analysis comparing 3 different drugs against each other, and I am struggling to know which statistic from the meta-analysis to use to compare the 3 drugs.
Am I correct in saying that you would just compare the 3 WMDs in each subgroup alongside their confidence intervals? I have attached a picture of my meta-analysis below.
My study aims to explain why there are more cases of a given disease in certain areas of a state. For that I'm trying to use the number of occurrences as the dependent variable and land-use metrics plus economic data as the independent ones. I've tried linear regression, but it doesn't explain the data very well. If there is literature about this, or a method already established as standard, please let me know.
Colleagues, I need help with Venn diagrams and transcriptomics. I have three lists of IDs (example: c58516_g4_i4), only the IDs, not the sequences. I need to make a Venn diagram to know which IDs are shared among the three lists, which are shared between only two of them, and which are present only in their original list. I could do it manually, but it's a huge number of IDs. Can you suggest some software for Windows or a script for Linux? Thanks!
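If R is an option, plain base-R set operations already give every region of a three-way Venn diagram. A sketch, assuming list1, list2 and list3 are character vectors of IDs (e.g. read with readLines()):
-----
shared_all   <- Reduce(intersect, list(list1, list2, list3))
only_1_and_2 <- setdiff(intersect(list1, list2), list3)
only_1_and_3 <- setdiff(intersect(list1, list3), list2)
only_2_and_3 <- setdiff(intersect(list2, list3), list1)
unique_1     <- setdiff(list1, union(list2, list3))
unique_2     <- setdiff(list2, union(list1, list3))
unique_3     <- setdiff(list3, union(list1, list2))

# Counts per Venn region
lengths(list(shared_all, only_1_and_2, only_1_and_3, only_2_and_3,
             unique_1, unique_2, unique_3))
-----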
I am conducting a research project in which I am using an SEM model. My exogenous variable (world-system position) is ordinal with 4 categories. I am not sure how creating so many dummy variables will work in the SEM model, so I would like to treat it as a continuous variable. But I am not sure whether I would be violating any statistical assumption by doing this. Can somebody help me with suggestions on this?
Can someone explain what "experimental unit", "replicate", "total sample size" and "treatment size" mean in biostatistics, with a practical biological example?
In some places I see "n = ..." used for replications. According to what I've been taught, this is wrong.
Does sample size equal the number of replicates? If so, how and why?
From what I've seen in different papers, I'll try to summarise what I've observed in root-length measurement tests; n.b. each case has a control and a treatment:
Case 1: plants are grown in one plate/box (control vs treatment), each genotype has 20 seedlings, and they report n = 20 seedlings, e.g. as in Picture 1.
- In this kind of experiment each seedling is considered a biological replicate.
Case 2: plants are grown in one plate/box (control vs treatment), each genotype has 20 seedlings, and they report n = 20 seedlings, 5 independent experiments, e.g. as in Picture 1.
- In this kind of experiment each seedling is considered a biological replicate, across five independent experiments.
Case 3: plants are grown in one plate/box (control vs treatment), each genotype has 20 seedlings, and they report n = 3, e.g. as in Picture 2.
- In this kind of experiment each plate is considered a replicate, and in one plate 10/15/20 seedlings are grown.
We are working on a health survey project with more than 100,000 participants, and we are unsure whether to use the mean or the median. The data do not follow a normal distribution.
Please help us.
Hello, everybody. I would like to know whether it is methodologically correct to use a pool of patient samples to analyse microRNA expression by qPCR in my population. I have 3 groups with ~50 individuals each and, therefore, the cost of performing an exploratory study on each patient is extremely high. I was thinking of preparing one cDNA pool per group and performing the qPCR on each group instead of each individual, to observe the trend in expression.
Thanks in advance!
Suppose, you have measured 4 clinical parameters (A, B, C, D) at the time of admission of 60 patients with the same disease. You observe the outcome as "severe disease" and "non-severe disease". Now you want to calculate the severity predictive values of:
1. Individual parameters: i] A, ii] B, iii] C, iv] D individually
2. Combinations of v] A+B, vi] A+C, vii] A+D, viii] B+C, ix] B+D, x] C+D
3. Combinations of xi] A+B+C, xii] A+B+D, xiii] A+C+D, xiv] B+C+D
4. and xv] A+B+C+D
How can one compare the results of these 15 combinations and tell which combination gives the highest specificity, sensitivity, positive predictive value and negative predictive value? Please enlighten me.
Thank you
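A small helper for one piece of this: sensitivity, specificity, PPV and NPV for any single rule (parameter or combination) turned into a binary "predicted severe" call. The level labels below are assumptions and must match your data:
-----
# predicted: "yes"/"no" prediction of severe disease from a parameter or
#            combination; observed: "severe"/"non-severe" outcome
conf_stats <- function(predicted, observed) {
  tab <- table(predicted, observed)              # 2 x 2 table
  TP <- tab["yes", "severe"]; FP <- tab["yes", "non-severe"]
  FN <- tab["no",  "severe"]; TN <- tab["no",  "non-severe"]
  c(sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    PPV = TP / (TP + FP),
    NPV = TN / (TN + FN))
}
-----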
The hsa-miR-4454+hsa-miR-7975 probe shows high read counts in most of the available NanoString data, including ours, while sequencing analysis does not support the abundance of these microRNAs. Has anybody had the same issue with microRNA NanoString data?
Hi,
I am just starting to work with DESeq. I have a question regarding the basic biological interpretation of DESeq-based differential gene expression. There are two approaches, listed below, and I would like to know which one is more biologically relevant.
I have two treatment groups, treatment 1 and treatment 2, and I am comparing them with a control group, all with three replicates. I devised my study as follows:
1. I create one data frame containing the counts from all 9 count files, and from this data frame I make the comparisons T1 vs control, T2 vs control and T2 vs T1.
2. I create a separate data frame for each comparison: when comparing T1 vs control I create a data frame with 6 count files, and when comparing T2 vs control I create another data frame with 6 count files.
I want to know which of these two design strategies will give me a more accurate picture of the effects T1 and T2 cause compared with the control, and of how T1 and T2 differ from, as well as resemble, each other.
what genes are under study?
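For what it's worth, with DESeq2 (assumed here, since the post mentions DESeq) approach 1 is typically implemented as a single object containing all nine samples, so every pairwise comparison is extracted as a contrast from the same fit and shares the same dispersion estimates. A sketch with assumed object names:
-----
library(DESeq2)

# 'counts' is the 9-sample count matrix; 'coldata' has one row per sample
# with a 'condition' column: Control, T1 or T2 (assumed names)
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ condition)
dds <- DESeq(dds)

# All pairwise comparisons come from the same fitted object via contrasts
res_T1    <- results(dds, contrast = c("condition", "T1", "Control"))
res_T2    <- results(dds, contrast = c("condition", "T2", "Control"))
res_T2vT1 <- results(dds, contrast = c("condition", "T2", "T1"))
-----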
I am analysing abundance data using PRIMER 7, and I am a bit confused about how to pre-treat the data before carrying out SIMPER. I don't know whether I have to standardise the samples by total or standardise the variables (species).
Many thanks!
The power or sensitivity of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis (H0) when the alternative hypothesis (H1) is true. Should this be addressed before every clinical study?
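As a small illustration of what such a pre-study calculation looks like, base R's power.t.test() covers the simple two-group case; the effect size, SD and power below are placeholders:
-----
# Sample size per group needed to detect a mean difference of 5 units
# (SD = 8) with 80% power at a 5% significance level
power.t.test(delta = 5, sd = 8, sig.level = 0.05, power = 0.80)

# Or, for a fixed n per group, the power actually achieved
power.t.test(n = 20, delta = 5, sd = 8, sig.level = 0.05)
-----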
Recently I have been working on a functional candidate gene study. I have three SNPs and need to calculate the allele substitution effect, as well as the additive and dominance effects, for each single-SNP marker association with quantitative traits (i.e. body weight). Can anyone suggest the best statistical way to do this?
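One standard regression-based sketch, assuming genotypes are coded as the count of one allele (0/1/2) in a column 'geno' and body weight in 'bw' (both names are placeholders): the slope on allele count estimates the average allele substitution effect, while an additive (-1, 0, 1) plus dominance (0, 1, 0) coding separates the additive and dominance effects:
-----
# Regression on allele count: the slope estimates the average allele
# substitution effect for that SNP
summary(lm(bw ~ geno, data = dat))

# Additive (-1, 0, 1) and dominance (0, 1, 0) codings fitted together
dat$add <- dat$geno - 1
dat$dom <- as.numeric(dat$geno == 1)
summary(lm(bw ~ add + dom, data = dat))
-----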
Hi. What statistics can I use if my protocol is: 3 different animals to obtain cells for culture experiments, then the same 3 treatments/interventions on 3 dishes at 1 and 2 hours? The animals, cell culture experiments, buffers, etc. are all prepared fresh on separate days.
I have 4 time points, and for each time point I have 3 animals; cells are collected from each animal and can be either negative or positive. I need to compare the different time points to see whether there is any difference in the positivity/negativity of the cells. I am thinking about a chi-square test. Thanks for the help.
Dear all, how do I normalise my data if they are still not normally distributed after applying log10 and square-root transformations?
I'm developing a cost-effectiveness analysis where I need to calculate time-dependent reintervention rates derived from published sources. I have multiple studies with different follow-up times and different presentations of the data (Kaplan-Meier risk estimates, cumulative probabilities...)
How would I go about calculating a yearly probability of recurrence that could be applied consistently throughout my model, and then reducing that probability by a constant factor every year?
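A common conversion (assuming an approximately constant underlying rate over the reported follow-up) goes via the instantaneous rate; a minimal R sketch with placeholder numbers:
-----
# Convert a cumulative probability over t years into a constant yearly
# probability via the underlying rate
p_cum <- 0.30   # e.g. 30% reintervention risk reported at 5 years (placeholder)
t     <- 5

rate     <- -log(1 - p_cum) / t   # instantaneous rate per year
p_yearly <- 1 - exp(-rate)        # probability applied each yearly model cycle
p_yearly
-----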
I'm analyzing data from experiments in which we exposed fish in a net pen to different noise frequencies. However, I'm uncertain which statistical tests I should use to test for possible differences before/during/after the exposure.
We have data in one second intervals of the area of the school of fish and the velocity of the centroid of the school (plus X/Y coordinates), I'm interested in testing for differences within and between different exposures. Data for each exposure starts 60 seconds before the onset of sound, which lasts for about two minutes, and ends with a 60 second tail after the noise has ended.
Should one use time series analysis, like ARIMA, ARMA or ARIMAX to test the data, or something different altogether? At the moment I'm using SPSS to test the data.
Population A (Formicidae)
Tajima's D Value = -2.66825 (p <0.001)
h = 0.3845 ± 0.0724
π = 0.002474 ± 0.001572
Population B (Formicidae)
Tajima's D Value = -1.40150 (p <0.01)
h = 0.6268 ± 0.0452
π = 0.001186 ± 0.000921
(Fu Fs test was not significant for any of the populations)
I am trying to find the best correlation between a biotic and an environmental dataset. My question is: if I have one variable in the biotic dataset and several variables in the environmental dataset, can I still apply the BEST analysis?
I have collected leaf temperature data (ranging from 29 °C to 32 °C) for different organic matter levels. Can I perform an ANOVA to test the difference in leaf temperature between the organic matter levels, or what would be a suitable statistical test for this?
ADMA has been shown to inhibit nitric oxide synthase (NOS). I am looking at the production of nitrite in HMEC-1 cells treated with 1 µM, 5 µM, 10 µM, 50 µM and 100 µM ADMA, as well as in control cells treated with DPBS.
Hi
I was wondering if anyone could help with the data analysis for a TaqMan low-density array assay I have run, as I am quite new to this and am unsure whether I have done it correctly and where to go next.
Basically, I am assaying the expression of 96 genes in tissue and models. I have run 5 tissue samples (separate patients) and 2 models (5 repeats of 1 and 3 repeats of the other) on a gene card.
I have used NormFinder to identify the best combination of housekeeping genes (using the median of each model). I then calculated the ∆Ct of each data point using the geometric mean of the housekeepers, removed outliers using Grubbs' test, took the mean of my two models (i.e. the mean of the 2 means) and the mean of the tissue, and calculated the fold change by dividing one by the other. I subsequently took log2 of the fold changes. I now want to find out whether the changes are significant. I was going to use a t-test, but my sample sizes are too small to test normality, and I've read that Mann-Whitney U tests are hard to conduct with low sample sizes.
Have I analysed the data correctly thus far and can anyone recommend a test for the significance?
Thanks for all your help!
For example: I'm evaluating 50 genotypes and I have two treatments, say 'A' as control and 'B' as treatment. Can we analyse the genetic diversity of these genotypes through Mahalanobis D2 statistics with both treatments together, or do we have to evaluate them separately? Can anyone shed some light on this matter?
Dear friends,
There are means and SDs of two parameters, blood glucose and serum insulin, which have been extracted from about 14 articles for a meta-analysis.
Now I would like to obtain a ratio between these two parameters.
- How should I obtain or estimate the standard deviation of this ratio, which has not been reported in the articles?
I would be pleased if someone could answer me.
Detailed steps for both the manual and the computerised analysis would be appreciated.
We are using the alpha lattice design in our genetic designs and have some questions:
1. We need the expected mean squares for the sources of variation in order to calculate the genetic variance components in alpha lattice and rectangular lattice designs.
2. How can we use covariance analysis in an alpha lattice to adjust the treatments?
3. Is it possible to obtain the adjusted data in order to get adjusted means?
4. Can we obtain the items above when the alpha lattice is part of a combined analysis?
Could we have the procedures in SAS or CropStat?
I would be very grateful in advance.
Which statistical test will help me compare the three zones and state their significance? Are there any indices by which I can calculate the association/proximity of two species?
I want to know which statistical tests I should apply for the experiments below.
1. The particle size of a nanoparticle is checked every day for a week, with three readings each day. I want to relate this to the stability of the nanoparticle over time. Is the correlation coefficient the best choice?
2. Cumulative drug release from the nanoparticle over 48 hours, measured at different time points under two treatment conditions, pH 5.2 and pH 7.4.
Thanks.
For example, I have two groups of samples. I calculated the distance-decay rates of the two groups separately and obtained two values, DDRa and DDRb. How can I test the difference between these two values?
Thank you for your reply!
I'm planning a family-based association study in which I'll be doing exome analysis on twenty trios with affected probands. I've noticed that the affected phenotype runs in families and I want to seek out associated variants. Since I am no statistician (the software does most of that work), I'm not sure how to write the statistical considerations section of my IRB application. Does anyone have a template or an example of something close that I can work from? It's the only thing holding up my protocol in my university's IRB department.
If it helps, the proposed experimental design mirrors the one used in this study: http://www.ncbi.nlm.nih.gov/pubmed/?term=25737299
I have a study group (N subjects with disease A) and a control group (2N subjects WITHOUT disease A). I want to compare the two groups in terms of outcomes (categorical and continuous variables). Which tests should be applied?
Hi!
I have several sets (e.g. 100) of essential genes plus non-essential ones, all extracted from a database of essential genes for a particular disease under different conditions. How can I compare these sets to check whether the results obtained under the different conditions (100) are significant?
For instance, set #1 contains 2000 genes: 100 essential genes (found in the hypothetical database) plus 1900 non-essential genes, and so forth.
Thanks in advance
Hi,
I have a data set of 20 algae sampled every 2 months for a year for % lipid. I am looking for the correct model to analyse the change in % lipid over time, but I seem to be running into roadblocks everywhere I turn:
- The data are heteroscedastic.
- There are data missing for some individuals at different time points.
- The samples taken in the month of spawning show huge increases in % lipid followed by a large decrease the next month (so non-linear?).
I know that repeated-measures ANOVA is not an option because of the missing data points. But is a GLM possible given the heteroscedasticity and non-linearity? Or would a non-linear model be more appropriate?
I have trawled the internet for answers but there are a lot of opinions, contradictions, and baffling results.
Any guidance would be greatly appreciated.
Almost always in clinical research the log-rank test (Mantel-Haenszel test) is employed to compare the equality of two survival curves. Is there a good way to decide whether other tests may be more sensitive in detecting differences between the groups? For example, if the Kaplan-Meier curves cross as opposed to being roughly parallel over time? Does the total number of events influence the choice of the test statistic? What if we are also adjusting for a covariate, does this situation affect which test should be used to compare the two survival curves?
I proposed a new method for estimating the weights of 8 elements used to describe a pattern. Each weight ranges from 0 to 1, and the 8 weights for each pattern sum to 1.
I want to compare my estimated weights with those given by a standard method. Using the Bland-Altman plots attached below, the limits of agreement do not seem acceptable for my study; in fact, a difference of 0.1 between paired results (obtained with the new method and the standard one) is really important. So I need to define a limit on the difference between paired results in order to judge whether the compared methods agree or not.
Can it be defined arbitrarily, or is there a method for doing so?
Hi all,
I need to calculate the percentage of ice-free area within a certain radius around each of about 200 sampling points in Antarctica. I guess this would be a similar approach to calculating vegetation cover, etc.?
A long time ago I learned some GIS basics using Idrisi, but I haven't used it in 7-8 years and it was very basic. Now, I quite urgently need this data (2-3 weeks, but the sooner the better :) ).
Therefore, any advice to get me started is welcome.
I have the possibility to use ArcGIS and QGIS.
I came across Quantarctica for QGIS, which might provide the map. Any other sources for georeferenced maps of Antarctica?
I guess I can then plot the samples' coordinates, but do I then have to manually delineate the ice-free regions by drawing polygons? Or are there layers available giving the ice-free surface or, conversely, the ice coverage, so that I would only need to subtract this from the circle area?
Can such an analysis be automated?
Sorry for these probably very basic questions. If you can provide me with a good crash-course tutorial (on GIS in general or a similar problem), that would be very welcome too :)
Thanks in advance!
Bjorn
I have 4 sites with a total of 22 species (i.e. site 1 has 7 species, site 2 has 15, etc.). I also have several weeks of species abundance data for each site. I want to analyse the species diversity on temporal and spatial scales based on high-throughput sequencing.
Several methods have been proposed to compare sites for the species richness, many of which only use presence/absence data. I want to use abundance data, and do the following:
-use entire dataset, compare the similarity statistically, and obtain an optimum species richness/diversity value (say x number of species needed to reach 95% coverage of the whole dataset)
-use subsampled dataset (time-wise and site-wise), and analyze at which stage the previously obtained optimum number of species is reached (i.e., at 2 months of sampling instead of 12 or in one site instead of 4, etc.)
Any recommendations on the use of abundance data for answering these questions?