Questions related to SAS
I just received the latest TOC alert for Behavior Research Methods, and this article caught my eye:
I've not had time to read it yet, but judging from a quick glance, I wonder if the main "problem" might be that users do not always take time to RTFM* and therefore, do not understand what their software is doing? In any case, I thought some members of this forum might be interested.
* RTFM = Read The Fine Manual ;-)
I am looking for good references for conducting causal mediation analysis using time-to-event data. If you are aware of available code (SAS, specifically), that would be very helpful as well.
I am a beginner with SAS, and especially with orthogonal contrasts. My experiment involves 4 rates of nitrogen (23, 46, 69 and 92 kg N) at 3 application times, plus a control, for bread wheat. The trial was conducted in the field as an RCBD with three replications. The responses are labeled as variables 1-39, as shown in the SAS command I just prepared.
My treatments are:-
N application time =3
Total treatments= 13
Thank you for your recommendation!
I’ve got a data set and I want to calculate R2 for a linear regression equation from another study.
For example, I have derived an equation from my data (with R2) and I want to test how other equations perform on my data (and thus calculate R2 for them). Then, I want to compare R2 from my data set with R2 from derivation studies.
Is there any software for this? Could any common statistical package (e.g. SPSS or SAS) cope with this task? Are there any tutorials on YouTube for this?
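The calculation asked about here does not need special software: R2 for an external equation is 1 - SS_res/SS_tot, where the predictions come from the published coefficients rather than from a refit. A minimal Python sketch follows; the data, the coefficients and the function name are all hypothetical, chosen only to show the arithmetic.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical local data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.4, 3.1, 3.4, 4.2, 4.4]

# Hypothetical equation taken from another study: y = 2.0 + 0.5 * x
published_intercept, published_slope = 2.0, 0.5
predicted = [published_intercept + published_slope * xi for xi in x]

# R2 of the external equation on the local data (no refitting)
r2_external = r_squared(y, predicted)
print(round(r2_external, 4))
```

Note that R2 computed this way can be negative when the external equation predicts worse than the mean of the local data, so it is not directly comparable to the R2 reported in the derivation study.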
We performed a meta-analysis recently and SAS produced an I-squared of exactly zero. I know this is theoretically possible, but it doesn't make sense to me, as the outcomes in the different studies in the analysis did not have exactly the same means and variances. Can anyone shed light on this?
Thanking you all in advance for your thoughts.
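One common reason for an I-squared of exactly zero: the statistic is usually computed as (Q - df)/Q and truncated at zero, so whenever Cochran's Q is no larger than its degrees of freedom the reported value is 0, even though the study estimates differ. A minimal Python sketch with hypothetical effect sizes and variances; whether the SAS program used above applies this same formula is an assumption.

```python
def i_squared(effects, variances):
    """Return (Q, df, I2 in percent) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return q, df, 0.0
    # Truncation at zero: negative values are reported as 0
    i2 = max(0.0, (q - df) / q) * 100.0
    return q, df, i2

# Hypothetical studies: estimates differ, but within sampling error
q_small, df_small, i2_small = i_squared([0.10, 0.20, 0.30], [0.04, 0.04, 0.04])

# Hypothetical studies: estimates differ far more than sampling error allows
q_large, df_large, i2_large = i_squared([0.0, 1.0], [0.01, 0.01])

print(q_small, df_small, i2_small)
print(q_large, df_large, i2_large)
```

In the first case Q is about 0.5 on 2 degrees of freedom, so I-squared is truncated to 0 although the three estimates are not identical.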
I have to write SAS code and I am a bit confused about the difference between WHERE and IF conditions, because I am getting different data subsets.
Could you please guide me or provide SAS code for the genotypic and phenotypic correlation of traits?
Hello everyone, I am writing my MSc research proposal.
My experiment will test three feed sources (pulses) as a bee feed supplement. The trials will be set out in a completely randomized design. Each treatment group of bees will be provided with 150 g of the respective treatment diet. The data to be collected are the amount of food consumed, sealed brood and bee bread areas, bee strength, and honey yield. Data will be subjected to ANOVA using SPSS software (version 28) with the PROC MIXED procedure of SAS, and mean values will be separated using Duncan’s multiple range test (DMRT) at p = 0.05 (SAS Institute, 2012).
Can anyone advise me whether my data analysis methods are appropriate?
I am using SAS and am trying to generate a new variable based on whether cells corresponding to another variable are empty. Any help would be appreciated!
Is there a program, app, website or software that can automatically guide people through the steps of the research methodology they should pursue by simply asking them certain questions about their research data? For example;
1. What kind of data do you have? Choose one below:
a) Qualitative b) Quantitative c) Etc.
2. How many samples do you have?
a) One b) Two c) More than two
15. What Statistical software are you going to use for analysis?
a) SPSS b) Stata c) SAS d) some other software, etc.
RESULT: You can use .... test to measure ....
ps: I know Statistics is too serious of a discipline to be finalized with a few trivial questions. However, it would be nice for researchers to get answers to some very basic questions through a program like the one described above, I suppose.
Thanks in advance for taking time to read.
I want to create the tertiles in SAS to organize my NRF variable into categories. I used the below syntax to do so but the problem is that the number of observations in each category is not similar. I am wondering if there is a potential error that I missed here.
PROC UNIVARIATE DATA=master2.NRF noprint;
OUTPUT OUT=master2.NRFTertile PCTLPTS= 33 67 PCTLPRE=NRF_P;
IF NRF le ..... THEN NRFTertile=1;
ELSE IF NRF gt ..... AND NRF lt ..... THEN NRFTertile=2;
ELSE IF NRF ge ..... THEN NRFTertile=3;
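Unequal group sizes with this kind of cut-point logic often come from tied values: every observation equal to a cut point lands in the same group. In addition, the 33rd and 67th percentiles are not exact thirds. A minimal Python sketch that mirrors the IF/ELSE logic above; the scores and cut points are hypothetical.

```python
from collections import Counter

def assign_tertiles(values, lower_cut, upper_cut):
    """Mirror the IF / ELSE IF logic: <= lower -> 1, between -> 2, >= upper -> 3."""
    groups = []
    for v in values:
        if v <= lower_cut:
            groups.append(1)
        elif v < upper_cut:
            groups.append(2)
        else:
            groups.append(3)
    return groups

# Hypothetical scores with several ties at the upper cut point
scores = [1, 2, 3, 4, 5, 5, 5, 5, 9]
lower_cut, upper_cut = 3, 5  # hypothetical cut points

counts = Counter(assign_tertiles(scores, lower_cut, upper_cut))
print(counts[1], counts[2], counts[3])
```

Here the four tied values of 5 all fall in the top group, so the groups hold 3, 1 and 5 observations instead of 3 each.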
My co-authors and I are trying to run a hierarchical model in SAS as part of a scale development project (a multi-dimensional scale with a higher-order latent factor). I am using code from a prior, published project, but we are getting some odd loadings in our model on specific items (e.g., a loading of 1.00 on one item). Maybe we are mis-specifying something in our model. Can anyone recommend a consultant that we can hire? Thanks.
I am looking for ways to add a random effect in a SUR model, using R or SAS.
To be more specific, I have panel data measured at an individual-and-daily level, and I want to stack 3 equations with different dependent and independent variables in a SUR model, with an individual random-effect coefficient.
If you guys have any example codes that I can refer to, it would be a great help!
A study was carried out on broiler chickens, divided into two phases (starter and finisher). However, the birds were not redistributed for the finisher phase, which necessitates analysis of covariance when analysing the performance parameters at the finisher phase, with initial weight (IW) as the covariate. The analysis was carried out using SPSS in the past, and was quite straightforward. However, doing this with SAS procedures proves difficult. I used:
Class Enzyme Level;
Model FW TWG Av_FI FCR DFI Survival = Enzyme Level IW;
LSMeans Enzyme Level / StdErr Pdiff Adjust = Tukey;
which uses LSMeans for mean adjustment, but the result obtained is the same as that obtained without the covariate, and also different from that obtained from SPSS.
Could anyone kindly help out on the correct syntax, and/or the interpretation for ANCOVA using SAS proc?
The result obtained from SAS proc is attached.
How do we conduct a test of homogeneity of variances (Bartlett's test) in a combined analysis over years and locations for a two-factor factorial experiment?
I use the following SAS codes to draw survival curves:
proc lifetest data=MyData plots=survival(cb=hw test atrisk(maxlen=13)); time PYEAR * Outcome(0); strata Exposure; run;
Since the association is weak, I need to customize the y-axis to show only the range between 0.8 and 1.
I want to analyze a factorial split-plot in time using SAS.
Factorial Experiment using Completely Randomized Design (CRD);
Factor A: treatments (a1-a4)
Factor B: harvest time, different days after treatment (b1-b5)
Does anyone have SAS codes for this analysis?
My advisor sent me the following code and told me to rewrite it in R.
parms a=19 c=9 k=.08 lag=2;
*a=soluble, c=undegradable, k=rate/h, lag=lag time;
if time<0 then time=0;
output out=temp p=Predicted r=Residual;
The non linear model was easy enough but I was having issues with fitting lag time. However I have written a function that works great, and I have posted it here so that it will hopefully help someone else in the future.
m <- nls(formula = N_Disapp ~ ifelse(test = lag >= Hour, yes = b*exp(-k*(0))+c,
no = b*exp(-k*(Hour-lag))+c),
data = data1, start = parms)
plot(data1$Hour, data1$N_Disapp, main=x)
print(lines(data1$Hour, data1$predicted, col="blue"))
output1<-data.frame(b=out$parameters[1,1], k=out$parameters[2,1], c=out$parameters[3,1], name=x)
To run a loop for all the products:
Note: finaloutput will contain a table with all the results
To just run one:
If this helps you I just ask that you "recommend" this post. Thank you.
I am currently doing a web scraping study about the online prices of laptops. My objectives are
- to test if there is a significant difference in online prices of different laptop brands on different days of the month;
- to determine if there is a significant difference in online prices of different laptop ranges on different days of the month; and
- to test if there is a significant difference in online prices of laptop ranges per brand on different days of the month
I have already finished scraping the online prices of laptops last week. I am planning to use a two-way and three-way ANOVA to answer the objectives but as I was checking the assumptions of ANOVA, I have noticed that the homogeneity of variance is violated. I have used Levene's Test in checking the homogeneity of variance. Is it still okay to proceed in using two-way and three-way ANOVA?
I am looking to calculate Cohen's d for a Welch test so that I can calculate an effect size for unequal variances. The only equation I could find is here (https://www.datanovia.com/en/lessons/t-test-effect-size-using-cohens-d-measure/#cohens-d-for-welch-test). However, I am struggling to calculate it with the output given in SAS or SPSS.
Does anyone have any code or suggestions?
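The formula on the linked page divides the mean difference by the square root of the average of the two group variances, so it can be computed from the means and standard deviations that SAS or SPSS already print. A minimal Python sketch; the function name and the summary values are hypothetical.

```python
import math

def cohens_d_welch_from_summary(mean1, sd1, mean2, sd2):
    """Cohen's d for unequal variances: (m1 - m2) / sqrt((s1^2 + s2^2) / 2)."""
    return (mean1 - mean2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2.0)

# Hypothetical group summaries copied from a t-test output table
d = cohens_d_welch_from_summary(mean1=4.0, sd1=2.0, mean2=2.0, sd2=1.0)
print(round(d, 4))
```

Unlike the pooled-variance version, this denominator does not weight the variances by sample size, which is the design choice that matches the Welch test.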
I have 2 models (negbin log and poi log) created using proc genmod in SAS. Both have overdispersion. Do I correct for overdispersion and then compare models again or do I compare right away choosing model with smallest deviance ignoring the overdispersion?
What is the best remedy for overdispersion? What does the remedy do to correct the situation? Regarding the former question, might different remedies affect the comparison?
My main independent variable is dichotomous, the moderator is race (Hispanic, Non-Hispanic Black, Non-Hispanic Asian and Non-Hispanic White (ref)), and the outcome is also dichotomous. I have coded the race categories as dummy variables, except for the reference group. I am fitting an adjusted logistic model in SAS and have added the interaction terms. Should I include all the interaction terms (independent*Hispanics independent*Non-Hispanic Black independent*Non-Hispanic Asian) in one model or in 3 different models? My result is significant for Hispanics and Non-Hispanic Black. How should I interpret this?
My project was conducted as an augmented design in the field. I am looking for SAS code for the ANOVA. I could not find complete SAS code for ANOVA and mean comparison. Can someone help me out?
I have a dataset that I have transferred from SAS to RStudio via the haven package. But when I inspect my dataset in R, I can see that some of the formatted variables are indeed still formatted, while some of the variables are shown as the 'raw' variables without the format.
For instance, I have a numeric variable, M3_SCORE and then right next to it, the formatted version, M3_LEVELS (in SAS, this would be formatted to now be a categorical variable).
In the dataset in Rstudio, however, I can only see the raw numbers and not the new categories.
When I use str(dataset) to inspect the variables, it says that both M3_SCORE and M3_LEVELS are numeric, but under M3_LEVELS it says
..-attr(*, "format.sas")=chr "FMTM3SCORE"
Does this mean that Rstudio will still process M3_LEVELS as a categorical variable, even though I can't see it?
Thank you in advance!
Hi everyone! I am new to SAS. I am reading an article on a study about minorities. In the findings of this study, there is a sentence that reads as follows:
" A power analysis using MacCallum et al.’s (1996) SAS program indicated that the statistical power of the models were at 0.90."
Here I have also attached the conceptual model (structural equation model) from the study. I am wondering how the author obtained this result (0.90) with SAS. As you can see in the other attached screenshot, there are many items under the main Power and Sample Size menu, such as ANOVA, t-tests, multiple regression, etc. However, I can't find an item or button specifically for structural equation models or path analysis. Does anyone know which item I should choose in order to obtain the statistical power of the model (0.90)? Alternatively, if there is no ready-made menu or button for power analysis of SEM or path analysis in SAS, is there any software with which I can obtain such a result directly or easily?
I have taken the data from a field trial established to screen the sugarcane varieties for sugarcane grassy shoot diseases(GSD). RCBD with three replicates was used to establish the trial. Standard varieties are not available for GSD. Therefore, comparison and rating will not be possible.
The number of GSD infected clumps (phenotypically) and # of total clumps had been taken as the main data of the trial in the one-month intervals from 1 to 12 months (12 disease counts). Disease incidence was calculated. In addition, yield data were taken.
1. Could you please explain what kind of statistical analysis is suitable for analyzing the data taken here in order to find the varietal response for the GSD by the SAS program based on disease data?
2. What data (# disease clumps or disease incidence) is appropriate to use for analysis? Further, it would be a great support if anyone can give an idea of how to write CLASS and MODEL statements of SAS for analyzing this data.
I need to use regression models for my research. I used SPSS for linear regression but I want to use univariate and multivariate power regression such as:
a,b,c: model parameters
Y: dependent variable
X,Z: independent variables
Is there any user friendly statistical software to do it?
(I know about SAS or R software, but I think they perform regression by programming)
I have two independent variables: the first is Parity with 2 levels (Gilt and Sow); the second is Diet with 4 levels: A (control), B, C and D. The experiment was run in 4 replicates. It is an unbalanced design with 54 observations in total. I am interested in comparing the treatments (B, C, and D) to the control within gilts and within sows, i.e.
Gilt B vs Gilt A, Gilt C vs Gilt A, Gilt D vs Gilt A and
Sow B vs Sow A, Sow C vs Sow A, Sow D vs Sow A
As I wanted to compare the treatments only with the control, I chose Dunnett's test, with code like this:
proc mixed data =A;
class Parity Treatment ;
model Output = Parity|Treatment;
lsmeans Parity * Treatment / adjust = dunnett pdiff;
However, this compares all the treatments with Gilt A. It results in following comparisons-
Gilt B vs Gilt A,
Gilt C vs Gilt A
Gilt D vs Gilt A
Sow A vs Gilt A
Sow B vs Gilt A
Sow C vs Gilt A
Sow D vs Gilt A
Which is not what I want. As I mentioned above, I want to compare Gilt vs Gilt and Sow vs Sow. Is there any way to do it? Any help would be much appreciated. Thanks.
I want to analyze factorial split-plot data collected over two years (combined analysis) using SAS. I think year should be treated as a random effect in the analysis. Does anyone have SAS code for this analysis?
Thinking about online and MOOC-type certificates for R programming and data analysis, are there any that are recognized and respected by potential graduate schools and employers?
I guess if people can recommend those for SAS or Python, that would be useful as well.
For example, the sources of variation (SoV) are rep, genotypes (parents, crosses, parents vs crosses) and error. I don't know the SAS code for this type of ANOVA. Can anybody help me?
For some time now I have been studying papers that used Interrupted Time Series (ITS) analysis, but unfortunately these papers do not mention which software they used! Even when they mention software like R, Python or Matlab, they do not say, for example, which R package they used or what the procedure was. This is odd, because in ML and metaheuristic studies we mostly describe the whole algorithm and methodology we applied, so other researchers can replicate our work easily. With ITS this is not the case, and it is hard to enter the field!
Appreciate the help of ITS experts.
Hi, please give me a complete guide or any material on analysing experimental data from RBD, CRD, etc. to assess genetic diversity, character association, path analysis and other plant-breeding-related analyses using IBM SPSS.
I understand heritability (h2) in the 'narrow sense' as the proportion of the genetic variance (VG) out of the total variance (VG + VE). How do I calculate variability and heritability using ANOVA output from SAS or Statistica?
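A common textbook route from an ANOVA table to the ratio VG/(VG + VE) described above (usually called broad-sense heritability when VG is the total genotypic variance) uses the expected mean squares. The Python sketch below assumes a single-environment RCBD with r replications, where the genotypic variance is estimated as (MS_genotype - MS_error)/r and the error variance as MS_error. The mean squares and the function name are hypothetical; other designs have different expected mean squares.

```python
def heritability_from_anova(ms_genotype, ms_error, n_reps):
    """Variance components and heritability from RCBD mean squares (assumed design)."""
    var_g = (ms_genotype - ms_error) / n_reps   # genotypic variance estimate
    var_e = ms_error                            # error variance estimate
    h2_plot = var_g / (var_g + var_e)           # single-plot basis
    h2_mean = var_g / (var_g + var_e / n_reps)  # entry-mean basis
    return var_g, var_e, h2_plot, h2_mean

# Hypothetical mean squares read off an ANOVA table
var_g, var_e, h2_plot, h2_mean = heritability_from_anova(
    ms_genotype=50.0, ms_error=10.0, n_reps=3
)
print(round(var_g, 3), var_e, round(h2_plot, 3), round(h2_mean, 3))
```

With this estimator, a genotype mean square smaller than the error mean square gives a negative variance estimate and therefore a negative heritability.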
I've conducted an RCT in which I'm testing the effect of a group mindfulness intervention on depressive symptoms. Only one group was running at a time so there were four study waves, with each wave of participants being randomized to intervention or control. Outcomes were measured bi-weekly for 6 months. I'm testing the effect of intervention using PROC MIXED in SAS with bi-weekly assessments nested within participant identified in the repeated statement.
A reviewer has suggested that I include treatment wave as a random factor in the model. However, the interaction between treatment and study wave (as fixed effects) is not even close to significant (p = .99), suggesting that the effect of treatment is the same across waves. Is this sufficient justification to keep my analyses as they are and not include treatment wave as a random factor? Thanks!
I am now doing survival analysis in cancer. Is there anywhere I can find the SAS macro for reconstructing Individual Time-to-Event Data from a published KM curve?
Thanks a lot.
We have developed a digitally delivered behavioral change intervention with personalized feedback for college students making use of illicit drugs (see our protocol published here:
(JMIR Res Protoc 2020;9(8):e17829) doi: 10.2196/17829)
We are now in the process of examining the different components of the intervention in an RCT study, using a fractional factorial design.
Does anyone have access to SAS Factex (full version) to run the syntax for us and help us to estimate the number of experimental arms? I do not have access/knowledge on SAS.
The syntax has been defined using the tutorial paper of Collins et al. (Psychol Methods. 2009 September ; 14(3): 202–224. doi:10.1037/a0015826.) and we only want the template SAS generates to define the experimental arms.
All the details can be provided, please email me email@example.com
*Further collaboration will be discussed and of course acknowledgment on the paper will be secured.
While analyzing maize data over locations using SAS, is there a way to incorporate all the lines, testers, parents (lines + testers), crosses and checks in one analysis (SAS 9.0) in order to construct an ANOVA skeleton like the one below?
Cross Vs check
Cross Vs Check*Site
Cross Vs Parents
Cross Vs Parents*Site
I have an experiment with two levels of two different enzymes and a control treatment (without enzyme) in a completely randomized design, i.e. a 2 x 2 + 1 factorial design. Can I model it like a nested design, or is there a different model for it?
I work mainly with SAS and processed my data through all steps in SAS. A user needed the data files in SPSS, but also needs all variable attributes to be included (formats, missing values, variable labels, etc.). I prepared SPSS syntax for that, and it works if I run it in an additional step. In other words, after processing the data in SAS, I run the SPSS syntax using SPSS. What I need is a way to include the SPSS syntax (%include ... doesn't work) so that it runs automatically when I run my SAS programs!
Thanks dear friends, YA
Hello, I am trying to learn how to use the PCCF for my dissertation research. However, all I can find are directions on how to use the PCCF with SAS. I'm wondering if it is possible to use the PCCF with STATA, and if anyone has an instruction manual, video, or presentation outlining how to do this?
I have got the negative heritability (-3%) for Harvest Index in pooled analysis for RILs. But interestingly the heritability for harvest index was high in individual year (75% and 91%). My design was alpha lattice and 300 RILs were evaluated in two stress seasons for heat tolerance in Chickpea. Analysis was done in SAS proc Mixed model.
Could you please tell me why I got these results and if it is okay then how to interpret the results?
I am developing a questionnaire and first performing an exploratory factor analysis. After I have the final factor structure, I plan on regressing the factor scores on some demographic covariates. Since I am anticipating missing item responses, I am thinking of imputing the item scores before combining them into factor scores (by average or sum).
I came across a paper that suggested using mice in stata and specifying the factor scores as passive variables. I am wondering if this is the best approach since I read somewhere that says passive variables may be problematic. Or, are there any alternative solutions? Thank you!
Here is a link to the paper, and the stata codes are included in the Appendix.
I need to calculate the p-trend of age-adjusted incidences. The age-adjusted incidences were calculated using the direct method, so the numerators and denominators were large. I tried the Poisson model and found the p-trend was <0.0001, but the incidence rates seem to be non-linear rather than showing an increasing or decreasing trend. Also, I found the value/df is very high (>3500), which should be close to 1 for a good fit. Is it right to use the Poisson model? Which model is good for calculating the p-trend of age-adjusted incidences?
data a;
input periods cases pyears;
loga = log(pyears); /* offset: log of person-years */
datalines;
1 10000 50000000
2 25000 50000000
3 20000 50000000
4 15000 50000000
;
run;
proc genmod data=a;
model cases=periods / dist=poisson link=log offset=loga;
run;
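The data lines above can be checked by hand before any modelling. A minimal Python sketch that turns the posted counts and person-years into rates per 100,000 and checks whether they move in one direction; it only restates the posted numbers and is not a trend test.

```python
# Values copied from the data lines above
periods = [1, 2, 3, 4]
cases = [10000, 25000, 20000, 15000]
pyears = [50000000, 50000000, 50000000, 50000000]

# Crude rate per 100,000 person-years in each period
rates_per_100k = [c / py * 100000 for c, py in zip(cases, pyears)]

# Period-to-period changes; mixed signs mean the pattern is not monotone
changes = [later - earlier for earlier, later in zip(rates_per_100k, rates_per_100k[1:])]
is_monotone = all(d >= 0 for d in changes) or all(d <= 0 for d in changes)

print([round(r, 1) for r in rates_per_100k], is_monotone)
```

The rates rise from period 1 to period 2 and then fall, so a single log-linear slope in periods cannot describe them well; with counts this large, any such lack of fit shows up as a very high deviance/df.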
Hi everyone,
I want to predict CVD risk in young adults (27-33 y) and associate it with early adulthood dietary patterns. Can anyone send me SAS code to predict CVD risk using a Framingham equation?
Thanks in advance!
Do you have excellent knowledge of both SAS and Matlab programming, and would you be interested in collaborating on a manuscript that deals with Methodologies for Ensemble Forecasting, with application to fisheries population dynamics? You are preferably a MSc/PhD student with strong quantitative background.
I used my taper data to fit the variable-form taper model Kozak 2004-2, which is a nonlinear model. The data are longitudinal, irregularly spaced and unbalanced, so we need to account for the inherent autocorrelation by using a continuous-time autoregressive error structure, CAR(). I read some papers in which the authors use SAS/ETS to fit the models; take A. Rojo (2005), for example. In A. Rojo (2005), the author incorporated a CAR(2) error process into the models to minimize the effect of the autocorrelation inherent in the longitudinal data. I followed what Rojo described in the paper. When I add CAR(1) to the model, I can get an estimate of the autoregressive parameter ρ1. But when I add CAR(2), the model has difficulty converging for ρ2.
Could someone help me incorporate CAR(2) into Kozak 2004-2?
I have attached the paper A. Rojo (2005). Thank you very much.
Here are my SAS codes
proc import out=work.taper
datafile='E:/zzs7.csv' dbms=csv replace; getnames=yes;
RUN; /*read data */
data fit_taper;set taper;
if p="f" then output fit_taper;
run;/*Select data for fitting*/
PROC model data=fit_taper method=marquardt sur dw collin;
exogenous bolt tht dbh;
endogenous dob ;
parms b0 0.9884 b1 0.9478 b2 0.0735 b3 0.4884 b4 -0.9783 b5 0.5511 b6 0.1 b7 0.0389 b8 -0.1579 p1 0.8 ; /* starting values */
fit dob ;
Can you help me with a question? SAS can estimate 95% CIs for HRs using bootstrapping, but I don't know how to write code to estimate 95% CIs for HRs at a certain time point using bootstrapping for time-dependent variables.
Thank you so much
I would like to compare survival among patients stratified by insurance status, where insurance status has 3 potential values (private insurance, Medicaid, uninsured). I would like to treat private insurance as the reference group and compare survival among Medicaid patients and uninsured patients to patients with private insurance; I am not interested in comparing survival between Medicaid and uninsured patients.
If I were to make this comparison using Kaplan-Meier curves, the generally accepted way appears to be to first perform a log-rank test to see if there is a significant association between insurance and survival. If that log-rank test is significant at p < 0.05, then I could perform post hoc tests where I treat private insurance as the reference group and compare Medicaid survival to private insurance and uninsured survival to private insurance using a p-value adjustment for multiple comparisons. In SAS, a reasonable adjustment method in PROC LIFETEST appears to be ADJUST = DUNNETT, which allows you to treat one group as reference rather than perform all pairwise comparisons.
If I were to make this comparison using a Cox proportional hazards regression model, the generally accepted way appears to be to treat private insurance as the reference group in the model and compute hazard ratios comparing Medicaid mortality to private insurance and uninsured mortality to private insurance. For those 2 comparisons, significance is then evaluated using 95% confidence intervals.
My first question is why are p-value/confidence level adjustments encouraged for Kaplan-Meier curves but not hazard ratios from the Cox model in this scenario (assuming my understanding of generally accepted practices is correct)? Shouldn’t the confidence intervals for the hazard ratios be made wider to account for multiple comparisons? This never appears to be done in the literature, though.
My second question is since the comparisons to the reference group are of primary interest rather than the overall significance of the independent variable, when using Kaplan-Meier curves, is it ever acceptable to treat the comparisons to reference as planned comparisons and not perform the overall log-rank test? In a Cox model, the Type III p-value appears to function as a test for the overall association between the independent variable and mortality, but I don’t think I’ve ever seen it reported in the literature, but log-rank values for the overall association between the independent variable and survival are regularly reported.
I crave your suggestions on the best analytical software for analyzing economic impact using input-output and multiplier analysis?
Which of these is most appropriate: Stata, EViews, R, SPSS or SAS... or any other?
I welcome constructive suggestions.
Kind regards and stay safe 🙏
I built a regression model which has a total of 6 risk factors (regressors) and 7 two-way interaction terms. I need to perform a response surface analysis of my specified model which has a total of 6+7 = 13 terms.
However, when I try to perform RSM in SAS using PROC RSREG, it considers all the linear terms, all the second-order terms, and all the two-way interaction terms. So, it considers a total of 6 (linear) + 6 (quadratic) + 15 (interaction) = 27 terms. This is the standard way I have seen most books and research articles perform RSM. But I don't want to include all 27 terms. Instead, I want to include only the 13 terms that I identified in my regression model. Is there any software that does that? Thanks in advance.
I have already analyzed these data in MS Excel 2013, but I am trying to analyze them with a statistical program such as SAS. If any of you has good experience in statistical data analysis with SAS or R, please suggest an R or SAS procedure to analyze these data. See the attached Excel file.
I am conducting a meta-analysis of 42 studies. The outcome of interest in each study is a percentage, and in 7 of them it is 0% or 100%. There are 2 categorical covariates. My question is how can I do a meta-regression with a percentage outcome in SAS using PROC MIXED? Thank you.
Since SAS University Edition is out and freely available for non-commercial use (at least as I understand it), I want to buy a book for learning SAS for general statistical needs that I can also use as a desktop reference manual. I have basic knowledge of SPSS (thanks to Discovering Statistics Using SPSS by Andy Field) and Stata (A Gentle Introduction to Stata by Alan C. Acock). The SAS version of Andy Field's book has some negative reviews on Amazon, which makes me hesitate. Any advice and experience is appreciated...
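For reference, a full second-order response surface in k factors has k linear terms, k quadratic terms and k(k - 1)/2 two-way cross-product terms. A minimal Python sketch of that count for the 6 factors mentioned above; the function name is hypothetical.

```python
from math import comb

def full_second_order_terms(k):
    """Count the terms of a full quadratic response surface in k factors."""
    linear = k
    quadratic = k
    cross_products = comb(k, 2)  # unordered pairs of distinct factors
    return linear, quadratic, cross_products, linear + quadratic + cross_products

counts = full_second_order_terms(6)
print(counts)
```

For 6 factors this gives 15 cross-products and 27 terms in total, compared with the 13 terms in the reduced model described above.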
Test to compare 2 groups on a series of items.
I have a questionnaire with 100 items. For each of those 100 items, I have the proportion of respondents who answered good or very good, by sex. So I have a column with the items, a column with the proportion of women and a column with the proportion of men. I want to know if the 2 groups differ in how often they answer good or very good on this questionnaire.
I was thinking to calculate the difference in the proportion and do a Wilcoxon signed rank sum test, using SAS proc univariate. I have never done it with this type of data.
Sample size too small for Logistic Regression. Was using SPSS but understand exact can be run in SAS. I have access to SAS studio
I want to know which variables predict whether members cancelled their account. I have 7 independent variables and used binary logistic regression since the dependent variable is categorical. Could someone suggest ways to improve the area under the ROC curve, which is 0.62? I appreciate your suggestions.
An individual insect is placed in a box that has 6 sectors, each holding a different food. The insect is placed in the center of the box, and it is free to go into any sector. After it enters a sector (first choice), it is placed back in the center of the box for a second choice, and again for a third choice.
How can I test whether one sector is preferred over another? Also, how can I test whether one sector is preferred as the first choice?
I'm wondering whether to analyze these data using a multinomial model, as 'sector' is a multinomial variable. Alternatively, I'm wondering whether I should use Fisher's exact test.
What do you suggest?
Say, for example, we have a large data set (about half a million observations) on birth weights.
The dependent variable is low birth weight, a dichotomous variable.
The independent variable is the mother's age group (say 4 groups: 15-25, 26-30, 31-44, 45-59).
Confounding variables: 1) marriage, 2) financial status, 3) has insurance or not, etc. All of them are dichotomous variables (yes/no).
I want to assess the association by calculating an odds ratio, by constructing a model and analyzing it via multiple logistic regression. The model is, say, as below:
LBW (dependent var) = b + b1*x1 (mother's age group / independent var) + b2*x2 (marriage / confounding var 1) + b3*(confounding var 2) + ... etc.
When I run SAS code including all variables with the aim of removing the confounding effect on my dependent variable, what is the best strategy?
1) Is it stratification, controlling, or adjusting?
2) What is the difference between those three, or are any of them the same?
3) How do those three procedures/methods work and affect the OR for the dependent variable?
I'm sorry I'm new to statistics trying to understand basic concepts..
I have noticed that when using PROC PHREG in SAS and coxph in R on the same data, the model has to be specified differently in order to get the same results. In PROC PHREG I model the time with the censor variable (where censor=1 means the patient withdrew from the study) in order to obtain an HR. In R I use the same model, but this time censor=1 means the patient experienced the event, in order to obtain the same HR. Is there a difference in the definition of the censor variable, or are the models different?
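The two programs read the status variable in opposite directions: in PROC PHREG the value in parentheses after the status variable in the MODEL statement marks censored observations, whereas the status argument of Surv() in R marks events with 1. So the same Cox model is being fitted; only the indicator needs flipping. A minimal Python sketch of that recoding, assuming a 0/1 variable in which 1 means censored; the data and the function name are hypothetical.

```python
def censor_to_event(censor_flags):
    """Flip a 0/1 censoring indicator (1 = censored) into an event indicator (1 = event)."""
    for flag in censor_flags:
        if flag not in (0, 1):
            raise ValueError(f"expected 0 or 1, got {flag!r}")
    return [1 - flag for flag in censor_flags]

# Hypothetical patients: 1 = withdrew from the study (censored), 0 = event observed
censor = [0, 1, 0, 0, 1]
event = censor_to_event(censor)
print(event)
```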
I used to use SAS software, But recently I got acquainted with Statgraph 18 software. I think that's better. Especially in terms of drawing diagrams. What's your preference?