Asked 15th Jan, 2013
R or SAS: which one is the best statistical software used in medical field?
I am at the starting stage of studying a software package. I know only SPSS. I want to study either R or SAS. Some people say R is better and others say SAS is better. I am totally confused. Which one is best, easily understandable, and highly used in the medical field?
Most recent answer
I exaggerate. Over the years, SAS has grown to allow paradigms of statistical analysis other than those embodied in its original design. Similarly, R has evolved, and moreover could be used as the engine inside a SAS-like system. However, I believe that the original design philosophies still dominate the character of the present organisms.
Moreover, R remains an open system which grows as new needs arise while SAS is commercial, closed.
All Answers (162)
I wonder how you narrowed it down to those two. SAS is one of the most comprehensive statistical packages available. Compared with others, it has the most powerful tools for manipulating data sets. Although it has added much functionality, SAS has, for at least two decades, preserved its programming language and the idea that SAS should look pretty much the same regardless of what type of computer it is running on. In other words, it doesn't look much better running on a PC than it does on some sort of mainframe computer running UNIX or something else, and given that most users are now on PCs, it is a pain to learn. SAS's status as the killer among commercial stats programs in terms of statistical features has been severely challenged by Stata. In a lot of ways Stata has surpassed SAS, although not in data manipulation. Stata is an interesting hybrid: it is not open source, but it has the ability to incorporate user-developed routines, greatly amplifying the capabilities of an already expansive program. Like SAS, it is mostly run from a command-line interface. Finally, R: R is actually free and open source. It is not pretty and has a steep learning curve, but it has routines available to do just about anything, including things no commercial package does or things only expensive specialty packages do. Although there is no direct tech support from a company, there are lively R support communities who give advice, help fix problems, etc. Many of the routines for R are well known, and most have been thoroughly compared to commercial software that does the same thing, demonstrating that R gets the same results. Just like commercial software, it might have a bug from time to time. Personally, if I were starting out with a new platform, I wouldn't be considering SAS; I would be considering Stata or R. Based on other answers I have seen on ResearchGate, I think you will get a lot of responses favoring R.
If you don't mind me asking, what are you trying to do in the end? Will your focus be on graphics or on tables? What are your resources? Who will be your customers? How much time would you be willing to commit to learning a new language? Is there a standard for data analysis in your particular area?
R has endless flexibility; however, SAS is great for data management and almost everything else (aside from graphics and certain complex models) and is much easier to learn (for most).
You can't really go wrong with either one, but I would highly suggest learning both, if at all possible, as they both have strengths and weaknesses, and knowing how to use both programs will let you approach problems that cannot be handled with only one of them on its own.
To respond to Deepak's follow up question, I don't think that SAS is the only acceptable package, but R might not be completely accepted. The medical field is a bit unusual in that the name of the statistical package is usually disclosed for even trivial statistics. SAS is certainly accepted, but so are SPSS, Stata, BMDP, and probably quite a number of commercial packages. I have seen Stata make huge gains in academics and don't expect to see that change. Right now, if I wanted a commercial package to start from scratch, I would do Stata. I don't use it myself because I am nearly 30 years into another package, but if I had to make the choice now, it would be Stata for sure.
Hi again, Deepak. I was thinking about this more today. Is there a reason you can't stay with SPSS? If you did, you could add on R for special statistics that maybe can't be done in SPSS, or that you might need to buy more modules for SPSS in order to do. The biggest problem with SPSS is usually the cost of modules, support, and frequent upgrades. R would give you lots of options to do things like multilevel modeling, structural equations, and propensity scoring, not to mention Bayesian statistics and simulations, should you need those things. Another option, if you want a user-friendly package like SPSS but cost is a problem, is to consider SYSTAT. There is a special link for SPSS users at http://www.systat.com/SwitchToSystat.aspx. You might be interested to know that Systat version 13 was actually written in India (Cranesoft in Bangalore). I had an opportunity to interact with some of the programmers and statisticians during the beta test. There is a free 30-day demo version and a completely free version called MYSTAT, which is based on version 12, is limited to 100 variables, and lacks some of the advanced statistics, but even MYSTAT has more statistics than the SPSS base. The graphs available in the SYSTAT family are very similar to those in SPSS, which means better than most. There is no reason that you can't do most of your work in virtually any mainstream stats package and use R for stuff the stats package can't handle. I use SYSTAT that way, and most of the rest of my work is in HLM (Scientific Software International), but I have also used Mplus (www.statmodel.com) and other software over the years. An interesting free stats package is MicrOsiris from the University of Michigan Survey Research Center, where they also have IVEware, one of my favorites for the imputation of missing data. Of course, I have the support of colleagues who work in Stata, SAS, and R, although the capabilities of all these packages overlap a lot.
Frequently our papers cite two (sometimes more, especially if one is IVEware) software packages used. I hope this is helpful. Bob
Thank you, Robert. Actually I wanted to do SEM and confirmatory analysis. I have no add-on packages for SPSS, like AMOS. That's why I wanted to study another software package that would have all the options of SPSS and some more techniques as well. I want to study it at the earliest. So I am totally confused about what to study first, SAS or R.
Hi Deepak, R will give you all those things. SAS has a costly annual license, although the academic license is not so bad, but I have no idea about India. The module in SAS that does SEM is not great but not terrible; I have only really looked at it once. It should also do CFA. To me, Mplus (www.statmodel.com) is the leader in those areas, but it is expensive. It's a program you buy, but the technical support requires a license after the first year. I understand that there are pretty good routines in R for these things, but I haven't seen them or worked with anyone using them; mostly we use Mplus. SYSTAT 13 has a relatively new module (included) for CFA. The SEM program in SYSTAT, RAMONA (which also does CFA), is relatively easy to use, but it is old. The biggest problem there is the lack of the model statistics all the more up-to-date programs give. If you can keep your access to SPSS for the basic tasks, I think that using R for advanced statistics would work. The most important thing is to get hooked up with the online support communities for R, because there is no tech support; or, if you can afford it, consider Mplus alongside SPSS. Mplus is really brilliant. They offer training courses, even in Asia sometimes. Bob
Deepak, I don't do those things very often. My latest papers of anything like that were a latent class growth analysis "Trajectories of Internalizing Problems in War-Affected Sierra Leonean Youth: Examining Conflict and Postconflict Factors" and an SEM called "Conservation of Resources theory in the context of multiple roles: an analysis of within- and cross-role mediational pathways" (although this paper just came out, we wrote it more than 2 years ago), both done in Mplus. I have a CFA called "A Closer Look at the Measurement of Burnout" and another SEM, "Fit as a mediator of the relationship between work hours and burnout," done in SYSTAT. Finally, "The relationship between job experiences and psychological distress: A structural equation approach," which was LISREL. All of these are full text on my ResearchGate profile. I don't think things are done much differently in medicine or healthcare, except to say that investigation of measurement is relatively new in medicine as compared to psychology; it's more widely known in psychiatry. I don't know much about publishing in healthcare at all. I don't read much about that.
My recommendation would be to start with R - it is free, so if you don't like it, you've lost nothing. If R doesn't give you what you need, then move on to SAS or Stata. Stata seems to be easy to understand and use. I recently saw someone turn it on for the first time on Tuesday morning, and by Wednesday afternoon she was comfortably writing syntax and manipulating data.
Good luck with your analyses.
Based on the simplicity of your question, I would say SAS.
Not that SAS is simple. It's a pain in the SAS to learn. But it has all the standard statistics. If you like memorization without understanding - it is perfect. R requires a minimum of understanding of programming and logic. It offers much more convenient flexibility. SAS is widely accepted as a standard (I suspect some payoff may be involved), but it is not bug-free. Neither is R. If you are barely literate in my field (statistics) and want to make money or impress your friends who don't really know statistics, then become another pain in the SAS.
Yeah, it's all true, but I don't see any compelling reason to give up SPSS, and R already gives options you can't get in SAS. Plus, in many academic areas Stata is either displacing SAS or already has (economics, for example). I was around when SPSS was king and SAS knocked it out (before there was even such a thing as a personal computer), so I know what this looks like. So between R and Stata, I don't see any real reason anyone needs to start with SAS at this point in time. I remember when even BMDP was bigger than SAS. If you see where SAS advertises, they are moving on, too: they are focusing much more on their business products and less on their academic ones.
Besides being free, R has the advantages of multiple ways of doing things (for example, several packages for mixed models) and very good support via mailing lists -- in fact, package maintainers such as Douglas Bates or Frank Harrell often respond to questions on these lists.
I would recommend to use R: it can perform a huge amount of statistical analyses due to the excellent packages that are written for R, and it is free to use. However, when you are really interested in performing CFA and SEM, I would recommend using Mplus (http://www.statmodel.com/). In my experience, Mplus is better suited for these types of analyses than R or Lisrel.
SAS, with no doubt! SAS is a real statistical software package based on the paradigm of data matrices having the statistical units as rows and the corresponding variables as columns. Moreover, in SAS any new analysis made on the data that produces new descriptors of the statistical units (e.g. principal component scores, classification into clusters, probability scores from linear discriminant analysis...) adds them directly to the data matrix as additional variables (columns).
Moreover, SAS does not require programming beyond the logical sequence of:
proc 'name of the procedure' data = 'name of the input file';
This lets you concentrate on the real thing and not on accessory, distracting elements like program listings; the real thing is to explore data structures, not to build programs! Moreover, any statistical procedure you can imagine is in SAS.
Many people I know use both R and SAS. The strength of SAS is that it is a powerful tool for data management. When you have very large databases that require lots of massaging/manipulation, it's great to have SAS at your fingertips. But R enthusiasts claim that SAS is a few years behind if you're interested in doing 'modern' statistics. There are a lot of new components added (for free) in R that SAS will take much longer to implement, at a cost. So ideally, you can conduct all your data management tasks in SAS, export your data, and do your fancy statistics in R. Et voilà!
SAS in fact is easy to learn, a lot easier than Stata. You learn simple commands and then keep adding more, but it is a very intuitive language. You keep your editor very simple, and SAS doesn't care whether you write in capital letters. Any person can learn SAS; I teach it to my dental students and they had no problem with basic commands. R is free but more difficult for those who are not into the world of math and pure statistics.
It is somewhat hard to answer this question, because we cannot say which software is better for a field. It mostly depends on the type of analysis that you want to perform. Actually, SAS is very strong regarding network usage and works better in the case of large data in large-scale studies. In addition, SAS is more visual than R, which makes it simpler to use for those who are not familiar with programming languages. On the other hand, every software package can perform some kind of analysis in its simplest way, and generally we cannot say that one is better than the other. I hope for the best for you.
If you go for high-throughput data analysis in medicine, like all these "omics" things, there is no way around R. When medical research enters "systems biology", one needs to simulate and analyse dynamic networks, and this is right away possible to do with R as well. If you stay in fields of "standard applications" in data analysis, it will be a matter of personal preference what to use.
R is free software, and that is a strong advantage from my point of view. Moreover, R can be enriched with various free packages designed for specific purposes: e.g., for genetic analysis there is the package "genetics"; for haplotype analysis, "Haplo.stats". Each package has a good user's manual. On the other hand, R is not immediately friendly. It takes a while before getting into it. Worth a try.
I dunno; I've programmed in S-Plus (the commercial, costs-money version of R) and I'm a SAS programmer of many years' experience. Both have learning curves. But to manipulate and manage data, I would prefer SAS. SAS ver. 12 has a lot of powerful analytical and data management capabilities, and I disagree that it's at all 'behind the times' in terms of statistical analysis.
R and SAS require you to think about your data a bit differently, because R acts more like an object-oriented language. SAS will remind you of SPSS, although SAS draws its user syntax from PL/1, an IBM programming language from the 60s/70s. When SAS Institute adapted SAS for use on other OS platforms, it chose, wisely, to keep the user syntax the same so that current users wouldn't have to relearn.
However, SAS is not cheap. If you have to buy it yourself, then check out less expensive alternatives. STATA is cheaper, and R is free.
As an addendum to my comment above, where do you expect to work and/or study? If the places you're looking at are predominantly SAS shops, then learning SAS would be a very good (if not necessary) idea. While it's true that many places use R, SAS and SPSS are still the go-to packages chosen by institutions for data management and analysis.
I have used SAS for many years and find it very powerful and well developed. R will become more and more popular, and I think that in the near future R will be the top one.
R is not only free, it is more powerful than SAS, SPSS and STATA http://www.analyticbridge.com/group/productreviews2/forum/topics/product-reviews-comparing-r-matlab-sas-stata-spss
It depends on the accuracy you want.
Look at a comparison of packages computing sin(x):
(1) SPSS 20: unable to compute sin(x) when x > 3*10^15. See the log below:
>Warning # 625
>The absolute value of the argument to the SIN function is too large. The
>largest permissible value is about 3 * 10**15. The result has been set to the
>Command line: 24 Current case: 10 Current splitfile group: 1
It is accurate until x = 10^9.
(2) Stata 11: unable to compute sin(x) when x > 10^18. See the log below:
. set obs 20
obs was 0. now 20
. gen x=10^(_n)
. gen y=sin(x)
(2 missing values generated)
It is accurate until about x = 10^9; after 10^10 the error increases, while sin(10^18) = -0.9929693.
(3) [package label lost]: it computes everything but loses accuracy after x > 10^16 (the true values are 0.8582728, 0.779688, and -0.9929693).
(4) LibreOffice Calc: it is accurate until 10^13, then does not work.
For comparison, the reference values of sin(x) were computed with 32-digit accuracy. For floating-point arithmetic packages, we conclude that in accuracy R and LibreOffice perform better than SPSS 20, Stata 11, and SAS 9.2.
The problem is that many statistical software packages are too expensive while their 'engine' for floating-point arithmetic computations is too poor. Why pay so much for a program that cannot accurately compute even a simple function like sin(x)?
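If you want to reproduce this kind of check yourself, any environment that exposes a well-implemented C math library will do. A quick sketch in Python (not part of the original comparison; just a way to run the same test):

```python
import math

# sin(x) for very large x requires careful argument reduction (computing
# x mod 2*pi without catastrophic cancellation). Python's math.sin calls
# the platform C library, which on mainstream platforms performs full
# argument reduction, so even huge arguments stay accurate.
for exponent in (9, 15, 16, 18):
    x = 10.0 ** exponent
    print(f"sin(10^{exponent}) = {math.sin(x):.7f}")
```

On a system with an accurate libm, the last line should agree with the reference value sin(10^18) = -0.9929693 quoted above.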
I don't necessarily need the degree of accuracy that Demetris needs - I need it for stats, not maths, i.e. estimation rather than accuracy (such a sweeping generalisation!). So for me that is less of a concern.
But I do need to be able to manage, import, sort and merge lots of data sets, handle many thousands of records with hundreds of variables, write code to make new variables, do statistical tests, make plots and charts, and do modelling.
R can do those last things well, though it helps to understand more stats. And I agree with Sandro that it is the way of the future for stats analysis, but I don't know how good it is for large-scale data management.
SAS has powerful data management features, and I think it is this that has led to its routine use in many large corporates and government departments, as well as by researchers. Wider job opportunities for you if you can write SAS.
Stata is more friendly to use than SAS, much better Help screens, but again I'm not sure about data management.
There is also EpiInfo, available free from WHO & CDC - more limited in range of stats but comes with data entry, edits etc facilities.
And SPSS that you know already.
I like the link Jason gave - though I'm not sure all SAS features are well represented.
So I suggest you consider not only your present needs but also what your uses of software are likely to be in the future and your data management needs. And of course what is available to you and is supported in your setting.
When learning, there is quite a bit now on YouTube and other websites - you may find short summaries introducing you to the various features of the package you choose.
I'll be interested to hear what you decide, and why.
In response to your doubt about the ability of R to manage large-scale data: I work with microarray data, which implies more than 30 thousand variables per sample, plus the enrichment analysis, which correlates different databases with your data, working online. All of this on a regular personal computer.
I'm sure that SAS is very powerful software, maybe the most powerful today. But, as proprietary software, it will find it difficult to keep up with R in the next few years. And it is so expensive!
Wafik, you have it right. I'm unsure if there is a "best". Use what is available at least cost. If it does not have the bells and whistles you need, use what is next-most-available at least cost. Keep in mind that learning curves for each kind of software can be expensive too, in terms of time and effort. I use SPSS for this reason. Also available are SAS, Epistat, R, Stata, LISREL, EpiInfo, and likely others. (I understand Excel also has some statistical functions.) "Best" is a matter of choice, opportunity, and need. But even before any of that, let's discuss research design issues, sample size, and power estimation. Who uses what to plan experiments?
R plus SQL or Python and you'll have all you need, and it won't cost you a thing except your time. I use R for any serious stats work as well as data viz; R's ggplot2 is a brilliant package for making some great visuals (and SAS graphics are STILL a joke, even after the many years they've had where they could have improved them). I still use SAS to manipulate data because a) we have it at work and b) I know how to, but I am moving into using SQL and Python more and more so expect to say "buh-bye" to SAS in the not-too-distant future.
Working with user-friendly programs like SAS or Excel is sometimes dangerous.
Look what happened when Carmen Reinhart and Kenneth Rogoff were wrong in their paper because of an Excel mistake.
I am reproducing from the attached link:
"This is a big deal because politicians around the world have used this finding from R&R to justify austerity measures that have slowed growth and raised unemployment. In the United States many politicians have pointed to R&R's work as justification for deficit reduction even though the economy is far below full employment by any reasonable measure. In Europe, R&R's work and its derivatives have been used to justify austerity policies that have pushed the unemployment rate over 10 percent for the euro zone as a whole and above 20 percent in Greece and Spain. In other words, this is a mistake that has had enormous consequences."
Here in Greece we know about austerity measures.
Now, finally, we learnt that these measures were due to an Excel mistake!
And what about Stata? I would like to know your opinion. In Spain some research groups are using Stata, and they have a really good feeling about this software. I personally use only SPSS, but I'd like to know alternatives to it.
Hello Juan. Like you, I use SPSS because 1) I have used it since its inception many years ago and know it well; 2) it generally produces most statistics I require in my work; 3) it is or has been available to me at little or no cost the entire time; 4) it imports most other kinds of data formats well. If one of the criteria above isn't met by SPSS, I generally resort to Stata because it offers something (usually #2) SPSS does not. (This could easily be SAS for the same reasons, except that I have not used it and do not care to expend the time to learn how.) In extreme cases of statistical need I have used LISREL, because neither SPSS nor Stata performs the requisite statistical tests. There are many other options as well, e.g. R, S, Epistat, Systat, BMDP, EpiInfo, etc. I recommend, though, that you use one system as a "base" tool meeting requirements 1-4 above and select alternatives based on your own needs/abilities/preferences. I cannot speak to "errors" as described above, as they likely result from investigator decision-making, including choice of statistical software. I'm certain other folks have their own opinions, but I would caution anyone against using "best" in their description, because it is at the end of the day a subjective criterion.
As implied in previous comments, SPSS seems relatively easier to use, and I resort to others when SPSS cannot carry out the required analysis, particularly multivariate graphical plots.
I would recommend R because it's free, but if cost is not an issue for you, then SAS has a buoyant statistical capacity, though not a very encouraging user-friendly interface.
My alternative software packages are Minitab, Statistica, and Genstat, which have an above-average capacity to carry out most statistical analyses, particularly analyses relating to biology and ecology.
Thank you for your answer, Dr. Holden. I'm really in accord with all your opinions. Your comments are very useful to me and confirm that the attitude I've taken so far may be correct.
If you have enough familiarity with syntax-based languages, R is the best over other software such as SPSS, Stata, and Minitab, except perhaps SAS. SAS is similar to R in its basic capabilities, but you should bear in mind that SAS is very high-priced software while R is free. Also, R's version development is very quick compared to SAS and other software, because R's developers are also researchers in specialized fields. If you consider all these points about R and SAS, you should start with R, because it is more creative than SAS. As a person who started with SAS, I can definitely suggest R to you.
With respect to determining the best use of SAS Software (when/how to use/apply SAS Software), you first need to answer a couple of questions about the nature of the data that you wish to analyze. The first question that you need to ask is, "what is the format [file type] of the data that you want to analyze?". SAS can be used to analyze data stored in a number of file formats, e.g., *.txt, *.dat, *.csv, *.xls, *.xlsx, *.dbf, etc., which gives the user great flexibility in how to approach their analysis. The second question that you need to ask is, "how large is the dataset you want to analyze?". I have used SAS to analyze datasets having approximately 4,200,000 rows (observations) and at least 14 columns (variables), so SAS is capable of processing extremely large datasets with relative ease.
Once you have answered these questions, the next step would be to determine the type of analysis you want to accomplish with SAS. SAS can be used to run just about any type of statistical analysis that you would want to perform (e.g., t-tests, ANOVA, regression analysis, QQ-plots [quantile/probability plots], Pearson/Spearman/Kendall correlations, boxplots, histograms, cumulative distribution function plots, etc.). SAS can also be used to perform linear programming analyses to find maxima and minima for a system, based on stated system constraints. Along with the support.sas.com website, you can perform a Google search on the type of data analysis that you want to accomplish using SAS, and you will find publications from SAS users' groups, written by experienced SAS users, that generally provide useful explanations/clues on how to approach your task(s). A large number of health researchers, environmental researchers, social scientists, pharmacological researchers, and epidemiologists use SAS as their statistical analysis tool.
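None of these tests is tied to one package, of course; for instance, the two-sample (Welch) t statistic behind the t-test on the list above is simple enough to compute by hand. A minimal sketch in plain Python, with made-up sample data purely for illustration:

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Two-sample Welch t statistic: (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(sample_a), len(sample_b)
    m1, m2 = mean(sample_a), mean(sample_b)
    # statistics.variance is the sample variance (n-1 denominator).
    v1, v2 = variance(sample_a), variance(sample_b)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

# Two small hypothetical samples, e.g. a treatment and a control group:
treated = [5.1, 4.9, 5.6, 5.2, 5.0]
control = [4.2, 4.4, 4.1, 4.5, 4.3]
print(round(welch_t(treated, control), 3))  # → 6.143
```

Any of the packages discussed in this thread will report the same statistic (plus the degrees of freedom and p-value); the point is only that the underlying computation is package-independent.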
R is an extremely powerful tool for statistical analysis with great flexibility, has a large user/support community and is free (no cost). You can download R onto your computer (I have it on my computer) and begin work. There are some good books that you can investigate to learn how to use R in your work. I am recommending some R books to you from my personal library (see below):
1. Introductory Statistics with R
Author: Peter Dalgaard
2. The Art of R Programming
Author: Norman Matloff
3. A First Course in Statistical Programming with R
Authors: W. John Braun, Duncan J. Murdoch
4. R in a Nutshell
Author: Joseph Adler
The SAS community realizes that R has important functionality that can be integrated into SAS programs. I am including a reference (journal article) from the Journal of Statistical Software (January 2012) that shows how R capability can be integrated into SAS using SAS macros.
Both tools (SAS and R) have excellent capabilities and are complementary, so either tool that you choose will be able to meet your analysis requirements. This is one of the situations where either choice you make will be the correct one for you.
Go for [R]!
It is not only free, but also open source. Greater potential for growth and development comes from open-source software with a strong contributing community, such as [R]. I don't think that, in the long term, commercial software can compete with the continuous development and updates generated by the user community.
As a student, I find it very comfortable to work with R. It's easy to use and provides useful packages for many applications of statistics, especially non-parametric tests and the like. However, SAS provides detailed output when you run any data through it. But for me, R is more applicable in the medical field.
I like SAS very much because I can use batch mode to run SAS programs, which doesn't use any computer resources while I am writing the program. I use TextPad to write the SAS program and call SAS to run it; the results then appear in an LST file and the log in a LOG file.
I am not sure if R can do this.
(In TextPad, use "External Tools" to call SAS.)
SAS is not only software for statistical analysis; it can also be used for data management. If you have to merge data from different files, SAS will do the job. You can also save data from an analysis (for example, mean values or the slope of a regression) and merge them with other information. For example, you can combine the reaction of a person to a certain treatment with his or her age, BMI, etc.
This makes it a very flexible tool. The main problem has been mentioned before: the price. Best wishes, Irene
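The merge step Irene describes is a generic join on a key. A minimal sketch of the idea in Python (the field names here are hypothetical, standing in for what a SAS MERGE ... BY step would do):

```python
# Hypothetical records: treatment responses and demographics keyed by person id.
responses = [
    {"id": 1, "response": 0.8},
    {"id": 2, "response": 0.3},
]
demographics = [
    {"id": 1, "age": 54, "bmi": 27.1},
    {"id": 2, "age": 41, "bmi": 22.8},
]

def merge_by_id(left, right):
    """Join two record lists on 'id', one output row per row of 'left'."""
    right_index = {row["id"]: row for row in right}
    return [{**row, **right_index.get(row["id"], {})} for row in left]

merged = merge_by_id(responses, demographics)
print(merged[0])
```

The same one-to-one join can be written in any of the packages in this thread; SAS's advantage, as noted, is doing this comfortably on very large files.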
As reported in previous answers, I would recommend R, too (although medicine is not my research field, I work in biostatistics). It is a powerful open source (and free) software, in constant evolution. Moreover, in my opinion R is much more than statistical software, but also a programming environment. I would say that you can conduct any analysis you may need in R. Personally, I have conducted some analysis in R that could not easily carry out in SAS (e.g., complex nonlinear mixed-effects modeling with several hierarchical levels and multiple random parameters).
It depends: one person finds it easy to use R, while another will be happy to use SPSS or maybe SAS, and some would like to work with MSTATC, Statistix, or Stata. Go for the software that is easily accessible, or use different software packages to solve your problem. Each software package has great importance in its own place.
It depends. R may not be very suitable if you need to deal with large data, as by default it loads data into memory for processing. Under the Windows OS, that will mean a maximum of 2 or 3 GB (or 16 GB for the 64-bit version). I believe that statisticians or data scientists working in the medical field would prefer SAS for another reason: it is commercial software supported by a big company, which means that in the event something goes wrong in an application that relies on SAS-implemented models or algorithms, there may be a safety buffer.
R, being an open-source program, has become sophisticated for statistical data analysis thanks to tons of user feedback and developers (unlike S, which was an older version of R). Any statistician or bioinformatician these days would have started with R and eventually moved to SAS. SAS has a lot of features that R doesn't have, just because you pay a lot to get the license. STATA is an equally fun and user-friendly program, like SAS. If you are new to R/SAS, I would recommend you use R initially, so that you will get to learn a lot. If you are stuck with an error in R/SAS, "stackoverflow.com" is a good place to get help.
I prefer using R because I find it easier to program in R. But both programs have their own advantages.
R is definitely more cutting edge than SAS, because the software is open source and thousands of R users can submit new features to R to quickly add new analyses, graphics, and functions to the software. If SAS wants to add a new feature, then they must pay statisticians and software developers to create the new features and test them prior to release. This development process will take months or even years. Most likely, the new features will not be available until SAS releases a new version or a new expensive "add-on" product. That's why professional statisticians, statistics graduate students, and bioinformaticians do most of their work in R: because R makes it easy to share your new methods with the rest of the world.
The main advantages of SAS are customer support and its handling of large data. SAS is very expensive. It probably costs more than $1000 USD to purchase the base SAS software, and each add-on package can cost hundreds or thousands more. A company could easily spend $10,000 or more on SAS licenses for a single researcher. However, all that money buys you world-class customer support. If something doesn't work right in SAS, then paid customer support representatives will help you over the phone or by email to resolve your problem. I've had a number of interactions with SAS customer support and they are mostly pretty good. If something doesn't work in R, then all you can do is post a message on the R mailing list / bbs and hope that someone chooses to help you in their free time.
SAS is also slightly better at handling large data than R. By default, the R software stores its data in RAM. For most people, that means you will only be able to store about 2 GB to 16 GB of data in R. The base SAS software stores your data in "virtual memory" on your computer's hard drive. That means you could easily handle 100 GB or more on a relatively cheap and old Windows PC with a fast hard drive. There are options you can use to expand the data handling capabilities of R, but using the default settings ... SAS is better at handling large data. Both SAS and R can manipulate data via SQL queries on large databases, so the differences with respect to large data are not terribly significant ... but I would give SAS a very slight edge.
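To make the default-settings point concrete, here is a minimal base-R sketch of processing a file in fixed-size chunks so that only one chunk is ever in RAM. The tiny temporary file, the column name `x` and the chunk size are all illustrative stand-ins for a genuinely large CSV:

```r
# Sketch: process a CSV too big for RAM in chunks, using only base R.
# A 10,000-row temporary file stands in for a multi-gigabyte one.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10000), tmp, row.names = FALSE)

con <- file(tmp, open = "r")
header <- readLines(con, n = 1)        # consume the header once
total <- 0
repeat {
  lines <- readLines(con, n = 1000)    # read 1,000 rows at a time
  if (length(lines) == 0) break
  tc <- textConnection(c(header, lines))
  chunk <- read.csv(tc)                # only this chunk is held in memory
  close(tc)
  total <- total + sum(chunk$x)        # update a running statistic
}
close(con)
total  # same answer a single in-memory pass over the whole file would give
```

Packages such as ff and bigmemory wrap this out-of-memory idea more conveniently, which is what the "options you can use to expand the data handling capabilities of R" amounts to in practice.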
I've been using SAS since 2000 and R since 2004. When I started using SAS back in 2000, it seemed like the average SAS user was still some kind of a statistics geek. The annual SAS User's Group International (SUGI) conferences mostly featured statisticians and other researchers focused on quantitative work. However, over the last 10 years it seems like SAS has largely abandoned academic researchers to devote most of its time to big business analytics. Their annual SUGI conference became the "SAS Global Forum" in 2007, and since then many of their keynote speakers have been business "motivational speakers" in the mold of Tony Robbins. Seriously, they had former NFL quarterback Joe Theismann as a keynote speaker in 2012 and baseball general manager Billy Beane for 2013. The real SAS geeks are meeting and sharing their papers in much smaller regional conferences like the "Northeast SAS User's Group" (NESUG) and others, while the official company-wide conference seems to cater to non-quantitative MBA-types who can make decisions about large license purchases.
I point this out because there has been an almost ideological shift in how people choose between SAS and R. For the most part, if you are an academic researcher working at a university or if you belong to a small startup company doing cutting edge bioinformatics, then you will need to use R. If you work for a large corporation (drug company, etc.) or a large government agency that has already bought into SAS, then you will continue using SAS. Other software products like Stata, SPSS and Minitab are mostly niche products with much smaller communities, in my opinion.
Depends on the purpose. If you are interested in doing "standard" analyses, then any software is good and will serve your purpose, given your available budget. However, many studies, and much exploration of data, are not standard. Visualization and data mining are very important, as is working with high-dimensional data in this omics age. SAS has improved greatly, but the typical SAS installation (without Enterprise Miner) is a bit hamstrung for modern needs in modern medicine. If you have the money, SAS can do most things, and its graphical capabilities have finally entered the 21st century. One disadvantage I find is that it is hard to leverage other tools out there, perhaps better suited for a task, from within SAS. One exception, interestingly, is R, which can be accessed either through a macro or via PROC IML (which is not a favorite proc among SAS users).
My take is that SAS is the old warhorse which, for all its cost and "enterprise level" promise, is not that much better than R, and in some respects worse. Given the flexibility needed in current medical research and practice (think decision support and personalized medicine) as well as the changing demands with data volume and velocity, R is probably a wiser choice going forward.
Just one comment to another of the commenters: SAS output is parsimonious?!! It will give you every possible test under the sun. I've had new users (non-statisticians) asking which is the right one to use; there is no guidance from the manual or the software. Not giving you everything is sometimes a good thing.
Our working group often works with very large data sets and complex designs (multiple split-plots, nested and repeated with different random factors, all in the same analysis). We also have many very experienced R users who were not able to develop working models for these data sets in R. Today there are very powerful procs in SAS (GLIMMIX, HPMIXED) which were developed to analyse very complex designs with complicated data structures.
To me, the primary reason for picking R or SAS is whether you want to work in a bleeding edge environment (R) or a stable development environment (SAS).
There are core (relatively stable and well documented) R packages provided by the R Project, and many more user-contributed packages which vary in quality, up to and including implementations of the newest research in statistics. Fundamental changes in the software occur roughly every six months, and upgrading may break your old programs.
SAS adds new scripting languages and procedures but the old stuff works much like it always did and a ten year old program may well run and provide the same results it did ten years earlier. SAS can do just about anything... as long as the SAS programmer is willing to spend the time making it do anything. If you're a pharmaceutical company taking decades to develop new drugs for submission to FDA, this kind of stability may be helpful. It may not be as helpful if you're just publishing on the cutting edge of bioinformatics.
Secondary considerations are the size of your datasets and price.
R can only work with datasets that can be loaded into the random access memory of your computer. On the other hand, it doesn't cost you any money.
Some (but not all) SAS procedures will develop summary statistics one record at a time so that you can work with any dataset that you can store on a device. The IRS finds this to be helpful. For a single user of SAS, the cost ranges from thousands to tens of thousands of dollars. For institutional users of SAS, the cost per user goes way down as the number of institutional users increases.
Ease of use is not a consideration. Both R and SAS are openly user hostile. 8-)
Depending on the work at hand, I have used both R and SAS extensively. I prefer R.
I have been using both SAS and R for quite a while, and without any hesitation I would choose R. It is open, free and flexible, and you can very easily share your developments with the research community. With an environment such as Eclipse or RStudio (both free) you can do pretty productive software development. As a researcher I would go for R.
I have been using both R and SAS for years; you will probably need to use both of them at some point in time. SAS is commercial software that has been out there for years and years, and many jobs (especially at pharmaceutical companies) require that you have a decent knowledge of SAS. However, R has an advantage over SAS for graphical work. R is free and easy to use. SAS is very well developed and documented, with tons of procedures. So it depends on the work at hand: for simple work they agree totally and there will be no difference (go for R). For advanced programming, it depends on the type of research question at hand.
R is a trend. It has a good name and therefore seems the obvious choice; however, I don't think it is a good one. Let me help you. SAS actually cares about statisticians. Through their Academic Research program they have provided me with free SAS courses worth up to $11,000. They also offer every professor free teaching materials and use of SAS OnDemand. I met the leaders when I delivered my SAS paper at SAS Global Forum in San Francisco. They gave me books as well as free software. They write to me to make sure I am doing well and offer me yearly extensions on my programming courses if I am too busy with Ph.D. research. Now, you tell me which company is better to use. I can guarantee you that Dr. Goodnight delivers. His entire team at SAS is personally involved; they know me and care about me. They even offered to write a success story about me when I spoke in San Francisco, and I wanted to wait until I had achieved more. They even cared about my baby Gabriel, did an interview with me, discussed how Gabriel is just 2, and sent me the entire K-12 curriculum free. I can personally guarantee that if you get involved with SAS you will never look back. They are my personal trainers. I know the people involved by name, and when I met them they knew my name and all my details. You just do not get this type of attention, support and help elsewhere. They are worth what they charge, and if you investigate you too can become a free consumer of all their training. SAS is amazing and I love their company for what they have done for me personally, in training and in personal support at the conference and online. I can get in touch with Julie Petlick and she will meet all my needs with SAS and the training. If you do not get invested in SAS, you are missing a major educational opportunity. Statistics is a lifelong study, and they are there with you on your journey to becoming an effective statistician or researcher.
Patrice, I have to disagree... It seems that SAS is very interested in and cares a lot about statisticians. But we are not choosing a retirement plan. There is no doubt that SAS is excellent software. But R is free, always up to date, and can be used by the biggest and smallest companies. So it is powerful, updated and democratic. You can choose any field of statistical analysis and you will find a package in R, no matter how new the field is. And if you don't like the way a package performs, you simply change it. So, R is flexible too!
Of the many discussions of more stable programs (SPSS, SAS, etc.) versus open-source programs like R, Charles White's is the best I've seen. Though I'm a hardcore R user, he makes a great argument for weighing the historical approaches in one's own field.
For interested users, this particular discussion has surfaced before on ResearchGate:
Once again (and like others), I caution against Patrice's approach to this question. As Charles White has addressed in a couple of ways, SAS no doubt has its place in business and a number of other disciplines that (1) demand a stable working environment, and (2) are more conservative in the speed at which they adopt new analytical tools. This is not to say that these are fields that stagnate; rather, these may be fields--like medicine--in which rapid adoption of new analytical tools may be difficult to deploy or may be altogether detrimental to users. From my own experience, I would guess that most current scientists are willing to give new analytical tools a try after reviewing their advantages and disadvantages. After all, most of us are constantly searching for modeling frameworks that best match our data (i.e. matching model assumptions to our data). So, once again, we're faced with rapidly-developing R and relatively stable (aka static) programs like SAS and SPSS when considering overall functionality. For this, here's a resource showing that R now has >31,000 functions compared to SAS's 1,100. And they're being developed more rapidly for R due to its open-source structure. Sure, arguments could be made for SAS having additional functions for various outputs, etc., but that's still quite a beating that R is giving SAS. Here's the article with the amateur analysis:
Re: "R is a trend" (Patrice Rasmussen above). From a review of Google Scholar citations, R does show an upward trend, whereas SAS shows a steep decline (also a trend). This can be explained in a number of ways, and I realize the pitfalls of such a sampling methodology. And, of course, the absolute number is still awe-inspiring (though I only know two people in my field who actually use SAS). Still, given the lack of additional information on R versus SAS usage among various fields (and my current laziness to procure any), these data suggest that R is on a steady rise and SAS is starting to wane.
Patrice, I also must ask again, just to prove my and others' point: what did you personally pay for your copy of SAS? As a scientist who interacts with and mentors several Latin American students with little access to such expensive resources, I think it's a requirement to consider return on initial investment (especially when one is using, say, taxpayers' dollars to build educational infrastructure).
And, to belabor the discussion, a nicely summarized comparison of SAS and R:
To this, I add: R has more concise code (i.e., fewer lines), which is better for those of us wanting to distribute reproducible code. (I limit myself to base R to avoid some of the confusion of function names, etc., but the code is often still shorter than SAS code in terms of number of lines.)
I've already recognized SAS as a powerful player in many fields. My only criticism is of the hard-liners who are intent on naming it the only/best option in town.
J. Patrick Kelley,
Thank you for the kind words you used to describe my earlier post and your analyses. The link you have provided is interesting and relevant, but it is obviously marketing for software services implemented in R. So I'd like to make a few comments on what the link says about documentation and about cost, and expand a little on my agreement with your discussion of "the best option in town." I tend to write strongly, but I recognize that the following are only my opinions.
Documentation. Free SAS documentation appears to be of significantly higher quality than free R documentation. Both sets of free documentation are available on the web for your own evaluation. However, SAS and R both have high quality statistical authors who sell books related to the software.
Cost: To me, the single biggest cost to using any software is training the user. Software, classes, and books are obvious but seriously look at the user’s hourly wage, factor in overhead costs, and factor in the amount of time spent maintaining the software. Time spent actually using the software is time spent learning how to use it better. If you’re going to encourage someone to change software, please recognize that you’re asking them to give up valuable training and the new software needs to have a reasonable potential to provide that user/organization with more value.
Best Option in Town: If you start with no history of software that does what you need done, I recommend making a list of what it is you want done with new software, finding every software package that does what you want done, and asking around about the cost, available service, user community, and available training. Of course, most people reading this message will have a history of software that does what they need done.
As I have said before, I use SAS or R depending on job requirements but I prefer R.
@ Chuck. Very good and informative message. I enjoyed reading that. I kept expecting to read some very strong verbiage (as you mentioned), but I saw nothing!
I agree with all of your points. I do think you're right on about the cost of training the user. The original poster's question--like many other questions here on ResearchGate--asks about program recommendations as if the user has no previous experience with a program in a particular field. The cost of training certainly is relevant to those already in a field who may have a history with SAS or another program. I have no argument with that. But for a new user, the cost of procurement trumps the cost of training.
It depends on your objective in learning an additional package. If it is for your own data analysis, you may go for R because it is available free. If you want to enrich your CV for better job prospects, especially in the corporate sector, go for SAS, because a good proportion of the corporate sector owns and uses SAS and therefore needs SAS personnel. Accordingly, your investment in learning the package will be less for R than for SAS.
I've worked with both R and SAS and generally find R easier. That said, when I was working in industry and had access to dedicated SAS customer support services, I found the quality of SAS people and their commitment to customer support just phenomenal: several times I went to them with modeling questions, and it was akin to hiring an experienced consultant. I can really appreciate the value that SAS as an entity adds to the field, but at the same time it is almost always easier for me to explore problems and formulate/code up solutions in R.
As for data management, I've found now that R connectivity to Oracle, MySQL, Postgres databases is really quite comparable to the SAS ACCESS add on, using ROracle and related packages, though this was not always the case.
Memory limitations can be an issue though.
Anyway I'm probably being too long winded here - my vote is for R.
If you know SPSS (and I mean really know SPSS, as in you write your own "syntax"), there is no reason to learn SAS. Anything SAS can do, SPSS can do. In fact, over the years, SPSS has adopted a lot of SAS jargon (Type I and III SS and LSMEANS are just two examples).
If you're going to learn a new package to add skills and analyses not easily available to you in SPSS, you should learn R. R is an order of magnitude (or two) faster than SAS PROC IML in my benchmarking. R is a much better platform for bootstrapping and Markov chain Monte Carlo methods. R's graphics are still much superior to SAS's, especially for publication purposes.
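As a small illustration of the resampling point, a percentile bootstrap of a sample mean takes only a few lines of base R. The data here are simulated, and the seed, sample size and replicate count are arbitrary choices for the sketch:

```r
# Minimal percentile bootstrap of a sample mean -- base R only, no packages.
set.seed(42)
x <- rnorm(100, mean = 5, sd = 2)      # toy data standing in for a real sample

# Resample with replacement 2,000 times and record each resample's mean.
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))

# 95% percentile confidence interval for the mean.
ci <- quantile(boot_means, c(0.025, 0.975))
ci
```

Swapping `mean` for `median` or any other statistic is a one-word change, which is the kind of flexibility the comparison with PROC IML is about; the `boot` package offers a more polished interface to the same idea.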
I know (as in, I can and do program in) SAS, SPSS and R. SAS simply doesn't bring enough new tools to the table to be viable (vis-a-vis SPSS). R, by contrast, does bring a fair amount of new techniques (not least being a good implementation of Leland Wilkinson's Grammar of Graphics ideas) to the party.
Unless you have strong reasons for learning SAS, pick something that gives you options you didn't have before. When should you choose SAS? Well, suppose you've taken a job in the pharma industry and part of your job is doing analyses for regulatory submission. R isn't (last I heard) on the regulators' approved list. Given its rate of change, I don't expect it ever to be there. In that case, you need SAS, because SAS is what everyone uses.
Dennis Cason, there isn't any regulator-approved list of analysis programs for submission to the FDA. There are standards for software validation and change management, which are applicable to software from any source. Some FDA regulators are also R users. The advantage of SAS over R for FDA-regulated studies is that SAS makes a point of being stable over decades, while R (my software of choice) will gleefully break old code if members of the R Project core group believe it will improve the software over the long term. Regulated projects can run from 5 to 20 years. R releases updates roughly once every six months. For more (cryptic and convoluted) discussion of software validation for FDA-regulated studies, see: General Principles of Software Validation; Final Guidance for Industry and FDA Staff
Point taken, Charles. Wasn't there an effort to bring R (or perhaps some fork from R) into compliance with the FDA Guidance? If that effort has borne any fruit, I'm unaware of it.
But I believe my point remains valid: the rate of change (and the change policies) of the R code base are such that R in its present form isn't a viable tool for regulated studies.
Anyone who doubts that should look at the help wanted advertisements in AmStat News. Regulated shops seek SAS proficient programmers.
Regulated shops absolutely seek SAS-proficient programmers. Guidance for using R in FDA-regulated environments can be found at: http://www.r-project.org/certification.html
I'm a professional and I will use whatever tools (software) my clients will make profitable for me to use. The general purpose statistical software I have experience using includes SAS, R, and Minitab. OK..., I've used SPSS but that was almost 30 years ago.... ;-)
R is for professional and creative statisticians. People who understand what they are doing and need to create novel analyses because they face new problems or are unsatisfied with standard solutions. SAS is for people who need to press the buttons in order to perform standard analyses, without knowing what they are doing.
While I started many years ago as a SAS user, I have become more comfortable with R recently. Two factors are driving this. First, unlike before, data size is no longer much of a constraint in R. Second, R graphics are far superior to and more flexible than SAS's. Somebody at SAS has to be reminded how important graphs are for statistical analyses. They can surely do a better job.
I exaggerate. Over the years, SAS has grown to allow other paradigms of statistical analysis than those embodied in its original design. Similarly, R has evolved, and moreover could be used as the engine inside a SAS-like system. However, I believe that the original design philosophies still dominate the character of the present organisms.
Moreover, R remains an open system which grows as new needs arise while SAS is commercial, closed.