
Ask Slashdot: Switching From SAS To Python Or R For Data Analysis and Modeling?

timothy posted about 7 months ago | from the in-the-parlance-of-our-times dept.

Python 143

An anonymous reader writes "I work for a huge company. We use SAS all the time for everything, which is great if you have a bunch of non-programmer employees and you want them to do data analysis and build models... but it ends up stifling any real innovation, and I worry we will get left behind. Python and R both seem to be emerging stars in the data science game, so I would like to steer us towards one of them. What compelling arguments can you give that would help an old company change its standard if that company is pretty set in its ways?"


R... (2)

Rockoon (1252108) | about 7 months ago | (#47376129)

This is what R is for.

Why Python and not C or ERLANG or COBOL? ..

Re:R... (1)

bunratty (545641) | about 7 months ago | (#47376199)

I typically write short programs in Python because the syntax is so concise. It's a powerful language that's fairly easy to use, as opposed to C which is a low-level programming language. This is even more true if you use the NumPy and SciPy packages.

Re:R... (1)

jythie (914043) | about 7 months ago | (#47376545)

NumPy, SciPy, and MatplotLib are a good reason right there.

They are why I tend to recommend Python for analysis and modeling: a good set of libraries (and the community that goes with them), plus a relatively low barrier to entry in terms of installing and learning the language.

Python is better overall but R is more like SAS (4, Insightful)

goombah99 (560566) | about 7 months ago | (#47376787)

R has more single-function, high-level commands devoted to stats; these are done right internally and are self-consistent with other functions for further processing. But it's not as general a programming language as Python. If you want something different from the canned functions in R, you will need to write it yourself, at which point you might as well be using Python. However, if you like SAS, chances are R will seem more like what you are hoping for.

Re:Python is better overall but R is more like SAS (0)

Anonymous Coward | about 7 months ago | (#47380259)

R has more single-function, high-level commands devoted to stats; these are done right internally and are self-consistent with other functions for further processing. But it's not as general a programming language as Python. If you want something different from the canned functions in R, you will need to write it yourself, at which point you might as well be using Python. However, if you like SAS, chances are R will seem more like what you are hoping for.

Or you could use python as a kind of wrapper with calls to R, as needed. That way you have the best of both worlds. Problem solved.

Re:R... (1)

ClickOnThis (137803) | about 7 months ago | (#47378135)

[...] as opposed to C which is a low-level programming language.

Assembler is a low-level programming language.

Machine language is a low-level programming language.

C is not a low-level programming language, although you can do low-level programming with it.

http://en.wikipedia.org/wiki/L... [wikipedia.org]

Re:R... (0)

Anonymous Coward | about 7 months ago | (#47378575)

From your link:


Re:R... (1)

Dishevel (1105119) | about 7 months ago | (#47379257)

C-x M-c M-butterfly.

The best way to program for sure.

Re:R... (1)

Jane Q. Public (1010737) | about 7 months ago | (#47379511)

Yes, but...

This is what R was basically designed to do.

On the other hand, I understand from several recent writings that lots of non-statistical experts have been finding that R also makes it Easy To Do It Wrong.

Re:R... (4, Informative)

MightyYar (622222) | about 7 months ago | (#47376463)

R is definitely still ahead for data modeling, but Python has some advantages too. With a bigger set of modules (libraries) to choose from and high popularity in the financial sector, there are big improvements all the time. For the purposes of this discussion, the most important Python modules are:
IPython [ipython.org] : powerful interactive shell
numpy and scipy [scipy.org] : numerical, matrix, and scientific functions (matlab-ish)
pandas [pydata.org] : R-like data structures and data analysis tools (analysis mostly limited to regression)
statsmodels [sourceforge.net] : statistical analysis, complements pandas
sk-learn [scikit-learn.org] : machine learning

So can Python do everything that R can? No. Or, at least, not as easily. But it is improving in that direction quite quickly, and if Python's data analysis capability meets your needs, then you can likely do everything in one language instead of calling R routines from another.

Re:R... (4, Interesting)

radtea (464814) | about 7 months ago | (#47378343)

So can Python do everything that R can?

No, but Rpy can.

I've used R, and it really has a lot of strong points, but I prefer to access it these days via Rpy, which gives me all the power of R along with everything else I get from Python (other libraries, better application development frameworks, etc.)

Both R and Python are real programming languages that are going to be completely useless to non-programmers, so neither of them is a SAS replacement, but of the two, I'd choose Python+Rpy over R for flexibility, power and ease of use (the latter is of course a strongly personal preference... if you really think like a traditional stats geek R will likely seem nicer, as it is clearly created for and by such people.)

Re:R... (1)

Frequency Domain (601421) | about 7 months ago | (#47380083)

Python libraries are simultaneously advantageous and disadvantageous. Yes, they give a lot of leverage to solving a broad variety of problems, but last I checked many of them remained available only in Python 2. The success of the Python library ecosystem has actually interfered with the adoption of Python 3.

Re:R... (1)

superwiz (655733) | about 7 months ago | (#47377387)

Why Python and not C or ERLANG or COBOL? ..

While the question is interesting, it's off topic. You may as well ask the same question about any development task. Clearly the person asking the question already decided that the advantages of Python outweigh the advantages of C, ERLANG and COBOL. He is now asking whether the advantages of R outweigh the advantages of Python, which is an entirely different topic.

Re:R... (1)

students (763488) | about 7 months ago | (#47379565)

R always gets the analysis job done for me, but when I recommend it I feel a need to include a warning that its data typing is strange.

For example, there are about five types which are like arrays, but which are only sometimes compatible with each other.

Pandas (4, Interesting)

MightyYar (622222) | about 7 months ago | (#47376137)

Python and R are sort-of converging via Pandas [pydata.org] . I'm partial to Python, but Pandas really starts to blur the lines conceptually.

Re:Pandas (2)

joeblog (2655375) | about 7 months ago | (#47379523)

Using R vs Python with Pandas brought home the microlanguage vs libraries debate for me. I'm more experienced and comfortable programming in Python, so I generally prefer it. But writing a program to solve the same problem in R or Python, I found the R version would be much faster. On the other hand, the Python version tended to give the correct answer, whereas the R version tended to have weird bugs I couldn't figure out.

As an open source enthusiast, I'd say an unfortunate advantage Python has is its "benevolent dictator" rule for libraries. R (as with Perl, TeX...) has a bewildering number of contributed packages for any given problem, some of which once worked well with old versions, others were never developed properly... so users are left with the frustration of finding something that works.

Python, on the other hand, comes with "batteries included" with a few external libraries like Pandas that are well supported. So unless speed is a big deal, I'd advise Python.
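As a rough illustration of the "batteries included" point, here is a minimal sketch that produces descriptive statistics using only Python's standard library `statistics` module. The `summarize` helper and the sample numbers are hypothetical and are not taken from the thread; anything beyond this kind of summary would normally be handed to a library such as Pandas.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points (Python 3.8+).
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }


# Hypothetical measurements, for illustration only.
sample = [12, 15, 11, 19, 14, 13, 30, 12]
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

This is roughly the territory of a basic summary procedure in SAS; it says nothing about speed, which is the trade-off the comment above raises.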

Innovation is more than tools (5, Insightful)

Abroun (795507) | about 7 months ago | (#47376153)

It's unlikely that SAS is the root cause of a lack of innovation, so it's unlikely that introducing a new tool by itself will make a difference. The fact that you work for a 'huge company' is more likely the problem. Does senior management agree that innovation is a priority? Are they willing to make the changes to encourage it (which usually means breaking down fiefdoms, giving up power, and lots of things that senior managers hate doing)? The choice of language is kinda irrelevant absent the right environment.

Re:Innovation is more than tools (1)

Michael Simpson (2902481) | about 7 months ago | (#47378093)

The parent is 100% right. When I worked with SAS, we were able to do things with it that were not anticipated. The sort of thinking required is the same no matter what language you are using. The people using SAS are not typically programmers but people studying relationships and causality. Dropping them down into a lower level language will probably hinder those studies.

Re:Innovation is more than tools (1)

cinnamon colbert (732724) | about 7 months ago | (#47379269)

mod up spot on
it ain't the tools, it is the tool users or their culture

Cost (4, Informative)

TemporalBeing (803363) | about 7 months ago | (#47376169)

The cost of training them to use R will be significantly cheaper than what you are spending on the SAS licenses, which (last I knew) was a yearly purchase for each user.

And yes, while I have not used R myself, I would certainly recommend it over Python for this use case as it is very dedicated to doing the kinds of things that SAS is good at in a very efficient, friendly manner. I've seen a number of people use it to do some very neat statistical analysis, and their stuff was a lot simpler than the SAS scripts that I used to write years back.

Re:Cost (2)

Sobrique (543255) | about 7 months ago | (#47376297)

Slightly different beasts I think. R is a really impressive analysis tool. Python is a scripting language. The latter is quite a bit more versatile, but ... probably isn't the right tool to solve the problem outlined in the OP.

Re:Cost (2)

OnioOnio (1366661) | about 7 months ago | (#47376309)

This is absolutely true, especially if you work for a huge company. Big companies that are hooked get raked over the coals, and I guarantee that your SAS licenses are costing you millions every year - yes millions (unless you're academic...). The company I work for is stuck with this very dilemma, where the more programming oriented love and use R, whereas those trained in house with little programming background get stuck in the SAS rut (I do think it's easier for beginners to use SAS, but I wholly agree with you that it's likely stifling innovation). R is still a good intermediary, and will be easier to get up and running for many employees than Python. In fact, check out RStudio if you haven't.

Re:Cost (1)

L'Ange Oliver (1521251) | about 7 months ago | (#47376727)

I second that. R is the better choice here.

Belief vs Experience (2)

westlake (615356) | about 7 months ago | (#47377303)

The cost of training them to use R will be significantly cheaper than what you are spending on the SAS licenses
And yes, while I have not used R myself, I would certainly recommend it over Python for this use case

So not having used R yourself, why do you believe it is the better and cheaper solution?

More than cost (1)

golodh (893453) | about 7 months ago | (#47378291)

I know both SAS and R, and I think that for people who've never programmed, the GUI-based version of SAS wins on end-user usability because end-users can click together (simple and limited) analyses on really big datasets. This has far-reaching consequences for the learning curve.

For R there exist attempts at GUIs (e.g. R-commander) that offer point-and-click functionality, but they're more sketchy.

I think that giving non-programmers access to R will result in a flood of help requests because they really do need some notion of programming to use the R language. With SAS that's more in the background because the GUI tool is relatively well done, and use of the butt-ugly, antiquated and clumsy mainframe-style SAS language can usually be avoided.

In addition, I don't know of any (reliable and working) alternative to the SAS Enterprise Guide, which lets you click together elementary data-processing steps in a network that shows the structure and the results of your work.

I think that statisticians, real analysts and data-scientists will soon feel constrained by SAS and will prefer to use SAS to prepare a dataset for analysis, and then carry out any actual analysis in R.

Last but not least, R is still an in-memory analysis program, which practically limits analyses to what can fit in core. There are packages that try to extend R in this direction, but I consider them to be poor quality and cumbersome.

Python on the other hand is aimed squarely at programmers, and nobody else.
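To make the in-memory limit discussed above concrete, here is a hedged sketch of the usual workaround in plain Python: stream the rows and keep only running totals, so memory use stays constant however large the dataset is. It uses Welford's online algorithm for mean and variance. The `RunningStats` class and the `stream_values` generator are hypothetical names for illustration; `stream_values` merely stands in for reading a very large file row by row.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def stdev(self):
        return math.sqrt(self.variance)


def stream_values(count):
    """Hypothetical stand-in for iterating over a file too big for RAM."""
    for i in range(count):
        yield float(i % 10)


stats = RunningStats()
for value in stream_values(100_000):
    stats.add(value)  # only three numbers are ever held in memory

print(stats.n, round(stats.mean, 3), round(stats.stdev, 3))
```

This only covers statistics that can be updated incrementally; it is not a substitute for the out-of-core procedures the comment attributes to SAS.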

Re: More than cost (0)

Anonymous Coward | about 7 months ago | (#47379773)

R has an excellent GUI in RStudio: www.rstudio.com I would recommend it as a much better interface to R.

What's the business case? (5, Insightful)

mwvdlee (775178) | about 7 months ago | (#47376175)

Is it your feeling that SAS is "stifling any real innovation" or do you have examples of projects that are impossible with SAS but possible with Python or R?
Do those example projects actually help the bottom line of the company or are they just "cooler"?

If you can think of examples that have clear financial benefits to the company, you have a solid business case already.
If there are no such examples or other factors negate the benefits, then the company has nothing to gain by switching and should not switch.

Short answer; if you're asking on Slashdot for reasons to switch from product X to product Y, you probably have no real reason to switch.

Re:What's the business case? (1)

retchdog (1319261) | about 7 months ago | (#47376733)

or there are real reasons and he just doesn't know them. he didn't even tell us what his company does, except that it's "huge". if it's finance then there probably are good reasons. if it's healthcare, then not so much, though since R developers are more common (=cheaper) now, the benefit of just ditching those extremely expensive SAS licenses may still be enough.

i agree, though. he should do his own research and then ask slashdot if necessary, which it really shouldn't be. still, i kind of want to see him go to his boss and say "these guys online totally said R was awesome, we should switch."

Re:What's the business case? (1)

dave562 (969951) | about 7 months ago | (#47376987)

This right here.

Nothing happens in a company of any size without a business case.

To amplify upon what has already been said, you need to show the financial benefit to the company. You need to justify the cost to acquire the technology and train people on it. You need to quantify the ROI so that management can weigh the cost of the technology versus all of the other costs that they have to cover every year.
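The "quantify the ROI" step above is simple arithmetic, and a small sketch may help frame it. Every figure below is a placeholder invented for illustration, not data from the thread or from any real company; the function names `payback_years` and `simple_roi` are likewise hypothetical. Real inputs would come from your own license invoices and training estimates.

```python
def payback_years(migration_cost, annual_savings, annual_new_costs=0.0):
    """Years until cumulative net savings cover the one-time migration cost."""
    net = annual_savings - annual_new_costs
    if net <= 0:
        return None  # the switch never pays for itself
    return migration_cost / net


def simple_roi(migration_cost, annual_savings, annual_new_costs, years):
    """Net gain over the horizon divided by the up-front cost."""
    net_gain = (annual_savings - annual_new_costs) * years - migration_cost
    return net_gain / migration_cost


# Placeholder figures only (assumptions, not real numbers).
MIGRATION_COST = 1_500_000   # retraining plus porting legacy code
LICENSE_SAVINGS = 1_000_000  # yearly license fees no longer paid
NEW_COSTS = 250_000          # yearly support and tooling for the replacement

payback = payback_years(MIGRATION_COST, LICENSE_SAVINGS, NEW_COSTS)
roi_3y = simple_roi(MIGRATION_COST, LICENSE_SAVINGS, NEW_COSTS, years=3)
print(f"payback: {payback} years, 3-year ROI: {roi_3y:.0%}")
```

A model this crude ignores discounting and risk, but it is the kind of hard number management tends to ask for first.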

A good thing to research is whether or not any of your competitors are using what you want to use. Has your company lost out on any project opportunities due to not having those technologies? Or did your competitors win business because they did have those technologies? If your company is large enough, there is probably a department that already has this information. Where I work, we get a weekly email that details all of the new deals and projects that the competition is involved with.

Businesses like predictable revenue and established business models. It makes planning and forecasting easier. Nobody likes a whiner or someone who focuses on the negative. You will not gain any traction with statements that have themes like, "SAS is holding us back. SAS "sucks" in comparison to other technologies." Unless you can show the positives of a new technology, and those positives come with a massive financial upside that justifies the CapEx to acquire them... you will never make any progress towards acquiring them.

Re:What's the business case? (1)

butalearner (1235200) | about 7 months ago | (#47378267)

Short answer; if you're asking on Slashdot for reasons to switch from product X to product Y, you probably have no real reason to switch.

The long answer was pretty good, but I disagree with the short one. Asking a (presumably) knowledgeable group of people questions like this is a good way to get a more complete picture of the problem space, and asking people from other companies might just score him a few stories about what worked for them and what didn't work.

Here's an anecdote from me: back when I was a fresh-faced, naive junior engineer I wanted to sell management on an open source alternative to an expensive commercial package by targeting some low-hanging fruit and arguing that we should use both. I surveyed my colleagues and found a number of small items here and there that could be automatically ported to the open source version, and demonstrated it to my manager. It wasn't good enough, because there were no hard numbers on what the company might save by doing this.

In other words, as you say, he needs clear financial benefits. Your Mileage May Vary, but these days I would not be surprised if, to his management, the financial justification is far more important than the technical justification.

Don't try (0)

Anonymous Coward | about 7 months ago | (#47376177)

>"I work for a huge company. [...] What compelling arguments can you give that would help an old company change its standard if that company is pretty set in its ways?"

To change a huge old company? Don't try, for your own sanity. Even with an official mandate.

Unlikely to change (1)

Anonymous Coward | about 7 months ago | (#47376207)

You're using SAS because it's a closed-source, paid-support software package. R and Python aren't replacements for that.

Having said that: if you're building statistical models, and you want to provide some interfacing for your non-programming employees, you don't want Python, you want R, in particular, RStudio. You can build routines and packages in R and interface them through RStudio that will still allow your employees to (mostly) ignore programming. However, there's still a huge jump in competence that's required to go from click-and-chug SAS usage to R usage, and I doubt it's going to happen for you.

Also, a caveat: if you're in some form of health care or drug development, you're not going to change. Period. The FDA and other regulatory agencies _require_ SAS analysis, so you can't just substitute with your own choice.

Unlikely to change (1)

Anonymous Coward | about 7 months ago | (#47376319)

While healthcare companies have a history of using SAS, the FDA does not *require* SAS.

"The FDA does not endorse or require any particular software to be used for clinical trial submissions, and there are no regulations that restrict the use of open source software (including R) at the FDA." -- http://blog.revolutionanalytics.com/2012/06/fda-r-ok.html

Official FDA policy and requirements are outlined here: http://www.fda.gov/iceci/enforcementactions/bioresearchmonitoring/ucm135196.htm

Re:Unlikely to change (1)

Anonymous Coward | about 7 months ago | (#47377835)

He's right. FDA itself uses R, and FDA employees have even written R packages. Here's a poster (pdf) about this.

The right competitor to SAS is Statistica (1)

mbkennel (97636) | about 7 months ago | (#47377539)

R isn't a replacement for SAS---in practical use it requires much more command line programming ability and although it has an enormous number of packages, many of them are 'academic quality' (meaning good enough to make papers) and fewer are highly validated production quality with all the edge cases & stability tested.

Some SAS capabilities can run 'out of core' (unlike R) so you can process data sets which would not fit into RAM.

Statistica (StatSoft) is the closest direct competitor (Windows only unfortunately) to SAS, and from all the reviews I've read, it's significantly better.

If your institution already has a SAS base, then it will stay that way. However, there are probably many "data management" and "data processing" tasks whose nature is somewhere between computation and file/database management---but often they get implemented suboptimally in whatever package the authors found at hand. So you may be doing lots of things in SAS that you shouldn't be---and the best replacement here is python, not R. The business case to your management could be improving workflow, clarity and lowering the number of SAS licenses needed.

Keep the SAS core tasks for which SAS is good as it is, and evaluate Statistica for these as a competitor, if only to get a break on licenses from SAS if your company does a bake-off competition & bid.
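The "data management" and "data processing" chores mentioned above are the kind of thing Python handles with very little code. Below is a minimal sketch, using only the standard library, that totals a column per group and counts rows it could not parse. The inline CSV text, the column names, and the `total_by_region` helper are all hypothetical; in practice the input would be a file or database extract.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract; in real use this would be an open file object.
RAW = """region,product,amount
north,widget,120.50
south,widget,80.00
north,gadget,45.25
south,gadget,
north,widget,30.00
"""


def total_by_region(lines):
    """Sum the 'amount' column per region, counting unparseable rows."""
    totals = defaultdict(float)
    skipped = 0
    for row in csv.DictReader(lines):
        try:
            totals[row["region"]] += float(row["amount"])
        except (ValueError, TypeError):
            skipped += 1  # blank or malformed amount
    return dict(totals), skipped


totals, skipped = total_by_region(io.StringIO(RAW))
print(totals, "skipped rows:", skipped)
```

Whether moving such steps out of SAS actually reduces the number of licenses needed depends on how your licenses are counted, which the thread does not say.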

Honestly (0)

Anonymous Coward | about 7 months ago | (#47376219)

Never used anything more than Perl. Yes, yes, I know... Perl is no longer the darling of the "must have a new language every year" crowd, but Perl rocks for the same reasons R rocks for data manipulation. Yes, yes, I know I'm considered a UNIX neckbeard for still using Perl. If it's not broke, don't fix it. Perl is a little slower than Python and R for some things, but I see no reason to change when I can do what I need to do in Perl. Let's not even talk about the goodness that is CPAN. This is where Python is lacking. I've been using Perl in various ways since 1998. I'm sticking with it. If people think that Perl sucks for statistical analysis, let them think that...

Re:Honestly (0)

Anonymous Coward | about 7 months ago | (#47376515)

While CPAN is bigger with about 130k modules, PyPI certainly has most of the bases covered with its 45k packages.

Nothing wrong with sticking with what you know, which is Perl 5....you may change your tune if Perl 6 is ever released and replaces Perl 5 the way 5 replaced 4.

Re:Honestly (0)

Anonymous Coward | about 7 months ago | (#47378439)

Python is effectively Perl6. :3

There are no compelling arguments... (4, Insightful)

Rob Riggs (6418) | about 7 months ago | (#47376277)

Emerging? They were emerging a decade ago. They have emerged. Look, if the company is, as you say, "set in its ways", that is a cultural problem. Unless you are an executive that gets to set goals and compensation, you have very little influence over it. If that is not you, either stay and live with what you have, or leave for greener pastures. The basic question you have to ask yourself is "how will staying here using these outdated tools affect my lifetime earnings potential?" Put another way: "are they paying me enough to put up with this shit?" That is my prime criterion for deciding whether to stay at any job. Your job is to make recommendations. I assume you have already done that and been shot down. Decision time: should I stay or should I go?

Re:There are no compelling arguments... (1)

jtnix (173853) | about 7 months ago | (#47376863)


Re:There are no compelling arguments... (1)

dave562 (969951) | about 7 months ago | (#47376993)

Wisdom right here.

Re:There are no compelling arguments... (0)

Anonymous Coward | about 7 months ago | (#47377707)

Or, the person asking could expand the capabilities of the company by introducing the free tools on a case-by-case basis. The choice is between taking the position of a nagger or a leader, regardless of the higher management. If that effort turns back to bite, the next employer will appreciate the story and the constructive experience.

"real" innovation? Re:no compelling arguments... (1)

Fubari (196373) | about 7 months ago | (#47377887)

Wisdom indeed. Also... can you elaborate on what real innovation is? Seriously. Do you have an example or two of what opportunities are being crushed by the existing culture of "fake innovation"?

Because if not... if this is a really large company, you may be perceived as a "precious little snowflake that also complains a lot."

And if this is a really large company, they're going to be able to coast along on the status quo for a LONGGG time and I don't know why anybody would listen or care about a whiny snowflake.

So... can you (OP) elaborate the "real" part of "real innovation" ?
What amazing market opportunities are the current group missing?
What kinds of obvious fraud detection is slipping through their collective fingers?
What tremendous potential for increasing shareholder value is being left on the table?

Look, I'm not saying this incredibly large organization of yours could never benefit from some innovation, but realize this: you're going to be pushing against the inertia of entrenched culture and "we've always done it this way", which makes those "Stop Plate Tectonics" bumper stickers seem like an easy task by comparison.

I'll close with an alternative possibility: "I don't much like SAS. Nor do I like the people who have built a functioning SAS-ecosystem that handles this company's Multi-Billion euro statistical needs 'well enough'... it just doesn't encourage real innovation ('real' being code for 'fun'), instead I have to spend my time reading through thousands of pages of documentation and requirements specs and I have to troubleshoot this existing install base... Boy, I wish we had an opportunity to do real innovation here..."

Apples, meet Oranges... (4, Insightful)

Shoten (260439) | about 7 months ago | (#47376323)

SAS is not a language; it's a full multi-tiered solution for the aggregation, normalization, and analysis of data. There's a language as well, but that's just one part of the whole solution. Python and R, while absolutely fantastic languages, are not a full solution.

So, first step...if you're going to offer an alternative, actually have an alternative. I don't know your SAS buildout nor do I know the data sources it consumes, so I can't really point to what else you need to add or how you need to construct it to produce a more flexible replacement to your existing and current SAS infrastructure.

Second step...a roadmap for migration. It's one thing to sign a lease for a new apartment or to buy a new house, and another to shift your life from the old place to the new. If you don't have a plan, at least in broad strokes, then you're going to be doomed when you look for executive sponsorship. You need to make sure that you get all the stakeholders' input as well, lest you leave something out in your roadmap...and then end up with someone who sees you as a problem. That person will most likely be in a position to scuttle the whole thing, as well.

Third step...figure out how to define the benefits in terms of the stakeholders' needs. You're going to replace a system they use; why should they want you to do so? And you have to define it from their perspective, with regard to things they care about. Beware of getting geeky on this...it's very likely that at least one of the people whose support you will need will not be a geek and will be concerned with the output more than the technical means used to produce it. Don't hard-sell, either...pushing too hard will get the door slammed in your face, and even potentially polarize people against you. (See above, under "in a position to scuttle the whole thing.")

There will be steps after that, but those will be largely determined by how the first three steps go. It may involve bringing in outside vendors, doing requirements analysis...a lot of it depends on details of your company as well and how they normally do things. But above all else, remember this: don't buck the system too hard, and don't knock the company you work for. Trying to get a lot of people to support and cooperate with you while telling them that their way of doing things sucks is suicide.

Re:Apples, meet Oranges... (0)

Anonymous Coward | about 7 months ago | (#47377877)

You might take a look at something like https://senseplatform.com/. This gets you a bit more of the wrapper. As the person above notes, R is just the first bit of the replacement, you need something more around it.

Re:Apples, meet Oranges... (0)

Anonymous Coward | about 7 months ago | (#47378445)

SAS is not a language; it's a full multi-tiered solution for the aggregation, normalization, and analysis of data.

Mod parent up. SAS is well known for their statistics software, but has a WHOLE lot more to offer than just statistics software. Kinda like saying Oracle only offers its database software. No, that's what they are best known for, but not the only thing they offer.

If your vendor (SAS in this case) isn't meeting your needs, call your vendor sales representative. They WILL fight to keep you as a loyal customer, INCLUDING mixing and matching other software with their own.

Re:Apples, meet Oranges... (1)

Linnsey Miller (2993021) | about 7 months ago | (#47379999)

Yes, but, you can run R and Python over data in pretty much any backend (Teradata, Hadoop, etc.). That's usually the second conversation - how you want to accomplish your goals.

Re:Apples, meet Oranges... (1)

Shoten (260439) | about 7 months ago | (#47380155)

Yes, but, you can run R and Python over data in pretty much any backend (Teradata, Hadoop, etc.). That's usually the second conversation - how you want to accomplish your goals.

That you think that just "Teradata" or "Hadoop" is the other thing needed in addition to Python or R to replace an SAS implementation tells volumes about how much you don't know about SAS and what it really does to satisfy customer requirements. You don't replace SAS with nothing more than a bare database and a Python interpreter.

And you can't just say "I want you to throw out your existing infrastructure just so that I can use X programming language...you figure out how to make it happen" to the company you work for. This is an RGE..."Resume Generating Event."

I made the switch (4, Informative)

TyFoN (12980) | about 7 months ago | (#47376361)

Personally at least.
I used to work in one of the largest banks in the world, and everything we did was SAS/MSSQL.
I had some personal stuff in R, but most of the other analysts didn't seem too interested beyond using what I made for them, except for one PhD in the German department. I never pushed it, though, since there was so much legacy code, including code I had written myself.

Now I have switched to a start-up bank, and I am the only analyst.
I've used R/RStudio/Shiny with PostgreSQL in the back very successfully, with all code in git. Now I can produce good analyses much faster than I used to in SAS, viewable on any device, with the option of downloading the source data in Excel and CSV.
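The poster's stack is R, Shiny and PostgreSQL, none of which is shown here. As a loosely analogous sketch of the "query the database, offer the result as a CSV download" step, the following uses Python's standard library, with an in-memory SQLite database standing in for PostgreSQL. The `loans` table, its columns, the branch names, and the `query_to_csv` helper are all hypothetical.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, out):
    """Run a query and write the header plus rows to a file-like object."""
    cur = conn.execute(sql)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names
    rows = cur.fetchall()
    writer.writerows(rows)
    return len(rows)


# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (branch TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [("oslo", 1000.0), ("oslo", 2500.0), ("bergen", 400.0)],
)

buf = io.StringIO()  # would be a file or HTTP response body in practice
row_count = query_to_csv(
    conn,
    "SELECT branch, SUM(amount) AS total FROM loans "
    "GROUP BY branch ORDER BY branch",
    buf,
)
print(buf.getvalue())
```

Keeping the export as a small function like this also makes it easy to version in git alongside the analysis code, as the comment describes.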

The management loves this.

If you show them a few good ones they will want more, but I wouldn't start to rewrite all the legacy code. SAS isn't bad when you have it set up properly.

But another good thing about R is that you get access to innovation in the statistics fields faster, and you don't have to pay huge sums of money for extra features.

RStudio and Shiny are a bit expensive for the pro versions, but nothing compared to SAS, and the open source versions are free.

Re:I made the switch (2)

nullchar (446050) | about 7 months ago | (#47376509)

If you show them a few good ones they will want more, but I wouldn't start to rewrite all the legacy code.

This. Submitter should build a few small projects that give a different end result than the current code base. If you're just swapping R for SAS but delivering the exact same output, no management will care. The sample projects need either to report the data in different ways, or visualize the data [d3js.org], or even, as the parent suggested, simply provide a copy of the output as a spreadsheet.

Innovation will come by thinking about the problem differently and exploring different ways to ask questions to gain insight into your business. If you're just crunching the same numbers, don't bother. For the submitter personally, it's great to learn R and Python, but don't expect an organization shift unless it provides something unique.

Python FTW (1)

xfizik (3491039) | about 7 months ago | (#47376363)

Don't bother with R, use Python from the start. On the off chance that you may need some R functionality that can't be replicated or doesn't already exist in a Python library, you can always call R functions from Python through an interface/wrapper. As far as programming languages go, Python and R play in different leagues.

Re:Python FTW (2)

DaBombDotCom (1587833) | about 7 months ago | (#47376499)

Sorry but calling R from Python just doesn't cut it. Some of the best tools in R rely on complex data structures that are not compatible with Rpy. Plus Rpy support on windows is abysmal. You are better off using Python for all non-stats scripting, get your data set up, then analyze and plot with R.

hey (0)

Anonymous Coward | about 7 months ago | (#47376415)

Don't SAS me boy!

You have to *demonstrate* that SAS is better (2)

Nutria (679911) | about 7 months ago | (#47376421)

Go do something in R or Python that is useful to the company but impossible or very difficult in SAS.

Then show it to the hard-core SAS users. If they're interested, demonstrate it to your boss along with how it can save the company (and especially your cost center) money.

Re:You have to *demonstrate* that SAS is better (0)

retchdog (1319261) | about 7 months ago | (#47376833)

they won't be interested. SAS users are dolts; it's one of those languages where you can do a handful of (admittedly useful) things very easily, but is a tarpit for any kind of general procedural development.

just develop something cool, show it to the SAS people (who won't understand) as a pro forma exercise, then go to the boss.

Re:You have to *demonstrate* that SAS is better (0)

Anonymous Coward | about 7 months ago | (#47379197)

This works better than you would think. I would suggest trying this with simulation work, since I've seen R code run at about 100 times the speed of SAS code in this regard.

SAS and data science? (1)

majid_aldo (812530) | about 7 months ago | (#47376423)

if your company is using SAS, then i don't think what you're doing is data science. analysis is not data science.

Re:SAS and data science? (1)

retchdog (1319261) | about 7 months ago | (#47376809)

yes it is. to borrow your sig, data science is just "exxxtreme data analysis."

You don't know what you're talking about (0)

Anonymous Coward | about 7 months ago | (#47376437)

Just from reading the summary, I would say you don't know what the hell you're talking about.

Seriously, replacing a SAS with Python or R is not a solution at all. It seems to me like you're completely unaware of the infrastructure and connectivity involved with SAS.

You need to do a lot more research on data analysis, data mining, analytics, and integration before even talking about a solution. Because you don't know what the tools you want to replace actually do; you're just looking at one aspect of the infrastructure.

Re:You don't know what you're talking about (1)

retchdog (1319261) | about 7 months ago | (#47376795)

Yes, the submitter is an idiot incapable of his own research, but this sounds like SAS astroturf FUD (yes, there is such a thing).

Re:You don't know what you're talking about (0)

Anonymous Coward | about 7 months ago | (#47377089)

No, it's not actually.

There are replacements for SAS that IMO are actually better and FOSS. But I'm not going to do the idiot's research for him, because they should be obvious to anyone with a slight clue as to what's going on in the industry in the past 7 years or so.

Re:You don't know what you're talking about (1)

retchdog (1319261) | about 7 months ago | (#47377287)

"infrastructure and connectivity involved with SAS." "You need to do a lot more research on data analysis, data mining, analytics, and integration before even talking about a solution."

nope, that's a string of buzzwords written by marketing. at least we agree that the submitter is an idiot, i'm just adding the AC to the list as well.

R is better for non-programmers (3, Insightful)

DaBombDotCom (1587833) | about 7 months ago | (#47376445)

In my experience, R is better for non-programmers precisely because it doesn't often behave like a typical programming language. It is *designed* for statistical analysis and so for someone just starting out it can be very intuitive.

Re:R is better for non-programmers (1)

ahoffer0 (1372847) | about 7 months ago | (#47378013)

I agree that R is better for non-programmers. R is a tool you can use to answer all kinds of questions. It is popular with economists, psychologists, mathematicians and people who need a computer to get their work done.

I'm more of a computer person. R drives me nuts. To me, R feels like a hodge-podge of features that aggregated together over decades. Python is different. It has a Benevolent Dictator For Life and it feels cohesive. If Python is the Parthenon, then R is the Grand Bazaar. Your individual mileage may vary.

Audience? (0)

Anonymous Coward | about 7 months ago | (#47376475)

The phrase "non-programmer employees" makes me think it doesn't matter for your company.

If the users aren't statistically trained they'll probably never need the capabilities that R and SAS offer. Something like excel or JMP would be fine if they're occasionally running regressions and making pretty pictures for presentations.

It's what new graduates are learning (0)

Anonymous Coward | about 7 months ago | (#47376547)

I work at a University with a Statistics program good enough to have its own separate department. We primarily teach our students to work in R, so why not take advantage of all the free training.

Try and prove it (1)

setrops (101212) | about 7 months ago | (#47376607)

SAS on Z/OS ?
Good luck with your python code.
I think you should learn SAS.

Research and Recruitment (3, Interesting)

Alan Shutko (5101) | about 7 months ago | (#47376619)

I work for a large Fortune 25 company. We have an existing SAS presence and we do some good work in SAS. There are two main reasons that we are bringing R into our environment: research and recruitment/retention.

R is extremely common across research right now. When a new paper comes out describing a new algorithm or modeling technique, the odds are extremely good that it comes with R source code. With R in-house, there is very little time or effort to try these things out to see if they can help our current work. With SAS, we would need to invest time recoding everything or worse, wait until it is baked into SAS itself. That is a huge barrier to adopting new approaches.

Recruitment and retention are related to R's popularity in research. Let's face it, data scientists are a hot commodity right now. Lots of companies are looking to hire them and there aren't enough good people to go around. We're seeing that a lot of the new talent have been using R in their graduate work rather than SAS, and are interested in an environment where they can continue using R. Additionally, it's harder to retain people once you've hired them if they can't use what's become a lingua franca.

SAS remains a great tool, and we're not going to get rid of it. Rather, we want to add R to the toolbox.

(I don't mention python here... We've got some folks working with Python especially for NLP, but for the work we do there's a lot more folks using R across industry and academia.)

As someone who moved from SAS 1 year ago... (3, Interesting)

Anonymous Coward | about 7 months ago | (#47376677)

I work in IT at a large company (>30k employees) who recently dropped SAS. Before we did, we tried out R but what we found out was that except for IT and some tech savvy engineers, nobody seemed to get anything done without help, even after training.
We had decided to drop SAS due to the ludicrous license costs (at one point we were paying more on renewals than we did when we purchased it! WTF?) and due to some issues with their installation/upgrade process that they were not able to resolve within a reasonable timeframe. We ended up switching to StatSoft's STATISTICA, which has a much lower price point (~30% of what we paid for SAS), predictable renewal fees (20% of purchase price), vast feature set (in the Data Miner package we have), excellent Office integration and import/export compatibility with SAS data files. Oh, and it also features R integration so you can still use R from within it if you want. Users became proficient very quickly, after receiving some training.
I recommend you consider their solutions... Open source is not always best, especially when it comes to borderline tech-illiterate business users.

R is free and well supported (0)

Anonymous Coward | about 7 months ago | (#47376747)

Both in terms of the language and packages. MOOCs are freely available from top notch universities (Johns Hopkins) to provide the training and development of personnel for R utilization.

R - Consider Which R (1)

Kagato (116051) | about 7 months ago | (#47376751)

I would recommend R. It's the language college grads are getting trained in. The reason for that is simple. There are no licensing costs for a simple R dev environment. However, I wouldn't use the free stuff for anything that isn't ad hoc. If you have a production big data job I would look at something like Vertica (purchased by HP a couple years ago). Extremely fast big data DB engine. Not only will it run R, but it has the ability to break the R up into smaller chunks at execution time and distribute the execution across the DB cluster.

Stuff like that just isn't possible in SAS yet. SAS is built upon some very old skool constructs that make it very brittle and very difficult to meet the performance expectations of today's big data world. SAS may end up there, they are privately held and have a very large R&D budget, but I think they would have to do a total rewrite for it to compete. Not that SAS is going away, there's just so much of it in the business world. Be that as it may, in 15-20 years SAS could be the Foxpro of its age.

One vote for Python (4, Informative)

werepants (1912634) | about 7 months ago | (#47376779)

Granted, I don't have much experience with R, but Python has some notable benefits - it is very well established and you can find tools to do just about anything. It is fast and easy to develop, and very easy to learn thanks to the readability and plentiful resources online. I imagine you'll have an easy time finding people with python experience, as well.

I haven't used it for any "big data" tasks, but for a number of small, interactive data analysis utilities it has been really enjoyable to work with. One standout tool for me has been pyqtgraph, which is lightning fast and creates some really impressive interactive visualizations. It's also got some pretty incredible features out of the box - arbitrary user-definable ROIs, instantly change any plot to a log-log, or even do a Fast Fourier transform with just a right click. If I sound like a fanboi, I kind of am - after trying to deal with the agony of 3D data manipulation in matplotlib (python's matlab package), it's a whole different world.

Both or either (0)

Anonymous Coward | about 7 months ago | (#47376885)

You could be investing in building your own talent and skills for the price of what you are paying in licensing costs. Also if you want to attract top talent and not fall behind in data science it helps to be using the best tools of the trade. From your questions it looks like R is the best fit for you. But if you have good programmers you could also be successful with Python.

My company recently released a blog post on the "battle" between Python and R (shameless plug).

JASP! (1)

lisabeeren (657508) | about 7 months ago | (#47376893)

i'd take a look at JASP.

http://jasp-stats.org/ [jasp-stats.org]

- it has an attractive UI like SAS and SPSS, and you're not stuck writing code for an analysis that should be quite straightforward.
- the analyses are themselves implemented in R, and python is to be supported.
- an API is in the works for implementing arbitrary analyses

Use both (1)

digitalhermit (113459) | about 7 months ago | (#47376939)

Python seems to be gaining favor but IMHO the downside is that it's a general purpose language and not built with statistics in mind.

R is quite easy to use from both an installation and a language standpoint. It's trivial to install and there are many, many packages (of differing quality) on CRAN. You can easily take advantage of multiple processors, GPUs, even Hadoop (to an extent). The main downside is that it's mostly constrained by the memory of the host system. So even though it's easy to load a 20G dataset into my 32G laptop, it's not quite so easy to work on a 2TB dataset without some customization. At that point other tools may be easier, such as Python.

Now... I have never needed to crunch a 2TB dataset. My scripts fit comfortably into an 8G VM. What R gets me is that, as a non-statistician, I can easily generate charts, run analyses, and use the libraries that smarter people have built :>. The syntax is trivial and I can do 99% of what I need with a library or some minor customization.
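
To make the memory point above concrete, here is a minimal Python sketch of processing a file one row at a time instead of loading it whole. It is only an illustration, not anything the poster described: the function name `streaming_mean_std`, the column name `amount`, and the in-memory sample data are all made up for the example, and Welford's online algorithm is simply one common way to keep running statistics.

```python
import csv
import io
import math


def streaming_mean_std(rows, column):
    """Return (count, mean, sample std) of one numeric column, row by row.

    Uses Welford's online algorithm, so memory use does not grow with the
    number of rows.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for row in rows:
        value = float(row[column])
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


# Hypothetical stand-in for a file far larger than RAM; in practice this
# would be an open file handle rather than an in-memory buffer.
sample = io.StringIO("id,amount\n1,10.0\n2,20.0\n3,30.0\n4,40.0\n")
count, mean, std = streaming_mean_std(csv.DictReader(sample), "amount")
print(count, mean, round(std, 4))
```

Because `csv.DictReader` yields one row at a time, the same function works unchanged on a real file handle of any size.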

iPython (0)

Anonymous Coward | about 7 months ago | (#47376951)

For programmers, I'd say that the combination of iPython+numpy+scipy is pretty hard to best. I've been working exclusively with this stack for about a year, and I especially like iPython notebooks' ability to mix code, documentation, and graphics all in a single document; it makes replicable, self-contained analysis pretty straight-forward. The Anaconda distribution contains everything you need in a single, easy-to-install tarball (along with a whole lot of other scientific packages preinstalled).

But you really do need some good programming chops.

Until I switched I was using R+RStudio. R has a lot of packages for just about any kind of analysis you could imagine, and you could do worse than with it. R is probably a little easier for non-programmers, and a little harder for programmers (for example, you can use functions as first-class objects (kind of), but they are inserted textually at interpretation time, so that modifications to the functions are not replicated to the data structures that reference them. Also: 1-indexing).

Julia (language)? (1)

by (1706743) (1706744) | about 7 months ago | (#47376973)

I haven't really used it, but I've heard good things about Julia (especially if you're familiar with Matlab). Plus, it seems to play nice with Python. http://en.wikipedia.org/wiki/J... [wikipedia.org]
http://julialang.org/ [julialang.org]

Re: Julia (language)? (1)

ahoffer0 (1372847) | about 7 months ago | (#47377255)

I'd recommend Julia for traditional scientific computing- things based on continuous math like systems of equations. Julia's sweet spot is similar to MATLAB.

While R has a lot of similarities to MATLAB, it "feels" like it is aimed at the stats & machine learning user.

Re: Julia (language)? (0)

Anonymous Coward | about 7 months ago | (#47377997)

I haven't really used it, but I've heard good things about Julia (especially if you're familiar with Matlab). Plus, it seems to play nice with Python.
http://en.wikipedia.org/wiki/J... [wikipedia.org]
http://julialang.org/ [julialang.org]

I played around with Julia some number of months ago -- very promising, but Not There Yet (I had to open up many bug reports and submit some patches just to get DB access to work, and origin/master (where I was cloning from, because the releases were not too good) tended to break every time someone committed).

Really looking forward to it one day, though....

Re:Julia (language)? (1)

majid_aldo (812530) | about 7 months ago | (#47377349)

julia is nowhere near, in intent, as well as development, the statistical tasks needed by the poster.

Mathematica (0)

Anonymous Coward | about 7 months ago | (#47377065)

You should also consider Mathematica. It has made great strides in data analysis in recent years and is way better for mathematical modeling than Python or R.

SAS support (0)

Anonymous Coward | about 7 months ago | (#47377159)

Greatest thing about SAS, is the support they provide. Surprisingly this is a well kept secret, particularly in huge companies. SAS support folks actually know statistics very well (and I believe support is free), and may even code a solution if one is willing to share data.

At risk of slipping into psychoanalysis, it appears that your question/(problem?) is related to working in a huge company. Most people who work in huge companies feel small and insignificant. If this affects you, look around - in huge companies, it is likely that there are already some groups that use R, if that tickles your fancy.

Though R is great for cutting edge analytics, most departments in most huge companies hardly ever do any of that. Personally have seen huge company folks that do not know what median means. Even more depressing is when "paid professionals" in huge companies do not know how to calculate averages. In such circumstances, it hardly matters whether a person uses SAS, R, or Python. In fact SAS may have a slight advantage in such circumstances.

R for Speed of Implementation, Python for Scale (2)

manlygeek (958223) | about 7 months ago | (#47377315)

There is a classical problem here. R is great for getting trained and productive VERY quickly. It has 4,600 packages that will do almost anything you need to and it does some very sophisticated statistical methods right out of the box. What can't be done out of the box (or from the core download since it's not really a boxed product) has likely been coded in a package -- even very complex biostatistical and bioinformatics methods. Also R has a lot of graphical data visualization functionality built in and extended by some awesome packages like ggplot2. Additionally, R does a great job with documentation as it can inject data, visualizations and code into markdown documents, which makes publication a whole lot easier. R's functional/imperative/quasi-object oriented approaches have their quirks (but then what language doesn't?). One thing to note however is that R is not in itself multithreaded and it requires that all the data it is working on reside in memory. For very large, very complex data sets that could be a bit of a problem. So where R is great from a quick ramp up perspective, Python will probably scale better to huge datasets in the tera- and petabyte range. It has come a long way especially with scipy, numpy and other packages listed above. So if you anticipate having to scale in this way, then Python may be a better long term toolset. I like them both and use them both. I choose which one I am going to use for a project (and stick with the toolset for the whole project) based on dataset size, statistical/visualization complexity and documentation requirements. R tends to win out a bit more often for me.

Re:R for Speed of Implementation, Python for Scale (0)

Anonymous Coward | about 7 months ago | (#47377751)

As I understand Python is not multithreaded.

Re:R for Speed of Implementation, Python for Scale (2)

MightyYar (622222) | about 7 months ago | (#47377991)

It isn't, but many of the modules are written in C or other thread-capable languages. For instance, if you are using sk-learn to analyze a dataset with a machine-learning algorithm, your Python code will run on a single processor but the calls to sk-learn to do your heavy lifting will distribute across cores.
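
The general idea above (Python-level code is serialized, but C-level library code can run on several cores) can be sketched with the standard library alone. This is not the poster's sk-learn setup: `hashlib` is swapped in here as a stand-in because its C implementation releases the interpreter lock while hashing large buffers, and the chunk sizes and the `digest` helper are invented for the illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(chunk: bytes) -> str:
    # hashlib's C code releases the GIL while hashing large buffers,
    # so several of these calls can make progress on different cores.
    return hashlib.sha256(chunk).hexdigest()


# Hypothetical workload: four 1 MB buffers standing in for real data.
chunks = [bytes([i]) * (1024 * 1024) for i in range(4)]

# Ordinary Python threads drive the C-level work.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, chunks))

# The threaded results match a plain sequential run.
sequential = [digest(chunk) for chunk in chunks]
print(threaded == sequential)
```

How much wall-clock time this saves depends on the machine and on how much of the work happens outside the interpreter, so no timing claim is made here.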

Why do they need to learn your job? (0)

Anonymous Coward | about 7 months ago | (#47377431)

The people who do analysis on institutional data are experts in their field. YOU are an expert in the field of programming so it is YOUR job to provide them with an extract or presentation of the data. I challenge you to call up the VP of Finance and ask him/her to learn python.

Use both (1)

ceoyoyo (59147) | about 7 months ago | (#47377479)

I find R's syntax really annoying for actually doing anything. So I do all the data acquisition, manipulation, etc. in Python and use the RPy2 bridge to just run the actual analysis in R. Best of both worlds.

Python is the better programming language (1)

RobertJ1729 (2640799) | about 7 months ago | (#47377561)

The arguments in favor of R boil down to this: R is more widely used by statisticians and has a much larger library of statistical packages. But R is not a very good programming language [r4stats.com] , is difficult to learn, and is not well suited to integrate with or be used for more general purpose programming tasks.

Python, on the other hand, has a vast library of packages but does not yet have nearly as many packages specialized for the statistical computing domain. The arguments in favor of Python are, in essence, that it's very easy to learn and easy to use and easy to integrate with other general purpose programming tasks. Python is also gaining a lot of momentum in the scientific computing community. For many statistical analysis applications (most?), the packages that do exist for Python are more than adequate. Some folks even suggest that R's lead over Python is evaporating fast [readwrite.com] .

Python because (1)

tyggna (1405643) | about 7 months ago | (#47377575)

When you get into the statistics, numpy, and scipy, it's all just python bindings for native fortran/C code--so it tends to be about the best there is in terms of execution time.

Use as appropriate and as preferred (0)

Anonymous Coward | about 7 months ago | (#47377593)

I work for a *large* US federal government agency - NOT DoD, thank you - and our division does medical research on heavy-duty systems. Many of the researchers are using python or R - I believe it's a matter of what they're more familiar with, or find easier to use, based on their own predilections.

Did I mention that we run Linux on all but a couple servers....?

At any rate, their results are certainly good enough for publication, with our organization's name behind it.

              mark, who does not speak for his Division, Center, organization, or agency, nor for his employer
                                          (a federal contractor), much less for the view out his nonexistent window

Every time I See One of These Articles on Slashdot (0)

Anonymous Coward | about 7 months ago | (#47377691)

I realize the readership of this blog has descended into the abyss. Answering questions of this nature is the job of the person asking the question. If I was an employer and found an employee posing workplace related questions on the blog, that employee would be an ex-employee PDQ.

Simple arithmetic shows us (1)

Boawk (525582) | about 7 months ago | (#47377963)

that R is 15 better than C.
I was able to figure that out with this bit of C code:
printf("%d", 'R' - 'C');
I'm not sure how to do that in R though.
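
For what it's worth, the same arithmetic can be sketched in Python, one of the two languages the thread is about. The variable name `difference` is just for illustration.

```python
# ord() returns the code point of a one-character string, which for
# ASCII letters matches the character value used in the C snippet.
difference = ord("R") - ord("C")
print(difference)
```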

Suck it up and Program in SAS (3, Insightful)

ichabod801 (3423899) | about 7 months ago | (#47377985)

I used to be you, almost exactly. Almost everything we do at work is in SAS, and I was pushing hard for R and Python and getting nowhere. I hated SAS because it was so clunky and out of date. So many SAS programs are bad because they're being done by statisticians with no programming background. Then I went to NESUG a few years ago and saw presentations by the likes of Whitlock, Dorfman, and others, and realized serious programming *was* being done in SAS. I resolved to just become the best SAS programmer I could. The first thing you need to do is stop programming Python in SAS. SAS is like Lisp in that it is a different paradigm, and not programming in that paradigm only makes things harder. Learn that paradigm. Learn the data step inside and out. Every time you have a %do loop, ask yourself if you can do it in a data step. Every time you wish you had OOP, ask yourself if you could represent the objects in a data set. Or learn the new ds2 data step that has OOP. Learn proc sql and know when it's better to use than a data step. That's what I did, and it took my SAS programming to a whole new level, and allowed me to innovate legacy code and transform the applications we were using. Because back when I was you, SAS wasn't the obstacle to innovation, I was.

Supporting all 3 options (1)

FlipperPA (456193) | about 7 months ago | (#47378113)

I work for a large University in a division that provides financial data to ourselves, as well as other academic institutions. We had been a SAS only shop since our inception in the early '90s, save for a few FORTRAN users here and there.

We wanted to support more options for the researchers using our service, and today, we support SAS, R, and Python. One nice thing about SAS is SAS/SHARE. Basically, it makes your native SAS files (*.sas7bdat) available as tables in a database over ODBC or JDBC, with full index support. This has allowed us to consume these same data in both R and Python. This had made many of our younger researchers (think Masters students instead of tenured faculty) very happy!

Good luck.
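
As a rough sketch of what consuming such tables from Python looks like: once data is reachable through a database connection, the analysis code mostly sees a connection, a cursor, and SQL. The comment above gives no connection details, so everything below is an assumption for illustration. An in-memory `sqlite3` database stands in for the ODBC source, and the `prices` table and its columns are hypothetical.

```python
import sqlite3

# Stand-in for the ODBC connection: with a real SAS/SHARE source this would
# be a connection object from an ODBC driver package, not sqlite3.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table and rows, created here only so the sketch can run.
cursor.execute("CREATE TABLE prices (ticker TEXT, close REAL)")
cursor.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("AAA", 10.0), ("AAA", 12.0), ("BBB", 7.5)],
)

# The analysis side only sees rows coming back from a SQL query.
cursor.execute(
    "SELECT ticker, AVG(close) FROM prices GROUP BY ticker ORDER BY ticker"
)
averages = dict(cursor.fetchall())
conn.close()
print(averages)
```

The cursor-based calls follow the usual Python database API pattern, so the query part should look much the same with a different driver.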

R from within Python (0)

Anonymous Coward | about 7 months ago | (#47378173)

RPy2 [http://rpy.sourceforge.net/rpy2/doc-2.4/html/overview.html] is a project focused on providing simple and robust access to R from within Python. Having an interface between both languages lets you benefit from the libraries of one language while working in the other.

R is not an emerging star (1)

LetterRip (30937) | about 7 months ago | (#47378389)

R has been around for a long time and has long been a standard.

Python's sklearn is indeed an 'emerging star'.

Personally I use both.

Also have a look at some of the many standalone tools: vowpal wabbit (blazingly fast for regression learning, scales to ridiculous amounts of data) is superb, as is sofia-ml (for clustering, again scales quite well).

I tie them all together in Python, since there are Python bindings for R, and you can use Python's 'subprocess' module to pipe commands and data to command-line tools that don't have Python bindings.

There are other useful tools as well - I use Weka for some of my initial visualization and when I'm feeling lazy and want a quick result.
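
The subprocess approach mentioned above can be sketched roughly like this. The "tool" here is a hypothetical stand-in (a Python one-liner that sums numbers from stdin) so the example runs anywhere; a real command-line tool's executable name and flags would go in `tool_cmd` instead, and none of this reflects the poster's actual scripts.

```python
import subprocess
import sys

# Hypothetical stand-in for a command-line tool without Python bindings:
# it reads one number per line from stdin and prints their sum.
tool_cmd = [
    sys.executable,
    "-c",
    "import sys; print(sum(float(line) for line in sys.stdin))",
]

data = "1.5\n2.5\n4.0\n"

result = subprocess.run(
    tool_cmd,
    input=data,           # sent to the tool's stdin
    capture_output=True,  # collect stdout and stderr
    text=True,            # work with str instead of bytes
    check=True,           # raise if the tool exits with a non-zero status
)
total = float(result.stdout.strip())
print(total)
```

Passing the command as a list avoids shell quoting issues, and `check=True` turns a failing tool into a Python exception rather than a silent bad result.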

Do it! (1)

jhaiduce (1033992) | about 7 months ago | (#47379101)

I used to work for an organization that used SAS. As I recall, its only selling point was backwards compatibility with 1970's IBM mainframes. I rewrote all the SAS scripts I was given in a mix of other languages (R wasn't an option at the time, and I hadn't discovered Python yet). I don't know much about R, but I highly recommend python for its versatility.

why is cool desirable (1)

cinnamon colbert (732724) | about 7 months ago | (#47379281)

I mean, if SAS works, why waste time on hot cool stuff that may be obsolete in a year or two ?
this whole innovation for the sake of innovation thing is so last century
(see a post on crooked timber about a week or so ago, also P Krugman in his blog flagged a New Yorker article on the cult of innovation)

R for current productivity (1)

John Daschbach (3731065) | about 7 months ago | (#47379597)

I use R every day but have used Numpy and Scipy and related tools in the past and still on occasion. The package and documentation system in R is excellent. Good packages come with a vignette with examples that lets you quickly get up to speed with a new package and all packages are documented to be accepted at CRAN. A quite impressive variety of statistical and modeling packages are available. There are multiple graphics packages (although I find the standard one the most useful). The R library system I find excellent, especially when incorporating Fortran or C code you have written or obtained. For an old Lisp programmer, R makes perfect sense. The most basic, flexible data structure in R is the list, with the useful feature that you can access elements by index (list) or key (hash) [a common idiom in Lisp]. On the downside, standard R is all in memory so big data is not built in. A commercial company, Revolution Analytics I think is the name, has a big data version of R, and there are big data packages for some domains. Python support for big data is more extensive. The database interfaces are workable, and the netcdf4 interface quite usable, but if you need a lot of fast, flexible, external data access I'm certain Python is a much better choice. The object model(s) S3 and S4 are different from Python and are more familiar to users of CLOS but if you're going to develop a large in-house package Python is a far better language. As always the answer depends on your needs and expertise. Personally I find R and Fortran and C and C++ and Perl make an excellent environment for data analysis and modeling but that is very dependent on my background and needs and lets me explore statistical ideas very efficiently. If I knew that the domain I was working in was reasonably narrow and I wanted to develop larger scale packages instead of varied data analysis I would choose Python. As an example RStan and PyStan seem to stay pretty synchronized but Python has Theano.
If your interests are more oriented to statistical learning algorithms Python is the better choice, but if you want to use a wider array of Bayesian statistical analysis easily on data R may be the better choice.

Very different user requirements (1)

Linnsey Miller (2993021) | about 7 months ago | (#47379975)

SAS is used because it is much more approachable, whereas both R and Python require a different skill set to use. For R you require both programming and statistical backgrounds, whereas Python has some great packages but requires even more programming chops. If you're willing to hire the new skill set, I would recommend R. It is industry standard, easily extendable (just download a new package), and powerful. If you want to enable your business group to access data directly, look into setting up Tableau. Don't forget that you can get all of these to work on Hadoop if you require a big data solution. Source: I am a big data architect at a leading consulting firm.