Comparing R, Octave, and Python for Data Analysis

Soulskill posted more than 2 years ago | from the data-analysis-just-wants-to-be-free dept.

Open Source 61

Here is a breakdown of R, Octave and Python, and how analysts can rely on open-source software and online learning resources to bring data-mining capabilities into their companies. The article breaks down which of the three is easiest to use, which do well with visualizations, which handle big data the best, etc. The lack of a budget shouldn't prevent you from experiencing all the benefits of a top-shelf data analysis package, and each of these options brings its own set of strengths while being much cheaper to implement than the typical proprietary solutions.

61 comments

Get the Popcorn (4, Funny)

eldavojohn (898314) | more than 2 years ago | (#40092849)

So, you're linking a SlashdotBI article to the Slashdot front page?

Well then [imgur.com].

Re:Get the Popcorn (0)

Anonymous Coward | more than 2 years ago | (#40092975)

And with such a lightweight article, too? Please. There was only one code sample, and no example of how one task would be accomplished across all three.

Re:Get the Popcorn (0)

Anonymous Coward | more than 2 years ago | (#40094143)

Thanks for providing that feedback Anonymous!

.. just channeling "startrekkie"

Re:Get the Popcorn (1)

ObsessiveMathsFreak (773371) | more than 2 years ago | (#40095987)

If people thought Idle was bad, the Business Intelligence section takes Slashdot an order of magnitude lower.

How long until the BI editors demand outright access to the frontpage?

Re:Get the Popcorn (1)

Anonymous Coward | more than 2 years ago | (#40100361)

It's even more moronic when you consider that the article's comments had more useful content than the actual article!

It's no wonder that Taco left... /., it was a nice ride, but you've really fallen by the wayside in the last few years, to the point of near irrelevance, with late story postings and garbage like this one.

Did I seriously miss something? (4, Informative)

ACK!! (10229) | more than 2 years ago | (#40093013)

The whole article was not much more than a high-level review. The graphic naturally draws attention to the parameters the writer wanted to cover, but he did not back up his graphic with any sort of serious textual review of what he felt were the weaknesses or advantages of the different programming languages, at least not in any detail.

Re:Did I seriously miss something? (4, Interesting)

Ruie (30480) | more than 2 years ago | (#40093099)

And what he has is flawed as well. For example, he marked R as having issues with big data, which is quite wrong - I routinely analyze multi-GB datasets in memory, and my databases go into TB. Of the three languages, R is the only one with a native format (data.frame) that interfaces easily with database queries. Both Octave (Matlab) and Python have to use compound types, which make addressing difficult.

Also, I found R easier to master than either Octave or Python, but this is probably because I am familiar with Lisp.
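To make the "compound types" point concrete from the Python side, here is a minimal sketch, assuming NumPy is installed. The `measurements` table and its columns are hypothetical and live in an in-memory SQLite database. A plain DB-API query hands back a list of tuples addressed by position; converting it to a NumPy structured array gives named-column access, which is roughly what R's data.frame provides out of the box.

```python
import sqlite3

import numpy as np

# Hypothetical table for illustration; any DB-API cursor returns plain tuples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5), ("a", 3.0)],
)
rows = conn.execute("SELECT site, value FROM measurements ORDER BY rowid").fetchall()
conn.close()
# rows == [('a', 1.5), ('b', 2.5), ('a', 3.0)]: fields are addressed by position only.

# A structured ("compound") dtype attaches a name and a type to each field.
table = np.array(rows, dtype=[("site", "U8"), ("value", "f8")])

# Named column access plus a boolean filter on another column.
mean_value = table["value"].mean()
site_a_total = table["value"][table["site"] == "a"].sum()
print(mean_value, site_a_total)
```

The extra conversion step (and having to spell out the dtype by hand) is the friction the comment is describing; pandas, mentioned further down the thread, wraps this kind of structure in a data.frame-like object.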

Re:Did I seriously miss something? (0)

Anonymous Coward | more than 2 years ago | (#40094777)

I think the difference shows up when you use file formats that are flatter than databases, and with certain GUIs. In those cases, rather than reading the data as it needs it, it attempts to load all of it into memory and can max out the memory allowed to the process on 32-bit systems. But even then, there are ways around that through smart planning, variable use, and multiple data files for different variables so not all are in memory at once (of course, databases implement all three internally).

Re:Did I seriously miss something? (2)

dondelelcaro (81997) | more than 2 years ago | (#40094827)

there are ways around that through smart planning, variable use, and multiple data files for different variables so not all are in memory at once

There are also packages like ff and others which handle absolutely gigantic files by offloading parts of them to storage and only allocating memory (and storage) for them when required. R certainly has some problems dealing with huge amounts of data, but they aren't insurmountable for datasets smaller than 1 TB.
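The file-backed idea behind packages like ff has a rough Python analog in NumPy's `memmap`. The sketch below does not use ff or its API. It assumes NumPy is installed, and the file name `values.dat` is hypothetical. The array lives in a file on disk, is written and reduced one block at a time, and the operating system pages the data in and out as needed.

```python
import os
import tempfile

import numpy as np

n = 1_000_000      # kept small here; the same pattern applies to far larger files
block = 100_000
path = os.path.join(tempfile.mkdtemp(), "values.dat")

# mode="w+" creates the backing file at full size; the OS pages data in and out.
data = np.memmap(path, dtype="float64", mode="w+", shape=(n,))
for start in range(0, n, block):
    stop = min(start + block, n)
    data[start:stop] = np.arange(start, stop, dtype="float64")
data.flush()
del data  # drop the writable mapping

# Reopen read-only and reduce one block at a time.
view = np.memmap(path, dtype="float64", mode="r", shape=(n,))
total = 0.0
for start in range(0, n, block):
    total += float(view[start:start + block].sum())
del view
print(total)
```

Working block by block keeps the resident set small regardless of the file size. The trade-off is that every pass over the data becomes disk I/O.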

Re:Did I seriously miss something? (0)

Anonymous Coward | more than 2 years ago | (#40100929)

Right. I'm just saying that if you insist on using badly written GUIs on the 32-bit version with flatter files, you can still get around the limits. And the only reason I say that is that all the people at the top of my organization who are persuaded by charts like those in TFA would be the ones using it on their 32-bit computers, with information they entered into Excel or pulled from the databases into an Excel spreadsheet that they then try to use in an R GUI.

Thankfully, R has more than one way to skin a mouse, as other responses to my post have pointed out.

Re:Did I seriously miss something? (1)

Ruie (30480) | more than 2 years ago | (#40097039)

I think the difference shows up when you use file formats that are flatter than databases, and with certain GUIs. In those cases, rather than reading the data as it needs it, it attempts to load all of it into memory and can max out the memory allowed to the process on 32-bit systems. But even then, there are ways around that through smart planning, variable use, and multiple data files for different variables so not all are in memory at once (of course, databases implement all three internally).

This only happens if you issue a call like read.table("mytable.txt") - you can read the file piece by piece if you want to. Granted, this requires some work (unlike SAS), but in return you can do loops ;)
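Reading "piece by piece" means streaming fixed-size chunks of rows and updating running statistics, rather than materializing the whole table. This is a minimal Python sketch of that pattern using only the standard library. The helper `iter_chunks` is hypothetical, and an in-memory string stands in for a large `mytable.txt`.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of up to chunk_size rows so only one chunk is held at a time."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for a large file; in practice use open("mytable.txt", newline="").
text = "x,y\n" + "".join(f"{i},{2 * i}\n" for i in range(10))
source = io.StringIO(text)

reader = csv.reader(source)
header = next(reader)  # ['x', 'y']

count = 0
y_total = 0.0
for chunk in iter_chunks(reader, chunk_size=4):
    # Update running statistics; each chunk is discarded after its iteration.
    count += len(chunk)
    y_total += sum(float(row[1]) for row in chunk)

print(header, count, y_total / count)
```

Only statistics that can be accumulated incrementally (counts, sums, min/max) fit this pattern directly. Something like an exact median needs a different approach.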

Re:Did I seriously miss something? (1)

martin-boundary (547041) | more than 2 years ago | (#40098125)

That's not the real problem. The real problem with R and Octave/Matlab etc. is that when you want to use a specialized function to analyze your data, the function usually isn't implemented efficiently (i.e. it will create temporary tables/vectors and perform operations that don't scale, etc.).

So effectively your rich exploration environment is unusable unless you refrain from using all but the simplest operations, or you write your own versions of commands from scratch with efficiency in mind.
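Here is a small NumPy illustration of the temporaries problem, assuming NumPy is installed. Both hypothetical functions standardize a vector. The first writes the formula naturally and allocates full-size intermediate arrays. The second reuses the input buffer through the `out=` argument of NumPy ufuncs, which is the kind of hand-tuning the comment says you end up doing yourself.

```python
import numpy as np


def standardize_with_temporaries(x):
    # (x - x.mean()) allocates a full-size temporary; "/ x.std()" allocates the result.
    return (x - x.mean()) / x.std()


def standardize_in_place(x):
    # x.std() still allocates internally, but the two elementwise
    # steps below write into x's own buffer via out= instead of new arrays.
    mean = x.mean()
    std = x.std()
    np.subtract(x, mean, out=x)
    np.divide(x, std, out=x)
    return x


x = np.arange(10, dtype="float64")
expected = standardize_with_temporaries(x)
result = standardize_in_place(x.copy())
print(np.allclose(expected, result))
```

The in-place version overwrites its argument, so callers must pass a copy if they still need the raw data.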

This is _particularly_ noticeable with graphics. Try plotting a terabyte dataset _entirely_ on the screen in 3d and rotating it.
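A screen can only show a few thousand distinct positions per axis, so one common workaround is to reduce the data before rendering. The sketch below shows min/max decimation, one such technique and not something R or Octave are claimed to do automatically. It assumes NumPy is installed, and `minmax_decimate` is a hypothetical helper. Each bin keeps only its extreme points, so peaks survive.

```python
import numpy as np


def minmax_decimate(y, n_bins):
    """Reduce y to at most 2 * n_bins points, keeping each bin's min and max."""
    y = np.asarray(y)
    if len(y) <= 2 * n_bins:
        return np.arange(len(y)), y
    edges = np.linspace(0, len(y), n_bins + 1).astype(int)
    idx = []
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = y[start:stop]
        i_min = start + int(np.argmin(segment))
        i_max = start + int(np.argmax(segment))
        # Keep the two extremes in their original order (once if they coincide).
        idx.extend(sorted({i_min, i_max}))
    idx = np.array(idx)
    return idx, y[idx]


rng = np.random.default_rng(0)
y = rng.normal(size=100_000)
y[12_345] = 50.0  # a single spike that naive striding could skip
idx, ys = minmax_decimate(y, 500)
print(len(ys), ys.max())
```

Plotting `idx` against `ys` draws at most 1,000 points instead of 100,000 while keeping the spike visible. True 3D rotation of huge point clouds needs heavier machinery (level-of-detail or GPU-side rendering).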

Re:Did I seriously miss something? (2)

plopez (54068) | more than 2 years ago | (#40099651)

32 bits? Are you serious?

Python does have data.frame.. (3, Informative)

csirac (574795) | more than 2 years ago | (#40096487)

Through pandas [sourceforge.net], for a start. The SciPy/NumPy stack is quite nifty; I'm especially interested in how to apply it to irregular time series data.

Not to say anybody should ditch R, I still support our researchers most weeks at work in using it. But it's not as clear-cut as you seem to think it is, especially in terms of memory efficiency.
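Irregular time series usually have to be put on a regular grid before analysis. pandas offers resampling for this. To avoid that dependency, the sketch below does the same aggregation with plain NumPy, assuming NumPy is installed. The timestamps, values and the 2-second interval are made up for illustration.

```python
import numpy as np

# Irregularly spaced observations: timestamps in seconds and their values.
t = np.array([0.2, 0.9, 1.1, 3.7, 3.8, 3.95, 6.5])
v = np.array([1.0, 3.0, 5.0, 2.0, 4.0, 6.0, 10.0])

bin_width = 2.0  # seconds per regular interval
bins = np.floor(t / bin_width).astype(int)  # interval index for each observation
n_bins = bins.max() + 1

# Per-interval sums and counts; intervals with no observations get zero.
sums = np.bincount(bins, weights=v, minlength=n_bins)
counts = np.bincount(bins, minlength=n_bins)

# Mean per interval, left as NaN where no observation fell in the interval.
means = np.full(n_bins, np.nan)
np.divide(sums, counts, out=means, where=counts > 0)

starts = np.arange(n_bins) * bin_width
for start, m in zip(starts, means):
    print(f"[{start:.0f}, {start + bin_width:.0f}): {m}")
```

Leaving empty intervals as NaN, rather than filling them, keeps gaps visible so the caller can decide whether to interpolate.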

Re:Python does have data.frame.. (1)

Ruie (30480) | more than 2 years ago | (#40097025)

Didn't know about this one - thanks !

Re:Did I seriously miss something? (1)

Anonymous Coward | more than 2 years ago | (#40097453)

And what he has is flawed as well. For example, he marked R as having issues with big data, which is quite wrong - I routinely analyze multi-GB datasets in memory, and my databases go into TB.

Dude. That's not what people mean when they say big data. HP and Dell will both quite happily sell you machines with 2TB of main memory, and SGI will go to 16TB, and anything which can fit in memory on a single machine without custom hardware isn't big data. It's only big data once you get up to a few hundred terabytes.

Re:Did I seriously miss something? (1)

Ruie (30480) | more than 2 years ago | (#40122717)

Heh! I am sure I can use R on such hardware, as long as I have access to it ;)

Re:Did I seriously miss something? (1)

plopez (54068) | more than 2 years ago | (#40099639)

If you know Lisp and OOP, R is easy. Unfortunately, Lisp has become arcane, and most programmers I met did not understand OOP.

Re:Did I seriously miss something? (1)

Anonymous Coward | more than 2 years ago | (#40093783)

he did not back up his graphic with any sort of serious textual review

She [slashdot.org] is Geeknet's "Senior Director of Analytics".

Re:Did I seriously miss something? (2)

Anrego (830717) | more than 2 years ago | (#40094179)

Indeed. This is high-level "meeting for the suits" bullshit. I can picture this showing up in a PowerPoint presentation.

Here are your three options: this is the one that sucks, this is the one that sucks for a different reason, and this is the one I want you to go with. Oh, and here is a chart with some pretty checkmarks and stuff to help clarify! Let's do lunch!

SlashBI is very disappointing (1)

Chuck Chunder (21021) | more than 2 years ago | (#40095111)

It's full of puff pieces and press releases.

I think a lot of Slashdot readers (me included) would be interested in an introduction to various practical aspects of analytics, especially with Open Source tools we can experiment with ourselves. SlashBI could be a good gateway for that. So far every article I have read there has seemed like a waste of time.

Re:Did I seriously miss something? (1)

ceoyoyo (59147) | more than 2 years ago | (#40100861)

It wasn't even that. It came down to one of the last paragraphs:

"In my [limited and misleading] experience...."

Python isn't good at visualization? I guess the author has never used VTK-Python or Matplotlib. R isn't good with big data? I suppose that comes from R not having great database interactivity... so just feed it data via Python using rpy2.

I wish he had learning resources. (4, Insightful)

Anonymous Coward | more than 2 years ago | (#40093031)

I wish there were also a column for the availability of learning resources, such as tutorials, free books, example code, etc.

Never selected that way (4, Insightful)

vlm (69642) | more than 2 years ago | (#40093059)

how analysts can rely on open-source software

I've done that kind of stuff at work and those criteria are NEVER how a package is selected.

If I need a commercial product I need all manner of signoffs requiring at least weeks of delay and massive IT involvement so they can insert it into windoze images automatically or whatever it is they do.

If I'm doing FOSS it just ... gets done that day. No agony. And it just works: instead of a call-center script reader in India who can only tell me to reinstall the software over and over, with FOSS the "whole internet" is my support system, and the whole internet knows what it's doing.

Nothing about this has changed in about 15 years, so I'm not sure how this is "news". This would have been a good "news" story in the early/mid nineties.

Re:Never selected that way (1)

Anonymous Coward | more than 2 years ago | (#40093193)

Even with proprietary software, the "whole internet" can support your system; it is bunk to say that only happens with FOSS.

And to say it just works is bunk too. I see plenty of problems with FOSS where the "whole internet" has no f*cking clue beyond "go to the source and figure it out yourself", which is not always a trivial exercise.

But go ahead and keep believing your own bullshit.

Re:Never selected that way (1)

MikeBabcock (65886) | more than 2 years ago | (#40095845)

... except with proprietary software the 'whole internet' often says "too bad, you'll have to wait for a fix", whereas with FOSS it often says "Oh, try this patch over here" instead.

Re:Never selected that way (2)

Sebastopol (189276) | more than 2 years ago | (#40093315)

This is a thinly veiled attempt to put Python on the same level as R. /shakes head/

Re:Never selected that way (1)

seanzig (834642) | more than 2 years ago | (#40094145)

Absolutely - we all know that Python is much greater than R. ;-) Seriously though, I know where he's coming from, but he really should have explained his rating for each language better. For example, the Visualization Toolkit (VTK, www.vtk.org) has Python bindings. I think the author simply doesn't know about that.

Both! (3, Insightful)

Kludge (13653) | more than 2 years ago | (#40094987)

The best option is to use Python and R together, through rpy for example.
R rocks for statistical libraries and good documentation.
Python rocks for everything else.

Re:Both! (-1)

Anonymous Coward | more than 2 years ago | (#40095523)

The 80s called. They want their "this rocks" statements back.

Re:Never selected that way (2, Insightful)

Anonymous Coward | more than 2 years ago | (#40093861)

Besides, in research, using something opensource (or at the very least gratis) makes it that much easier for others to replicate what you did. Getting SAS scripts just isn't fun.

Re:Never selected that way (0)

Anonymous Coward | more than 2 years ago | (#40094239)

Where I work, the whole Cisco fiasco put the fear of god into the high-level suits. The fallout is a huge and cumbersome process for getting approval to use FOSS tools... even though we aren't modifying or distributing them. It's to the point where it's less of a headache to _buy_ something than to go through the lengthy FOSS approval process.

Re:Never selected that way (1)

hawguy (1600213) | more than 2 years ago | (#40096861)

What was the Cisco fiasco? My company uses Opensource tools routinely and I've never even heard of the Cisco fiasco.

Re:Never selected that way (0)

Anonymous Coward | more than 2 years ago | (#40100563)

The FSF took on Cisco over improper use of GPLed code. I too have seen it put the "fear of god" into management. They don't actually understand why it happened, why it doesn't apply to them, or the difference between using Eclipse or SVN on your workstation and including GPL code in your product... the message that got through was using FOSS == getting sued.

Re:Never selected that way (5, Insightful)

Anonymous Coward | more than 2 years ago | (#40094369)

I'm an astronomer. At this point in my career, I move to a new research institution every couple of years. Each institution may have a site licence for some piece of commercial software like IDL or Matlab, but I use free software (Python, in my case) because I know that I can keep using it, rather than rewriting all my scripts for a new language every time I move.

Re:Never selected that way (0)

Anonymous Coward | more than 2 years ago | (#40096749)

In big companies there are often oppressive, cautious, or conservative software restrictions that don't allow one to find a FOSS tool and just use it, so we are stuck going through corporate IT for commercial software.

What an awful article. (1)

Anonymous Coward | more than 2 years ago | (#40093085)

n/t

Superficial and arbitrary (0)

Anonymous Coward | more than 2 years ago | (#40093123)

As someone who regularly programs in all three of those languages I'd like to point out that the comparison is completely arbitrary. This is one of the most lazily writting articles I've seen Slashdot link to.

Re:Superficial and arbitrary (1)

MattBecker82 (1686358) | more than 2 years ago | (#40098441)

This is one of the most lazily writting articles I've seen Slashdot link to.

Mod +1 Ironic

More crap from /. (4, Insightful)

NoMaster (142776) | more than 2 years ago | (#40093139)

"Here is a breakdown of R, Octave and Python ..."

No, there isn't. There is not much more than a shitty 'feature' table, too high-level to be anything other than facile, which is "Based on [the author's] own user experience and research".

As a student user of all three, I would have been interested in reading a good comparative review or explanation aimed at outsiders. This ain't it; it's just more slashvertising.

Re:More crap from /. (1)

Anonymous Coward | more than 2 years ago | (#40093525)

Yes, but the advantage of the author's approach is that it'd be real easy to extend the review to include Scilab.

Low Quality Article (0)

Anonymous Coward | more than 2 years ago | (#40093143)

This is a really low quality article. Ironically, even though it's a /.-BI article, it's not up to /. quality.
I had a colleague ask me recently about the strengths and weaknesses of R, Octave, and Python. When I saw the summary of the article, I was about to send the link to him. Then I read the article. Forget that.

I h8 Python! (-1)

Anonymous Coward | more than 2 years ago | (#40093167)

Any language which relies on invisible characters for anything more than disambiguating non-whitespace tokens is LAME!

And, wow, if an editor inserts spaces for tabs, or uses real tab characters in Python code indented with spaces...Well, it puts the "fun" back in to "fun"ction!

Sooooo lame!

Flame away, folks!

Re:I h8 Python! (1)

MetalliQaZ (539913) | more than 2 years ago | (#40093533)

Spoken like a man who earned a C in freshman year intro to programming, but for some reason didn't switch to a humanities major.

Re:I h8 Python! (0)

Anonymous Coward | more than 2 years ago | (#40094461)

Spoken like a man who works the receiving end of a glory hole on a nightly basis.

Or if you can't make up your mind (2, Interesting)

Anonymous Coward | more than 2 years ago | (#40093353)

Sage math http://www.sagemath.org/

Julia? (3, Informative)

Chrisq (894406) | more than 2 years ago | (#40093473)

There was a previous article about Julia [slashdot.org] which looked cool. I wonder how it measures up.

Re:Julia? (0)

Anonymous Coward | more than 2 years ago | (#40097173)

Julia is (AFAIK) a compiled language, intended to be high-performance in every aspect (like Fortran). R, Octave, and Python are all interpreted (with some exceptions, e.g. PyPy), and can only perform well by calling C/Fortran/Julia functions, where the call overhead is small compared to the computation done in the lower-level language. I don't think anyone uses Fortran for data analysis unless you get into piles of disks and heavy algorithms, and I believe that's the crowd Julia is targeted at. If you want to calculate a few statistical numbers out of a few GB of data, Python/R/Octave takes (at most) a few hours, but who cares anyway...
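The "perform well only by calling compiled functions" point is easy to see in Python with NumPy (assumed installed). Both hypothetical functions below compute a sum of squares. The first loops in the interpreter, paying per-element overhead. The second makes a single call whose loop runs in compiled code (`np.dot` typically dispatches to a BLAS routine). No particular timings are claimed here; they depend on the machine.

```python
import timeit

import numpy as np

values = np.random.default_rng(0).random(200_000)


def python_loop_sum_of_squares(xs):
    # Every iteration is executed by the interpreter, one element at a time.
    total = 0.0
    for x in xs:
        total += x * x
    return total


def vectorized_sum_of_squares(xs):
    # One call; the loop runs inside NumPy's compiled code.
    return float(np.dot(xs, xs))


as_list = values.tolist()
loop_result = python_loop_sum_of_squares(as_list)
vec_result = vectorized_sum_of_squares(values)

print("loop:      ", timeit.timeit(lambda: python_loop_sum_of_squares(as_list), number=3))
print("vectorized:", timeit.timeit(lambda: vectorized_sum_of_squares(values), number=3))
```

The two results agree up to floating-point rounding, since the summation order differs. That is why the comparison should use a tolerance rather than exact equality.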

Oh.. (2, Insightful)

Anrego (830717) | more than 2 years ago | (#40094089)

Now that's just desperation.

Come on... keep this shit in BI. Either it takes off or it doesn't.

cheaper to implement depends on salaries (0)

Anonymous Coward | more than 2 years ago | (#40095127)

An abacus is cheaper to implement than most things on a computer, as long as you don't count developer time; pull out the Dick Feynman method from LANL in the 1940s and you are good to go.

Read this while listening to the Mensroom segment (1)

tehlinux (896034) | more than 2 years ago | (#40095199)

My suggestion is to try all three, and see which offering’s toolbox solves your specific problems.

Well no **** Sherlock!

I don't understand (4, Informative)

utkonos (2104836) | more than 2 years ago | (#40095821)

This article compares three languages that have different purposes. R's purpose is statistical analysis and visualization. Octave is a general mathematical analysis and visualization language. Python is a general-purpose language with its own focus on code readability, among other things.

These languages also have a target audience. R is for statisticians and scientists. Octave is for mathematicians, and Python is for programmers.

Re:I don't understand (0)

Anonymous Coward | more than 2 years ago | (#40096283)

But from a data analyst's perspective, all three could serve Machine Learning purposes.

Re:I don't understand (0)

Anonymous Coward | more than 2 years ago | (#40099825)

From a data analyst's perspective, you use dedicated data analysis software for any serious work, and there are many OSS alternatives for that.

Re:I don't understand (0)

Anonymous Coward | more than 2 years ago | (#40100131)

Generally, you're right, but there is an ever growing list of Python modules for scientific computing and data visualization and I'd argue that even for non-programmers, it's starting to surpass Octave and is more competitive in terms of useful library functions and performance with commercial MATLAB (provided MATLAB compatibility is not a requirement). Sure, Python is a general purpose programming language, but it's growing into a full featured interactive numerical environment, like Octave and MATLAB, too.

Re:I don't understand (0)

Anonymous Coward | more than 2 years ago | (#40102299)

Python is for programmers.

What's interesting is that part of Python's current popularity comes from the large number of users who aren't programmers. SciPy and NumPy are super powerful data analysis libraries for Python. Couple this with Python's approachability for non-programmers and you end up seeing a lot of people from the scientific community using it.

Re:I don't understand (1)

utkonos (2104836) | more than 2 years ago | (#40121559)

Fantastic! When Python's libraries surpass what is available in CRAN (think CPAN but for R), I'll switch, and I'm sure everyone else will as well; they're both just tools. Statisticians use R because it's designed for statisticians. And that was my original point. The original article is strange because it is comparing apples and oranges. Plus, it was absolutely flame-bait, because there aren't really any R or Octave zealots. People who use them think of them as tools. The author compared them to Python to get the Python zealots to come out of the woodwork and make a stir around the article.

At least this one brought some juicy comments! (0)

Anonymous Coward | more than 2 years ago | (#40096269)

Hey guys, if you are interested in more details on these three packages and others, some of the comments on /BI are pretty good (at least from my perspective). For example, one anonymous reader posted "Both Octave and R have specific places in the pantheon of analytics, usually adjacent to their respective work-alikes. Unfortunately, there is no current operational Octave or R compiler (as in optimizing compiler), so in both cases, you have something interpreted. This isn't a terrible thing ... it's great for interactive debugging ... but performance on non-natively compiled code is horrible. Just try a dense LU decomposition on a large matrix (say 4k x 4k) just to see how painful it is compared to well optimized Fortran/C." ... Just check out the rest!
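For readers who want to try the quoted LU experiment, here is a minimal sketch, assuming NumPy is installed. `lu_decompose` is a hypothetical textbook LU factorization with partial pivoting written in pure Python. It performs roughly n³/3 multiply-adds inside the interpreter, so its cost climbs steeply as n grows. For comparison, `numpy.linalg.solve` hands the same kind of LU-based solve to compiled LAPACK code. The matrix is kept at 50 x 50 so the example runs quickly. Scaling `n` up shows the gap the comment describes.

```python
import numpy as np


def lu_decompose(a):
    """Pure-Python LU with partial pivoting: returns perm, L, U with A[perm] == L @ U.

    Assumes a is a square, nonsingular matrix given as a list of lists.
    """
    n = len(a)
    u = [row[:] for row in a]  # working copy that is reduced to U
    l = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to row k.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate column k below the pivot, recording multipliers in L.
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    for i in range(n):
        l[i][i] = 1.0
    return perm, l, u


rng = np.random.default_rng(1)
A = rng.random((50, 50))
perm, L, U = lu_decompose(A.tolist())
residual = np.abs(A[perm] - np.array(L) @ np.array(U)).max()

# Compiled path: LAPACK-backed solve of A x = b.
b = rng.random(50)
x = np.linalg.solve(A, b)
print(residual, np.allclose(A @ x, b))
```

Pivoting on the largest available entry keeps the factorization numerically stable without noticeably changing its cost.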

Re:At least this one brought some juicy comments! (0)

Anonymous Coward | more than 2 years ago | (#40097097)

As far as I remember, most of Octave's magic IS done in Fortran.

Re:At least this one brought some juicy comments! (1)

plopez (54068) | more than 2 years ago | (#40099765)

Uh oh... you mentioned Fortran. Here come the "Fortran is ugly and out of date" posts. To nip it in the bud, I will link to http://en.wikipedia.org/wiki/Fortran#Fortran_2008 [wikipedia.org]

Check out Fortran 2008, which is way cool! Everything you could want from a modern programming language.

apples vs oranges (1)

plopez (54068) | more than 2 years ago | (#40099709)

I still don't get it. How can you compare specialized statistical and number-crunching languages with a general-purpose programming language?
