R Throwdown Challenge

timothy posted about 3 months ago | from the if-you-pirate-it-so-much-the-better dept.

Programming 185

theodp (442580) writes "'R beats Python!' screams the headline at Prof. Norm Matloff's Mad (Data) Scientist blog. 'R beats Julia! Anyone else wanna challenge R?' Not that he has anything against Python, Matloff adds, but he just doesn't believe that Python or Julia will become 'the new R' anytime soon, or ever. Why? 'R is written by statisticians, for statisticians,' explains Matloff. 'It matters. An Argentinian chef, say, who wants to make Japanese sushi may get all the ingredients right, but likely it just won't work out quite the same. Similarly, a Pythonista could certainly cook up some code for some statistical procedure by reading a statistics book, but it wouldn't be quite the same. It would likely be missing some things of interest to the practicing statistician. And R is Statistically Correct.'"

185 comments

Can't use it (5, Funny)

smitty_one_each (243267) | about 3 months ago | (#47086779)

Nothing with a name that verbose can possibly be any good.

Re:Can't use it (1)

dmbasso (1052166) | about 3 months ago | (#47086817)

Like... hmm... C?

Re:Can't use it (1)

Anonymous Coward | about 3 months ago | (#47086865)

I'm waiting for R++ myself...

Re:Can't use it (2)

rudy_wayne (414635) | about 3 months ago | (#47086903)

R#

Re:Can't use it (0)

Anonymous Coward | about 3 months ago | (#47086923)

Have you heard of these exceptional languages?

w
H
O
o
S
H

Re:Can't use it (3, Funny)

FatdogHaiku (978357) | about 3 months ago | (#47087263)

Is this the programming language of Pirates?
Is this the programming language for Pirates?
Is this the language for programming Pirates?
Arrr...

Can't spell warez without R (2)

tepples (727027) | about 3 months ago | (#47087369)

And to what extent are statisticians willing to use warez?

Re:Can't use it (1)

FatLittleMonkey (1341387) | about 3 months ago | (#47087613)

Posting to undo stupid.

Re:Can't use it (0)

Anonymous Coward | about 3 months ago | (#47087281)

I'm waiting for the R vs D wars.

Re:Can't use it (0)

Anonymous Coward | about 3 months ago | (#47087527)

Do we send ASM programmers to The Hague...

Hard to believe in these figures (5, Funny)

CRCulver (715279) | about 3 months ago | (#47086789)

And R is Statistically Correct.

I don't see any margin of error. This claim is scientifically worthless.

Re:Hard to believe in these figures (0)

Anonymous Coward | about 3 months ago | (#47086851)

only 1 in 1,225,461 people use it anyway.

Re:Hard to believe in these figures (0)

Anonymous Coward | about 3 months ago | (#47086911)

I'd like to see a double-blind study on the chef claim, too.

Re:Hard to believe in these figures (2)

I'm New Around Here (1154723) | about 3 months ago | (#47087013)

It dices. It chops. It purees. It makes my food taste better, to a not insignificant amount.

Any other claims you want to hear from a chef*?

.
*Note: Worked in several restaurants during and after high school. Now I occasionally cook or make desserts at home.

Re:Hard to believe in these figures (1)

fuzzyfuzzyfungus (1223518) | about 3 months ago | (#47087307)

I'd like to see a double-blind study on the chef claim, too.

I'm pretty sure that food, like music, has a fringe of...enthusiasts...who would tell you that double-blind studies just ineffably blunt the terroir in some more or less mystical way (which is of course the real reason why they have trouble performing above chance), rather than let base materialism and the plebeian theory that functionally identical outcomes can be produced by a variety of means sully the transcendent subtlety of their experience.

Re:Hard to believe in these figures (1)

Culture20 (968837) | about 3 months ago | (#47087381)

I enjoy having my steak prepared the same way every time, but I would balk at eating a 100% reproduced steak.

Bad analogy (5, Insightful)

Florian Weimer (88405) | about 3 months ago | (#47086797)

An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.

You generally want to use programming languages designed by experienced programmers (even better, experienced language designers) who work closely with subject matter experts. Left to their own devices, experts are likely to get a lot of things wrong, and if the language is sufficiently popular, you are stuck with their mistakes for a long time to come.

Re:Bad analogy (-1)

DexterIsADog (2954149) | about 3 months ago | (#47086863)

The analogy wasn't perfect, but yours was terrible. I guess that's what you get when a tech geek tries his hand at writing.

Re:Bad analogy (1)

I'm New Around Here (1154723) | about 3 months ago | (#47087027)

Considering Florian Weimer didn't make an analogy in his post, your post is what happens when a /. geek tries to make an argument based on his own skills in reading comprehension.

Re:Bad analogy (0)

Anonymous Coward | about 3 months ago | (#47087433)

He totes did, and so did GP. Reading is fundamental.

Re:Bad analogy (5, Interesting)

Glock27 (446276) | about 3 months ago | (#47086871)

Exactly. Julia will eat R for lunch soon enough, I think. It's an elegant, well designed and efficient language. It's only been around for a couple of years, and has a very vibrant and rapidly growing community.

Check it out for yourself: The Julia Language Homepage [julialang.org] . It's got a lot to offer anyone with an interest in mathematics, including statisticians. It's based on LLVM, and interfaces trivially with C libraries - plus it's a very fast language in its own right, unlike R or Python.

Re:Bad analogy (5, Interesting)

retchdog (1319261) | about 3 months ago | (#47086959)

my friend uses julia, and every few weeks complains about some bug. the other day he mentioned that the latest release broke Bernoulli sampling (wtf?). the others have been pretty fundamental too.

this is a serious problem, of course. the other one is lack of libraries. R is an abysmal pile of shit, but at least it's a standard; pretty much 95%+ of applied stats is at least partially supported by someone's hacked-up library/package. julia is far, far short of that, and it appears that much of its community is more interested in pretty graphics, meta-wankery, and interface methodology than actual working statistics (not that there's anything wrong with that per se).

yeah, yeah, "fix it yourself," and it's on my list to write at least a basic survival analysis package for it. but i wouldn't blame anyone for not using it, and i wouldn't recommend it for doing stats as it is now.

Re:Bad analogy (3, Funny)

Gaygirlie (1657131) | about 3 months ago | (#47086995)

my friend uses julia, and every few weeks complains about some bug.

He should tell Julia to wear protection and be more careful with who she spends time with so as not to catch so many bugs.

Re:Bad analogy (1)

K. S. Kyosuke (729550) | about 3 months ago | (#47087003)

How much R package code is written in R? Would it be such a problem to take an R parser and generate Julia code out of it as a first iteration? Then, people could refactor it - if necessary - while keeping the first version around for regression testing. Even if the original R APIs are horrible, at least they have the benefit of people being familiar with them, as you rightly point out.

Re:Bad analogy (1)

Antique Geekmeister (740220) | about 3 months ago | (#47087321)

Like the f2c toolkit, for converting Fortran to C?

I don't think you could write the parser in R, or in Julia.

Re:Bad analogy (1)

K. S. Kyosuke (729550) | about 3 months ago | (#47087351)

Or f2cl? ;-) I don't see a reason why one shouldn't be able to write the parser in Julia. It seems perfectly equipped even for such tasks. It even has macros, come to think of it.

Re:Bad analogy (1)

fuzzyfuzzyfungus (1223518) | about 3 months ago | (#47087355)

Given that these languages are (primarily, obviously anything Turing-complete can be turned to the same purposes as anything else, if somebody feels like it) used for statistics work, I'd be inclined to wonder whether that is the easiest or best way to go about it:

If something is already implemented in R, and you want to more or less blindly feed it a new target, or re-run it to see how it works, R was apparently not broken enough to stop it, because it's already done.

If you want to implement some, currently unsupported, aspect of statistics in Julia, with API or binary compatibility with R not a consideration, you could potentially end up in a situation where being reasonably sure that your translated version works, does what it is supposed to, and is vaguely human readable might take longer or be more difficult, or both, than starting with the math you wish to implement and building something non-broken from scratch.

Re:Bad analogy (1)

K. S. Kyosuke (729550) | about 3 months ago | (#47087413)

Julia isn't strictly numerical. It sure as hell isn't "primarily for statistics work". It has a numerical bent, but so far I haven't seen any limitation in the sense that something general and non-numeric in it would be possible (in the sense of Turing completeness) but impractical. Indeed, the very fact that Julia has been designed with support for Lisp-like macros in mind should be a hint to you that perhaps expecting it to have at least generous facilities for manipulating and transforming syntactic trees and structures is not entirely unwarranted. The only obstacle I see is a dearth of parsing tools in (or for) it, but that is the least of my worries (the thought of OMeta immediately comes to my mind).

Re:Bad analogy (1)

Jmstuckman (561420) | about 3 months ago | (#47087797)

Although much R package code is written in R, many of the important bits are living in FORTRAN libraries (many of which date back to the 1980s) which are linked into the packages.

Re:Bad analogy (0)

Anonymous Coward | about 3 months ago | (#47087905)

R is an abysmal pile of shit

Could be worse. Could be MATLAB.

Re:Bad analogy (1)

KingOfBLASH (620432) | about 3 months ago | (#47086965)

Using three lines of code I can do a regression in R and get the output, including loading the data.

Python? Fuhgeddaboutit. Can do, but with a lot more code.

Of course, if you're looking to do stuff you'd expect of a normal scripting language, R falls flat on its face.

The solution? R + Python. They talk to each other quite nicely, and you can get the best of both worlds.

Re:Bad analogy (2)

tomhath (637240) | about 3 months ago | (#47087057)

Python? Fuhgeddaboutit. Can do, but with a lot more code.

Yea, with Python it takes up to nine lines of code [blogspot.com] to calculate the regression and generate a plot

Re:Bad analogy (0)

Anonymous Coward | about 3 months ago | (#47087087)

Two of those lines are import statements you would only need to do once if doing a lot of related work (or just put in your start-up script or use ipython which pulls in most of pylab and numpy anyway, so taking zero lines of code). It looks nice to define things on a separate line, but you don't really need to have a separate line just for an arange definition or making the points needed to plot the line. And different plotting settings mean you don't need the show command. You end up back to about three lines of code: one to define/load your data, one to run the regression, and one to plot it.

Re:Bad analogy (3, Informative)

KingOfBLASH (620432) | about 3 months ago | (#47087107)

You're just getting a plot. I'm talking about output that looks like this:


Call:
lm(formula = new_day_return ~ prior_day_return + rsi_under_10 +
        rsi_under_20 + rsi_under_30 + rsi_over_70 + rsi_over_80 +
        rsi_over_90 + fourteen_day_rsi, data = mydata5)

Residuals:
    Min      1Q  Median      3Q     Max
   -100      -1       0       1  205700

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      -9.845e+01  3.742e+02  -0.263    0.792
prior_day_return -4.143e-04  3.434e-03  -0.121    0.904
rsi_under_10     -1.916e-01  3.798e+00  -0.050    0.960
rsi_under_20      2.195e-02  1.447e+00   0.015    0.988
rsi_under_30     -2.291e-01  6.915e-01  -0.331    0.740
rsi_over_70      -2.364e-01  3.348e-01  -0.706    0.480
rsi_over_80       5.135e-03  4.820e-01   0.011    0.991
rsi_over_90       7.162e-03  8.650e-01   0.008    0.993
fourteen_day_rsi  4.193e-04  3.434e-03   0.122    0.903

Residual standard error: 163.7 on 1581663 degrees of freedom
    (137 observations deleted due to missingness)
Multiple R-squared: 5.397e-07, Adjusted R-squared: -4.518e-06
F-statistic: 0.1067 on 8 and 1581663 DF, p-value: 0.999

Re:Bad analogy (1)

Anonymous Coward | about 3 months ago | (#47087565)

Then you just need to use a different package not meant for the people who want a quick single regression, e.g. use statsmodels [sourceforge.net] . The example there is rather verbose, as some people prefer that, especially when learning, but you can easily do a less verbose version similar to R:

import numpy as np
import statsmodels.formula.api as smf
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()
print(results.summary())
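
For readers with neither R nor statsmodels at hand, here is a rough sketch of what a one-line regression call computes in the simplest case: one predictor, ordinary least squares, standard library only. The function name fit_line and the five data points are made up for illustration. A real summary table like the one quoted above also reports standard errors, t values and p-values, which this sketch leaves out.

```python
# Minimal ordinary least squares for a single predictor.
# Data and names are illustrative assumptions, not taken from the thread.

def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_line(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
```

The gap between this and a full statistical package (diagnostics, missing-data handling, multiple predictors) is roughly what the thread is arguing about.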

Re:Bad analogy (4, Insightful)

professionalfurryele (877225) | about 3 months ago | (#47087297)

Sorry but I use both R and python in my work as a biomechanist and while I love working with python and hate working in R, R is not only less verbose for this task, but it is more consistent, intuitive and better documented. Very few languages beat python for simple, easy to read code, but it is not up to the task of doing general purpose statistics. To see why this is the case consider a problem with that blog post. All the diagnostic plots I need to do to check the regression are missing, no qq, no cook's, not even something simple like fitted vs. residual. Now consider what happens when I notice that while the fit is decent the residuals depend on what subject I'm looking at and I need to vary the error term. Or need to switch to a mixed effects model because there is clearly a dependence on the intercept by subject.
Seriously when I say I hate R, I mean it. The code is ugly, it can be hard to read and woe betide the poor git who makes the mistake of needing a plot more complicated than something lattice can do. It is still better than python for statistics.

Re:Bad analogy (0)

Anonymous Coward | about 3 months ago | (#47087327)

"woe betide the poor git who makes the mistake of needing a plot more complicated than something lattice can do."

Base R can give you almost any plot you want.

Re:Bad analogy (1)

professionalfurryele (877225) | about 3 months ago | (#47087697)

You can. For me the primitives are a pain to work with compared with matplotlib. Not that anything I've used has good 2D primitives for plotting, just gradations of less crappy.

Re:Bad analogy (0)

Anonymous Coward | about 3 months ago | (#47087163)

> You generally want to use programming languages designed by experienced programmers (even better, experienced language designers)

That's why Python has such a good design... oh, wait!

Re:Bad analogy (0)

Anonymous Coward | about 3 months ago | (#47087217)

C++ & java are good proof of that problem.

Re:Bad analogy (0)

Anonymous Coward | about 3 months ago | (#47087293)

An Argentinian chef could cook sushi just fine. I would expect a Japanese chef could make Empanadas just fine too. I don't see how nationality or race would make any difference at all.

The skill of the chef, and the level of experience making the dish in question, are going to be much more significant determinants of the outcome.

true, but not really because of R itself (5, Insightful)

Trepidity (597) | about 3 months ago | (#47086807)

R itself is okay, but even as a long-time user I don't think the language or environment itself is all that much to brag about. What makes it great for statistics is just that statisticians use it, which means that a lot of the packages are written by statisticians. That makes a big difference: recent papers often have R implementations, standard problems have well-maintained R packages for them with all the bells and whistles, etc. As Matloff notes, this means they often have everything that statisticians are looking for, while straightforward textbook implementations you often find in other languages often aren't nearly as thorough in how they handle the statistical models, or only handle some special cases (though there are some really good packages in other languages, just not as many).

But I don't think that has much to do with R itself being uniquely suited to statisticians. It's used for historical reasons: Bell Labs S was influential in the field way back when nothing like Python or Julia existed, and statisticians started using it because it was a lot nicer than Fortran, which is what other areas of science mostly used back then. GNU R is essentially a free-software workalike for Bell's S, and it's kept most of the community on board through a mixture of existing packages, familiarity, and inertia.

Re:true, but not really because of R itself (1)

Anonymous Coward | about 3 months ago | (#47086935)

I'm not a heavy R user, but I do appreciate how functions such as lapply are so easily scaled. I'm a peripheral stats user with some minor programming experience, but I was even able to use the NVidia GPU libraries and then R on Hadoop with minor code changes. For that, I'm happy.

Re:true, but not really because of R itself (3, Interesting)

jythie (914043) | about 3 months ago | (#47086947)

*nods* who uses a language has more impact on its usefulness than anything inherent to the language. Libraries, support community, ease of hiring people who both know the language and have domain specific skills, much more important than what kind of sugar the language has.

Data mining (0)

Anonymous Coward | about 3 months ago | (#47086967)

If I had to do some intense statistical analysis, then R is probably a better choice.

Now, if I have to get data via a feed or web page scraping, manipulate it, clean it, do some analysis, display it or feed it to another program, then Python makes all of that much easier and maintainable.

Back in the old days before all these smancy fancy tools, we used this red book called something like "Mathematical Programming in C" - in the snow; uphill both ways. It had the code and algorithms to implement all the stats, engineering, and god knows what - all in C.

I don't see it on Amazon - or I got the title totally wrong.

Re:Data mining (2, Interesting)

Anonymous Coward | about 3 months ago | (#47087007)

You got the title wrong.

_Numerical Recipes in C_, by Press, W. et al

http://www.amazon.com/Numerical-Recipes-Scientific-Computing-Edition/dp/0521431085

IIRC there was also a _Numerical Recipes in FORTRAN_ as well.

Also see http://www.nr.com/ . I think they only have a single book now called _Numerical Recipes_ and it is in its third edition.

Thanks son! (-1)

Anonymous Coward | about 3 months ago | (#47087109)

The onion on my belt is getting a little old and its scent can't keep me as sharp.

I should try blueberries. My grandma made the best blueberry pie! This was back when everyone wore onions on their belts and the Jitterbug was making a comeback. Those were the days and 64K was plenty of memory for anything!

But fishing is another story ....

Re:Data mining (1)

plopez (54068) | about 3 months ago | (#47087881)

The numerical recipes series was much more than algorithms and code. It told you more about the how and *why* of an algorithm. And the when, as in when it should be used. The commentary alone is enough reason to buy them even if you never actually use any code from them.

Re:true, but not really because of R itself (3, Interesting)

HuguesT (84078) | about 3 months ago | (#47087061)

R has some pretty unique graphing packages. Nothing that I know of matches the way you can do 2D and 3D plots in R. Not Python, not Gnuplot, not Julia, not Matlab, not Excel, not Mathematica, nothing.

Re:true, but not really because of R itself (2)

Trepidity (597) | about 3 months ago | (#47087081)

Around here Python's matplotlib has been making some inroads in the plotting category, even among people who use R for the actual data analysis, but it's admittedly not as featureful as the whole suite of R plotting packages.

Re:true, but not really because of R itself (0)

Anonymous Coward | about 3 months ago | (#47087227)

Yes, the advantage of R is the plotting.

If R was written by statisticians... (0)

Anonymous Coward | about 3 months ago | (#47086809)

Why is there no "r" in statistics?

truth, lies, and statistics (0)

Anonymous Coward | about 3 months ago | (#47086815)

" And R is Statistically Correct" doesn't mean anything.

memorial to our broken hearts & good spirits (-1)

Anonymous Coward | about 3 months ago | (#47086821)

we can go back? some of the views of the much maligned cohen http://www.youtube.com/watch?v=9F8QM3tjkTE & the frequently misunderstood john http://www.youtube.com/watch?v=ndzHqFv9Mdo our references; mlk http://www.youtube.com/results?search_query=mlk%20speech&sm=3 jfk http://www.youtube.com/results?search_query=jfk%20speech&sm=3 forecasting; http://www.youtube.com/results?search_query=wmd+weather 100% preventable (by us) starvation (mostly innocent kids) still # 1 killer world wide.... followed closely by 100% fatal deception in an almost tied intertwintion of spiritless fatality....

Meh (5, Informative)

hyfe (641811) | about 3 months ago | (#47086827)

Statistics major who programmed Python professionally for a few years (and have a MsC in Comp.Sci) ...

... this is all posturing and drama, but good on Prof. Norm Matloff for getting some attention. R is rather usefull, has quite a few extremely usefull features as a language, including some of the best list/indices handling I've seen anywhere. Excellent libraries for statistical work, but it also has quite a few of the most downright abhorrent language decisions I've seen anywhere ever, with the amazingly poor string handling (for a scripted language) topping that list ( http://www.burns-stat.com/page... [burns-stat.com] )

Python, C, Mathematica and R all have different strengths for mathematical work / numerical calculations though, and using the best tool for the job is what it's about. As always, what the best tool actually is, is also rather subjective, as which tool will best solve a specific task is always dependent on your skill with the different tools. I do agree with the professor though, even though there's quite a bit of Python hype (python + scipy/matplotlib is amazing) R is not being replaced anytime soon. It's too good at what it's good at.

Re:Meh (-1)

Anonymous Coward | about 3 months ago | (#47087015)

Is there a feature that allows you to spell "useful" properly?

I dislike Python (0)

Anonymous Coward | about 3 months ago | (#47086835)

because it is an inferior mish-mash for an up-start generation which was never taught the, "In the end, everything looks like LISP," maxim. And its requirement for particular whitespace offends me as someone who has spent the last decade working with accessibility groups.

I'm not really sure I see where R fits, though. For basic statistical work, SPSS is good. For advanced statistical work, surely you'd want a general purpose language with cross-language libraries?

Re:I dislike Python (3, Insightful)

jythie (914043) | about 3 months ago | (#47086955)

Hrm. I never thought about the whitespace requirements in python from an accessibility perspective.

Re:I dislike Python (1)

fuzzyfuzzyfungus (1223518) | about 3 months ago | (#47087391)

Hrm. I never thought about the whitespace requirements in python from an accessibility perspective.

I know that Python's approach to whitespace is very...polarizing; but I've always wondered how much it would cause trouble either for people who really loath it, or for specialized situations that tend to crop up under 'accessibility' (where the path from text file to user is likely going through one or more atypical transformations, anywhere from simple contrast bumps up through text to speech or the like).

Given that the whitespace has to have an unambiguous meaning to the python interpreter, your editor could presumably convert, in either direction, any notation you desire, so long as it covers the same possible meanings (and, ideally, doesn't clash with characters python uses to mean something else, since then it'd have to convert those as well, potentially sending you chasing down the road to something that looks utterly different).

It's not as though what you see on the screen bears much resemblance to the actual underlying sequence of bits.
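
The point that Python's whitespace has an unambiguous meaning can be seen with the standard library's tokenize module, which emits explicit INDENT and DEDENT tokens. Below is a rough sketch of the kind of conversion an editor or accessibility tool might do. The function name explicit_blocks and the choice of "{" and "}" as markers are arbitrary assumptions, and the sketch only handles statements that fit on one physical line.

```python
import io
import tokenize

def explicit_blocks(source):
    """List each statement line, with "{" / "}" marking block starts and ends."""
    out = []
    last_row = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            out.append("{")  # the tokenizer saw the indentation level increase
        elif tok.type == tokenize.DEDENT:
            out.append("}")  # the tokenizer saw the indentation level decrease
        elif tok.type in (tokenize.NL, tokenize.NEWLINE,
                          tokenize.COMMENT, tokenize.ENDMARKER):
            continue
        elif tok.start[0] != last_row:
            # First real token on a new physical line: record that line once.
            out.append(tok.line.strip())
            last_row = tok.start[0]
    return out

sample = (
    "def sign(x):\n"
    "    if x < 0:\n"
    "        return -1\n"
    "    return 1\n"
)
result = explicit_blocks(sample)
for line in result:
    print(line)
```

Going the other way (markers back to indentation) is just a counter that goes up on "{" and down on "}", which is why the notation can be swapped without changing what the interpreter sees.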

Re:I dislike Python (1)

Pinky's Brain (1158667) | about 3 months ago | (#47087033)

In the end most people will still use anything but LISP.

Re:I dislike Python (3, Interesting)

KingOfBLASH (620432) | about 3 months ago | (#47087183)

Believe it or not, most statisticians are not programming wizards.

Most stats guys use R, matlab, mathematica, or something similar. Even if it takes days to run a program that would take 20 minutes in C. Sort of like how the business guys will use VBA when they need anything, because that's what they know.

Languages like R are used because they are accessible. And once they reach a critical mass, everyone learns them in a field.

Sort of like how Fortran just won't die.

Re:I dislike Python (0)

Anonymous Coward | about 3 months ago | (#47087235)

SPSS had a bug in repeated measures anova they failed to correct for >20 years. If you can't see the code you can't trust it...

Re:I dislike Python (1)

pla (258480) | about 3 months ago | (#47087347)

because it is an inferior mish-mash for an up-start generation which was never taught the, "In the end, everything looks like LISP," maxim.

I have to suspect you as trolling here, because although I do indeed know Lisp (and Scheme, and Tcl) - Very, very little of my code ends up looking anything like Lisp.


And its requirement for particular whitespace offends me as someone who has spent the last decade working with accessibility groups.

I will fully agree with you that required whitespace offends me, but that has fuck-all to do with accessibility. Any programming language that doesn't let you write the entire program on one line with zero whitespace (not that you ever should do that, Perl notwithstanding) has some serious damage.


I'm not really sure I see where R fits, though. For basic statistical work, SPSS is good. For advanced statistical work, surely you'd want a general purpose language with cross-language libraries?

Statisticians != Programmers. TFA's rant very much looks like the newbie programmer after mastering his first language, who then tries to apply that particular hammer to every problem he comes across. "Damnit, that screw will get pounded in! Yes, I can chop through this 2x4 by striking it repeatedly with the claw-end! Yes, I can trick pure C into supporting something vaguely like an associative array!"

Good programmers will eventually realize that the job defines the tool to use. Poor programmers will stay trapped forever in an interpreted language with garbage collection. And Statisticians will go to their grave believing that whatever language they learn first counts as the best choice ever; Physicists have the same problem, thus you often see the most powerful supercomputers on the planet running... Fortran.

Hah (0)

Anonymous Coward | about 3 months ago | (#47086859)

His posts are perfectly 'random'. R itself is written in C. Python is also written in C. I can't see why one can get much better statistical correctness in R than what comes from its underlying implementation - in C.

A joke on the subject (4, Funny)

kav2k (1545689) | about 3 months ago | (#47086861)

A joke I've read recently [twitter.com] :

I'm not sure if "R is written by statisticians, for statisticians" is a good thing e.g. "stadiums are built by footballers, for footballers"

Re:A joke on the subject (0)

Anonymous Coward | about 3 months ago | (#47087615)

Yeah, not sure I want to hop on a plane made by passengers for passengers...

Don't throw down R if you won't talk SAS (0)

Anonymous Coward | about 3 months ago | (#47086873)

R may be written for statisticians, but is rightly criticized for lacking the validation that SAS has (which python et al also lack). There's a good discussion here [inside-bigdata.com] on the subject. And for what it's worth, R and SAS both lack the tools to easily hook into other systems, which really makes them good ONLY for ad hoc statistics and reports.

Re:Don't throw down R if you won't talk SAS (1)

Trepidity (597) | about 3 months ago | (#47086889)

You can't talk SAS unless you've got a big bank account, though. A one-year, individual (single-desktop) license costs upwards of $5,000, which makes it a non-starter for a lot of people. Also, it's not open source.

Re:Don't throw down R if you won't talk SAS (1)

retchdog (1319261) | about 3 months ago | (#47087001)

yes, R is written for people who know what they are doing.

R is for statisticians... (0)

Anonymous Coward | about 3 months ago | (#47086885)

and pirates.

so different (1)

bucket_brigade (1079247) | about 3 months ago | (#47086887)

Yeah R is so different from Python, I mean everything is the same but not quite and I totally have a point and not just bullshitting because like Japanese sushi and beef Argentinian soup, brocolli.

Re:so different (0)

Anonymous Coward | about 3 months ago | (#47087059)

I know what you mean. It really is the broccoli that sets it apart from the rest, tomato cucumber.

Sausage & potato chutney cracked solder joints!

Who really f-ing cares? (3, Insightful)

nurb432 (527695) | about 3 months ago | (#47086907)

Use the right tool for the job and stop bashing other tools that were designed for different jobs .

Re:Who really f-ing cares? (0)

Anonymous Coward | about 3 months ago | (#47086977)

Who really f-ing cares? People who want to write tools that make people more productive care, because these kinds of articles highlight the advantages and shortcomings of using the tool. Also discovering the 'right tool for the job' may not be straightforward so having the aforementioned facts to hand can help.

Right tool for the right job. (-1)

Anonymous Coward | about 3 months ago | (#47087055)

This isn't carpentry.

Folks around here like to stress the difference between computer science and being a code monkey all the time.

Well, here's how to tell the difference:

A computer scientist develops algorithms and is able to implement them in any language. It may be easier to do so in one language rather than another - mostly because of libraries. Python has Beautiful Soup for web page scraping that I have not seen in any other language. I'd have to roll my own if I wanted to scrape a page in C.

A code monkey is lost if his language of choice doesn't have the libraries to support what he is doing.

It's all just syntax - and obviously whether or not said language has platform support. It'd be a bitch to write an iOS app in COBOL.

But a bubble sort? That can be implemented in any language. Same goes for statistical analysis - it just may be a lot harder.

Because everything eventually has to boil down to the same ones and zeros that the processor runs.
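
As a concrete instance of the bubble sort mentioned above, here is a short sketch in Python. Nothing in it depends on libraries, so the same loop structure carries over to most languages. The function name bubble_sort and the sample list are just for illustration.

```python
def bubble_sort(items):
    """Return a new list sorted in ascending order using bubble sort."""
    result = list(items)  # copy so the caller's list is left untouched
    # After each pass the largest remaining element has moved to position `end`.
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:
            break  # a pass with no swaps means the list is already sorted
    return result

print(bubble_sort([5, 2, 9, 1, 5, 6]))
```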

Re:Right tool for the right job. (0)

Anonymous Coward | about 3 months ago | (#47087597)

dickbreath???...

Re:Right tool for the right job. (0)

Anonymous Coward | about 3 months ago | (#47087683)

Well, by that logic, nothing past machine code was ever needed.

I think there's some value in having a language that allows you to express code in an efficient notation. Just like there's value in having mathematical formulas and not having to write mathematical work as prose.

Re:Right tool for the right job. (0)

Anonymous Coward | about 3 months ago | (#47087705)

Python has Beautiful Soup for web page scraping that I have not seen in any other language.

Java has JSoup [jsoup.org]

yawn (0)

Anonymous Coward | about 3 months ago | (#47086937)

So a special purpose statistics language beats out python - a general purpose language with lots of varying libraries (its real strength...)

That's news? Or worthy of anyone crowing?

I never heard of R before and as it is statistics I see no need to know much further.

Next language or equiv I might look at is one that simulates Quantum Computing, as I want to see what applications that computing method is actually applicable to.

Wrong site to note a challenge (0)

Anonymous Coward | about 3 months ago | (#47086957)

APK throws challenges to trolls here on hosts. Not a 1 manages to validly topple his points.

I've found the problem... (0)

Anonymous Coward | about 3 months ago | (#47086983)

"R is written by statisticians, for statisticians"

This is primarily why it will never gain widespread adoption, too. Most people aren't statisticians, and probably don't want to be.

Re:I've found the problem... (2)

gnupun (752725) | about 3 months ago | (#47087023)

"R is written by statisticians, for statisticians"

Does R invent new syntactic constructs that make it useful for handling/generating statistical data? So far I've not seen any new syntax in R that warrants creating a new programming language -- it's just a rehash of various scripting languages already available.

From a programmer's perspective, R should just be an easy to use library that you can use in various languages like Python, Julia, Ruby, etc. There's no need to learn new syntax if it's not that new and useful.

Re:I've found the problem... (2)

HuguesT (84078) | about 3 months ago | (#47087093)

How about the syntax for specifying a model? [princeton.edu]

lmfit = lm( change ~ setting + effort )

Re:I've found the problem... (0)

Anonymous Coward | about 3 months ago | (#47087579)

More than one python package supports similar syntax, e.g. statsmodels, with the only minor issue being you need to put string quotes around it.
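For the curious, a rough sketch of what that looks like with statsmodels' formula interface. It assumes pandas and statsmodels are installed. The numbers are made up for illustration (they are not the data behind the princeton.edu link), and `change` is built as an exact linear function of the other two columns so the fitted coefficients are easy to check.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up illustrative data, NOT the dataset from the linked page.
df = pd.DataFrame({
    "setting": [10, 20, 30, 40, 50, 60],
    "effort": [3, 1, 4, 1, 5, 9],
})
# Assumed relationship for the demo: change = 2 + 0.5*setting + 1.5*effort
df["change"] = 2.0 + 0.5 * df["setting"] + 1.5 * df["effort"]

# Same model formula as the R call, just passed as a quoted string.
lmfit = smf.ols("change ~ setting + effort", data=df).fit()
print(lmfit.params)
```

The formula string is the same one R's `lm` takes; the quoting is the only visible difference.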

Re:I've found the problem... (0)

Anonymous Coward | about 3 months ago | (#47087157)

well to be fair (though i pretty much agree with you) the R syntax is a lot older than the 3 languages you specified.

Re:I've found the problem... (2)

fuzzyfuzzyfungus (1223518) | about 3 months ago | (#47087447)

Don't forget the influence of history: R wasn't designed for superiority to Python, Julia, and Ruby; but in large part to be a GNU-acceptable implementation of S, which may well have been designed for superiority to APL and FORTRAN; and which has existed since somewhere between the-before-time-when-the-gods-were-young and the start of the Second Trilobite War.

So basically R doesn't beat Python, or anything.. (0)

Anonymous Coward | about 3 months ago | (#47087089)

unless you're a statistician or interested in writing programs for high-accuracy statistics.

With R... every day is Talk Like A Pirate Day! (3, Funny)

TheRealHocusLocus (2319802) | about 3 months ago | (#47087105)

"Arrrr.... fix yar name 'R' while you may, maties!!"

I may not have the belly for Deep Statistics but I do know about Internet Search noise levels. I remember trying to do research on WebDAV (believe me, there is such a thing) only to discover that folks discussing it invariably refer to it as 'dav'. Because saying "Distributed Authoring [and] Versioning" out loud makes you spit out your toothpick. Any attempt to search 'webdav' yielded only the sterile official pages, and attempts to search on 'dav' with other keywords brought up conversations from the community of Disabled American Veterans who also use the term in casual conversation, and have said an awful lot over the years. They occupied 'dav' first.

Now you may think you can pull off a 'C' where Google seems to pick off relevant results if you combine it with any computery term, but it was not always so. It has taken an incredible saturation of C, and perhaps some special coded cases on Google's part, for this to come about.

The success of Perl is due in some part to the ability of confused people to obtain help and advice about it merely by searching on its unique spelling.

So the best way to push this R language is with a refit of the name. Go with the pirate theme, it will sell many more T-shirts than those of silly camels and pearls. But stake out a bit of Keyword Real Estate that presently has a relatively low population density.

Google search result estimate counts, descending order,
r --- 2,730,000,000
ar --- 656,000,000
arr --- 24,400,000
arrrrrrrr --- 3,060,000
arrrr --- 876,000
aarr --- 638,000
arrr --- 536,000
arrrrr --- 405,000
aaarrrrr --- 267,000
arrrrrr --- 205,000
arrrrrrr --- 129,000
aarrr --- 107,000
aarrrr --- 107,000
aaarrr --- 56,600
arrrrrrrrr --- 52,400

Adding arrrs is not enough since talking like a pirate is typically accomplished with a single 'a', so ar+ space is pretty well populated up to ar{5}, it looks like best ratio is around a{3}r{3}. But even choosing the less-optimum and easier to type a{2}r{3} by using 'aarrr' instead of 'r' you have improved the signal to noise ratio by a factor of twenty-five thousand.
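The "factor of twenty-five thousand" can be checked with a few lines of Python. This is just the arithmetic on the result counts listed above; the dictionary and the helper name `noise_reduction` are made up for this sketch, and "fewer hits for the bare term" is taken as a stand-in for "less noise".

```python
# Google result estimates copied from the list above.
hit_counts = {
    "r": 2_730_000_000,
    "aarrr": 107_000,   # a{2}r{3}
    "aaarrr": 56_600,   # a{3}r{3}
}


def noise_reduction(current, proposed):
    """How many times fewer results the proposed name has than the current one."""
    return hit_counts[current] / hit_counts[proposed]


if __name__ == "__main__":
    print(round(noise_reduction("r", "aarrr")))   # prints 25514
    print(round(noise_reduction("r", "aaarrr")))  # prints 48233
```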

Push the name change firmly and decisively. This means that if anyone mentions 'R' there should be immediate responses that ask, "What AARRR you talking about?" This will inject the proper searchable term into the discussion while it reminds the poster of the name change.

For an interesting 9 minute lecture that might help sell you on this idea, listen here [upenn.edu].

Re:With R... every day is Talk Like A Pirate Day! (1)

TheRealHocusLocus (2319802) | about 3 months ago | (#47087187)

For an interesting 9 minute lecture that might help sell you on this idea, listen here [upenn.edu].

Certificate warnings freak you out? Try this link instead [upenn.edu], now with matching wildcard, calmer seas and less mogul.

Re:With R... every day is Talk Like A Pirate Day! (2)

wisnoskij (1206448) | about 3 months ago | (#47087303)

It is scary sometimes how much control the limitations of Google Search have over our lives.

For example, the best anti-piracy system you can use for any game or film is to name it with fewer than 3 characters. It then becomes very hard to search for it.

It took me days to find "9" (and I know others who had similar problems), and I think I never did end up seeing "B".

Re:With R... every day is Talk Like A Pirate Day! (1)

Bite The Pillow (3087109) | about 3 months ago | (#47087523)

If google search is limiting your pirating, you may want to investigate something a little more specialized. I assume you're talking about the 2009 film with Jennifer Connelly, not the 2005 short nor the video game - either would be two clicks away after less than a minute.

And if Google Search is really impacting your life in any meaningful way, you should step away from the keyboard for a weekend.

I think this is more a case where you detected a pattern from two events, and extrapolated to assume that everyone has the same problem all the time. It's normal and natural to do so, but not correct.

With R... every day is Talk Like A Pirate Day! (1)

iggymanz (596061) | about 3 months ago | (#47087317)

I'm afraid your research neglects a huge subset of the Talk-Like-A-Pirate word space: 'yarr' has 523,000 results.

Re:With R... every day is Talk Like A Pirate Day! (1)

Anonymous Coward | about 3 months ago | (#47087877)

"Yarr" also has a nice acronymization: "Yet Another R Rebranding".

Re:With R... every day is Talk Like A Pirate Day! (0)

Anonymous Coward | about 3 months ago | (#47087337)

WebDAV is at the core of Subversion source control web access. I've not seen anything else use it in the last 8 years.

If you're going to use R (4, Informative)

Johnny Loves Linux (1147635) | about 3 months ago | (#47087167)

Be sure to use RStudio as the front end: http://www.rstudio.com/ [rstudio.com]. Using R in a terminal is OK, but having the beautiful GUI frontend RStudio makes working with R sooooooo much better! Using the help system, plots, and R Markdown (knitr), and inspecting variables, is so much easier in RStudio. As far as comparisons go,
  1. R is no competitor to python for writing generic scripts.
  2. Python (numpy, scipy, statsmodels, pandas, sklearn, matplotlib, ipython and ipython notebooks) is not yet ready to compete with R for doing statistical analysis but give Python a couple of more years and then slashdot should do a review of how it compares.
  3. You can always call R from python using the rpy2 module. This is really easy within an ipython notebook using the %load_ext rmagic command.

For a nice video on using ipython notebook in data analysis: https://www.youtube.com/watch?... [youtube.com]

For a nice selection of ipython notebooks for doing various types of data analysis: https://github.com/ipython/ipy... [github.com]

State of Programming in the Sciences (2)

wisnoskij (1206448) | about 3 months ago | (#47087271)

Having seen the state of programming in the Sciences, I really do not think that "built by statisticians" is something you would want to advertise.

Beats python at what? (3, Interesting)

umafuckit (2980809) | about 3 months ago | (#47087277)

A few examples are provided in TFA but it's all rather vague as to why R "beats" Python. I've been using R for years for fitting mixed effects linear models. It does this really well, it makes it easy to compare models, it's got all the cutting-edge stuff in it. The problem with R, however, is that it's shitty and unintuitive as a programming language. I do all my pre-processing in MATLAB and I only ever export to R when I have a final data frame that needs a moderately complicated statistical analysis.

Missed point: Apples - Oranges (1)

WatchMaster (613677) | about 3 months ago | (#47087349)

No one uses R for its amazing language*. The language sucks. R is used because it has nearly limitless, tested, and approved statistical algorithms. Want partial least squares, support vector machines, linear models, principal component analysis, Fisher's exact test? They are all there waiting to process your data. Along with hundreds of other analyses that you might really need to use but don't even know about yet.

"Python" doesn't have this stuff because it is a language, not a set of statistical methods.

*there may be a few deviants who use it for self-flagellation

ASM is superior (0)

Khyber (864651) | about 3 months ago | (#47087373)

ASM makes R possible.

Henceforth, ASM is king. R is just another pretender.

Fortran throwdown challenge! (1)

Theovon (109752) | about 3 months ago | (#47087767)

This guy must have been reading the recent stuff on Fortran and decided to jump on the bandwagon.

Fortran was written by engineers and scientists for engineers and scientists.
R is written by statisticians for statisticians.

Well, there you have it. If a language or other kind of tool was developed by practitioners of X for other practitioners of X, it’s likely that it will be better than some other tool that was designed for a different purpose.

Who would have thunk it.
