
R 3.0.0 Released

samzenpus posted 1 year,11 days | from the brand-new dept.


DaBombDotCom writes "Version 3.0.0 of R, a popular software environment for statistical computing and graphics, codenamed "Masked Marvel," has been released. From the announcement: 'Major R releases have not previously marked great landslides in terms of new features. Rather, they represent that the codebase has developed to a new level of maturity. This is not going to be an exception to the rule. Version 3.0.0, as of this writing, contains only [one] really major new feature: The inclusion of long vectors (containing more than 2^31-1 elements!). More changes are likely to make it into the final release, but the main reason for having it as a new major release is that R over the last 8.5 years has reached a new level: we now have 64 bit support on all platforms, support for parallel processing, the Matrix package, and much more.'"
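
For the curious, a minimal illustration of the new long-vector support (this assumes a 64-bit build of R 3.0.0 and roughly 2 GB of free memory for the raw vector):

x <- raw(2^31)   # one element more than the old 2^31 - 1 limit
length(x)        # 2147483648, returned as a double since it no longer fits in an integer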

75 comments

Congratulations R Team (5, Interesting)

Anonymous Coward | 1 year,11 days | (#43366727)

For someone who can't afford the license fees of SAS or Matlab, this is the best alternative out there. And in some cases a better one.

It's not well known, but R's accessibility support is far better. Here is an example from a paper accepted in the R Journal:

Statistical Software from a Blind Person's Perspective
A. Jonathan R. Godfrey

http://journal.r-project.org/accepted/2012-14/Godfrey.pdf

Re:Congratulations R Team (5, Insightful)

Anonymous Coward | 1 year,11 days | (#43366767)

It also feels more appropriate, somehow, to do research code in R: It's supposed to be shareable and reproducible, and using an expensive and proprietary language kind of defeats the purpose. Besides, CRAN and Bioconductor have rather a lot of useful stuff...

Re:Congratulations R Team (2, Interesting)

Anonymous Coward | 1 year,11 days | (#43366821)

Tell that to all the "scientists" and "researchers" paying money for _and_ investing lifetimes worth of effort into writing libraries for Matlab, Maple, Mathematica, LabView and other proprietary environments, instead of contributing to make the existing free environments better.

Matlab --> GNU Octave, Scilab, NumPy/SciPy
Maple, Mathematica --> Maxima, SymPy

Re:Congratulations R Team (0)

Anonymous Coward | 1 year,11 days | (#43366907)

Also Sage

Re:Congratulations R Team (1)

Anonymous Coward | 1 year,10 days | (#43367071)

What about Axiom.

Re:Congratulations R Team (0)

Anonymous Coward | 1 year,10 days | (#43367225)

Meh... much of the power of open source is scalability... build something on top of something else on top of something else, without licensing issues. Matlab and such are set up to basically be top-level stuff, so I don't mind the people that work on it getting paid. It has a decent interface, and having seen the Matlab scripts that many, many researchers create, you'd have to be crazy to rely on them as anything more than the original one-off they were meant for.

Re:Congratulations R Team (1)

Anonymous Coward | 1 year,10 days | (#43367363)

I have seen biologists write R, so I feel your pain.

Re:Congratulations R Team (5, Insightful)

Bill_the_Engineer (772575) | 1 year,10 days | (#43368075)

Tell that to all the "scientists" and "researchers" paying money for _and_ investing lifetimes worth of effort into writing libraries for Matlab, Maple, Mathematica, LabView and other proprietary environments, instead of contributing to make the existing free environments better.

Times are changing. There are many forces at work here:

1. Cutbacks in funding are making lead scientists look for ways to save money.
2. Proprietary vendors keep upgrading their software and charging license fees for each new version (one particular vendor licenses specific minor versions).
3. There is a desire to share work, and non-proprietary methods are the best way to do it.
4. New postdocs are familiar with Python (they like working in IPython in particular) and its libraries.
5. R is gaining ground with the older scientists due to its features and price.

Re:Congratulations R Team (1)

stenvar (2789879) | 1 year,10 days | (#43370801)

Unfortunately, many scientists probably use commercial tools in hopes that their libraries will be picked up by the company and they will earn some money from it.

Re:Congratulations R Team (0)

Anonymous Coward | 1 year,10 days | (#43371911)

Huh? That's a mouthful of speculation.

You know, many share their files openly... http://www.mathworks.com/matlabcentral/fileexchange/

Re:Congratulations R Team (0)

Anonymous Coward | 1 year,10 days | (#43372445)

How often, if ever, does that happen? Maybe it could be a possibility for researchers that do more pure research on things like PDE solvers and such, although even then, their algorithms are going to be published and their implementation code might be kind of minimal and easy enough to reimplement from the paper. And I've typically had no problem getting copies of code from researchers, for free, by asking them... although most of it is really problem specific and has almost no documentation. And I'm not sure what company would want to touch that code, over copyright concerns, if it was worked on by a group.

I have seen scientists bail from academia by going to work for software companies that write closed codes for various physics problem solving. Although that seems to have almost no connection to use of proprietary languages in tools beforehand, as those companies were working with Fortran or C. Maybe it is different for engineers...

Re:Congratulations R Team (0)

Anonymous Coward | 1 year,10 days | (#43368337)

There is quite a bit of inertia and a tendency to use what is familiar, but at the heart of it, scientists and researchers are more concerned with using their tools to get something done than with developing tools for the sake of developing tools. If there is an existing package for Matlab that does what you need, or something really close to it, and nothing comparable for the open source tools, then Matlab will get used. If all of the tools are about equal in terms of features, then a researcher is going to use what they have and what will get the job done fastest. In many cases it is cheaper and quicker to use those proprietary tools, even though the software costs money, because the time it would take to rewrite something would also cost money. And these days, the last few departments I worked in used pooled licenses, so the costs were not that high to have a few licenses shared among several dozen researchers.

That said, I've seen a tendency to use free software where possible. The place I am at now uses mostly a mix of Python, IDL, and LabView. The IDL is around to run historic code that is pretty old at this point, but that no one wants to rewrite. And the LabView is for things that we can't get drivers for that will work with Python, or that would involve too much effort to get working with Python versus five minutes in LabView. And for some reason it seems really easy to get undergraduate hourly workers who can program in LabView instead of Python, so it ends up that almost all of the LabView programming is effectively free, while Python code gets maintained by graduate students and staff.

Proprietary software will still be around for some time, though. It looks expensive to someone paying out of their own pocket, but in many cases it is still really cheap compared to the alternatives. Many researchers don't have the option to make ideological choices with their grant money and need to pick the cheapest option that will get done what they need to get done. I haven't found it to be much of a barrier to free software, though, since much code in proprietary languages is still available and easy enough to convert, as I've done many times with Matlab code into Python.

Re:Congratulations R Team (4, Insightful)

spike hay (534165) | 1 year,10 days | (#43368565)

Drives me crazy. At least with statisticians, R is by far the dominant package now. But in science, it's Matlab Matlab Matlab.

Python + Numpy/Scipy is such a better alternative now it's not even funny. It's actually a real language, and has loads of packages. And unlike Matlab, you don't have to pay extra money for additional packages (or any money).

The use of closed source software in science is a waste of scarce resources, and it hurts openness. Another thing: every numerical-methods class I've taken has used Matlab. It's really unfair to expect students to purchase a copy. I use Octave when I have to deal with this, but it is not perfectly compatible.

Re:Congratulations R Team (0)

Anonymous Coward | 1 year,10 days | (#43369141)

And unlike Matlab, you don't have to pay extra money for additional packages (or any money).

That works great if Python already does, or can quickly be made to do, what you want. Otherwise, you are going to be paying for software, whether via a software company or via employees' time. Right now we get volume licenses for Matlab that cost less than what a graduate student costs for a single day. And considering we have about 10 times as many people as we have Matlab licenses, the cost per employee to have it available is even lower.

The use of closed source software in science is a waste of scarce resources, and it hurts openness.

At times it is far cheaper for us to use closed source software. Maybe in the grand scheme of things it would be better to have zero closed source software, but you would need to convince someone higher up to either make grants for groups to create such tools, or expand grants for groups that need to use such tools so they can make them. In the meantime, we've shared our code with other groups before, and it is pretty clear what it is trying to do usually, so that hasn't stopped other groups from converting it to other languages or filling in the gaps that were previously provided with closed source software when that better fulfills their needs.

Re:Congratulations R Team (1)

GiganticLyingMouth (1691940) | 1 year,10 days | (#43371417)

Students have it good when it comes to matlab -- you can get a student version of matlab + simulink (with 10 or so toolboxes) for $99. The people who are really hurt by matlab's pricing schemes are the hobbyists who don't qualify for a student copy. There's this huge price dichotomy; when you're a student it's $99, after you graduate it's $5000+, and that's without any toolboxes.

However, for academic use it makes perfect sense for scientists to use Matlab over the alternatives. At least in the UC (University of California) system, a department will have some (large) number of licences for its faculty to use, and so professors and their students (who commonly won't have much coding experience) will have what's essentially free access to Matlab and its associated toolboxes. From their perspective, they want to run their experiments and write their papers, not learn how to code and be a pseudo-sysadmin. They want to use the simplest environment that stays out of their way, and not have to deal with installing various libraries. Say what you will about Matlab, its support and documentation are very good.

Re:Congratulations R Team (1)

PhamNguyen (2695929) | 1 year,10 days | (#43372915)

I tried switching to Python + NumPy/SciPy from Matlab. In the end I switched back to Matlab. I'm already familiar with Python, and have done a lot of C++ programming, so slight language differences were not the issue. Here are some of the reasons I switched back to Matlab:

IDE: Matlab comes with a ready-to-use IDE.

Value semantics: Matlab treats matrices (and all classes that are not derived from "handle") using value semantics, so you know that Y=f(X) won't change X if X is a matrix. However, it also uses copy-on-write, so calling f won't make an unnecessary copy of X. This makes life a lot easier than trying to figure out how to make deep copies of everything.
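
For comparison, R's ordinary objects behave the same way (copy-on-modify); a quick sketch:

f <- function(m) { m[1, 1] <- 0; m }   # modifies its local copy only
X <- matrix(1:4, 2, 2)
Y <- f(X)
X[1, 1]   # still 1: the caller's matrix is untouched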

Simpler syntax for math: Matlab generally has simpler syntax for mathematical operations, which makes sense since that is what it was designed for. Python is a general-purpose programming language, so it doesn't have the syntactic sugar for dealing with matrices and arrays that Matlab does.

Faster out of the box: Even when I installed ATLAS, NumPy was still slower than Matlab. Tuning ATLAS to your own system would probably result in equal speeds, but that is a lot of extra work.

On the other hand, the Python route does have many advantages: more sensible and standard approaches to namespaces, the filesystem and OO programming; no license issues (say, if you want to set up a cluster); easier integration with other languages; and access to a wide range of non-math libraries (although generally this is not a real advantage, because you can, and should, do data processing in Python and only do the actual math in Matlab). As to extra cost, I think that compared to the value of the student's or researcher's time the cost is not very great. The only toolboxes most researchers need are the Optimization toolbox and the Statistics toolbox.

Re:Congratulations R Team (0)

Anonymous Coward | 1 year,8 days | (#43389647)

1. Python has a built-in IDE. It's called IDLE (Integrated DeveLopment Environment).
2. Matrix and array operations are really easy using extended Python slicing syntax.
3. Python is a full-featured INTERPRETED programming language. If you want it to run faster, use a compiled language like C/C++ or FORTRAN. A lot of developers do their prototyping in Python, because it's faster than a compiled language to develop code in, and then "translate" the code into a compiled language such as C/C++ or FORTRAN. So "faster out of the box" depends on which aspect you're referring to.
4. If you're going to use Python for data processing and do the "actual math" in another language, use the RPy interface to do it in R. Then none of your code is using a proprietary programming language.

Re:Congratulations R Team (1)

spike hay (534165) | 1 year,5 days | (#43418035)

Yeah, it's incredibly easy to just offload loops or whatever into Fortran and use F2py. As an aside, Fortran 90 is just about as easy as NumPy or Matlab, so I do 90% of my work in there. I just use F2py to compile my Fortran modules as Python modules. Then I have the flexibility of an interpreter with the speed of Fortran.

I haven't used it, but PyPy is a very fast JIT compiler for python to speed up native Python code.

Re:Congratulations R Team (1)

rbprbp (2731083) | 1 year,10 days | (#43368573)

While I despise MATLAB for a large set of reasons, I agree that a large variety of toolboxes is available for pretty much anything you might want to do. And those are what most of the MATLAB users look for, in my experience: they might dislike the language, but MATLAB provides/they can purchase toolboxes that do what they need for their research.

Re:Congratulations R Team (1)

tlhIngan (30335) | 1 year,10 days | (#43368945)

Tell that to all the "scientists" and "researchers" paying money for _and_ investing lifetimes worth of effort into writing libraries for Matlab, Maple, Mathematica, LabView and other proprietary environments

Depends on who they're doing it for, it seems. It's a time vs. money balance.

In a commercial environment, Matlab tends to win out purely because of the toolboxes - especially current ones where Matlab has real-world interfaces so after modelling, you can prototype your control system with real hardware. Then after that, you compile your Matlab code down to your controller hardware. So instead of having to learn one thing for simulation (Matlab), then learn how to prototype your controller directly in hardware (new tools, C, etc) you just keep the same Matlab implementation and compile it down to something you integrate rather than try to debug pesky translated code.

LabView is similar - a horrible mess if you want to program with it, but scientists and the like love it because it means not having to mess with code.

Basically the goal is to achieve the results in a certain timeframe - cost tends to be secondary. If you cannot get X done in Y months, it's a huge problem.

Pure scientific research paid for by universities and governments typically has a lot more time but a lot less money to spend - if it takes you Z more months but you can save W dollars, it's worth it.

It's why companies rarely fund pure research initiatives - things like Microsoft Research (and Bell labs) still are curiosities.

Re: Congratulations R Team (1)

fsterman (519061) | 1 year,10 days | (#43370199)

I haven't used LabView, but KNIME is both open source and awesome. I can quickly prototype pretty much any workflow I want and get really good reproducibility. Debugging and unit tests need to be more directly integrated, but it is still a great package for practical science. It has R/Java/Python integration as well!

Re:Congratulations R Team (1)

Anonymous Coward | 1 year,10 days | (#43372345)

LabView is similar - a horrible mess if you want to program with it, but scientists and the like love it because it means not having to mess with code.

I'm not sure "love" is the right word, at least in my experience. Although I've worked on several projects that used LabView, 90+% of the people working on it seemed to bitch and complain about it constantly. Many use it because they have already existing code using it, or because of equipment that has drivers that are much, much easier to use in LabView, or because in a few select cases, it lets you bang out a GUI control really quick. But otherwise, it seems to make the rest of the project a nightmare to maintain, and can involve a lot more time arm deep in LabView than it would to find a similar problem in more plain text code.

Re:Congratulations R Team (1)

gajop (1285284) | 1 year,11 days | (#43366957)

I don't know about SAS, but Octave is a much better alternative to Matlab if money is your main issue.
Hell, even Scilab or Python's NumPy are pretty similar.

Re:Congratulations R Team (4, Interesting)

ceoyoyo (59147) | 1 year,10 days | (#43367299)

I have a license for SAS through my university. I gave up trying to convince the stupid thing to install. If the installer wasn't crashing, the license manager was.

MatLab has similar, though less severe problems.

R had a nice double click installer that worked the first time. Later I compiled it, which worked without any headaches. There's a nice bridge from R to Python and you can extend either one, or embed either or both in other applications.

You meant that R has better accessibility options for the disabled, but it's also just plain more accessible.

Re:Congratulations R Team (1)

garcia (6573) | 1 year,10 days | (#43368871)

I am a SAS developer and have never run into any such problems but I won't say I don't believe you. However, the benefit of that large licensing fee is the easy access to SAS help resources (real live people living over there in Cary, NC) who get back to you VERY QUICKLY for ANY level of technical question you have.

Their employees, at least the hundred or so I've met over the years when presenting at SAS Global Forum, have been INCREDIBLY friendly and helpful.

Re:Congratulations R Team (1)

ciurana (2603) | 1 year,10 days | (#43374823)

I am a SAS developer and have never run into any such problems but I won't say I don't believe you. However, the benefit of that large licensing fee is the easy access to SAS help resources (real live people living over there in Cary, NC) who get back to you VERY QUICKLY for ANY level of technical question you have.

Their employees, at least the hundred or so I've met over the years when presenting at SAS Global Forum, have been INCREDIBLY friendly and helpful.

If commercial software is your thing, and you can afford it, and the vendor offers good support, 100% agreed.

If you're looking for R help the best two places to start are:

* Get a copy of The R Book, by Crawley -- it'll save you days of pointless/incomplete search for web resources
* Swing by the R IRC channel on Freenode (irc://irc.freenode.net/#R) -- we welcome n00bz

Cheers!

pr3d

Re:Congratulations R Team (0)

Anonymous Coward | 1 year,10 days | (#43369171)

Holy shit, I've met him.

Re:Congratulations R Team (1)

Dystopian Rebel (714995) | 1 year,10 days | (#43371827)

I once had a job in the EduBubble where I had to learn SAS. It's a language that could only survive in the EduBubble, which is at least 15 years behind in technology and 25 years behind in thinking.

If R isn't a well designed language, at least it is free, open source, and capable.

SAS (1)

billstewart (78916) | 1 year,10 days | (#43372923)

A couple of years ago I ran into SAS at a trade show. It really surprised me that they were still around; I'd previously seen their products on mainframes back in the late 70s, with punch cards. (I forget by now whether I'd used SAS or SPSS, which were the two competing commercial stats packages in that environment.)

yay! Let's celebrate with some pie (1)

Anonymous Coward | 1 year,11 days | (#43366739)

pie(c(85,15),init.angle=25,col=c("yellow",1),labels=c("pacman","not pacman"))

Re:yay! Let's celebrate with some pie (0)

Anonymous Coward | 1 year,10 days | (#43368593)

Cute. Brings back memories of playing Pacman on my Commodore VIC-20 so many years ago during The Greatest Decade - the 1980s. ;) I ran your R command and celebrated with a slice of Pacman pie.

Rrrrrrrrr....... (-1)

Anonymous Coward | 1 year,11 days | (#43366743)

Interesting to try out? Never heard of it before. Wow! Thanks.

A niche perhaps, but great OSS! (0)

Anonymous Coward | 1 year,11 days | (#43366793)

Used R in my thesis research a few years ago ... it was such a blessing to have a statistical package I could run on *my* computer! Thanks, fellas!

Packages (0)

Anonymous Coward | 1 year,11 days | (#43366831)

Yes, you need to rebuild all of them: update.packages(checkBuilt=TRUE, ask=FALSE)

How modern! (0, Troll)

loufoque (1400831) | 1 year,11 days | (#43366857)

64-bit support! Trivial inefficient parallelization! Matrices!
Welcome to 2001.

Re:How modern! (2)

scatteredthoughts (2880633) | 1 year,11 days | (#43366913)

Are you aware of better alternatives?

Re:How modern! (1)

Anonymous Coward | 1 year,11 days | (#43366975)

Julia: http://julialang.org/

Re:How modern! (1)

djmurdoch (306849) | 1 year,10 days | (#43367099)

From Julia's web page:

"Currently, Julia is available only for 32-bit Windows."

Julia is available in 64-bit builds on other platforms, but posting it as a reply in a thread complaining about how late R is to the 64-bit game is a bit rich. R has had 64-bit releases for all platforms for 3 years now. What's new in 3.0.0 is the removal of the remaining 32-bit limit on individual objects.

Re:How modern! (4, Insightful)

LourensV (856614) | 1 year,10 days | (#43367053)

I recently switched my scientific programming from R to Python with NumPy and Matplotlib, as I couldn't bear programming in such a misdesigned and underdocumented language any more. R is fine as a statistical analysis system, i.e. as a command line interface to the many ready-made packages available in CRAN, but for programming it's a perfect example of how not to design and implement a programming language. It's also unusably slow unless you vectorise your code or have a tiny amount of data. Unfortunately, vectorisation is not always possible (i.e. the algorithm may be inherently serial), and even when it is, it tends to yield utterly unreadable code. Then there is the dysfunctional memory management system, which leads you to run out of memory long before you should, and documentation even of the core library that leaves you no choice but to program by coincidence [pragprog.com].
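
The speed gap being described looks roughly like this (a toy sketch; exact timings will vary by machine):

x <- runif(1e6)
system.time({ s <- 0; for (v in x) s <- s + v })   # interpreted R loop
system.time(sum(x))                                # vectorised: the loop runs in C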

As an example of a fundamental problem, here's an R add-on package [r-project.org] that has as its goal to be "[..] a set of simple wrappers that make R's string functions more consistent, simpler and easier to use. It does this by ensuring that: function and argument names (and positions) are consistent, all functions deal with NA's and zero length character appropriately, and the output data structures from each function matches the input data structures of other functions.". Needless to say, there is absolutely no excuse for having such problems in the first place; if you can't write consistent interfaces, you have no business designing the core API of any programming language, period.

Python has its issues as well, but it's overall much nicer to work with. It has sane containers including dictionaries (R's lists are interface-wise equivalent to Python's dictionaries, but the complexity of the various operations is...mysterious.) and with NumPy all the array computation features I need. Furthermore it has at least a rudimentary OOP system (speaking of Python 2 here, I understand they've overhauled it in 3, but I haven't looked into that) and much better performance than R. On the other hand, for statistics you'd probably be much better off with R than with Python. I haven't looked at available libraries much, but I don't think the Python world is anywhere near R in that respect.
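
For readers who haven't seen the comparison, an R list used dictionary-style looks like this (a minimal sketch):

d <- list(alpha = 1, beta = "two")
d[["alpha"]]      # look up by name, like a dict key
d$gamma <- 3:5    # add a new "key" on the fly
names(d)          # "alpha" "beta" "gamma"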

Anyway, for doing statistics I don't really think there's anything more extensive out there than R, proprietary or not, although some proprietary packages have easier-to-learn GUIs. In that field, R is not going anywhere in the foreseeable future. For programming, almost anything is better than R, and I agree that the improvements you mention are not doing much to improve R's competitiveness in that area.

Re:How modern! (1)

ceoyoyo (59147) | 1 year,10 days | (#43367311)

RPy2. I never touch actual R code because I agree with you - the language itself isn't as bad as some, but it's not good either. RPy2 lets you have access to R without having to actually code in it.

Re:How modern! (3, Interesting)

Anonymous Coward | 1 year,10 days | (#43367397)

I can somewhat relate to the documentation issue although I believe that it is more a question of organizing the documentation.

When you mention "a fundamental problem" you mention function implementations, thus library rather than language issues. R itself is an extremely expressive, functional (or rather multi-paradigm) language that can be programmed to run efficient code. Yet it is syntactically minimalistic without unneeded syntax (as opposed to all of the scripting languages perl/python/ruby). This makes it a truly postmodern language IMO. Efficiency can sometimes be a problem but the break-even point for implementing parts in say C/C++ is only slightly different than for other languages (say perl/python) and is enabled by an excellent interface (Rcpp package).

For myself the biggest change was to start thinking in functional concepts, coming from a procedural background. Much of the criticism of R IMO stems from a failure to appreciate the conceptual differences between functional and procedural programming. Another problem that can spoil the impression of R is the plethora of packages of highly varying quality.

Re:How modern! (1)

LourensV (856614) | 1 year,10 days | (#43369061)

I can somewhat relate to the documentation issue although I believe that it is more a question of organizing the documentation.

One of the things that bothers me about the documentation is that there's often no distinction between interface and implementation. Instead of a description of what a function does, you get implementation details mixed up with what it approximately hopes to achieve, leaving you unable to see the forest for the trees.

When you mention "a fundamental problem" you mention function implementations, thus library rather than language issues. R itself is an extremely expressive, functional (or rather multi-paradigm) language that can be programmed to run efficient code. Yet it is syntactically minimalistic without unneeded syntax (as opposed to all of the scripting languages perl/python/ruby). This makes it a truly postmodern language IMO.

Well, there's only one implementation, so it's rather pointless that it could be implemented efficiently. The language specification isn't exactly good enough to create a competing, compatible implementation either. I agree that the syntax is minimalistic and that there's extremely little boilerplate, but I could really do with some way of defining data types (Python 2 is lacking there as well IMO), and namespaces...

Efficiency can sometimes be a problem but the break-even point for implementing parts in say C/C++ is only slightly different than for other languages (say perl/python) and is enabled by an excellent interface (Rcpp package).

Ah, the universal solution to problems with R: here's how to do it in some other language or software instead. Sorry for being sarcastic, but it's amazing how often effectively that advice showed up whenever I searched the web for a solution to some problem I encountered with R.

As an example of my experience, I use JAGS to fit models to data, and JAGS wants to have the model as a text file description. My model has a node for every combination of some 13000 sites and 11 years, and the text file gets to several tens of megabytes depending on model options. Creating it is basically a matter of running through all the combinations of sites and years, looking up some additional data, and spitting out a line of text describing them. My first implementation was very naive, nested for loops that essentially did a nested loop on the data. It generated output at several tens of kilobytes per second, getting slower and slower as it went on. I managed to speed it up by preallocating memory (R seems to not double the capacity of a vector when it runs out, as the C++ STL does, but add a constant extra amount, so that growing a vector made the loop run in quadratic time, except that when measured it actually seemed to be exponential, for who knows what reason.), pre-sorting data and changing to a merge join, and vectorising as much as possible. It now does about a megabyte per second, which is fast enough for my purposes. However, the code is now completely unreadable, and it's still not anywhere near what the hardware can do (PostgreSQL does the equivalent nested loop in less than a second). R turned what should have been a trivial programming task into a frustrating adventure, and the result is still not very good.
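
The preallocation fix described above looks roughly like this (a minimal sketch; the model line is invented, not the poster's actual JAGS code):

n <- 1e5
out <- character(0)   # naive version: the vector grows on every iteration
for (i in 1:n) out[i] <- paste0("y[", i, "] ~ dnorm(mu[", i, "], tau)")
out <- character(n)   # preallocated version: same loop, much faster on large n
for (i in 1:n) out[i] <- paste0("y[", i, "] ~ dnorm(mu[", i, "], tau)")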

For myself the biggest change to make was to start thinking in functional concepts coming from a procedural background. Much of R criticism IMO stems on a failure to realize conceptual differences between functional and procedural programming. Another problem that might spoil the impression of R sometimes is the plethora of packages of highly varying quality.

True, but this is really another instance of the don't-do-it-in-R solution, because those functional programming functions effectively just run your loop in C, rather than in R (if they don't forward the whole operation to a C scientific maths library), which makes the performance bearable. If R were really a multi-paradigm language, then you would be able to solve a problem procedurally as well if it happened to be the best way to do it.

Re:How modern! (4, Interesting)

njnnja (2833511) | 1 year,10 days | (#43367469)

Despite R's weaknesses as a programming language, R has such a large number of well-documented, well-tested, statistical functions with a wide array of arguments to vary that it is very difficult for another language to match. For example, maybe you want to build an arima time series model. OK, not too tough to find a library in Python or C++ that does that. Now what if you want to add an exogenous variable to the arima model? Maybe a seasonal component? Next maybe you want to automatically pick the best model according to AIC? Oops, make that BIC. Looking at it again maybe a Vector Autoregressive model is best. Or a VECM?
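
A hedged sketch of that kind of flexibility, using base R's arima() on invented data (the series and regressor below are toys):

set.seed(1)
y    <- ts(cumsum(rnorm(120)) + rep(sin(1:12), 10), frequency = 12)   # toy monthly series
exog <- rnorm(120)                                                    # toy exogenous regressor
fit1 <- arima(y, order = c(1, 0, 1), xreg = exog)
fit2 <- arima(y, order = c(1, 0, 1),
              seasonal = list(order = c(0, 1, 1), period = 12), xreg = exog)
c(AIC(fit1), AIC(fit2))   # compare by AIC; swap in BIC() if that's your criterion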

While I'm sure there are excellent implementations of all of these wrinkles in other languages, with R, I have great confidence that the functions that I want and need now and in the future are going to be there and are going to be implemented correctly, and kudos to the R team for giving us that kind of confidence.

R does have a lot of problems, among the worst of which is loop performance. It really forces you to vectorize everything, which leads to less maintainable code, and vectorization is a coding technique with which new hires coming from other languages face a steep learning curve. What I have found useful is to use R as a data exploration and model parameterization tool; once the model is ready to be put into production, you can use the parameters calculated by R in an implementation in the language of your choice, e.g., C++.

I guess this is a long-winded way of saying that, as with so many questions of "which language is best," the real question is "which language is best for you and your application?" R is usually the best language only for people who regularly use such a wide variety of statistical analyses that they won't find a large part of what they need in the libraries of other languages. For me, I couldn't imagine working without it.

Re:How modern! (0)

Anonymous Coward | 1 year,10 days | (#43372837)

Since you don't like R, you should check out Julia, a new, high-performance language for math and statistics from MIT. It's open source (MIT license) and has an ultra-smart compiler which doesn't require you to vectorize your code, so you can get good performance for non-vectorizable algorithms. No more melting your mind to twist your "for" loops into vectorized constructs!

http://julialang.org/

Julia wants to be all things to all men -- it aims to replace both Matlab and R (or do what those two languages do). It also contains parallelization constructs directly in the language, so in theory you can write code which scales to huge datasets. Easy parallelization still needs work, IMO, but the folks working on the language have done parallel programming in the past, and are aware of the issues.

Re:How modern! (1)

epine (68316) | 1 year,9 days | (#43379079)

Needless to say that there is absolutely no excuse for having such problems in the first place; if you can't write consistent interfaces, you have no business designing the core API of any programming language, period.

I guess you missed the memo that the K&R string functions are deprecated in many projects such as OpenBSD which has their own recommended set of string functions.

Way back when, Iverson and his APL cronies put a great deal of effort into defining the APL arithmetic operator set to conform to the largest possible set of simple arithmetic identities. Has the definition of the modulus operator concerning negative arguments been consistent in all languages since? That they shrouded this deep elegance with inscrutable Greek letters matters exactly how? They wrote a paper detailing all the identities they had discovered concerning the APL operator set. I've never seen a single other language bother to do this. Perhaps because identities written out with floor() and modulus() and spzkrm() lose a lot in translation.

Language designers preoccupied with consistency are known as dreamers (or Hurd developers). The formula that seems to grow up to become a language people actually use is 75% utility and 25% elegance.

I guess you missed the memo that when elegance dies on the vine in infancy, it does no one any damn good.

Why are programmers reluctant to refactor code when an elegant API becomes available to replace a hastily conceived core API? Because you can rarely trust the equivalence all the way down to the last edge case, because few APIs document their identity sets, listing all the cases you'd like to be true (and a few you hadn't even considered yourself) as well as the cases you presumed should be true, without realizing that these cases fundamentally conflict with other identities that made the cut.

Programmers who don't declare their identity sets shouldn't be allowed to write APIs, because a reliable identity set is the only way the downstream programmer will dare to refactor your API out of his applications if it turns out your API sucks--as part of a mass exodus from superior documentation.

Are you beginning to see the problem here?

Re:How modern! (1)

loufoque (1400831) | 1 year,10 days | (#43367629)

In the same category, MATLAB.
Otherwise you also have real programming languages, ranging from C to Python.

tutorials.. (0)

Anonymous Coward | 1 year,11 days | (#43366921)

I was recently asked to make an interactive tutorial in R, which sounded like a fun project, except I have no clue about R. Are there any good starting points for learning R?

Re:tutorials.. (3, Informative)

Bearhouse (1034238) | 1 year,10 days | (#43367113)

Hard to know where to start, especially as you give no information on your target audience... Do they know stats already?
Also, if your target audience is used to GUIs rather than the command line, then...
http://answers.oreilly.com/topic/954-introducing-the-r-graphical-user-interface/ [oreilly.com]

Alternatively, you could use a Web front-end (disclaimer: I've not tried it):

http://www.squirelove.net/r-node/doku.php [squirelove.net]

Writing a tutorial from nothing is hard. You can do this to get some good ideas:

1. Download a free evaluation copy of 'Minitab'.
(I'm not connected with Minitab, but I've used it a lot, and it's great 'basic' stats analysis software)
2. Install, and then open help
3. Consult 'tutorials' section :) Obviously, don't just rip off their stuff; not cool

As a suggested flow, I've found that, as a start, you can introduce basic stats, then demonstrate how the software works.
Using the same data set for the first few (say ten) lessons is better. The Minitab tutorials keep changing the data, which confuses students.
You'll only need 5 columns or so, and remember to include some discrete variables to enable stratification of your continuous variables.
Use a real-world example, such as household expenses for different families, whatever.

For tutorial flow, what works for me as a 'basic' intro to a stats package:

1. What is data? What are statistics?
2. Types of data, how they look as raw data (in the database), and then once we start to analyse them with stats and graphs (to start, just 'common' stuff like continuous variables, normal & lognormal, and discrete, binomial & Poisson).
3. Basic stats & graphical analysis for single variables. Normality tests. Include time series plots as well as histograms / dotplots / boxplots.
4. Multivariate analysis: x/y charts, matrix plots, interaction plots.
5. Hypothesis tests (for both continuous & discrete variables).
6. Regression (simple, then multiple if you're feeling brave).
7. Control charts (for both continuous & discrete variables).

If you work out how to do this in R, by actually using it, your tutorial will pretty much write itself. (Keep saving your screens - IrfanView is a great, free tool I use for this: install, open, hit 'C' for manual or automatic screen save options.)
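
As an illustration, step 3 of the flow above might look something like this in R, using one of its built-in data sets:

x <- faithful$waiting      # waiting times between Old Faithful eruptions
summary(x)                 # basic descriptive statistics
shapiro.test(x)            # a quick normality test
hist(x)                    # histogram
boxplot(x)                 # boxplot
plot(x, type = "l")        # crude time-series-style plot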

Re:tutorials.. (1)

zhvihti (864974) | 1 year,10 days | (#43367533)

A new, easy-to-use, free, online R system is StatAce (www.statace.com). The GUI analysis is still in its infancy (only descriptives, correlation and OLS at this stage), but it supports any and all R code and many libraries, and it has good data management (e.g. it allows you to save results).

Re:tutorials.. (1)

fsterman (519061) | 1 year,10 days | (#43370797)

Mod parent up, this is *by far* the best GUI for R I have seen. Is it open source? I would think statistical analysis would be an especially good target for paid open-source SAAS.

Re:tutorials.. (1)

ciurana (2603) | 1 year,10 days | (#43374889)

The single best R resource I've ever used is The R Book, by Crawley. Before buying it I invested way too much time searching all over the web for solutions to simple and complicated things alike, almost always with poor or incomplete results. The O'Reilly R books are barely OK. Short circuit the BS and go straight to The R Book. It paid for itself in about 2 hours of coding (it's expensive and runs between $80 and $150 when it's available -- my time is way more valuable, though).

For applied R to problem solving, my suggestion would be to go with Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman and Hill. It requires you to have gone through college level stats -- fantastic book.

Another good book is Data Mining with R, Learning with Case Studies -- especially the Introduction, which offers one of the best R programming overviews out there. It's about the only reference that explains the R object-oriented and functional features well.

Web resources are vital AFTER you've sunk 200 to 500 hours into R work. By then you'll have grasped the language and many of its quirks (it's a language by and for scientists, not programming professionals -- so saying it's "quirky" is a hell of an understatement), and the web resources will be more helpful; most are incomplete, but by then you'll have enough experience to "fill in the blanks".

You're welcome to swing by irc://irc.freenode.net/#R -- we welcome n00bz!

Cheers!

pr3d

This is how it should be done ... (4, Insightful)

golodh (893453) | 1 year,11 days | (#43367045)

R's developers are, unlike many other Open Source developers, very careful about releasing production-quality software.

As in: when they release it, you can trust it to work.

Hence they didn't mess around with major reconstruction of R's guts until they could release something that's finished (and well-tested !) and bumped the version number to 3.0.0 when they did in order to properly differentiate it from previous versions.

This is one of the differences between amateur OSS offerings (like, for example, KDE with its myriad half-baked Kxxx packages, sundry horrible OSS games, etc.) and genuine production-quality OSS (like R, LAPACK, Octave, LibreOffice, PostgreSQL, MySQL, GRASS GIS, QGIS, MariaDB, GCC, the Linux kernel, etc.)

This is very gratifying as R happens to see widespread use in academia, government and business when it comes to data analysis and statistics.

If R has a weakness, it is that it uses an in-memory approach to data processing, unlike e.g. SPSS, which keeps almost nothing in memory and simply makes passes through data files whenever it needs something. R is also a bit memory-hungry, so the need for genuine 64-bit implementations should be clear.

Apart from sporting about 4000 useful and ready-to-run statistical application packages, R has convenient and efficient integration with C code, and it's probably a contender for the best support for data graphics anywhere.

For those who didn't know, even packages like SPSS and SAS have incorporated R interfaces to tap into the wealth of application packages that R offers. Can't think of a more significant compliment right now.

Re:This is how it should be done ... (0)

Anonymous Coward | 1 year,10 days | (#43367339)

LOL. GRASS is a finished product? I would really like that. Maybe I'll download it and check it out again. A year ago I was not impressed. When it couldn't open a simple shapefile... well that was that.

Re:This is how it should be done ... (0)

Anonymous Coward | 1 year,10 days | (#43367667)

I think you did something wrong; it loads shapefiles just fine and has done so for at least a decade. It uses the OGR library for that, which is one of the most widely used and solid libraries in the industry.

Re:This is how it should be done ... (1)

leromarinvit (1462031) | 1 year,10 days | (#43373017)

LOL. GRASS is a finished product? I would really like that. Maybe I'll download it and check it out again. A year ago I was not impressed.

You're smoking it wrong.

Re:This is how it should be done ... (1)

ciurana (2603) | 1 year,10 days | (#43374923)

LOL! Octave is a finished product? That's news to me. Horrible package compared with the R Project and its satellite projects (e.g. RStudio).

Not trolling, I just can't say that Octave is usable with a straight face. Poor UI, a bad copy of Matlab, and horrible performance. Friends don't let friends use Octave. They show them the path to R.

Cheers!

Re:This is how it should be done ... (0)

Anonymous Coward | 1 year,10 days | (#43375855)

Octave is a fine product which works well. I've got both it and Matlab on my workstation and use them every day. For a quick little calculation I'll often start up Octave simply because it loads faster.

Use whatever works for you, but that's no reason to trash something which isn't a perfect fit for you, since the whole world is not like you.

GUI (1)

dr.Flake (601029) | 1 year,10 days | (#43367111)

Even I have used R in the past, for my thesis. My statistician was using S-Plus to do magical things that the hospital's SPSS definitely could not do.
However, S-Plus was not available to us non-statisticians.
As a complete non-programmer and mediocre statistician, I was able to reproduce and build upon his examples in R.

But what I truly missed was a usable GUI. There were some, and I tried them all at the time, but none could do more than the basics. For someone using R daily, a GUI will be more trouble than it's worth and limiting. But for someone like me, a well-developed GUI like S-Plus had at the time would have been more than welcome.

Seeing the headline R 3.0.0, the first thing I looked for was: did they include a GUI by default?

Re:GUI (1)

djmurdoch (306849) | 1 year,10 days | (#43367161)

There are no new GUIs in the R distribution, but there are several GUIs produced by third parties that probably weren't available when you were doing your thesis. I like RStudio and recommend it to my students, but there are others too.

Re:GUI (1)

fsterman (519061) | 1 year,10 days | (#43370507)

I think you are confusing GUI with IDE; RStudio and most of the other R "guis" don't make R more discoverable. SPSS and the like are used because they offer guidance on what one should try given what they already know. With an IDE, you still have to know how to program. Throwing together a text editor, an output window, and an execution button doesn't do much.

It's really disheartening that a professor thinks this solves any of the major pedagogical problems that R forces. I really wish you would STOP recommending RStudio and start recommending tutors.

Re:GUI (1)

djmurdoch (306849) | 1 year,9 days | (#43381835)

Yes, RStudio is an IDE. An IDE is a GUI for development. If you want a GUI to do statistics without programming, then RStudio is not what you need.

I really don't know what you're talking about in your second paragraph. R doesn't force any pedagogical problems. It's a tool. It doesn't force anything.

Re:GUI (2)

clark0r (925569) | 1 year,10 days | (#43367201)

I have recently implemented RStudio for a customer. http://www.rstudio.com/ [rstudio.com] It's a web interface for R which appears to be clean and easy to use. Installation was straightforward from RPM; you only need R-core, R-devel, xdg-utils and the rstudio RPM itself.

Re:GUI (4, Informative)

golodh (893453) | 1 year,10 days | (#43367265)

There are usable GUIs for R, and best of all, they can be installed as packages from within R.

The best-known one is called 'R Commander' (package name: Rcmdr). It gives you a point-and-click interface and (like SPSS) drops the R code needed to repeat what you did using the menus (so that your work is reproducible).

Functionality includes: data summaries, contingency tables, means tests, proportions tests, variance tests, ANOVA, cluster analysis, model fitting (linear, generalised linear, logit), various graphs, tests for comparison between fitted models, plus draws and lookup tables for lots of continuous and discrete distributions. Rcmdr allows for plugins, and a number of them are also downloadable as R packages (e.g. experimental design).

The second one I know about is called 'Deducer' (package name Deducer), which provides a GUI loosely resembling that of SPSS.

Both GUIs are workable and allow you to do simple things simply.

There's also a rather nice IDE, called RStudio (which is a separate download).
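
Both GUIs install from CRAN like any other package; for example, from within R:

install.packages("Rcmdr")   # one-time install from CRAN
library(Rcmdr)              # loading the package opens the R Commander window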

Re:GUI (1)

logistic (717955) | 1 year,10 days | (#43368323)

R is an example of the best and worst of FOSS.

Every time I switch institutions I can use it. No problems with a lack of a site license, no grant money for a license, or activation problems on a new machine. I can use it on whatever OS the organization owns. I can get it up and running in about 5 minutes and it will work.

Awesome community. If you have a problem, there's a good chance there's something on CRAN that solves it.

But it has a super steep learning curve. Beginner documentation is at best suboptimal ("go buy a book on S" is not the most helpful advice), so the cost of entry is steep. It will take you a while to figure out how to import your flat text file for analysis.
The GUIs are hard to find and are limited in capability compared to something like SPSS. R Commander is great but pretty limited and counterintuitive.
It's hard for me to recommend it to people for intro stats.
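
For what it's worth, the flat-file hurdle usually comes down to one call; a minimal sketch (the file name is made up, point it at your own data):

d <- read.csv("mydata.csv", header = TRUE, stringsAsFactors = FALSE)
str(d)    # check that the columns came in with the types you expect
head(d)   # eyeball the first few rows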

That said, I love R and it's great to see it continue to thrive with a new milestone release.

Steep learning curve? (1)

golodh (893453) | 1 year,10 days | (#43369291)

I tend to agree about the learning curve.

If you just use R to run data through a package (which in my opinion is the quickest way to get a lot of value out of R) then the learning curve is tolerable. Less steep than for SAS (I think), but steeper than for SPSS.

On the other hand: R in and by itself is mostly a tool for statisticians and data analysts (or anyone else who doesn't flinch at having to write scripts, who's acquainted with the phenomenon of 'manual', and who's used to spending a few hours or so reading before they try to do anything). That in itself represents a barrier.

I've found the online R documentation mostly unhelpful for beginners (thorough but pedantic, often implicit, and tending to use jargon). The offline 'Introduction to R' is a lot better though, and there are some good user-contributed texts that can be freely downloaded. I agree that it's useless to buy a book on the actual language (be that S or R) because as a beginner you will only use R's ready-made functionality and script around it. If you find yourself delving into the language you're probably doing something wrong (for a beginner). Your best bet is to buy one of the 'cookbooks' for R.

I tried to use it for an undergraduate statistics course in conjunction with Excel, using the RExcel package and Rcmdr.

The RExcel package establishes a com link between MS Excel and R and comes with an Excel plugin that creates Rcmdr menus in Excel. The net result is that people can load, view, and edit their data in MS Excel, open the menu, send the data to R, do menu-driven analyses in Rcmdr, and bring results back in Excel if required.

It was less than a success. Students stumbled over having to realise that you have to send the data to R before the menu options take effect, had difficulty keeping track of where their 'live' data actually was (Excel or R), and on top of that had difficulty remembering where to look for the menu options.

Yes, I know. Well ... they were business school students but still, eh?

I believe that R Commander can work for an introductory course, provided you match the content of the course exactly to the R Commander menus, or vice versa. Your students will be a bit hemmed in afterwards: they'll be able to replicate the stuff you prepared for them, but as soon as they try anything else they will have to sit down, think, and spend time figuring out how to use the software.

Re:GUI (0)

Anonymous Coward | 1 year,10 days | (#43369721)

My own plug for StatET [walware.de] with Eclipse.

I found that the StatET package with Eclipse is brilliant. It gives a Matlab IDE experience to R. Install the required packages in R, install the software in Eclipse and follow the help tutorial to set up the console environment.

I have used on Ubuntu, Mac and Windows. The only config problems I've found is setting up StatET with the cygwin version of R.

http://www.walware.de/goto/statet

The magic words: this new book (0)

Anonymous Coward | 1 year,10 days | (#43367209)

"Covers R version 3.0"

God's just making stuff up ... (-1, Offtopic)

dltaylor (7510) | 1 year,10 days | (#43367249)

One of my favorite Sci-Fi theories (along with "krakens went extinct because we overfished their prey during the whaling days") was in a short story wherein God was finding it difficult to continue to create laws of the universe that rationally explained things that were either "left over" during Creation or "just pretty". We are NOT supposed to be able to prove God exists because of some obvious miracle (cue both the Christians that don't understand anything about their own religion and the fans of HHG), so there must be mathematically sound explanations for cosmic phenomena. Quasars were apparently a big mistake, and even God's having trouble with the whole dark matter/dark energy thing, the acceleration of cosmic expansion, the matter/antimatter imbalance, and now we have to have microlensing to explain a twinkling star.

Anyone out there remember the title/author of the story?

Please stop calling R a bad language (0)

Anonymous Coward | 1 year,6 days | (#43400641)

Once in a while I read comments like

"I recently switched my scientific programming from R to Python with NumPy and Matplotlib, as I couldn't bear programming in such a misdesigned and underdocumented language any more. R is fine as a statistical analysis system"

"Python + Numpy/Scipy is such a better alternative now it's not even funny. It's actually a real language, and has loads of packages."

Interestingly, these comments come from practitioners, not language theorists, who tend to have far more appreciation for language trade-offs (see, e.g., http://r.cs.purdue.edu/pub/ecoop12.pdf). I believe this is usually due to poor knowledge of R, and sometimes to a poor understanding of language design in general. I would like to clarify a few points here:

1. R is not a DSL or a "statistical language". It's a general-purpose, Turing-complete language, with which you can write pretty much anything. R's byte-compiler is written in R. The ability of a language to be written to a large degree in itself is usually a sign that the language is not so flawed.

2. R has deep roots in functional programming and in Scheme. It was not written by statisticians who did not know language design (another widely held misconception). Luke Tierney wrote an entire Lisp for statistics before becoming a core contributor to R. That was 25 years ago.

3. R is not a *perfect* language, but neither are languages like Python. Python, for example, has a ton of syntactic sugar (good for me, but bad for Alan Perlis), a bolted-on object model (OOP purists like to diss it), reluctant functional programming (which is really neither meat nor fish), and concurrency is as wonky in it as it is in R. But that is really beside the point. Perfect languages are extremely useful to drive the design of future imperfect ones. To a large degree, the language doesn't matter. It's what you can do with it that matters.

4. R is an *amazing* language for data analysis. I have a fairly good knowledge of other substitute languages (including Matlab, Python and Fortran), and nothing comes close to it. If you don't think so, it's fine. Just don't denigrate something you probably don't know enough to appreciate.

5. The statement, popularized by John Cook I believe, that "I'd rather do math in a general language than do general programming in a mathematical language" underlies much of the objection to R usage. Let's unpack this statement. It really says that the cost of doing general programming in a technically oriented language like R is higher than the cost of doing mathematical programming in a language like Python. This is of course an empirical statement. It doesn't make sense to write a web server in native R, but neither does it make any sense to write an operating system or a DBMS in Python. But a lot of the general programming tasks faced by data analysts are relatively simple: read/write files or talk to a DB; serve web pages on a low-traffic server; be called by a larger program. For all these things R just works (TM). Conversely, there are hundreds of very specialized, very well-thought-out packages in R, written by really, really good people, that simply have no counterpart anywhere. The cost of replacing them is very steep: either do some dumbed-down analysis or roll your own. On the other hand, Python or Clojure or Ruby, even Java! have sufficient capabilities to run a linear regression and produce a plot, so no need to complicate things.

My larger point is, again, that languages don't matter as much as people like to believe (the same applies to editors, btw). Principles matter. Availability of what you need matters. *Choose the language of least resistance*: for data analysis of your bank account on a weekend, you can use Excel. To run a highly customized simulation on Blue Gene/L, you *must* use Fortran. In this light, be happy for what R can do for you, and rejoice at this major new release. If you are using moderately sized data sets (2^31 elements is actually a good reference point) and need quick access to state-of-the-art statistical routines and visualization, maybe R is just all that you need.

-gappy
