# Statistical Programming With R

#### timothy posted more than 9 years ago | from the no-r-in-statistics dept.

An anonymous reader writes *"This series introduces you to R, a rich statistical environment, released as free software. It includes a programming language, an interactive shell, and extensive graphing capability. What's more, R comes with a spectacular collection of functions for mathematical and statistical manipulations -- with still more capabilities available in optional packages."*

## PHP (-1, Offtopic)

## alatesystems (51331) | more than 9 years ago | (#10319588)

everything in the whole world you could want to do -- with still more capabilities available in optional packages [php.net]. PHP can do anything you want to do, and for free, and it's extremely fast with Zend Optimizer. I use it for all my shell scripts as well.

`#!/usr/bin/php -q`

oh yeah

Chris

## Why do we need this? (0, Troll)

## Muda69 (718162) | more than 9 years ago | (#10319591)

## Good-oh... (3, Interesting)

## BrokenHalo (565198) | more than 9 years ago | (#10319656)

I've heard good things about R, but have never really got to grips with it (although I know it has been around for a while), so any kind of primer is more than welcome as far as I'm concerned.

## SPSS is garbage (2, Informative)

## BoomerSooner (308737) | more than 9 years ago | (#10320350)

## Re:Good-oh... (2, Informative)

## RealAlaskan (576404) | more than 9 years ago | (#10321088)

## Linux Stats package (1)

## spineboy (22918) | more than 9 years ago | (#10325278)

## B ... C ... C++ (3, Funny)

## YetAnotherName (168064) | more than 9 years ago | (#10320136)

"R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language"

So, R came from S; that must mean that R++ is coming up next!

## Re:B ... C ... C++ (1)

## adiposity (684943) | more than 9 years ago | (#10321035)

## Re:B ... C ... C++ (0)

## Anonymous Coward | more than 9 years ago | (#10322136)

## Re:B ... C ... C++ (2)

## MarkGriz (520778) | more than 9 years ago | (#10322315)

"R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language"

":-) So, R came from S; that must mean that R++ is coming up next!"

No, that's just a bullshit answer so they don't have to admit "R" really stands for 'rithmetic.

## Graphing, hah! (4, Interesting)

## Teancom (13486) | more than 9 years ago | (#10320333)

</cranky old man>

## Re:Graphing, hah! (0)

## Anonymous Coward | more than 9 years ago | (#10320433)

>window, and then take a snapshot of the window

You could make a PostScript plot and then convert that.

## Re:Graphing, hah! (1)

## StandardDeviant (122674) | more than 9 years ago | (#10321150)

## Re:Graphing, hah! (2, Informative)

## StarWynd (751816) | more than 9 years ago | (#10321771)

I don't know how R came into being, but IDL was originally designed as an ad hoc statistics and plotting tool. Because everyone was using it as an ad hoc tool, there was an assumption that everyone always had a display available. Unfortunately, that design flaw still exists in the language today. The implementation wasn't stupid back then, but fortunately IDL has new ways to handle graphics so that the display isn't involved. Maybe R had a similar history? Maybe not?

## R supports graphics output in many formats (2, Informative)

## jeif1k (809151) | more than 9 years ago | (#10322065)

Output to different graphics devices has been in S, Splus, and R for as long as I can remember (and that's a long time). Maybe you should try having a look at the copious documentation for R; the documentation, like the system itself, is free.

## Re:R supports graphics output in many formats (1)

## KjetilK (186133) | more than 9 years ago | (#10324973)

## Re:R supports graphics output in many formats (1)

## jeif1k (809151) | more than 9 years ago | (#10326346)

## RTFM! (4, Informative)

## KjetilK (186133) | more than 9 years ago | (#10324815)

R is really a beautiful language, for its purpose. It has a very nice correspondence between math and code, and for most parts of "hard" science, that's really important.

Compared to MATLAB, you can easily write R code five times as compact as the equivalent MATLAB code, and it is still more understandable.

## Re:RTFM! (1)

## justins (80659) | more than 9 years ago | (#10343695)

I'm sorry, you don't get to use witticisms like "RTFM" while defending something as fundamentally idiotic as:

The mind shudders at what the less "simple" ways of doing it must look like.

## Re:RTFM! (1)

## KjetilK (186133) | more than 9 years ago | (#10347750)

## Re:Graphing, hah! (1)

## HuguesT (84078) | more than 9 years ago | (#10326340)

Do `help("bitmap")` for all the details.

Do try to read the man page to the end next time, or ask the dev team a question. Both the jpeg and png man pages mention the function `bitmap` as a solution to the problem you are having.

All the best.

## Re:Graphing, hah! (1)

## Teancom (13486) | more than 9 years ago | (#10326484)

## Re:Graphing, hah! (0)

## Anonymous Coward | more than 9 years ago | (#10346829)

## What's a Robust Replacement for Excel??? . . . (2, Interesting)

## Mr. Pillows (745647) | more than 9 years ago | (#10320496)

## Re:What's a Robust Replacement for Excel??? . . . (2, Informative)

## Anonymous Coward | more than 9 years ago | (#10321098)

http://www.wolfram.com/news/statistics.html [wolfram.com]

## Re:What's a Robust Replacement for Excel??? . . . (2, Informative)

## Anonymous Coward | more than 9 years ago | (#10321899)

## Re:What's a Robust Replacement for Excel??? . . . (1)

## RWerp (798951) | more than 9 years ago | (#10346427)

## Re:What's a Robust Replacement for Excel??? . . . (1, Informative)

## Anonymous Coward | more than 9 years ago | (#10321839)

No data entry facilities, but it handles multidimensional visualisation very well and has a handful of convenient methods (correlation analysis, PCA, histogram) built in.

I've come to realise that Excel's data vis. is almost totally a joke, and that its value for data entry is almost as questionable. I'm a little bit irked by the Gnumeric team's decision to keep all of Excel's "features" (256 column max, &c.)...

## Re:What's a Robust Replacement for Excel??? . . . (3, Informative)

## tabdelgawad (590061) | more than 9 years ago | (#10324985)

1) Matrix languages (e.g. Matlab, Gauss): These have C-like syntax, with the basic data object being an n x m matrix (so, internally, a scalar is a 1x1 matrix). These languages are the way to go if you want to write your own statistical/simulation algorithms. They do have extensive pre-written routines for many statistical tasks, but they're mainly for people who know that a regression coefficient vector is given by inv(X'X)X'y and aren't afraid to code that. The nice thing is that this computation is a single line of code. I believe GNU Octave belongs to this category.
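To make the regression formula inv(X'X)X'y concrete, here is a minimal sketch in plain Python, used only because it needs no libraries. The small helper functions (`transpose`, `matmul`, `inverse`, `ols`) are hypothetical stand-ins for the built-ins a matrix language gives you in one line; real code would normally solve the linear system (e.g. via a QR decomposition) rather than form an explicit inverse.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (fine for tiny, well-conditioned matrices)."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now inv(m).
    return [row[n:] for row in aug]


def ols(X, y):
    """beta = inv(X'X) X'y, with y given as a column (list of one-element lists)."""
    Xt = transpose(X)
    return matmul(matmul(inverse(matmul(Xt, X)), Xt), y)


if __name__ == "__main__":
    X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # first column of ones = intercept
    y = [[3], [5], [7], [9]]              # exactly y = 1 + 2x
    beta = ols(X, y)
    print([round(b[0], 6) for b in beta])  # → [1.0, 2.0]
```

The toy data lie exactly on y = 1 + 2x, so the sketch recovers an intercept of 1 and a slope of 2 up to floating-point rounding.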

2) Data languages (SAS, SPSS): The basic object here is a dataset with variables. Inverting a data matrix is essentially a meaningless concept here, and would be extremely difficult to do, but creating a new variable that sums sales for different people by division for certain months is straightforward (note that this is very difficult in a matrix language). Beyond trivial manipulations, you'd store code in procedures, as in any programming language.
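The "sum sales by division for certain months" task is a grouped aggregation over a dataset of records. A minimal Python sketch of the idea, assuming hypothetical field names `division`, `month` and `sales` and made-up example data (this is not SAS or SPSS syntax, which express the same step with their own grouping constructs):

```python
from collections import defaultdict


def sales_by_division(records, months):
    """Total the 'sales' field per 'division', counting only records whose 'month' is in months."""
    totals = defaultdict(float)
    for rec in records:
        if rec["month"] in months:
            totals[rec["division"]] += rec["sales"]
    return dict(totals)


# Hypothetical example data: one record per person per month.
records = [
    {"person": "Ann", "division": "East", "month": "Jan", "sales": 100.0},
    {"person": "Bob", "division": "East", "month": "Feb", "sales": 200.0},
    {"person": "Cid", "division": "West", "month": "Jan", "sales": 50.0},
    {"person": "Dee", "division": "West", "month": "Mar", "sales": 75.0},
]

print(sales_by_division(records, {"Jan", "Feb"}))  # → {'East': 300.0, 'West': 50.0}
```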

3) Menu-driven languages (e.g. EViews): The basic object is still a dataset with variables, but your primary method of manipulation is menu-driven. Want to run a regression? Just select your dependent and independent variables from drop-down lists and click.

There's some overlap between 2 and 3. Type-2 programs provide a rudimentary menu-driven system for those who don't want to code everything, and type-3 languages will let you store some command-line instructions for future use.

In terms of learning curves, they get progressively flatter (easier) from 1 to 2 to 3.

Pick your poison!

## Re:"Inverting a data matrix" (1)

## nusratt (751548) | more than 9 years ago | (#10334182)

Not sure if this meets your definition, but I've been using SAS for boocoo years and can tell you that it has a "TRANSPOSE" facility explicitly for making columns into rows & vice versa.

## Re:What's a Robust Replacement for Excel??? . . . (1, Informative)

## Anonymous Coward | more than 9 years ago | (#10327275)

`v.Inverse[X].v`

It's mostly shell-based, but the shell includes pretty formulas and graphics, histograms are not too difficult to do, and EVERYTHING in Mathematica is a data structure which can be read and manipulated by the shell, including bitmaps, sound, and even a Mathematica notebook, of which the manual is one! You can also get a package to link it to Excel if you really want to.

Go to wolfram.com and order a trial version. That's what I did and I quickly came to the conclusion it was worth paying the full price.

Others around me (for the purposes of this discussion, we're quantitative analysts for a financial company) use S+. It is like R (like C), but it has an Excel-like front end which might suit you better. I'm told it's similarly expensive, though.

Another option I considered was Python, with the freeware statistical and graphics packages that are available for the language, but Mathematica beat the pants off this combination for usability.

## And Of Course... (3, Informative)

## pnatural (59329) | more than 9 years ago | (#10320856)

## R is cool... (1)

## dash2 (155223) | more than 9 years ago | (#10321322)

But... ANYTHING is better than SPSS.

dave

## Manipulations (4, Funny)

## Brandybuck (704397) | more than 9 years ago | (#10321697)

"What's more, R comes with a spectacular collection of functions for mathematical and statistical manipulations..."

I can see that this package will be quite popular with political campaign managers.

## Man, have I been out of the loop (0)

## Anonymous Coward | more than 9 years ago | (#10321906)

Better hit the O'Reilly books. I have a lot of catching up to do!

## So, what's the difference... (1)

## StarWynd (751816) | more than 9 years ago | (#10321912)

## Re:So, what's the difference... (4, Informative)

## jeif1k (809151) | more than 9 years ago | (#10322198)

In contrast, R is very close to Splus and comes with an extensive array of statistical toolboxes. Many professional users use, and even prefer, R for their day-to-day work.

If you are doing anything with statistics, graphs of real-world data, or bioinformatics, R is the package to use.

If you are doing other kinds of numerical work, things are less clear. Matlab is widely used, but it is hugely expensive and the language is pretty limited. Octave is the obvious open source choice, but there aren't many packages for it, and Matlab software requires some amount of porting if you want to use it with Octave. Numerical Python is technically far better than either Matlab or Octave, and it has a lot of packages and features that neither offers, but it (obviously) isn't Matlab compatible, so you can't just load existing Matlab packages into it.

## Re:So, what's the difference... (3, Informative)

## KjetilK (186133) | more than 9 years ago | (#10325623)

Anyway, what it doesn't do as well as IDL (*shrug*) is visualization. Its graphing is limited to, well, graphs. Interactive analysis with funny widgets and stuff isn't R's selling point. Nor is R very well developed for image analysis and stuff like that. I think they have multi-D Fourier transforms now, but they didn't two years ago.

IDL, OTOH, doesn't really do statistics at all. For example, it doesn't come with something as fundamental as QQ-plots. Believe it or not, every paper that assumes normality should come with a QQ-plot... or at least the authors should have made one.
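A normal QQ-plot pairs the sorted sample with the quantiles a standard normal distribution would predict; if the data are roughly normal, the points fall close to a straight line. A minimal Python sketch of the point computation only (no graphics), assuming the simple plotting positions (i - 0.5)/n; other conventions exist, and R's own `qqnorm()` may use a slightly different one for small samples. The function name `qq_points` is hypothetical.

```python
from statistics import NormalDist


def qq_points(sample):
    """Return (theoretical, observed) pairs for a normal QQ-plot.

    Theoretical values are standard-normal quantiles at plotting positions (i - 0.5) / n.
    """
    n = len(sample)
    if n == 0:
        return []
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))


# Print the pairs; plotted, they should lie near a straight line for normal-looking data.
for t, obs in qq_points([2.1, -0.3, 0.8, 1.4, -1.2]):
    print(f"{t:+.3f}  {obs:+.3f}")
```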

The syntax of IDL (*shrug*) is unbelievably nasty (*shrug*, aargh, sorry, couldn't resist). I heard they have done something about it now, but two years ago, IDL's concept of scoping was at best, uhm, well, unclear. You could easily modify variables in other people's badly designed code without being aware of it. Then there were the COMMON blocks you often needed to pass parameters...? I have a hard time understanding why people would actually use anything like IDL (*shrug*). R has very clear-cut lexical scoping of objects. You've got to design your code veeery badly to fall into the same traps IDL programmers fall into on a regular basis. I've seen IDL programmers who've been using it since the beginning go WTF over scoping... It was better being a lone R user than an IDL user with a lot of support...

Also, IDL attempted to get OO in in version 5 (IIRC), but it is a mess. OO designers would be rolling in their graves over this. R, OTOH, has decided not to incorporate all OO concepts, but the stuff they have done is very clean, very easy to understand, and perfectly sound.

But the real point of R is to have very clear mapping between code and mathematics. You code your math, it is so easy to see what happens. No iterating over array indices, it simply never happens. That's extremely appealing once you've got the hang of it.

I once translated 70 lines of MATLAB code, some interpolation stuff that didn't exist in R, into 7 lines of R code. I never finished it, because I found I didn't need it, but as a proof of concept it was great. And while the MATLAB code was pretty hard to grok, the R code was very straightforward: you could just show it to anyone with basic training in math, and they would immediately see what it did. Try that with code from any of the others!

I think that the basic thing is that most numerical math for physics and astronomy is right now more advanced in IDL or MATLAB. If you do any kind of statistics, you should be going over to R. If you are willing to code, I'd argue that R is a platform so much better than IDL and MATLAB, you should be migrating your code starting now. I know I'd be writing thousands of lines of R code rather than going back to IDL (*shrug*)... :-)

Then, you know, you can't inspect the code in the core of IDL or MATLAB. There are likely to be flaws in there, and they may not matter for any problem other than yours... I got hit by three bad bugs in R when I worked with it; I managed to narrow them down, and they were all corrected within hours. To me, this is extremely important. The implementation of math should be available for review just like a derivation of equations is.

## what R isn't (5, Informative)

## bahamutirc (648840) | more than 9 years ago | (#10321972)

For people who have never taken real stat classes in college (or never learned it on their own) R will seem like a useless language. Most other languages can handle basic statistics computations.

Statistics is a whole lot more than means and averages. When I took my first real stat class, everything I knew about statistics was *literally* covered on the first half of the first page. I was totally blown away by what you could do with statistics.

R is for hardcore stat folk who know a bit about programming, not programmers who need to do a little basic computation.

## Re:what R isn't (1)

## randall_burns (108052) | more than 9 years ago | (#10323830)

The other thing is that folks need a better way of handling relations and statistical functions. Right now, you need to learn a _lot_ to do stuff that shouldn't be that hard. It's almost like folks wanted to make sure that any project of this nature would need a DBA _and_ a script developer (or team) _and_ a statistician to get work done. That really shouldn't be the case.

## Re:what R isn't (2, Informative)

## KjetilK (186133) | more than 9 years ago | (#10324919)

However, you do not necessarily need to be into statistics to find R appealing. I'm an astrophysicist, and I wrote my whole thesis based on R. I started out with a bit of C, and I used some small Perl hacks to do some naive parallelizing, but I eventually phased out the C code and relied on R. I'd write thousands of lines of R code rather than go back to something like IDL (*shrug*).

For programmers, it may be a bit hard to get used to the fact that you do not need for loops in R. For most purposes, think matrix and vector arithmetic instead. If you look at it from a math perspective, that makes a lot of sense. For the things where it doesn't make sense to think in terms of vector arithmetic, you think in terms of applying functions to array elements instead.

Also, R has some simple OO concepts. They do not aim to do everything OO does, but the things they do, they do very well (as opposed to IDL (*shrug*, *shrug*), where they attempt to do everything but do it badly). You need to exploit this to make your code pretty.

I think these are the two main things that need to be overcome for most scientists to use R efficiently. I really fell in love with the language for these two reasons, and I'd recommend R for all scientists, including non-statisticians.

## Comparison of R, Mathematica, S-plus, Matlab, etc (1)

## Mr. Pillows (745647) | more than 9 years ago | (#10322004)

## Re:Comparison of R, Mathematica, S-plus, Matlab, e (3, Informative)

## dovf (811000) | more than 9 years ago | (#10327199)

Other tools which I have come across, but haven't really worked with: Axiom [nongnu.org] (symbolic computations, CAS); Scigraphica [sourceforge.net] (graphing); opendx [opendx.org] (data explorer + visualization).

I've actually never really used R (by the time I came across it, I was done with my physics labs), so I can't really compare any of the others to it. But it definitely looks like one of the tools that I should add to my suite.

## Re:Comparison of R, Mathematica, S-plus, Matlab, e (0, Flamebait)

## pkhuong (686673) | more than 9 years ago | (#10331830)

## Comparison to octave? (2, Interesting)

## j1m+5n0w (749199) | more than 9 years ago | (#10322317)

Does anyone have any insight on how this differs from octave [octave.org] ?

This is the first I've heard of R, but I've tried using Octave a few times. It seems to be a sort of enhanced gnuplot. I was thinking about using it for a project I'm working on, though I may just stick with good ol' C for performance.

Do any of these projects work well with sparse matrices? I'm interested in using them to run a pagerank [wikipedia.org]-like computation, but not if they use n^2 memory.
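On the memory question: a PageRank-style computation does not need a dense n x n matrix if the link structure is stored sparsely. A minimal, language-neutral sketch in plain Python, assuming an adjacency-list dict as the sparse representation and the commonly used damping factor 0.85; the function `pagerank` and the example graph are hypothetical. Memory stays proportional to nodes plus edges.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=100):
    """Damped power iteration over an adjacency list; memory is O(nodes + edges), never n^2.

    out_links maps each node to the list of nodes it links to.
    Dangling nodes (no out-links) spread their rank uniformly over all nodes.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        # Teleport term plus the uniformly redistributed dangling mass.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each edge is visited once per iteration.
        for src, targets in out_links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for node, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Each iteration touches every edge once, so the per-iteration cost also scales with the number of links rather than with n^2.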

-jim

## Re:Comparison to octave? (2, Informative)

## Anonymous Coward | more than 9 years ago | (#10324756)

Octave is basically an open source version of Matlab. R looks similar, just with a different programming language and different libraries.

It looks like it's probably more powerful but I don't know since I haven't used R.

## Re:Comparison to octave? (1)

## KjetilK (186133) | more than 9 years ago | (#10325673)

However, the C and FORTRAN bindings in R are excellent. So, if you're doing statistics on the stuff you find, you might want to look at doing the high-performance stuff in C, and wrap R around it.

Keep in mind that most of the statistics in R has undergone more extensive peer review, by more qualified statisticians in the field, than that of any other software (it could be true for other systems where the code is open too, but often some essential parts are not). So, use the statistics in R if you have a use for it; it is rigorous.

## Re:Comparison to octave? (0)

## Anonymous Coward | more than 9 years ago | (#10348841)

and you can easily create any sparse matrix object in R

## And of course this comes a little late... (0)

## Anonymous Coward | more than 9 years ago | (#10324141)

## Minitab? (1)

## Tablizer (95088) | more than 9 years ago | (#10326527)

## Re:Minitab? (1)

## choad (66624) | more than 9 years ago | (#10343685)

## Re:Minitab? (0)

## Anonymous Coward | more than 9 years ago | (#10346873)

Like EViews and SPSS, Minitab is statistical software; it was never developed as a programming language. Excel has better programming capability than Minitab, EViews, or SPSS.

If you want a serious stat programming language, you want S-Plus, R (GNU S-Plus), Matlab with the stat toolbox, and maybe SAS.

SAS is very unfriendly to programmers. You have to learn it from the basics if you want to program in it, and usually you end up using only the built-in functions, not programming. R and Matlab are better if you already know at least C or C++.

Matlab with the toolbox is almost as good as basic R, statistical-programming-wise. And R does OOP.