
R In a Nutshell

samzenpus posted more than 4 years ago | from the read-all-about-it dept.

Book Reviews 91

joel.neely writes "R is a statistical computing environment that is fully compliant with state-of-the-art buzzwords: free, open-source, cross-platform, interactive, graphics, objects, closures, higher-order functions, and more. It is supported by an impressive collection of user-supplied modules through CRAN, the 'Comprehensive R Archive Network.' And now it has its own O'Reilly Nutshell book, R in a Nutshell, written by Joseph Adler. I am pleased to report that Adler has risen to the challenge of the highly regarded 'Nutshell' franchise. As is traditional for the series, this title mixes introduction, tutorial, and reference material in a style that is well suited to a reader who already has a background in programming but is a new or occasional user of R." Read on for the rest of Joel's review.

As a curious newcomer to R who wanted to get going quickly, I was well served by Part 1, which provides an R kickstart. Chapter 1 covers the process of getting and installing R. It is short, to the point, and just works, addressing Windows, Mac OS X, and Linux/Unix with equal attention. Chapter 2, on the R user interface, introduces the range of options for interacting with R: the GUI (both the standard version and some enhanced alternatives), the interactive console, batch mode, and the RExcel package (which supports R inside a certain well-known spreadsheet). Chapter 3 uses a set of interactive examples to provide a quick tour of the R language and environment, establishing a task-oriented theme that carries through the rest of the book. The last chapter of Part 1 covers R packages. It summarizes the standard preloaded packages, introduces the tools for exploring repositories and installing additional packages, and concludes by explaining how to create new packages.

As a polyglot programmer who is always interested in seeing how a new language approaches programs and their construction, I enjoyed Part 2, which described the R language. This section begins with an overview in chapter 5, and then devotes a chapter each to R syntax, R objects, symbols and environments (central to understanding the dynamic nature of R), functions (including higher-order functions), and R's own approach to object-oriented programming. This section closes in chapter 11, with a discussion of techniques and tips for improving performance.

As a busy professional with data sitting on my hard drive that I'd like to understand better, I appreciated Part 3, with its practical emphasis on using R to load, transform, and visualize data. Chapter 12 presents alternatives for loading, editing, and saving data, from the built-in data editor, through file I/O in a variety of formats, to a mature set of database access options. Chapter 13 illustrates a range of techniques for manipulating, organizing, cleaning, and sorting data in preparation for presentation or more detailed analysis. Chapter 14 introduces the reader to the wealth of graphical presentation options built into the R environment. There are so many chart types and details that this chapter could have been overwhelming, but Adler keeps the interest high and the mood light by drawing on an engaging variety of data: toxic chemical levels, baseball statistics, the topography of Yosemite Valley, demographic data, and even turkey prices. Chapter 15 is devoted to lattice graphics, the R implementation of the "trellis graphics" technique for data visualization developed at Bell Labs. This chapter illustrates the power of lattice graphics by exploring the question of why more babies are born on weekdays than on weekends.

As a non-statistician who still occasionally needs to do some number-crunching, I'm sure I'll be returning to Part 4, with its detailed explanations and illustrations of analysis tools and techniques: almost two hundred pages' worth. In chapters 16 through 20, Adler surveys topics in data analysis, probability, statistics, power tests, and regression modeling. As someone who has been offered too many medications and lost fortunes, I found much to enjoy in chapter 21, which uses a variety of spam-detection techniques to illustrate the concepts of classification. Chapter 22, on machine learning, discusses several of the data mining techniques that R supports. Chapter 23 covers time series analysis, which may be used to identify trends or periodic patterns in data. Finally, chapter 24 offers an overview of Bioconductor, an open-source project focused on genomic data.

The book closes with a detailed reference to the standard R packages.

This is an impressive piece of work. In a volume of this size (about 650 pages), navigation is crucial, and I found both the organization of the chapters and index up to the task. I was able to follow the instructions and examples through the first several chapters of the book essentially without a hitch, and in the latter chapters the variety of illustrations and data sources added interest to what could have been very dull going.

I won't claim perfection for this book. There were a couple of explanations that could have been clearer, and one or two odd turns of phrase or rough edits. Out of all the code examples that I tried, I found exactly one that didn't seem to work without a minor correction. For a work of this size, that's actually pretty amazing!

As a long-time O'Reilly reader, I see Joseph Adler's R in a Nutshell as a welcome addition to the menagerie.

You can purchase R in a Nutshell: A Desktop Quick Reference from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.


Emphatic Agreement (5, Interesting)

eldavojohn (898314) | more than 4 years ago | (#32952956)

I also own this book in hard copy and cannot say enough good things about it. First I would like to add that the author is also the author of Baseball Hacks [oreilly.com], which might not sound like a popular title for Slashdot but if you are a nerd and techie/programmer then this book is for you! Never have I seen such statistical rigor and beautifying/aggregation of baseball statistics brought together. I only hope that Mr. Adler continues to produce such great technical volumes.

In a volume of this size (about 650 pages)

Not to criticize the reviewer, but there's not enough written above to do this book justice. From the author's emphasis on preprocessing the data in another language (Perl, I think, in the Chapter 3 tutorial) so that it can be effortlessly ingested by R, to the very last pages on machine learning in R, it's a good book. I lament that in college I was relegated to Matlab instead of having R and the many packages now available on CRAN.

I too would give this book a 9/10. It sometimes tries to inject tutorials in what should probably stick to being a reference and it might have too large of a scope for a single volume (I've read sets of books on machine learning and classification models) but this book is great for R beginners and R intermediates and as an R reference.

Seriously if you know a statistician who codes or if you know a developer who values statistics then this is their book. Given the nature of the subject matter and the GPL'd beauty of R, you'll undoubtedly have a hard time finding a negative review of this book anywhere.

Re:Emphatic Agreement (1)

Monkeedude1212 (1560403) | more than 4 years ago | (#32953154)

I don't own the book myself, but a buddy of mine does, and he has also recommended it. It's funny that you mention Baseball Hacks: this friend of mine started using R for a new kind of statistical analysis of the World Cup not long ago, and what he's come up with is actually quite impressive.

Anyway, of the few pages I've skimmed, it was pretty good. I'd definitely be a beginner, having only taken one stats course, but I could understand what was on the page and what the tutorials were teaching, even without trying them myself. I think that's really the mark of a good programming book for me. If I can look at the first few pages without any prior knowledge of the exact subject matter and still understand what's going on, that's a huge plus.

Oh God, A GNUFreak... (0)

Anonymous Coward | more than 4 years ago | (#32953170)

"the GPL'd beauty of R"

LOL, wut?

Re:Oh God, A GNUFreak... (2, Funny)

Ukab the Great (87152) | more than 4 years ago | (#32953434)

The letter R is under the GPL license, therefore any derivative work that uses the letter R, such as Hamlet, the US Constitution, the Forums section of Penthouse magazine, and this post grant the reader the same rights and privileges afforded to a three year old putting a single magnetic letter R on a refrigerator.

Pirates (2, Funny)

bigrockpeltr (1752472) | more than 4 years ago | (#32953666)

there be rum and free software matey... aRRRRRRRRRgh

Re:Oh God, A GNUFreak... (1)

squidfood (149212) | more than 4 years ago | (#32953910)

therefore any derivative work that uses the letter R...

Yeah, you think this is funny until you try to google any particular bit of specific info on 'R'.

And don't get me started on looking up using 'R' with 'c'. (actually, that one works much better than it used to).

Re:Oh God, A GNUFreak... (2, Informative)

DaVince21 (1342819) | more than 4 years ago | (#32955186)

Google "R language", or "R code", or something similar. It's search engine searching 101.

Re:Oh God, A GNUFreak... (1)

squidfood (149212) | more than 4 years ago | (#32955862)

Google "R language", or "R code", or something similar. It's search engine searching 101.

Thank you very much for your extremely useful skills. I never would have guessed! If you think I haven't tried dozens of permutations of the same and constantly come up with hits based on stray letter 'R's, well... you haven't really done anything challenging with a search engine before. (I'm not talking about cases where you're just trying to get basic facts about R, but about looking up arcane error messages and the like, where the stray R's doom many searches.)

Re:Oh God, A GNUFreak... (1)

DaVince21 (1342819) | more than 4 years ago | (#32956650)

Quotes work well around (parts of) error messages, but I guess you already knew that as well. Anyway, ichthyoboy seems to have posted something great.

Re:Oh God, A GNUFreak... (1)

squidfood (149212) | more than 4 years ago | (#32957628)

Agreed, I didn't know about that one!

Re:Oh God, A GNUFreak... (0)

Anonymous Coward | more than 4 years ago | (#32988312)

I don't think, squidfood, that you are actually searching for "R language" or "R code" as quoted phrases. You might be searching for R language, which is different from "R language". The quotation marks tell the search engine to look for everything between them as one phrase, not just one or possibly both of the individual search terms. I searched for "R language" (one term) and got back a slew of results all dealing with the R programming language.

If you want to search for an arcane error then type in "partial text of arcane error" (one term). If that doesn't work you probably want to go to a specific site about R and look within that site for help.

Good luck using the internet buddy!

Re:Oh God, A GNUFreak... (2, Informative)

ichthyoboy (1167379) | more than 4 years ago | (#32955938)

Or even better yet, use Rseek [rseek.org] : basically a modified Google search that looks specifically through pages on R.

Re:Emphatic Agreement (1)

AnonymousClown (1788472) | more than 4 years ago | (#32953222)

First I would like to add that the author is also the author of Baseball Hacks [oreilly.com], which might not sound like a popular title for Slashdot but if you are a nerd and techie/programmer then this book is for you!

I once developed baseball software for collecting data and then reporting the stats. I could never understand why all that data was collected and why those stats were calculated for the players. I guess it's part of the game to watch someone hit another home run or steal a base or strike out and imagine the stats changing; it seems to be a waste of time.

If folks who follow all those stats put that time and effort to the stock market, they'd all be millionaires.

Re:Emphatic Agreement (0)

Anonymous Coward | more than 4 years ago | (#32953270)

I mean, yeah, a lot of fans follow those stats just for the hell of it, but lots of baseball execs compile and follow them to try to determine mathematically which players are "best". If they do so successfully, I assure you they already are millionaires. Billy Beane pioneered the use of sabermetrics for scouting and development, has a multimillion-dollar contract with Oakland, and has made millions more for the team.

Re:Emphatic Agreement (1)

MikeBabcock (65886) | more than 4 years ago | (#32953734)

If folks who follow all those stats put that time and effort to the stock market, they'd all be millionaires.

No, actually, they wouldn't.

But I can introduce you to a lot of others who also seem to think so.

Re:Emphatic Agreement (1)

rnj (779212) | more than 4 years ago | (#33005934)

Sean Forman (the guy who built baseball-reference.com) is by no means rich. But he was able to walk away from a tenured professorship (Math).

If you're talking the guys who compile the raw data, they're basically people who like to keep score while they watch the game. I know several people who are scorers for Stats (speaking of which, the guy who started that company made millions) or other data sources.

Then there are the people who are transcribing the historical data at Retrosheet. These are all volunteers who love the fact that we can now get play-by-play results for the distant past (pretty much complete since the early 1970s, well over 90% complete going back to the early 1950s, and spotty before then, though the 1920s are pretty much complete).

Re:Emphatic Agreement (1)

localman57 (1340533) | more than 4 years ago | (#32953244)

How introductory is the book for someone with little to no statistics background? I see the review says that Part 4 covers this, but I wonder. If I get my textbooks out, I can figure out how to do things like standard deviations and best-fit lines, but that's about it. Would this book/tool be useful to someone with such a rudimentary understanding of statistics?

Re:Emphatic Agreement (1)

Monkeedude1212 (1560403) | more than 4 years ago | (#32953298)

I had a stats course about 3 years ago, so I understand how to do them, but I don't remember the formulas offhand like a stats major probably would. It's difficult to say exactly how much of that course I retained.

But in the few pages I read, I was able to follow along quite easily. I guess the best way to put it is: you need some understanding of statistics and how they work, but with some familiarity, you won't be lost in it.

Whether or not it will be USEFUL to you is another thing entirely - and that depends on what you want/need to do and how you plan to make it perform.

Re:Emphatic Agreement (1)

Miseph (979059) | more than 4 years ago | (#32953398)

My guess is that if you aren't terribly familiar with statistics, you won't be entirely comfortable writing in a language designed to process complex statistical calculations.

If the book is good enough that it can teach you stats, this guy deserves a Pulitzer, not a 9/10 from Slashdot.

Re:Emphatic Agreement (1)

tool462 (677306) | more than 4 years ago | (#32953612)

I'm replying to you, since you seem familiar with R, but hopefully others can chime in as well.

You imply that it's preferable to preprocess the data in another language, like Perl. What makes R valuable as a completely separate language, rather than being implemented as a library within an existing language? I'm assuming there must be something compelling you can't get out of adding
use R;
R::some_func(...);
to your parsing code.

Re:Emphatic Agreement (2, Informative)

Yold (473518) | more than 4 years ago | (#32953878)

It handles data nicely. You can do things similar to list comprehensions in python. Implementing it in another language would break its semi-compatibility with S-plus. It also has data-types aimed towards the sorts of processing that it is designed for, like formula objects and data frames. Finally, the interactive mode is invaluable for exploratory analysis.

You could build a ton of syntactic sugar into another language to get something close to R; in fact, that's basically what all of the operations in R are (syntactic sugar, as described in R in a Nutshell).

So to answer your question, it makes more sense to design a language for statistics rather than hack it onto an existing language.
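To make the data-frame comparison concrete, here is a rough Python analogy, not R itself, of what logical-vector subsetting such as R's `df[df$price > 10, ]` does: evaluate a test element-wise to get a vector of TRUE/FALSE flags, then keep the matching rows across every column. The `DataFrame` class, its methods, and the example columns are hypothetical and exist only for illustration; a real R data frame does far more.

```python
from typing import Callable, Dict, List


class DataFrame:
    """Hypothetical, minimal column-oriented table: one list per named column."""

    def __init__(self, columns: Dict[str, list]):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: list(values) for name, values in columns.items()}

    def nrow(self) -> int:
        # Number of rows, taken from the first column (0 for an empty table).
        return len(next(iter(self.columns.values()), []))

    def mask(self, column: str, predicate: Callable[[object], bool]) -> List[bool]:
        # Analogue of R's `df$price > 10`: apply a test element-wise,
        # producing one boolean per row.
        return [predicate(value) for value in self.columns[column]]

    def subset(self, keep: List[bool]) -> "DataFrame":
        # Analogue of R's `df[keep, ]`: keep the rows flagged True in every column.
        if len(keep) != self.nrow():
            raise ValueError("mask length must equal the number of rows")
        return DataFrame({
            name: [v for v, k in zip(values, keep) if k]
            for name, values in self.columns.items()
        })


# Hypothetical example data.
df = DataFrame({"item": ["a", "b", "c", "d"], "price": [5, 12, 8, 20]})
expensive = df.subset(df.mask("price", lambda p: p > 10))
print(expensive.columns)  # {'item': ['b', 'd'], 'price': [12, 20]}
```

The point of the analogy is that the filter is a whole-column operation rather than an explicit loop over rows, which is much of what makes data frames pleasant for exploratory work.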

Re:Emphatic Agreement (1)

langelgjm (860756) | more than 4 years ago | (#32953880)

I have only limited experience with R, but FWIW, R contains some basic data import routines for standard formats like CSV or tab-delimited data. There are also libraries that let you import files from other statistical packages like Stata, SPSS, etc. I imagine you could import XLS directly with a library, though I've never had to.

I would hazard a guess and say that if you had to massage your data into a more machine-readable format, it'd be easier to do it in Perl, then load the result up into R, since Perl is so good at text-processing. On the other hand, if your data is already readable, you can write short (or long) programs entirely in R, which is fairly handy. I started using it because I liked the fact that it wasn't a stand-alone statistics package, but a full-fledged language.
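As a hedged illustration of the preprocess-then-load workflow, here is a short Python sketch (Perl would serve just as well) that normalizes irregular, whitespace-separated records into clean CSV text that R's `read.csv()` could then ingest. The input format (comment lines starting with `#`, fields separated by any run of spaces or tabs) and the `normalize_records` function are assumptions invented for the example.

```python
import csv
import io
import re


def normalize_records(raw_text: str, header: list) -> str:
    """Hypothetical cleaner: turn loosely formatted lines into CSV text.

    Assumed input: one record per line, fields separated by any run of
    spaces or tabs, '#' starting a comment, blank lines ignored.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for line in raw_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = re.split(r"\s+", line)
        if len(fields) != len(header):
            continue  # skip malformed rows rather than guess at their meaning
        writer.writerow(fields)
    return out.getvalue()


raw = """# date      item        price
2010-07-01  widget   2.61
2010-07-02\twidget\t2.58

2010-07-03  widget       # truncated row, skipped
"""
cleaned = normalize_records(raw, ["date", "item", "price"])
print(cleaned)
# Saved to a file, the result could then be loaded in R with read.csv().
```

Doing the messy text handling in a general-purpose scripting language and handing R a regular, rectangular file keeps the R side down to a single import call.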

Re:Emphatic Agreement (2, Informative)

chthonicdaemon (670385) | more than 4 years ago | (#32954386)

This is really the old domain-specific language argument: why go for a DSL when you have a good general-purpose language and can add functionality with libraries? In the end, it's all about notation. You can add a matrix library to Java and write A = B.times(C).plus(D).invert().transpose(), or you can have a language that allows you to write A = inv(B*C+D)'. In R, data frames are a really rich way of handling data, and the things you can do with them make for a great working environment. For what it's worth, there are R wrappers for many languages (like Perl and Python), but once you have gotten used to the full R environment, using the engine from other languages grates.
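The notation point can be sketched in a few lines of Python, which lets a user-defined type bind its methods to operator symbols. The `Mat2` class and the `inv` helper below are hypothetical, written only for illustration (a real program would use a numerics library); they expose the same operations under both the method-chaining spelling and an operator spelling, and both expressions compute the same matrix. For reference, in R itself the expression would be written `t(solve(B %*% C + D))`, since `*` is element-wise there and `%*%` is matrix multiplication.

```python
from fractions import Fraction


class Mat2:
    """Hypothetical 2x2 matrix with exact rational entries, laid out as [[a, b], [c, d]]."""

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = (Fraction(x) for x in (a, b, c, d))

    # Method-chaining spelling (Java style).
    def times(self, o):
        return Mat2(self.a * o.a + self.b * o.c, self.a * o.b + self.b * o.d,
                    self.c * o.a + self.d * o.c, self.c * o.b + self.d * o.d)

    def plus(self, o):
        return Mat2(self.a + o.a, self.b + o.b, self.c + o.c, self.d + o.d)

    def invert(self):
        det = self.a * self.d - self.b * self.c
        if det == 0:
            raise ZeroDivisionError("matrix is singular")
        return Mat2(self.d / det, -self.b / det, -self.c / det, self.a / det)

    def transpose(self):
        return Mat2(self.a, self.c, self.b, self.d)

    # Operator spelling: the very same operations, bound to symbols.
    __mul__ = times
    __add__ = plus

    @property
    def T(self):
        return self.transpose()

    def __eq__(self, o):
        return (self.a, self.b, self.c, self.d) == (o.a, o.b, o.c, o.d)

    def __repr__(self):
        return f"Mat2({self.a}, {self.b}, {self.c}, {self.d})"


def inv(m):
    return m.invert()


B, C, D = Mat2(1, 2, 3, 4), Mat2(0, 1, 1, 0), Mat2(2, 0, 0, 2)
chained = B.times(C).plus(D).invert().transpose()
operators = inv(B * C + D).T
print(chained == operators)  # True
```

The library does the same work either way; the difference is only in how close the written expression stays to the mathematics.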

I predict "R" apps will be susceptible to piracy (2, Funny)

aapold (753705) | more than 4 years ago | (#32953048)

I mean, how can they resist?

Re:I predict "R" apps will be susceptible to pirac (1)

selven (1556643) | more than 4 years ago | (#32955670)

I mean, how can they resist?

"R" is resistance, so R apps should be able to resist just fine.

So is the next major version... (2, Funny)

mobby_6kl (668092) | more than 4 years ago | (#32953124)

going to be called R-square?

Re:So is the next major version... (3, Funny)

Hatta (162192) | more than 4 years ago | (#32953384)

R dR R.

Re:So is the next major version... (1)

eric2hill (33085) | more than 4 years ago | (#32953722)

Simpsons... R^3DR^2 => R R R D R R

Re:So is the next major version... (0)

Anonymous Coward | more than 4 years ago | (#32953494)

hR, hR, hR

Re:So is the next major version... (1)

karimlalani (1823188) | more than 4 years ago | (#32953832)

R2 actually. And the second developmental release would be R2D2....

Re:So is the next major version... (1)

sco08y (615665) | more than 4 years ago | (#32954962)

Avast.

R be dope (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#32953180)

It be dope!

dee
ooo
peee
dope

Those buzzwords are okay, I guess... (0, Funny)

Anonymous Coward | more than 4 years ago | (#32953224)

...but can it architect a paradigm to leverage Web 2.0 community-driven crowdsourcing, using social graphs to monetize the cloud, thus rendering a modular-frameworked, enterprise-grade solution?

Re:Those buzzwords are okay, I guess... (1)

Elros (735454) | more than 4 years ago | (#32955550)

So...you're making money off Facebook?

Re:Those buzzwords are okay, I guess... (1)

linhares (1241614) | more than 4 years ago | (#32955830)

Sorry, only in loosely-coupled multidimensional-scaling managerial settings

Re:Those buzzwords are okay, I guess... (0)

Anonymous Coward | more than 4 years ago | (#32956484)

no one will get that joke, though :'(

R Tools (4, Interesting)

Idgarad (530269) | more than 4 years ago | (#32953302)

R is an excellent language to learn for just about every field. Its ability to import and export data to Microsoft-based resources such as Access, Excel, and MS-SQL, as well as to non-Microsoft sources, makes it a versatile tool. Its commercial parent is S-PLUS, and the syntax is nearly identical, with minor variations. Buy the book, use the tool, and impress your fellow Eve Online players by pinning down the July Tritanium prices and hitting the weekly averages within 0.5 ISK, doing time series analysis using regression plus ARIMA on the residuals. Find out cool things, like how Hulkageddon impacts frigate prices more than exhumers, and MORE! FUN FOR THE WHOLE FAMILY (except your big sister, because she's icky and into boys...).

For those who want to do Google searches but find 'R' difficult, there is the rseek.org site, and here are a few quick links to get you started while you wait for the Nutshell book to arrive in the mail:

R Intro: http://www.itc.nl/~rossiter/teach/R/RIntro_ov.pdf
Programming in R: http://manuals.bioinformatics.ucr.edu/home/programming-in-r
R Graph Gallery: http://addictedtor.free.fr/graphiques/
Big Resource I use: http://www.math.yorku.ca/SCS/StatResource.html
The Little Handbook: http://www.tufts.edu/~gdallal/LHSP.HTM
The Big N: http://www.itl.nist.gov/div898/handbook/

There are hundreds of PDF references out there that can help as well, too many to list. Good luck, have fun.
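The first steps of the workflow described above (weekly averages, then a regression whose residuals are modeled separately) can be sketched in plain Python. This is a simplified stand-in, not the commenter's method: it computes weekly means and fits an ordinary least-squares linear trend, returning the residuals; it does not fit an ARIMA model, which in R is typically done with the `arima()` function from the stats package. The function names and the daily prices are made up for illustration.

```python
from statistics import mean


def weekly_means(daily, days_per_week=7):
    """Average consecutive blocks of `days_per_week` values; a trailing partial week is dropped."""
    full = len(daily) - len(daily) % days_per_week
    return [mean(daily[i:i + days_per_week]) for i in range(0, full, days_per_week)]


def linear_trend(y):
    """Ordinary least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ...

    Returns (intercept, slope, residuals). The residuals would then be handed
    to a time series model such as ARIMA, which this sketch does not implement.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    t = list(range(n))
    t_bar, y_bar = mean(t), mean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    return intercept, slope, residuals


# Made-up daily prices: three full weeks plus two leftover days (ignored).
daily = [9, 11, 10, 10, 9, 11, 10,
         12, 14, 13, 13, 12, 14, 13,
         13, 15, 14, 14, 13, 15, 14,
         20, 21]
weeks = weekly_means(daily)
intercept, slope, residuals = linear_trend(weeks)
print(weeks, slope, residuals)
```

Separating a deterministic trend from what is left over is the usual motivation for modeling the residuals on their own: whatever structure remains after the trend is removed is what a model like ARIMA tries to capture.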

Re:R Tools (0)

Anonymous Coward | more than 4 years ago | (#32957116)

All things I can do in a language I already know. What good is R to me? Is it faster or something?

Sure, it might make some things syntactically easier, but generally speaking I don't find DSLs all that great. If I know what I'm doing, then I don't need a special language to do it, and often I can choose a language that will run my code a whole lot faster.

as a longtime slashdot reader (1)

mattdm (1931) | more than 4 years ago | (#32953380)

Let me just say: wow, thanks for actually providing a review, rather than a blurb copied from the amazon listing.

Seriously. Thanks.

Re:as a longtime slashdot reader (0)

Anonymous Coward | more than 4 years ago | (#32954872)

Here is a blog with some unique and cool examples of gathering interesting data and displaying it in many different ways: http://www.r-chart.com/

ebook version (3, Informative)

proxima (165692) | more than 4 years ago | (#32953406)

As an (occasional) R user, I am excited to see a well-reviewed O'Reilly book on the language. I went and checked the major ebook stores - Amazon, BN, and Stanza, and none had the title.

It turns out that in addition to the Safari books service, O'Reilly also sells DRM-free copies in epub, mobi, and PDF formats. This book is available here [oreilly.com]. It's not a huge discount over the printed version on Amazon ($6.50 less), though. I'm surprised, then, that it isn't available via the major stores.

Re:ebook version (1)

_|()|\| (159991) | more than 4 years ago | (#32955950)

Does O'Reilly sell any of its books through ebook intermediaries? Since they sell DRM-free versions direct, usually in multiple formats, I've never bothered to look. The only exception I've noticed is the occasional iPhone app.

As for price, discounts aren't too hard to find. I ordered one book for half its listed ebook price. I've since gotten a couple of "deal of the day" ebooks for $10 each. Today's deal, listed prominently at oreilly.com [oreilly.com] , is Learning the vi and Vim Editors (PDF only, unfortunately).

Re:ebook version (1)

proxima (165692) | more than 4 years ago | (#32956214)

Does O'Reilly sell any of its books through ebook intermediaries? Since they sell DRM-free versions direct, usually in multiple formats, I've never bothered to look. The only exception I've noticed is the occasional iPhone app.

I've noticed a few have Kindle versions, but Stanza and its store has nearly the entire library. I guess this way O'Reilly gets the full purchase price rather than 70%.

As for price, discounts aren't too hard to find. I ordered one book for half its listed ebook price. I've since gotten a couple of "deal of the day" ebooks for $10 each. Today's deal, listed prominently at oreilly.com, is Learning the vi and Vim Editors (PDF only, unfortunately).

So it would appear. I often check Retail Me Not [retailmenot.com] for online store codes.

Once you have a decent ereader device, DRM-free ebooks seem like the perfect solution for tech books. Searchable, no marginal weight for otherwise heavy books, etc. What's interesting to me is that the DRM-free epubs can't be loaded directly onto my iPad in the BN or iBooks app; via Dropbox, I was only able to send the epub to GoodReader or Stanza (which is fine, the Stanza reader is really nice). I think the iBooks app can transfer epub files via USB and iTunes, but that's a pain (I sync very, very rarely, since I don't use Windows or Mac OS X much). Not sure what's up with the BN reader app; the nook device can handle epubs via USB mass storage transfer.

Re:ebook version (1)

JanneM (7445) | more than 4 years ago | (#32957456)

They sell some titles through the Android app store - but of course, that would probably count as a direct sale too. I'd like to get this book there but for some reason they mostly seem to sell Windows-related titles and not much else.

What language do pirates code in? (4, Funny)

by (1706743) (1706744) | more than 4 years ago | (#32953466)

R!

<drops pin...>

It is a good buy (0)

Anonymous Coward | more than 4 years ago | (#32953484)

I have a number of other good R books and this book has been a really useful addition (works better as a reference than many of the others). At first I wasn't going to buy this given the up and down nature of the O'Reilly series- but "R in a Nutshell" is definitely up.

Fantastic book for a fantastic language (2, Interesting)

spike hay (534165) | more than 4 years ago | (#32953502)

This book always sits right on my desk.

R is a language that more people should really learn. The statistics community has definitely gravitated strongly to it. These days, with the thousands of packages on CRAN, it's far superior in functionality to other packages like Stata or SAS (I won't even go into people who use Matlab for statistics), not to mention open source.

It is still a bit slower than Matlab for some matrix operations, but hopefully that will be improved in the future.

Not much left! (2, Funny)

Korbeau (913903) | more than 4 years ago | (#32953656)

Based on Wikipedia, only G, H, N, O, P, U, V, W one-letter programming language names are left! Time to invent a new language :)

Re:Don't forget, Unicode now exists (0)

Anonymous Coward | more than 4 years ago | (#32953854)

Based on Wikipedia, only G, H, N, O, P, U, V, W one-letter programming language names are left! Time to invent a new language :)

Ah, but we have Unicode now, so we have a _lot_ of single-letter names to go.

Re:Don't forget, Unicode now exists (1)

Hognoxious (631665) | more than 4 years ago | (#32963444)

Let me take this opportunity to say that unicode can fuck right off.

Re:Not much left! (1)

mc2thaH (920212) | more than 4 years ago | (#32954082)

I'll start working on "W". It will be difficult to program in though, because it will randomly blow shit up :-)

Hello World in W (0)

Anonymous Coward | more than 4 years ago | (#32957090)

#include "Cheney.h"

number_thing Main_Street(totally_empty_void)
{
PRINT "HELLO WORLD"
FEED ME A LINE
RETURN(TO DARK AGES)
}
//Note, you can use any comment style in W you want,
//and you must use lots of them because of the Slashdot
//caps thing. Comments might actually cause changes in
//program semantics. Don't misunderestimate it.

V was a voice processing language in the early 90s (1)

teambpsi (307527) | more than 4 years ago | (#32954124)

:V (the dots were above it) -- was a cross-platform, byte-code-compiled language used for voice processing applications (DOS & Unix)

Re:Not much left! (1)

Unordained (262962) | more than 4 years ago | (#32954302)

Result of programming in G [google.com] : several hits. Maybe wikipedia needs to be updated?

Re:Not much left! (1)

Perl-Pusher (555592) | more than 4 years ago | (#32954312)

I propose O, the language will be slick and propose a new paradigm change in computing. But in actual use it will fail to deliver.

Re:Not much left! (0)

Anonymous Coward | more than 4 years ago | (#32957120)

Is O a language for programming porn games?

Re:Not much left! (1)

alecto (42429) | more than 4 years ago | (#32959444)

That's where Big-O notation came from.

Re:Not much left! (1)

master_p (608214) | more than 4 years ago | (#32957324)

Personally, I am waiting for R++. I heard it's gonna have templates, operator overloading, the lot!!!

Re:Not much left! (0)

Anonymous Coward | more than 4 years ago | (#32959026)

R++ == S ???

Wonderful software (1)

digitalhermit (113459) | more than 4 years ago | (#32953900)

More than a decade ago I gave a talk on using R, Octave, MuPAD and other software in the classroom environment. It's a great package. Back then I used it to get through stats courses and plot disk usage in a graph. Now I'm using it to hammer through stock market data each night. To do the same with some commercial packages would cost thousands of dollars.

Another outstanding reference for R: "R in Action" (3, Informative)

dr_canak (593415) | more than 4 years ago | (#32954044)

Not having read the O'Reilly book, I can't draw a comparison between the two, but I have been extremely pleased with "R in Action" by Robert Kabacoff. It can be found here:

http://www.manning.com/kabacoff/ [manning.com]

It's a work in progress, in that some 90% of the book is written. Pre-ordering the electronic version gives you the ability to download chapters as they are written, plus a final e-copy (or hard copy if you pay more) when it's completed.

I have a high degree of familiarity with SPSS and SAS, and am learning R to get around the crazy licensing issues of the aforementioned programs. I have been very pleased with Kabacoff's book, as I had *no* familiarity with R before grabbing "R in Action." The publisher/author support a forum where purchasers can identify errors and/or make suggestions for improvements before the book goes to final press.

Not sure if it is competition for "R in a Nutshell" or simply an additional reference, but worth checking out if you want to learn R. It's been very helpful for me.

jeff

What kind of a Nuts are we talking about here? (0)

Anonymous Coward | more than 4 years ago | (#32954256)

650 pages in a Nutshell book?

Following Murphy's Law, "Anything which can be put in a Nutshell belongs there," I avoid Nutshell books, but that's a different topic.

Question about integration with other languages (1)

notandor (807997) | more than 4 years ago | (#32954298)

Is there a way to integrate R programs with another high-level language like Java, for example to bind an R object to a Java interface? I have basic familiarity with R, and I would like to use programs written in R directly from programs written in an object-oriented language, rather than using file I/O as the bridge between them.

The general idea is to take Java objects, pass them to R, and do all the statistical number crunching with small R programs that are somehow integrated with a Java program. The results then come back as objects that can be processed further in Java.

Are there any possibilities for that?

Re:Question about integration with other languages (2, Informative)

Stradenko (160417) | more than 4 years ago | (#32954488)

JRI sounds like what you want, but rJava is there when you want the reverse.
http://rosuda.org/JRI/ [rosuda.org]
http://rosuda.org/rJava/ [rosuda.org]

Similar things exist for Python, Perl and probably others.

Re:Question about integration with other languages (0)

Anonymous Coward | more than 4 years ago | (#32954494)

Sounds like you want rJava (http://www.rforge.net/rJava/)

For Python you get rpy2 (http://rpy.sourceforge.net/rpy2.html)

For Windows specific stuff you get R DCOM (http://cran.r-project.org/contrib/extra/dcom/)

R in a nutshell = Rpy (3, Informative)

js_sebastian (946118) | more than 4 years ago | (#32956300)

I don't know about Java, but when I have to use a statistics library available in R, I use rpy. It's a Python module that lets you automagically call R functions very easily, and directly get back Python objects or R objects for further processing with R methods. Python's introspection capabilities make this sleek and transparent; I doubt a Java binding could be as cool (though if you need Java, there probably are solutions).

And honestly, I'm so glad I don't have to use R directly... TFS says it is object-oriented, but as far as I can recall, all the library methods I tried just returned heterogeneous matrices, with no real user-defined types. And the function-calling semantics are mind-boggling, with the mixing of keyword and positional arguments leading to all sorts of weirdness...

Re:R in a nutshell = Rpy (1)

tfrayner (186362) | more than 4 years ago | (#32961230)

R does support fully user-defined types, inheritance and polymorphic methods. You just have to want to use them enough to dig through the multiple OO implementations available as part of the core. The commonly used systems, S3 and S4 objects, don't exactly play nicely together. I personally lean towards S4 since it seems much cleaner, but a lot of legacy code still uses S3 so it looks like there won't be a rationalisation of these two systems any time soon. The Bioconductor R modules generally (but not exclusively) use S4, so check those out for examples.
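For readers coming from other languages, the core of S3 is small: a generic function calls UseMethod(), which looks for a method named generic.class for each entry of the object's class vector in order, then falls back to generic.default. The Python sketch below is only a toy model of that lookup; the registry, RObject, use_method, and the describe methods are hypothetical names made up for illustration, and it leaves out NextMethod(), group generics, and all of S4 (setClass(), setGeneric(), setMethod()).

```python
from typing import Any, Callable, Dict, List

# Toy model of R's S3 dispatch. A hypothetical registry maps R-style
# "generic.class" names to Python functions.
METHODS: Dict[str, Callable[[Any], str]] = {}


def method(name: str) -> Callable[[Callable[[Any], str]], Callable[[Any], str]]:
    """Decorator that registers a function under a 'generic.class' name."""
    def register(fn: Callable[[Any], str]) -> Callable[[Any], str]:
        METHODS[name] = fn
        return fn
    return register


class RObject:
    """Stand-in for an R value carrying a class attribute vector."""
    def __init__(self, value: Any, classes: List[str]) -> None:
        self.value = value
        self.classes = classes


def use_method(generic: str, obj: RObject) -> str:
    """Mimic UseMethod(): try generic.<cls> for each class in order,
    then generic.default."""
    for cls in obj.classes + ["default"]:
        fn = METHODS.get(f"{generic}.{cls}")
        if fn is not None:
            return fn(obj)
    raise LookupError(f"no applicable method for '{generic}'")


@method("describe.default")
def describe_default(obj: RObject) -> str:
    return "some object"


@method("describe.animal")
def describe_animal(obj: RObject) -> str:
    return f"an animal called {obj.value}"


@method("describe.dog")
def describe_dog(obj: RObject) -> str:
    return f"a dog called {obj.value}"


rex = RObject("Rex", ["dog", "animal"])  # class vector: most specific first
tom = RObject("Tom", ["cat", "animal"])  # no describe.cat, so 'animal' is used
print(use_method("describe", rex))                        # a dog called Rex
print(use_method("describe", tom))                        # an animal called Tom
print(use_method("describe", RObject(1.5, ["numeric"])))  # some object
```

Inheritance in S3 is nothing more than the order of that class vector, which is why it is so easy to use and so easy to abuse.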

Re:R in a nutshell = Rpy (0)

Anonymous Coward | more than 4 years ago | (#32962630)

Your point about OO is completely false. User-defined types are fully supported through both the S3 and S4 class systems. There are also packages available for more traditional OO and prototype-based programming.

As for your issue with function arguments, I can only say that I read the R-help list daily, and have rarely seen anyone confused by this point. Quantum mechanics is 'mind boggling', not the R function call semantics.

In short, don't put too much stock in what this poster has to say; he obviously hasn't tried very hard.

Re:R in a nutshell = Rpy (2, Informative)

mbakunin (258573) | more than 4 years ago | (#32965486)

Sadly, no. As the other guys said, R does absolutely everything you claim it doesn't. Every positional function argument is a shortcut; you can name any argument explicitly and pass them in any order. Don't put any stock in this recommendation.
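R binds supplied arguments to formals in three passes: exact name matches first, then unique partial (prefix) matches on names, then whatever is left by position. That is why arguments can be named in any order and abbreviated when unambiguous. The Python function below is a simplified model of those passes, assuming the function has no ... formal; the formals in the example are modeled on mean.default(x, trim, na.rm), but match_args itself is hypothetical and not part of any R or Python API.

```python
from typing import Any, Dict, List, Sequence


def match_args(formals: Sequence[str], positional: Sequence[Any],
               named: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified model of R's argument matching (assumes no '...' formal).

    Pass 1 binds exact names, pass 2 binds unique name prefixes to
    still-unbound formals, pass 3 fills what is left by position.
    """
    bound: Dict[str, Any] = {}
    pending = dict(named)

    # Pass 1: exact matching on supplied names.
    for tag in list(pending):
        if tag in formals:
            bound[tag] = pending.pop(tag)

    # Pass 2: partial matching; a prefix must pick out exactly one free formal.
    for tag in list(pending):
        hits = [f for f in formals if f not in bound and f.startswith(tag)]
        if len(hits) != 1:
            raise TypeError(f"argument '{tag}' matches {len(hits)} free formals")
        bound[hits[0]] = pending.pop(tag)

    # Pass 3: positional matching fills the remaining formals left to right.
    free: List[str] = [f for f in formals if f not in bound]
    if len(positional) > len(free):
        raise TypeError("unused positional argument(s)")
    bound.update(zip(free, positional))
    return bound


# Formals modeled on R's mean.default(x, trim, na.rm), minus its '...'.
formals = ["x", "trim", "na.rm"]
print(match_args(formals, [[1, 2, 3]], {"na.rm": True, "tr": 0.1}))
print(match_args(formals, [[1, 2, 3], 0.2], {}))
```

A partial name that fits more than one free formal (for example "v" against value and verbose) raises an error in this sketch, which mirrors R's complaint about an argument matching multiple formal arguments.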

If you are working in python, have discovered that SciPy's stats functions are not ready for prime time (they aren't), and need drop-in replacements, use rpy. Otherwise, you will find it does not play very well with R. It feeds and returns objects in what I found unintuitive and unuseful ways. Yes, you can make it work, so if you're in python already, you should use it. Otherwise, learn and use R when it makes sense, which is roughly 90% of the time doing data analysis.

Re:R in a nutshell = Rpy (1)

acheron12 (1268924) | more than 4 years ago | (#32967558)

Bingo. Much as I like R, the language leaves a lot to be desired compared to Python; it doesn't even have a built-in dictionary type. For a fully integrated package including Python and R, SAGE [sagemath.org] is worth a look.

Re:R in a nutshell = Rpy (0)

Anonymous Coward | more than 4 years ago | (#32973170)

R is much less forgiving than Python, in the sense that when I write Python 90% or more of the time my first draft code does what I intend. First draft R either fails or returns bizarrely mangled results. Of course the reason is that I hadn't thought the problem through sufficiently and/or did not understand the functions or underlying theory thoroughly enough.

However, once you really get R, it is a beautiful language: very 'mathy' and clean. Some of the things you can do with hypergeometric surface distributions are just incredible, and I have only started to scratch the surface of the graphics capability.

The coolest was that I did a research project analysis in R, where I compiled the data, computed scores, analyzed data and generated graphs in R. I wrote the paper in LaTeX. After it was finished, I could update the raw data (in CSVs) and with a few commands I would programmatically regenerate the scoring and all the generated data tables, generate all the graphs and then re-run LaTeX to produce a paper with all the updated data. All this in 2 minutes.
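The workflow described here (raw CSVs in, regenerated tables and a rebuilt paper out) boils down to scripts that write LaTeX fragments the paper pulls in with \input. The commenter did it in R; the sketch below shows the same idea in Python using only the standard library. The column names group and score, and the functions summarise and latex_table, are hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Dict, List, TextIO


def summarise(csv_file: TextIO) -> Dict[str, float]:
    """Mean score per group, from a CSV with hypothetical 'group' and 'score' columns."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(csv_file):
        scores[row["group"]].append(float(row["score"]))
    return {group: statistics.mean(vals) for group, vals in sorted(scores.items())}


def latex_table(means: Dict[str, float]) -> str:
    """Render the summary as a LaTeX tabular that the paper can \\input."""
    rows = [r"\begin{tabular}{lr}", r"Group & Mean score \\", r"\hline"]
    rows += [rf"{group} & {mean:.2f} \\" for group, mean in means.items()]
    rows.append(r"\end{tabular}")
    return "\n".join(rows)


# Stand-in for reading the real CSV from disk.
raw = io.StringIO("group,score\nA,3\nA,5\nB,4\n")
print(latex_table(summarise(raw)))
```

In a real pipeline you would write that string to a file such as table.tex (a hypothetical name), \input it from the paper, and rerun the script and LaTeX whenever the CSVs change. Group names containing LaTeX special characters such as & or _ would need escaping first.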

thank you R....(and LaTex) (0)

Anonymous Coward | more than 4 years ago | (#32954642)

I only wanted to say that having learned R and LaTeX just for doing my PhD thesis (I am a PM now and I have never used them since my dissertation), I would strongly recommend them, especially to those never planning on going back to academia... it's a once-in-a-lifetime opportunity to do beautiful AND useful coding, feel proud of it, and brag to non-geeks/nerds. All at the same time.
PS: my PhD had nothing to do with CS, by the way.

Rattle for R based datamining (2, Informative)

khb (266593) | more than 4 years ago | (#32954730)

Often the reason people get involved in statistical analysis is that there is a body of data and no clue where to start. As inhabitants of the information age, with cheap storage, we have lots of material and often little clue or thought about what the stored data might mean.

http://rattle.togaware.com/ [togaware.com] is a website dedicated to Rattle, an R package that provides a GUI-based data-mining tool (Togaware also has a PDF book that's a great introduction).

Very handy, and the book is very lucid.

Re:Rattle for R based datamining (1)

JanneM (7445) | more than 4 years ago | (#32957554)

I had no idea about rattle. It looks very useful; thanks for the info.

thanks R...(and thank you LaTex) (1)

harmental (1859728) | more than 4 years ago | (#32956156)

I only wanted to say that having learned R and LaTeX just for doing my PhD thesis (I am a PM now and I have never used them since my dissertation), I would strongly recommend them, especially to those never planning on going back to academia... it's a once-in-a-lifetime opportunity to do beautiful AND useful coding, feel proud of it, and brag to non-geeks/nerds. All at the same time. Just priceless..... PS: my PhD had nothing to do with CS, by the way.

Wrong paradigm (0)

Anonymous Coward | more than 4 years ago | (#32956830)

Statistical computing is where it is at but this is the wrong paradigm for it.

To quote Gauss: "Mathematics is the study of relations."

Relational programming is the right paradigm. In statistical terms, a relationship is a "case" and a set of relationships is a "relation". To use SQL terms, a case is a "row" and a "table" is a random variable or more accurately a set of random variables in a relation.

Once you have that idea, you can express your statistical propositions -- the things on which you make observations of cases -- in something relevant like the propositional calculus.
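To make the mapping concrete: below, a relation is a list of cases (rows), a proposition is a predicate used for selection, and projecting one attribute yields the observed values of one variable, which can then be summarised. This is a minimal Python sketch under those assumptions; the data are made-up toy values, and projection here keeps duplicates (bag semantics, as SQL does) rather than forming a set.

```python
import statistics
from typing import Callable, Dict, List

Case = Dict[str, float]     # one observed case (a row)
Relation = List[Case]       # a set of cases over the same attributes (a table)


def select(rel: Relation, prop: Callable[[Case], bool]) -> Relation:
    """Relational selection: keep the cases for which a proposition holds."""
    return [case for case in rel if prop(case)]


def project(rel: Relation, attr: str) -> List[float]:
    """Projection onto one attribute: the observed values of one variable.
    Duplicates are kept (bag semantics), since they matter for statistics."""
    return [case[attr] for case in rel]


# Hypothetical toy relation.
people: Relation = [
    {"age": 34.0, "height": 171.0},
    {"age": 51.0, "height": 165.0},
    {"age": 29.0, "height": 180.0},
]

under_40 = select(people, lambda case: case["age"] < 40)
print(statistics.mean(project(under_40, "height")))  # 175.5
```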

The = vs. == typo problem solution (1)

istartedi (132515) | more than 4 years ago | (#32956948)

The <- approach is interesting, but what's the R notation for "less than negative six"?

And so, the quest continues. Pascal's := might be the best, although I hate to admit it, because Pascal is my "had to deal with in school and was struggling, so I hate it" language.

Re:The = vs. == typo problem solution (0)

Anonymous Coward | more than 4 years ago | (#32958712)

The <- approach is interesting, but what's the R notation for "less than negative six".

And so, the quest continues.

Erm, just *space* -6

Re:The = vs. == typo problem solution (0)

Anonymous Coward | more than 4 years ago | (#32958976)

It's a typo problem. What if you forget the space? You should have inferred that the problem involves not typing the space. Saying "just space" is like saying "just =" for the aforementioned = vs. == problem.
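The typo issue comes from the lexer: R, like most languages, takes the longest operator it can match, so x<-6 reads as an assignment, while x < -6 (or x<(-6)) is a comparison. The toy tokenizer below, written in Python, shows that longest-match rule on a small hypothetical subset of operators; it is not R's actual parser and ignores <<-, ->, and most of the grammar.

```python
import re
from typing import List

# Alternatives are tried left to right, so two-character operators are listed
# before their one-character prefixes: a longest-match ("maximal munch") rule.
TOKEN = re.compile(r"\s*(<-|<=|==|[<>=+\-*/()]|[A-Za-z.][\w.]*|\d+(?:\.\d+)?)")


def tokenize(src: str) -> List[str]:
    """Split a tiny R-like expression into tokens (a toy subset, not R's grammar)."""
    src = src.rstrip()
    tokens: List[str] = []
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


print(tokenize("x<-6"))    # ['x', '<-', '6']            assignment
print(tokenize("x < -6"))  # ['x', '<', '-', '6']        comparison with -6
print(tokenize("x<(-6)"))  # ['x', '<', '(', '-', '6', ')']
```

Pascal's := avoids the clash because neither : nor = on its own begins a comparison operator there, whereas R's < is both a comparison and the first character of <-.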

Re:The = vs. == typo problem solution (1)

u38cg (607297) | more than 4 years ago | (#32961742)

R doesn't permit assigning an assignment, so x-y-2 will bork. x-y -2 will assign a boolean.

Re:The = vs. == typo problem solution (0)

Anonymous Coward | more than 4 years ago | (#32970012)

I'm assuming you meant x-y--2 for assigning an assignment, which works just fine. x-y-2 subtracts the y vector from the x vector, then subtracts 2 from the result.

Re:The = vs. == typo problem solution (1)

u38cg (607297) | more than 4 years ago | (#33022490)

No, just /. munching angle brackets. Bah.

It's for 'Statistical' computing (1)

DynaSoar (714234) | more than 4 years ago | (#32958608)

Serious question here.

I do a lot of statistical analyses, including some I've authored. The book is for the programmer, but R is for statistics and that means someone who actually uses the numbers for something.

SAS has its own language as well as a GUI with menus, and can interchange data structures with many common programs.

SPSS has all these, plus it can record what's pulled down from the menus and generate code in its own language, which is easy to understand, comes as a text file, and can be edited and cut-and-pasted into batch files.

Why should I care about R?

And for the Matlab users: it was never meant to be a stats program. The stats add-on package requires you to learn to write M code first, then learn the package, which takes months to get through, and you have to bring a full complement of statistical know-how with you. If someone made you use Matlab for stats and the two weren't already your bent, someone needs spanked.

Re:It's for 'Statistical' computing (1)

itslifejimbutnotaswe (1173791) | more than 4 years ago | (#32959542)

1. R is free (as in beer and as in speech).
2. R is far more extensible than SAS or SPSS. When what you're modelling doesn't fit in with the predefined options, you can deal with that by extending R in whatever way you wish.
Those are the major advantages of R for statisticians.

Re:It's for 'Statistical' computing (1)

Jungle guy (567570) | more than 4 years ago | (#32959754)

I have used two programs for statistical analysis that share one advantage with R: both are free, and part of the GNU project. Of course, both have disadvantages.

1) PSPP [gnu.org] - a free alternative to SPSS. It does not have every option SPSS has, but in my opinion it is fairly complete and has a lot of power. It is just like "click click. There is the average, the median, the standard deviation, my null hypothesis cannot be rejected, let's go back to work".

2) Gretl - Gnu Regression, Econometrics and Time-series Library [sourceforge.net] - a great tool for econometric analysis. If you are interested only in econometrics, I find it much more powerful than PSPP. If you are an R guru, you can use Gretl (which can be operated from a GUI or a CLI) for most calculations and, whenever you find a dead end, send the data to R.

For me, R is an incredible beast that I would like to tame. But programs like PSPP or Gretl (and SPSS, eViews, etc.) can help me in so many situations that I don't find myself needing R that much.

Re:It's for 'Statistical' computing (0)

Anonymous Coward | more than 4 years ago | (#32960200)

SPSS is a joke. SAS is fine if you have the money to pay for it and don't care about doing anything developed over the last decade. R is growing faster and is used much more by serious statisticians to implement new ideas. Of course, then there is the fact that "SAS also makes people into worse statisticians", to quote Gelman (http://www.stat.columbia.edu/~cook/movabletype/archives/2009/01/r-in-the-news.html).

Re:It's for 'Statistical' computing (1)

kklein (900361) | more than 4 years ago | (#32961454)

R is growing faster and is used much more by serious statisticians to implement new ideas.

I definitely agree with you there. It's easy for them to get their ideas into motion with R, since it's open.

That doesn't necessarily make it better for most people, though. That makes it better for statisticians. Most users of statistics don't need to be on the cutting-edge. In fact, they might need to lag a bit, because peer-reviewers may not accept your paper if they don't actually understand it. I work a lot with IRT, and when I write a paper, I basically have a big chunk of boilerplate I paste in to explain what it is and how the various models work. My research is only as good as my ability to explain it, and if it swerves off-course and becomes a statistics lesson, that just isn't good.

I use SPSS because that's what most people in my field use.

Re:It's for 'Statistical' computing (1)

kklein (900361) | more than 4 years ago | (#32961426)

I use R from time to time. It's great for banging out a quick-and-dirty graph or something. It's so straightforward that if you really know exactly what it is you want to do, it's really fast to do it in R.

However...

I don't think I'll be using it that much now that I was able to get SPSS with the Advanced Stats pack onto my research budget. I'd been using a cracked copy of 11.5 for years, and that's why I had migrated to R. Now that I have SPSS, and didn't have to pay for it... I guess I don't really see the point.

Don't get me wrong. R is unbelievably awesome for free, and isn't even that hard to get the hang of. But when someone else is picking up the tab, SPSS is also free, is easy to use, has very nice documentation, and is supported by all sorts of other software tools I use. Out of all the stuff I use, only LimeSurvey (also FOSS) explicitly supports kicking out files formatted nicely for R, whereas everything else (a bunch of IRT software--I'm a tester--people not programs) just supports .xls and .sav for SPSS...

R is great. Great. But SPSS (I just realized that it's been called PASW for a couple years, but no one uses that name--it's unpronounceable) is the whole package. Overpriced, definitely, but most of its users don't actually pay for it.

Re:It's for 'Statistical' computing (2, Insightful)

u38cg (607297) | more than 4 years ago | (#32961902)

You might think of the difference between, say, Python and C. Both are Turing complete languages, but they are ideal for different problem domains. Or as a professor of mine put it, "SPSS and SAS are, mostly, for solved problems. R is for unsolved problems."

Re:It's for 'Statistical' computing (1)

DynaSoar (714234) | more than 4 years ago | (#32973592)

Thank you all for your comments, I've learned a lot, including the fact that others use the more mundane commercial programs to good advantage too.

The best comment was u38cg's "SPSS and SAS are, mostly, for solved problems. R is for unsolved problems." Certainly most analyses that use tests such as t-tests (don't sniff: it's what we do for fMRI, but on a massive scale) and ANOVA/MANOVA are more easily done with something made to do them than by writing code for such simple things. But it's not whether the problems are solved; it's whether the processes are refined.

What SPSS etc. can't do for me is compare a stack of similar X/Y plots and crank out a plot that gives the average, +variance, -variance, and s.d., then compare two or more such stacks statistically to tell me how similar or different they are (the two are not the same), in total, across each column or line, point by point, and especially within areas that are significant peaks or sinks.
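The pointwise part of this is straightforward to script. Assuming all curves are sampled on the same x grid, the sketch below computes the mean and standard deviation at each x across a stack, and a Welch t statistic at each x between two stacks. It is plain Python for illustration and the function names are hypothetical; in practice scipy.stats.ttest_ind(a, b, axis=0, equal_var=False), or R's t.test() (whose default is the Welch version), would do the per-point test. Judging whole curves or peak regions, and correcting for the many per-point tests, is not handled here.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Curve = Sequence[float]  # y-values sampled on an x grid shared by every curve


def pointwise_summary(stack: Sequence[Curve]) -> Tuple[List[float], List[float]]:
    """Mean and sample standard deviation at each x position across a stack."""
    columns = list(zip(*stack))  # one tuple of y-values per x position
    return ([statistics.mean(col) for col in columns],
            [statistics.stdev(col) for col in columns])


def pointwise_welch_t(a: Sequence[Curve], b: Sequence[Curve]) -> List[float]:
    """Welch's t statistic at each x position, stack a versus stack b.
    Needs at least two curves per stack and some spread at every point."""
    t_values: List[float] = []
    for col_a, col_b in zip(zip(*a), zip(*b)):
        se = math.sqrt(statistics.variance(col_a) / len(col_a)
                       + statistics.variance(col_b) / len(col_b))
        t_values.append((statistics.mean(col_a) - statistics.mean(col_b)) / se)
    return t_values


# Hypothetical stacks of three curves each, sampled at three x positions.
stack_a = [[1.0, 2.0, 3.0], [1.2, 2.1, 2.9], [0.8, 1.9, 3.1]]
stack_b = [[2.0, 2.0, 3.0], [2.2, 2.1, 3.1], [1.8, 1.9, 2.9]]
means, sds = pointwise_summary(stack_a)
print(means, sds)
print(pointwise_welch_t(stack_a, stack_b))
```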

If someone wants to really make a splash, look up SPM (statistical parametric mapping), the analysis we use on fMRI data, come up with something better, and write it. SPM does tens of thousands of t-tests simultaneously and tells us where things differ between conditions or some such. The correction required to keep it all at p < .05 is astounding, 12 decimal places being common. The data get skimmed so hard that... look up the guy who found that a dead salmon can recognize human facial expressions. That's what can happen, and it's a big reason why fMRI sucks (the other having to do with being unable to tell excitatory from inhibitory activity).
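The arithmetic behind the harsh thresholds: with m tests at an uncorrected cutoff alpha, about alpha*m truly null tests pass by chance, and the Bonferroni correction compensates by testing each one at alpha/m. The Python sketch below shows only that simplest correction; SPM's own family-wise corrections are more elaborate, so treat this as an illustration of the scaling, not of what SPM computes. The number of tests is a hypothetical figure, and the sketch does not try to reproduce the 12-decimal-place thresholds mentioned above.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of truly null tests passing an uncorrected cutoff."""
    return alpha * n_tests


def fwer_uncorrected(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


n_voxels = 50_000  # hypothetical number of simultaneous t-tests
print(expected_false_positives(0.05, n_voxels))  # about 2500 spurious "hits"
print(fwer_uncorrected(0.05, n_voxels))          # effectively 1.0
print(bonferroni_cutoff(0.05, n_voxels))         # about 1e-06 per test
```

The dead-salmon result is the first two lines in action: skip the correction and some voxels will light up by chance alone.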

So, maybe I'll write R into my next proposal.
What?
Free?
Great, so I'll use the money to buy a programmer instead.

mmm, R! (1)

hubertf (124995) | more than 4 years ago | (#32969904)

R is a very impressive, mature program that does a hell of a job. What I liked best was connecting R data sets to a PostgreSQL database for my PhD thesis, and then doing statistics on SQL selections without bothering about the SQL bits any more.
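A minimal version of this workflow, sketched in Python: run a SELECT, pull the column out as numbers, and summarise it without touching the SQL again. The standard-library sqlite3 module with an in-memory table stands in for PostgreSQL here (with PostgreSQL you would use a driver such as psycopg2), and the runs table, column_from_query helper, and values are hypothetical.

```python
import sqlite3
import statistics
from typing import List


def column_from_query(conn: sqlite3.Connection, sql: str) -> List[float]:
    """Run a SELECT and return its first column as a list of floats."""
    return [float(row[0]) for row in conn.execute(sql)]


# In-memory SQLite stands in for the PostgreSQL database; data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (host TEXT, seconds REAL)")
conn.executemany("INSERT INTO runs VALUES (?, ?)",
                 [("a", 1.5), ("a", 2.5), ("b", 4.0), ("b", 6.0)])

b_times = column_from_query(
    conn, "SELECT seconds FROM runs WHERE host = 'b' ORDER BY seconds")
print(statistics.mean(b_times), statistics.stdev(b_times))
conn.close()
```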

Also, I see lots of universities in Germany step up and teach R, which I think is good.

  - Hubert
