An Introduction to Statistical Learning: with Applications in R

Author: Daniela Witten, Gareth James, Robert Tibshirani & Trevor Hastie

Recommended by

Roger D. Peng

"This book is written by a powerhouse of authors in the machine learning community, true authorities in the field. But beyond that, they’re also great writers. There’s another book by the same publisher called The Elements of Statistical Learning which is a bit more advanced, but this one can capture a much wider audience. If you really want to get into the guts of the models and statistics tools that are being used today, this is a great reference and a great way to learn. They have a ton of code out there to go with the book, including an R package to implement the models, run the examples, etc. Of the many books that you could choose in this category, it’s really one of the better ones. I’ve been using R for twenty years now. I started using it when I was in school, and I didn’t know much about Python back then; I saw it mostly as a scripting language. Now, twenty years later, it’s hard to teach an old dog new tricks! That said, I think it would have been a very different story if R had not evolved the way that it has. It has grown tremendously, the ecosystem and the community have become huge, to the point that there are more things that you can do with it now that you could possibly learn. For the work that I do, it’s the perfect tool. Support Five Books Five Books interviews are expensive to produce. If you're enjoying this interview, please support us by donating a small amount . There are different phases in data science on any given project, and some tools are more suitable to some phases than others. In the first phase of exploring and looking at the data, I think that pretty much any tool is useful there; all you want is something you’re familiar with, so that you can work quickly without the tool getting in the way. But as you get to the final stages like modeling and producing the final results, you want to make sure that you can ensure things like reproducibility, consistency, and robustness. And here, Python and R are obviously two good languages. 
Those things aren’t necessarily mutually exclusive. Most people can actually start with some off-the-shelf algorithm and see how it performs; but that will only take you so far. As time goes on, I think you’ll quickly reach the limits of off-the-shelf solutions. Once that happens and for whatever reason your pre-packaged machine learning software isn’t performing what you need it to do, you have to know what’s going on if you want to make changes, or even do something completely new. Having an understanding of what’s going on underneath the software, and being able to make improvements, is like an edge that you can carry with you in your career."

Data Science · fivebooks.com