Review of 'Data Science in Education Using R' on 'Goodreads'
5 stars
This book is directed at K-12 educators, but a lot of it is applicable to higher ed. However, it is more advanced, in terms of R programming, than the authors seem to assume. There is a big leap between the early, foundational chapters and the walkthrough (worked problems) chapters that actually involve R coding. Conveniently, the online (and free) edition of the book allows copying blocks of code, so one does not have to retype them to follow along.
I wish there'd been more attention and space given to explaining the output of the manipulations involved. There were a couple of weird things in the early walkthroughs after going from wide to narrow dataframes (a step that itself required more explanation), such as the same student ID getting different genders. Similarly, the multilevel models walkthrough should explain how to choose between models, how to interpret the output, and what to do when one model returns almost no significant correlation but a small tweak returns significance on almost all the variables.
Lastly, in the random forests walkthrough, the last manipulation returned wildly different results than the book's, even though I used the authors' exact code. As a Windows user, I had been warned that the results might be a little different, and they were, but only minimally so until the last bit of code, where the differences were massive.
That being said, there is a lot to recommend this book. It is written very clearly. The dataedu package is very helpful, as is the online version of the book where the code can be copied (I did get a paperback copy nonetheless, to support the effort).
I am tempted to put this book in the hands of some of our data-obsessed administrators just to show them how wrongly they're going about a lot of things.
