SocProf reviewed Data Mining for the Social Sciences by Paul Attewell
Review of 'Data Mining for the Social Sciences' on 'Goodreads'
5 stars
I really liked this book, but it is not for beginners. The book assumes familiarity with standard statistical techniques and their shortcomings, and builds on that to argue for applying data mining techniques in the social sciences.
The book is divided into two parts: (1) a more conceptual view of what data mining is and why social scientists should adopt data mining techniques; (2) a series of worked examples using different techniques and different software packages (JMP Pro, SPSS, Stata, R).
If you are already familiar with data mining, you will recognize many of these techniques and concepts: cross-validation, LASSO, VIF regression, PCA, clusters, classification techniques, random trees and random forests, association rules, and LCA.
The authors do a really good job of explaining the value of these techniques and the types of data on which they perform best. In the Kindle edition, the screenshots of software screens are too small (fortunately, I also had a paperback to follow along). The only issue I had is that the authors use the same dataset over and over. I would have liked more variety, as well as more detailed explanations of the software output. Relying on a single dataset flies in the face of trying to convince social scientists to make greater use of data mining techniques.
In the end, I really liked the book and found it very clear, but again, it is not for beginners.