A lot of people ask me how to further develop their data science skills. I can only share what works for me, which is that I read a lot, and I really mean a lot. With that said, here are my favorite books and blogs that I would recommend to anyone who wants to learn more about data science.
Non-Technical Books
Taming the Big Data Tidal Wave
(From Inside flap)
Big data is sprouting up everywhere. Leveraging it today will drive competitive advantage; ignoring it puts your organization at risk of lagging behind the competition. To keep your edge razor-sharp, you must aggressively pursue capturing and analyzing these new data sources to extract the insights that they offer . . . starting today! Taming the Big Data Tidal Wave provides you with the tools, processes, and methods you need to tackle big data and cultivate a culture of innovation and discovery in your organization.
With its unique focus on the effective analysis of these increasingly popular data sources, Taming the Big Data Tidal Wave shows you step by step how to design, incorporate, and profit from a world-class advanced analytics ecosystem in today’s big data environment. Author and analytics expert Bill Franks shares his “from the trenches” perspective to show you how big data is changing the world of analytics. He maps out an easy-to-follow action plan to help your organization uncover new business opportunities, implement new business processes, and make better-informed decisions.
So many books on this topic are frankly either thinly disguised marketing BS, or just plain BS. This one is neither. I particularly liked the third part, where Mr. Franks discusses the people and approaches for integrating big data into an organization. I found myself nodding in agreement on many occasions. In one particular instance, Mr. Franks discusses the phenomenon of “Shadow IT”, or unauthorized IT. His point, which I found striking, was that IT should be an enabling force for corporations but is often far too restrictive, and the result is an outcropping of shadow IT. If you are interested in learning more about integrating advanced analytics into an organization, I would highly recommend this book.
Creating a Data Driven Organization
I’m not going to beat around the bush: I really liked this book. In Creating a Data Driven Organization, Anderson effectively discusses what it really means to be a data driven organization. I’ve given copies of this book to many of my managers, but I would recommend it both to data scientists and to individuals who are managing data driven organizations. For data scientists or other analytic practitioners, this book will help you think in terms of creating an organization around data.
Data Driven Security
Data Driven Security is a really solid overview of the entire data science process, but what makes this book particularly valuable is that it was written by security professionals for security professionals. Nearly every other data science book I’ve read was written for the general case, whereas this book is unique in that it focuses on a particular use case: the application of data science to security. On top of that, it is very well written. Rudis and Jacobs use both R and Python examples in this book, which is good because it keeps the focus on the concepts rather than on a specific implementation. I also really like that this book covers topics beyond machine learning, including visualization, and actually shows these techniques being applied.
Automate the Boring Stuff with Python
Automate the Boring Stuff with Python is an excellent book for those seeking to learn how to apply Python in real-world scenarios. If you’ve never coded before and want to learn how to do so in a business context, this is the book for you. I use this book in my beginning Python classes, and find that anyone can understand the concepts presented here. There is also a companion website (https://automatetheboringstuff.com) where you can read virtually the entire book online for free. My favorite part about this book is that it really shows you the relevant parts of the Python language. You won’t learn how to make video games, or create a py-pet, but you will learn how to open spreadsheets and crunch data. Bottom line, if you’re a working professional and want to learn to code, this is the book for you.
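To give a flavor of what "open spreadsheets and crunch data" can look like, here is a minimal sketch of my own (not taken from the book) that totals a column by category using only the standard library. The column names `region` and `amount` and the sample rows are made up for illustration; with a real file you would pass `open("sales.csv", newline="")` instead of the in-memory string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sales data standing in for a real CSV export of a spreadsheet.
SAMPLE_CSV = """region,amount
East,120.50
West,80.00
East,30.25
West,19.75
"""


def total_by_region(csv_file):
    """Sum the 'amount' column for each distinct value in the 'region' column."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


# io.StringIO lets the sample string be read like an open file.
totals = total_by_region(io.StringIO(SAMPLE_CSV))
for region, total in sorted(totals.items()):
    print(f"{region}: {total:.2f}")
```

A dozen lines like these can replace a manual copy-and-paste routine, which is the kind of task the book is aimed at.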
Python for Data Analysis
Python for Data Analysis by Wes McKinney is one of those seminal texts which every data scientist must own. Really… This book introduces you to the core data analysis modules of NumPy, SciPy and Pandas and introduces the concept of vectorized programming, which carries over to R, Spark, or pretty much any other modern data science language that you will use. McKinney is the original author of Pandas, so you really are getting the information directly from one of the key developers of the ecosystem.
Update: A second edition was just released!
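If vectorized programming is new to you, here is a minimal sketch of the idea using NumPy. It is my own illustration rather than an example from the book, and the `prices` and `quantities` values are made up. The loop and the vectorized expression compute the same thing, but the vectorized version operates on whole arrays at once.

```python
import numpy as np

# Hypothetical order data: unit prices and quantities for four line items.
prices = np.array([10.0, 20.0, 30.0, 40.0])
quantities = np.array([1, 2, 3, 4])

# Loop version: multiply one pair of elements at a time.
loop_totals = []
for price, quantity in zip(prices, quantities):
    loop_totals.append(price * quantity)

# Vectorized version: one expression multiplies the arrays element by element.
vector_totals = prices * quantities

# Both approaches produce the same line totals.
assert np.allclose(loop_totals, vector_totals)
print(vector_totals.sum())
```

On large arrays the vectorized form is usually much faster as well as shorter, because the element-by-element work happens in compiled code rather than in a Python loop.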