There are a lot of bytes floating about the internet making the case for who is a data scientist, including a whole subset of posts about who are “real” data scientists. I have observed that this philosophy manifests itself in data science training programs as well. Right now, if you want to become a data scientist, it’s an all-or-nothing proposition. I would place the available programs in three categories:
- Online training from companies such as Coursera, Udacity, and others
- 12-14 week bootcamp-style data science courses
- University programs
Many of these programs look amazing, and if I had the time I would love to take one (or all) of them. With that said, all of them assume that their students will emerge as data scientists and, as a result, they teach a comprehensive set of skills.
A Different Approach
There are certain professions in which you either are a member or are not, and non-members practicing the profession can have disastrous consequences. Medicine is one example: you either are a medical doctor or you are not, and when individuals who are not qualified to practice medicine dispense medical advice, the results are often harmful, as best illustrated by the anti-vaccine movement, which at this point is being pushed by many celebrities rather than legitimate medical professionals. Engineering is another example. You wouldn’t want to drive over a bridge that I designed because, well… I have no idea how to design one.
However, these professions all have professional associations and government regulators who determine who has the requisite skills to call themselves members of that profession. This does not exist for data science, for good reason. Data science, as practiced today, is a mixture of several other disciplines and can be applied to nearly any other discipline. The breadth of skills that fall under the data science umbrella is enormous.
In 2013, O’Reilly published a short booklet entitled Analyzing the Analyzers, which identified four main groupings of individuals who considered themselves data scientists:
- Data Creative
- Data Developer
- Data Researcher
- Data Businessperson
You can see from the graphic on the left that the skill breakdown is far from homogeneous. Data science differs from other professions in this respect: in professions such as medicine, practitioners MUST have a core set of knowledge in order to be a member of that profession. As illustrated in the chart on the left, that is not really the case for data science. One could be a machine learning master, yet have no experience with big data, and still legitimately be said to be doing data science work. Therefore, I’d like to propose that data science be viewed as a spectrum of skills rather than a binary condition.
In short, I believe that data scientists spill too much ink trying to label individuals as “real” (or as “fake”) data scientists. Instead, data scientists should spend time determining what skills actually make up data science.
Teaching Data Science Skills Rather than Data Science
This slight twist on the approach has implications for training. If data science is no longer an exclusive club, but rather a collection of skills, then those skills, or portions thereof, can be taught to individuals who have no interest in becoming data scientists. In other words, data science training could be about teaching data science skills to anyone who does analytic work, so that they can incorporate those skills into their workflow. The goal of this kind of training would not be to mint full data scientists, but rather to teach individuals the data science skills relevant to their profession. This kind of training could be a lot shorter and less comprehensive, but in the end, I believe that it would be more practical for the thousands of individuals out there who want to incorporate data science into their work but have neither the time for a comprehensive program nor the desire to become data scientists. Mind you, I am not proposing dumbing data science down. Rather, I am suggesting that, in addition to what is currently offered, data science training can be effective if it is tailored to specific audiences and focuses on the techniques that would be directly relevant to those audiences.
The data science world today views data science as a binary condition: either you are a data scientist or you are not. However, data science should be viewed as a collection of skills which virtually any professional can incorporate into their workflow. If data science training were viewed in this context, organizations could increase their use of data science by training their current staff in the data science skills that are relevant to their work.