Academics and Data Science

I received the following comment on an article, Let’s Stop Using the Term Fake Data Scientist, and thought it merited a response. Usually the comments I receive are constructive even if they disagree with what I wrote, but this particular comment demonstrated an arrogance that I believe is a huge problem in the data science world.

You can of course read the original article here, but the basic point was that data science is an interdisciplinary field, consisting of a mixture of computer science, applied mathematics, and subject matter expertise, with a smattering of data visualization and communication skills. I believe that it is inappropriate to label someone a fake simply because their skillset is proportioned differently than that of many math-centric data scientists. I’m also a believer in Dr. Carol Dweck’s thesis on having a growth-oriented mindset (as stated in her book Mindset), and I believe that people who work in data science but whose skills need development in a certain area should be given instruction and assistance rather than derogatory labels.

The author of the comment didn’t identify himself (or herself), but if I had to hypothesize, I would guess that the author is male, probably works in academia, and has a Ph.D. in a hard science such as Physics or Math. My responses are in plain text while the comment is in italics.

This article is a crap and meant to sooth people with undergrad and/or masters degrees titled as “Data-Scientists”.

This opening statement is very revealing about the author’s world view. He clearly views himself as superior to people who do not have a doctorate and does not believe that they are qualified to call themselves data scientists.

Term “Scientist” is awarded to some chosen people which means something. A scientist (whether a “data-scientist” or any other) cares about answering “Why” before “How”. A data-scientist is more into understanding the very intrinsic nature of data. His/her analytical minds constantly tries to find underlying patterns in data.

This statement is just dripping with condescension.  I’m glad to know that “Scientist” is a title which is “awarded to some chosen people.”  Out of mere curiosity, who chooses whether someone merits being called a scientist?  What are the criteria to be “chosen”?  Is holding an advanced degree a requirement? Was Thomas Edison a scientist?  How about the Wright Brothers, Steve Wozniak, Gregor Mendel, Michael Faraday?  As you might have guessed, all those individuals made major contributions and did not have advanced degrees in their fields.

Ph.D. means Doctor/Doctorate in Philosophy. It is awarded to those who have demonstrated/proven and have accepted abilities (by well-known scientific community) in a particular field. These abilities are demonstrated not just once, and not just in one way…these are proven over and over in different ways. Some examples to prove these abilities are:

(1) First getting into a reputable Ph.D. program at a competitive university. Just getting there is a big challenge. You need a track record of good undergraduate and/or masters degree, references, etc.
(2) Winning funding from competitive sources, such as, NSF, NASA, DoD, DoE, etc.   Nope
(3) Ability to teach undergraduates by performing TA duties.  Not unique to doctoral students
(4) Being a research fellow/assistant for someone who has a lot more knowledge and experience than you.  Also not unique to doctoral students
(5) Win funding to attend conferences and to present your research-findings before scientific community.  Nope
(6) Win funding for scientific workshops, camps, etc.  Nope
(7) Pass advanced level courses in your disciplines.  This is also required of Masters’ Students
(8) Pass qualifying/comprehensive exams in your areas of research. 
(9) Be known to current scientific community AND to know current scientific community. Nope
(10) Develop some meaningful scientific methods and discoveries.  Nope
(11) Publish your findings in reputable conferences and/or journals (not some crappy, low level conferences/journals).  Nope
(12) Convince scientific and/or federal organizations to give you research grant (which is very big achievement as you are evaluated by your peers anonymously and they trust your abilities by providing your funding coming from tax-payers money).   Nope
(13) Graduating with a Ph.D.. It is estimated that only 64% Ph.D. students actually graduate with a Ph.D. degree in the USA. Most drop-out as they are unable to complete above steps. They complete only a few, but not all so they fail. Also, there are only 1% PhDs in the USA. So earning a Ph.D. from a reputable institution under the supervision of a real scientist is a big achievement. It changes your title from Ms./Ms. to Dr. Does that mean something to undergraduates/Masters?  So what? I worked at the CIA, which hires ~0.2% of applicants. Does that make me superior to the hundreds of thousands of people who weren’t hired? (It doesn’t.)

The thing about the list above is that it is a mix of requirements for becoming a professor/researcher, not for earning a Ph.D. I know many Ph.D.s who work in industry and couldn’t care less about publishing through “reputable conferences or journals”. Getting grants is also not a requirement for a Ph.D., but it is part and parcel of working in academia. Again, I know many data scientists, engineers, and medical doctors (a.k.a. real doctors) who couldn’t care less about publication.

The point here is not to belittle or demean academic researchers. Far from it. But as most people have probably noticed, there are no departments of data science (not yet at least) at any major universities. Also, if you look at the real pioneers in data science, those individuals who made early groundbreaking contributions to the field, they don’t exactly meet these criteria.

I can continue defining characteristics of a “Real Scientist”. Giving someone a title of “Data Scientist” without him/her having any solid track-record listed above is a joke and an abuse to scientific community.

Well, in that case, many of the best data scientists in the world would be a joke to the author. Virtually every list of the best data scientists in the world includes many individuals who do not have the “above track record”.

Above mentioned scales are standard in most universities. It means that not everyone has ability to sustain pressure of carrying out research and prove himself/herself. This scale filters out those who do not deserve to be a scientists. It could be due to personal problems or perhaps nature did not give them ability to cross that line which separates a “Real scientist” from a “Non-real scientist”.

Again, I think the author here is confusing a researcher/professor with an applied scientist.

It is said that when you can not give someone monetary promotion, give them title (recognition) award so that they calm down and feel better about themselves. Most IT companies now-a-days award you “Data-Scientist” position even if you have an undergraduate/masters degree.

This person clearly has no idea what he/she is talking about. I know of no companies that award you a “data-scientist” position to calm you down and make you feel better about yourself. I also really love the implication that nobody is entitled to the title “Data Scientist” if they “only” have an undergraduate/masters degree. I’ll repeat the earlier point that some of the world’s best data scientists “only” have undergraduate/masters degrees.

Being able to perform some basic statistical analysis, writing regression/classification/clustering model in R, Python, etc. does not make you a data-scientist. These are just tools. Everyday new tools appear in market. Nothing special about them. You can for sure call yourself a “Data-Analyst”, but trust me, if you meet an actual scientist, s/he will be humored to hear you calling yourself a “Data-Scientist” with an undergraduate/masters degree.

Again, this is where the author demonstrates his complete ignorance of the field.  Yes, doing some simple analysis in R or Python does not make you a data scientist.  However, in my book, you can call yourself a data scientist if you can take an organization’s raw, messy data and extract value from this data in a rigorous, repeatable process to solve a business problem.

I’ve never understood the academic attitude towards tools. When I was an undergrad, my CS department refused to teach C++ or C because it was a “skill” which they felt was beneath them. With respect to the tools, in the data science world, and especially in the big data arena, the tools are developing so rapidly that if you are not constantly keeping up to date with what is available, you could very easily find yourself wasting enormous amounts of time and money, and ultimately losing ground to your competition. Let me put it another way: understanding the tools in the data science ecosystem is an absolutely vital part of being a data scientist.

Finally, a real data-scientist does not give a damn about business/profit, etc.. For a data-scientist a data is just a mixture of numbers, characters, sentences, etc. Data-scientist in interested only in finding hidden patterns in data. It is similar to what a patient is to a medical doctor; just a subject who has some known/unknown symptoms. Doctor’s job is to treat those symptoms whether that patient is the president or a criminal.

See, this is where the author is totally wrong. While a data scientist may not give a damn about business/profit, the company that hires him or her to solve its data problems and create value from its data will definitely care about business/profit. I would submit that if data scientists wish to remain employed, it is vital that they have an intimate understanding of the domain in which they work. At a higher level, what this also shows is that the author does not grasp that data science is an applied discipline, similar to engineering or perhaps medicine.

Finally, I’ll conclude by stating that this person’s attitude is precisely what is wrong with data science in many organizations: specifically that they are populated with arrogant jerks who don’t understand that a data scientist’s job is to extract value from an organization’s data.  

Data science is interesting, challenging, and exciting work with the potential to have an enormous impact in countless disciplines, from online advertising to medical research. However, we really need to leave the exclusivity and arrogance at the door. The Lubavitcher Rebbe used to say to people, “If you know Aleph (the first letter of the Hebrew alphabet), teach it to someone who doesn’t.” The implication is that no matter what level of knowledge you have, there is probably someone who lacks it, and you can teach them something. If we want to inspire others to learn more about data science and incorporate it into their work, the way to do this isn’t through condescension and arrogance, but rather through a positive attitude and a willingness to mentor those who haven’t yet learned these techniques.

