
The Difference between Software Development and Data Science

I am fortunate enough to get regular messages from recruiters on LinkedIn asking to speak with me about software development jobs. Here’s the thing… I’m not a software developer; I do data science and data analytics. For the last seven years, my job title has included the words “data” and “scientist”. I have never held a position with the words “Software” and “Developer” in the title. I have taught, and am currently teaching, classes with titles such as “Data Science for Security Professionals” and “Applied Data Science for Security”. All of this is on my LinkedIn profile, and yet the messages continue.

On some level, it makes sense. If you look at my resume, you’d see that I have a degree in computer science, experience with various programming languages, and projects on GitHub. Hell, I’m a committer for Apache Drill…

So what’s the difference between a data scientist and software developer?

Let me first say that I have nothing against the profession of software development. Software developers have changed the world for the better on an unimaginable scale, and for that I’m eternally grateful. I do not mean to disparage software developers in any way, or to make it sound as though I think I’m better than a developer. I have nothing but respect for software developers who have spent countless hours building the tools, apps and programs that I use every day. I wish I could code at that level.

So, what’s the difference? Well, I believe that while the skills are technically similar, how they are applied is quite different. Consider the difference between a cabinet maker and a carpenter. Both work with wood and use common woodworking tools such as saws and hammers, but their work products are quite different. Carpenters typically build things like houses and buildings, whereas cabinet makers build… cabinets. There are many other examples of professions that use similar skill sets for different purposes, such as physicists and engineers, or surgeons and pathologists, and I could go on.

Both developers and data scientists need to understand technology, and both need a good grasp of math, although the math that data scientists use regularly will be more statistical in nature, whereas the math that developers use will be more “computer science-y”. (I took a ton of CS/Math classes and I really have no way of describing them other than computer science-y.) But here’s where the two groups really start to diverge.

Data scientists write code as a means to an end, whereas software developers write code to build things. Data science is inherently different from software development in that data science is an analytic activity, whereas software development has much more in common with traditional engineering. Data scientists tackle problems such as identifying fraudulent transactions, or predicting which employees are likely to leave a company. Software developers can take the data scientists’ models and turn them into fully functioning systems with production-quality code. Software developers tackle problems like getting an algorithm to run more efficiently, or building user interfaces.

Update: Credit for the Venn diagram goes to Stephan Kolassa, who originally published it here: https://datascience.stackexchange.com/a/2406/2853. I’ve always felt that Drew Conway’s original diagram, despite becoming the de facto standard for defining data science, lacked something. I created my own, which included some of the things on Stephan’s, but it was not nearly as elegant as his.

Who Cares?  Isn’t this just Semantics?

No… It matters a lot, because if you hire a data scientist to do software development work they will run for the hills. Right now data science is a high-demand skill, and if you give a good data scientist work that doesn’t align with their skill set or interests, they will leave in short order. I don’t know if the inverse is true, but I do know that we data science types are pretty protective of our pedigree and title.

More important than the title: if you hire a data scientist and expect them to be a software developer, you are wasting a lot of your time and money. If you have hired a data scientist, put them to work identifying opportunities to use data science in your organization. If they are any good, they’ll find many places in which data science techniques can be used to improve your organization. If they’re really good, they’ll teach your development teams some data science skills, and then you’ll have a whole team of people looking for opportunities to deploy data science!


2 Comments

  1. Reading this leaves me feeling that data scientists use pre-existing software tools to draw meaning from existing data. Is this generally the case?

    • Charles Givre

      Hi Dennis,
      A lot of the time data scientists do in fact make extensive use of analytic libraries rather than implementing algorithms from scratch. The Python examples would be NumPy, SciPy, Pandas, Scikit-learn, Matplotlib and others. The R examples would be dplyr, caret, ggplot2 and others.
      Really, it makes sense to use these libraries, as they contain many optimizations that make the computation much more efficient than if you were to implement the algorithms yourself. IMHO a data scientist’s time is better spent using the libraries than building and debugging these algorithms.
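
      As a rough illustration of that last point, here is a minimal sketch that computes a sample standard deviation twice: once written out by hand and once with a library call. To keep it self-contained it uses Python's built-in `statistics` module as a stand-in for the libraries named above, and the `amounts` list and the `sample_stdev_by_hand` helper are made up for this example.

```python
import math
import statistics


def sample_stdev_by_hand(values):
    """Sample standard deviation written out from the textbook formula."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical transaction amounts, invented for illustration only.
amounts = [12.5, 40.0, 7.25, 19.99, 250.0, 33.1]

by_hand = sample_stdev_by_hand(amounts)
from_library = statistics.stdev(amounts)  # one call, already written and tested

# Both routes should agree up to floating-point rounding.
print(math.isclose(by_hand, from_library))
```

      The hand-written version is short here, but it is still code that someone has to test and maintain, and for anything larger (regression, clustering, plotting) the gap grows quickly.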
