Teaching Data Science in English (not in Math)

I spend most of my time now teaching others about data science, so I do a lot of research into what is going on in data science education. With that in mind, I decided to take an online machine learning course, and it led me to a serious question: why don’t we use pseudo-code to teach math concepts?

Consider the following:

This is the formula for Residual Sum of Squares, which if you aren’t familiar, is a metric used to measure the effectiveness of regression models.
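
In a common textbook form, it can be written as follows. The symbols are an assumption here and may differ from the notation the course used: n is the number of observations, y_i an actual value, and \hat{y}_i the model's prediction for it.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```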

Now consider the following pseudo-code:
residuals_squared = (actual_values - predictions) ^ 2
RSS = sum( residuals_squared )
This example expresses the exact same concept and, while it does take up more space on the page, it is, in my mind at least, much easier to understand. I don’t have any empirical data to back this up, but I suspect that many of you would agree.
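
The pseudo-code above translates almost word for word into a real language. Here is a minimal sketch in plain Python. The function name residual_sum_of_squares and the sample numbers are made up for illustration, and `**` is used for squaring because `^` means something else (bitwise XOR) in Python.

```python
def residual_sum_of_squares(actual_values, predictions):
    """Sum the squared differences between actual values and predictions."""
    if len(actual_values) != len(predictions):
        raise ValueError("actual_values and predictions must have the same length")
    # Square each residual (actual minus predicted), one pair at a time.
    residuals_squared = [
        (actual - predicted) ** 2
        for actual, predicted in zip(actual_values, predictions)
    ]
    return sum(residuals_squared)


# Illustrative data only: three observations and a model's guesses for them.
actual_values = [3.0, 5.0, 7.0]
predictions = [2.5, 5.0, 8.0]

rss = residual_sum_of_squares(actual_values, predictions)
print(rss)  # 0.25 + 0.0 + 1.0 = 1.25
```

The variable names are the same ones used in the pseudo-code, so each line can be read aloud in English without looking anything up.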

Greek Letters are Jargon

Another thing I’ve realized is that part of the reason math becomes so difficult for people is that it is taught entirely in jargon, shorthand, and shorthand for shorthand. The Greek letter sigma represents a sum, but if you don’t know that, it represents confusion. If you aren’t familiar with this formula, the other Greek letters could be meaningless. If we used pseudo-code instead, any part of the formula could be rewritten using English words (or those of any other language) and thus be easily understood by anyone.

I’m working on developing a short course in Machine Learning called Crash Course in Machine Learning which I will be teaching at the BlackHat conference in August. I’m curious as to what people think about presenting algorithms using pseudo-code instead of math jargon. I suspect it will make it easier for people to understand without diluting the rigor.

5 Comments

Demystifying Linear Regression with Python – revelar.io

[…] a compelling post about some challenges in machine learning education. You can read it for yourself here, but I’d like to build upon what he posted; it’s bothered me for a long time that the […]

February 28, 2016
Will Dwinnell

You make a good point. I’ve written a small posting related to this, on my own Web log, “Data Mining in MATLAB” ( http://matlabdatamining.blogspot.com/2016/03/matlab-as-near-pseudocode.html ), revolving around my MATLAB implementation of your pseudo-code:

residuals_squared = (actual_values - predictions) .^ 2;
RSS = sum( residuals_squared );

March 7, 2016
Victor

This would be wonderful! Explaining data science concepts in this manner would hit the core of the concepts head on. This would be similar to explanations of math concepts on http://www.betterexplained.com

I took a course in data science with General Assembly in DC and it was helpful.
I urge you to check out this book (http://www.amazon.com/Fundamentals-Machine-Learning-Predictive-Analytics/dp/0262029448/ref=sr_1_4?s=books&ie=UTF8&qid=1457751712&sr=1-4&keywords=machine+learning)

Great blog! bookmarked

March 12, 2016
MATLAB As (Near-)PseudoCode | A Bunch Of Data

[…] “Teaching Data Science in English (Not in Math)”, the Feb-08-2016 entry of his Web log, “The Datatist”, Charles Givre criticizes the use […]

April 3, 2016
Teaching Data Science in English (not in Math) — The Dataist – jiangose

[…] via Teaching Data Science in English (not in Math) — The Dataist […]

October 31, 2017
