In the early days of data science, many data scientists came with a math background and as a result I think the field took on some bad practices, at least from a computer science perspective. In this post, I’m going to introduce ten coding practices that will help you write better code.
You might say that better is a subjective term, however, I believe that there are concrete measurements to define good vs. bad code.
- Good code is easy to understand and thus will take less time to write and most importantly debug
- Good code is easy to maintain by other people besides the author
- Writing code well will avoid hidden intent errors–ie errors that exist such that your code executes and does what it’s supposed to do most of the time. Intent errors are the worst because your code will appear to work, but all of a sudden, there will be some edge case or something you didn’t think about and now your code breaks. These are the most insidious errors.
- Good code is efficient.
Ultimately, taking on good coding practices will result in fewer errors, which directly translates to more work (value) being delivered and less effort being spent on fixing and maintaining code. Apparently this is a bigger issue than I realized. When I was writing this article, this other article got posted to my Twitter feed: https://insidebigdata.com/2019/08/13/help-my-data-scientists-cant-write-production-code/. I’ll try not to duplicate the points this author made, but in general, the biggest difference that I see between code most data scientists write and production code is that data scientists tend not to think about reusability.