Skip to content

Category: Data Science

Tips for Debugging Code without F-Bombs – Part 1

Debugging code is a large part of actually writing code, yet unless you have a computer science background, you probably have never been exposed to a methodology for debugging code.  In this tutorial, I’m going to show you my basic method for debugging your code so that you don’t want to tear your hair out.

In Programming Perl, Larry Wall, the author of the PERL programming language said that the attributes of a great programmer are Laziness, Impatience and Hubris:

  • Laziness:  The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don’t have to answer so many questions about it. Hence, the first great virtue of a programmer.  (p.609)
  • Impatience:  The anger you feel when the computer is being lazy. This makes you write programs that don’t just react to your needs, but actually anticipate them. Or at least pretend to. Hence, the second great virtue of a programmer. See also laziness and hubris. (p.608)
  • Hubris:  Excessive pride, the sort of thing Zeus zaps you for. Also the quality that makes you write (and maintain) programs that other people won’t want to say bad things about. Hence, the third great virtue of a programmer. See also laziness and impatience. (p.607)

These attributes also apply to how to write good code so that you don’t have to spend hours and hours debugging code.

The Best Way to Avoid Errors is Not to Make Them

Ok… so that seems obvious, but really, I’m asking another question and that is: “How can you write code that decreases your likelihood of making errors?”  I do have an answer for that.   The first thing is to remember is that bugs are easy to find when they are small.  To find bugs when they are small, write code in small chunks and test your code frequently.  If you are writing a large program, write a few lines and test what you have written to make sure it is doing what you think it is supposed to do.  Test often.  If you are writing a script that is 100 lines, it is MUCH easier to find errors if you test your code every 10 lines rather than write the whole thing and test at the end.  The better you get, the less frequently you will need to test, but still test your code frequently.

Good Coding Practices Will Help You Prevent Errors

This probably also seems obvious, but I’ve seen (and written) a lot of code that leaves a lot to be desired in the way of good practices.  Good coding practices mean that your code should be readable and that someone who has never seen your code before should be able to figure out what it is supposed to do.  Now I know a lot of people have the attitude that since they are the only one working on a particular piece of code, then they don’t need to put in comments.  WRONG WRONG WRONG  In response, I would ask you in 6 months, if you haven’t worked on this, would you remember what this code did?   You don’t need to go overboard, but you should include enough comments so that you’ll remember the code’s purpose.

Here are some other suggestions:

  1. Adopt a coding standard and stick to it:  It doesn’t matter which one you use, but pick one and stick to it.  That way, you will notice when things aren’t correct.  Whatever you do, don’t mix conventions, ie don’t have column_total, columnTotal and ColumnTotal as variables in the same script.
  2. Use descriptive variable names:  One of my pet peeves about a lot of machine learning code is that they use X, Y as variable names.  Don’t do that.  This isn’t calculus class.  Use descriptive variable names such as test_data, or target_sales, and please don’t use X, Y, or even worse, i, I, l and L as variable names.
  3. Put comments in your code:  Just do it.
  4. Put one operation per line:  I know that especially in Python and JavaScript, it is fashionable (and “Pythonic”) to cram as many operations onto one line as possible via method chaining.  I personally think in series of steps and it is easier to see the logic (and hence any mistakes) if you have one action per line.

Plan your program BEFORE you write it

I learned this lesson the hard way, but if you want to spend many hours writing code that doesn’t work, when faced with a tough problem, just dive right in and start coding.  If you want to avoid that, get a piece of paper and a pen (or whatever system you like) and:

  1. Break the problem down into the smallest, most atomic steps you can think of
  2. Write pseudo-code that implements these steps.
  3. Look for extant code that you can reuse

Once you’ve found reusable code, and you have a game plan of pseudo code, now you can begin writing your code.  When you start writing, check every step against your pseudo code to make sure that your code is doing what you expect it to do.

Don’t Re-invent the Wheel

Another way to save yourself a lot of time and frustration is to re-use proven code to the greatest extent possible.  For example, Python has a myriad of libraries available at Pypi and elsewhere which really can save you a lot of time.  It is another huge pet peeve of mine to see people writing custom code for things which are publicly available.  This means that before you start writing code, you should do some research as to what components are out there and available.

After all, if I were to ask you if you would rather:

  1.  Use prewritten, pretested and proven code to build your program OR
  2. Write my own code that is unproven, untested and possibly buggy

the logical thing to do would of course be to do the first.

In Conclusion

Great programmers never sit down at the keyboard and just start banging out code without having a game plan and without understanding the problem they are trying to solve.  Hopefully by now you see that the first step in writing good code that you won’t have to debug is to plan out what you are trying to, reuse extant code and test frequently. In the next installment, I will discuss the different types of errors and go through strategies for fixing them.

Leave a Comment

The Case for Generalist Data Scientists

I recently read an article by Daniel Tunkelang entitled Data Scientists: Generalists or Specialists? and it resonated with me.  I’ve been involved with hiring data scientists for some time now and I also get a lot of recruiters contacting me about various data science jobs.  My general observation is that when companies search for data scientists, they tend to use the equation (Machine Learning = Data Science), and tend to play down all the other skills that make up data science, such as creativity, critical thinking, data preparation etc.

Tunkelang writes:

Early days

Generalists add more value than specialists in a company’s early days, since you’re building most of your product from scratch and something is better than nothing. Your first classifier doesn’t have to use deep learning to achieve game-changing results. Nor does your first recommender system need to use gradient-boosted decision trees. And a simple t-test will probably serve your A/B testing needs.


Later stage

Generalists hit a wall as your products mature: they’re great at developing the first version of a data product, but they don’t necessarily know how to improve it. In contrast, machine learning specialists can replace naive algorithms with better ones and continuously tune their systems. At this stage in a company’s growth, specialists help you squeeze additional opportunity from existing systems. If you’re a Google or Amazon, those incremental improvements represent phenomenal value.

So, should you hire generalists or specialists? It really does depend—and the largest factor in your decision should be your company’s stage of maturity. But if you’re still not sure, then I suggest you favor generalists, especially if your company is still in a stage of rapid growth. Your problems are probably not as specialized as you think, and hiring generalists reduces your risk. Plus, hiring generalists allows you to give them the opportunity to learn specialized skills on the job. Everybody wins.

Read the complete post here on O’Reilly.com.  What needs to be noted here is that companies will need more specific skills as their analytics mature and evolve, however in the beginning creativity, competence and critical thinking are most likely the most important skills.  I tend to agree with a lot of what Tunkelang writes, and I do get the sense that a lot of hiring managers believe their projects are a lot more mature and advanced than they really are.  Thoughts?

Leave a Comment