
Category: Training

Back to BlackHat…For the 5th Time!!

Happy belated New Year, everyone! I’ve been taking a bit of a blog break as I’ve been quite busy between work, personal travel, and working on my startup, GTK Cyber. But I’m back now and have some exciting news! My team and I have been accepted to teach our Applied Data Science course once again at BlackHat in Las Vegas! This year we’ve made a major change to our course: it’s now a full four days instead of two!


Why more women don’t code: A heartbreaking story with a good ending

I’ve been reading a lot lately about the ills of the tech industry, with a few book reviews in my queue to finish, and I posted a question on LinkedIn about what inspired people to get into tech. My motivation was to see if there was a difference between men and women. My hypothesis is that there are societal and cultural factors which discourage girls and women from studying tech (math, computer science, engineering, etc.), hence there aren’t enough qualified women to fill the tech jobs, and ultimately we end up with the current state of affairs where men outnumber women 3 or 4 to 1 in most tech companies.

Anyway, I received the following private response from a former student to whom I shall refer as S.  S was a student in one of my recent classes and a delight to work with.  Her story is absolutely heartbreaking and needs to be heard.  I lightly edited it, only to remove some details which would identify her.

You can share my story but I am not ready to have my name on it. I am going to be looking for a job soon and not everyone will appreciate it. They will see me as slow and too old. I was 54 before I gave myself permission to study tech and coding languages. Growing up, girls were not encouraged to study math. I was teased about my abilities in math, because I could not recite the times tables. I grew up believing that I couldn’t do math. At home, my brother received an early TI calculator. It was supposed to be shared between us but that didn’t happen. Besides being annoying, it was clear that electronics were not for girls.

I began my university studies in psychology which seemed the only science that did not require math. I was bored and I tried to learn math on my own. Actual classes involved grades and that was disastrous. I somehow passed all the mathematics prerequisites and ended up in graduate school for chemistry. During my quantum mechanics class, I struggled. I went for help from the professor. He realized that I couldn’t do the times tables verbally and completely humiliated me.

At my job, I learned about agile, innovation and human centered design. I loved these ideas as they provided a framework and a fresh vocabulary to talk about science and problem solving instead of just math. I excelled at facilitating these techniques. Many of the prototypes we needed to wireframe involved a website or an application. I became curious about data and technology, but I would never let myself work in this area. The risk of humiliation was too great. My supervisor already realized that I stumbled verbally with numbers. I did not want to be in a position to lose my job while trying out new skills.

About the same time, I had a routine hearing evaluation and I was diagnosed with great hearing but a serious auditory processing disorder. The audiologist predicted that I probably had terrible problems with spoken arithmetic and verbal math. I was thunderstruck. How could he know that? I had been punished as a child for exactly this issue. I internalized it as part of my self image. Although I was a great reader, I was unable to recite the times tables and do my arithmetic. I couldn’t explain why, maybe I really was a bad kid. After digesting the audiologist’s report, I allowed myself to become more interested in data and technology.

I fight paralyzing “imposter syndrome” every time I sit in front of my computer. I began to take free classes on-line and go to meet-ups and learn, even when I couldn’t talk about it well. I joined groups for women who code. I continue to learn and I just signed up for an intensive software engineering boot camp. I currently volunteer as a teaching assistant for introductory Python at two different community women’s coding groups. I continue to attend meet-ups. I am not yet where I want to be but I am finally allowed to move ahead. Data is going to change our world and I don’t want to miss out.


Going Back to BlackHat!

For the last three years, I have had the honor and privilege of teaching a data science class at the BlackHat conference in Las Vegas.  Well, I found out yesterday that I’ll be going back for a fourth year!   Together with my amazing colleague Austin Taylor (@HuntOperator), we will be teaching Applied Data Science and Machine Learning for Cyber Security.  It turns out that this is the only class at BlackHat this year about data science or machine learning!

Teaching a class at BlackHat is really a great experience, and quite terrifying at the same time.  You’re presenting a class to the best of the best in security, so you really have to know your stuff.  From my experience, the students are really on top of their game so it makes for very interesting and engaging sessions.

What’s New for This Year

This year’s class, I have to say, will be the best one yet. We’ve developed a lot of new material including a lesson about improving the performance of models, beaconing detection with Austin’s Flare library, anomaly detection with K-Means clustering and more. I’ll be posting more about the course as we get closer to the event, but if you have any questions or requests, please let me know! If you’re interested, don’t wait, register now!

Thoughts on Teaching Data Science

A big interest of mine is how to impart what little I know of the tools and techniques of data science to others. When I was at Booz Allen, I taught numerous classes both for internal staff and for various clients. I’ve also taught for Metis, for O’Reilly Publishing and, for the last three years, at BlackHat, so I do have some experience in the matter. I’ve looked at MANY data science programs to see if what they are teaching lines up with what I’m teaching, and I’d like to share some things I’ve noticed. My goal here is to share my mistakes and experiences over the years so that, if you are building a data science training program, you can learn from what I learned the hard way. I make no claims to be the perfect data science instructor, and I’ve made plenty of mistakes along the way.

While I’m at it, I’ll put in a plug for an upcoming data science class which I am teaching with Jay Jacobs of BitSight Security at the O’Reilly Security Conference in NYC, October 29-30th.

Really, data science instruction is an optimization problem: as an instructor, your goal is to minimize confusion whilst maximizing understanding.  To do this, you must remove as many obstacles as possible from the students’ path which create dissonance.  This may seem silly, but I have observed that if you have small errata in your code, or your code doesn’t work on their machine, even due to something they did, it significantly detracts from their learning experience and their opinion of you as an instructor.  Therefore, removing all these obstacles to understanding is vital to your success as an instructor.


Fixing STEM Education

To both of my loyal readers, I apologize for not writing anything in a while, but I have been absolutely slammed with classes and conference presentations.  Anyway, I’ve been doing a lot of thinking about my earlier post about Teaching Data Science in English.   The post provoked a decent response, mostly positive.

One reader sent me the following comment about my post which I’ve decided to quote (with permission) in its entirety because I think it accurately reflects why people get so frustrated when they try to learn mathematical concepts. What interested me was that this individual took action and “translated from mathspeak to English” and all of a sudden she was able to understand the underlying concepts.

A while ago I read a piece you had written on LinkedIn about making ‘mathspeak’ and ‘techspeak’ (i.e. coding) more accessible to regular people, by decreasing mathematical notation usage and increasing the use of real words in explanations of formulas and concepts. It was something that stayed with me because I’ve always understood broader mathematical concepts but have always had trouble with the mechanics, and I think a lot of that has had to do with the amount of notation used…math seems like a foreign language sometimes, and there are 2 levels of understanding: the first is merely deciphering the ‘foreign language’, which already puts me out of my comfort zone (think reading Spanish or French if you are a native English speaker) and then understanding the underlying concepts, which becomes harder due to the fact that it’s written in a ‘non-native’ language. Recently I’ve started taking an online course in machine learning on XXX. Already in the second lesson, he dove straight into notation-filled formulas, and I was starting to get that overwhelmed feeling that I’m familiar with from previous years of math. But I had what you wrote in my mind, and I thought I’d give it a shot and manually ‘translate’ the formulas and equations into English, and stick with that. Well, I did that, and it worked so well. I feel that I am able to follow along with the underlying theory of the class and by extension, the formulas and algorithms he presented in ‘mathese’ whereas before I would have shut down and assumed it was beyond my grasp. Thanks so much for highlighting this aspect of the math/English understanding divide. It is continuously helpful for me. (Emphasis mine)

I’d like to share another related story.  One of my first paying jobs was working for KUAT public television as the web developer (www.kuat.org) and I wanted to do some things that required automating a data flow from an archaic DOS based database.  I was teamed up with a programmer who helped me build the process and in doing so, I learned how to write regular expressions.  I got so into it, I nearly automated myself out of a job.

Fast forward a year or so: when I was nearly done with my CS degree, I had to take an upper level CS course about Automata, Grammars and Languages, which included regular expressions in the course description. I was pretty excited because by this point I had become a master at regular expressions and was looking forward to a class where I already knew some of the material going in. Boy, was I in for a shock. When we got to the regular expressions section, it degenerated into a plethora of Greek letters and assorted jargon to the point where I truly loathed going to class.

Theory Should Not Be Taught at the Expense of Application

What I also realized in that CS class was that, while most of my fellow students may have passed the tests, they did not have any clue how to use regular expressions in real life, or why you would want to use them in the first place. While we were spending time writing expressions that match ‘aaaaaaabababaaaaa’ and drawing the automata that “implement” them, the knowledge of how to apply this to a real life problem, such as extracting data artifacts from raw data, was completely lost on the class.

What if the instructor had started the class by showing us this:

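import re

# Group 1 captures the account, group 2 captures the domain
pattern = r'([a-zA-Z0-9_.]+)@([a-zA-Z0-9_.]+\.\w{2,3})'
matchObj = re.search(pattern, text)  # text holds the raw data to scan
if matchObj:
    email = matchObj.group(0)    # the whole address
    account = matchObj.group(1)  # the part before the @
    domain = matchObj.group(2)   # the part after the @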

If you’re not familiar, this brief example in python-esque pseudo code demonstrates how to match and extract email addresses, accounts and domains from text.
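To make the idea concrete, here is a minimal runnable sketch that applies the same pattern to every address in a piece of text. The helper name extract_emails and the sample log line are made up for illustration and are not from the original class material.

```python
import re

# Same pattern as in the post: group 1 is the account, group 2 is the domain.
EMAIL_PATTERN = re.compile(r'([a-zA-Z0-9_.]+)@([a-zA-Z0-9_.]+\.\w{2,3})')


def extract_emails(text):
    """Return an (email, account, domain) tuple for every address found in text."""
    return [
        (match.group(0), match.group(1), match.group(2))
        for match in EMAIL_PATTERN.finditer(text)
    ]


# Hypothetical log line, invented for this example.
sample = "Login failed for jane.doe@example.com; alert sent to soc_team@corp.example.org"
results = extract_emails(sample)

for email, account, domain in results:
    print(email, account, domain)
```

One caveat with this particular pattern: because the final part is limited to two or three word characters, a longer top-level domain such as .info would be cut short, so the pattern would likely need loosening before being used on real data.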

I don’t think I’m saying anything new here, but too many technical classes both in academia and out, spend a disproportionate amount of time on the underlying theory, whilst simultaneously ignoring, or downplaying the actual application of the concepts being taught.  The result is that many students walk away frustrated, not understanding the actual use of what they are learning, and while professors and instructors may pat themselves on the back for preserving the “purity” of their curricula, I would argue that they have utterly failed in their task of educating their students.

The bottom line here is that some people are really interested in theory. However, for knowledge to be translated into something useful, students should be exposed early and often to a theory’s application. In conclusion, if you are designing some STEM training or a class at a university, don’t forget the importance of demonstrating how to apply the concepts you are teaching.


Something is Rotten in the State of Data

I’m writing this blog post in the departure lounge at Heathrow, on my way back from Strata + Hadoop World, London.  Whilst at Strata, speakers kept coming back to the idea that an ever growing number of large businesses are not really happy with the investments they have made in analytics and data science.  One of the speakers quoted a Forrester 2016 survey which claimed that:

  • 29% of firms are good at translating analytics results into measurable business outcomes
  • -20% change in satisfaction with analytics initiatives between 2014 and 2015
  • 50% of firms expecting to see stagnation or a decrease in big data/data lake investments in 2016.

These are very disappointing numbers, but not completely surprising. Using the “5 why” technique, an intra-management dialogue might go something like this:


Teaching Data Science in English (not in Math)

I spend most of my time now teaching others about data science, and as such I do a lot of research into what is going on with respect to data science education. As part of that research, I decided to take an online machine learning course, and it led me to a serious question: why don’t we use pseudo-code to teach math concepts?

Consider the following:
[Image: formula for the Residual Sum of Squares]

This is the formula for Residual Sum of Squares, which, if you aren’t familiar, is a metric used to measure the effectiveness of regression models.
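For readers who want the notation spelled out, a standard textbook way of writing this metric is sketched below. The symbols are assumptions chosen for illustration: y_i is the i-th actual value, ŷ_i is the model’s prediction for it, and n is the number of observations. For a simple linear regression, the prediction would itself be written with more Greek letters, as ŷ_i = α + β x_i.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```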

Now consider the following pseudo-code:

residuals_squared = (actual_values - predictions) ^ 2
RSS = sum( residuals_squared )

This example expresses the exact same concept and, while it does take up more space on the page, it is, in my mind at least, much easier to understand. I don’t have any empirical data to back this up, but I would suspect that many of you would agree.
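The pseudo-code carries over to real code almost word for word. Below is a minimal sketch in plain Python; the function name and the sample numbers are made up for illustration. Note that Python spells “squared” as ** 2, since ^ means something different there (bitwise XOR).

```python
def residual_sum_of_squares(actual_values, predictions):
    """Sum the squared differences between actual values and predictions."""
    residuals_squared = [
        (actual - predicted) ** 2
        for actual, predicted in zip(actual_values, predictions)
    ]
    return sum(residuals_squared)


# Hypothetical data, invented for this example.
actual_values = [3.0, 5.0, 7.0]
predictions = [2.5, 5.0, 8.0]

rss = residual_sum_of_squares(actual_values, predictions)
print(rss)
```

The variable names are kept identical to the pseudo-code on purpose, so each line of the English-like version maps onto one line of the working version.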

Greek Letters are Jargon

Another thing I’ve realized is that part of the reason math becomes so difficult for people is that it is entirely taught in jargon, shorthand, and shorthand for shorthand. The Greek letter sigma represents a sum, but if you don’t know that, then it represents confusion. If you aren’t familiar with this formula, then the other Greek letters could be meaningless, yet if we used pseudocode, any part of this formula could be rewritten using English words (or any other language) and thus easily understood by anyone.

Crash Course in Machine Learning

I’m working on developing a short course in Machine Learning called Crash Course in Machine Learning which I will be teaching at the BlackHat conference in August.  I’m curious as to what people think about presenting algorithms using pseudo-code instead of math jargon.  I suspect it will make it easier for people to understand without diluting the rigor.


Data Science Classes at BlackHat 2016!

I’m very pleased to announce that this year, my team and I got two classes accepted for the BlackHat conference in Las Vegas! I believe that data science and machine learning have a huge role to play in infosec/cybersecurity, and in a way, it really is a domain which is crying out for data science to be used. There are ever expanding amounts of data, the actors are becoming more sophisticated, and the security professionals are almost always strained for resources. Our classes won’t turn you into data scientists, but you will learn how to directly apply data science techniques to cybersecurity. If this sounds interesting to you, please check out our Crash Course in Data Science and our Crash Course in Machine Learning. Both are two-day classes and will be offered on July 30-31 and August 1-2, 2016.

Crash Course in Data Science (for Hackers)

This interactive course will teach network security professionals how to use data science techniques to quickly write scripts to manipulate and analyze network data. Students will learn techniques to rapidly write scripts to improve their work. Participants will learn how to read in data in a variety of common formats, then write scripts to analyze and visualize that data. A non-exhaustive list of what will be covered includes:

  • How to write scripts to read CSV, XML, and JSON files
  • How to quickly parse log files and extract artifacts from them
  • How to make API calls to merge datasets
  • How to use the Pandas library to quickly manipulate tabular data
  • How to effectively visualize data using Python
  • How to apply simple machine learning algorithms to identify potential threats

Finally, we will introduce the students to cutting edge Big Data tools including Apache Spark and Apache Drill, and demonstrate how to apply these techniques to extremely large datasets.

Crash Course in Machine Learning (for Hackers)

This interactive course will teach network security professionals machine learning techniques and applications for network data. This course is a continuation of the skills taught in the Crash Course in Data Science for Hackers. Students will learn various machine learning methods, applications, model selection, testing, and interpretation. Participants will write code to prepare and explore their data and then apply machine learning methods for discovery.

A non-exhaustive list of what will be covered includes:

  • Machine Learning Introduction and Terminology
  • Foundations of Statistics
  • Python Machine Learning Packages Introduction
  • Data Exploration and Presentation
  • Supervised Learning Methods
  • Unsupervised Learning Methods
  • Model Selection and Testing
  • Machine Learning Applications for Network Data