Happy belated New Year, everyone! I’ve been taking a bit of a blog break as I’ve been quite busy between work, personal travel, and working on my startup GTK Cyber. But I’m back now and have some exciting news! My team and I have been accepted to teach our Applied Data Science course once again at BlackHat in Las Vegas! This year we’ve made a major change to our course: it’s now a full four days instead of two!
Category: Conference Presentations
I gave a presentation at Strata London 2016 about data gathered from cars. This is an ongoing research interest of mine but here are the slides and the video.
My employer is amazing: in the last two months, they’ve allowed me to attend a lot of data science conferences, and I thought I’d share some general reflections on my experiences.
Open Data Science Conference: A Great Value for Newcomers
I gave two presentations this year at Open Data Science Conference (ODSC) East, and I just wanted to put it out there that if you are new to data science or are just interested in learning more about it, then ODSC is a really great venue to meet incredibly talented individuals as well as attend high-quality technical talks.
I just returned from Strata + Hadoop World in San Jose, where I gave a talk entitled “Kosher Collection: Best Practices in Data Handling”. I really had an amazing time at Strata this year and major kudos to the organizers for putting on a great show.
The central premise of my talk is that in today’s world, there is a social contract between data collectors and consumers. Essentially, the agreement is that consumers give their personal data to a data collector in exchange for mutual benefit. The problem is that consumers, in general, lack an understanding of both the technology and data collection and, as a result, are unable to provide informed consent. Furthermore, this issue is likely to be exacerbated in the future as the opportunity to opt out of mass data collection is disappearing.
I’m very pleased to announce that this year, my team and I got two classes accepted for the BlackHat conference in Las Vegas! I believe that data science and machine learning have a huge role to play in infosec/cybersecurity, and in a way, it really is a domain which is crying out for data science to be used. There are ever-expanding amounts of data, the actors are becoming more sophisticated, and the security professionals are almost always strained for resources. Our classes won’t turn you into data scientists, but you will learn how to directly apply data science techniques to cybersecurity. If this sounds interesting to you, please check out our Crash Course in Data Science and our Crash Course in Machine Learning. Both are two-day classes and will be offered July 30-31 and August 1-2, 2016.
This interactive course will teach network security professionals how to use data science techniques to quickly write scripts to manipulate and analyze network data. Participants will learn how to read in data in a variety of common formats and then write scripts to analyze and visualize that data. A non-exhaustive list of what will be covered includes:
- How to write scripts to read CSV, XML, and JSON files
- How to quickly parse log files and extract artifacts from them
- How to make API calls to merge datasets
- How to use the Pandas library to quickly manipulate tabular data
- How to effectively visualize data using Python
- How to apply simple machine learning algorithms to identify potential threats
Finally, we will introduce the students to cutting edge Big Data tools including Apache Spark and Apache Drill, and demonstrate how to apply these techniques to extremely large datasets.
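To give a flavor of the first few topics in that list, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the log format, the sample data, and the function names (`extract_ips`, `count_denied_sources`, `read_csv_rows`) are all made up for illustration.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical firewall-style log lines, invented for this example.
SAMPLE_LOG = """\
2016-07-30 10:01:02 DENY src=10.0.0.5 dst=192.168.1.10
2016-07-30 10:01:05 ALLOW src=10.0.0.7 dst=192.168.1.10
2016-07-30 10:01:09 DENY src=10.0.0.5 dst=192.168.1.22
"""

# Loose IPv4 pattern: four dot-separated groups of 1-3 digits.
# It does not check that each octet is <= 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SRC = re.compile(r"src=((?:\d{1,3}\.){3}\d{1,3})")


def extract_ips(text):
    """Return every IPv4-looking string in the text, in order of appearance."""
    return IPV4.findall(text)


def count_denied_sources(text):
    """Count how often each source address appears on a DENY line."""
    counts = Counter()
    for line in text.splitlines():
        if " DENY " in line:
            match = SRC.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


def read_csv_rows(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    print(extract_ips(SAMPLE_LOG))
    print(count_denied_sources(SAMPLE_LOG).most_common(1))
    print(read_csv_rows("host,port\n10.0.0.5,22\n"))
    print(json.loads('{"host": "10.0.0.5", "port": 22}')["port"])
```

In practice a library such as Pandas would replace most of the hand-written counting here, but the standard-library version shows the basic idea of turning raw log text into something you can summarize.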
This interactive course will teach network security professionals machine learning techniques and applications for network data. This course is a continuation of the skills taught in the Crash Course in Data Science for Hackers. Students will learn various machine learning methods, applications, model selection, testing, and interpretation. Participants will write code to prepare and explore their data and then apply machine learning methods for discovery.
A non-exhaustive list of what will be covered includes:
- Machine Learning Introduction and Terminology
- Foundations of Statistics
- Python Machine Learning Packages Introduction
- Data Exploration and Presentation
- Supervised Learning Methods
- Unsupervised Learning Methods
- Model Selection and Testing
- Machine Learning Applications for Network Data
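As a rough illustration of where statistics foundations meet unsupervised analysis of network data, the sketch below flags hosts whose traffic volume is far from the rest. It uses a plain z-score as a stand-in for the richer methods a machine learning package would provide, and it is not from the course itself: the traffic numbers, the host addresses, the threshold of 2.0, and the function names are assumptions made for this example.

```python
from statistics import mean, pstdev


def zscores(values):
    """Return how many population standard deviations each value is from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def find_outliers(bytes_by_host, threshold=2.0):
    """Return hosts whose byte count has an absolute z-score above the threshold."""
    hosts = list(bytes_by_host)
    scores = zscores([bytes_by_host[h] for h in hosts])
    return [h for h, z in zip(hosts, scores) if abs(z) > threshold]


if __name__ == "__main__":
    # Hypothetical bytes sent per host over some time window.
    traffic = {
        "10.0.0.1": 500,
        "10.0.0.2": 520,
        "10.0.0.3": 480,
        "10.0.0.4": 510,
        "10.0.0.5": 495,
        "10.0.0.6": 505,
        "10.0.0.99": 5000,
    }
    print(find_outliers(traffic))
```

A z-score is easily distorted by the very outliers it is meant to find, especially on small samples, so treat this as a starting point for exploration rather than a detection method.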
Video of my Strata + Hadoop World NYC talk this year.
It looks like my Strata talk sparked some conversation and an article at ProPublica!
After reflecting on the matter more, I hope that people will start to understand that these home automation devices really are data collection devices for the manufacturer of the device. The Automatic, in my opinion, while it is a very neat device, provides little information that the driver wouldn’t already know about themselves and hence little benefit to the customer. However, to the Automatic company, when you start aggregating this data, it provides a wealth of data to them. Therefore, devices should have some sort of ranking as to benefit to consumer vs. benefit to company.