Skip to content

Month: July 2020

Splunk and SQL, Together at last?

For the last few years, I’ve been involved with Splunk engineering. I found this to be somewhat ironic since I’ve haven’t used Splunk as a user for a really long time. I was never a fan of the Splunk query language (SPL) for a variety of reasons, the main one being that I didn’t want to spend the time to learn a proprietary language that is about as elegant as a 1974 Ford Pinto. I had worked on a few projects over the years that involved doing machine learning on data in Splunk. which presented a major challenge. While Splunk does have a machine learning toolkit (MLTK), which is basically a wrapper for scikit-learn, doing feature engineering in SPL is a nightmare.

So, how DO you do machine learning in Splunk?

You don’t.

No,Really… How do you do machine learning in Splunk?

Ok… People actually do machine learning in Splunk, but IMHO, this is not the best way to do it, for several reasons. Most of the ML work I’ve seen done in Splunk involves getting the data out of Splunk. With that being the case, the first thing I would recommend to anyone attempting to do ML in Splunk is to take a look at huntlib (https://github.com/target/huntlib) which is a python module to facilitate getting data out of Splunk. Huntlib makes this relatively easy and you can get your data from Splunk right into a Pandas DataFrame. But, you still have to know SPL or you do a basic SPL search and do all your data wrangling in Python. Could there be a better way?

4 Comments