I have been working on using Apache Drill for security purposes, and I wanted to demonstrate how you can use Drill in a real security challenge. I found this contest, which included a PCAP file of an actual attack as well as a series of questions you would want to answer in order to do the analysis (https://www.honeynet.org/node/504).
My thought here is that Drill’s advanced ETL capabilities are not terribly useful if you can’t use Drill to do basic stuff that tools like Wireshark can do already, so I wanted to see if it would work in real life. This example was good because I also had “the answers” so I could see how Drill stacks up to the contest winners.
First, I had to see if Drill could actually read the PCAP. The PCAP reader can be a bit wonky, but fortunately, Drill read it without issues! (Whew!) For these examples, I will be using Drill and Superset.
Part one will contain a demonstration of how to use Drill to answer the questions in the first part of the challenge.
1. Which Systems Are Involved?
The first question was to determine which systems (IP Addresses) were involved in the attack. Drill had no problem answering this question. There are several ways you could approach this. I chose the following query:
SELECT DISTINCT src_ip, dst_ip FROM dfs.test.
So from this, we can see that there are two IP addresses involved with this attack:
There are probably other ways you could query to get the same results, but that was what I thought of first.
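For instance, one alternative is to count the packets exchanged between each pair of addresses, which also shows who is doing most of the talking. This is only a sketch: your_capture.pcap is a placeholder file name (not the actual path used for this post), while src_ip and dst_ip are the same columns used in the query above.

```sql
-- Sketch: packets per source/destination pair.
-- `your_capture.pcap` is a placeholder; point it at your own PCAP file.
SELECT src_ip,
       dst_ip,
       COUNT(*) AS packet_count
FROM dfs.test.`your_capture.pcap`
GROUP BY src_ip, dst_ip
ORDER BY packet_count DESC
```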
2. What can you find out about the attacking host?
This question first requires you to figure out which host is the attacker, which we can guess is the 220.127.116.11 address. First, let’s see where in the world this user is located. To do this, you’ll need geolocation data for the IP address(es). Fortunately, yours truly wrote a collection of geo-ip functions, which are available here: https://github.com/cgivre/drill-geoip-functions/. These functions use the free MaxMind geo-ip dataset. I am also working on a collection of DNS/Whois functions which I intend to submit as a PR at some point. Stay tuned.
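A lookup of this kind might look roughly like the query below. Treat it as a sketch under assumptions: the function names are assumed from the repository's README and may differ in the version you install, and your_capture.pcap is a placeholder file name.

```sql
-- Sketch: geo-ip lookups for each source address in the capture.
-- Function names are ASSUMED from the drill-geoip-functions README;
-- verify them against the version you have installed.
-- `your_capture.pcap` is a placeholder file name.
SELECT DISTINCT src_ip,
       getCountryName(src_ip)     AS country,
       getCityName(src_ip)        AS city,
       getLatitude(src_ip)        AS latitude,
       getLongitude(src_ip)       AS longitude,
       getASN(src_ip)             AS asn,
       getASNOrganization(src_ip) AS asn_org
FROM dfs.test.`your_capture.pcap`
```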
In the example above you can see that we’re able to extract a considerable amount of information about the attacker including the city/country, lat/long, the ASN number and owner organization. There is more, but that should suffice for now.
3. How many TCP Sessions are contained in the dump file?
Again, no real challenge here… All you have to do is group by the tcp_session field and you can easily see that there are 5 sessions. Let’s dig a little deeper and get some stats on those sessions. In the query below, you can see some of the associated metadata for each TCP session, including the start/stop times, the total byte count for the session and the session duration. Note that the duration is expressed as an INTERVAL data type, which might not be immediately familiar to you.
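A per-session summary along these lines could be written as follows. This is a sketch rather than the exact query used for this post: it assumes the PCAP reader exposes the timestamp and packet_length columns alongside tcp_session, and your_capture.pcap is a placeholder file name.

```sql
-- Sketch: start/stop time, byte count and duration per TCP session.
-- `timestamp` is a reserved word in Drill, so it is quoted with backticks.
-- `your_capture.pcap` is a placeholder file name.
SELECT tcp_session,
       MIN(`timestamp`)                    AS session_start,
       MAX(`timestamp`)                    AS session_end,
       SUM(packet_length)                  AS total_bytes,
       MAX(`timestamp`) - MIN(`timestamp`) AS session_duration  -- an INTERVAL
FROM dfs.test.`your_capture.pcap`
GROUP BY tcp_session
ORDER BY session_start
```

If subtracting timestamps does not behave as expected in your Drill version, the AGE function is another way to get an interval between two timestamps.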
4. How long did it take to perform the attack?
The query above answers that question, but we can do it directly with the query below. The attack took 16 seconds, which suggests an automated attack.
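One way to compute the overall duration directly is to subtract the earliest packet timestamp from the latest. As before, this is a sketch: your_capture.pcap is a placeholder file name, and it assumes every packet in the capture belongs to the attack.

```sql
-- Sketch: overall duration of the capture, first packet to last packet.
-- `your_capture.pcap` is a placeholder file name.
SELECT MIN(`timestamp`)                    AS attack_start,
       MAX(`timestamp`)                    AS attack_end,
       MAX(`timestamp`) - MIN(`timestamp`) AS attack_duration
FROM dfs.test.`your_capture.pcap`
```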
Again, since the timestamp is natively a Drill timestamp object, you can do basic arithmetic on this and calculate intervals.
5. Which operating system, service(s) was used in the attack? Which vulnerability?
These questions are a little more difficult because you have to actually explore the data a bit. The victim’s operating system is most likely Windows XP and we know this due to some information contained in the actual packet data itself.
As for the service, you can determine it from the ports used in the attack. In this case, we can use the above query to see what ports are in use. We can see from the example that port 445 is in use during this attack. I’m working on a port lookup function so that this reference is available right in Drill, but until that is done, a simple Google search will reveal that port 445 is used for SMB over TCP/IP, a Microsoft protocol. There are many exploits for this port.
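If you want to list the ports explicitly, a grouping on the destination address and port is one way to do it. This is a sketch that assumes the dst_port column from the PCAP reader, with your_capture.pcap again standing in as a placeholder file name.

```sql
-- Sketch: which destination ports were contacted, and how often.
-- `your_capture.pcap` is a placeholder file name.
SELECT dst_ip,
       dst_port,
       COUNT(*) AS packet_count
FROM dfs.test.`your_capture.pcap`
GROUP BY dst_ip, dst_port
ORDER BY packet_count DESC
```

Well-known service ports such as 445 should stand out on the victim's side, while the attacker's side will mostly show short-lived ephemeral ports.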
In part 2 of this tutorial, I will demonstrate how to use Drill for the rest of the challenge questions.