Category: Uncategorized

Cyber Attacks on Cancer Centers? Is Nothing Sacred?

Recently, I responded to a LinkedIn article about a cyber attack on a cancer center. The person who posted it basically asked the question: why would a hacker go after something like a cancer center or a school? After reading the posts, it seemed to me that there is a lack of understanding as to why and how hackers actually do what they do.

Why Do Hackers Hack?

One of the biggest misconceptions about hacking is that people think that they are not important or that they don’t have anything that hackers want. The issue here is that people don’t understand why people hack. There are four basic reasons why hackers hack: money, politics, espionage, and malice/mischief. Money is perhaps the easiest to understand: hackers want money. You have it… therefore you are a target. Hackers can steal your money in several ways, either by stealing your credit card and bank info, or by installing some sort of ransomware. What is important to understand is that since you have money, you are a target.

Hackers also might be interested in your systems to build botnets: armies of computers under a hacker’s control that can be used for all kinds of mayhem. Some master hackers build botnets and then rent them out to other nefarious individuals for profit.

Basta Facebook!

Facebook has been in the news quite a bit in the last few weeks, and I’ve finally reached the conclusion it is time for me to go. Now, I know Facebook will not care. I do not have millions of followers. I’m not an influencer, nor do I have an Instagram account or use any of their other services.

Why Now?

Over the last few years, I’ve become increasingly disillusioned with social media. I’ve written about it extensively here and elsewhere, but I believe strongly that people are entitled to their privacy. Social media companies believe the opposite, and believe it is their solemn right and duty to invade every aspect of our lives for their profit. I say no more! Basta!

I didn’t delete my Facebook account entirely; however, I did put up a note that I will not be using it anymore. I had already removed all Facebook applications from my phone, including Messenger, Instagram, WhatsApp, etc.

It’s More than Privacy

Until recently, my continued objection to social media in general (with the exception of LinkedIn) was the invasion of privacy. I’ve researched and written about how these companies are stealing your privacy and profiting from it. But in the last few months, I’ve been reading a lot of research about the dangerous side effects of social media and have come to the conclusion that it is much worse than simply invading your privacy. Social media is about behavior modification.

Barnes & Noble Data Breach — Are Rewards Cards Worth the Risk?

American bookselling giant Barnes & Noble recently made headlines because of a data breach that may have exposed their customers’ confidential and sensitive information to hackers.

In the email sent to their customers, Barnes & Noble mentioned that they only became aware of the cybersecurity attack, which resulted in unauthorized and unlawful access to certain Barnes & Noble corporate systems, around October 10, 2020.

In addition to letting their shoppers know about the data security incident, the Fortune 1000 company also tried to reassure their customers that their payment card and other financial data hadn’t been compromised, as they are encrypted, tokenized and inaccessible. The email went on to explain to their loyal clients that the systems impacted were those that contained their email addresses, billing and shipping addresses and telephone numbers.

Although the Barnes & Noble data breach has placed their customers’ sensitive data at great risk, it pales in comparison to some of the biggest and most historic data breaches. One of the biggest incidents was back in 2018 with the Facebook data breach that compromised over 50 million Facebook accounts for three days before the company’s announcement.

My Love Hate with Apple Part 2: Thunderbolt Ports

I’m an Apple guy through and through. (You can read more about my love/hate relationship here.) My first computer ever was a Woz edition Apple ][ GS and although I had a hiatus with Apple for some time during college, ever since I came back, I’m all in. My last computer, a MacBook Pro (2014 edition I think), was by far the best computer I had ever owned. Unfortunately, an incident with a can of Dr. Pepper, as well as time, led me to the decision that it was time to upgrade. So last year, I got a new MacBook Pro 2018 edition with the Touch Bar. This enters the hate phase of my relationship with Apple.

Now, I understand the Apple philosophy of removing extraneous features and complexity and agree with it in principle. However, simplicity should not come at the cost of functionality. The new MacBooks, as everyone has noted, come with USB-C ports instead of the smorgasbord of ports that existed on previous editions of the machine. Whilst this certainly looks more elegant, after I received my machine, I had to buy a bunch of dongles so that I could use my peripherals. This was a minor annoyance, but I thought a one-time thing. How wrong I was…

Easy Analysis of HDF5 Data

There is a data format called HDF5 (Hierarchical Data Format) which is used extensively in scientific research. HDF5 is an interesting format in that it is like a file system within a file, and is extremely performant. However, it can be quite difficult to actually access the data that is encoded in an HDF5 file. But, as the title suggests, this post will walk you through how to easily access and query HDF5 datasets using my favorite tool: Apache Drill.

As of version 1.18, Drill will natively support reading HDF5 files.

Using Drill for Network Forensics: Part 1

I have been working on using Apache Drill for security purposes and I wanted to demonstrate how you can use Drill in a real security challenge. I found this contest which included a PCAP file of an actual attack, as well as a series of questions you would want to answer in order to do the analysis. (https://www.honeynet.org/node/504)

My thought here is that Drill’s advanced ETL capabilities are not terribly useful if you can’t use Drill to do basic stuff that tools like Wireshark can do already, so I wanted to see if it would work in real life. This example was good because I also had “the answers” so I could see how Drill stacks up to the contest winners.

First, I had to see if Drill could actually read the PCAP. The PCAP reader can be a bit wonky, but fortunately, Drill read it without issues! (Whew!). For these examples, I will be using Drill and Superset.
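To give a rough sense of what “reading the PCAP” involves under the hood, here is a minimal Python sketch that parses the classic pcap layout (a 24-byte global header followed by 16-byte record headers). This is not Drill’s reader, only an illustration; the function names and the two-packet demo capture are made up for the example, and it ignores the nanosecond and pcapng variants.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps


def read_pcap(stream):
    """Yield (ts_sec, ts_usec, orig_len, data) for each packet record."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order used by the writer.
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    while True:
        record = stream.read(16)
        if len(record) < 16:
            return  # end of capture
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)  # captured bytes (may be < orig_len)
        yield ts_sec, ts_usec, orig_len, data


def build_demo_pcap():
    """Build a hypothetical two-packet capture in memory (illustration only)."""
    # magic, version 2.4, thiszone, sigfigs, snaplen, link type 1 (Ethernet)
    global_header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    demo = [(1, 0, b"\x00" * 14), (2, 500, b"\x01" * 20)]
    body = b"".join(
        struct.pack("<IIII", sec, usec, len(payload), len(payload)) + payload
        for sec, usec, payload in demo
    )
    return global_header + body


packets = list(read_pcap(io.BytesIO(build_demo_pcap())))
for ts_sec, ts_usec, orig_len, data in packets:
    print(ts_sec, ts_usec, orig_len, len(data))
```

A query tool such as Drill exposes these per-packet fields as columns, which is what makes the SQL-style analysis in the challenge possible.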

Part one will contain a demonstration of how to use Drill to answer the questions in the first part of the challenge.

Everything You Need to Know about the Future of Data Science in One Image

I saw this image on LinkedIn a few days ago and realized that it is proof of the future of data science. The image is of the leaderboard from a Kaggle competition, which isn’t particularly remarkable, but what is remarkable is the competitor in 2nd place: Google AutoML. Not only did AutoML come in 2nd, but it did so in fewer entries and the score was 0.00093 off of first place.

You might be thinking, “Well that’s Google, and they have the best stuff. Technology like that will not be available to the masses anytime soon, and if it is, it will require massive clusters.” Au contraire mon frère. There are a slew of new Python modules which automate various phases of the machine learning process. My personal favorite is called TPOT, a Python module which automates the entire machine learning process and generates Python code for your entire pipeline.

I did a little experiment with TPOT and was able to build a model with data from Kaggle that scored in the top 10 for a simple exercise.

At some point, Google will likely make their AutoML available to the public if it isn’t already, and data scientists will have to prove their value over automated machine learning tools.

So What?

The significance of this is enormous. Since the coining of the term data science, many people have focused very heavily on the math and machine learning aspects of data science. These aspects are certainly important, but these steps can be automated, as you can see, with ever improving performance. What this means in the long run is that as available computing power increases and these tools get better and faster, the understanding of the inner workings of the algorithms will become less and less important. (This is not true if you are working at a really cutting edge company that is developing new algorithms, or doing academic research.)

Therefore, if you are a data scientist or an aspiring data scientist, should you quit now? Hardly. Automated ML is really exciting because it will enable you to focus on the things that computers can’t do, and likely won’t ever be able to do, which are: conceiving and defining data problems, communicating the results to stakeholders, as well as the data cleaning/feature engineering steps. Automated machine learning will enable or force data scientists to focus on tasks that truly require human thought and using data science to add value to their organizations.


So You Want to Write a Book…

Well, we did it.  I finally finished the book that I had been working on with my co-author for the last two years.  I thought I’d write a short post on my experiences writing a technical book and getting it published.  I know many people think about writing books, and I’d like to share my experiences so that others might learn from lessons that I learned the hard way.  Overall, it was an absolutely amazing experience and I have a feeling that the adventure is only beginning….

Why don’t Data Scientists use Splunk?

I am currently attending the Splunk .conf in Orlando, and a director at Accenture asked me this question, which I thought merited a blog post: why don’t data scientists use or like Splunk? The inner child in me was thinking, “Splunk isn’t good at data science”, but the more seasoned professional in me actually articulated a more logical and coherent answer, which I thought I’d share whilst waiting for a talk to start. Here goes:

I cannot pretend to speak for any community of “data scientists” but it is true that I know a decent number of data scientists, some very accomplished and some beginners, and not a one would claim to use Splunk as one of their preferred tools.  Indeed, when the topic of available tools comes up among most of my colleagues and the word Splunk is mentioned, it elicits groans and eye rolls.  So let’s look at why that is the case:

Can you use Machine Learning to detect Fake News?

Someone recently asked me for assistance with a university project whereby they were asked to predict whether a given article was fake news or not. They had a target accuracy of 70%. Since the topic of fake news has been in the news a lot, it made me think about how I would approach this problem and whether it is even possible to use machine learning to identify fake news. At first glance, this problem might be comparable to spam detection; however, the problem is actually much more complicated. In an article on The Verge, Dean Pomerleau of Carnegie Mellon University states:

“We actually started out with a more ambitious goal of creating a system that could answer the question ‘Is this fake news, yes or no?’ We quickly realized machine learning just wasn’t up to the task.” 
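For context, the spam-detection comparison usually means a bag-of-words classifier. The sketch below is a tiny Naive Bayes model in plain Python; the class name, the four training headlines and their labels are invented purely for illustration. It shows why the analogy is tempting, and also why it is limited: the model only sees word frequencies, not whether a claim is true.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, made up for this example.
texts = [
    "council approves budget for road repairs",
    "university publishes study on sleep",
    "shocking miracle cure doctors hate",
    "you won't believe this shocking secret",
]
labels = ["real", "real", "fake", "fake"]

model = NaiveBayes()
model.fit(texts, labels)
prediction_a = model.predict("shocking miracle secret")
prediction_b = model.predict("council budget study")
print(prediction_a, prediction_b)
```

A model like this keys on vocabulary, so a sober-sounding fabrication or a sensational but accurate headline would easily fool it, which is consistent with the point that fake news is a harder problem than spam.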
