The Dataist Posts

So I Launched a Startup…

I haven’t posted for quite some time, a fact that I know has caused much consternation to both of my readers. The reason is that I launched a startup! I’ve decided to shift my blog and write a bit about my experiences as a founder: dealing with investors, managing time, and everything else I’ve learned. The best part is that the story isn’t written yet, so we don’t really know how it is going to turn out! Will we flame out? Will we become the next billion-dollar company? Will we get acquired? We will see! If you want to know how this story ends, you’ll have to keep reading!

Since this is MY blog, I’m not going to use this as a marketing platform for our tool. (It really is amazing… not just saying that) I’m also not going to be writing about individuals on my team, employees, investors etc.

This will be a raw account of what it’s like to build a company from the ground up, from my perspective as CEO.


Cyber Attacks on Cancer Centers? Is Nothing Sacred?

Recently, I responded to a LinkedIn article about a cyber attack on a cancer center. The person who posted it basically asked why a hacker would go after something like a cancer center or a school. After reading the posts, it seemed to me that there is a lack of understanding as to why and how hackers actually do what they do.

Why Do Hackers Hack?

One of the biggest misconceptions about hacking is that people think that they are not important or that they don’t have anything that hackers want. The issue here is that people don’t understand why people hack. There are four basic reasons why hackers hack: money, politics, espionage, and malice/mischief. Money is perhaps the easiest to understand: hackers want money. You have it… therefore you are a target. Hackers can steal your money in several ways, either by stealing your credit card and bank info, or by installing some sort of ransomware. What is important to understand is that since you have money, you are a target.

Hackers also might be interested in your systems to build botnets, which are armies of computers that a hacker controls, which can be used for all kinds of mayhem. Some master hackers build botnets and then rent them out to other nefarious individuals for profit.


Basta Facebook!

Facebook has been in the news quite a bit in the last few weeks, and I’ve finally reached the conclusion it is time for me to go. Now, I know Facebook will not care. I do not have millions of followers. I’m not an influencer, nor do I have an Instagram account or use any of their other services.

Why Now?

Over the last few years, I’ve become increasingly disillusioned with social media. I’ve written about it extensively here and elsewhere, but I believe strongly that people are entitled to their privacy. Social media companies believe the opposite, and believe it is their solemn right and duty to invade every aspect of our lives for their profit. I say no more! Basta!

I didn’t delete my Facebook account entirely; however, I did post a note that I will not be using it anymore. I had already removed all Facebook applications from my phone, including Messenger, Instagram, WhatsApp, etc.

It’s More than Privacy

Until recently, my continued objection to social media in general (with the exception of LinkedIn) was the invasion of privacy. I’ve researched and written about how these companies are stealing your privacy and profiting from it. But in the last few months, I’ve been reading a lot of research about the dangerous side effects of social media and come to the conclusion that it is much worse than simply invading your privacy. Social media is about behavior modification.


Barnes & Noble Data Breach — Are Rewards Cards Worth the Risk?

American bookselling giant Barnes & Noble recently made headlines because of a data breach that may have exposed their customers’ confidential and sensitive information to hackers.

In the email sent to their customers, Barnes & Noble mentioned how they only became aware of the cybersecurity attack that resulted in unauthorized and unlawful access to certain Barnes & Noble corporate systems around October 10, 2020.

In addition to letting their shoppers know about the data security incident, the Fortune 1000 company also tried to reassure their customers that their payment card and other financial data hadn’t been compromised, as they are encrypted, tokenized and inaccessible. The email went on to explain to their loyal clients that the systems impacted were those that contained their email addresses, billing and shipping addresses and telephone numbers.

Although the Barnes & Noble data breach has placed their customers’ sensitive data at great risk, it pales in comparison to some of the biggest and most historic data breaches. One of the biggest incidents was back in 2018 with the Facebook data breach that compromised over 50 million Facebook accounts for three days before the company’s announcement.


My Love Hate with Apple Part 2: Thunderbolt Ports

I’m an Apple guy through and through. (You can read more about my love/hate relationship here.) My first computer ever was a Woz edition Apple ][ GS and although I had a hiatus from Apple for some time during college, ever since I came back, I’m all in. My last computer, a MacBook Pro (2014 edition I think) was by far the best computer I had ever owned. Unfortunately, an incident with a can of Dr. Pepper, as well as time, led me to the decision that it was time to upgrade. So last year, I got a new MacBook Pro 2018 edition with the touch bar. This enters the hate phase of my relationship with Apple.

Now, I understand the Apple philosophy of removing extraneous features and complexity and agree with it in principle. However, simplicity should not come at the cost of functionality. The new MacBooks, as everyone has noted, come with USB-C ports instead of the smorgasbord of ports that existed on previous editions of the machine. Whilst this certainly looks more elegant, after I received my machine, I had to buy a bunch of dongles so that I could use my peripherals. This was a minor annoyance, but I thought it was a one-time thing. How wrong I was…


Splunk and SQL, Together at last?

For the last few years, I’ve been involved with Splunk engineering. I found this to be somewhat ironic since I haven’t used Splunk as a user for a really long time. I was never a fan of the Splunk query language (SPL) for a variety of reasons, the main one being that I didn’t want to spend the time to learn a proprietary language that is about as elegant as a 1974 Ford Pinto. I had worked on a few projects over the years that involved doing machine learning on data in Splunk, which presented a major challenge. While Splunk does have a machine learning toolkit (MLTK), which is basically a wrapper for scikit-learn, doing feature engineering in SPL is a nightmare.

So, how DO you do machine learning in Splunk?

You don’t.

No, Really… How do you do machine learning in Splunk?

Ok… People actually do machine learning in Splunk, but IMHO, this is not the best way to do it, for several reasons. Most of the ML work I’ve seen done in Splunk involves getting the data out of Splunk. With that being the case, the first thing I would recommend to anyone attempting to do ML in Splunk is to take a look at huntlib (https://github.com/target/huntlib), which is a Python module to facilitate getting data out of Splunk. Huntlib makes this relatively easy and you can get your data from Splunk right into a Pandas DataFrame. But you still have to know SPL, or else you do a basic SPL search and do all your data wrangling in Python. Could there be a better way?


How to Manage a Data Breach Incident

Facebook’s big data breach in 2018 sent millions of users into a flurry to reset their Facebook passwords, hoping and praying that their information hadn’t found its way into the wrong hands. Some 50 million Facebook accounts were compromised three days before the company’s announcement, granting hackers access to personal data and landing the social media giant in hot water. Despite damage control efforts, this large-scale hack called into question the overall safety of online accounts.

Big companies are big targets, so it’s no surprise that we only hear about high-profile data breaches. In reality, however, smaller businesses are the ones who are more at risk of being affected by a breach, as they ultimately have much more to lose. When it comes to detecting an attack, researchers from IBM estimate that it takes small businesses about 200 days before a breach is found — and by then, the attack is well under way. It can cost a large amount of money and reputational damage for a business to get back on its feet after a breach, especially when taking into account the time and manpower required to rebuild a company’s operational systems.

All in all, predicting and preventing a data breach is tricky. And while some argue that breaches are inevitable in this day and age, businesses should still arm themselves with the proper tools and information on not only how to prevent a data breach, but also what to do when defenses fail and an attack occurs.

But first, it’s important to know what you might be dealing with. While not all cybersecurity attacks result in a breach, this article by Chief Executive outlines seven types of data breaches, ranging from hacking via malware and phishing, to employee negligence and physical theft. Being aware of these attacks can help you prevent them, or at least know what to do on the off chance that your company falls victim.


5.5 Tips on Starting a Career in Data Science

People often ask me questions about starting a career in data science or for advice on what tech skills they should acquire. When I get asked this question, I try to have a conversation with the person to see what their goals and aspirations are, as there’s no advice that I can give that is universal. That said, here are five pointers that I would say are generally helpful for anyone starting a career in data science or data analytics.

Tip 1: Data Science is a Big Field: You Can’t Know Everything About Everything

When you start getting into data science, the breadth of the field can be overwhelming. It seems that you have to be an expert in big data systems, relational databases, computer science, linear algebra, statistics, machine learning, data visualization, data engineering, SQL, Docker, Kubernetes, and much more. To say nothing about subject matter expertise. One of the big misunderstandings I see is the perception that you have to be an expert in all these areas to get your first job.


Explore REST APIs without Code!

One of the big challenges a data scientist faces is the amount of data that is not in convenient formats. One such format is the REST API. In large enterprises, these are especially problematic for several reasons. Often a considerable amount of reference data is only accessible via REST API, which means that users are required to learn enough Python or R to get at this data. But what if you don’t want to code?
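To give a rough sense of what "enough Python" means here, below is a minimal sketch using only the standard library. The payload, the field names, and the helper names (fetch_json, flatten) are all hypothetical and stand in for whatever a real API would return; real APIs usually also need authentication, paging, and error handling on top of this.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a URL and decode the JSON body (not called below; no real endpoint is assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys, e.g. owner.name."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical payload standing in for a real API response.
SAMPLE = (
    '[{"id": 1, "owner": {"name": "alice", "team": "ops"}},'
    ' {"id": 2, "owner": {"name": "bob", "team": "dev"}}]'
)

# Each API record becomes one flat row that could be loaded into a table.
rows = [flatten(record) for record in json.loads(SAMPLE)]
```

Even in this toy form, the user has to know about HTTP, JSON, and nested data structures before they can ask a single question of the data.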

The typical way I’ve seen this dealt with is to create a duplicate of the reference data in an analytic system such as Splunk or Elasticsearch. The problems with this approach are manifold. First, there is the engineering effort (which costs time and money) to set up the data flow. On top of that, you now have the duplicate cost of storage and now you are maintaining multiple versions of the same information.

Another way I’ve seen is for each API owner to provide a graphical interface for their API, which is good, but the issue there is that now the data is stove-piped and can’t be joined with other data, which defeats its purpose altogether. There has to be a better way…

Take a look at the video tutorial below to see a demo!


On the Importance of Good Design

The image above is a late 50’s MGA convertible. I picked this because I happen to think that this car is one of the most elegantly designed cars ever made. Certainly in the top 50. While we as people place a lot of emphasis on design when it comes to physical objects that we use, when it comes to software, a lot of our software’s design looks more like the car below: This vehicle looks like it was designed by a committee that couldn’t decide whether they were designing a door stop or a golf cart.

I’ve been doing a lot of thinking lately (for reasons that will become apparent in a few weeks) about the lack of good software engineering practices in the data science space. The more I think about it, the more I am shocked by the questionable design that exists in the data science ecosystem, particularly in Python. What’s even more surprising is that the Python language is actually designed to promote good design, and the people building these modules are all great developers.

As an example of this, I recently read an article about coding without if statements and it made me cringe. Here is a quote:

When I teach beginners to program and present them with code challenges, one of my favorite follow-up challenges is: Now solve the same problem without using if-statements (or ternary operators, or switch statements).

You might ask why would that be helpful? Well, I think this challenge forces your brain to think differently and in some cases, the different solution might be better.

There is nothing wrong with using if-statements, but avoiding them can sometimes make the code a bit more readable to humans. This is definitely not a general rule as sometimes avoiding if-statements will make the code a lot less readable. You be the judge.

https://medium.com/edge-coders/coding-tip-try-to-code-without-if-statements-d06799eed231

Well, I am going to be the judge, and every example the author cited made the code more difficult to follow. The way I see it, hard-to-follow code means that you as the developer are more likely to introduce unintended bugs and errors. Since the code is difficult to follow, not only will you have more bugs, but these bugs will take you more time to find. It’s even worse if you didn’t write this code and are asked to debug it!! AAAAHHHH!! I know if that were me, I wouldn’t even bother. I’d probably just rewrite it. The only exception would be if it was heavily commented.
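To make the trade-off concrete, here is a small Python sketch of the same two tasks written with and without if-statements. These are hypothetical examples written for illustration, not the examples from the quoted article, and the function names are made up.

```python
def count_odds_with_if(numbers):
    """Count odd integers using an explicit branch."""
    count = 0
    for n in numbers:
        if n % 2 == 1:
            count += 1
    return count


def count_odds_without_if(numbers):
    """Same result with no branch: n % 2 is 1 for odd ints and 0 for even ints."""
    return sum(n % 2 for n in numbers)


def day_type_with_if(day):
    """Classify a day name using an explicit branch."""
    if day in ("Saturday", "Sunday"):
        return "weekend"
    return "weekday"


def day_type_without_if(day):
    """Same result with no branch: a bool (False=0, True=1) indexes the tuple."""
    return ("weekday", "weekend")[day in ("Saturday", "Sunday")]
```

Both versions of each function return the same answers, but the branch-free ones make the reader work out the modulo arithmetic or the bool-as-index trick before the intent is clear.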
