
Author: Charles Givre

So I Launched a Startup…

I haven’t been posting for quite some time, a fact that I know has caused much consternation to both of my readers. The reason is that I launched a startup! I’ve decided to shift the focus of my blog and write about my experiences as a founder: dealing with investors, managing time, and everything else I’ve learned. The best part is that the story isn’t written yet, so we don’t really know how it is going to turn out! Will we flame out? Will we become the next billion-dollar company? Will we get acquired? We will see! If you want to know how this story ends, you’ll have to keep reading!

Since this is MY blog, I’m not going to use it as a marketing platform for our tool. (It really is amazing… I’m not just saying that.) I’m also not going to write about individuals on my team, employees, investors, etc.

This will be a raw account of what it’s like to build a company from the ground up, from my perspective as CEO.


Cyber Attacks on Cancer Centers? Is Nothing Sacred?

Recently, I responded to a LinkedIn article about a cyber attack on a cancer center. The person who posted it basically asked: why would a hacker go after something like a cancer center or a school? After reading the posts, it seemed to me that there is a lack of understanding as to why and how hackers actually do what they do.

Why Do Hackers Hack?

One of the biggest misconceptions about hacking is that people think they are not important or that they don’t have anything hackers want. The issue here is that people don’t understand why hackers hack. There are four basic reasons: money, politics, espionage, and malice/mischief. Money is perhaps the easiest to understand: hackers want money. You have it… therefore you are a target. Hackers can steal your money in several ways, such as stealing your credit card and bank info, or installing some sort of ransomware. What is important to understand is that since you have money, you are a target.

Hackers might also be interested in your systems to build botnets, which are armies of computers under a hacker’s control that can be used for all kinds of mayhem. Some master hackers build botnets and then rent them out to other nefarious individuals for profit.


Basta Facebook!

Facebook has been in the news quite a bit in the last few weeks, and I’ve finally reached the conclusion it is time for me to go. Now, I know Facebook will not care. I do not have millions of followers. I’m not an influencer, nor do I have an Instagram account or use any of their other services.

Why Now?

Over the last few years, I’ve become increasingly disillusioned with social media. I’ve written about it extensively here and elsewhere, but I believe strongly that people are entitled to their privacy. Social media companies believe the opposite, and believe it is their solemn right and duty to invade every aspect of our lives for their profit. I say no more! Basta!

I didn’t delete my Facebook account entirely; however, I did post a note saying that I will not be using it anymore. I had already removed all Facebook applications from my phone, including Messenger, Instagram, WhatsApp, etc.

It’s More than Privacy

Until recently, my main objection to social media in general (with the exception of LinkedIn) was the invasion of privacy. I’ve researched and written about how these companies are stealing your privacy and profiting from it. But in the last few months, I’ve been reading a lot of research about the dangerous side effects of social media and have come to the conclusion that it is much worse than simply invading your privacy. Social media is about behavior modification.


My Love Hate with Apple Part 2: Thunderbolt Ports

I’m an Apple guy through and through. (You can read more about my love/hate relationship here.) My first computer ever was a Woz edition Apple ][ GS, and although I took a hiatus from Apple for some time during college, ever since I came back, I’ve been all in. My last computer, a MacBook Pro (2014 edition, I think), was by far the best computer I had ever owned. Unfortunately, an incident with a can of Dr. Pepper, along with the passage of time, led me to the decision that it was time to upgrade. So last year, I got a new 2018 MacBook Pro with the Touch Bar. This is where the hate phase of my relationship with Apple begins.

Now, I understand the Apple philosophy of removing extraneous features and complexity, and I agree with it in principle. However, simplicity should not come at the cost of functionality. The new MacBooks, as everyone has noted, come with USB-C ports instead of the smorgasbord of ports that existed on previous editions of the machine. Whilst this certainly looks more elegant, after I received my machine, I had to buy a bunch of dongles so that I could use my peripherals. This was a minor annoyance, but I thought it was a one-time thing. How wrong I was…


Splunk and SQL, Together at Last?

For the last few years, I’ve been involved with Splunk engineering. I found this somewhat ironic, since I haven’t used Splunk as a user for a really long time. I was never a fan of Splunk’s query language, SPL (Search Processing Language), for a variety of reasons, the main one being that I didn’t want to spend the time to learn a proprietary language that is about as elegant as a 1974 Ford Pinto. Over the years, I had worked on a few projects that involved doing machine learning on data in Splunk, which presented a major challenge. While Splunk does have a Machine Learning Toolkit (MLTK), which is basically a wrapper for scikit-learn, doing feature engineering in SPL is a nightmare.

So, how DO you do machine learning in Splunk?

You don’t.

No, Really… How do you do machine learning in Splunk?

Ok… People actually do machine learning in Splunk, but IMHO, this is not the best way to do it, for several reasons. Most of the ML work I’ve seen done in Splunk involves getting the data out of Splunk. That being the case, the first thing I would recommend to anyone attempting to do ML in Splunk is to take a look at huntlib (https://github.com/target/huntlib), a Python module that facilitates getting data out of Splunk. Huntlib makes this relatively easy: you can get your data from Splunk right into a Pandas DataFrame. But you still have to know SPL, or else run a basic SPL search and do all your data wrangling in Python. Could there be a better way?
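As a minimal sketch of the “basic SPL search, then wrangle in Python” route, the snippet below turns raw events into one feature row per host. It assumes the search results have already been pulled out of Splunk (for example with huntlib) as a list of dicts, one per event. The field names `host`, `user`, and `bytes_out`, the function name `host_features`, and the sample events are all hypothetical, and the three features are illustrative choices, not anything Splunk or huntlib prescribes. It sticks to the standard library; the same aggregation is a one-line `groupby` once the data is in a Pandas DataFrame.

```python
from collections import defaultdict
from statistics import mean


def host_features(events):
    """Aggregate raw search results into one feature row per host.

    `events` is assumed to be a list of dicts with the hypothetical
    fields `host`, `user`, and `bytes_out`.
    """
    by_host = defaultdict(list)
    for event in events:
        by_host[event["host"]].append(event)

    features = {}
    for host, rows in by_host.items():
        features[host] = {
            "event_count": len(rows),
            "distinct_users": len({row["user"] for row in rows}),
            # Exported field values often arrive as strings, so convert first.
            "mean_bytes_out": mean(float(row["bytes_out"]) for row in rows),
        }
    return features


# Hypothetical events standing in for the results of a basic SPL search.
events = [
    {"host": "web01", "user": "alice", "bytes_out": "1200"},
    {"host": "web01", "user": "bob", "bytes_out": "800"},
    {"host": "db01", "user": "alice", "bytes_out": "50"},
]
print(host_features(events))
```

The point is less the specific features than where the work happens: once the data is out of Splunk, feature engineering is ordinary Python rather than SPL.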


5.5 Tips on Starting a Career in Data Science

People often ask me about starting a career in data science, or for advice on what tech skills they should acquire. There’s no universal advice I can give, so when I get asked this question, I try to have a conversation with the person to learn their goals and aspirations. That said, here are five pointers that I would say are generally helpful for anyone starting a career in data science or data analytics.

Tip 1: Data Science is a Big Field: You Can’t Know Everything About Everything

When you start getting into data science, the breadth of the field can be overwhelming. It seems that you have to be an expert in big data systems, relational databases, computer science, linear algebra, statistics, machine learning, data visualization, data engineering, SQL, Docker, Kubernetes, and much more, to say nothing of subject matter expertise. One of the big misunderstandings I see is the perception that you have to be an expert in all these areas to get your first job.


Explore REST APIs without Code!

One of the big challenges a data scientist faces is the amount of data that is not in convenient formats. One such case is data served through REST APIs. In large enterprises, these are especially problematic for several reasons. Often a considerable amount of reference data is only accessible via REST API, which means users have to learn enough Python or R to get at it. But what if you don’t want to code?
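To make concrete what “enough Python” means here, this is a rough sketch of the minimum code needed just to pull records from a REST API and flatten them into rows for analysis. It uses only the standard library; the function names, the endpoint URL, and the `results` key in the usage comment are hypothetical, and it deliberately ignores authentication, pagination, and error handling, each of which real enterprise APIs usually add.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download and parse a JSON document from a REST endpoint.

    Sketch only: no authentication, pagination, retries, or error handling.
    """
    with urlopen(url) as response:
        return json.load(response)


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots.

    Lists are left as-is; only nested objects are expanded.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, e.g. {"owner": {"name": ...}}
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical usage against a made-up endpoint that returns {"results": [...]}:
# payload = fetch_json("https://api.example.com/v1/assets")
# rows = [flatten(item) for item in payload["results"]]
```

Even this toy version is code someone has to write, test, and maintain for every API, which is exactly the burden the no-code approach aims to remove.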

The typical way I’ve seen this dealt with is to create a duplicate of the reference data in an analytic system such as Splunk or Elasticsearch. The problems with this approach are manifold. First, there is the engineering effort (which costs time and money) to set up the data flow. On top of that, you pay for storage twice, and you are now maintaining multiple copies of the same information.

Another approach I’ve seen is for each API owner to provide a graphical interface for their API. That is good, but the data is then stove-piped and can’t be joined with other data, which defeats the purpose altogether. There has to be a better way…

Take a look at the video tutorial below to see a demo!


On the Importance of Good Design

The image above is a late-1950s MGA convertible. I picked it because I happen to think this car is one of the most elegantly designed cars ever made. Certainly in the top 50. While we as people place a lot of emphasis on design when it comes to the physical objects we use, when it comes to software, a lot of our design looks more like the car below: This vehicle looks like it was designed by a committee that couldn’t decide whether it was designing a doorstop or a golf cart.

I’ve been doing a lot of thinking lately (for reasons that will become apparent in a few weeks) about the lack of good software engineering practices in the data science space. The more I think about it, the more I am shocked by the questionable design that exists in the data science ecosystem, particularly in Python. What’s even more surprising is that the Python language is actually designed to promote good design, and the people building these modules are all great developers.

As an example of this, I recently read an article about coding without if statements and it made me cringe. Here is a quote:

When I teach beginners to program and present them with code challenges, one of my favorite follow-up challenges is: Now solve the same problem without using if-statements (or ternary operators, or switch statements).

You might ask why would that be helpful? Well, I think this challenge forces your brain to think differently and in some cases, the different solution might be better.

There is nothing wrong with using if-statements, but avoiding them can sometimes make the code a bit more readable to humans. This is definitely not a general rule as sometimes avoiding if-statements will make the code a lot less readable. You be the judge.

https://medium.com/edge-coders/coding-tip-try-to-code-without-if-statements-d06799eed231

Well, I am going to be the judge, and every example the author cited made the code more difficult to follow. The way I see it, hard-to-follow code means that you as the developer are more likely to introduce unintended bugs and errors. Since the code is difficult to follow, not only will you have more bugs, but those bugs will take you more time to find. It’s even worse if you didn’t write the code and are asked to debug it!! AAAAHHHH!! I know if that were me, I wouldn’t even bother. I’d probably just rewrite it. The only exception would be if it were heavily commented.
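To illustrate the trade-off, here is a made-up example (not one of the examples from the article) showing the same pricing rule written with and without if-statements. The rule and the function names are hypothetical. Both functions return identical results, but the second hides the business logic inside boolean arithmetic, so a reader has to reverse-engineer the rules before they can spot a bug.

```python
def shipping_cost_if(weight_kg, express):
    """Straightforward version: the rules are visible at a glance."""
    if express:
        return 15.0 + 2.0 * weight_kg
    if weight_kg > 10:
        return 0.0  # free standard shipping for heavy orders
    return 5.0


def shipping_cost_no_if(weight_kg, express):
    """Same rules without if-statements, using booleans as 0/1 multipliers."""
    return (express * (15.0 + 2.0 * weight_kg)
            + (not express) * (weight_kg <= 10) * 5.0)


for weight, express in [(3, True), (3, False), (12, False)]:
    print(weight, express, shipping_cost_if(weight, express), shipping_cost_no_if(weight, express))
```

Ask yourself which version you would rather debug at 2 a.m. when the free-shipping threshold changes.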


Pandemics, Birthday, and Life Before Snark

Today marks about the 45th day I’ve been stuck in the house and it happens that my birthday was last week, so I’ve been doing a lot of reading and reflecting on things. The last few weeks have been really up and down. I’ve been doing a lot of puttering around the house and working on silly projects like replacing the headlight gaskets on my MGA, which also involved painting the headlight buckets, cutting off rusty screws and redoing wiring, but I digress. Despite being home all the time, I’m finding it very difficult to get any meaningful work done.

[Photos: before and after]

On the upside, I’ll be doing some new online classes with O’Reilly starting around the end of May! The topics relate to coding practices and data visualization, so stay tuned!


Public Data Still Lacking on COVID-19 Outbreak

As you read this, you are probably (like me) under quarantine or a shelter-in-place order due to the COVID-19 outbreak. As a data scientist who has been stuck in the house since 10 March, I wanted to take a look at the data and see what I could figure out. I’m not an epidemiologist and claim no expertise in health care, but I do know data science, so please take what I am saying with a grain of salt.

Why is there no data?

My first observation is that very little data is actually being made publicly available. I am not sure why this is the case, but I spent a considerable amount of time digging through the WHO, CDC, and other agencies’ websites and APIs and found little usable data. For example, the World Health Organization (WHO) posts daily situation reports (sitreps) with data; however, the files are in PDF format. I attempted to extract the tables from the PDFs, but this proved extremely difficult because the formatting was not consistent. It would be trivial to post this data in CSV, HDF5, or some other format that is conducive to data analysis, but the WHO chose not to. I found much the same situation at the other major health institutions, such as the CDC.
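To show how little work a machine-readable release would require compared with scraping PDF tables, here is a sketch that turns cumulative case counts from a CSV file into daily new cases per region, using only Python’s standard library. The column names (`date`, `region`, `cumulative_cases`) and the function name are hypothetical; this is not the schema of any actual WHO or CDC file.

```python
import csv
import io
from collections import defaultdict


def daily_new_cases(csv_text):
    """Convert cumulative case counts into daily new cases per region.

    Assumes hypothetical columns: date (YYYY-MM-DD), region, cumulative_cases.
    Returns {region: [(date, new_cases), ...]} in date order.
    """
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        series[row["region"]].append((row["date"], int(row["cumulative_cases"])))

    result = {}
    for region, points in series.items():
        points.sort()  # ISO-8601 date strings sort chronologically
        previous = 0  # the first reported day is differenced against zero
        daily = []
        for date, total in points:
            daily.append((date, total - previous))
            previous = total
        result[region] = daily
    return result
```

With a file like that, the whole “extraction” step is `csv.DictReader`; everything else is analysis.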

Health-related information in the United States is regulated by the Health Insurance Portability and Accountability Act (HIPAA), which imposes draconian fines and restrictions on the handling of private health information, so some of the secrecy may be due to this law.
