Announcing the First Release of Griffon: A Virtual Environment for Data Science

My colleagues Austin Taylor and Melissa Kilby and I are proud to announce the first stable release of Griffon: A Virtual Machine for Data Science. Griffon is a virtual machine with many data science tools pre-configured, installed and linked together, so you don’t have to be a Linux expert to try them out. If you are teaching a class, or simply want to learn more about a particular tool, then Griffon is perfect for you.

You can download Griffon here: https://github.com/gtkcyber/griffon-vm.

The Story

Several years ago, Austin and I went out for dinner after teaching a class at BlackHat, and discussed a problem we encountered in the class: setting up a data science environment is difficult. Even though we told the students what they needed well in advance of the class, we still had a lot of problems configuring students’ systems. After a lot of brainstorming, we realized that data science needed something like Kali Linux: a version of Linux with all the common data science tools installed and ready to go.

What Does Griffon Include?

Griffon is equipped with several broad categories of tools.

Programming Languages

Let’s face it, we data science types write code, and to do that, we need programming languages. Griffon therefore includes almost all of the languages that a modern data scientist might use, such as:

Java
Julia
NodeJS
Python
R
Ruby
Scala
and more!

We’ve also taken the time to install many of the common libraries and extensions for these languages so that you can use them right out of the box.

In addition to the languages, we’ve also included a collection of IDEs and editors, visualization software and other development tools.

If you are interested in training, we set up the Jupyter notebook with kernels for many different languages so that you can write instructional notebooks in Julia, Scala, R or other languages.

Databases

Griffon also includes a collection of databases and data warehousing software as well as graphical user interfaces. We also installed drivers, such as pymysql, so that if you want to interact with these databases programmatically, you can! The database list includes:

ElasticSearch
MongoDB
MySQL
PostgreSQL
SQLite
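
As a rough illustration of what working with one of these databases programmatically can look like, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The table, column names, sample rows and the top_talkers helper are all made up for this example and are not part of Griffon. Drivers such as pymysql follow the same Python DB-API pattern (connect, get a cursor, execute, fetch), although pymysql uses %s rather than ? for query parameters.

```python
import sqlite3


def top_talkers(conn, limit=3):
    """Return (src_ip, total_bytes) rows, largest total first.

    Hypothetical helper for this example; assumes a `connections`
    table with `src_ip` and `bytes` columns.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT src_ip, SUM(bytes) AS total FROM connections "
        "GROUP BY src_ip ORDER BY total DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# Use an in-memory database so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (src_ip TEXT, bytes INTEGER)")

# Invented sample data, inserted with parameterized queries.
conn.executemany(
    "INSERT INTO connections VALUES (?, ?)",
    [("10.0.0.1", 500), ("10.0.0.2", 1200), ("10.0.0.1", 900)],
)
conn.commit()

for src_ip, total in top_talkers(conn):
    print(src_ip, total)
```

Pointing the same kind of code at MySQL or PostgreSQL would mostly mean swapping the connect call for the relevant driver's and supplying a host, user and password.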

We also installed a few visualization tools to help you visualize and explore data in SQL databases.

Big Data / Distributed Computing

Learning how to use Hadoop, Spark and other big data tools can be challenging because all of these tools are made for Linux, and simply installing them can be difficult. While Griffon is obviously not intended for production use, it can be used to learn the fundamentals of Hadoop and other big data tools. Griffon includes:

Apache Drill (of course!)
ElasticSearch
Flink
Hadoop
Hive
Kafka
Pig
Spark

Where do I get Griffon?

The Griffon docs are available in our GitHub repository. Unfortunately, Griffon is too big to host on GitHub directly, so it is hosted externally, but there are links to the binary from the GitHub repo.

What do I need to run Griffon?

Griffon is big… so you’ll need 8GB of RAM and ~30-40GB of hard disk space. Griffon will run with 4GB of RAM, but will not perform well if you try to use any of the big data tools. We recommend using VirtualBox to open Griffon.

Is there a Docker Version of Griffon?

It’s in the works. If anyone wants to help out with that, please email me.
