Docker for Data Science


By Sachin Abeywardana, Founder of DeepSchool.io


Docker is a tool that simplifies the installation process for software engineers. Coming from a statistics background, I used to care very little about how to install software and would occasionally spend a few days trying to resolve system configuration issues. Enter the godsend, Docker almighty.

Think of Docker as a lightweight virtual machine (I apologise to the Docker gurus for using that term). Generally, someone writes a *Dockerfile* that builds a *Docker Image* containing most of the tools and libraries you need for a project. You can use this image as a base and add any other dependencies your project requires. The underlying philosophy is that if it works on my machine, it will work on yours.
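
As a minimal sketch of that idea, the Dockerfile below starts from the official python image on Dockerhub and adds a few libraries on top of it. The tag python:3.11-slim, the package list and the image name my-ds-project are illustrative assumptions, not requirements of any particular project.

```dockerfile
# Start from a published base image (the tag is an assumption;
# pick whichever Python version your project needs)
FROM python:3.11-slim

# Add project-specific libraries on top of the base image.
# This RUN instruction produces a new layer in the resulting image.
RUN pip install --no-cache-dir numpy pandas scikit-learn

# Default command when a container starts: an interactive Python shell
CMD ["python"]

# Build and run from the directory containing this Dockerfile
# (my-ds-project is a hypothetical image name):
#   docker build -t my-ds-project .
#   docker run --rm -it my-ds-project
```

Anyone with Docker installed can build this file and get the same Python version and libraries, regardless of what is installed on their own machine.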

What’s in it for Data Scientists?

FROM ubuntu
RUN apt-get update && apt-get install -y python3

This Dockerfile would install python3 (as a layer) on top of the Ubuntu layer.

In essence, for each project you write all the apt-get install, pip install, etc. commands into your Dockerfile instead of executing them locally.
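
For example, a project whose local setup would have been a handful of apt-get and pip commands might translate into something like the sketch below. The Ubuntu 22.04 base, the package names and the requirements.txt file (assumed to sit next to the Dockerfile) are hypothetical choices for illustration.

```dockerfile
# Commands you might otherwise type locally, e.g.
#   sudo apt-get install git build-essential
#   pip install -r requirements.txt
# written as Dockerfile instructions instead.
FROM ubuntu:22.04

# System packages via apt-get: update the package lists first,
# -y answers the confirmation prompt, and removing the lists keeps the layer small
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip git build-essential && \
    rm -rf /var/lib/apt/lists/*

# Python packages via pip, pinned in a (hypothetical) requirements.txt
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the rest of the project code into the image
COPY . .
```

Ubuntu 22.04 is used here because newer Ubuntu releases mark the system Python as externally managed (PEP 668), so a system-wide pip install would need a virtual environment or an override flag inside the image.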

I recommend reading the tutorial at https://docs.docker.com/get-started/ to get started with Docker. The learning curve is gentle (two days' work at most), and the gains are enormous.

Dockerhub

Lastly, Dockerhub deserves a special mention. Personally, I think Dockerhub is what makes Docker truly powerful. It is to Docker what GitHub is to git: an open platform for sharing your Docker images. You can always build a Docker image locally using docker build ..., but it is good practice to push the image to Dockerhub so that the next person only has to pull it to use it.
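
A sketch of that round trip is below. It assumes a placeholder repository name, yourname/ds-base, which is not a real image, and assumes that image already ships with pip. The CLI steps are shown as comments, followed by how a colleague could extend the shared image in their own Dockerfile.

```dockerfile
# "yourname/ds-base" is a hypothetical Dockerhub repository used for illustration.
#
# Author's side, after writing and testing the Dockerfile:
#   docker build -t yourname/ds-base:1.0 .
#   docker login
#   docker push yourname/ds-base:1.0
#
# Colleague's side: either pull and run the image directly ...
#   docker pull yourname/ds-base:1.0
#   docker run --rm -it yourname/ds-base:1.0
#
# ... or build on top of it in their own project Dockerfile:
FROM yourname/ds-base:1.0

# Only the extra dependencies for this project need to be added
RUN pip install --no-cache-dir lightgbm
```

Tagging with an explicit version (1.0 here) rather than relying on latest makes it clearer which image a project was built against.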

My Docker image for Machine Learning and Data Science

Edit 2 (a quick note on virtualenvs for Python, packrat for R, etc.):

Personally, I have not used any of the other containerising tools; however, it should be noted that Docker is independent of Python and R, and goes beyond containerising applications for specific programming languages.
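
To illustrate that language independence, the sketch below puts a Python and an R toolchain in the same image, with each language still using its own package manager inside the container. The Ubuntu base and the example packages (pandas for Python, R6 for R) are arbitrary assumptions.

```dockerfile
FROM ubuntu:22.04

# Suppress interactive prompts (such as tzdata configuration) during the build only
ARG DEBIAN_FRONTEND=noninteractive

# Python and R side by side; ca-certificates lets R download packages over HTTPS
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip r-base-core ca-certificates && \
    rm -rf /var/lib/apt/lists/*

# Python packages via pip
RUN pip3 install --no-cache-dir pandas

# R packages via install.packages, from a CRAN mirror
RUN Rscript -e 'install.packages("R6", repos = "https://cloud.r-project.org")'
```

A virtualenv or packrat library could still be used inside such an image; Docker simply wraps the whole system, including both language runtimes.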

 
If you are enjoying my tutorials/ blog posts, consider supporting me on https://www.patreon.com/deepschoolio or by subscribing to my YouTube channel https://www.youtube.com/user/sachinabey (or both!).

 
Bio: Sachin Abeywardana holds a PhD in Machine Learning and is the Founder of DeepSchool.io.

Original. Reposted with permission.