As more machine learning models are deployed to production, companies are discovering how complex it is to maintain their machine learning pipelines. Coordinating retraining, deployment, and inference can be a logistical nightmare. Fortunately, Kubeflow makes it easier to keep models predictive and accurate.
What is Kubeflow?
Designed to simplify machine learning pipelines, Kubeflow is a collection of tools built to run on top of Kubernetes, the popular container orchestration platform.
Because Kubeflow is built from Kubernetes objects, users can declaratively define their pipelines in Kubernetes manifests and let Kubernetes handle data processing and pod scaling to complete the intended workflow. This lets DevOps engineers, SREs, data scientists, and data engineers get their pipelines up and running quickly, with much less effort and coordination than was previously required.
While Kubeflow is available on Google Cloud Platform as a fully managed solution that is much easier to set up, you may want to prototype locally without signing up for a cloud service. This tutorial shows you how to deploy Kubeflow directly to your laptop or local workstation so you can start prototyping.
In this tutorial, we will go over the installation options available for various operating systems. Once the installation succeeds, we will show you how to launch a model on your local Kubeflow cluster for training and inference.
Option 1: Microk8s
Microk8s is a wonderful tool for easily launching Kubernetes. It comes from the fine folks at Canonical, the company behind the popular Linux distribution Ubuntu.
Canonical has stripped away much of the complexity of standing up a Kubernetes cluster. With the release of Kubeflow 1.0, Kubeflow is also integrated with Microk8s as an add-on, making it easier than ever to get started.
To ensure that this solution works properly, your system should meet the following minimum requirements:
- 4 CPUs
- 50 GB storage
- 14 GB memory
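If you are already on Linux (an Ubuntu host, or a shell inside the VM), a quick script like the following can compare the machine against these minimums before you start. It is a hypothetical helper, not part of Microk8s; it assumes GNU `df` and `/proc/meminfo`, and it reports memory in GiB, which is slightly stricter than GB.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check against the Microk8s + Kubeflow minimums.
# Linux-only: reads /proc/meminfo and relies on GNU df's --output option.

MIN_CPUS=4
MIN_DISK_GB=50
MIN_MEM_GB=14

# meets_min ACTUAL REQUIRED: exit status 0 when ACTUAL >= REQUIRED.
meets_min() {
  [ "$1" -ge "$2" ]
}

cpus=$(nproc)
# MemTotal is reported in KiB; convert to whole GiB, rounded down.
mem_gb=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)
# Available space on the root filesystem in whole GiB (digits only).
disk_gb=$(df -BG --output=avail / | tail -n 1 | tr -dc '0-9')

# Print one OK/LOW line per resource.
report() {
  local name=$1 actual=$2 required=$3 unit=$4
  if meets_min "$actual" "$required"; then
    echo "OK   $name: $actual$unit (minimum $required$unit)"
  else
    echo "LOW  $name: $actual$unit (minimum $required$unit)"
  fi
}

report "CPUs" "$cpus" "$MIN_CPUS" ""
report "Disk" "$disk_gb" "$MIN_DISK_GB" " GiB"
report "Memory" "$mem_gb" "$MIN_MEM_GB" " GiB"
```

Any LOW line means the Kubeflow add-on is likely to stall or fail partway through deployment.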
The installation process should work whether you are running Ubuntu in a virtual machine or directly on your machine.
For Windows users, we recommend using Multipass to install Microk8s, because it is the simplest way to get an Ubuntu VM up and running. Mac users can install Microk8s directly with Homebrew.
Note: Because Multipass is distributed as a Snap package on Linux, you cannot install it through Windows Subsystem for Linux, which does not support Snap.
How to Install Microk8s on Windows
Canonical has made it super easy to install Multipass on Windows. Simply download the .exe installer from here and run it with the default options.
These next steps follow the base installation described on the Kubeflow website here.
Step 1: Launch an Ubuntu VM with Multipass and install Microk8s.
Wait for Microk8s to finish installing before moving on to the next step. Use the following command to confirm that the installation succeeded:
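A minimal sketch of this step follows, assuming a VM named `kubeflow-vm` (a hypothetical name) sized to the minimums listed above, and a Microk8s release that still ships the `kubeflow` add-on; if the latest release no longer does, you may need to pin an older snap channel with `--channel`. The last command is the readiness check.

```bash
# On the Windows host (PowerShell or Command Prompt): create an Ubuntu VM with
# 4 CPUs, 16 GB of memory and a 50 GB disk. "kubeflow-vm" is a hypothetical name.
multipass launch --name kubeflow-vm -c 4 -m 16G -d 50G

# Open a shell inside the VM; every command from here on runs inside it.
multipass shell kubeflow-vm

# Inside the VM: install Microk8s from the Snap store (classic confinement).
sudo snap install microk8s --classic

# Block until all Microk8s services report that they are ready.
sudo microk8s status --wait-ready
```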
Optional: If you do not already have the Kubernetes command-line tool kubectl installed, you can create an alias for the copy bundled with Microk8s to make life easier.
You can also add your current user to the microk8s group so you can run Microk8s commands without root privileges and access the .kube cache directory.
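Snap supports command aliases, so a one-line sketch like this exposes the bundled client under the usual name (run inside the VM):

```bash
# Make "kubectl" invoke the kubectl that ships with Microk8s.
sudo snap alias microk8s.kubectl kubectl
```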
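The group setup usually looks like this inside the VM (a sketch following the Microk8s documentation's usual pattern):

```bash
# Add the current user to the microk8s group.
sudo usermod -a -G microk8s $USER
# Give the current user ownership of the kubectl cache directory.
sudo chown -f -R $USER ~/.kube
# Start a shell with the new group membership (or log out and back in).
newgrp microk8s
```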
Step 2: Enable features to make sure your Kubernetes cluster runs properly.
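A sketch of this step, assuming the commonly used set of add-ons for this setup (cluster DNS, the Kubernetes dashboard, and hostpath storage); treat the exact list as an assumption:

```bash
# Enable DNS, the Kubernetes dashboard and the default storage class.
# Prefix with sudo if you skipped the group step above.
microk8s enable dns dashboard storage
```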
Optional: Enable GPU functionality if a GPU is available.
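On Microk8s releases that provide the `gpu` add-on (newer releases may name it differently), this is a single command; it assumes an NVIDIA GPU is visible inside the VM:

```bash
# Install the NVIDIA GPU support add-on.
microk8s enable gpu
```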
Step 3: Get an access token to allow external connectivity to your VM.
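A sketch of the token lookup used with older Kubernetes releases, which automatically created a `default-token-*` secret for each service account; on Kubernetes 1.24 and later those secrets no longer exist, and `kubectl create token` is the replacement:

```bash
# Find the default service-account token secret in kube-system...
token=$(microk8s kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)
# ...and print it; copy the value of the "token:" field.
microk8s kubectl -n kube-system describe secret $token
```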
Step 4: Enable Kubeflow
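Assuming your Microk8s release includes the `kubeflow` add-on (it arrived around the Kubeflow 1.0 release; later releases may not include it), enabling it is one command:

```bash
# Deploy Kubeflow as a Microk8s add-on; it prints the dashboard
# credentials when it finishes.
microk8s enable kubeflow
```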
Note: This command can take up to 30 minutes to complete. Be patient.
Once that command completes, you should see a success message with a username and password, which you can use to access your Kubeflow dashboard.
Step 5: Use port forwarding to allow external access to your VM.
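A sketch of the port forward, using `kubectl port-forward --address 0.0.0.0` so the forwarded port listens on every VM interface, not just localhost. `NAMESPACE`, `SERVICE`, and both ports are hypothetical placeholders; substitute the service that fronts your Kubeflow dashboard, which you can find with `microk8s kubectl get svc -A`.

```bash
# Placeholders: replace with the namespace/service in front of the dashboard.
NAMESPACE=kubeflow          # hypothetical
SERVICE=my-dashboard-svc    # hypothetical
LOCAL_PORT=8080             # port you will use from the host's browser
SERVICE_PORT=80             # port exposed by the service

# Listen on all VM interfaces so the Windows host can reach the forward.
microk8s kubectl port-forward -n "$NAMESPACE" "service/$SERVICE" \
  "$LOCAL_PORT:$SERVICE_PORT" --address 0.0.0.0
```

The command keeps running in the foreground; leave that shell open while you use the dashboard.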
Step 6: Navigate to your Kubeflow dashboard and enter the username and password from Step 4.
Get your VM IP address.
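From the Windows host, Multipass reports the address directly (assuming the hypothetical VM name `kubeflow-vm`):

```bash
# Show the VM's state, resources and IPv4 address.
multipass info kubeflow-vm
```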
In your browser, navigate to your Kubeflow dashboard by connecting to your Multipass VM IP along with the port you configured in Step 5.
How to Install Microk8s on macOS
Step 1: Install Homebrew
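Homebrew's official one-line installer looked like this at the time of writing; check brew.sh for the current command before running it:

```bash
# Download and run the Homebrew install script.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```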
Step 2: Install Microk8s
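Canonical publishes Microk8s for macOS through its `ubuntu/microk8s` Homebrew tap, and `microk8s install` then creates the Multipass VM that actually runs Kubernetes. The `--cpu`, `--mem`, and `--disk` sizing flags below are our assumption of the installer's options; confirm them with `microk8s install --help` on your version.

```bash
# Install the Microk8s CLI from Canonical's Homebrew tap.
brew install ubuntu/microk8s/microk8s

# Create the backing VM (assumed flags: CPUs, memory in GB, disk in GB).
microk8s install --cpu 4 --mem 16 --disk 50

# Wait until the cluster reports ready.
microk8s status --wait-ready
```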
Step 3: Enable features to make sure your Kubernetes cluster runs properly.
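On macOS the `microk8s` command relays to the VM, so a sketch of this step runs straight from your Mac's terminal (the add-on list is the same assumption as in the Windows steps):

```bash
# Runs inside the Multipass VM via the macOS microk8s wrapper.
microk8s enable dns dashboard storage
```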
Optional: Enable GPU functionality if a GPU is available.
Step 4: Enable Kubeflow
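A sketch, again assuming a Microk8s release that includes the `kubeflow` add-on. Because the deployment takes a while, the second command lists pods across all namespaces so you can check on progress from another terminal:

```bash
# Deploy the Kubeflow add-on; credentials are printed when it completes.
microk8s enable kubeflow

# From a second terminal: watch pods come up in every namespace.
microk8s kubectl get pods -A
```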
Note: This command can take up to 30 minutes to complete. Be patient.
Once that command completes, you should see a success message with a username and password, which you can use to access your Kubeflow dashboard.
After the Kubeflow install is complete, navigate to the URL provided by the output.
Enter your credentials in the appropriate fields and click Login.
Option 2: Vagrant and VirtualBox
Overview
Vagrant is a tool for creating development environments that can be easily deployed on other workstations, regardless of the host operating system. VirtualBox is mature open-source virtualization software that lets users create and launch custom VMs.
Step 1: Download and install Vagrant and VirtualBox
To start, you will need to install the Vagrant and VirtualBox software.
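Once both installers finish, a quick sanity check confirms that the tools are on your `PATH`. On Windows, `VBoxManage` lives in the VirtualBox install directory and may need to be added to `PATH` first.

```bash
# Print the installed versions; an error here means the tool is not on PATH.
vagrant --version
VBoxManage --version
```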
Note: If you would rather not install or use either of these programs, you can skip to Option 3, where you will use a VM and a custom install script to accomplish the same thing.
Step 2: Install MiniKF
To ensure that this solution works properly, your system should meet the following minimum requirements:
- 4 CPUs
- 50 GB storage
- 12 GB memory
Open a terminal (on Windows, I suggest running PowerShell as administrator, since we will be running a PowerShell command soon) and create a new directory.
PowerShell:
Bash:
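A sketch of this step, assuming Arrikto's `arrikto/minikf` Vagrant box, which is how MiniKF was distributed when this tutorial was written; the box may since have been retired. Each command sits on its own line, so the same block also works in PowerShell.

```bash
# Create and enter a working directory for MiniKF.
mkdir minikf
cd minikf

# Write a Vagrantfile that points at the MiniKF box.
vagrant init arrikto/minikf

# Download the box and boot the VM (the first run takes a while).
vagrant up
```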
Note to Windows Users Only: If you receive a message stating that there was an error while executing `VBoxManage`, you will need to disable Hyper-V and complete Step 2 again before continuing.
To disable Hyper-V, open a Command Prompt as administrator and run:
Reboot your computer and then turn Hyper-V back on.
If you receive the following error:
Run the following command instead:
Step 3: Navigate to the given URL
You should have been given a URL once your command from Step 2 completes.
Navigate to that URL and wait for your MiniKF terminal to start.
When MiniKF prompts you, click OK and Kubeflow will begin provisioning. This process can take up to 30 minutes, so be patient.
Step 4: Log in to the Kubeflow Console
Once the process finishes, you will see a completion screen with a username and password.
Click Connect to Kubeflow.
Enter your credentials in the appropriate fields and click Login.
Option 3: Full Install on Ubuntu
This solution is meant to run on Ubuntu, so you can either provision a VM with a hypervisor such as VMware or VirtualBox, or create one on Google Cloud Platform, which offers a free tier. Alternatively, you could run it on your local machine.
If you provision a VM on GCP, remember that each vCPU is a single hardware thread, so 2 vCPUs correspond to 1 physical CPU core. For your system to perform well, we suggest provisioning a VM with at least 32 vCPUs. You also need to enable nested virtualization on your Compute Engine instance. You can read about that here.
Installing using Google Compute Engine
Step 1: Create an Ubuntu boot disk.
Open Cloud Shell from your GCP Console and run the following command to create a boot disk.
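A sketch of the disk creation, following the pattern in Google's nested-virtualization guide. The disk name and zone are hypothetical choices, and the Ubuntu 18.04 image family matches the Kubeflow 1.0 era; pick a current family from `ubuntu-os-cloud` if that one is unavailable.

```bash
# Hypothetical zone; use one close to you that offers Intel Haswell or newer.
ZONE=us-central1-b

# Create a disk from the public Ubuntu 18.04 LTS image family.
gcloud compute disks create kubeflow-disk \
  --image-project=ubuntu-os-cloud \
  --image-family=ubuntu-1804-lts \
  --zone="$ZONE"
```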
Step 2: Create an image with a license key required for nested virtualization.
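The nested-virtualization license is attached when the image is built from the disk. This sketch assumes the hypothetical names `kubeflow-disk` and `kubeflow-nested-image`:

```bash
ZONE=us-central1-b   # must match the zone of the source disk

# Build an image from the disk and attach the enable-vmx license.
gcloud compute images create kubeflow-nested-image \
  --source-disk=kubeflow-disk \
  --source-disk-zone="$ZONE" \
  --licenses="https://compute.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"
```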
Step 3: Create a GCE instance that uses the custom image we created.
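A sketch of the instance creation. Nested virtualization requires an Intel Haswell or newer CPU platform; `n1-standard-32` is chosen here to match the 32-vCPU suggestion above, and the instance name and disk size are hypothetical.

```bash
ZONE=us-central1-b

# Create the VM from the nested-virtualization image.
gcloud compute instances create kubeflow-vm \
  --zone="$ZONE" \
  --machine-type=n1-standard-32 \
  --min-cpu-platform="Intel Haswell" \
  --image=kubeflow-nested-image \
  --boot-disk-size=100GB
```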
Step 4: Verify that nested virtualization is enabled on your image.
Once your VM instance finishes initializing, check to make sure that virtualization is enabled. SSH into your instance and run the following command.
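The standard check counts the `vmx` CPU flag (Intel VT-x) in `/proc/cpuinfo`. A slightly more talkative sketch, assuming an Intel-based instance:

```bash
# grep -c exits with status 1 when nothing matches, so "|| true" keeps the
# captured count ("0") usable instead of treating it as a failure.
vmx_count=$(grep -cw vmx /proc/cpuinfo || true)
echo "vmx entries: $vmx_count"

if [ "$vmx_count" -gt 0 ]; then
  echo "Nested virtualization is enabled."
else
  echo "Nested virtualization is NOT enabled; check the image license and CPU platform."
fi
```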
If the command returns a non-zero count, nested virtualization is enabled and you are good to go.
Step 5: Download and run Kubeflow install script.
SSH into your VM instance through the GCP console.
Once your shell is connected, run the following command to download and run the Kubeflow install script.
This script will take about 30 minutes to configure your environment. Be sure to save the provided IP address and port. You will need them in the following step.
Step 6: Create the proper firewall rules
To connect to your Kubeflow dashboard, you need to create the proper firewall rules. [INGRESS_PORT] and [SECURE_INGRESS_PORT] are the ports that you copied from the output of the previous step.
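A sketch of a single rule covering both ports. The rule name is hypothetical, and the source range is a placeholder you should narrow to your own public IP rather than opening the ports to the whole internet.

```bash
# Export these first with the two ports printed by the install script;
# the ":?" expansion stops with an error if either one is missing.
: "${INGRESS_PORT:?set INGRESS_PORT to the port printed by the install script}"
: "${SECURE_INGRESS_PORT:?set SECURE_INGRESS_PORT to the port printed by the install script}"
MY_IP_RANGE="203.0.113.10/32"   # placeholder: replace with your own IP/32

# Allow inbound TCP traffic on both ingress ports in the default network.
gcloud compute firewall-rules create allow-kubeflow-ingress \
  --direction=INGRESS \
  --allow="tcp:${INGRESS_PORT},tcp:${SECURE_INGRESS_PORT}" \
  --source-ranges="$MY_IP_RANGE"
```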
Step 7: Get an access token to allow external connectivity to your VM.
Step 8: Use port forwarding to allow external access to your VM.
Step 9: Navigate to your Kubeflow Console
You are now ready to navigate to your Kubeflow console. Paste the external IP address of your instance, along with the port number from the previous step, into your browser.
Enter your credentials in the appropriate fields and click Login.
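If you do not have the external IP handy, `gcloud` can print just that field (assuming the hypothetical instance name and zone used when the instance was created):

```bash
# Print only the instance's external (NAT) IPv4 address.
gcloud compute instances describe kubeflow-vm \
  --zone=us-central1-b \
  --format="get(networkInterfaces[0].accessConfigs[0].natIP)"
```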
Conclusion
This tutorial showed you several ways to get a local Kubeflow environment up and running on your laptop or in the cloud. Kubeflow is one of the technologies leading the way in MLOps, and mastering it will be an asset for anyone looking to step into a role as a data scientist or data engineer. Now that you have a working Kubeflow playground, go forth and build something awesome!