Kubeflow


Kubeflow is a free and open-source machine learning platform co-founded by David Aronchick, Jeremy Lewi and Vishnu Kannan. It was built by developers at Google, Arrikto, Cisco, IBM, Red Hat, CoreOS and CaiCloud, and was first released at KubeCon North America in 2017. Kubeflow is designed to orchestrate complex machine learning workflows on Kubernetes by expressing them as pipelines. It is based on TensorFlow Extended, Google's internal method for deploying TensorFlow models to Kubernetes.

Kubeflow Overview

Kubeflow is a free and open-source project designed to make running machine learning workflows on Kubernetes clusters simpler and more coordinated. It is a cloud-native framework for running machine learning in containerized environments on Kubernetes. Kubernetes has become a popular deployment platform for organizations implementing machine learning. Kubeflow integrates with and extends Kubernetes, and it is designed to run everywhere Kubernetes runs: on-premises, GCP, AWS, Azure, etc.
Kubeflow began as an internal Google project to provide a simpler way to run TensorFlow jobs on Kubernetes, based specifically on the TensorFlow Extended pipeline. The Google open-source engineers David Aronchick, Jeremy Lewi and Vishnu Kannan co-founded the Kubeflow project, and after its initial release many large software companies began contributing publicly through the project's GitHub issue board. Version 0.7, a stable release with some beta capabilities, was released on November 4, 2019; at that time, version 1.0 was expected in Q1 2020.

What is Kubeflow?

At its core, Kubeflow offers an end-to-end ML stack orchestration toolkit that builds on Kubernetes to deploy, scale and manage complex systems. Features such as JupyterHub servers, which allow multiple users to contribute to a project simultaneously, have become valuable assets of Kubeflow. Detailed project management and in-depth monitoring and analysis are also central capabilities of Kubeflow.
Data scientists and engineers can develop a complete pipeline composed of segmented steps. In Kubeflow these steps are loosely coupled components of an ML pipeline, a feature not core to other frameworks, which makes pipelines easy to reuse and modify for other jobs. This flexibility can save much of the labor otherwise needed to develop a new data pipeline for each specific use case. Through this process, Kubeflow aims to simplify Kubernetes deployments while also accounting for future needs of portability and scalability.
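
The idea of loosely coupled, reusable steps can be illustrated with a minimal sketch. This is plain Python and does not use the Kubeflow Pipelines SDK; every name in it (load_data, scale, train_mean_model, train_max_model, run_pipeline) is hypothetical and chosen only for illustration. In a real Kubeflow pipeline each step would typically run as its own container on Kubernetes rather than as an in-process function.

```python
from typing import Callable, Dict, List

# A step reads named artifacts produced by earlier steps and returns new ones.
# Steps only share data through this dictionary, so they stay loosely coupled.
Step = Callable[[Dict[str, object]], Dict[str, object]]


def load_data(artifacts: Dict[str, object]) -> Dict[str, object]:
    # Hypothetical stand-in for a data-ingestion step.
    return {"rows": [1.0, 2.0, 3.0, 4.0]}


def scale(artifacts: Dict[str, object]) -> Dict[str, object]:
    # Rescale the raw values by dividing by the largest one.
    rows = artifacts["rows"]
    peak = max(rows)
    return {"features": [r / peak for r in rows]}


def train_mean_model(artifacts: Dict[str, object]) -> Dict[str, object]:
    # Toy "training": the model is just the mean of the features.
    features = artifacts["features"]
    return {"model": sum(features) / len(features)}


def train_max_model(artifacts: Dict[str, object]) -> Dict[str, object]:
    # Alternative toy "training" step with the same input and output names.
    return {"model": max(artifacts["features"])}


def run_pipeline(steps: List[Step]) -> Dict[str, object]:
    # Run the steps in order, merging each step's outputs into the artifacts.
    artifacts: Dict[str, object] = {}
    for step in steps:
        artifacts.update(step(artifacts))
    return artifacts


# Reuse: the variant pipeline keeps the first two steps and swaps the last one.
baseline = [load_data, scale, train_mean_model]
variant = baseline[:-1] + [train_max_model]

baseline_result = run_pipeline(baseline)
variant_result = run_pipeline(variant)
print(baseline_result["model"])
print(variant_result["model"])
```

Because each step depends only on the names of its input artifacts, the training step can be replaced without touching ingestion or preprocessing, which is the kind of reuse the paragraph above describes.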

Kubeflow Roadmap

Kubeflow 1.0 was announced to the public on February 26, 2020 in a post on the Kubeflow blog. The 1.0 release is available through the public GitHub repository. It is the culmination of the community's stabilization efforts and is recognized as a significant maturation point for the Kubeflow platform. Over four separate 1.0 release candidates, the Kubeflow team gathered feedback from users ranging from enterprises to single-cluster deployments. Specifically, Kubeflow 1.0 marks the stabilization of the following core Kubeflow components: the central dashboard (Kubeflow's UI); the Jupyter notebook controller and web app; the TensorFlow Operator and PyTorch Operator for distributed training; kfctl for deployment and upgrades; and the Profile Controller and UI for multiuser management.
The following table includes the issues in the Kubeflow 1.1 Roadmap project board on GitHub. Currently there are 27 issues being prioritized for 1.1. Traditionally, Kubeflow releases new builds quarterly, and the 1.1 release is currently targeted for June 2020.