Kubeflow Pipelines allows you to build and run portable, scalable machine learning workflows using Kubernetes-managed containers.
Learn more about Kubeflow Pipelines from their project documentation page.
The K Flow platform offers two main ways of interacting with Kubeflow Pipelines. The simpler option is Elyra, a JupyterLab extension that lets you build reusable workflows without writing code. Advanced users can also work directly with the Kubeflow Pipelines SDK.
The main advantage of Elyra is the ability to use a pre-built catalog of components. K Flow offers a wide variety of pipeline components, and we are constantly developing new ones. Most pre-built examples that we provide use Elyra, so it’s important to become familiar with it.
Although Elyra is very powerful, it has some key limitations that might prompt users with more advanced use cases to work directly with the Kubeflow Pipelines Python SDK. There are two major versions of Kubeflow Pipelines, and as of June 2023 Elyra only supports the v1 SDK.
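If you do go the SDK route, the sketch below shows the general shape of a v1 SDK submission. It is a minimal, hypothetical example (the say_hello component, pipeline name, and base image are placeholders, not K Flow components), and it assumes the kfp 1.x package is installed in your notebook image and that the in-cluster ml-pipeline endpoint used in the runtime configurations below is reachable from your notebook.

# Minimal sketch: define a one-step pipeline and submit it with the KFP v1 SDK.
# Assumes kfp 1.x is installed; in multi-user deployments you may also need to
# pass your profile namespace when creating runs.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func

def say_hello(name: str) -> str:
    print(f"Hello, {name}!")
    return name

# Wrap a plain Python function as a lightweight component.
hello_op = create_component_from_func(say_hello, base_image="python:3.9")

@dsl.pipeline(name="hello-pipeline", description="Smoke-test pipeline for the KFP v1 SDK")
def hello_pipeline(name: str = "K Flow"):
    hello_op(name)

# Same in-cluster API endpoint as in the Elyra runtime configurations below.
client = kfp.Client(host="http://ml-pipeline.kubeflow.svc.cluster.local:8888")
client.create_run_from_pipeline_func(hello_pipeline, arguments={"name": "K Flow"})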
These instructions allow you to submit Kubeflow Pipelines jobs from a Kubeflow Notebook server. They only need to be run once for each new notebook server you create.
For the AWS environment: in your Kubeflow notebook, open a terminal and run the following command. Be sure to update S3_BUCKET_NAME and KUBEFLOW_HOST as appropriate.
S3_BUCKET_NAME=kflow-eks-dev-kfp
KUBEFLOW_HOST=dev.aws.kflow.ai
elyra-metadata create runtimes \
--json="{ \"display_name\": \"Kubeflow\", \"metadata\": { \"tags\": [], \"display_name\": \"Kubeflow\", \"engine\": \"Argo\", \"auth_type\": \"KUBERNETES_SERVICE_ACCOUNT_TOKEN\", \"api_endpoint\": \"http://ml-pipeline.kubeflow.svc.cluster.local:8888\", \"public_api_endpoint\": \"https://${KUBEFLOW_HOST}/pipeline\", \"cos_auth_type\": \"AWS_IAM_ROLES_FOR_SERVICE_ACCOUNTS\", \"cos_endpoint\": \"https://s3.${AWS_REGION}.amazonaws.com\", \"cos_bucket\": \"${S3_BUCKET_NAME}\", \"runtime_type\": \"KUBEFLOW_PIPELINES\" }, \"schema_name\": \"kfp\" }" \
--schema_name="kfp"
Then, set up your component catalog:
elyra-metadata create component-catalogs \
--name="kflow" \
--display_name="K Flow" \
--paths="['/home/jovyan/shared/lib/pipeline-components']" \
--schema_name="local-directory-catalog" \
--runtime_type="KUBEFLOW_PIPELINES"
For the GCP environment: in your Kubeflow notebook, open a terminal and run the following command. Be sure to update KUBEFLOW_HOST as appropriate.
KUBEFLOW_HOST=dev.gcp.kflow.ai
elyra-metadata create runtimes \
--json="{ \"display_name\": \"Kubeflow\", \"metadata\": { \"tags\": [], \"display_name\": \"Kubeflow\", \"engine\": \"Argo\", \"auth_type\": \"KUBERNETES_SERVICE_ACCOUNT_TOKEN\", \"api_endpoint\": \"http://ml-pipeline.kubeflow.svc.cluster.local:8888\", \"public_api_endpoint\": \"https://${KUBEFLOW_HOST}/pipeline\", \"cos_auth_type\": \"USER_CREDENTIALS\", \"cos_endpoint\": \"http://minio-service.kubeflow.svc.cluster.local:9000\", \"cos_bucket\": \"elyra\", \"cos_username\": \"minio\", \"cos_password\": \"minio123\", \"runtime_type\": \"KUBEFLOW_PIPELINES\" }, \"schema_name\": \"kfp\" }" \
--schema_name="kfp"
Then, set up your component catalog:
elyra-metadata create component-catalogs \
--name="kflow" \
--display_name="K Flow" \
--paths="['/home/jovyan/shared/lib/pipeline-components']" \
--schema_name="local-directory-catalog" \
--runtime_type="KUBEFLOW_PIPELINES"
TODO