Running Knative with KinD (Kubernetes in Docker) on a MacBook Air M1

An introduction to deploying serverless apps on Kubernetes in a simple way. Forget about the complexity of Deployment, Service, HPA, and other manifests.

One of my clients is building an AI app running on top of a Kubernetes cluster with GPUs. It's been more than a year since I last touched Kubernetes in production. What caught my interest is how the app runs: it uses Kubeflow's InferenceService, which runs on top of Knative.

Before we start, you need to install Docker, Homebrew, kubectl, and k6 (used later for the performance test).

TLDR: Copy This Script

The script below will create a KinD cluster, install Knative, and run the hello world app and the autoscaling app.

brew install knative/client/kn
brew install knative-sandbox/kn-plugins/quickstart
kn quickstart kind --registry # this will also install a local registry for KinD
kn service create hello --image ghcr.io/knative/helloworld-go:latest --port 8080 --env TARGET=World
kubectl apply -f https://raw.githubusercontent.com/knative/docs/main/docs/serving/autoscaling/autoscale-go/service.yaml
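
Once the script finishes, you can sanity-check it with the kn CLI (kn service list is the kn equivalent of the kubectl get ksvc command we'll use below):

$ kn service list # both services should eventually show READY as True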

So, What Happened?

Check that everything is running:

$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED       STATUS       PORTS                                              NAMES
a67f84478882   registry:2             "/entrypoint.sh /etc…"   6 hours ago   Up 6 hours   0.0.0.0:5001->5000/tcp                             kind-registry
40cb8f208db6   kindest/node:v1.26.6   "/usr/local/bin/entr…"   6 hours ago   Up 6 hours   127.0.0.1:60178->6443/tcp, 0.0.0.0:80->31080/tcp   knative-control-plane

$ docker stats
CONTAINER ID   NAME                    CPU %     MEM USAGE / LIMIT     MEM %     NET I/O          BLOCK I/O        PIDS
a67f84478882   kind-registry           0.10%     5.336MiB / 1.942GiB   0.27%     1.57kB / 0B      752MB / 24.6MB   6
40cb8f208db6   knative-control-plane   40.89%    1.225GiB / 1.942GiB   63.06%    279MB / 16.8MB   229GB / 7.53GB   581

$ kind get clusters # will show `knative` cluster
knative

$ kubectl get namespace # will show the default namespaces plus the Knative ones (marked with +)
NAME                 STATUS   AGE
default              Active   105m
+ knative-eventing     Active   101m
+ knative-serving      Active   103m
+ kourier-system       Active   102m
kube-node-lease      Active   105m
kube-public          Active   105m
kube-system          Active   105m
local-path-storage   Active   105m

$ kubectl get ksvc # it's ksvc, not svc. Will show knative services
NAME           URL                                              LATESTCREATED        LATESTREADY          READY   REASON
autoscale-go   http://autoscale-go.default.127.0.0.1.sslip.io   autoscale-go-00001   autoscale-go-00001   True
hello          http://hello.default.127.0.0.1.sslip.io          hello-00001          hello-00001          True

$ kubectl get deployment
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
autoscale-go-00001-deployment   0/0     0            0           151m
hello-00001-deployment          0/0     0            0           4h16m

We can see that our first script created a KinD cluster called knative. KinD is Kubernetes in Docker, so we can inspect the cluster and its status with the docker command. However, the containers running inside Kubernetes are invisible to docker, because they are managed by containerd inside the node container.
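
If you're curious, you can still peek inside the node container itself. KinD node images ship with crictl, the CRI debugging tool (using the node name knative-control-plane from the docker ps output above):

$ # List the containers that containerd runs inside the KinD node
$ docker exec -it knative-control-plane crictl ps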

We have Knative Serving (which responds to HTTP requests), Knative Eventing (which reacts to events, i.e. event-driven apps), and Kourier (for networking; previously Knative used Istio).
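
To see the components behind each piece, list the pods in those namespaces; for Knative Serving you should find pods such as activator, autoscaler, controller, and webhook:

$ kubectl get pods -n knative-serving
$ kubectl get pods -n knative-eventing
$ kubectl get pods -n kourier-system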

To see our running apps, use kubectl get ksvc (Knative Service). Open the URL and you will get a response instantly (or you may hit a cold start). The interesting part is that Knative also creates a Deployment whose replica count is dynamic: if there is no traffic for 2 minutes (by default), it scales to zero. Send it a single request and the replica count goes back to 1.
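
You can watch scale-from-zero in action with two terminals (the Deployment name and URL come from the outputs above):

$ # Terminal 1: watch the replica count change
$ kubectl get deployment hello-00001-deployment -w
$ # Terminal 2: wake the service up with a single request
$ curl http://hello.default.127.0.0.1.sslip.io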

The first request suffers a cold start; after that, responses come back much quicker. Let's take a quick look at the manifest for autoscale-go:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Target 10 in-flight-requests per pod.
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
      - image: ghcr.io/knative/autoscale-go:latest

It's a simple manifest, and Knative handles everything else behind the scenes. We will have a deeper talk about it in another post.
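
As a quick taste in the meantime, here is a sketch of the same template with two more autoscaling annotations: min-scale keeps a pod warm to avoid cold starts, and max-scale caps how far it can grow (the values here are just illustrative):

spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/min-scale: "1"  # never scale to zero
        autoscaling.knative.dev/max-scale: "10" # hard cap on replicas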

Performance Test to Trigger Autoscaling

Scaling from 0 to 1 is cool, but one pod can't handle anything serious. So we will generate some traffic with the k6 performance tool. But before that, let's get the URL of the autoscale-go app:

$ # Get URL for autoscale-go
$ kubectl get ksvc 
$ # Monitor if any update for our pod. You can change pod to deployment
$ kubectl get pod -l serving.knative.dev/service=autoscale-go -w

Copy the code below and save it as perftest.js:

import { sleep } from 'k6';
import http from 'k6/http';

// Each virtual user will run this function
export default function () {
  // Change the URL if yours differs (see kubectl get ksvc above)
  const appURL = "http://autoscale-go.default.127.0.0.1.sslip.io/?sleep=50&prime=10000&bloat=5"
  http.get(appURL);

  // sleep for 0.1 second before finishing the task
  sleep(0.1) 
}

Lastly, open a new terminal and run k6:

$ # Create and maintain 50 virtual users and run the code for 30s
$ k6 run --vus 50 --duration 30s perftest.js
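
If you'd rather ramp the load up and down instead of holding a flat 50 VUs, k6 also supports staged load via an options block; a sketch you could drop into perftest.js:

// Ramp up to 20 VUs, push to 50, then ramp back down to 0
export const options = {
  stages: [
    { duration: '10s', target: 20 },
    { duration: '30s', target: 50 },
    { duration: '10s', target: 0 },
  ],
};

With stages defined, run it as plain k6 run perftest.js; the stages replace the --vus and --duration flags.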

Let's monitor how it behaves.

After kicking off the performance test, the Deployment replica count jumps from 1 to 7, and we can watch the lifecycle of the pods in the kubectl get pod -w output. The CPU and memory usage of the KinD cluster also jump from 30% and 60% to 282% and 81% respectively. The nice part is that we didn't need to configure anything fancy to make this simple app scale.

And what about the results?

For simplicity, we will only look at http_req_failed, http_req_duration, and iterations. I think we already saturated the cluster, since we only got 368 completed requests with an average of 4.31s and a p90 of 9.88s; fortunately, all requests were served successfully. I tried it several times with different VU counts (e.g. 10, 20, and so on) and got more completed requests with faster response times.

Knative Serverless Is for Stateless Apps


Before you follow the hype, I should tell you this: Knative works best if your app is stateless. If you only have a vague idea of what stateless means, please read more about the 12-factor app. TL;DR: it's a set of guidelines for making our apps stateless, meaning the app itself doesn't store any data (e.g. files, sessions, persistent data). Instead, any data is stored in a stateful backing service (e.g. Postgres, Redis). If you want to use a framework that is stateful by default (e.g. Odoo, Magento), you will need to make some changes before it can run on Knative (or on Kubernetes and Docker in general).

What's Next?

My goal is to understand this new way of deploying apps to Kubernetes. In this article, we explored a simple app that is far from production-ready. In future posts, we will look at how to implement a more complex API, add environment variables and secrets, add monitoring and logging, do rolling deployments, and ultimately put it all together with Kubeflow for an AI app.