PodOpsLifecycle

Kubernetes provides a set of default controllers for workload management, such as StatefulSet, Deployment, and DaemonSet. However, user services outside Kubernetes have difficulty participating in the operation lifecycle of a pod.

PodOpsLifecycle aims to give Kubernetes administrators and developers finer-grained control over the entire lifecycle of a pod. For example, we can develop a controller that performs the necessary work in both the PreCheck and PostCheck phases to avoid traffic loss.

Goals

  1. Provide extensibility that allows users to control the whole lifecycle of pods using the PodOpsLifecycle mechanism.
  2. Support concurrency: multiple controllers can operate on the same pod at the same time. For example, while a pod is about to be updated, another controller may want to delete it.
  3. Make every lifecycle phase of a pod traceable.

Proposal

User Stories

Story 1

As a developer who focuses on pod traffic, I should remove the endpoint once the readiness gate pod.kusionstack.io/service-ready is set to false, which means traffic to the pod should be turned off. I should add the endpoint once the readiness gate pod.kusionstack.io/service-ready is set to true and the pod is ready, which means traffic to the pod should be turned on.

The protection finalizer (see Design Details) can be added and removed automatically if we implement the ReconcileAdapter interface provided by the resourceconsist controller.
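The decision described in this story can be sketched as a small function. This is only an illustration, not the resourceconsist API: `podView`, `trafficAction`, and `decideTraffic` are hypothetical names, the pod is reduced to a map of condition values, and the sketch assumes traffic is turned on only when the readiness gate is true and the pod is Ready. What to do when the gate is absent is not stated in the text, so the sketch leaves traffic unchanged in that case.

```go
package main

import "fmt"

const (
	// Readiness gate condition type named in the text.
	serviceReadyGate = "pod.kusionstack.io/service-ready"
	// Standard pod condition type.
	podReady = "Ready"
)

// podView is a hypothetical, simplified stand-in for the pod status
// fields a traffic controller would read. It is not a Kubernetes type.
type podView struct {
	conditions map[string]bool // condition type -> true when status is "True"
}

// trafficAction is what the hypothetical traffic controller should do.
type trafficAction string

const (
	addEndpoint    trafficAction = "add endpoint"
	removeEndpoint trafficAction = "remove endpoint"
	noChange       trafficAction = "no change"
)

// decideTraffic maps the readiness gate and the Ready condition to an action.
func decideTraffic(p podView) trafficAction {
	gate, gateSet := p.conditions[serviceReadyGate]
	switch {
	case gateSet && !gate:
		// Gate explicitly false: traffic must be turned off.
		return removeEndpoint
	case gate && p.conditions[podReady]:
		// Gate true and pod Ready: traffic may be turned on.
		return addEndpoint
	default:
		// Assumption: in every other case, leave traffic as it is.
		return noChange
	}
}

func main() {
	draining := podView{conditions: map[string]bool{serviceReadyGate: false, podReady: true}}
	serving := podView{conditions: map[string]bool{serviceReadyGate: true, podReady: true}}
	fmt.Println(decideTraffic(draining))
	fmt.Println(decideTraffic(serving))
}
```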

Story 2

  1. As a developer who maintains a system that provides pod operations such as update and scale, I should add the labels operating.podopslifecycle.kusionstack.io/<id>=<time> and operation-type.podopslifecycle.kusionstack.io/<id>=<type> at the same time when I want to operate a pod.
  2. When the operation is completed, I should remove the labels operating.podopslifecycle.kusionstack.io/<id>=<time> and operation-type.podopslifecycle.kusionstack.io/<id>=<type> at the same time.
  3. If I want to cancel the operation, I need to add the label undo-operation-type.podopslifecycle.kusionstack.io/<id>=<type>.

The sequence diagram below describes how to update a pod.

Story 3

As a developer who cares about pod operation observability, I can use the <id>=<time> and <id>=<type> pairs in the labels to trace a pod. The <time> is a Unix time in nanoseconds, the <type> is a string that describes the operation type, and the <id> is a string that is used throughout the whole operation lifecycle.
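As a rough illustration of this kind of tracing, the sketch below reads a pod's labels and reconstructs which operations are in flight. The label prefixes are the ones used in Story 2; `operationTrace` and `traceOperations` are hypothetical names, and the sketch assumes `<time>` is a decimal Unix nanosecond value.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"time"
)

const (
	operatingPrefix     = "operating.podopslifecycle.kusionstack.io/"
	operationTypePrefix = "operation-type.podopslifecycle.kusionstack.io/"
)

// operationTrace is a hypothetical record of one in-flight operation.
type operationTrace struct {
	ID        string
	Type      string
	StartedAt time.Time
}

// traceOperations extracts <id>, <type>, and <time> from the labels.
// Labels whose <time> cannot be parsed are skipped.
func traceOperations(labels map[string]string) []operationTrace {
	var traces []operationTrace
	for key, value := range labels {
		if !strings.HasPrefix(key, operatingPrefix) {
			continue
		}
		id := strings.TrimPrefix(key, operatingPrefix)
		nanos, err := strconv.ParseInt(value, 10, 64)
		if err != nil {
			continue
		}
		traces = append(traces, operationTrace{
			ID:        id,
			Type:      labels[operationTypePrefix+id],
			StartedAt: time.Unix(0, nanos).UTC(),
		})
	}
	// Map iteration order is random, so sort by ID for stable output.
	sort.Slice(traces, func(i, j int) bool { return traces[i].ID < traces[j].ID })
	return traces
}

func main() {
	// Made-up example labels.
	labels := map[string]string{
		operatingPrefix + "op-1":     "1700000000000000000",
		operationTypePrefix + "op-1": "upgrade",
	}
	for _, t := range traceOperations(labels) {
		fmt.Printf("id=%s type=%s started=%s\n", t.ID, t.Type, t.StartedAt.Format(time.RFC3339Nano))
	}
}
```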

Design Details

  1. The PodOpsLifecycle mechanism is provided by a mutating webhook server and a controller. The mutating webhook server changes the labels at the right time, and the controller sets the readiness gate pod.kusionstack.io/service-ready to true or false when necessary. The controller also changes the labels at certain points.
  2. The labels operating.podopslifecycle.kusionstack.io/<id>=<time> and operation-type.podopslifecycle.kusionstack.io/<id>=<type> are validated by a validating webhook server. The operation controller must add or remove them at the same time.
  3. The traffic controller should turn traffic on or off based on the readiness gate pod.kusionstack.io/service-ready and the pod condition Ready.
  4. Protection finalizer names must have the prefix prot.podopslifecycle.kusionstack.io. They are used to determine whether the traffic has been completely removed or is fully prepared.
  5. The special label podopslifecycle.kusionstack.io/service-available indicates that a pod is available to serve.
  6. We can use the <id>=<time> and <id>=<type> pairs in the labels to trace a pod. The <time> is a Unix time in nanoseconds.
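Two of the rules above lend themselves to a short sketch: the pairing rule for the operating and operation-type labels (item 2) and the protection finalizer prefix (item 4). This is not the actual webhook implementation. `validatePairedLabels` and `protectionFinalizers` are hypothetical helpers, the sample finalizer name is made up, and the pairing check is a simplification that only inspects the resulting label set rather than comparing the old and new pod.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	operatingPrefix           = "operating.podopslifecycle.kusionstack.io/"
	operationTypePrefix       = "operation-type.podopslifecycle.kusionstack.io/"
	protectionFinalizerPrefix = "prot.podopslifecycle.kusionstack.io"
)

// validatePairedLabels returns an error when some <id> carries only one
// of the two labels that must be added or removed together.
func validatePairedLabels(labels map[string]string) error {
	for key := range labels {
		if strings.HasPrefix(key, operatingPrefix) {
			id := strings.TrimPrefix(key, operatingPrefix)
			if _, ok := labels[operationTypePrefix+id]; !ok {
				return fmt.Errorf("operation %q has an operating label but no operation-type label", id)
			}
		}
		if strings.HasPrefix(key, operationTypePrefix) {
			id := strings.TrimPrefix(key, operationTypePrefix)
			if _, ok := labels[operatingPrefix+id]; !ok {
				return fmt.Errorf("operation %q has an operation-type label but no operating label", id)
			}
		}
	}
	return nil
}

// protectionFinalizers returns the finalizers that carry the protection prefix.
func protectionFinalizers(finalizers []string) []string {
	var out []string
	for _, f := range finalizers {
		if strings.HasPrefix(f, protectionFinalizerPrefix) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Only one of the two labels is present, so validation should fail.
	unpaired := map[string]string{operatingPrefix + "op-1": "1700000000000000000"}
	fmt.Println("validation error:", validatePairedLabels(unpaired))

	// The first finalizer name is a made-up example with the required prefix.
	finalizers := []string{"prot.podopslifecycle.kusionstack.io/example", "example.com/other"}
	fmt.Println("protection finalizers:", protectionFinalizers(finalizers))
}
```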

Below we use a sequence diagram to show how to use the PodOpsLifecycle mechanism to avoid traffic loss. You can also use the mechanism for other purposes, for example to prevent long-running tasks from being interrupted.