Watch for Changes to Custom Resources in Golang: A Practical Guide

The modern cloud-native landscape, dominated by Kubernetes, thrives on extensibility and automation. While Kubernetes provides a powerful set of built-in resources like Pods, Deployments, and Services, real-world applications often require custom abstractions to manage domain-specific concepts effectively. This is where Custom Resources (CRs) and Custom Resource Definitions (CRDs) come into play, allowing users to extend the Kubernetes API with their own resource types. However, merely defining a custom resource isn't enough; to harness its full potential, you need active components that observe these custom resources and react to their changes, orchestrating complex workflows. This is the domain of Kubernetes controllers, often written in Golang, the native language of Kubernetes itself.

This comprehensive guide will take you on a deep dive into the practical aspects of watching for changes to Custom Resources using Golang. We will unravel the intricacies of the client-go library, explore the powerful Shared Informers pattern, and walk through the step-by-step process of building a robust and efficient controller. By the end of this journey, you will possess a profound understanding of how to empower your Kubernetes clusters with intelligent automation, ensuring your custom resources are not just declarative data but active participants in your cloud infrastructure.

The Genesis of Custom Resources: Extending Kubernetes' Universe

Kubernetes, at its core, operates on a declarative model. Users declare their desired state – "I want three replicas of this application" – and Kubernetes continuously works to reconcile the actual state with the desired state. This model is exceptionally powerful, but its out-of-the-box resources might not always align perfectly with the unique abstractions required by every application or infrastructure component. Imagine you're building a platform that manages database instances, or perhaps an AI model deployment pipeline. You might want a resource type like DatabaseInstance or AIMLModel. Kubernetes, by default, doesn't know what these are.

This is precisely the problem that Custom Resource Definitions (CRDs) solve. A CRD allows you to define a new, arbitrary resource kind in your Kubernetes cluster. It's like telling Kubernetes, "Hey, there's a new type of object in town, and here's what it looks like." Once a CRD is registered, you can create instances of your custom resource (CRs) just like you would with a Pod or a Service, using standard Kubernetes tools like kubectl. These CRs are stored in the same etcd data store as native Kubernetes objects, benefit from Kubernetes' authentication and authorization mechanisms (RBAC), and can be managed with kubectl.

However, a CRD merely defines the schema; it doesn't imbue the new resource with any operational intelligence. Creating a DatabaseInstance CR doesn't magically provision a database. For that, you need a controller. A controller is a piece of software that "watches" specific resources (in our case, custom resources), observes changes (creations, updates, deletions), and takes action to bring the cluster's actual state closer to the desired state specified in the CR. This active reconciliation loop is the essence of building truly extensible and automated systems on Kubernetes. Without a controller, your custom resources are merely inert data points in the cluster's API.

The choice of Golang for writing these controllers is no accident. As the language in which Kubernetes itself is primarily written, Golang offers direct access to the client-go library, which is the official and most efficient way to interact with the Kubernetes API. Its strong typing, concurrency primitives (goroutines and channels), and robust tooling make it an ideal candidate for building high-performance, reliable, and scalable control planes that underpin the extensibility of Kubernetes.

Kubernetes Control Plane and the Reconciliation Pattern

Before diving into the code, it's crucial to grasp the fundamental architecture of how Kubernetes operates, particularly the concept of the control plane and its reconciliation loop. This understanding forms the bedrock upon which effective controllers are built.

The Kubernetes control plane comprises several key components:

  • API Server: The front end of the Kubernetes control plane. It exposes the Kubernetes API, which is the communication interface for users, management tools, and other cluster components. All operations on the cluster go through the API Server.
  • etcd: A highly available key-value store that serves as Kubernetes' backing store for all cluster data. All configuration data, state data, and metadata are stored here.
  • Controller Manager: A daemon that embeds the core control loops (controllers) shipped with Kubernetes. For example, the ReplicaSet controller ensures the desired number of pod replicas are running, and the Node controller manages node lifecycle.
  • Scheduler: Watches for newly created Pods with no assigned node and selects a node for them to run on.
  • kubelet: An agent that runs on each node in the cluster. It ensures that containers are running in a Pod.

The magic of Kubernetes lies in its reconciliation pattern. Each controller in the system continuously observes a specific set of resources in the cluster. It compares the "desired state" (as declared by the user in a resource's spec, or determined by other controllers) with the "actual state" of the cluster. If a discrepancy is found, the controller takes corrective actions to bring the actual state into alignment with the desired state. This loop runs constantly and autonomously, making Kubernetes a self-healing and self-managing system.
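
To make this loop concrete, here is a minimal, runnable sketch in Go. Every name in it (State, desiredState, actualState) is an illustrative stand-in rather than a client-go API, and a real controller is event-driven via informers (covered later) instead of polling on a timer:

package main

import (
    "fmt"
    "reflect"
    "time"
)

// State is a stand-in for a resource's configuration (illustrative only;
// real controllers compare typed API objects).
type State struct{ Replicas int }

func desiredState() State { return State{Replicas: 3} } // e.g., read from a resource's spec
func actualState() State  { return State{Replicas: 2} } // e.g., observe the cluster

func main() {
    for i := 0; i < 3; i++ { // a real controller loops until shutdown
        desired, actual := desiredState(), actualState()
        if !reflect.DeepEqual(desired, actual) {
            fmt.Printf("drift: want %+v, have %+v -> take corrective action\n", desired, actual)
        }
        time.Sleep(500 * time.Millisecond)
    }
}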

For instance, when you create a Deployment, the Deployment controller (part of the Controller Manager) notices this new resource. Its desired state is defined in the Deployment's spec. The controller then creates a ReplicaSet to manage the Pods. The ReplicaSet controller then notices its desired state (number of replicas) and proceeds to create Pods. The Scheduler then places these Pods on nodes, and the kubelet ensures the containers run. If a Pod crashes, the ReplicaSet controller notices the actual state (fewer Pods than desired) and creates a new one. This entire chain of events is driven by controllers constantly watching and reconciling.

When we build a custom controller for our Custom Resources, we are essentially extending this very same powerful reconciliation pattern. Our controller will watch for changes to our MyApplication CR, determine the desired state from its spec, and then interact with the Kubernetes API (e.g., creating Deployments, Services, ConfigMaps) or external systems (e.g., provisioning a database, updating an external API gateway) to achieve that desired state. This design philosophy is what makes Kubernetes so robust and extensible, allowing developers to automate complex operational tasks for any kind of workload.

Deep Dive into client-go: The Golang Interface to Kubernetes

To interact with the Kubernetes API from a Golang application, the k8s.io/client-go library is your essential toolkit. This library provides a set of powerful clients and utilities designed to simplify communication with the Kubernetes API Server, handle authentication, manage caching, and process events. Understanding its structure and key components is foundational to building any Kubernetes-aware application in Golang, especially controllers.

The client-go library isn't just a single, monolithic client; it's a collection of specialized clients, each tailored for different interaction patterns:

  1. Clientset (kubernetes.Clientset):
    • This is the most common entry point for interacting with standard Kubernetes resources like Pods, Deployments, Services, ConfigMaps, etc.
    • A Clientset aggregates clients for all built-in API groups (e.g., core/v1, apps/v1, networking.k8s.io/v1).
    • When you need to perform CRUD (Create, Read, Update, Delete) operations on these well-known resources, the Clientset is your go-to.
    • Example: clientset.AppsV1().Deployments("namespace").Create(ctx, deployment, metav1.CreateOptions{})
  2. Discovery Client (discovery.DiscoveryClient):
    • Used to discover the resources and API groups supported by the Kubernetes API Server.
    • You might use this if your application needs to dynamically adapt to different Kubernetes versions or custom resources that might or might not be present.
    • It helps determine what API versions are available for a given resource type.
  3. Dynamic Client (dynamic.Interface):
    • This is a highly flexible client that allows you to interact with any Kubernetes resource, including Custom Resources, without having specific Go types compiled into your application.
    • Instead of working with strongly typed structs (e.g., *appsv1.Deployment), you work with unstructured.Unstructured objects, which are essentially map[string]interface{} representations of Kubernetes objects.
    • The dynamic client is invaluable when dealing with Custom Resources whose Go types might not be readily available, or when building generic tools; a short sketch follows this list.
    • Example: dynamicClient.Resource(schema.GroupVersionResource{Group: "stable.example.com", Version: "v1", Resource: "myapplications"}).Namespace("default").Create(ctx, unstructuredObject, metav1.CreateOptions{})
  4. REST Client (rest.RESTClient):
    • The lowest-level client provided by client-go. It allows you to make raw HTTP requests to the Kubernetes API Server.
    • All other clients (Clientset, Discovery, Dynamic) are built on top of the RESTClient.
    • You typically wouldn't use this directly unless you need very fine-grained control over the HTTP requests or are interacting with a very specific, non-standard Kubernetes endpoint.
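
Here is the dynamic-client sketch promised above: a minimal example that lists MyApplication custom resources without any generated Go types. It assumes the MyApplication CRD defined later in this guide is installed and that a kubeconfig exists at the default location.

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load the kubeconfig from the default location (~/.kube/config).
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    dyn, err := dynamic.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // Identify the custom resource by group/version/resource; these values
    // match the MyApplication CRD defined later in this guide.
    gvr := schema.GroupVersionResource{Group: "stable.example.com", Version: "v1", Resource: "myapplications"}

    list, err := dyn.Resource(gvr).Namespace("default").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, item := range list.Items {
        // Fields are read out of the unstructured content map rather than typed structs.
        image, _, _ := unstructured.NestedString(item.Object, "spec", "image")
        fmt.Printf("%s/%s image=%s\n", item.GetNamespace(), item.GetName(), image)
    }
}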

To use any of these clients, you first need to obtain a rest.Config, which holds the necessary information for authenticating and connecting to the Kubernetes API Server (e.g., host, authentication tokens, CA certificates). This configuration can be loaded from a kubeconfig file (for development outside a cluster) or from the service account token mounted in a Pod (for applications running inside the cluster).
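
A common pattern, sketched below under the assumption that your fallback kubeconfig lives at ~/.kube/config, is to try the in-cluster configuration first and fall back to the local file, so the same binary runs both inside and outside a cluster:

package main

import (
    "fmt"
    "path/filepath"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/homedir"
)

// loadConfig tries in-cluster credentials first (the service account token
// mounted into the Pod) and falls back to the local kubeconfig.
func loadConfig() (*rest.Config, error) {
    if cfg, err := rest.InClusterConfig(); err == nil {
        return cfg, nil
    }
    kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
    return clientcmd.BuildConfigFromFlags("", kubeconfig)
}

func main() {
    cfg, err := loadConfig()
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    _ = clientset // the clientset is now ready to use
    fmt.Println("connected to", cfg.Host)
}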

While client-go provides excellent CRUD capabilities, continuously polling the API Server for changes is inefficient and can put undue strain on the control plane. This is where the concept of Shared Informers becomes critical. Shared Informers offer a much more efficient and scalable way to observe resource changes, forming the backbone of almost all production-grade Kubernetes controllers. They introduce a caching layer and event-driven mechanisms that significantly reduce API server load and simplify controller logic.

Shared Informers: Efficiently Observing Custom Resources

Continuously polling the Kubernetes API Server for changes to resources is not only inefficient but also problematic. It consumes excessive network bandwidth, puts unnecessary load on the API Server and etcd, and can lead to race conditions or missed events. Imagine trying to manage hundreds or thousands of custom resources by constantly querying their state – it quickly becomes unmanageable. The solution to this challenge, and the cornerstone of any robust Kubernetes controller, is the Shared Informer pattern provided by the client-go library.

Shared Informers efficiently observe resources by combining two key mechanisms:

  1. List-Watch Pattern: Instead of just polling, an informer first performs a LIST operation to get the current state of all relevant resources. It then establishes a persistent WATCH connection to the API Server. The API Server pushes real-time notifications (events) to the informer whenever a resource is created, updated, or deleted. This push-based model is far more efficient than polling.
  2. In-memory Cache: The informer maintains a local, in-memory cache of the resources it's watching. This cache is kept up-to-date by the events received from the WATCH stream. When a controller needs to retrieve a resource, it queries this local cache instead of hitting the API Server directly, drastically reducing API calls and improving performance.
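
To see the primitive that informers are built on, here is a small sketch of a raw WATCH on Pods using the Clientset. Production code should rarely call Watch directly; the Reflector described below wraps exactly this call (plus the initial LIST) and adds caching, event coalescing, and reconnect handling:

package main

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // Open a raw WATCH stream on Pods in "default"; the API Server pushes
    // events over this connection until we stop it.
    w, err := clientset.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    defer w.Stop()
    for event := range w.ResultChan() {
        if pod, ok := event.Object.(*corev1.Pod); ok {
            fmt.Printf("%s: %s/%s\n", event.Type, pod.Namespace, pod.Name)
        }
    }
}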

Let's break down the core components that make up a Shared Informer:

  • Reflector: This component is responsible for the actual list-watch operation. It lists resources from the API Server and then maintains a watch connection, pushing received events into a queue.
  • DeltaFIFO: A buffered queue that stores events (add, update, delete) from the Reflector. It also handles event coalescing, ensuring that only the latest state of a resource is processed if multiple events for the same resource arrive quickly.
  • Indexer: The in-memory store that holds the actual resource objects. It uses the DeltaFIFO to update its cache based on incoming events. The Indexer supports efficient lookups by key (namespace/name) and can also build custom indices for more complex queries (e.g., by label).
  • Lister: Built on top of the Indexer, a Lister provides a convenient interface for retrieving objects from the local cache. It offers methods like Get() (by name) and List() (all objects or objects in a specific namespace). Crucially, the Lister only reads from the cache; it never hits the API Server.
  • Event Handlers (ResourceEventHandler): These are callback functions that your controller defines to react to specific types of events:
    • AddFunc(obj interface{}): Called when a new resource is added to the cluster.
    • UpdateFunc(oldObj, newObj interface{}): Called when an existing resource is updated. Both the old and new states are provided.
    • DeleteFunc(obj interface{}): Called when a resource is deleted.

The power of Shared Informers comes from the SharedInformerFactory. Instead of each controller component creating its own informer, a single SharedInformerFactory can be created for an API group or even for all API groups. This factory then manages a single set of Reflector, DeltaFIFO, and Indexer instances for each resource type. Multiple controllers can then register their ResourceEventHandler functions with the same underlying informer. This significantly reduces resource consumption (fewer API watches, smaller memory footprint for caches) and ensures consistency across different parts of your application that might be watching the same resources.

Here's a simplified illustration of the flow:

+------------------+  LIST/WATCH   +-----------+    +--------------+    +----------------+    +---------------------+
| Kubernetes       |-------------->| Reflector |--->| DeltaFIFO    |--->| Indexer        |--->| Your Event Handlers |
| API Server       |  (resource    |           |    | (events      |    | (in-memory     |    | (reconciliation     |
| (resource events)|   stream)     |           |    |  queued)     |    |  cache)        |    |  logic)             |
+------------------+               +-----------+    +--------------+    +----------------+    +---------------------+
                                                                                ^
                                                                                | reads (current state)
                                                                                |
                                                                        +----------------+
                                                                        | Lister         |
                                                                        +----------------+

(The Reflector, DeltaFIFO, and Indexer together form the Shared Informer; your controller registers its handlers with it, and its Lister reads only from the Indexer's cache, never from the API Server.)

This elegant pattern ensures that your controller receives timely notifications of changes, has access to an up-to-date local cache of resources, and operates efficiently without overwhelming the Kubernetes control plane. It's the standard for building reactive, high-performance applications within the Kubernetes ecosystem.

Let's look at a basic example of setting up a shared informer for standard Kubernetes resources, like Pods, to solidify this understanding before moving to custom resources. This example will demonstrate the core client-go components involved.

package main

import (
    "context"
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/klog/v2"
)

func main() {
    klog.InitFlags(nil)
    defer klog.Flush()

    // 1. Load kubeconfig
    kubeconfigPath := os.Getenv("KUBECONFIG")
    if kubeconfigPath == "" {
        home, err := os.UserHomeDir()
        if err != nil {
            klog.Fatalf("Error getting user home dir: %v", err)
        }
        kubeconfigPath = fmt.Sprintf("%s/.kube/config", home) // "~" is not expanded by Go
    }
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
    if err != nil {
        klog.Fatalf("Error building kubeconfig: %v", err)
    }

    // 2. Create a Clientset
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        klog.Fatalf("Error creating clientset: %v", err)
    }

    // 3. Create a SharedInformerFactory
    // Resync period means how often the informer re-lists all objects from API server.
    // 0 means no periodic full re-list; events drive updates.
    // For production, a non-zero resync (e.g., 30s) can help recover from missed events,
    // but generally 0 is fine if the watch stream is robust.
    factory := informers.NewSharedInformerFactory(clientset, 0)

    // 4. Get an Informer for Pods
    podInformer := factory.Core().V1().Pods().Informer()

    // 5. Register Event Handlers
    podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            pod := obj.(*corev1.Pod)
            klog.Infof("Pod Added: %s/%s", pod.Namespace, pod.Name)
        },
        UpdateFunc: func(oldObj, newObj interface{}) {
            oldPod := oldObj.(*corev1.Pod)
            newPod := newObj.(*corev1.Pod)
            if oldPod.ResourceVersion == newPod.ResourceVersion {
                // Periodic resync will send update events for the same object.
                // This is only an update if the resource version changes.
                return
            }
            klog.Infof("Pod Updated: %s/%s (old RV: %s, new RV: %s)",
                newPod.Namespace, newPod.Name, oldPod.ResourceVersion, newPod.ResourceVersion)
        },
        DeleteFunc: func(obj interface{}) {
            // Handle the 'tombstone' case: on a missed delete event, the
            // informer hands us a DeletedFinalStateUnknown wrapper instead of the Pod.
            pod, ok := obj.(*corev1.Pod)
            if !ok {
                tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
                if !ok {
                    klog.Errorf("error decoding object, invalid type")
                    return
                }
                pod, ok = tombstone.Obj.(*corev1.Pod)
                if !ok {
                    klog.Errorf("error decoding object tombstone, invalid type")
                    return
                }
            }
            klog.Infof("Pod Deleted: %s/%s", pod.Namespace, pod.Name)
        },
    })

    // Create a context that can be cancelled.
    ctx, cancel := context.WithCancel(context.Background())

    // Handle OS signals for graceful shutdown
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-sigChan
        klog.Info("Received termination signal, shutting down gracefully...")
        cancel() // Signal the context to cancel
    }()

    // 6. Start the Informers
    // This will start the Reflector, DeltaFIFO, and Indexer for all informers in the factory.
    factory.Start(ctx.Done())

    // 7. Wait for all caches to be synced
    // This ensures the local cache is populated before your controller starts processing events.
    klog.Info("Waiting for informer caches to sync...")
    if !cache.WaitForCacheSync(ctx.Done(), podInformer.HasSynced) {
        klog.Fatalf("Failed to sync pod informer cache")
    }
    klog.Info("Informer caches synced successfully!")

    // Keep the main goroutine running until context is cancelled
    <-ctx.Done()
    klog.Info("Controller stopped.")
}

This example lays out the fundamental blueprint for using Shared Informers. We establish a connection to Kubernetes, create an informer factory, obtain an informer for Pods, register our desired event handlers, start the factory, and wait for its caches to synchronize. Once synced, our AddFunc, UpdateFunc, and DeleteFunc will be invoked whenever a Pod changes in the cluster. This forms the reactive core of our controller. The next logical step is to apply this pattern to our own Custom Resources.

Crafting Your Own Custom Resource Definition (CRD)

Before we can watch a custom resource, we first need to define its schema and register it with Kubernetes. This is done through a Custom Resource Definition (CRD). A CRD tells Kubernetes about the new resource type, including its name, scope (namespaced or cluster-scoped), and importantly, its validation schema. The validation schema leverages OpenAPI v3 to enforce data integrity for your custom resources, ensuring that users provide valid configurations.

Let's design a simple custom resource to manage a fictional application. We'll call it MyApplication. This application might have a desired image, replica count, and perhaps a specific port it needs.

Here's an example of a MyApplication CRD:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapplications.stable.example.com
spec:
  group: stable.example.com # The API group for your CRD
  versions:
    - name: v1 # The version of your CRD
      served: true
      storage: true # This version is used for storage in etcd
      schema:
        openAPIV3Schema: # OpenAPI v3 schema for validation
          type: object
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec: # The specification for your MyApplication
              type: object
              properties:
                image:
                  type: string
                  description: The container image to use for the application.
                  minLength: 1
                replicas:
                  type: integer
                  description: The desired number of application replicas.
                  minimum: 1
                  default: 1
                port:
                  type: integer
                  description: The port the application listens on.
                  minimum: 80
                  maximum: 65535
              required:
                - image
            status: # The status of your MyApplication, managed by the controller
              type: object
              properties:
                availableReplicas:
                  type: integer
                  description: The number of currently available application replicas.
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                      message:
                        type: string
                      lastTransitionTime:
                        type: string
                        format: date-time
      # Adding subresources to enable /status and /scale endpoints
      subresources:
        status: {} # Enables /status subresource, allowing updates without changing spec
        # scale: # Uncomment and configure if you want Horizontal Pod Autoscaler to work
        #   specReplicasPath: .spec.replicas
        #   statusReplicasPath: .status.availableReplicas
        #   labelSelectorPath: .status.selector
  scope: Namespaced # CRs of this type will be tied to a namespace
  names:
    plural: myapplications # Used in kubectl get myapplications
    singular: myapplication # Used in kubectl get myapplication
    kind: MyApplication # The Kind field in your CR YAML
    shortNames:
      - ma # Optional short name for kubectl

Let's break down key parts of this CRD:

  • group: stable.example.com: This defines the API group for your custom resource. It's conventional to use a domain you own to prevent naming collisions.
  • versions: A CRD can have multiple versions. v1 is our initial version.
    • served: true: This version is exposed via the API.
    • storage: true: This version is used to store the resource in etcd. You can only have one storage version.
    • schema.openAPIV3Schema: This is crucial. It defines the structure and validation rules for your custom resource's spec and status fields using the OpenAPI schema specification. This schema ensures that any MyApplication CR created in the cluster conforms to the expected data types and constraints (e.g., replicas must be an integer, image must be a string). This provides strong type checking and immediate feedback to users who create malformed CRs.
      • spec: Contains the user-defined desired state.
      • status: Contains the controller-managed actual state.
  • subresources.status: {}: This enables the /status subresource. It's a best practice to separate spec updates from status updates. A controller updates the status of a CR, and this subresource allows the controller to do so without requiring read-write access to the entire CR (i.e., it can't accidentally change the spec).
  • scope: Namespaced: This means MyApplication resources will belong to specific namespaces, just like Pods or Deployments. You can also define Cluster scoped CRDs if your resource is global to the cluster.
  • names: These fields define how your custom resource will be referred to within Kubernetes and by kubectl.

To deploy this CRD to your cluster, save it as myapplication-crd.yaml and run:

kubectl apply -f myapplication-crd.yaml

Once the CRD is installed, you can create instances of your custom resource:

# myapplication-example.yaml
apiVersion: stable.example.com/v1
kind: MyApplication
metadata:
  name: my-first-app
  namespace: default
spec:
  image: "nginx:latest"
  replicas: 3
  port: 8080
---
apiVersion: stable.example.com/v1
kind: MyApplication
metadata:
  name: another-app
  namespace: my-dev-namespace
spec:
  image: "ubuntu/apache2:latest"
  replicas: 2
  port: 80

Apply these examples:

kubectl apply -f myapplication-example.yaml

You can now use kubectl get myapplications, kubectl describe myapplication my-first-app, etc., just like any built-in resource. However, as noted earlier, nothing happens yet when you create these. That's the controller's job.

Generating Client-go Code for Custom Resources

With our CRD defined, the next step in building our Golang controller is to generate the necessary client-go compatible code for our custom resource. While the dynamic client can interact with arbitrary resources, working with strongly typed Go structs for your custom resources is generally preferred for type safety, code readability, and leveraging Go's compiler for error detection.

The k8s.io/code-generator project provides a set of tools to automatically generate this client code based on your custom resource's Go type definitions. This includes:

  • client-gen: Generates a Clientset specific to your custom API group, allowing you to interact with your custom resources directly (e.g., clientset.StableV1().MyApplications(...)).
  • lister-gen: Generates Lister interfaces for your custom resources, enabling efficient read access to the informer's local cache.
  • informer-gen: Generates Informer interfaces and SharedInformerFactory implementations for your custom resources, completing the efficient watch pattern.
  • deepcopy-gen: Generates DeepCopy methods for your custom Go types, essential for safe manipulation of Kubernetes objects (which are often passed by value or reference) and avoiding unintended side effects in your controller.

To use these generators, you typically need to define your custom resource's Go types in a specific directory structure and add special // +kubebuilder or // +genclient comments (known as "markers") to instruct the generators.

Let's establish the directory structure and define our MyApplication Go types:

my-custom-controller/
β”œβ”€β”€ main.go
β”œβ”€β”€ go.mod
β”œβ”€β”€ go.sum
└── pkg/
    └── apis/
        └── stable/
            └── v1/
                β”œβ”€β”€ doc.go
                └── types.go

pkg/apis/stable/v1/doc.go: This file contains API group and version information for code generation.

// +k8s:deepcopy-gen=package,register
// +groupName=stable.example.com

// Package v1 is the v1 version of the stable.example.com API.
package v1

  • +k8s:deepcopy-gen=package,register: This marker tells deepcopy-gen to generate deep copy methods for all types in this package and to register them with the scheme.
  • +groupName=stable.example.com: This marker specifies the API group name, matching our CRD.

pkg/apis/stable/v1/types.go: This file defines the Go structs for your MyApplication custom resource, mirroring the OpenAPI schema we defined in the CRD.

package v1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// MyApplication is the Schema for the myapplications API
type MyApplication struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   MyApplicationSpec   `json:"spec,omitempty"`
    Status MyApplicationStatus `json:"status,omitempty"`
}

// MyApplicationSpec defines the desired state of MyApplication
type MyApplicationSpec struct {
    Image    string `json:"image"`
    Replicas int32  `json:"replicas"`
    Port     int32  `json:"port"`
}

// MyApplicationStatus defines the observed state of MyApplication
type MyApplicationStatus struct {
    AvailableReplicas int32               `json:"availableReplicas"`
    Conditions        []MyApplicationCondition `json:"conditions,omitempty"`
}

// MyApplicationConditionType is a valid value for MyApplicationCondition.Type
type MyApplicationConditionType string

// These are valid conditions of a MyApplication.
const (
    // MyApplicationReady means the MyApplication is ready.
    MyApplicationReady MyApplicationConditionType = "Ready"
    // MyApplicationProgressing means the MyApplication is in progress.
    MyApplicationProgressing MyApplicationConditionType = "Progressing"
    // MyApplicationFailed means the MyApplication has failed.
    MyApplicationFailed MyApplicationConditionType = "Failed"
)

// MyApplicationCondition describes the state of a MyApplication at a certain point.
type MyApplicationCondition struct {
    Type               MyApplicationConditionType `json:"type"`
    Status             metav1.ConditionStatus     `json:"status"` // Can be True, False, Unknown
    LastTransitionTime metav1.Time                `json:"lastTransitionTime,omitempty"`
    Reason             string                     `json:"reason,omitempty"`
    Message            string                     `json:"message,omitempty"`
}


// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// MyApplicationList contains a list of MyApplication
type MyApplicationList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []MyApplication `json:"items"`
}

  • +genclient: This marker indicates that client-gen should generate a client for the MyApplication type.
  • +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object: This ensures deepcopy-gen generates the necessary interfaces and methods for MyApplication and MyApplicationList to be considered Kubernetes objects.
  • The Go struct fields (Image, Replicas, Port, etc.) correspond to the properties defined in your CRD's OpenAPI schema, including json tags for serialization/deserialization.
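
One more file deserves a mention before we wire up generation: the controller code later in this guide references stablev1.SchemeGroupVersion and stablev1.AddToScheme, which conventionally live in a hand-written pkg/apis/stable/v1/register.go. The version below follows the upstream sample-controller layout; treat it as a template rather than as generated output.

package v1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// SchemeGroupVersion is the group/version used to register these objects.
var SchemeGroupVersion = schema.GroupVersion{Group: "stable.example.com", Version: "v1"}

// Resource takes an unqualified resource name and returns a Group-qualified GroupResource.
func Resource(resource string) schema.GroupResource {
    return SchemeGroupVersion.WithResource(resource).GroupResource()
}

var (
    // SchemeBuilder collects the functions that register our types with a scheme.
    SchemeBuilder = runtime.NewSchemeBuilder(addKnownTypes)
    // AddToScheme applies all stored registration functions to a scheme.
    AddToScheme = SchemeBuilder.AddToScheme
)

// addKnownTypes registers MyApplication and MyApplicationList with the scheme.
func addKnownTypes(scheme *runtime.Scheme) error {
    scheme.AddKnownTypes(SchemeGroupVersion,
        &MyApplication{},
        &MyApplicationList{},
    )
    metav1.AddToGroupVersion(scheme, SchemeGroupVersion)
    return nil
}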

Now, we need to set up the code-generator scripts. A common practice is to have a hack/update-codegen.sh script:

#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail

# Define package paths
MY_ORG="example.com"
MY_PKG="${MY_ORG}/my-custom-controller" # Must match the module path in go.mod
APIS_PKG="${MY_PKG}/pkg/apis"
# This is the path to the Kubernetes code-generator repository, e.g., k8s.io/code-generator
K8S_CODEGEN_PKG=$(go env GOPATH)/pkg/mod/k8s.io/code-generator@v0.28.0 # Adjust version as needed

if [[ ! -d "${K8S_CODEGEN_PKG}" ]]; then
    echo "ERROR: k8s.io/code-generator not found at ${K8S_CODEGEN_PKG}. Please run 'go mod download k8s.io/code-generator@v0.28.0'."
    exit 1
fi

# Define the output base directory for generated files.
# NOTE: generate-groups.sh writes into ${OUTPUT_BASE}/<full Go package path>,
# so outside a GOPATH-style layout you may need to move the generated files
# back under pkg/ afterwards.
OUTPUT_BASE=$(dirname "${BASH_SOURCE[0]}")/..

# Path to the code-generator scripts
CODEGEN_SCRIPT_ROOT="${K8S_CODEGEN_PKG}"

echo "Generating deepcopy functions..."
# --output-base defines where the generated files will be placed.
# --go-header-file ensures a consistent header in generated files.
# --output-package points to the root of your Go project.
${CODEGEN_SCRIPT_ROOT}/generate-groups.sh all \
  "${APIS_PKG}/clientset" \
  "${APIS_PKG}" \
  "stable:v1" \
  --output-base "${OUTPUT_BASE}" \
  --go-header-file "${CODEGEN_SCRIPT_ROOT}/hack/boilerplate.go.txt"

echo "Code generation complete."

Make sure the k8s.io/code-generator version matches the client-go and apimachinery versions in your go.mod to avoid compatibility issues. Also note that the legacy generate-groups.sh script was removed from code-generator in v0.29 in favor of the kube_codegen.sh script it now ships; if you are on v0.29 or newer, adapt the script accordingly or pin an older code-generator as shown above.

Initialize your Go module and download dependencies:

go mod init example.com/my-custom-controller
go get k8s.io/apimachinery@v0.28.0 k8s.io/client-go@v0.28.0 k8s.io/code-generator@v0.28.0
go mod tidy

Now, run the generation script:

chmod +x hack/update-codegen.sh
./hack/update-codegen.sh

After running this script, you will find new directories and files under pkg/apis/. Specifically:

  • pkg/apis/stable/v1/zz_generated.deepcopy.go: Contains DeepCopy methods.
  • pkg/apis/clientset/: Contains your custom Clientset.
  • pkg/apis/listers/: Contains Lister interfaces for your custom types.
  • pkg/apis/informers/: Contains Informer implementations and a SharedInformerFactory for your custom types.

This generated code is the bridge between your Go application and your custom resources, allowing you to use strongly typed Go objects and benefit from the efficient informer pattern. We are now ready to build the controller logic.

Implementing a Custom Resource Controller in Golang

With our CRD deployed and client-go code generated, we can now proceed to build the actual controller logic. The core responsibility of a controller is to watch for changes to MyApplication resources and then take actions to reconcile the desired state (specified in the CR's spec) with the actual state of the cluster. This involves creating, updating, or deleting other Kubernetes resources (like Deployments, Services, ConfigMaps) or interacting with external systems.

A robust controller typically follows a well-established pattern:

  1. Initialization: Set up Kubernetes client, informers, and a workqueue.
  2. Event Handling: Register event handlers (AddFunc, UpdateFunc, DeleteFunc) to receive notifications when CRs change. These handlers typically enqueue the changed resource's key (namespace/name) into a workqueue.
  3. Worker Pool: Start multiple worker goroutines that continuously pull items from the workqueue.
  4. Reconciliation Loop (syncHandler): The heart of the controller. For each item pulled from the workqueue, this function:
    • Retrieves the latest state of the custom resource from the informer's cache.
    • Compares the desired state (from the CR's spec) with the actual state of related resources in the cluster.
    • Performs necessary actions to achieve the desired state (e.g., create a Deployment, update a Service).
    • Updates the status field of the custom resource to reflect the actual state.
    • Handles errors and retries.

Let's start building our main.go for the controller.

package main

import (
    "context"
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"

    "example.com/my-custom-controller/pkg/apis/clientset/versioned"
    stableinformers "example.com/my-custom-controller/pkg/apis/informers/externalversions"
    stablev1 "example.com/my-custom-controller/pkg/apis/stable/v1"
    stablelisters "example.com/my-custom-controller/pkg/apis/listers/stable/v1"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
    "k8s.io/apimachinery/pkg/util/runtime"
    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/kubernetes/scheme"
    appsinformers "k8s.io/client-go/informers/apps/v1"
    coreinformers "k8s.io/client-go/informers/core/v1"
    appslisters "k8s.io/client-go/listers/apps/v1"
    corelisters "k8s.io/client-go/listers/core/v1"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/workqueue"
    "k8s.io/klog/v2"
)

const controllerAgentName = "my-application-controller"

// Controller is the controller for MyApplication resources
type Controller struct {
    kubeclientset    kubernetes.Interface // Client for standard Kubernetes resources
    myAppclientset   versioned.Interface  // Client for MyApplication custom resources

    deploymentsLister appslisters.DeploymentLister
    deploymentsSynced cache.InformerSynced

    servicesLister corelisters.ServiceLister
    servicesSynced cache.InformerSynced

    myApplicationsLister stablelisters.MyApplicationLister
    myApplicationsSynced cache.InformerSynced

    workqueue workqueue.RateLimitingInterface // Workqueue for processing MyApplication keys
}

// NewController returns a new MyApplication controller
func NewController(
    kubeclientset kubernetes.Interface,
    myAppclientset versioned.Interface,
    deploymentInformer appsinformers.DeploymentInformer,
    serviceInformer coreinformers.ServiceInformer,
    myApplicationInformer stableinformersv1.MyApplicationInformer) *Controller {

    klog.Info("Setting up event handlers")

    // Create the controller
    controller := &Controller{
        kubeclientset:    kubeclientset,
        myAppclientset:   myAppclientset,
        deploymentsLister: deploymentInformer.Lister(),
        deploymentsSynced: deploymentInformer.Informer().HasSynced,
        servicesLister:    serviceInformer.Lister(),
        servicesSynced:    serviceInformer.Informer().HasSynced,
        myApplicationsLister: myApplicationInformer.Lister(),
        myApplicationsSynced: myApplicationInformer.Informer().HasSynced,
        workqueue:            workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "MyApplications"),
    }

    // Register event handlers for MyApplication resources
    myApplicationInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: controller.enqueueMyApplication,
        UpdateFunc: func(old, new interface{}) {
            controller.enqueueMyApplication(new)
        },
        DeleteFunc: controller.enqueueMyApplicationForDelete,
    })

    // Register event handlers for Deployments created by this controller
    // These handlers will re-enqueue the parent MyApplication if a child Deployment changes
    deploymentInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    controller.handleObject,
        UpdateFunc: func(old, new interface{}) { controller.handleObject(new) },
        DeleteFunc: controller.handleObject,
    })

    // Register event handlers for Services created by this controller
    serviceInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    controller.handleObject,
        UpdateFunc: func(old, new interface{}) { controller.handleObject(new) },
        DeleteFunc: controller.handleObject,
    })

    return controller
}

// enqueueMyApplication takes a MyApplication resource and converts it into a namespace/name
// string which is then put onto the work queue. This method should *not* be
// passed resources of any type other than MyApplication; it is intended to be
// used only as an event-handler callback for the shared informer.
func (c *Controller) enqueueMyApplication(obj interface{}) {
    var key string
    var err error
    if key, err = cache.MetaNamespaceKeyFunc(obj); err != nil {
        runtime.HandleError(err)
        return
    }
    c.workqueue.Add(key)
}

// enqueueMyApplicationForDelete handles delete events. It uses
// DeletionHandlingMetaNamespaceKeyFunc so that tombstone
// (DeletedFinalStateUnknown) objects are unwrapped correctly before the key
// is queued, letting syncHandler clean up the associated objects.
func (c *Controller) enqueueMyApplicationForDelete(obj interface{}) {
    key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
    if err != nil {
        runtime.HandleError(err)
        return
    }
    c.workqueue.Add(key)
}

// handleObject will take any resource that belongs to a MyApplication controller.
// It will retrieve the parent MyApplication and add that MyApplication to the workqueue.
func (c *Controller) handleObject(obj interface{}) {
    var object metav1.Object
    var ok bool
    if object, ok = obj.(metav1.Object); !ok {
        tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
        if !ok {
            runtime.HandleError(fmt.Errorf("error decoding object, invalid type"))
            return
        }
        object, ok = tombstone.Obj.(metav1.Object)
        if !ok {
            runtime.HandleError(fmt.Errorf("error decoding object tombstone, invalid type"))
            return
        }
        klog.V(4).Infof("Recovered deleted object '%s' from tombstone", object.GetName())
    }
    klog.V(4).Infof("Processing object: %s", object.GetName())
    // Get the owner reference for the object. If it points to a MyApplication,
    // look up that MyApplication and enqueue it for reconciliation.
    if ownerRef := metav1.GetControllerOf(object); ownerRef != nil {
        if ownerRef.Kind != "MyApplication" {
            return
        }
        myApp, err := c.myApplicationsLister.MyApplications(object.GetNamespace()).Get(ownerRef.Name)
        if err != nil {
            klog.V(4).Infof("Ignoring orphaned object '%s' of MyApplication '%s'", object.GetName(), ownerRef.Name)
            return
        }
        c.enqueueMyApplication(myApp)
        return
    }
}

// Run will set up the event handlers for types we are interested in, as well
// as syncing informer caches and starting workers. It will block until stopCh is closed.
func (c *Controller) Run(workers int, stopCh <-chan struct{}) error {
    defer runtime.HandleCrash()
    defer c.workqueue.ShutDown()

    klog.Info("Starting MyApplication controller")

    klog.Info("Waiting for informer caches to sync")
    if ok := cache.WaitForCacheSync(stopCh, c.deploymentsSynced, c.servicesSynced, c.myApplicationsSynced); !ok {
        return fmt.Errorf("failed to wait for caches to sync")
    }
    klog.Info("Informer caches synced successfully")

    for i := 0; i < workers; i++ {
        go wait.Until(c.runWorker, time.Second, stopCh)
    }

    klog.Info("Started workers")
    <-stopCh
    klog.Info("Shutting down workers")

    return nil
}

// runWorker is a long-running function that will continually call the
// processNextWorkItem function in order to read and process a message on the
// workqueue.
func (c *Controller) runWorker() {
    for c.processNextWorkItem() {
    }
}

// processNextWorkItem will read a single item from the workqueue and
// attempt to process it, by calling the syncHandler.
func (c *Controller) processNextWorkItem() bool {
    obj, shutdown := c.workqueue.Get()

    if shutdown {
        return false
    }

    // We wrap this block in a func so we can defer c.workqueue.Done.
    err := func(obj interface{}) error {
        defer c.workqueue.Done(obj)
        var key string
        var ok bool
        if key, ok = obj.(string); !ok {
            c.workqueue.Forget(obj)
            runtime.HandleError(fmt.Errorf("expected string in workqueue but got %#v", obj))
            return nil
        }
        // Run the syncHandler, passing it the namespace/name string of the
        // MyApplication resource to be synced.
        if err := c.syncHandler(key); err != nil {
            // Put the item back on the workqueue to handle any transient errors.
            c.workqueue.AddRateLimited(key)
            return fmt.Errorf("error syncing '%s': %s, requeuing", key, err.Error())
        }
        // If no error occurs we Forget this item so it does not get queued again until another change happens.
        c.workqueue.Forget(obj)
        klog.Infof("Successfully synced '%s'", key)
        return nil
    }(obj)

    if err != nil {
        runtime.HandleError(err)
        return true
    }

    return true
}

// syncHandler compares the actual state with the desired, and attempts to
// converge the two. It fetches the MyApplication resource, then checks
// for the existence of an associated Deployment and Service, creating or
// updating them as needed.
func (c *Controller) syncHandler(key string) error {
    namespace, name, err := cache.SplitMetaNamespaceKey(key)
    if err != nil {
        runtime.HandleError(fmt.Errorf("invalid resource key: %s", key))
        return nil
    }

    // Get the MyApplication resource from the lister.
    myApp, err := c.myApplicationsLister.MyApplications(namespace).Get(name)
    if err != nil {
        // The MyApplication resource may no longer exist, in which case we stop processing.
        if errors.IsNotFound(err) {
            klog.V(2).Infof("MyApplication '%s' in work queue no longer exists", key)
            // Cleanup child resources if MyApplication is deleted.
            // In a real controller, you would typically use finalizers for robust deletion.
            return c.cleanupChildResources(namespace, name)
        }
        return err
    }

    // 1. Ensure Deployment exists and matches the MyApplication's spec
    deployment, err := c.deploymentsLister.Deployments(myApp.Namespace).Get(deploymentName(myApp))
    if errors.IsNotFound(err) {
        klog.Infof("Creating new Deployment for MyApplication '%s/%s'", myApp.Namespace, myApp.Name)
        deployment, err = c.kubeclientset.AppsV1().Deployments(myApp.Namespace).Create(context.TODO(), newDeployment(myApp), metav1.CreateOptions{})
    } else if err != nil {
        return err
    } else if !metav1.IsControlledBy(deployment, myApp) {
        // The Deployment exists but is not owned by this MyApplication; refuse to adopt it.
        return fmt.Errorf("deployment %q already exists and is not managed by MyApplication %q", deployment.Name, myApp.Name)
    }
    if err != nil {
        return err
    }

    // If the spec has drifted (replicas or image changed), update the Deployment.
    if *deployment.Spec.Replicas != myApp.Spec.Replicas ||
        deployment.Spec.Template.Spec.Containers[0].Image != myApp.Spec.Image {
        klog.Infof("Updating Deployment for MyApplication '%s/%s'", myApp.Namespace, myApp.Name)
        deployment, err = c.kubeclientset.AppsV1().Deployments(myApp.Namespace).Update(context.TODO(), newDeployment(myApp), metav1.UpdateOptions{})
        if err != nil {
            return err
        }
    }

    // 2. Ensure Service exists for the MyApplication
    _, err = c.servicesLister.Services(myApp.Namespace).Get(serviceName(myApp))
    if errors.IsNotFound(err) {
        klog.Infof("Creating new Service for MyApplication '%s/%s'", myApp.Namespace, myApp.Name)
        _, err = c.kubeclientset.CoreV1().Services(myApp.Namespace).Create(context.TODO(), newService(myApp), metav1.CreateOptions{})
    }
    if err != nil {
        return err
    }

    // 3. Update the MyApplication status
    return c.updateMyApplicationStatus(myApp, deployment)
}

// cleanupChildResources deletes the Deployment and Service associated with a deleted MyApplication.
func (c *Controller) cleanupChildResources(namespace, name string) error {
    klog.Infof("Cleaning up child resources for deleted MyApplication '%s/%s'", namespace, name)

    // Delete Deployment
    deploymentName := deploymentName(&stablev1.MyApplication{ObjectMeta: metav1.ObjectMeta{Name: name}})
    err := c.kubeclientset.AppsV1().Deployments(namespace).Delete(context.TODO(), deploymentName, metav1.DeleteOptions{})
    if err != nil && !errors.IsNotFound(err) {
        return fmt.Errorf("failed to delete Deployment '%s': %w", deploymentName, err)
    }
    klog.Infof("Deleted Deployment '%s/%s'", namespace, deploymentName)

    // Delete Service
    serviceName := serviceName(&stablev1.MyApplication{ObjectMeta: metav1.ObjectMeta{Name: name}})
    err = c.kubeclientset.CoreV1().Services(namespace).Delete(context.TODO(), serviceName, metav1.DeleteOptions{})
    if err != nil && !errors.IsNotFound(err) {
        return fmt.Errorf("failed to delete Service '%s': %w", serviceName, err)
    }
    klog.Infof("Deleted Service '%s/%s'", namespace, serviceName)

    return nil
}
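
// ---------------------------------------------------------------------------
// Aside: the syncHandler comment above mentions finalizers as the robust way
// to handle deletion. A hedged sketch of that pattern (the finalizer name
// below is illustrative, not from the text): add the finalizer to the CR
// before creating children; when myApp.DeletionTimestamp is non-nil, run
// cleanup, remove the finalizer, and Update the CR so Kubernetes can finish
// the deletion.
// ---------------------------------------------------------------------------

const myAppFinalizer = "stable.example.com/cleanup" // hypothetical finalizer name

// hasFinalizer reports whether the MyApplication carries our finalizer.
func hasFinalizer(myApp *stablev1.MyApplication) bool {
    for _, f := range myApp.Finalizers {
        if f == myAppFinalizer {
            return true
        }
    }
    return false
}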

// updateMyApplicationStatus updates the status of the MyApplication resource
func (c *Controller) updateMyApplicationStatus(myApp *stablev1.MyApplication, deployment *appsv1.Deployment) error {
    // NEVER modify objects from the store: the lister's cache is shared and
    // must be treated as read-only. Make a DeepCopy and modify the copy instead.
    myAppCopy := myApp.DeepCopy()
    myAppCopy.Status.AvailableReplicas = deployment.Status.AvailableReplicas

    // Set a simple condition based on deployment status
    condition := stablev1.MyApplicationCondition{
        Type:               stablev1.MyApplicationProgressing,
        Status:             metav1.ConditionUnknown, // Default
        LastTransitionTime: metav1.Now(),
        Reason:             "Reconciling",
        Message:            "Controller is processing application state.",
    }

    if deployment.Status.ReadyReplicas == myApp.Spec.Replicas {
        condition.Type = stablev1.MyApplicationReady
        condition.Status = metav1.ConditionTrue
        condition.Reason = "DeploymentReady"
        condition.Message = fmt.Sprintf("Deployment has %d ready replicas.", deployment.Status.ReadyReplicas)
    } else if deployment.Status.ObservedGeneration < deployment.Generation {
        condition.Type = stablev1.MyApplicationProgressing
        condition.Status = metav1.ConditionFalse // Still progressing, not yet ready
        condition.Reason = "DeploymentInProgress"
        condition.Message = "Deployment is rolling out new changes."
    } else {
        condition.Type = stablev1.MyApplicationFailed
        condition.Status = metav1.ConditionFalse
        condition.Reason = "DeploymentNotReady"
        condition.Message = "Deployment is not fully ready."
    }
    myAppCopy.Status.Conditions = []stablev1.MyApplicationCondition{condition}


    // Update only the status subresource via UpdateStatus (enabled by the
    // /status subresource in the CRD); without status updates, users cannot
    // observe the state of their objects.
    _, err := c.myAppclientset.StableV1().MyApplications(myApp.Namespace).UpdateStatus(context.TODO(), myAppCopy, metav1.UpdateOptions{})
    return err
}

// newDeployment creates a new Deployment for a MyApplication resource.
func newDeployment(myApp *stablev1.MyApplication) *appsv1.Deployment {
    labels := labelsForMyApplication(myApp.Name)
    return &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      deploymentName(myApp),
            Namespace: myApp.Namespace,
            OwnerReferences: []metav1.OwnerReference{
                *metav1.NewControllerRef(myApp, stablev1.SchemeGroupVersion.WithKind("MyApplication")),
            },
            Labels: labels,
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &myApp.Spec.Replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: labels,
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: labels,
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{
                        {
                            Name:  "my-application",
                            Image: myApp.Spec.Image,
                            Ports: []corev1.ContainerPort{
                                {
                                    ContainerPort: myApp.Spec.Port,
                                },
                            },
                        },
                    },
                },
            },
        },
    }
}

// newService creates a new Service for a MyApplication resource.
func newService(myApp *stablev1.MyApplication) *corev1.Service {
    labels := labelsForMyApplication(myApp.Name)
    return &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      serviceName(myApp),
            Namespace: myApp.Namespace,
            OwnerReferences: []metav1.OwnerReference{
                *metav1.NewControllerRef(myApp, stablev1.SchemeGroupVersion.WithKind("MyApplication")),
            },
            Labels: labels,
        },
        Spec: corev1.ServiceSpec{
            Selector: labels,
            Ports: []corev1.ServicePort{
                {
                    Port:       myApp.Spec.Port,
                    TargetPort: intstr.FromInt(int(myApp.Spec.Port)),
                },
            },
            Type: corev1.ServiceTypeClusterIP, // Expose within the cluster
        },
    }
}

func deploymentName(myApp *stablev1.MyApplication) string {
    return fmt.Sprintf("%s-deployment", myApp.Name)
}

func serviceName(myApp *stablev1.MyApplication) string {
    return fmt.Sprintf("%s-service", myApp.Name)
}

func labelsForMyApplication(name string) map[string]string {
    return map[string]string{"app": "my-application", "myapplication_cr": name}
}

// entrypoint for the controller
func main() {
    klog.InitFlags(nil)
    defer klog.Flush()

    // Load kubeconfig
    kubeconfigPath := os.Getenv("KUBECONFIG")
    if kubeconfigPath == "" {
        home, err := os.UserHomeDir()
        if err != nil {
            klog.Fatalf("Error getting user home dir: %v", err)
        }
        kubeconfigPath = fmt.Sprintf("%s/.kube/config", home)
    }
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
    if err != nil {
        klog.Fatalf("Error building kubeconfig: %v", err)
    }

    // Create a standard Kubernetes clientset
    kubeClientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        klog.Fatalf("Error creating kubernetes clientset: %v", err)
    }

    // Create a clientset for our custom resources
    myAppClientset, err := versioned.NewForConfig(config)
    if err != nil {
        klog.Fatalf("Error creating myapp clientset: %v", err)
    }

    // Add our custom resource to the Kubernetes scheme.
    // This is important for things like OwnerReference's Kind lookup.
    _ = stablev1.AddToScheme(scheme.Scheme)

    // Create shared informer factories
    kubeInformerFactory := informers.NewSharedInformerFactory(kubeClientset, time.Second*30) // Resync every 30 seconds
    myAppInformerFactory := stableinformers.NewSharedInformerFactory(myAppClientset, time.Second*30)

    // Get informers for Deployments, Services, and our Custom Resource
    deploymentInformer := kubeInformerFactory.Apps().V1().Deployments()
    serviceInformer := kubeInformerFactory.Core().V1().Services()
    myApplicationInformer := myAppInformerFactory.Stable().V1().MyApplications()

    // Create the controller
    controller := NewController(
        kubeClientset,
        myAppClientset,
        deploymentInformer,
        serviceInformer,
        myApplicationInformer,
    )

    // Set up a channel to receive OS signals for graceful shutdown
    stopCh := make(chan struct{})
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-sigChan
        klog.Info("Received termination signal, stopping controller...")
        close(stopCh) // Signal the controller to stop
    }()

    // Start the informer factories
    kubeInformerFactory.Start(stopCh)
    myAppInformerFactory.Start(stopCh)

    // Run the controller
    if err = controller.Run(2, stopCh); err != nil { // Run with 2 workers
        klog.Fatalf("Error running controller: %v", err)
    }
}

This main.go file outlines a functional controller. Let's dissect its key components and design choices:

  1. Clientsets: The controller initializes two types of clients:
    • kubeClientset: A standard kubernetes.Clientset for interacting with built-in resources like Deployments and Services.
    • myAppClientset: Our generated custom Clientset (versioned.Interface) for interacting with MyApplication CRs.
  2. Informer Factories: We create two SharedInformerFactory instances: one for standard Kubernetes resources (kubeInformerFactory) and one for our custom resources (myAppInformerFactory). This allows us to efficiently watch both our CRs and the standard resources they manage.
  3. Informer & Lister Setup: Inside NewController, we acquire specific informer and lister instances for Deployments, Services, and MyApplications. The Listers (deploymentsLister, servicesLister, myApplicationsLister) are crucial for querying the in-memory cache, avoiding direct API server calls during reconciliation.
  4. Workqueue (workqueue.RateLimitingInterface): This is a critical component for decoupling event handling from the actual processing logic.
    • When an Add, Update, or Delete event occurs for a MyApplication (or its owned Deployment/Service), the enqueueMyApplication or handleObject functions add the resource's namespace/name key to the workqueue.
    • The RateLimitingInterface automatically handles retries with backoff for failed processing attempts, preventing a rapid-fire re-queue that could overwhelm the controller or the API server.
  5. Event Handlers:
    • myApplicationInformer.Informer().AddEventHandler(...): These handlers (e.g., enqueueMyApplication) are triggered when MyApplication CRs are created, updated, or deleted. They simply push the CR's key to the workqueue (both this wiring and handleObject are sketched just after this list).
    • deploymentInformer.Informer().AddEventHandler(...) & serviceInformer.Informer().AddEventHandler(...): These handlers (handleObject) are more subtle. They are crucial for the "owner reference" pattern. If a Deployment or Service owned by a MyApplication changes, these handlers identify the parent MyApplication and re-enqueue it into the workqueue. This ensures that if a child resource is accidentally modified or deleted, the parent MyApplication controller will notice and reconcile it back to the desired state.
  6. Run Method:
    • Waits for all informer caches to synchronize (cache.WaitForCacheSync). This ensures the local cache is fully populated with the cluster's current state before any reconciliation begins, preventing the controller from making decisions based on incomplete data.
    • Starts multiple worker goroutines (runWorker) that will continuously pull items from the workqueue and process them.
  7. syncHandler: This is the heart of the reconciliation logic.
    • It fetches the latest MyApplication from the cache using its Lister.
    • It then checks for the existence of the Deployment and Service it expects to manage, again using Listers.
    • Reconciliation Logic:
      • If the Deployment or Service is missing, it creates them (newDeployment, newService).
      • If they exist but don't match the MyApplication's spec (e.g., image or replicas changed), it updates them.
      • Owner References: When creating Deployments and Services, they are given an OwnerReference pointing back to the MyApplication CR. This is a critical Kubernetes mechanism:
        • It tells Kubernetes that the Deployment and Service are managed by the MyApplication.
        • It enables cascading deletion: if the MyApplication is deleted, Kubernetes' garbage collector will automatically delete the owned Deployment and Service. Our cleanupChildResources is a fallback, but the OwnerReference is the primary mechanism.
        • It's also used by handleObject to find the parent MyApplication from a child resource.
    • updateMyApplicationStatus: After ensuring the desired state is achieved, the controller updates the status field of the MyApplication CR. This gives users essential feedback about the operational state of their custom resource. Crucially, UpdateStatus should be used instead of Update so that only the status subresource is modified, protecting the spec from accidental changes and leveraging the /status subresource we defined in the CRD (a sketch follows below).
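Before moving on, here is a minimal sketch of the event-handler wiring described in items 4 and 5, assuming the Controller field names used throughout this guide (workqueue, myApplicationsLister); the tombstone handling that client-go's sample-controller performs on deletes is omitted for brevity:

// Inside NewController: every add/update/delete of a MyApplication simply
// enqueues the object's namespace/name key.
myApplicationInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc: controller.enqueueMyApplication,
    UpdateFunc: func(old, new interface{}) {
        controller.enqueueMyApplication(new)
    },
    DeleteFunc: controller.enqueueMyApplication,
})

// enqueueMyApplication converts an object into a namespace/name key and
// adds it to the rate-limited workqueue.
func (c *Controller) enqueueMyApplication(obj interface{}) {
    key, err := cache.MetaNamespaceKeyFunc(obj)
    if err != nil {
        utilruntime.HandleError(err)
        return
    }
    c.workqueue.Add(key)
}

// handleObject resolves a child Deployment or Service back to its owning
// MyApplication via the OwnerReference, then re-enqueues the parent.
func (c *Controller) handleObject(obj interface{}) {
    object, ok := obj.(metav1.Object)
    if !ok {
        return // DeletedFinalStateUnknown (tombstone) handling omitted here
    }
    ownerRef := metav1.GetControllerOf(object)
    if ownerRef == nil || ownerRef.Kind != "MyApplication" {
        return // not owned by a MyApplication; ignore
    }
    myApp, err := c.myApplicationsLister.MyApplications(object.GetNamespace()).Get(ownerRef.Name)
    if err != nil {
        return // owner already gone; nothing to reconcile
    }
    c.enqueueMyApplication(myApp)
}

And to make the UpdateStatus point in item 7 concrete, here is a sketch of the status updater, assuming a Status.AvailableReplicas field on MyApplication (the field name is illustrative) and a client-go recent enough that write calls take a context:

// updateMyApplicationStatus mutates only the status of a deep copy and
// writes it back through the /status subresource, leaving spec untouched.
func (c *Controller) updateMyApplicationStatus(myApp *stablev1.MyApplication, deployment *appsv1.Deployment) error {
    // Never mutate objects from the informer cache; DeepCopy first.
    myAppCopy := myApp.DeepCopy()
    myAppCopy.Status.AvailableReplicas = deployment.Status.AvailableReplicas

    _, err := c.myAppClientset.StableV1().
        MyApplications(myAppCopy.Namespace).
        UpdateStatus(context.TODO(), myAppCopy, metav1.UpdateOptions{})
    return err
}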

This detailed implementation demonstrates the power of the reconciliation pattern combined with efficient client-go mechanisms. The controller is reactive, self-correcting, and robust, forming the foundation for complex automation scenarios within Kubernetes.

Advanced Controller Concepts and Best Practices

Building a basic functional controller is a great start, but real-world scenarios demand more sophistication. Here, we delve into advanced concepts and best practices that elevate a controller from a simple demonstrator to a production-grade, resilient, and scalable component.

1. Owner References and Finalizers: Robust Deletion Semantics

While our controller uses OwnerReferences for cascading deletion, sometimes you need more control over the cleanup process before a resource is fully removed. This is where Finalizers come in. Finalizers are entries in a string list on an object's metadata; Kubernetes will not remove the object until that list is empty.

Scenario: Imagine our MyApplication controller provisioned an external database when the MyApplication was created. If the MyApplication is deleted, we need to ensure this external database is deprovisioned before the CR object is removed from Kubernetes.

How it works:

  1. When your controller creates a MyApplication (or on its first reconciliation), it adds a finalizer to the MyApplication object (e.g., finalizers: ["stable.example.com/database-cleanup"]).
  2. When a user deletes the MyApplication, Kubernetes sets the metadata.deletionTimestamp field but does not remove the object, because the finalizer is still present.
  3. Your controller observes that deletionTimestamp is set on the MyApplication.
  4. It performs its cleanup logic (e.g., deprovisioning the external database).
  5. Once cleanup is complete, the controller removes its finalizer from the MyApplication object.
  6. Kubernetes then sees no remaining finalizers and finally deletes the MyApplication object from etcd.

Finalizers ensure that external resources are cleaned up deterministically, preventing dangling resources or inconsistent states.
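Here is a sketch of this flow at the top of syncHandler, under the same client-go assumptions as before; containsString, removeString, and deprovisionDatabase are hypothetical helpers, not code from earlier sections:

const databaseCleanupFinalizer = "stable.example.com/database-cleanup"

// reconcileFinalizer returns deleting=true when the CR is being torn down,
// so syncHandler can skip its normal create/update logic.
func (c *Controller) reconcileFinalizer(myApp *stablev1.MyApplication) (deleting bool, err error) {
    if myApp.DeletionTimestamp.IsZero() {
        // Live object: make sure our finalizer is registered.
        if !containsString(myApp.Finalizers, databaseCleanupFinalizer) {
            myAppCopy := myApp.DeepCopy()
            myAppCopy.Finalizers = append(myAppCopy.Finalizers, databaseCleanupFinalizer)
            _, err = c.myAppClientset.StableV1().
                MyApplications(myAppCopy.Namespace).
                Update(context.TODO(), myAppCopy, metav1.UpdateOptions{})
        }
        return false, err
    }

    // deletionTimestamp is set: clean up external state, then release the
    // finalizer so Kubernetes can remove the object from etcd.
    if containsString(myApp.Finalizers, databaseCleanupFinalizer) {
        if err := deprovisionDatabase(myApp); err != nil {
            return true, err // re-queue; cleanup is retried with backoff
        }
        myAppCopy := myApp.DeepCopy()
        myAppCopy.Finalizers = removeString(myAppCopy.Finalizers, databaseCleanupFinalizer)
        _, err = c.myAppClientset.StableV1().
            MyApplications(myAppCopy.Namespace).
            Update(context.TODO(), myAppCopy, metav1.UpdateOptions{})
    }
    return true, err
}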

2. Leader Election for High Availability

If you deploy multiple replicas of your controller, you typically want only one instance actively performing reconciliation at any given time to avoid race conditions or duplicate work. This is achieved through Leader Election.

Kubernetes offers a built-in leader election mechanism (using Lease objects, or ConfigMaps in older versions) that your controller can leverage. The client-go library provides this in its tools/leaderelection package. When configured, controller instances contend for leadership, and only the elected leader runs the syncHandler and performs reconciliation. If the leader fails, another instance automatically takes over, ensuring high availability. This is crucial for maintaining a responsive and resilient control plane; a sketch follows.
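A minimal sketch using client-go's leaderelection package with a Lease lock; the lease name, namespace, and timings are illustrative choices:

import (
    "context"
    "os"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/leaderelection"
    "k8s.io/client-go/tools/leaderelection/resourcelock"
    "k8s.io/klog/v2"
)

// runWithLeaderElection blocks; run is invoked only while this replica
// holds the lease, and leadership is released on context cancellation.
func runWithLeaderElection(ctx context.Context, kubeClientset kubernetes.Interface, run func(ctx context.Context)) {
    id, _ := os.Hostname() // unique identity per replica
    lock := &resourcelock.LeaseLock{
        LeaseMeta:  metav1.ObjectMeta{Name: "myapplication-controller", Namespace: "kube-system"},
        Client:     kubeClientset.CoordinationV1(),
        LockConfig: resourcelock.ResourceLockConfig{Identity: id},
    }
    leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
        Lock:            lock,
        ReleaseOnCancel: true,
        LeaseDuration:   15 * time.Second,
        RenewDeadline:   10 * time.Second,
        RetryPeriod:     2 * time.Second,
        Callbacks: leaderelection.LeaderCallbacks{
            OnStartedLeading: run,
            OnStoppedLeading: func() { klog.Info("lost leadership, shutting down") },
        },
    })
}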

3. Context Management and Graceful Shutdown

Controllers are long-running processes, so propagating cancellation signals and shutting down gracefully matters. Our main function uses signal.Notify to catch SIGINT and SIGTERM; when a signal arrives, stopCh is closed, prompting the informers and worker goroutines to shut down cleanly. In context-based codebases, the same pattern is expressed with context.WithCancel (or signal.NotifyContext), cancelling the root context on shutdown. Either way, a clean stop prevents abrupt termination mid-reconciliation, especially during deployments or scaling events.

4. Error Handling and Rate Limiting

Robust error handling is paramount, and the workqueue's rate limiter is your main tool for it:

  • Transient errors: network issues or temporary API server unavailability. For these, re-queue the item with the workqueue's AddRateLimited method; the RateLimitingInterface applies exponential backoff, preventing a flood of retries.
  • Permanent errors: an invalid CR spec or an unrecoverable configuration error. For these, call Forget on the item to prevent infinite retries of something that cannot succeed. Log such errors prominently and consider updating the CR's status to surface them.
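The choice between the two usually lives in the worker loop. Here is a condensed sketch of that decision, matching the workqueue.RateLimitingInterface used by this controller:

// processNextWorkItem pulls one key from the queue and decides, based on
// syncHandler's result, whether to retry it or forget it.
func (c *Controller) processNextWorkItem() bool {
    obj, shutdown := c.workqueue.Get()
    if shutdown {
        return false
    }
    defer c.workqueue.Done(obj)

    key := obj.(string)
    if err := c.syncHandler(key); err != nil {
        // Transient failure: put the key back with exponential backoff.
        c.workqueue.AddRateLimited(key)
        utilruntime.HandleError(fmt.Errorf("error syncing %q (will retry): %w", key, err))
        return true
    }

    // Success (or a permanent error already surfaced via the CR's status):
    // clear the rate-limiter history so future failures start fresh.
    c.workqueue.Forget(obj)
    return true
}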

5. Testing Your Controller

A comprehensive testing strategy is vital:

  • Unit tests: test individual functions and components in isolation (e.g., newDeployment, labelsForMyApplication).
  • Integration tests: test the interaction between your controller and a simulated API server (e.g., the fake clientset in k8s.io/client-go/kubernetes/fake, or envtest from sigs.k8s.io/controller-runtime/pkg/envtest). These let you simulate CR creations and updates and assert on the resulting Kubernetes objects; a small example follows.
  • End-to-end (E2E) tests: deploy your controller and CRD to a real (or ephemeral) Kubernetes cluster and verify behavior with kubectl and API calls. These are the most comprehensive but also the slowest tests.
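For instance, a unit test of the Deployment construction can double as a fake-clientset example; the MyApplicationSpec field names (Image, Replicas) and the newDeployment signature are assumptions based on the code in this guide:

import (
    "context"
    "testing"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes/fake"
)

func int32Ptr(i int32) *int32 { return &i }

func TestNewDeploymentForMyApplication(t *testing.T) {
    myApp := &stablev1.MyApplication{
        ObjectMeta: metav1.ObjectMeta{Name: "demo", Namespace: "default"},
        Spec: stablev1.MyApplicationSpec{
            Image:    "nginx:1.25",
            Replicas: int32Ptr(2),
        },
    }

    // Pure unit test: the generated name must follow deploymentName's scheme.
    dep := newDeployment(myApp)
    if dep.Name != "demo-deployment" {
        t.Errorf("expected name demo-deployment, got %q", dep.Name)
    }

    // Integration-style test: the fake clientset records objects in memory,
    // so controller logic can be exercised without a real API server.
    client := fake.NewSimpleClientset()
    if _, err := client.AppsV1().Deployments("default").Create(context.TODO(), dep, metav1.CreateOptions{}); err != nil {
        t.Fatalf("create failed: %v", err)
    }
    if _, err := client.AppsV1().Deployments("default").Get(context.TODO(), dep.Name, metav1.GetOptions{}); err != nil {
        t.Fatalf("expected deployment in fake client: %v", err)
    }
}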

6. Performance and Scalability Considerations

  • Efficient Reconciliation: Ensure your syncHandler is idempotent (running it multiple times produces the same result) and performs minimal work. Avoid heavy computations or long-running external calls within the critical path of reconciliation. Offload complex tasks to separate goroutines or external systems if possible.
  • Informer Resync Period: While a non-zero resync period (e.g., 30s) can act as a safety net against missed events, it also increases API server load. If your watch stream is robust, a zero resync period is generally preferred, letting events drive updates (see the snippet after this list).
  • Shared Informers: Always use SharedInformerFactory to minimize watches and cache memory footprint, especially if your controller watches multiple resource types.
  • Resource Management: Monitor your controller's CPU and memory usage. Tune worker counts and resource limits for the controller's Pods to ensure it operates within defined bounds and doesn't disrupt the cluster.
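The resync setting mentioned above is just a constructor argument:

// A resync period of 0 disables periodic cache resync entirely; the
// controller then relies purely on watch events (plus the initial LIST).
kubeInformerFactory := informers.NewSharedInformerFactory(kubeClientset, 0)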

7. Security: RBAC for Your Controller

Your controller will need appropriate Role-Based Access Control (RBAC) permissions to interact with the Kubernetes API:

  • ServiceAccount: create a ServiceAccount for your controller Pod.
  • Role/ClusterRole: define a Role (for namespaced resources) or ClusterRole (for cluster-scoped resources and CRDs) that grants the necessary get, list, watch, create, update, patch, and delete permissions on the resources your controller manages (e.g., myapplications, deployments, services).
  • RoleBinding/ClusterRoleBinding: bind the ServiceAccount to the Role or ClusterRole.

Granting least privilege is crucial. Only give your controller the permissions it absolutely needs.
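A minimal manifest sketch for this guide's controller might look like the following; the names and namespace are illustrative, and the verbs should be trimmed to what your controller actually performs:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapplication-controller
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: myapplication-controller
rules:
  - apiGroups: ["stable.example.com"]
    resources: ["myapplications", "myapplications/status"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: myapplication-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: myapplication-controller
subjects:
  - kind: ServiceAccount
    name: myapplication-controller
    namespace: default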

8. Observability: Logging, Metrics, and Tracing

  • Structured Logging: Use structured logging (e.g., klog or zap) to output logs in a machine-readable format (JSON). This makes log aggregation and analysis much easier. Include relevant contextual information like resource namespace/name, operation, and error details.
  • Metrics: Expose Prometheus-compatible metrics from your controller (e.g., using github.com/prometheus/client_golang; a minimal sketch follows this list). Metrics can track:
    • Workqueue depth and processing time.
    • Reconciliation duration.
    • Number of created/updated/deleted resources.
    • Error rates.
  • Tracing: For complex controllers interacting with many resources or external systems, distributed tracing can help understand the flow of operations and identify performance bottlenecks.
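A minimal metrics sketch with github.com/prometheus/client_golang; the metric name and listen address are illustrative choices:

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// reconcileDuration tracks how long each reconciliation takes, labeled by
// outcome so error-path latency can be graphed separately.
var reconcileDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name: "myapplication_reconcile_duration_seconds",
        Help: "Time spent in syncHandler per reconciliation.",
    },
    []string{"result"}, // "success" or "error"
)

func init() {
    prometheus.MustRegister(reconcileDuration)
}

// serveMetrics exposes /metrics for Prometheus to scrape.
func serveMetrics(addr string) {
    mux := http.NewServeMux()
    mux.Handle("/metrics", promhttp.Handler())
    go func() { _ = http.ListenAndServe(addr, mux) }()
}

In the worker loop, record an observation once the outcome is known, e.g. measure time.Since around the syncHandler call and invoke reconcileDuration.WithLabelValues(result).Observe(elapsed.Seconds()).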

These advanced considerations transform a basic controller into a production-ready system capable of handling complex operational tasks reliably and efficiently within the dynamic Kubernetes environment. They reflect the hard-earned lessons from years of operating Kubernetes at scale and are vital for anyone serious about building robust cloud-native applications.

When to Build a Custom Controller and Alternatives

Building a custom controller is a powerful way to extend Kubernetes, but it's not always the appropriate solution. Understanding when to invest in a custom controller versus exploring alternatives is crucial for effective cloud-native development.

When to Build a Custom Controller (and CRD):

  • Managing Complex, Domain-Specific Workflows: When you have a distinct concept that doesn't fit neatly into existing Kubernetes primitives (e.g., a "DatabaseInstance", an "AIModelPipeline", an "EdgeDeviceConfig").
  • Automating Operational Tasks: If you find yourself repeatedly executing a sequence of kubectl commands or manual steps to manage a particular application or infrastructure component, a controller can automate this.
  • Encapsulating Operational Knowledge: A controller allows you to encode the expertise of operating a system directly into Kubernetes, making it self-managing and reducing human error.
  • Extending Kubernetes' Core Behavior: When you need Kubernetes to react dynamically to changes in your custom resources, such as provisioning external services, integrating with third-party api gateways, or managing specific network policies.
  • Providing a Higher-Level Abstraction: For end-users or other teams, a custom resource can provide a simpler, more declarative interface than directly manipulating underlying Deployments, Services, ConfigMaps, etc.
  • Leveraging the Kubernetes Control Plane: You benefit from Kubernetes' built-in features like eventual consistency, scaling, authentication, authorization (RBAC), and OpenAPI schema validation for your custom objects.

Alternatives to a Custom Controller:

  1. Standard Kubernetes Resources with Labels/Annotations: For simpler configuration or grouping, often built-in resources combined with strategic labels and annotations are sufficient. You can select resources based on labels, and annotations can store metadata. This is the simplest approach and should always be considered first.
  2. Kubernetes Webhooks (Mutating/Validating Admission Controllers):
    • Mutating Webhooks: Can intercept resource creation/update requests and modify them before they are stored in etcd. Useful for injecting sidecar containers, defaulting fields, or transforming configurations.
    • Validating Webhooks: Can intercept resource creation/update/deletion requests and reject them if they don't meet specific criteria. Useful for enforcing complex policies that go beyond OpenAPI schema validation.
    • Distinction: Webhooks are reactive (they respond to API server requests) but don't perform continuous reconciliation or manage external state. They are part of the admission control chain, not the control loop.
  3. External Automation/Orchestration Tools: Sometimes, a traditional CI/CD pipeline, a configuration management tool (like Ansible, Terraform), or a custom script running outside Kubernetes can manage external resources. This decouples the logic from Kubernetes but loses the declarative, self-healing nature of controllers.
  4. Existing Operator Frameworks (Kubebuilder, Operator SDK): If you decide to build a controller, these frameworks significantly streamline the development process. They provide scaffolding, boilerplate code, testing utilities, and best practices, making it much easier to create production-grade operators. They abstract away much of the client-go informer setup and workqueue management, letting you focus more on the core reconciliation logic.

The Ecosystem Role of API, Gateway, and OpenAPI:

  • API: At its heart, a custom controller extends the Kubernetes API. The resources it manages are part of this extended API. The controller itself interacts with the API server, making it a powerful API consumer and, indirectly, an API provider (by managing CRs that represent higher-level APIs).
  • Gateway: Controllers often manage external services. If these services expose APIs, an api gateway becomes essential for managing access, security, routing, and traffic. A controller might, for example, watch a MyApplication CR, create a Deployment and Service for it, and then update an external api gateway (like an Ingress controller or a dedicated API management platform such as APIPark) to expose that service to external clients. This establishes a robust and secure access point.
  • OpenAPI: The OpenAPI specification is fundamental for defining and validating CRD schemas. It ensures that the structure of your custom resources is well-defined, predictable, and verifiable. Beyond CRDs, if your controller manages services that expose their own APIs, documenting these APIs with OpenAPI (formerly Swagger) and integrating with an api gateway that understands OpenAPI is a best practice for discoverability and consumer-friendliness.

In essence, a custom controller is a sophisticated tool for solving complex automation challenges within Kubernetes. It brings the power of the Kubernetes control plane to your specific domain, creating a truly extensible and intelligent infrastructure. However, like any powerful tool, it should be wielded judiciously, always weighing its benefits against simpler alternatives.

Conclusion

The journey through watching for changes to Custom Resources in Golang reveals the incredible power and flexibility of the Kubernetes platform. By extending the Kubernetes API with your own Custom Resources and implementing dedicated controllers in Golang, you transform your cluster from a generic container orchestrator into a highly specialized, domain-aware automation engine.

We've covered the foundational concepts, starting from the declarative nature of Kubernetes and the reconciliation loop that drives its self-healing capabilities. We then delved into the intricacies of client-go, the essential Golang library for interacting with the Kubernetes API, highlighting the efficiency of the Shared Informer pattern over inefficient polling. Crafting your own Custom Resource Definition with robust OpenAPI schema validation was a critical step, followed by the automated generation of type-safe client-go code that empowers your controller to interact with these custom objects naturally.

The core of our practical guide involved building a production-grade controller, demonstrating how to set up informers for both custom and standard resources, utilize rate-limiting workqueues, and implement a robust reconciliation loop. This loop intelligently observes changes, reconciles desired states with actual states, and manages lifecycle events for dependent resources using owner references and status updates. We emphasized the importance of a clean separation between the controller logic and the underlying Kubernetes API through the use of api interfaces and the proactive management of the status field.

Finally, we explored advanced considerations such as finalizers for deterministic cleanup, leader election for high availability, comprehensive error handling, and robust testing strategies. We also touched upon the critical roles of api management, api gateways, and OpenAPI specifications in the broader ecosystem, particularly how a controller might interact with or configure these elements to expose and secure the services it manages. For instance, a controller might define its application's external exposure via a Custom Resource, and an external api gateway like APIPark could then read that configuration (or be configured by the controller) to provide a unified API layer, prompt encapsulation for AI models, and comprehensive lifecycle management, thus enhancing the operational experience for both developers and consumers.

The ability to build such controllers is a cornerstone skill for anyone operating complex applications in a Kubernetes environment. It unlocks unprecedented levels of automation, reduces operational burden, and paves the way for truly intelligent, self-managing cloud-native systems. As you venture forth to build your own custom controllers, remember the principles of idempotency, eventual consistency, and robust error handling. The Kubernetes control plane is a powerful paradigm, and by mastering these techniques, you become an architect of its endless extensibility.

Frequently Asked Questions (FAQ)

1. What is the primary benefit of using Custom Resources and Golang controllers over simply deploying standard Kubernetes resources? The primary benefit lies in extending the Kubernetes API to manage domain-specific concepts naturally and automating their lifecycle. Standard resources like Deployments and Services are generic. Custom Resources allow you to define abstractions specific to your application (e.g., DatabaseInstance, AIModelDeployment), making your configuration more intuitive and declarative. Golang controllers then provide the active intelligence to interpret these custom declarations and reconcile them into concrete actions (e.g., provisioning a database, deploying an AI model, configuring an api gateway), offering a truly self-managing and self-healing system tailored to your unique needs.

2. Why are Shared Informers crucial for controller performance, and what problem do they solve? Shared Informers are crucial because they efficiently observe changes in Kubernetes resources without overloading the API Server. They solve the problem of inefficient polling. Instead of repeatedly querying the API Server for the current state (which consumes bandwidth, CPU, and can lead to missed events), Informers perform an initial LIST and then maintain a long-lived WATCH connection. They push real-time events (add, update, delete) to an in-memory cache, which your controller queries. This push-based, cached approach significantly reduces API Server load, improves response times, and ensures your controller always works with an up-to-date view of the cluster state.

3. What is the role of OpenAPI in Custom Resource Definitions (CRDs)? OpenAPI (specifically OpenAPI v3 schema) plays a vital role in CRDs by providing a robust mechanism for schema validation. When you define a CRD, you include an OpenAPI schema that specifies the structure, data types, required fields, and constraints for your custom resource's spec and status fields. This ensures that any custom resource created in the cluster adheres to a predefined contract. If a user attempts to create a malformed CR, the Kubernetes API Server will reject it immediately based on the OpenAPI schema, preventing invalid configurations from entering the system and improving data integrity.

4. How does a Golang controller ensure that child resources (like Deployments and Services) are cleaned up when their parent Custom Resource is deleted? The primary mechanism for cleaning up child resources is through OwnerReferences combined with Kubernetes' built-in garbage collector. When a controller creates a Deployment or Service on behalf of a Custom Resource (e.g., MyApplication), it sets an OwnerReference on the child resource pointing back to the parent MyApplication. This tells Kubernetes that the child resource is "owned" by the parent. When the MyApplication is deleted, the Kubernetes garbage collector automatically identifies and deletes all resources that have an OwnerReference pointing to the deleted parent. For more complex cleanup scenarios involving external systems (e.g., deprovisioning a cloud database), Finalizers are used to delay the parent CR's deletion until the controller has completed its specific cleanup tasks.

5. How can a controller integrate with an API management platform like APIPark? A controller can integrate with an API management platform like APIPark in several ways. If your Custom Resource defines the characteristics of an application or service that exposes an api (e.g., the MyApplication spec might include externalApiDomain), your controller can:

  • Configure the gateway: upon creating the application's Deployment and Service, the controller could make API calls to APIPark to register the new service, define its API endpoints, set up routing rules, and apply security policies.
  • Manage AI models: if your CRDs manage AI model deployments, the controller could leverage APIPark's AI Gateway capabilities to unify API formats, manage authentication, track costs, and encapsulate prompts into REST APIs.
  • Expose the controller's own APIs: if your controller itself exposes an API for management or status queries, APIPark can serve as a robust api gateway to manage access, apply OpenAPI-defined schemas for documentation, and provide lifecycle governance for that API.

This ensures consistency, security, and discoverability for all services, whether internal or external, that your controller manages or exposes.

🚀 You can securely and efficiently call the OpenAI API on APIPark in just two steps:

Step 1: Deploy the APIPark AI gateway in 5 minutes.

APIPark is developed based on Golang, offering strong product performance and low development and maintenance costs. You can deploy APIPark with a single command line.

curl -sSO https://download.apipark.com/install/quick-start.sh; bash quick-start.sh
APIPark Command Installation Process

In my experience, you can see the successful deployment interface within 5 to 10 minutes. Then, you can log in to APIPark using your account.

APIPark System Interface 01

Step 2: Call the OpenAI API.

APIPark System Interface 02