Watch for Changes to Custom Resources Golang: A Practical Guide
The modern cloud-native landscape, dominated by Kubernetes, thrives on extensibility and automation. While Kubernetes provides a powerful set of built-in resources like Pods, Deployments, and Services, real-world applications often require custom abstractions to manage domain-specific concepts effectively. This is where Custom Resources (CRs) and Custom Resource Definitions (CRDs) come into play, allowing users to extend the Kubernetes API with their own resource types. However, merely defining a custom resource isn't enough; to harness its full potential, you need active components that observe these custom resources and react to their changes, orchestrating complex workflows. This is the domain of Kubernetes controllers, often written in Golang, the native language of Kubernetes itself.
This comprehensive guide will take you on a deep dive into the practical aspects of watching for changes to Custom Resources using Golang. We will unravel the intricacies of the client-go library, explore the powerful Shared Informers pattern, and walk through the step-by-step process of building a robust and efficient controller. By the end of this journey, you will possess a profound understanding of how to empower your Kubernetes clusters with intelligent automation, ensuring your custom resources are not just declarative data but active participants in your cloud infrastructure.
The Genesis of Custom Resources: Extending Kubernetes' Universe
Kubernetes, at its core, operates on a declarative model. Users declare their desired state ("I want three replicas of this application") and Kubernetes continuously works to reconcile the actual state with the desired state. This model is exceptionally powerful, but its out-of-the-box resources might not always align perfectly with the unique abstractions required by every application or infrastructure component. Imagine you're building a platform that manages database instances, or perhaps an AI model deployment pipeline. You might want a resource type like DatabaseInstance or AIMLModel. Kubernetes, by default, doesn't know what these are.
This is precisely the problem that Custom Resource Definitions (CRDs) solve. A CRD allows you to define a new, arbitrary resource kind in your Kubernetes cluster. It's like telling Kubernetes, "Hey, there's a new type of object in town, and here's what it looks like." Once a CRD is registered, you can create instances of your custom resource (CRs) just like you would with a Pod or a Service, using standard Kubernetes tools like kubectl. These CRs are stored in the same etcd data store as native Kubernetes objects, benefit from Kubernetes' authentication and authorization mechanisms (RBAC), and can be managed with kubectl.
However, a CRD merely defines the schema; it doesn't imbue the new resource with any operational intelligence. Creating a DatabaseInstance CR doesn't magically provision a database. For that, you need a controller. A controller is a piece of software that "watches" specific resources (in our case, custom resources), observes changes (creations, updates, deletions), and takes action to bring the cluster's actual state closer to the desired state specified in the CR. This active reconciliation loop is the essence of building truly extensible and automated systems on Kubernetes. Without a controller, your custom resources are merely inert data points in the cluster's API.
The choice of Golang for writing these controllers is no accident. As the language in which Kubernetes itself is primarily written, Golang offers direct access to the client-go library, which is the official and most efficient way to interact with the Kubernetes API. Its strong typing, concurrency primitives (goroutines and channels), and robust tooling make it an ideal candidate for building high-performance, reliable, and scalable control planes that underpin the extensibility of Kubernetes.
Kubernetes Control Plane and the Reconciliation Pattern
Before diving into the code, it's crucial to grasp the fundamental architecture of how Kubernetes operates, particularly the concept of the control plane and its reconciliation loop. This understanding forms the bedrock upon which effective controllers are built.
The Kubernetes control plane comprises several key components:

- API Server: The front end of the Kubernetes control plane. It exposes the Kubernetes API, which is the communication interface for users, management tools, and other cluster components. All operations on the cluster go through the API Server.
- etcd: A highly available key-value store that serves as Kubernetes' backing store for all cluster data. All configuration data, state data, and metadata are stored here.
- Controller Manager: A daemon that embeds the core control loops (controllers) shipped with Kubernetes. For example, the ReplicaSet controller ensures the desired number of pod replicas are running, and the Node controller manages node lifecycle.
- Scheduler: Watches for newly created Pods with no assigned node and selects a node for them to run on.
- kubelet: An agent that runs on each node in the cluster. It ensures that containers are running in a Pod.
The magic of Kubernetes lies in its reconciliation pattern. Each controller in the system continuously observes a specific set of resources in the cluster. It compares the "desired state" (as declared by the user in a resource's spec, or determined by other controllers) with the "actual state" of the cluster. If a discrepancy is found, the controller takes corrective actions to bring the actual state into alignment with the desired state. This loop runs constantly and autonomously, making Kubernetes a self-healing and self-managing system.
For instance, when you create a Deployment, the Deployment controller (part of the Controller Manager) notices this new resource. Its desired state is defined in the Deployment's spec. The controller then creates a ReplicaSet to manage the Pods. The ReplicaSet controller then notices its desired state (number of replicas) and proceeds to create Pods. The Scheduler then places these Pods on nodes, and the kubelet ensures the containers run. If a Pod crashes, the ReplicaSet controller notices the actual state (fewer Pods than desired) and creates a new one. This entire chain of events is driven by controllers constantly watching and reconciling.
When we build a custom controller for our Custom Resources, we are essentially extending this very same powerful reconciliation pattern. Our controller will watch for changes to our MyApplication CR, determine the desired state from its spec, and then interact with the Kubernetes API (e.g., creating Deployments, Services, ConfigMaps) or external systems (e.g., provisioning a database, updating an external API gateway) to achieve that desired state. This design philosophy is what makes Kubernetes so robust and extensible, allowing developers to automate complex operational tasks for any kind of workload.
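To make the reconciliation idea concrete, here is a deliberately simplified, framework-free sketch in Go. The types and the reconcile function are invented for illustration (they are not part of client-go): the loop compares a desired replica count against an observed one and decides on a corrective action, which is exactly what a real controller does before calling the API server.

```go
package main

import "fmt"

// DesiredState and ActualState are hypothetical stand-ins for a CR's
// spec and the cluster's observed state.
type DesiredState struct{ Replicas int }
type ActualState struct{ Replicas int }

// reconcile returns the corrective action needed to move actual toward
// desired. A real controller would create or delete Pods via the API
// server at this point instead of returning a string.
func reconcile(desired DesiredState, actual ActualState) string {
	switch {
	case actual.Replicas < desired.Replicas:
		return fmt.Sprintf("scale up by %d", desired.Replicas-actual.Replicas)
	case actual.Replicas > desired.Replicas:
		return fmt.Sprintf("scale down by %d", actual.Replicas-desired.Replicas)
	default:
		return "in sync"
	}
}

func main() {
	desired := DesiredState{Replicas: 3}
	// Each observation triggers one pass of the reconciliation loop.
	for _, actual := range []ActualState{{Replicas: 1}, {Replicas: 5}, {Replicas: 3}} {
		fmt.Println(reconcile(desired, actual))
	}
}
```

The key property is idempotence: running reconcile repeatedly against an already-correct state is a no-op, which lets the loop run constantly and autonomously.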
Deep Dive into client-go: The Golang Interface to Kubernetes
To interact with the Kubernetes API from a Golang application, the k8s.io/client-go library is your essential toolkit. This library provides a set of powerful clients and utilities designed to simplify communication with the Kubernetes API Server, handle authentication, manage caching, and process events. Understanding its structure and key components is foundational to building any Kubernetes-aware application in Golang, especially controllers.
The client-go library isn't just a single, monolithic client; it's a collection of specialized clients, each tailored for different interaction patterns:
- Clientset (kubernetes.Clientset):
  - This is the most common entry point for interacting with standard Kubernetes resources like Pods, Deployments, Services, ConfigMaps, etc.
  - A Clientset aggregates clients for all built-in API groups (e.g., core/v1, apps/v1, networking.k8s.io/v1).
  - When you need to perform CRUD (Create, Read, Update, Delete) operations on these well-known resources, the Clientset is your go-to.
  - Example: clientset.AppsV1().Deployments("namespace").Create(ctx, deployment, metav1.CreateOptions{})
- Discovery Client (discovery.DiscoveryClient):
  - Used to discover the resources and API groups supported by the Kubernetes API Server.
  - You might use this if your application needs to dynamically adapt to different Kubernetes versions or custom resources that might or might not be present.
  - It helps determine what API versions are available for a given resource type.
- Dynamic Client (dynamic.Interface):
  - This is a highly flexible client that allows you to interact with any Kubernetes resource, including Custom Resources, without having specific Go types compiled into your application.
  - Instead of working with strongly typed structs (e.g., *appsv1.Deployment), you work with unstructured.Unstructured objects, which are essentially map[string]interface{} representations of Kubernetes objects.
  - The dynamic client is invaluable when dealing with Custom Resources whose Go types might not be readily available, or when building generic tools.
  - Example: dynamicClient.Resource(schema.GroupVersionResource{Group: "stable.example.com", Version: "v1", Resource: "myapplications"}).Namespace("default").Create(ctx, unstructuredObject, metav1.CreateOptions{})
- REST Client (rest.RESTClient):
  - The lowest-level client provided by client-go. It allows you to make raw HTTP requests to the Kubernetes API Server.
  - All other clients (Clientset, Discovery, Dynamic) are built on top of the RESTClient.
  - You typically wouldn't use this directly unless you need very fine-grained control over the HTTP requests or are interacting with a very specific, non-standard Kubernetes endpoint.
To use any of these clients, you first need to obtain a rest.Config, which holds the necessary information for authenticating and connecting to the Kubernetes API Server (e.g., host, authentication tokens, CA certificates). This configuration can be loaded from a kubeconfig file (for development outside a cluster) or from the service account token mounted in a Pod (for applications running inside the cluster).
While client-go provides excellent CRUD capabilities, continuously polling the API Server for changes is inefficient and can put undue strain on the control plane. This is where the concept of Shared Informers becomes critical. Shared Informers offer a much more efficient and scalable way to observe resource changes, forming the backbone of almost all production-grade Kubernetes controllers. They introduce a caching layer and event-driven mechanisms that significantly reduce API server load and simplify controller logic.
Shared Informers: Efficiently Observing Custom Resources
Continuously polling the Kubernetes API Server for changes to resources is not only inefficient but also problematic. It consumes excessive network bandwidth, puts unnecessary load on the API Server and etcd, and can lead to race conditions or missed events. Imagine trying to manage hundreds or thousands of custom resources by constantly querying their state; it quickly becomes unmanageable. The solution to this challenge, and the cornerstone of any robust Kubernetes controller, is the Shared Informer pattern provided by the client-go library.
Shared Informers efficiently observe resources by combining two key mechanisms:
- List-Watch Pattern: Instead of just polling, an informer first performs a LIST operation to get the current state of all relevant resources. It then establishes a persistent WATCH connection to the API Server. The API Server pushes real-time notifications (events) to the informer whenever a resource is created, updated, or deleted. This push-based model is far more efficient than polling.
- In-memory Cache: The informer maintains a local, in-memory cache of the resources it's watching. This cache is kept up-to-date by the events received from the WATCH stream. When a controller needs to retrieve a resource, it queries this local cache instead of hitting the API Server directly, drastically reducing API calls and improving performance.
Let's break down the core components that make up a Shared Informer:
- Reflector: This component is responsible for the actual list-watch operation. It lists resources from the API Server and then maintains a watch connection, pushing received events into a queue.
- DeltaFIFO: A buffered queue that stores events (add, update, delete) from the Reflector. It also handles event coalescing, ensuring that only the latest state of a resource is processed if multiple events for the same resource arrive quickly.
- Indexer: The in-memory store that holds the actual resource objects. It uses the DeltaFIFO to update its cache based on incoming events. The Indexer supports efficient lookups by key (namespace/name) and can also build custom indices for more complex queries (e.g., by label).
- Lister: Built on top of the Indexer, a Lister provides a convenient interface for retrieving objects from the local cache. It offers methods like Get() (by name) and List() (all objects or objects in a specific namespace). Crucially, the Lister only reads from the cache; it never hits the API Server.
- Event Handlers (ResourceEventHandler): These are callback functions that your controller defines to react to specific types of events:
  - AddFunc(obj interface{}): Called when a new resource is added to the cluster.
  - UpdateFunc(oldObj, newObj interface{}): Called when an existing resource is updated. Both the old and new states are provided.
  - DeleteFunc(obj interface{}): Called when a resource is deleted.
The power of Shared Informers comes from the SharedInformerFactory. Instead of each controller component creating its own informer, a single SharedInformerFactory can be created for an API group or even for all API groups. This factory then manages a single set of Reflector, DeltaFIFO, and Indexer instances for each resource type. Multiple controllers can then register their ResourceEventHandler functions with the same underlying informer. This significantly reduces resource consumption (fewer API watches, smaller memory footprint for caches) and ensures consistency across different parts of your application that might be watching the same resources.
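The sharing itself can also be sketched without any Kubernetes dependencies. All names below are invented for illustration: a single event source dispatches each event to every registered handler, which is the essence of how several controllers reuse one underlying watch instead of each opening its own.

```go
package main

import "fmt"

// sharedSource is a toy stand-in for a shared informer: one event
// stream, many registered handlers.
type sharedSource struct {
	handlers []func(event string)
}

// AddEventHandler registers a callback, mirroring how multiple
// controllers attach handlers to the same underlying informer.
func (s *sharedSource) AddEventHandler(h func(event string)) {
	s.handlers = append(s.handlers, h)
}

// dispatch fans a single event out to every handler, so one WATCH
// connection serves all interested consumers.
func (s *sharedSource) dispatch(event string) {
	for _, h := range s.handlers {
		h(event)
	}
}

func main() {
	src := &sharedSource{}
	src.AddEventHandler(func(e string) { fmt.Println("controller A saw:", e) })
	src.AddEventHandler(func(e string) { fmt.Println("controller B saw:", e) })
	src.dispatch("Pod Added: default/nginx")
}
```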
Here's a simplified illustration of the flow:
+-------------------------+
|  Kubernetes API Server  |
+------------+------------+
             | LIST (initial state), then WATCH (resource events pushed)
             v
+------------+------------+
|        Reflector        |
+------------+------------+
             | queues Add/Update/Delete deltas
             v
+------------+------------+
|        DeltaFIFO        |
+------------+------------+
             | pops deltas
             v
+------------+------------+  reads  +--------------------------+
|  Indexer (local cache)  |<--------+  Lister (read-only view) |
+------------+------------+         +--------------------------+
             | dispatches events
             v
+------------+------------+
|   Your Event Handlers   |
| (reconciliation logic)  |
+-------------------------+
This elegant pattern ensures that your controller receives timely notifications of changes, has access to an up-to-date local cache of resources, and operates efficiently without overwhelming the Kubernetes control plane. It's the standard for building reactive, high-performance applications within the Kubernetes ecosystem.
Let's look at a basic example of setting up a shared informer for standard Kubernetes resources, like Pods, to solidify this understanding before moving to custom resources. This example will demonstrate the core client-go components involved.
package main

import (
	"context"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/klog/v2"
)

func main() {
	klog.InitFlags(nil)
	defer klog.Flush()

	// 1. Load kubeconfig
	kubeconfigPath := os.Getenv("KUBECONFIG")
	if kubeconfigPath == "" {
		// Go does not expand "~", so build the default path explicitly.
		kubeconfigPath = filepath.Join(os.Getenv("HOME"), ".kube", "config")
	}
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		klog.Fatalf("Error building kubeconfig: %v", err)
	}

	// 2. Create a Clientset
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		klog.Fatalf("Error creating clientset: %v", err)
	}

	// 3. Create a SharedInformerFactory
	// The resync period controls how often the informer re-delivers every
	// cached object as an Update event. 0 disables periodic resync; events
	// alone drive updates. A non-zero resync (e.g., 30s) can help recover
	// from missed events, but 0 is generally fine if the watch stream is robust.
	factory := informers.NewSharedInformerFactory(clientset, 0)

	// 4. Get an Informer for Pods
	podInformer := factory.Core().V1().Pods().Informer()

	// 5. Register Event Handlers
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			klog.Infof("Pod Added: %s/%s", pod.Namespace, pod.Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			oldPod := oldObj.(*corev1.Pod)
			newPod := newObj.(*corev1.Pod)
			if oldPod.ResourceVersion == newPod.ResourceVersion {
				// Periodic resync sends update events for unchanged objects.
				// It is only a real update if the resource version changed.
				return
			}
			klog.Infof("Pod Updated: %s/%s (old RV: %s, new RV: %s)",
				newPod.Namespace, newPod.Name, oldPod.ResourceVersion, newPod.ResourceVersion)
		},
		DeleteFunc: func(obj interface{}) {
			// Handle the 'tombstone' case: the object may arrive wrapped in
			// cache.DeletedFinalStateUnknown if the watch missed the delete.
			pod, ok := obj.(*corev1.Pod)
			if !ok {
				tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
				if !ok {
					klog.Errorf("error decoding object, invalid type")
					return
				}
				pod, ok = tombstone.Obj.(*corev1.Pod)
				if !ok {
					klog.Errorf("error decoding object tombstone, invalid type")
					return
				}
			}
			klog.Infof("Pod Deleted: %s/%s", pod.Namespace, pod.Name)
		},
	})

	// Create a context that can be cancelled.
	ctx, cancel := context.WithCancel(context.Background())

	// Handle OS signals for graceful shutdown.
	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-sigChan
		klog.Info("Received termination signal, shutting down gracefully...")
		cancel() // Signal the context to cancel
	}()

	// 6. Start the Informers
	// This starts the Reflector, DeltaFIFO, and Indexer for every informer in the factory.
	factory.Start(ctx.Done())

	// 7. Wait for all caches to be synced
	// This ensures the local cache is populated before the controller starts processing events.
	klog.Info("Waiting for informer caches to sync...")
	if !cache.WaitForCacheSync(ctx.Done(), podInformer.HasSynced) {
		klog.Fatalf("Failed to sync pod informer cache")
	}
	klog.Info("Informer caches synced successfully!")

	// Keep the main goroutine running until the context is cancelled.
	<-ctx.Done()
	klog.Info("Controller stopped.")
}
This example lays out the fundamental blueprint for using Shared Informers. We establish a connection to Kubernetes, create an informer factory, obtain an informer for Pods, register our desired event handlers, start the factory, and wait for its caches to synchronize. Once synced, our AddFunc, UpdateFunc, and DeleteFunc will be invoked whenever a Pod changes in the cluster. This forms the reactive core of our controller. The next logical step is to apply this pattern to our own Custom Resources.
Crafting Your Own Custom Resource Definition (CRD)
Before we can watch a custom resource, we first need to define its schema and register it with Kubernetes. This is done through a Custom Resource Definition (CRD). A CRD tells Kubernetes about the new resource type, including its name, scope (namespaced or cluster-scoped), and importantly, its validation schema. The validation schema leverages OpenAPI v3 to enforce data integrity for your custom resources, ensuring that users provide valid configurations.
Let's design a simple custom resource to manage a fictional application. We'll call it MyApplication. This application might have a desired image, replica count, and perhaps a specific port it needs.
Here's an example of a MyApplication CRD:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: myapplications.stable.example.com
spec:
group: stable.example.com # The API group for your CRD
versions:
- name: v1 # The version of your CRD
served: true
storage: true # This version is used for storage in etcd
schema:
openAPIV3Schema: # OpenAPI v3 schema for validation
type: object
properties:
apiVersion:
type: string
kind:
type: string
metadata:
type: object
spec: # The specification for your MyApplication
type: object
properties:
image:
type: string
description: The container image to use for the application.
minLength: 1
replicas:
type: integer
description: The desired number of application replicas.
minimum: 1
default: 1
port:
type: integer
description: The port the application listens on.
minimum: 80
maximum: 65535
required:
- image
status: # The status of your MyApplication, managed by the controller
type: object
properties:
availableReplicas:
type: integer
description: The number of currently available application replicas.
conditions:
type: array
items:
type: object
properties:
type:
type: string
status:
type: string
message:
type: string
lastTransitionTime:
type: string
format: date-time
# Adding subresources to enable /status and /scale endpoints
subresources:
status: {} # Enables /status subresource, allowing updates without changing spec
# scale: # Uncomment and configure if you want Horizontal Pod Autoscaler to work
# specReplicasPath: .spec.replicas
# statusReplicasPath: .status.availableReplicas
# labelSelectorPath: .status.selector
scope: Namespaced # CRs of this type will be tied to a namespace
names:
plural: myapplications # Used in kubectl get myapplications
singular: myapplication # Used in kubectl get myapplication
kind: MyApplication # The Kind field in your CR YAML
shortNames:
- ma # Optional short name for kubectl
Let's break down key parts of this CRD:
- group: stable.example.com: This defines the API group for your custom resource. It's conventional to use a domain you own to prevent naming collisions.
- versions: A CRD can have multiple versions. v1 is our initial version.
  - served: true: This version is exposed via the API.
  - storage: true: This version is used to store the resource in etcd. You can only have one storage version.
- schema.openAPIV3Schema: This is crucial. It defines the structure and validation rules for your custom resource's spec and status fields using the OpenAPI schema specification. This schema ensures that any MyApplication CR created in the cluster conforms to the expected data types and constraints (e.g., replicas must be an integer, image must be a string). This provides strong type checking and immediate feedback to users who create malformed CRs.
  - spec: Contains the user-defined desired state.
  - status: Contains the controller-managed actual state.
- subresources.status: {}: This enables the /status subresource. It's a best practice to separate spec updates from status updates. A controller updates the status of a CR, and this subresource allows it to do so without requiring read-write access to the entire CR (i.e., it can't accidentally change the spec).
- scope: Namespaced: This means MyApplication resources will belong to specific namespaces, just like Pods or Deployments. You can also define Cluster-scoped CRDs if your resource is global to the cluster.
- names: These fields define how your custom resource will be referred to within Kubernetes and by kubectl.
To deploy this CRD to your cluster, save it as myapplication-crd.yaml and run:
kubectl apply -f myapplication-crd.yaml
Once the CRD is installed, you can create instances of your custom resource:
# myapplication-example.yaml
apiVersion: stable.example.com/v1
kind: MyApplication
metadata:
name: my-first-app
namespace: default
spec:
image: "nginx:latest"
replicas: 3
port: 8080
---
apiVersion: stable.example.com/v1
kind: MyApplication
metadata:
name: another-app
namespace: my-dev-namespace
spec:
image: "ubuntu/apache2:latest"
replicas: 2
port: 80
Apply these examples:
kubectl apply -f myapplication-example.yaml
You can now use kubectl get myapplications, kubectl describe myapplication my-first-app, etc., just like any built-in resource. However, as noted earlier, nothing happens yet when you create these. That's the controller's job.
Generating Client-go Code for Custom Resources
With our CRD defined, the next step in building our Golang controller is to generate the necessary client-go compatible code for our custom resource. While the dynamic client can interact with arbitrary resources, working with strongly typed Go structs for your custom resources is generally preferred for type safety, code readability, and leveraging Go's compiler for error detection.
The k8s.io/code-generator project provides a set of tools to automatically generate this client code based on your custom resource's Go type definitions. This includes:
- client-gen: Generates a Clientset specific to your custom API group, allowing you to interact with your custom resources directly (e.g., clientset.StableV1().MyApplications(...)).
- lister-gen: Generates Lister interfaces for your custom resources, enabling efficient read access to the informer's local cache.
- informer-gen: Generates Informer interfaces and SharedInformerFactory implementations for your custom resources, completing the efficient watch pattern.
- deepcopy-gen: Generates DeepCopy methods for your custom Go types, essential for safe manipulation of Kubernetes objects (which are often passed by value or reference) and avoiding unintended side effects in your controller.
To use these generators, you typically need to define your custom resource's Go types in a specific directory structure and add special // +kubebuilder or // +genclient comments (known as "markers") to instruct the generators.
Let's establish the directory structure and define our MyApplication Go types:
my-custom-controller/
├── main.go
├── go.mod
├── go.sum
└── pkg/
    └── apis/
        └── stable/
            └── v1/
                ├── doc.go
                └── types.go
pkg/apis/stable/v1/doc.go: This file contains API group and version information for code generation.
// +k8s:deepcopy-gen=package,register
// +groupName=stable.example.com
// Package v1 is the v1 version of the stable.example.com API.
package v1
- +k8s:deepcopy-gen=package,register: This marker tells deepcopy-gen to generate deep copy methods for all types in this package and to register them with the scheme.
- +groupName=stable.example.com: This marker specifies the API group name, matching our CRD.
pkg/apis/stable/v1/types.go: This file defines the Go structs for your MyApplication custom resource, mirroring the OpenAPI schema we defined in the CRD.
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// MyApplication is the Schema for the myapplications API
type MyApplication struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MyApplicationSpec   `json:"spec,omitempty"`
	Status MyApplicationStatus `json:"status,omitempty"`
}

// MyApplicationSpec defines the desired state of MyApplication
type MyApplicationSpec struct {
	Image    string `json:"image"`
	Replicas int32  `json:"replicas"`
	Port     int32  `json:"port"`
}

// MyApplicationStatus defines the observed state of MyApplication
type MyApplicationStatus struct {
	AvailableReplicas int32                    `json:"availableReplicas"`
	Conditions        []MyApplicationCondition `json:"conditions,omitempty"`
}

// MyApplicationConditionType is a valid value for MyApplicationCondition.Type
type MyApplicationConditionType string

// These are valid conditions of a MyApplication.
const (
	// MyApplicationReady means the MyApplication is ready.
	MyApplicationReady MyApplicationConditionType = "Ready"
	// MyApplicationProgressing means the MyApplication is in progress.
	MyApplicationProgressing MyApplicationConditionType = "Progressing"
	// MyApplicationFailed means the MyApplication has failed.
	MyApplicationFailed MyApplicationConditionType = "Failed"
)

// MyApplicationCondition describes the state of a MyApplication at a certain point.
type MyApplicationCondition struct {
	Type               MyApplicationConditionType `json:"type"`
	Status             metav1.ConditionStatus     `json:"status"` // Can be True, False, Unknown
	LastTransitionTime metav1.Time                `json:"lastTransitionTime,omitempty"`
	Reason             string                     `json:"reason,omitempty"`
	Message            string                     `json:"message,omitempty"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// MyApplicationList contains a list of MyApplication
type MyApplicationList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`

	Items []MyApplication `json:"items"`
}
- +genclient: This marker indicates that client-gen should generate a client for the MyApplication type.
- +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object: This ensures deepcopy-gen generates the necessary interfaces and methods for MyApplication and MyApplicationList to be considered Kubernetes objects.
- The Go struct fields (Image, Replicas, Port, etc.) correspond to the properties defined in your CRD's OpenAPI schema, including json tags for serialization/deserialization.
Now, we need to set up the code-generator scripts. A common practice is to have a hack/update-codegen.sh script:
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
# Define package paths
MY_ORG="example.com"
# The full module path must match "go mod init example.com/my-custom-controller" below.
MY_PKG="${MY_ORG}/my-custom-controller"
APIS_PKG="${MY_PKG}/pkg/apis"
# This is the path to the Kubernetes code-generator repository, e.g., k8s.io/code-generator
K8S_CODEGEN_PKG=$(go env GOPATH)/pkg/mod/k8s.io/code-generator@v0.29.0 # Adjust version as needed
if [[ ! -d "${K8S_CODEGEN_PKG}" ]]; then
echo "ERROR: k8s.io/code-generator not found at ${K8S_CODEGEN_PKG}. Please run 'go mod download k8s.io/code-generator@v0.29.0'."
exit 1
fi
# Define the output base directory for generated files
OUTPUT_BASE=$(dirname "${BASH_SOURCE[0]}")/..
# Path to the code-generator scripts
CODEGEN_SCRIPT_ROOT="${K8S_CODEGEN_PKG}"
echo "Generating deepcopy functions..."
# --output-base defines where the generated files will be placed.
# --go-header-file ensures a consistent header in generated files.
${CODEGEN_SCRIPT_ROOT}/generate-groups.sh all \
"${APIS_PKG}/clientset" \
"${APIS_PKG}" \
"stable:v1" \
--output-base "${OUTPUT_BASE}" \
--go-header-file "${CODEGEN_SCRIPT_ROOT}/hack/boilerplate.go.txt"
echo "Code generation complete."
Make sure you replace v0.29.0 with the version of k8s.io/code-generator that matches your client-go version in your go.mod file to avoid compatibility issues.
Initialize your Go module and download dependencies:
go mod init example.com/my-custom-controller
go get k8s.io/apimachinery@v0.29.0 k8s.io/client-go@v0.29.0 k8s.io/code-generator@v0.29.0
go mod tidy
Now, run the generation script:
chmod +x hack/update-codegen.sh
./hack/update-codegen.sh
After running this script, you will find new directories and files under pkg/apis/stable/v1/ and pkg/apis/clientset/. Specifically:

- pkg/apis/stable/v1/zz_generated.deepcopy.go: Contains DeepCopy methods.
- pkg/apis/clientset/: Contains your custom Clientset.
- pkg/apis/listers/: Contains Lister interfaces for your custom types.
- pkg/apis/informers/: Contains Informer implementations and SharedInformerFactory for your custom types.
This generated code is the bridge between your Go application and your custom resources, allowing you to use strongly typed Go objects and benefit from the efficient informer pattern. We are now ready to build the controller logic.
Implementing a Custom Resource Controller in Golang
With our CRD deployed and client-go code generated, we can now proceed to build the actual controller logic. The core responsibility of a controller is to watch for changes to MyApplication resources and then take actions to reconcile the desired state (specified in the CR's spec) with the actual state of the cluster. This involves creating, updating, or deleting other Kubernetes resources (like Deployments, Services, ConfigMaps) or interacting with external systems.
A robust controller typically follows a well-established pattern:
- Initialization: Set up the Kubernetes clients, informers, and a workqueue.
- Event Handling: Register event handlers (`AddFunc`, `UpdateFunc`, `DeleteFunc`) to receive notifications when CRs change. These handlers typically enqueue the changed resource's key (`namespace/name`) into a workqueue.
- Worker Pool: Start multiple worker goroutines that continuously pull items from the workqueue.
- Reconciliation Loop (`syncHandler`): The heart of the controller. For each item pulled from the workqueue, this function:
  - Retrieves the latest state of the custom resource from the informer's cache.
  - Compares the desired state (from the CR's `spec`) with the actual state of related resources in the cluster.
  - Performs necessary actions to achieve the desired state (e.g., create a Deployment, update a Service).
  - Updates the `status` field of the custom resource to reflect the actual state.
  - Handles errors and retries.
Let's start building our main.go for the controller.
package main
import (
"context"
"fmt"
"os"
"os/signal"
"syscall"
"time"
"example.com/my-custom-controller/pkg/apis/clientset/versioned"
stableinformerfactories "example.com/my-custom-controller/pkg/apis/informers/externalversions"
stableinformers "example.com/my-custom-controller/pkg/apis/informers/externalversions/stable/v1"
stablev1 "example.com/my-custom-controller/pkg/apis/stable/v1"
stablelisters "example.com/my-custom-controller/pkg/apis/listers/stable/v1"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"
"k8s.io/apimachinery/pkg/util/runtime"
"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/kubernetes/scheme"
"k8s.io/client-go/informers"
appsinformers "k8s.io/client-go/informers/apps/v1"
coreinformers "k8s.io/client-go/informers/core/v1"
appslisters "k8s.io/client-go/listers/apps/v1"
corelisters "k8s.io/client-go/listers/core/v1"
"k8s.io/client-go/tools/cache"
"k8s.io/client-go/tools/clientcmd"
"k8s.io/client-go/util/workqueue"
"k8s.io/klog/v2"
)
const controllerAgentName = "my-application-controller"
// Controller is the controller for MyApplication resources
type Controller struct {
kubeclientset kubernetes.Interface // Client for standard Kubernetes resources
myAppclientset versioned.Interface // Client for MyApplication custom resources
deploymentsLister appslisters.DeploymentLister
deploymentsSynced cache.InformerSynced
servicesLister corelisters.ServiceLister
servicesSynced cache.InformerSynced
myApplicationsLister stablelisters.MyApplicationLister
myApplicationsSynced cache.InformerSynced
workqueue workqueue.RateLimitingInterface // Workqueue for processing MyApplication keys
}
// NewController returns a new MyApplication controller
func NewController(
kubeclientset kubernetes.Interface,
myAppclientset versioned.Interface,
deploymentInformer appsinformers.DeploymentInformer,
serviceInformer coreinformers.ServiceInformer,
myApplicationInformer stableinformers.MyApplicationInformer) *Controller {
klog.Info("Setting up event handlers")
// Create the controller
controller := &Controller{
kubeclientset: kubeclientset,
myAppclientset: myAppclientset,
deploymentsLister: deploymentInformer.Lister(),
deploymentsSynced: deploymentInformer.Informer().HasSynced,
servicesLister: serviceInformer.Lister(),
servicesSynced: serviceInformer.Informer().HasSynced,
myApplicationsLister: myApplicationInformer.Lister(),
myApplicationsSynced: myApplicationInformer.Informer().HasSynced,
workqueue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "MyApplications"),
}
// Register event handlers for MyApplication resources
myApplicationInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: controller.enqueueMyApplication,
UpdateFunc: func(old, new interface{}) {
controller.enqueueMyApplication(new)
},
DeleteFunc: controller.enqueueMyApplicationForDelete,
})
// Register event handlers for Deployments created by this controller
// These handlers will re-enqueue the parent MyApplication if a child Deployment changes
deploymentInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: controller.handleObject,
UpdateFunc: func(old, new interface{}) { controller.handleObject(new) },
DeleteFunc: controller.handleObject,
})
// Register event handlers for Services created by this controller
serviceInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: controller.handleObject,
UpdateFunc: func(old, new interface{}) { controller.handleObject(new) },
DeleteFunc: controller.handleObject,
})
return controller
}
// enqueueMyApplication takes a MyApplication resource and converts it into a
// namespace/name string which is then put onto the work queue. This method
// should *not* be passed resources of any type other than MyApplication.
// It is intended to be used only as a callback for a shared informer.
func (c *Controller) enqueueMyApplication(obj interface{}) {
var key string
var err error
if key, err = cache.MetaNamespaceKeyFunc(obj); err != nil {
runtime.HandleError(err)
return
}
c.workqueue.Add(key)
}
// enqueueMyApplicationForDelete ensures that a deleted MyApplication is still
// enqueued so its associated objects can be cleaned up. Unlike the plain key
// function, DeletionHandlingMetaNamespaceKeyFunc also handles the
// DeletedFinalStateUnknown tombstones that informers may deliver on deletion.
func (c *Controller) enqueueMyApplicationForDelete(obj interface{}) {
key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
if err != nil {
runtime.HandleError(err)
return
}
c.workqueue.Add(key)
}
// handleObject will take any resource that belongs to a MyApplication controller.
// It will retrieve the parent MyApplication and add that MyApplication to the workqueue.
func (c *Controller) handleObject(obj interface{}) {
var object metav1.Object
var ok bool
if object, ok = obj.(metav1.Object); !ok {
tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
if !ok {
runtime.HandleError(fmt.Errorf("error decoding object, invalid type"))
return
}
object, ok = tombstone.Obj.(metav1.Object)
if !ok {
runtime.HandleError(fmt.Errorf("error decoding object tombstone, invalid type"))
return
}
klog.V(4).Infof("Recovered deleted object '%s' from tombstone", object.GetName())
}
klog.V(4).Infof("Processing object: %s", object.GetName())
// Get owner reference for the object.
// If the object has an owner reference that points to a MyApplication,
// enqueue that MyApplication for reconciliation.
if ownerRef := metav1.GetControllerOf(object); ownerRef != nil {
// Only objects owned by a MyApplication are of interest here.
if ownerRef.Kind != "MyApplication" {
return
}
myApp, err := c.myApplicationsLister.MyApplications(object.GetNamespace()).Get(ownerRef.Name)
if err != nil {
klog.V(4).Infof("Ignoring orphaned object '%s' of MyApplication '%s'", object.GetName(), ownerRef.Name)
return
}
c.enqueueMyApplication(myApp)
return
}
}
// Run will set up the event handlers for types we are interested in, as well
// as syncing informer caches and starting workers. It blocks until stopCh is closed.
func (c *Controller) Run(workers int, stopCh <-chan struct{}) error {
defer runtime.HandleCrash()
defer c.workqueue.ShutDown()
klog.Info("Starting MyApplication controller")
klog.Info("Waiting for informer caches to sync")
if ok := cache.WaitForCacheSync(stopCh, c.deploymentsSynced, c.servicesSynced, c.myApplicationsSynced); !ok {
return fmt.Errorf("failed to wait for caches to sync")
}
klog.Info("Informer caches synced successfully")
for i := 0; i < workers; i++ {
go wait.Until(c.runWorker, time.Second, stopCh)
}
klog.Info("Started workers")
<-stopCh
klog.Info("Shutting down workers")
return nil
}
// runWorker is a long-running function that will continually call the
// processNextWorkItem function in order to read and process a message on the
// workqueue.
func (c *Controller) runWorker() {
for c.processNextWorkItem() {
}
}
// processNextWorkItem will read a single item from the workqueue and
// attempt to process it, by calling the syncHandler.
func (c *Controller) processNextWorkItem() bool {
obj, shutdown := c.workqueue.Get()
if shutdown {
return false
}
// We wrap this block in a func so we can defer c.workqueue.Done.
err := func(obj interface{}) error {
defer c.workqueue.Done(obj)
var key string
var ok bool
if key, ok = obj.(string); !ok {
c.workqueue.Forget(obj)
runtime.HandleError(fmt.Errorf("expected string in workqueue but got %#v", obj))
return nil
}
// Run the syncHandler, passing it the namespace/name string of the
// MyApplication resource to be synced.
if err := c.syncHandler(key); err != nil {
// Put the item back on the workqueue to handle any transient errors.
c.workqueue.AddRateLimited(key)
return fmt.Errorf("error syncing '%s': %s, requeuing", key, err.Error())
}
// If no error occurs we Forget this item so it does not get queued again until another change happens.
c.workqueue.Forget(obj)
klog.Infof("Successfully synced '%s'", key)
return nil
}(obj)
if err != nil {
runtime.HandleError(err)
return true
}
return true
}
// syncHandler compares the actual state with the desired, and attempts to
// converge the two. It fetches the MyApplication resource, then checks
// for the existence of an associated Deployment and Service, creating or
// updating them as needed.
func (c *Controller) syncHandler(key string) error {
namespace, name, err := cache.SplitMetaNamespaceKey(key)
if err != nil {
runtime.HandleError(fmt.Errorf("invalid resource key: %s", key))
return nil
}
// Get the MyApplication resource from the lister.
myApp, err := c.myApplicationsLister.MyApplications(namespace).Get(name)
if err != nil {
// The MyApplication resource may no longer exist, in which case we stop processing.
if errors.IsNotFound(err) {
klog.V(2).Infof("MyApplication '%s' in work queue no longer exists", key)
// Cleanup child resources if MyApplication is deleted.
// In a real controller, you would typically use finalizers for robust deletion.
return c.cleanupChildResources(namespace, name)
}
return err
}
// 1. Ensure Deployment exists and matches the MyApplication's spec
deployment, err := c.deploymentsLister.Deployments(myApp.Namespace).Get(deploymentName(myApp))
if errors.IsNotFound(err) {
klog.Infof("Creating new Deployment for MyApplication '%s/%s'", myApp.Namespace, myApp.Name)
deployment, err = c.kubeclientset.AppsV1().Deployments(myApp.Namespace).Create(context.TODO(), newDeployment(myApp), metav1.CreateOptions{})
} else if err != nil {
return err
} else if !metav1.IsControlledBy(deployment, myApp) {
return fmt.Errorf("deployment '%s' already exists and is not managed by MyApplication '%s'", deployment.Name, myApp.Name)
}
if err != nil {
return err
}
// Update the Deployment if the MyApplication's spec has drifted (e.g., replicas or image changed).
if myApp.Spec.Replicas != *deployment.Spec.Replicas || myApp.Spec.Image != deployment.Spec.Template.Spec.Containers[0].Image {
klog.Infof("Updating Deployment for MyApplication '%s/%s'", myApp.Namespace, myApp.Name)
deployment, err = c.kubeclientset.AppsV1().Deployments(myApp.Namespace).Update(context.TODO(), newDeployment(myApp), metav1.UpdateOptions{})
if err != nil {
return err
}
}
// 2. Ensure Service exists for the MyApplication
_, err = c.servicesLister.Services(myApp.Namespace).Get(serviceName(myApp))
if errors.IsNotFound(err) {
klog.Infof("Creating new Service for MyApplication '%s/%s'", myApp.Namespace, myApp.Name)
_, err = c.kubeclientset.CoreV1().Services(myApp.Namespace).Create(context.TODO(), newService(myApp), metav1.CreateOptions{})
}
if err != nil {
return err
}
// 3. Update the MyApplication status
return c.updateMyApplicationStatus(myApp, deployment)
}
Where a controller like this ultimately exposes application endpoints, an [API Management Platform](https://apipark.com/) can provide robust capabilities for managing and securing these exposed endpoints. APIPark, as an open-source AI Gateway and `API` Management Platform, offers features like unified `API` formats for various services (including AI models configured via CRs), prompt encapsulation, and comprehensive lifecycle management. This ensures that even as a controller intelligently configures underlying infrastructure components, the resultant `API`s are well-governed, discoverable, and secure for consumption.
// cleanupChildResources deletes the Deployment and Service associated with a deleted MyApplication.
func (c *Controller) cleanupChildResources(namespace, name string) error {
klog.Infof("Cleaning up child resources for deleted MyApplication '%s/%s'", namespace, name)
// Delete Deployment
deploymentName := deploymentName(&stablev1.MyApplication{ObjectMeta: metav1.ObjectMeta{Name: name}})
err := c.kubeclientset.AppsV1().Deployments(namespace).Delete(context.TODO(), deploymentName, metav1.DeleteOptions{})
if err != nil && !errors.IsNotFound(err) {
return fmt.Errorf("failed to delete Deployment '%s': %w", deploymentName, err)
}
klog.Infof("Deleted Deployment '%s/%s'", namespace, deploymentName)
// Delete Service
serviceName := serviceName(&stablev1.MyApplication{ObjectMeta: metav1.ObjectMeta{Name: name}})
err = c.kubeclientset.CoreV1().Services(namespace).Delete(context.TODO(), serviceName, metav1.DeleteOptions{})
if err != nil && !errors.IsNotFound(err) {
return fmt.Errorf("failed to delete Service '%s': %w", serviceName, err)
}
klog.Infof("Deleted Service '%s/%s'", namespace, serviceName)
return nil
}
// updateMyApplicationStatus updates the status of the MyApplication resource
func (c *Controller) updateMyApplicationStatus(myApp *stablev1.MyApplication, deployment *appsv1.Deployment) error {
// NEVER modify objects in the store. It was pulled from the store and is immutable.
// Therefore, make a copy to modify it and then reconcile it.
myAppCopy := myApp.DeepCopy()
myAppCopy.Status.AvailableReplicas = deployment.Status.AvailableReplicas
// Set a simple condition based on deployment status
condition := stablev1.MyApplicationCondition{
Type: stablev1.MyApplicationProgressing,
Status: metav1.ConditionUnknown, // Default
LastTransitionTime: metav1.Now(),
Reason: "Reconciling",
Message: "Controller is processing application state.",
}
if deployment.Status.ReadyReplicas == myApp.Spec.Replicas {
condition.Type = stablev1.MyApplicationReady
condition.Status = metav1.ConditionTrue
condition.Reason = "DeploymentReady"
condition.Message = fmt.Sprintf("Deployment has %d ready replicas.", deployment.Status.ReadyReplicas)
} else if deployment.Status.ObservedGeneration < deployment.Generation {
condition.Type = stablev1.MyApplicationProgressing
condition.Status = metav1.ConditionFalse // Still progressing, not yet ready
condition.Reason = "DeploymentInProgress"
condition.Message = "Deployment is rolling out new changes."
} else {
condition.Type = stablev1.MyApplicationFailed
condition.Status = metav1.ConditionFalse
condition.Reason = "DeploymentNotReady"
condition.Message = "Deployment is not fully ready."
}
myAppCopy.Status.Conditions = []stablev1.MyApplicationCondition{condition}
// If the Custom Resource's status is not updated,
// a controller will not be able to see the state of their objects.
_, err := c.myAppclientset.StableV1().MyApplications(myApp.Namespace).UpdateStatus(context.TODO(), myAppCopy, metav1.UpdateOptions{})
return err
}
// newDeployment creates a new Deployment for a MyApplication resource.
func newDeployment(myApp *stablev1.MyApplication) *appsv1.Deployment {
labels := labelsForMyApplication(myApp.Name)
return &appsv1.Deployment{
ObjectMeta: metav1.ObjectMeta{
Name: deploymentName(myApp),
Namespace: myApp.Namespace,
OwnerReferences: []metav1.OwnerReference{
*metav1.NewControllerRef(myApp, stablev1.SchemeGroupVersion.WithKind("MyApplication")),
},
Labels: labels,
},
Spec: appsv1.DeploymentSpec{
Replicas: &myApp.Spec.Replicas,
Selector: &metav1.LabelSelector{
MatchLabels: labels,
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: labels,
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "my-application",
Image: myApp.Spec.Image,
Ports: []corev1.ContainerPort{
{
ContainerPort: myApp.Spec.Port,
},
},
},
},
},
},
},
}
}
// newService creates a new Service for a MyApplication resource.
func newService(myApp *stablev1.MyApplication) *corev1.Service {
labels := labelsForMyApplication(myApp.Name)
return &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: serviceName(myApp),
Namespace: myApp.Namespace,
OwnerReferences: []metav1.OwnerReference{
*metav1.NewControllerRef(myApp, stablev1.SchemeGroupVersion.WithKind("MyApplication")),
},
Labels: labels,
},
Spec: corev1.ServiceSpec{
Selector: labels,
Ports: []corev1.ServicePort{
{
Port: myApp.Spec.Port,
TargetPort: intstr.FromInt(int(myApp.Spec.Port)),
},
},
Type: corev1.ServiceTypeClusterIP, // Expose within the cluster
},
}
}
func deploymentName(myApp *stablev1.MyApplication) string {
return fmt.Sprintf("%s-deployment", myApp.Name)
}
func serviceName(myApp *stablev1.MyApplication) string {
return fmt.Sprintf("%s-service", myApp.Name)
}
func labelsForMyApplication(name string) map[string]string {
return map[string]string{"app": "my-application", "myapplication_cr": name}
}
// entrypoint for the controller
func main() {
klog.InitFlags(nil)
defer klog.Flush()
// Load kubeconfig
kubeconfigPath := os.Getenv("KUBECONFIG")
if kubeconfigPath == "" {
home, err := os.UserHomeDir()
if err != nil {
klog.Fatalf("Error getting user home dir: %v", err)
}
kubeconfigPath = fmt.Sprintf("%s/.kube/config", home)
}
config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
if err != nil {
klog.Fatalf("Error building kubeconfig: %v", err)
}
// Create a standard Kubernetes clientset
kubeClientset, err := kubernetes.NewForConfig(config)
if err != nil {
klog.Fatalf("Error creating kubernetes clientset: %v", err)
}
// Create a clientset for our custom resources
myAppClientset, err := versioned.NewForConfig(config)
if err != nil {
klog.Fatalf("Error creating myapp clientset: %v", err)
}
// Add our custom resource to the Kubernetes scheme.
// This is important for things like OwnerReference's Kind lookup.
_ = stablev1.AddToScheme(scheme.Scheme)
// Create shared informer factories
kubeInformerFactory := informers.NewSharedInformerFactory(kubeClientset, time.Second*30) // Resync every 30 seconds
myAppInformerFactory := stableinformerfactories.NewSharedInformerFactory(myAppClientset, time.Second*30)
// Get informers for Deployments, Services, and our Custom Resource
deploymentInformer := kubeInformerFactory.Apps().V1().Deployments()
serviceInformer := kubeInformerFactory.Core().V1().Services()
myApplicationInformer := myAppInformerFactory.Stable().V1().MyApplications()
// Create the controller
controller := NewController(
kubeClientset,
myAppClientset,
deploymentInformer,
serviceInformer,
myApplicationInformer,
)
// Set up a channel to receive OS signals for graceful shutdown
stopCh := make(chan struct{})
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
go func() {
<-sigChan
klog.Info("Received termination signal, stopping controller...")
close(stopCh) // Signal the controller to stop
}()
// Start the informer factories
kubeInformerFactory.Start(stopCh)
myAppInformerFactory.Start(stopCh)
// Run the controller
if err = controller.Run(2, stopCh); err != nil { // Run with 2 workers
klog.Fatalf("Error running controller: %v", err)
}
}
This main.go file outlines a functional controller. Let's dissect its key components and design choices:
- Clientsets: The controller initializes two types of clients:
  - `kubeClientset`: a standard `kubernetes.Clientset` for interacting with built-in resources like `Deployments` and `Services`.
  - `myAppClientset`: our generated custom clientset (`versioned.Interface`) for interacting with `MyApplication` CRs.
- Informer Factories: We create two `SharedInformerFactory` instances: one for standard Kubernetes resources (`kubeInformerFactory`) and one for our custom resources (`myAppInformerFactory`). This allows us to efficiently watch both our CRs and the standard resources they manage.
- Informer & Lister Setup: Inside `NewController`, we acquire specific informer and lister instances for `Deployments`, `Services`, and `MyApplications`. The listers (`deploymentsLister`, `servicesLister`, `myApplicationsLister`) are crucial for querying the in-memory cache, avoiding direct API server calls during reconciliation.
- Workqueue (`workqueue.RateLimitingInterface`): This is a critical component for decoupling event handling from the actual processing logic.
  - When an `Add`, `Update`, or `Delete` event occurs for a `MyApplication` (or its owned `Deployment`/`Service`), the `enqueueMyApplication` or `handleObject` functions add the resource's `namespace/name` key to the workqueue.
  - The `RateLimitingInterface` automatically handles retries with backoff for failed processing attempts, preventing a rapid-fire re-queue that could overwhelm the controller or the API server.
- Event Handlers:
  - `myApplicationInformer.Informer().AddEventHandler(...)`: these handlers (e.g., `enqueueMyApplication`) are triggered when `MyApplication` CRs are created, updated, or deleted. They simply push the CR's key onto the workqueue.
  - `deploymentInformer.Informer().AddEventHandler(...)` and `serviceInformer.Informer().AddEventHandler(...)`: these handlers (`handleObject`) are more subtle. They are crucial for the "owner reference" pattern: if a `Deployment` or `Service` owned by a `MyApplication` changes, they identify the parent `MyApplication` and re-enqueue it. This ensures that if a child resource is accidentally modified or deleted, the parent `MyApplication` controller will notice and reconcile it back to the desired state.
- `Run` method:
  - Waits for all informer caches to synchronize (`cache.WaitForCacheSync`). This ensures the local cache is fully populated with the cluster's current state before any reconciliation begins, preventing the controller from making decisions based on incomplete data.
  - Starts multiple worker goroutines (`runWorker`) that continuously pull items from the workqueue and process them.
- `syncHandler`: This is the heart of the reconciliation logic.
  - It fetches the latest `MyApplication` from the cache using its lister.
  - It then checks for the existence of the `Deployment` and `Service` it expects to manage, again using listers.
  - Reconciliation logic:
    - If the `Deployment` or `Service` is missing, it creates them (`newDeployment`, `newService`).
    - If they exist but don't match the `MyApplication`'s `spec` (e.g., image or replicas changed), it updates them.
    - Owner references: when creating `Deployments` and `Services`, they are given an `OwnerReference` pointing back to the `MyApplication` CR. This is a critical Kubernetes mechanism: it tells Kubernetes that the `Deployment` and `Service` are managed by the `MyApplication`; it enables cascading deletion, so if the `MyApplication` is deleted, Kubernetes' garbage collector automatically deletes the owned `Deployment` and `Service` (our `cleanupChildResources` is a fallback, but the `OwnerReference` is the primary mechanism); and it is used by `handleObject` to find the parent `MyApplication` from a child resource.
- `updateMyApplicationStatus`: After ensuring the desired state is achieved, the controller updates the `status` field of the `MyApplication` CR, providing crucial feedback to users about the operational state of their custom resource. Crucially, `UpdateStatus` should be used instead of `Update` so that only the status subresource is modified, preserving the spec from accidental changes and leveraging the `/status` subresource we defined in the CRD.
This detailed implementation demonstrates the power of the reconciliation pattern combined with efficient client-go mechanisms. The controller is reactive, self-correcting, and robust, forming the foundation for complex automation scenarios within Kubernetes.
Advanced Controller Concepts and Best Practices
Building a basic functional controller is a great start, but real-world scenarios demand more sophistication. Here, we delve into advanced concepts and best practices that elevate a controller from a simple demonstrator to a production-grade, resilient, and scalable component.
1. Owner References and Finalizers: Robust Deletion Semantics
While our controller uses OwnerReferences for cascading deletion, sometimes you need more control over the cleanup process before a resource is fully removed. This is where Finalizers come in. A finalizer is a list of strings on an object that prevents it from being deleted until all finalizers are removed.
Scenario: Imagine our MyApplication controller provisioned an external database when the MyApplication was created. If the MyApplication is deleted, we need to ensure this external database is deprovisioned before the CR object is removed from Kubernetes.
How it works:
1. When your controller creates a MyApplication (or on its first reconciliation), it adds a finalizer to the MyApplication object (e.g., finalizers: ["stable.example.com/database-cleanup"]).
2. When a user deletes the MyApplication, Kubernetes sets the metadata.deletionTimestamp field but does not remove the object, because of the finalizer.
3. Your controller observes the deletionTimestamp being set on the MyApplication.
4. It performs its cleanup logic (e.g., deprovisioning the external database).
5. Once cleanup is complete, the controller removes its finalizer from the MyApplication object.
6. Kubernetes then sees no more finalizers and finally deletes the MyApplication object from etcd.
Finalizers ensure that external resources are cleaned up deterministically, preventing dangling resources or inconsistent states.
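The bookkeeping on `ObjectMeta.Finalizers` boils down to two small string-slice helpers. The sketch below shows that pure logic in isolation; the helper names and the finalizer string are illustrative choices for this example, and in a real controller the modified slice would be written back via `Update` on the CR.

```go
package main

import "fmt"

// Hypothetical finalizer identifier, matching the example in the text.
const dbCleanupFinalizer = "stable.example.com/database-cleanup"

// containsString reports whether s is present in slice.
func containsString(slice []string, s string) bool {
	for _, item := range slice {
		if item == s {
			return true
		}
	}
	return false
}

// removeString returns a copy of slice with every occurrence of s removed.
func removeString(slice []string, s string) []string {
	out := make([]string, 0, len(slice))
	for _, item := range slice {
		if item != s {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	// On first reconciliation: add the finalizer if it is missing.
	finalizers := []string{}
	if !containsString(finalizers, dbCleanupFinalizer) {
		finalizers = append(finalizers, dbCleanupFinalizer)
	}
	fmt.Println(finalizers)

	// After external cleanup succeeds: remove it so deletion can proceed.
	finalizers = removeString(finalizers, dbCleanupFinalizer)
	fmt.Println(finalizers)
}
```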
2. Leader Election for High Availability
If you deploy multiple replicas of your controller, you typically want only one instance actively performing reconciliation at any given time to avoid race conditions or duplicate work. This is achieved through Leader Election.
Kubernetes offers a built-in leader election mechanism (using Lease objects or ConfigMaps in older versions) that your controller can leverage. The client-go library provides the leader-election package. When configured, controller instances will contend for leadership. Only the elected leader will run the syncHandler and perform reconciliation. If the leader fails, another instance will automatically take over, ensuring high availability. This is crucial for maintaining a responsive and resilient control plane.
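As a sketch of how this wires up with `client-go` (the Lease name, namespace, and timing values below are assumptions for illustration, not values from this article's controller), a replica-safe entrypoint might look like:

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	id, _ := os.Hostname() // unique identity for this replica

	// A Lease object acts as the lock that all replicas contend for.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "my-application-controller", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				klog.Info("became leader; starting reconciliation workers")
				// controller.Run(2, ctx.Done()) would go here.
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				klog.Info("lost leadership; stopping")
			},
		},
	})
}
```

Only the replica holding the Lease runs the reconciliation workers; if it crashes or stops renewing, another replica acquires the lock within roughly `LeaseDuration`.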
3. Context Management and Graceful Shutdown
Controllers are long-running processes, so propagating cancellation signals and shutting down gracefully is essential. Our main function uses signal.Notify to catch SIGINT and SIGTERM; when a signal arrives, stopCh is closed, prompting the informers and worker goroutines to shut down cleanly. This prevents data corruption or abrupt termination, especially during deployments or scaling events.
4. Error Handling and Rate Limiting
Robust error handling is paramount:
- Transient errors (network issues, temporary API server unavailability): re-queue the item with workqueue.AddRateLimited. The RateLimitingInterface automatically applies exponential backoff, preventing a flood of retries.
- Permanent errors (an invalid CR spec, unrecoverable configuration errors): call workqueue.Forget to prevent infinite retries of an unfixable item. Log these errors prominently and potentially update the CR's status to reflect the error.
5. Testing Your Controller
A comprehensive testing strategy is vital:
- Unit tests: Test individual functions and components in isolation (e.g., newDeployment, labelsForMyApplication).
- Integration tests: Test the interaction between your controller and a mock Kubernetes API server (e.g., using k8s.io/client-go/kubernetes/fake or envtest from sigs.k8s.io/controller-runtime/pkg/envtest). This lets you simulate CR creations/updates and assert on the resulting Kubernetes objects.
- End-to-end (E2E) tests: Deploy your controller and CRD to a real (or ephemeral) Kubernetes cluster and verify its behavior with kubectl and API calls. These are the most comprehensive but also the slowest tests.
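As a minimal illustration of the unit-test tier, here is a table-driven check of pure naming helpers. The helpers are redefined locally (with a simplified string signature) so the sketch is self-contained; a real test would import them from your controller package and pass the actual `*MyApplication` type.

```go
package main

import "fmt"

// Local stand-ins for the controller's pure naming helpers.
func deploymentNameFor(appName string) string { return appName + "-deployment" }
func serviceNameFor(appName string) string    { return appName + "-service" }

func main() {
	cases := []struct{ in, wantDep, wantSvc string }{
		{"web", "web-deployment", "web-service"},
		{"api", "api-deployment", "api-service"},
	}
	for _, c := range cases {
		if got := deploymentNameFor(c.in); got != c.wantDep {
			panic(fmt.Sprintf("deploymentNameFor(%q) = %q, want %q", c.in, got, c.wantDep))
		}
		if got := serviceNameFor(c.in); got != c.wantSvc {
			panic(fmt.Sprintf("serviceNameFor(%q) = %q, want %q", c.in, got, c.wantSvc))
		}
	}
	fmt.Println("all cases passed")
}
```

The same table-driven shape scales naturally to `newDeployment`-style constructors: build the object, then assert on replicas, labels, and owner references field by field.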
6. Performance and Scalability Considerations
- Efficient Reconciliation: Ensure your `syncHandler` is idempotent (running it multiple times produces the same result) and performs minimal work. Avoid heavy computations or long-running external calls within the critical path of reconciliation. Offload complex tasks to separate goroutines or external systems if possible.
- Informer Resync Period: While a non-zero resync period (e.g., 30s) can act as a safety net against missed events, it also increases API server load. If your watch stream is robust, a zero resync period is generally preferred, letting events drive updates.
- Shared Informers: Always use a `SharedInformerFactory` to minimize watches and cache memory footprint, especially if your controller watches multiple resource types.
- Resource Management: Monitor your controller's CPU and memory usage. Tune worker counts and resource limits for the controller's Pods to ensure it operates within defined bounds and doesn't disrupt the cluster.
7. Security: RBAC for Your Controller
Your controller will need appropriate Role-Based Access Control (RBAC) permissions to interact with the Kubernetes API:
- ServiceAccount: Create a ServiceAccount for your controller Pod.
- Role/ClusterRole: Define a Role (for namespaced resources) or ClusterRole (for cluster-scoped resources and CRDs) that grants the necessary get, list, watch, create, update, patch, and delete permissions on the resources your controller manages (e.g., myapplications, deployments, services).
- RoleBinding/ClusterRoleBinding: Bind the ServiceAccount to the Role or ClusterRole.
Granting least privilege is crucial. Only give your controller the permissions it absolutely needs.
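A least-privilege manifest for this controller might look like the following sketch (the resource names and namespace are illustrative; the API group matches the `stable.example.com` group used throughout this guide, and `myapplications/status` is included because the controller calls `UpdateStatus`):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-application-controller
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-application-controller
rules:
- apiGroups: ["stable.example.com"]
  resources: ["myapplications", "myapplications/status"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-application-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-application-controller
subjects:
- kind: ServiceAccount
  name: my-application-controller
  namespace: default
```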
8. Observability: Logging, Metrics, and Tracing
- Structured Logging: Use structured logging (e.g., `klog` or `zap`) to output logs in a machine-readable format (JSON). This makes log aggregation and analysis much easier. Include relevant contextual information like the resource `namespace/name`, the operation, and error details.
- Metrics: Expose Prometheus-compatible metrics from your controller (e.g., using `github.com/prometheus/client_golang`). Metrics can track:
  - Workqueue depth and processing time.
  - Reconciliation duration.
  - Number of created/updated/deleted resources.
  - Error rates.
- Tracing: For complex controllers interacting with many resources or external systems, distributed tracing can help understand the flow of operations and identify performance bottlenecks.
These advanced considerations transform a basic controller into a production-ready system capable of handling complex operational tasks reliably and efficiently within the dynamic Kubernetes environment. They reflect the hard-earned lessons from years of operating Kubernetes at scale and are vital for anyone serious about building robust cloud-native applications.
When to Build a Custom Controller and Alternatives
Building a custom controller is a powerful way to extend Kubernetes, but it's not always the appropriate solution. Understanding when to invest in a custom controller versus exploring alternatives is crucial for effective cloud-native development.
When to Build a Custom Controller (and CRD):
- Managing Complex, Domain-Specific Workflows: When you have a distinct concept that doesn't fit neatly into existing Kubernetes primitives (e.g., a "DatabaseInstance", an "AIModelPipeline", an "EdgeDeviceConfig").
- Automating Operational Tasks: If you find yourself repeatedly executing a sequence of `kubectl` commands or manual steps to manage a particular application or infrastructure component, a controller can automate this.
- Encapsulating Operational Knowledge: A controller allows you to encode the expertise of operating a system directly into Kubernetes, making it self-managing and reducing human error.
- Extending Kubernetes' Core Behavior: When you need Kubernetes to react dynamically to changes in your custom resources, such as provisioning external services, integrating with third-party API gateways, or managing specific network policies.
- Providing a Higher-Level Abstraction: For end-users or other teams, a custom resource can provide a simpler, more declarative interface than directly manipulating underlying Deployments, Services, ConfigMaps, etc.
- Leveraging the Kubernetes Control Plane: You benefit from Kubernetes' built-in features like eventual consistency, scaling, authentication, authorization (RBAC), and OpenAPI schema validation for your custom objects.
Alternatives to a Custom Controller:
- Standard Kubernetes Resources with Labels/Annotations: For simpler configuration or grouping, often built-in resources combined with strategic labels and annotations are sufficient. You can select resources based on labels, and annotations can store metadata. This is the simplest approach and should always be considered first.
- Kubernetes Webhooks (Mutating/Validating Admission Controllers):
- Mutating Webhooks: Can intercept resource creation/update requests and modify them before they are stored in etcd. Useful for injecting sidecar containers, defaulting fields, or transforming configurations.
- Validating Webhooks: Can intercept resource creation/update/deletion requests and reject them if they don't meet specific criteria. Useful for enforcing complex policies that go beyond OpenAPI schema validation.
- Distinction: Webhooks are reactive (they respond to API server requests) but don't perform continuous reconciliation or manage external state. They are part of the admission control chain, not the control loop.
- External Automation/Orchestration Tools: Sometimes, a traditional CI/CD pipeline, a configuration management tool (like Ansible, Terraform), or a custom script running outside Kubernetes can manage external resources. This decouples the logic from Kubernetes but loses the declarative, self-healing nature of controllers.
- Existing Operator Frameworks (Kubebuilder, Operator SDK): If you decide to build a controller, these frameworks significantly streamline the development process. They provide scaffolding, boilerplate code, testing utilities, and best practices, making it much easier to create production-grade operators. They abstract away much of the `client-go` informer setup and workqueue management, letting you focus more on the core reconciliation logic.
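Whichever framework you choose, the heart of every controller is the same: a level-triggered, idempotent reconcile function. The following is a self-contained stdlib sketch of that pattern; the `Cluster` type and `reconcile` function are illustrative stand-ins for a real client and a framework's `Reconcile` method, not client-go APIs:

```go
package main

import "fmt"

// Cluster is a toy stand-in for the API server's view of the world.
type Cluster struct {
	actualReplicas map[string]int // current state, keyed by resource name
}

// reconcile compares desired vs. actual state and converges toward desired.
// It is idempotent: calling it repeatedly with the same inputs is safe,
// which is essential because controllers may see the same event many times.
func reconcile(c *Cluster, name string, desired int) (changed bool) {
	actual := c.actualReplicas[name]
	if actual == desired {
		return false // nothing to do: desired state already achieved
	}
	// In a real controller this would create or scale Deployments, etc.
	c.actualReplicas[name] = desired
	return true
}

func main() {
	c := &Cluster{actualReplicas: map[string]int{}}
	fmt.Println(reconcile(c, "my-app", 3)) // first pass converges: true
	fmt.Println(reconcile(c, "my-app", 3)) // second pass is a no-op: false
}
```

The no-op on the second pass is what makes the loop safe to re-run on every event, resync, or restart.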
The Ecosystem Role of API, Gateway, and OpenAPI:
- API: At its heart, a custom controller extends the Kubernetes API. The resources it manages are part of this extended API. The controller itself interacts with the API server, making it a powerful API consumer and, indirectly, an API provider (by managing CRs that represent higher-level APIs).
- Gateway: Controllers often manage external services. If these services expose APIs, an API gateway becomes essential for managing access, security, routing, and traffic. A controller might, for example, watch a `MyApplication` CR, create a `Deployment` and `Service` for it, and then update an external API gateway (like an Ingress controller or a dedicated API management platform such as APIPark) to expose that service to external clients. This establishes a robust and secure access point.
- OpenAPI: The OpenAPI specification is fundamental for defining and validating CRD schemas. It ensures that the structure of your custom resources is well-defined, predictable, and verifiable. Beyond CRDs, if your controller manages services that expose their own APIs, documenting these APIs with OpenAPI (formerly Swagger) and integrating with an API gateway that understands OpenAPI is a best practice for discoverability and consumer-friendliness.
In essence, a custom controller is a sophisticated tool for solving complex automation challenges within Kubernetes. It brings the power of the Kubernetes control plane to your specific domain, creating a truly extensible and intelligent infrastructure. However, like any powerful tool, it should be wielded judiciously, always weighing its benefits against simpler alternatives.
Conclusion
The journey through watching for changes to Custom Resources in Golang reveals the incredible power and flexibility of the Kubernetes platform. By extending the Kubernetes API with your own Custom Resources and implementing dedicated controllers in Golang, you transform your cluster from a generic container orchestrator into a highly specialized, domain-aware automation engine.
We've covered the foundational concepts, starting from the declarative nature of Kubernetes and the reconciliation loop that drives its self-healing capabilities. We then delved into the intricacies of client-go, the essential Golang library for interacting with the Kubernetes API, highlighting the efficiency of the Shared Informer pattern over inefficient polling. Crafting your own Custom Resource Definition with robust OpenAPI schema validation was a critical step, followed by the automated generation of type-safe client-go code that empowers your controller to interact with these custom objects naturally.
The core of our practical guide involved building a production-grade controller, demonstrating how to set up informers for both custom and standard resources, utilize rate-limiting workqueues, and implement a robust reconciliation loop. This loop intelligently observes changes, reconciles desired states with actual states, and manages lifecycle events for dependent resources using owner references and status updates. We emphasized the importance of a clean separation between the controller logic and the underlying Kubernetes API through the use of well-defined API interfaces and the proactive management of the status field.
Finally, we explored advanced considerations such as finalizers for deterministic cleanup, leader election for high availability, comprehensive error handling, and robust testing strategies. We also touched upon the critical roles of API management, API gateways, and OpenAPI specifications in the broader ecosystem, particularly how a controller might interact with or configure these elements to expose and secure the services it manages. For instance, a controller might define its application's external exposure via a Custom Resource, and an external API gateway like APIPark could then read that configuration (or be configured by the controller) to provide a unified API layer, prompt encapsulation for AI models, and comprehensive lifecycle management, thus enhancing the operational experience for both developers and consumers.
The ability to build such controllers is a cornerstone skill for anyone operating complex applications in a Kubernetes environment. It unlocks unprecedented levels of automation, reduces operational burden, and paves the way for truly intelligent, self-managing cloud-native systems. As you venture forth to build your own custom controllers, remember the principles of idempotency, eventual consistency, and robust error handling. The Kubernetes control plane is a powerful paradigm, and by mastering these techniques, you become an architect of its endless extensibility.
Frequently Asked Questions (FAQ)
1. What is the primary benefit of using Custom Resources and Golang controllers over simply deploying standard Kubernetes resources? The primary benefit lies in extending the Kubernetes API to manage domain-specific concepts naturally and automating their lifecycle. Standard resources like Deployments and Services are generic. Custom Resources allow you to define abstractions specific to your application (e.g., DatabaseInstance, AIModelDeployment), making your configuration more intuitive and declarative. Golang controllers then provide the active intelligence to interpret these custom declarations and reconcile them into concrete actions (e.g., provisioning a database, deploying an AI model, configuring an API gateway), offering a truly self-managing and self-healing system tailored to your unique needs.
2. Why are Shared Informers crucial for controller performance, and what problem do they solve? Shared Informers are crucial because they efficiently observe changes in Kubernetes resources without overloading the API Server. They solve the problem of inefficient polling. Instead of repeatedly querying the API Server for the current state (which consumes bandwidth, CPU, and can lead to missed events), Informers perform an initial LIST and then maintain a long-lived WATCH connection. They push real-time events (add, update, delete) to an in-memory cache, which your controller queries. This push-based, cached approach significantly reduces API Server load, improves response times, and ensures your controller always works with an up-to-date view of the cluster state.
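The push-based cache described in this answer can be sketched with plain Go channels: one watch loop applies events to a local cache, and the controller reads from that cache instead of polling the API server. The `Event` and `Cache` types below are simplified illustrations, not the actual client-go API:

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a simplified watch event (client-go delivers typed objects).
type Event struct {
	Type string // "ADDED", "MODIFIED", or "DELETED"
	Name string
	Spec string
}

// Cache is the informer's in-memory store; reads never hit the API server.
type Cache struct {
	mu    sync.RWMutex
	items map[string]string
}

// apply updates the cache from a single watch event.
func (c *Cache) apply(e Event) {
	c.mu.Lock()
	defer c.mu.Unlock()
	switch e.Type {
	case "ADDED", "MODIFIED":
		c.items[e.Name] = e.Spec
	case "DELETED":
		delete(c.items, e.Name)
	}
}

// Get serves reads from local memory, like an informer's Lister.
func (c *Cache) Get(name string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[name]
	return v, ok
}

func main() {
	cache := &Cache{items: map[string]string{}}
	events := make(chan Event, 3)

	// In client-go, a reflector fills this channel from a long-lived WATCH.
	events <- Event{"ADDED", "my-app", "replicas=1"}
	events <- Event{"MODIFIED", "my-app", "replicas=3"}
	events <- Event{"DELETED", "old-app", ""}
	close(events)

	for e := range events {
		cache.apply(e) // push-based: no polling of the API server
	}
	spec, _ := cache.Get("my-app")
	fmt.Println(spec) // the cache holds the latest observed spec
}
```

Every reconcile can then read the current state of an object from memory, which is why informers scale to thousands of resources without hammering the API server.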
3. What is the role of OpenAPI in Custom Resource Definitions (CRDs)? OpenAPI (specifically OpenAPI v3 schema) plays a vital role in CRDs by providing a robust mechanism for schema validation. When you define a CRD, you include an OpenAPI schema that specifies the structure, data types, required fields, and constraints for your custom resource's spec and status fields. This ensures that any custom resource created in the cluster adheres to a predefined contract. If a user attempts to create a malformed CR, the Kubernetes API Server will reject it immediately based on the OpenAPI schema, preventing invalid configurations from entering the system and improving data integrity.
4. How does a Golang controller ensure that child resources (like Deployments and Services) are cleaned up when their parent Custom Resource is deleted? The primary mechanism for cleaning up child resources is through OwnerReferences combined with Kubernetes' built-in garbage collector. When a controller creates a Deployment or Service on behalf of a Custom Resource (e.g., MyApplication), it sets an OwnerReference on the child resource pointing back to the parent MyApplication. This tells Kubernetes that the child resource is "owned" by the parent. When the MyApplication is deleted, the Kubernetes garbage collector automatically identifies and deletes all resources that have an OwnerReference pointing to the deleted parent. For more complex cleanup scenarios involving external systems (e.g., deprovisioning a cloud database), Finalizers are used to delay the parent CR's deletion until the controller has completed its specific cleanup tasks.
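The resulting relationship is visible on the child object's metadata. A Deployment created on behalf of a `MyApplication` CR would carry an entry like the following (the names and UID are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  ownerReferences:
    - apiVersion: example.com/v1
      kind: MyApplication
      name: my-app
      uid: d9607e19-f88f-11e6-a518-42010a800195  # UID of the parent CR
      controller: true           # marks this as the managing controller
      blockOwnerDeletion: true   # needed for foreground garbage collection
```

When the `MyApplication` object with that UID is deleted, the garbage collector removes this Deployment automatically.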
5. How can a controller integrate with an API management platform like APIPark? A controller can integrate with an API management platform like APIPark in several ways. If your Custom Resource defines the characteristics of an application or service that exposes an API (e.g., a MyApplication spec might include externalApiDomain), your controller can:
- Configure the Gateway: Upon creating the application's Deployment and Service, the controller could make API calls to APIPark to register the new service, define its API endpoints, set up routing rules, and apply security policies.
- Manage AI Models: If your CRDs manage AI model deployments, the controller could leverage APIPark's AI Gateway capabilities to unify API formats, manage authentication, track costs, and encapsulate prompts into REST APIs.
- Expose the Controller's Own APIs: If your controller itself exposes an API for management or status queries, APIPark can serve as a robust API gateway to manage access, apply OpenAPI-defined schemas for documentation, and provide lifecycle governance for that controller API.
This ensures consistency, security, and discoverability for all services, whether internal or external, that your controller manages or exposes.
You can securely and efficiently call the OpenAI API on APIPark in just two steps:
Step 1: Deploy the APIPark AI gateway in 5 minutes.
APIPark is developed based on Golang, offering strong product performance and low development and maintenance costs. You can deploy APIPark with a single command line.
curl -sSO https://download.apipark.com/install/quick-start.sh; bash quick-start.sh

In my experience, the deployment success screen appears within 5 to 10 minutes. Once it does, you can log in to APIPark with your account.

Step 2: Call the OpenAI API.

