Implementing a Controller to Watch for Changes to a CRD
The realm of cloud-native infrastructure, spearheaded by Kubernetes, has fundamentally reshaped how applications are designed, deployed, and managed. At its core, Kubernetes offers a powerful declarative API that allows users to define the desired state of their systems, entrusting the platform to continuously reconcile that state. However, the intrinsic power of Kubernetes truly expands beyond its built-in resource types like Deployments, Services, and Pods. It provides a sophisticated mechanism for users to define their own custom resources, perfectly tailored to their specific domain logic and application needs. This mechanism, known as Custom Resource Definitions (CRDs), empowers developers to extend the Kubernetes API itself, transforming it into a control plane for virtually any operational concern.
While CRDs provide the blueprint for these custom resources, they are inherently passive. To bring them to life, to make Kubernetes actively manage and react to these custom definitions, we need controllers. A Kubernetes controller acts as the operational brain, constantly observing the cluster for changes to specific resource types – including our custom resources – and taking predefined actions to drive the cluster towards the desired state encapsulated within those resources. Implementing a controller to watch for changes to a CRD is not merely an advanced Kubernetes topic; it is a foundational skill for anyone looking to build robust, automated, and truly cloud-native solutions, from simple application operators to sophisticated infrastructure management systems, including those that govern the complexities of an API Gateway, an AI Gateway, or even a specialized LLM Gateway. This deep dive will explore the intricate process of building such a controller, illustrating its components, best practices, and profound implications for modern system architectures.
The journey to building a CRD controller involves understanding several interconnected concepts. First, we must grasp the essence of Custom Resources and why they are an indispensable extension point in Kubernetes. Then, we delve into the core pattern of Kubernetes controllers, dissecting the reconciliation loop and its vital components like informers and workqueues. With this theoretical foundation, we will embark on a practical exploration of implementing a controller, detailing the steps from defining the custom resource to deploying the operational controller. Finally, we will illustrate the transformative power of CRDs and controllers in managing complex systems, particularly within the burgeoning landscape of AI and API management, where dynamic configuration of an API Gateway, AI Gateway, or LLM Gateway based on custom resources can dramatically enhance operational efficiency and scalability.
Understanding Kubernetes Custom Resources (CRDs)
Kubernetes thrives on a declarative model, where you tell it what you want, and it figures out how to get there. This model is powered by its API, which exposes various resource types (like Pods, Deployments, Services) that represent parts of your application and infrastructure. Custom Resource Definitions (CRDs) are the mechanism through which you can extend this API with your own, application-specific resource types. They allow Kubernetes to become a platform for your specific domain, not just generic container orchestration.
What are CRDs and Why Are They Important?
At a fundamental level, a CRD is a Kubernetes object that defines a new type of resource in your cluster. Once a CRD is created, you can then create instances of that custom resource, just like you would create an instance of a Pod or a Deployment. These instances are called Custom Resources (CRs).
Imagine you are building a system that manages external databases. Instead of writing imperative scripts to provision databases, you could define a DatabaseInstance CRD. Each DatabaseInstance CR would then describe a desired database (e.g., kind: DatabaseInstance, spec.type: PostgreSQL, spec.version: 14, spec.storage: 100Gi). Your controller would then watch for these DatabaseInstance CRs and provision/manage the actual external PostgreSQL databases.
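A hypothetical `DatabaseInstance` CR could look like this (illustrative only; the `DatabaseInstance` CRD itself is not defined in this article, and the field names are assumptions drawn from the example above):

```yaml
apiVersion: example.com/v1
kind: DatabaseInstance
metadata:
  name: orders-db
spec:
  type: PostgreSQL
  version: "14"
  storage: 100Gi
```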
The primary components of a CRD definition include:
- `apiVersion`: Specifies the API version of the CRD object itself (e.g., `apiextensions.k8s.io/v1`).
- `kind`: Always `CustomResourceDefinition`.
- `metadata`: Standard Kubernetes metadata like `name`. The `name` must be in the format `<plural>.<group>`.
- `spec`: This is where the magic happens, defining the properties of your new resource type.
  - `group`: A logical grouping for your custom resource (e.g., `example.com`). This helps avoid naming collisions and organizes your APIs.
  - `names`: Defines the singular, plural, short names, and `kind` of your custom resource. The `kind` is crucial as it will be used when creating instances of your custom resource.
  - `scope`: Specifies whether your custom resource is `Namespaced` (like Pods) or `Cluster` (like Nodes).
  - `versions`: An array defining the schema for different versions of your custom resource. Each version typically includes:
    - `name`: The version string (e.g., `v1alpha1`, `v1`).
    - `served`: A boolean indicating if this version is served via the API.
    - `storage`: A boolean indicating if this version is used for storing the resource in etcd. Only one version can be marked as `storage: true`.
    - `schema.openAPIV3Schema`: The most critical part. This is an OpenAPI v3 schema that validates the structure and types of your custom resource's `.spec` and `.status` fields. It enforces data integrity and helps API clients understand the resource's structure.
  - `conversion`: Defines strategies for converting between different API versions of your custom resource, which is essential for smooth upgrades and maintaining backward compatibility.
Why use CRDs?
- Extensibility: CRDs allow you to extend Kubernetes' capabilities beyond its built-in resource types. This is crucial for domain-specific applications and operators.
- Declarative Configuration: By defining your application's operational state as CRs, you adopt Kubernetes' declarative paradigm. You define what you want, not how to achieve it, and the controller handles the "how."
- API-driven Automation: CRDs allow you to manage complex application states through the Kubernetes API, using `kubectl` or any Kubernetes client library, just like native resources. This enables powerful automation and integration with other Kubernetes tools.
- GitOps Compatibility: Since CRs are standard Kubernetes objects, they can be stored in Git repositories, enabling GitOps workflows for managing application infrastructure and configuration.
- Building Custom Operators: CRDs are the cornerstone for building Kubernetes Operators, which automate the lifecycle management of complex applications, behaving like human operators but tirelessly and consistently. This is particularly relevant for managing intricate systems like an API Gateway or specialized AI Gateway, where configuration changes can be frequent and require precise orchestration.
Consider an example of a simple CRD for defining a custom AI service:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: aimodels.ai.example.com
spec:
  group: ai.example.com
  names:
    kind: AIModel
    listKind: AIModelList
    plural: aimodels
    singular: aimodel
    shortNames:
      - aim
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec:
              type: object
              properties:
                modelName:
                  type: string
                  description: The name of the AI model.
                modelProvider:
                  type: string
                  description: The provider of the AI model (e.g., OpenAI, HuggingFace).
                version:
                  type: string
                  description: The specific version of the AI model.
                endpoint:
                  type: string
                  description: The internal endpoint where the model is served.
                resourceRequests:
                  type: object
                  properties:
                    cpu:
                      type: string
                    memory:
                      type: string
                accessPolicies:
                  type: array
                  items:
                    type: string
                  description: List of access policies for this AI model.
              required:
                - modelName
                - modelProvider
                - version
                - endpoint
            status:
              type: object
              properties:
                state:
                  type: string
                  description: Current state of the AI model (e.g., Ready, Deploying, Failed).
                observedGeneration:
                  type: integer
                  description: The most recent generation observed by the controller.
                message:
                  type: string
                  description: A human-readable message about the current state.
This AIModel CRD allows users to declare their desired AI models within Kubernetes. A controller would then watch these AIModel resources and take actions: perhaps deploying a container with the specified model, or configuring an AI Gateway to route traffic to the endpoint with the defined accessPolicies. This brings a level of automation and standardization that would be difficult to achieve otherwise.
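A user could then declare a model as an ordinary namespaced object (all field values below are illustrative, not prescribed by the article):

```yaml
apiVersion: ai.example.com/v1alpha1
kind: AIModel
metadata:
  name: summarizer
  namespace: default
spec:
  modelName: summarizer
  modelProvider: HuggingFace
  version: "1.0.0"
  endpoint: http://summarizer.default.svc.cluster.local:9000
  resourceRequests:
    cpu: "2"
    memory: 4Gi
  accessPolicies:
    - internal-only
```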
The Kubernetes Controller Pattern
While CRDs provide the extensibility, it's the controller that breathes life into them. A controller is a control loop that continuously watches the actual state of a cluster through the Kubernetes API, compares it to the desired state (as defined in resource objects like CRs), and then takes action to move the actual state closer to the desired state. This fundamental pattern is what makes Kubernetes work.
What is a Controller? The Core of Kubernetes' Automation
Every core Kubernetes component, from the Deployment controller ensuring the correct number of pods are running to the Service controller managing load balancers, is an implementation of this controller pattern. The core idea is:
- Observe: A controller constantly watches a specific set of resource types in the Kubernetes API for changes (creations, updates, deletions).
- Analyze: When a relevant change occurs, the controller retrieves the object(s) involved and compares their current state with the desired state (often defined in the `.spec` of the resource).
- Act: Based on the analysis, the controller performs actions, typically through the Kubernetes API, to reconcile the actual state with the desired state. This could involve creating new resources, updating existing ones, or deleting obsolete ones.
This cycle is often referred to as a "reconciliation loop." It's an idempotent process, meaning that applying the same desired state multiple times yields the same result, and it can recover gracefully from transient errors.
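The observe/analyze/act cycle can be sketched in plain Go. This is a toy, cluster-free illustration (the `desired` and `actual` maps stand in for the API server's declared state and the real world; none of these names come from Kubernetes itself):

```go
package main

import "fmt"

// reconcile drives actual toward desired for one object key.
// It is idempotent: running it again on a converged state changes nothing.
func reconcile(key string, desired, actual map[string]int) {
	want, ok := desired[key]
	if !ok {
		// Desired object was deleted: clean up the actual state.
		delete(actual, key)
		return
	}
	if actual[key] != want {
		// Act: create or update to match the desired state.
		actual[key] = want
	}
}

func main() {
	desired := map[string]int{"web": 3} // e.g. a spec.replicas-style field
	actual := map[string]int{}          // observed state of the world

	reconcile("web", desired, actual) // converges
	reconcile("web", desired, actual) // idempotent: no further change
	fmt.Println(actual["web"])

	delete(desired, "web")
	reconcile("web", desired, actual) // cleans up after deletion
	fmt.Println(len(actual))
}
```

Note that `reconcile` never records what it did last time; it always re-derives its actions from the current desired and actual state, which is exactly why the pattern recovers from transient errors.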
Key Components of a Controller
Building a robust controller, especially one that watches CRD changes, involves several sophisticated components that abstract away much of the complexity of interacting with the Kubernetes API at scale.
Informer
The Informer is a critical component responsible for observing resource changes. Directly watching the Kubernetes API for every resource update would be inefficient and place undue burden on the API server. Informers solve this by:
- Listing: Performing an initial listing of all resources of a specific type.
- Watching: Establishing a long-lived connection to the Kubernetes API to receive event notifications (add, update, delete) for subsequent changes.
- Caching: Maintaining a local, in-memory cache of the observed resources. This cache (often accessed via a `Lister`) allows the controller to read resource data without making repeated calls to the API server, significantly reducing API server load and improving performance.
- Shared Informers: In a controller manager running multiple controllers, `SharedInformer`s are used. They ensure that all controllers watching the same resource type share a single informer, minimizing resource consumption and API server traffic.
When an event occurs (e.g., an AIModel CR is updated), the informer's EventHandler callbacks (AddFunc, UpdateFunc, DeleteFunc) are triggered. These handlers typically enqueue the key (namespace/name) of the affected object into a workqueue for processing.
Workqueue
The Workqueue acts as a buffer and a mechanism to decouple the event handling logic from the heavy-lifting reconciliation logic. When an informer detects a change, instead of immediately processing it, it pushes the object's key into the workqueue. The controller's reconciliation loop then pulls items from this queue for processing.
Key features of a workqueue include:
- Rate Limiting: Prevents the controller from flooding the cluster with requests during periods of high change volume or when an object is repeatedly failing reconciliation. It can delay retries for a failing item.
- Retries: If a reconciliation fails (e.g., due to a temporary network issue or a dependency not yet being ready), the item can be re-queued, often with an exponential backoff, ensuring eventual consistency without crashing the controller.
- Deduplication: If multiple events for the same object arrive rapidly, the workqueue ensures that the object is only processed once, reflecting the latest state.
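The deduplication behavior can be approximated in a few lines of stdlib-only Go. This is a toy sketch, not the real `k8s.io/client-go/util/workqueue` implementation, which layers rate limiting and retry bookkeeping on top of the same idea:

```go
package main

import "fmt"

// dedupQueue coalesces repeated Adds of the same key until it is handed out,
// mirroring the workqueue guarantee that a rapidly changing object is
// processed once per batch of events, always against its latest state.
type dedupQueue struct {
	order   []string
	pending map[string]bool
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: map[string]bool{}}
}

func (q *dedupQueue) Add(key string) {
	if q.pending[key] {
		return // already queued; the reconciler reads the latest state anyway
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

func (q *dedupQueue) Get() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

func main() {
	q := newDedupQueue()
	// Three rapid events for the same object collapse into one work item.
	q.Add("default/my-route")
	q.Add("default/my-route")
	q.Add("default/my-route")
	q.Add("default/other")

	for key, ok := q.Get(); ok; key, ok = q.Get() {
		fmt.Println(key)
	}
}
```

Because only the key is queued, not the event payload, the reconciler is forced to fetch the object's current state when it runs, which is what makes coalescing safe.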
Reconcile Function
This is the heart of your controller's logic. The Reconcile function is called for each item pulled from the workqueue. Its responsibility is to:
- Fetch the Desired State: Retrieve the custom resource (e.g., `AIModel`) that triggered the reconciliation from the informer's cache or directly from the API server.
- Fetch the Current State: Query the Kubernetes API or other external systems to determine the actual state of the resources that should be managed by this custom resource (e.g., the actual Deployment, Service, or external database).
- Calculate the Diff: Compare the desired state (from the CR's `.spec`) with the current actual state.
- Take Actions: Based on the diff, perform necessary operations. This could involve:
  - Creating new resources if they don't exist.
  - Updating existing resources to match the desired state.
  - Deleting resources that are no longer needed.
  - Interacting with external systems (e.g., configuring an API Gateway or provisioning cloud resources).
- Update Status: Crucially, after taking action, the controller should update the `.status` field of the custom resource itself. This provides feedback to the user about the actual state of the managed resources (e.g., `status.state: Ready`, `status.message: "Successfully deployed AI model"`) and tracks the observed generation to prevent unnecessary reconciliation.
- Handle Errors and Requeue: If an error occurs that prevents successful reconciliation, the function should typically return a `reconcile.Result` that indicates the item should be re-queued, possibly with a delay. If successful, it returns an empty `reconcile.Result`.
Controller-Runtime and Operator SDK: Tools for Building Controllers
While it's possible to build a controller from scratch using client-go (the official Go client library for Kubernetes), it's a complex undertaking due to the intricacies of informers, workqueues, and error handling. Fortunately, powerful frameworks simplify controller development:
- `controller-runtime`: This library provides a high-level abstraction for building Kubernetes controllers. It handles the boilerplate code for informers, workqueues, and client management, allowing developers to focus purely on the reconciliation logic. It simplifies setting up the controller manager, defining watches, and implementing the `Reconcile` method.
- Operator SDK: Built on top of `controller-runtime`, the Operator SDK provides tools and scaffolds to accelerate the development of Kubernetes Operators. It helps generate CRD definitions, Go types from CRDs, deployment manifests, and provides testing utilities. For anyone serious about building production-grade operators, the Operator SDK is an invaluable asset.
These tools abstract away much of the underlying complexity, allowing developers to concentrate on the domain-specific logic of their controllers, thereby significantly reducing development time and potential for errors.
Step-by-Step Implementation Guide: Watching CRD Changes
Let's walk through the practical implementation of a Kubernetes controller designed to watch for changes to a Custom Resource Definition. For this guide, we'll use an example CRD called APIGatewayRoute, which will define routing rules for an API Gateway. This provides a concrete example of how CRDs and controllers can manage real-world infrastructure components like an API Gateway, an AI Gateway, or an LLM Gateway.
Our goal is to build a controller that, upon creation or update of an APIGatewayRoute custom resource, ensures a corresponding configuration is applied to an underlying API gateway system (for simplicity, we'll simulate this configuration rather than interacting with a real gateway for this guide, but the principle holds true).
A. Defining Our Custom Resource: APIGatewayRoute
First, we need to define our custom resource. This APIGatewayRoute will encapsulate the desired state of a routing configuration for our gateway. It will live under the gateway.example.com group.
Here's the YAML for our APIGatewayRoute CRD:
# apigatewayroute_crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: apigatewayroutes.gateway.example.com
spec:
  group: gateway.example.com
  names:
    kind: APIGatewayRoute
    listKind: APIGatewayRouteList
    plural: apigatewayroutes
    singular: apigatewayroute
    shortNames:
      - agr
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      subresources:
        status: {} # Enables r.Status().Update() in the controller
      schema:
        openAPIV3Schema:
          type: object
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec:
              type: object
              properties:
                path:
                  type: string
                  description: The incoming request path to match.
                  pattern: "^/.*" # Must start with /
                method:
                  type: string
                  description: HTTP method to match (e.g., GET, POST, ANY).
                  enum: ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS", "ANY"]
                destinationService:
                  type: string
                  description: The Kubernetes service name to route traffic to (e.g., my-backend-service.default.svc.cluster.local).
                destinationPort:
                  type: integer
                  description: The port on the destination service.
                  minimum: 1
                  maximum: 65535
                rateLimit:
                  type: integer
                  description: Requests per second allowed for this route. 0 means no limit.
                  minimum: 0
                authenticationRequired:
                  type: boolean
                  description: Whether authentication is required for this route.
              required:
                - path
                - method
                - destinationService
                - destinationPort
            status:
              type: object
              properties:
                observedGeneration:
                  type: integer
                  description: The most recent generation observed by the controller.
                state:
                  type: string
                  description: Current state of the gateway route (e.g., Ready, Pending, Failed).
                message:
                  type: string
                  description: A human-readable message about the current state.
                lastReconciledTime:
                  type: string
                  format: date-time
                  description: Timestamp of the last successful reconciliation.
To install this CRD in your cluster: kubectl apply -f apigatewayroute_crd.yaml.
Here's an example instance of an APIGatewayRoute CR:
# my-api-route.yaml
apiVersion: gateway.example.com/v1
kind: APIGatewayRoute
metadata:
  name: my-backend-route
  namespace: default
spec:
  path: "/api/v1/users"
  method: "GET"
  destinationService: "user-service.default.svc.cluster.local"
  destinationPort: 8080
  rateLimit: 100
  authenticationRequired: true
B. Setting Up the Controller Project
We'll use Go and controller-runtime to build our controller. Initialize a Go module:
mkdir apigateway-controller
cd apigateway-controller
go mod init gateway.example.com/apigateway-controller
go get sigs.k8s.io/controller-runtime@v0.16.0 # Or the latest stable version
go get k8s.io/apimachinery@v0.28.0 # Match your controller-runtime dependencies
Create the directory structure: apigateway-controller/api/v1/ will hold our Go types for APIGatewayRoute. apigateway-controller/controllers/ will hold our reconciler logic.
C. Generating CRD Go Types
We need Go structs that represent our APIGatewayRoute CRD. controller-gen can generate these automatically from annotations.
Create api/v1/apigatewayroute_types.go:
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// APIGatewayRouteSpec defines the desired state of APIGatewayRoute
type APIGatewayRouteSpec struct {
	Path                   string `json:"path"`
	Method                 string `json:"method"`
	DestinationService     string `json:"destinationService"`
	DestinationPort        int    `json:"destinationPort"`
	RateLimit              int    `json:"rateLimit,omitempty"`
	AuthenticationRequired bool   `json:"authenticationRequired,omitempty"`
}

// APIGatewayRouteStatus defines the observed state of APIGatewayRoute
type APIGatewayRouteStatus struct {
	ObservedGeneration int64        `json:"observedGeneration,omitempty"`
	State              string       `json:"state,omitempty"`
	Message            string       `json:"message,omitempty"`
	LastReconciledTime *metav1.Time `json:"lastReconciledTime,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:resource:path=apigatewayroutes,scope=Namespaced,shortName=agr
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Path",type="string",JSONPath=".spec.path",description="Incoming request path"
// +kubebuilder:printcolumn:name="Method",type="string",JSONPath=".spec.method",description="HTTP method"
// +kubebuilder:printcolumn:name="Destination",type="string",JSONPath=".spec.destinationService",description="Target K8s service"
// +kubebuilder:printcolumn:name="State",type="string",JSONPath=".status.state",description="Current state of the route"
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"

// APIGatewayRoute is the Schema for the apigatewayroutes API
type APIGatewayRoute struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   APIGatewayRouteSpec   `json:"spec,omitempty"`
	Status APIGatewayRouteStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// APIGatewayRouteList contains a list of APIGatewayRoute
type APIGatewayRouteList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []APIGatewayRoute `json:"items"`
}

func init() {
	SchemeBuilder.Register(&APIGatewayRoute{}, &APIGatewayRouteList{})
}
You'll also need api/v1/groupversion_info.go:
package v1

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/scheme"
)

var (
	// GroupVersion is the group version used to register these objects
	GroupVersion = schema.GroupVersion{Group: "gateway.example.com", Version: "v1"}

	// SchemeBuilder is used to add go types to the GroupVersionKind scheme
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

	// AddToScheme adds the types in this group-version to the given scheme.
	AddToScheme = SchemeBuilder.AddToScheme
)
Now generate:
go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.13.0 # Pin a version compatible with your controller-runtime
controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./api/..."
This will generate zz_generated.deepcopy.go in api/v1/, containing the DeepCopy methods that make your types satisfy the runtime.Object interface. For hack/boilerplate.go.txt, you can just create an empty file or put a license header in it.
D. Initializing the Manager
The manager in controller-runtime is the orchestrator. It sets up shared caches, clients, and starts all registered controllers.
Create main.go:
package main

import (
	"os"

	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/healthz"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"

	gatewayv1 "gateway.example.com/apigateway-controller/api/v1"
	"gateway.example.com/apigateway-controller/controllers"
	// +kubebuilder:scaffold:imports
)

var (
	scheme   = runtime.NewScheme()
	setupLog = ctrl.Log.WithName("setup")
)

func init() {
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))
	utilruntime.Must(gatewayv1.AddToScheme(scheme))
	// +kubebuilder:scaffold:scheme
}

func main() {
	// You would typically parse these from command-line flags.
	metricsAddr := ":8080"
	probeAddr := ":8081"
	enableLeaderElection := false // Set to true for HA deployments

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&zap.Options{Development: true})))

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:                 scheme,
		Metrics:                metricsserver.Options{BindAddress: metricsAddr},
		HealthProbeBindAddress: probeAddr,
		LeaderElection:         enableLeaderElection,
		LeaderElectionID:       "apigateway-controller-leader-election",
		// LeaderElectionReleaseOnCancel: true, // Recommended for Kubernetes 1.25+
	})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}

	if err = (&controllers.APIGatewayRouteReconciler{
		Client: mgr.GetClient(),
		Scheme: mgr.GetScheme(),
	}).SetupWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "APIGatewayRoute")
		os.Exit(1)
	}
	// +kubebuilder:scaffold:builder

	if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {
		setupLog.Error(err, "unable to set up health check")
		os.Exit(1)
	}
	if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil {
		setupLog.Error(err, "unable to set up ready check")
		os.Exit(1)
	}

	setupLog.Info("starting manager")
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		setupLog.Error(err, "problem running manager")
		os.Exit(1)
	}
}
E. Implementing the Reconciler
Now, let's write the actual controller logic in controllers/apigatewayroute_controller.go.
package controllers

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	gatewayv1 "gateway.example.com/apigateway-controller/api/v1"
)

// APIGatewayRouteReconciler reconciles an APIGatewayRoute object
type APIGatewayRouteReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=gateway.example.com,resources=apigatewayroutes,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=gateway.example.com,resources=apigatewayroutes/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=gateway.example.com,resources=apigatewayroutes/finalizers,verbs=update

// Reconcile is part of the main kubernetes reconciliation loop, which aims to
// move the current state of the cluster closer to the desired state. It
// compares the state specified by the APIGatewayRoute object against the
// actual cluster state, then performs operations to make the cluster state
// reflect the state specified by the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.16.0/pkg/reconcile
func (r *APIGatewayRouteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Fetch the APIGatewayRoute instance
	apiGatewayRoute := &gatewayv1.APIGatewayRoute{}
	err := r.Get(ctx, req.NamespacedName, apiGatewayRoute)
	if err != nil {
		if errors.IsNotFound(err) {
			// Object not found; it could have been deleted after the
			// reconcile request. Return and don't requeue.
			logger.Info("APIGatewayRoute resource not found. Ignoring since object must be deleted")
			return ctrl.Result{}, nil
		}
		// Error reading the object - requeue the request.
		logger.Error(err, "Failed to get APIGatewayRoute")
		return ctrl.Result{}, err
	}

	logger.Info("Reconciling APIGatewayRoute", "Name", apiGatewayRoute.Name, "Namespace", apiGatewayRoute.Namespace, "Spec", apiGatewayRoute.Spec)

	// --- Your core business logic goes here ---
	// This is where you would interact with your actual API Gateway.
	// For this example, we'll just log and update the status.

	// Check if the route is valid (minimal check)
	if !isValidRoute(apiGatewayRoute.Spec) {
		logger.Error(nil, "Invalid APIGatewayRoute specification", "Spec", apiGatewayRoute.Spec)
		// Update status to reflect the failure
		apiGatewayRoute.Status.State = "Failed"
		apiGatewayRoute.Status.Message = fmt.Sprintf("Invalid route spec: path %s, method %s", apiGatewayRoute.Spec.Path, apiGatewayRoute.Spec.Method)
		apiGatewayRoute.Status.ObservedGeneration = apiGatewayRoute.Generation
		if updateErr := r.Status().Update(ctx, apiGatewayRoute); updateErr != nil {
			logger.Error(updateErr, "Failed to update APIGatewayRoute status after validation error")
			return ctrl.Result{}, updateErr
		}
		return ctrl.Result{}, fmt.Errorf("invalid APIGatewayRoute spec") // Requeue for potential fixes
	}

	// Simulate applying configuration to an API Gateway.
	// In a real scenario, this would involve calling the API Gateway's API
	// or writing configuration files for a gateway like Kong, Envoy, or Nginx.
	// For instance, if this were an AI Gateway or LLM Gateway configuration,
	// you might update routing rules for specific AI models, add authentication
	// policies, or set rate limits based on the CRD's spec.
	gatewayConfig := map[string]interface{}{
		"path":                 apiGatewayRoute.Spec.Path,
		"method":               apiGatewayRoute.Spec.Method,
		"target":               fmt.Sprintf("http://%s:%d", apiGatewayRoute.Spec.DestinationService, apiGatewayRoute.Spec.DestinationPort),
		"rateLimit":            apiGatewayRoute.Spec.RateLimit,
		"authenticationNeeded": apiGatewayRoute.Spec.AuthenticationRequired,
	}
	logger.Info("Simulating API Gateway configuration update", "Config", gatewayConfig)

	// If the configuration was successfully applied to the API Gateway,
	// we update the APIGatewayRoute's status.
	apiGatewayRoute.Status.State = "Ready"
	apiGatewayRoute.Status.Message = "API Gateway route successfully configured"
	apiGatewayRoute.Status.ObservedGeneration = apiGatewayRoute.Generation
	now := metav1.Now()
	apiGatewayRoute.Status.LastReconciledTime = &now
	if err := r.Status().Update(ctx, apiGatewayRoute); err != nil {
		logger.Error(err, "Failed to update APIGatewayRoute status")
		return ctrl.Result{}, err
	}

	logger.Info("APIGatewayRoute reconciliation complete")
	return ctrl.Result{}, nil
}

// isValidRoute performs basic validation on the route spec
func isValidRoute(spec gatewayv1.APIGatewayRouteSpec) bool {
	if spec.Path == "" || spec.DestinationService == "" || spec.DestinationPort == 0 {
		return false
	}
	// Add more complex validation as needed
	return true
}

// SetupWithManager sets up the controller with the Manager.
func (r *APIGatewayRouteReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&gatewayv1.APIGatewayRoute{}). // Watch for APIGatewayRoute objects
		Complete(r)
}
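Validation helpers like `isValidRoute` are easy to cover with a quick table-driven check. The sketch below reimplements the same check against a trimmed stand-in struct so it runs without the generated API types (`routeSpec` is an assumption, not the generated `APIGatewayRouteSpec`):

```go
package main

import "fmt"

// routeSpec mirrors only the fields the validation inspects.
type routeSpec struct {
	Path               string
	DestinationService string
	DestinationPort    int
}

// isValidRoute repeats the reconciler's minimal check: all routing
// essentials must be set before we touch the gateway.
func isValidRoute(spec routeSpec) bool {
	return spec.Path != "" && spec.DestinationService != "" && spec.DestinationPort != 0
}

func main() {
	cases := []struct {
		name string
		spec routeSpec
		want bool
	}{
		{"complete", routeSpec{"/api/v1/users", "user-service", 8080}, true},
		{"missing path", routeSpec{"", "user-service", 8080}, false},
		{"missing port", routeSpec{"/api/v1/users", "user-service", 0}, false},
	}
	for _, c := range cases {
		if got := isValidRoute(c.spec); got != c.want {
			fmt.Printf("FAIL %s: got %v, want %v\n", c.name, got, c.want)
			return
		}
	}
	fmt.Println("all cases pass")
}
```

In a real project you would exercise the full reconciler against a fake client or envtest instead, but the table-driven shape stays the same.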
F. Inside the Reconcile Loop: Business Logic and Gateway Interaction
The Reconcile function is where your controller's intelligence resides.
- Fetching the CRD instance: `r.Get(ctx, req.NamespacedName, apiGatewayRoute)` retrieves the `APIGatewayRoute` object that triggered the reconciliation. If it's `IsNotFound`, the object was deleted, and we simply return.
- Handling Not Found: This is crucial for handling deletions. When a CR is deleted, the informer still triggers an event. If the object is no longer found in the API server, it means it's gone, and the controller can stop processing it.
- Business Logic: This is the core. For our `APIGatewayRoute` controller:
  - We perform a basic `isValidRoute` check. In a real controller, this validation would be more comprehensive.
  - We simulate the application of configuration to an API Gateway. This `gatewayConfig` map represents the data that would be sent to your chosen gateway's administrative API (e.g., REST API calls to Nginx, Kong, Istio, or a custom AI Gateway like APIPark). The controller extracts details like `path`, `method`, `destinationService`, `rateLimit`, and `authenticationRequired` from the `APIGatewayRoute.Spec` and transforms them into the gateway's native configuration format.
  - CRD interaction with Gateways: This is an excellent place to illustrate the utility for an AI Gateway, API Gateway, or LLM Gateway. Imagine our `APIGatewayRoute` had fields for `aiModelName`, `promptTemplate`, or `llmVersion`. The controller watching this CRD could then configure an AI Gateway to route requests for a given prompt to a particular LLM version, apply rate limits specific to AI inferences, and enforce specialized authentication. A product like APIPark, which functions as an AI Gateway and API Gateway, could ingest such configurations directly, streamlining the management of hundreds of AI models through Kubernetes-native declarations. This enables a powerful declarative model for AI service management.
- Updating Status: The `APIGatewayRoute.Status` field is updated to reflect the outcome of the reconciliation. `State` and `Message` provide user-friendly feedback. `ObservedGeneration` ensures the controller doesn't needlessly re-reconcile when only metadata changes while the spec remains the same. The `LastReconciledTime` adds an audit trail.
G. Error Handling and Retries
- If `r.Get` fails (and the error is not `IsNotFound`), we return the `err`, which tells `controller-runtime` to requeue the request. The workqueue's rate-limiting logic will apply exponential backoff.
- If our `isValidRoute` check fails, we update the status to `Failed` and also return an error, triggering a requeue. This allows administrators to observe the failure and correct the `APIGatewayRoute` definition.
- If `r.Status().Update` fails, we also return an error, indicating a problem writing back the status, and the request will be retried.
- A successful reconciliation returns `ctrl.Result{}`, signaling that the object is in its desired state and no immediate requeue is needed.
H. Deployment
To deploy this controller to a Kubernetes cluster, you need several manifests:
- CRD Definition: `apigatewayroute_crd.yaml` (already created).
- RBAC: A `ServiceAccount`, `Role`, and `RoleBinding` to grant the controller the necessary permissions to watch `APIGatewayRoute` objects and update their status.
- Deployment: A `Deployment` object to run your controller application.
Example RBAC (rbac.yaml):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: apigateway-controller-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: apigateway-controller-role
  namespace: default
rules:
- apiGroups: ["gateway.example.com"]
  resources: ["apigatewayroutes"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["gateway.example.com"]
  resources: ["apigatewayroutes/status"]
  verbs: ["get", "update", "patch"]
- apiGroups: ["gateway.example.com"]
  resources: ["apigatewayroutes/finalizers"]
  verbs: ["update"]
# If your controller manages other K8s resources (e.g., Deployments, Services),
# you would add rules for those here.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: apigateway-controller-rb
  namespace: default
subjects:
- kind: ServiceAccount
  name: apigateway-controller-sa
  namespace: default
roleRef:
  kind: Role
  name: apigateway-controller-role
  apiGroup: rbac.authorization.k8s.io
Example Deployment (deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apigateway-controller
  namespace: default
  labels:
    app: apigateway-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apigateway-controller
  template:
    metadata:
      labels:
        app: apigateway-controller
    spec:
      serviceAccountName: apigateway-controller-sa
      containers:
      - name: controller
        image: gateway.example.com/apigateway-controller:latest # Build and push your image
        imagePullPolicy: Always
        command: ["/manager"] # If you name your binary 'manager'
        args:
        - "--metrics-bind-address=0" # Disable default metrics port if not needed, or configure
        - "--health-probe-bind-address=:8081"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8081
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8081
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
          requests:
            cpu: 10m
            memory: 64Mi
To run this:
1. Apply the CRD: `kubectl apply -f apigatewayroute_crd.yaml`
2. Apply RBAC: `kubectl apply -f rbac.yaml`
3. Build your Docker image: `docker build -t gateway.example.com/apigateway-controller:latest .`
4. Push your image (if deploying to a remote cluster): `docker push gateway.example.com/apigateway-controller:latest`
5. Apply the Deployment: `kubectl apply -f deployment.yaml`

Once deployed, apply `my-api-route.yaml` and observe your controller's logs for the reconciliation process. Then, `kubectl get apigatewayroute my-backend-route -o yaml` will show the updated status.
Advanced Controller Concepts and Best Practices
Building a simple CRD controller is a great start, but production-grade operators often require more sophisticated patterns and careful consideration of edge cases.
Owner References
Owner references are a crucial Kubernetes mechanism for managing the lifecycle of dependent objects. When a controller creates a resource (e.g., a Deployment) on behalf of a custom resource (e.g., an APIGatewayRoute), it should set the custom resource as the owner of the created Deployment. This has two primary benefits:
- Garbage Collection: If the owner resource (the `APIGatewayRoute`) is deleted, Kubernetes' garbage collector will automatically delete all its dependents (the associated Deployment, Service, etc.), ensuring proper cleanup.
- Tracking: It makes it easy to see which resources are managed by which custom resource using `kubectl get <resource> -o yaml` and looking at `metadata.ownerReferences`.
Example of setting an owner reference in Go:
// myDeployment is the Deployment object we are creating
// apiGatewayRoute is the APIGatewayRoute object (the owner)
err := ctrl.SetControllerReference(apiGatewayRoute, myDeployment, r.Scheme)
if err != nil {
// Handle error
}
// Then create/update myDeployment
Finalizers
Finalizers are special keys on an object that prevent it from being deleted until the finalizer is removed. They are typically used when a controller needs to perform cleanup operations on external resources before a Kubernetes object is truly deleted.
For example, if our APIGatewayRoute controller configures an external API Gateway instance, and the APIGatewayRoute CR is deleted, the controller needs an opportunity to tell the external gateway to remove that route. Without a finalizer, Kubernetes would delete the CR immediately, and the controller wouldn't get a chance to clean up the external resource, leading to "orphan" configurations.
Workflow with Finalizers:
1. When a controller creates an external resource or starts managing a CR that has external dependencies, it adds a unique finalizer string (e.g., `gateway.example.com/finalizer`) to the CR's `metadata.finalizers` list.
2. When a user `kubectl delete`s the CR, Kubernetes doesn't immediately delete it. Instead, it sets `metadata.deletionTimestamp` and continues to show the object as "Terminating".
3. The controller observes this "deletion in progress" state (via `deletionTimestamp`). It then performs the necessary external cleanup (e.g., removing the route from the API Gateway).
4. Once cleanup is complete, the controller removes its finalizer string from the `metadata.finalizers` list.
5. Kubernetes then sees that `deletionTimestamp` is set and the finalizer list is empty, and proceeds with the final deletion of the CR.
Finalizers are critical for maintaining data consistency between Kubernetes and external systems.
Field Selectors and Label Selectors
While a controller typically watches all instances of a specific CRD, sometimes you might only want to process a subset.
- Label Selectors: You can configure a controller to only watch CRs that have specific labels. For example, `builder.Watches(&source.Kind{Type: &v1.APIGatewayRoute{}}, &handler.EnqueueRequestForObject{}, builder.WithPredicates(predicate.LabelSelectorPredicate(labels.SelectorFromSet(map[string]string{"env": "production"}))))` would only reconcile `APIGatewayRoute` objects with the label `env=production`.
- Field Selectors: Less commonly used for custom resources themselves, but useful for filtering built-in resources based on fields like `spec.nodeName` for Pods.
Predicates
Predicates offer even finer-grained control over which events trigger a reconciliation. They are functions that evaluate incoming events (create, update, delete) and return true if the event should be processed by the reconciler, or false otherwise. This helps reduce unnecessary reconciliation cycles, improving controller efficiency.
Common uses for predicates:
- Generation changed: Only reconcile if `metadata.generation` has changed (meaning `.spec` has changed), ignoring metadata-only updates. `controller-runtime` provides `predicate.GenerationChangedPredicate`.
- Status updates only: If you have a separate controller managing status, you might ignore spec updates.
- Specific field changes: Reconcile only if a particular field in the spec has changed.
Example:
import (
"sigs.k8s.io/controller-runtime/pkg/predicate"
)
// In SetupWithManager
return ctrl.NewControllerManagedBy(mgr).
For(&gatewayv1.APIGatewayRoute{}).
WithEventFilter(predicate.GenerationChangedPredicate{}). // Only reconcile if spec changes
Complete(r)
Watching Other Resources
A controller often manages other Kubernetes resources (like Deployments, Services, ConfigMaps) that are children of its primary custom resource. To ensure robust reconciliation, the controller needs to be notified if these child resources change unexpectedly.
For example, if an `APIGatewayRoute` controller creates a Service, and that Service is manually deleted by a user, the controller needs to know to recreate it. This is achieved by declaring that the controller `Owns` these secondary resources, which also sets up a watch on them:
// In SetupWithManager
return ctrl.NewControllerManagedBy(mgr).
For(&gatewayv1.APIGatewayRoute{}). // Primary watch
Owns(&appsv1.Deployment{}). // Watch Deployments owned by APIGatewayRoute
Owns(&corev1.Service{}). // Watch Services owned by APIGatewayRoute
Complete(r)
When a Deployment or Service owned by an APIGatewayRoute is created, updated, or deleted, the APIGatewayRoute controller will be triggered, allowing it to reconcile and correct any drift from the desired state.
Idempotency
All controller actions must be idempotent. This means that applying the same desired state multiple times should always result in the same actual state, without any unintended side effects. For example, when creating a Deployment, always specify a unique name. If the Deployment already exists with that name, the create operation should gracefully fail or be a no-op from the controller's perspective. When updating, ensure you are only changing the fields necessary. This makes controllers resilient to retries and ensures consistency.
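Idempotency can be illustrated with a small stdlib-only sketch: an `ensureRoute` helper working against a fake in-memory gateway store (both names are hypothetical) that is safe to call any number of times:

```go
package main

import "fmt"

// route is the desired external configuration for one path.
type route struct {
	Path    string
	Backend string
}

// ensureRoute is idempotent: it creates the route if absent, updates it only
// when the stored value differs, and is a no-op otherwise. Calling it N times
// with the same input always leaves the store in the same final state.
func ensureRoute(store map[string]route, desired route) (changed bool) {
	current, ok := store[desired.Path]
	if ok && current == desired {
		return false // already converged; nothing to do
	}
	store[desired.Path] = desired
	return true
}

func main() {
	store := map[string]route{}
	r := route{Path: "/orders", Backend: "orders-svc:8080"}
	fmt.Println(ensureRoute(store, r)) // first call converges the state
	fmt.Println(ensureRoute(store, r)) // second call is a no-op
}
```

The same "compare then converge" shape applies whether the target is a Kubernetes Deployment or an external gateway's route table, which is what makes retries after a requeue harmless.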
Resource Version
The resourceVersion field in Kubernetes metadata is an opaque value used by clients to detect object changes and for optimistic concurrency control. When you update an object, you should typically provide the resourceVersion of the object you last read. If the object has been updated by another client in the meantime, the update operation will fail (due to resourceVersion mismatch), preventing data loss. This is automatically handled by client-go and controller-runtime's update methods, but it's important to understand its purpose.
Testing Controllers
Thorough testing is paramount for controllers.
- Unit Tests: Test individual functions and reconciliation logic in isolation using Go's standard testing framework.
- Integration Tests: Use `envtest` (provided by `controller-runtime`) to spin up a minimal Kubernetes API server and etcd instance in-memory. This allows you to test your controller against a real Kubernetes-like environment without needing a full cluster. You can create CRs, simulate changes, and assert on the resources your controller creates/updates.
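To give a flavor of the unit-test side, here is a table-driven stdlib sketch for the kind of validation helper discussed earlier. The `isValidRoute` function and its rules are hypothetical; `envtest`-based integration tests would additionally exercise the full reconciler against a real API server.

```go
package main

import (
	"fmt"
	"strings"
)

// isValidRoute is a stand-in for the controller's validation logic:
// a route must have a path starting with "/" and a non-empty backend.
func isValidRoute(path, backend string) bool {
	return strings.HasPrefix(path, "/") && backend != ""
}

func main() {
	// Table-driven cases, the idiomatic shape for Go unit tests.
	cases := []struct {
		path, backend string
		want          bool
	}{
		{"/orders", "orders-svc:8080", true},
		{"orders", "orders-svc:8080", false}, // missing leading slash
		{"/orders", "", false},               // missing backend
	}
	for _, c := range cases {
		got := isValidRoute(c.path, c.backend)
		fmt.Printf("%q %q -> %v (want %v)\n", c.path, c.backend, got, c.want)
	}
}
```

In a real project the same table would live in a `_test.go` file and use `t.Errorf` instead of printing.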
The Role of CRDs and Controllers in Modern API Management and AI Infrastructure
The declarative power of CRDs combined with the continuous reconciliation of controllers offers a transformative approach to managing complex infrastructure, particularly in the rapidly evolving fields of API management and artificial intelligence.
Connecting CRDs to API Gateways
An API Gateway serves as the single entry point for all API calls, handling routing, authentication, rate limiting, and analytics. Traditionally, configuring an API Gateway involves imperative API calls or manual configuration file edits. This can become cumbersome and error-prone in dynamic environments with numerous APIs and frequent changes.
By leveraging CRDs and controllers, API Gateway configuration can be declarative and GitOps-friendly:
- Declarative API Configuration: Define `APIRoute`, `APIAuthenticationPolicy`, and `APIRateLimit` as CRDs. These custom resources encapsulate the desired state of specific gateway configurations.
- Automated Gateway Provisioning: A controller watches these CRDs. When an `APIRoute` CR is created or updated, the controller translates the CR's `.spec` into the native configuration language or API calls of the target API Gateway (e.g., Nginx, Envoy, Kong, Apigee, or APIPark). It then applies this configuration, ensuring the gateway always reflects the desired state defined in Kubernetes.
- Version Control and Rollback: Since all configurations are defined as Kubernetes objects (YAML files), they can be versioned in Git. This enables full audit trails, easy rollbacks to previous configurations, and collaborative development using standard Git workflows.
- Self-Service and Democratization: Developers can define their API routing and policy requirements directly within their application's Kubernetes manifests. The controller automatically provisions and updates the API Gateway, reducing reliance on a centralized operations team and accelerating development cycles.
This approach transforms the API Gateway itself into an extension of the Kubernetes control plane, offering unparalleled automation and consistency.
CRDs and AI Gateways/LLM Gateways: The APIPark Advantage
The proliferation of AI models, especially large language models (LLMs), presents new challenges in terms of management, integration, and access control. An AI Gateway or LLM Gateway is a specialized form of an API Gateway designed to handle the unique demands of AI services, such as unified invocation formats, prompt management, cost tracking, and model versioning.
This is precisely where CRDs and controllers, especially in conjunction with a robust platform like APIPark, can demonstrate their immense value.
Imagine defining an AIModelEndpoint CRD:
apiVersion: ai.apipark.com/v1
kind: AIModelEndpoint
metadata:
  name: my-sentiment-analysis
  namespace: default
spec:
  modelID: "openai-gpt-4"
  version: "latest"
  promptTemplate: "Analyze the sentiment of the following text: {{.text}}"
  accessGroup: "team-a"
  rateLimit: 100 # RPM
  costTrackingEnabled: true
  unifiedAPIFormat: true
status:
  gatewayStatus: "Configured"
  externalURL: "https://my-gateway.com/ai/sentiment-analysis"
A controller designed for APIPark would watch for changes to AIModelEndpoint CRD instances.
- Dynamic AI Model Integration: When a new `AIModelEndpoint` is created or updated, the controller extracts the `modelID`, `version`, `promptTemplate`, and other parameters from the CRD. It then communicates with APIPark's administrative API.
- Unified API Invocation: APIPark, acting as the AI Gateway, would use this information to expose the specified AI model through a standardized API endpoint, abstracting away the underlying model provider's specific API. The controller ensures that the `unifiedAPIFormat: true` directive from the CRD is honored, configuring APIPark to handle the data transformation.
- Prompt Encapsulation into REST API: The `promptTemplate` from the CRD can be directly consumed by APIPark. The controller can instruct APIPark to create a new REST API endpoint (e.g., `/ai/sentiment-analysis`) that, when invoked, automatically applies the `promptTemplate` to the request payload before forwarding it to the actual AI model. This eliminates the need for applications to manage complex prompt engineering directly.
- End-to-End API Lifecycle Management: Through such CRDs, you can manage the full lifecycle of your AI APIs: from design (in the CRD spec) to publication (by the controller configuring APIPark) to invocation (through APIPark) and eventual decommission (by deleting the CRD). APIPark's lifecycle features, such as traffic forwarding, load balancing, and versioning, would be dynamically configured by the controller based on the CRD's instructions.
- Access Control and Cost Tracking: The `accessGroup` and `costTrackingEnabled` fields in the CRD can directly translate to APIPark's security and analytics features. The controller ensures that APIPark applies the correct access permissions and enables detailed cost tracking for each `AIModelEndpoint`, allowing for granular control and visibility. APIPark also enables features like subscription approval and independent API/access permissions for each tenant, which could be configured via tenant-specific CRDs.
The synergy between CRDs, controllers, and an AI Gateway like APIPark creates an extraordinarily powerful and flexible system for managing AI services. This declarative approach, backed by Kubernetes' robust reconciliation, means:
- Automation: AI services can be provisioned, updated, and managed with minimal human intervention.
- Consistency: All AI API configurations adhere to defined standards and policies.
- Scalability: New AI models and endpoints can be introduced and scaled rapidly.
- Observability: The status of AI APIs is reflected directly in Kubernetes, making it easy to monitor and troubleshoot.
APIPark provides an open-source AI Gateway and API Management Platform that perfectly complements this CRD-driven architecture. With its ability to quickly integrate 100+ AI models, offer a unified API format for AI invocation, and encapsulate prompts into REST APIs, APIPark can act as the configurable backend for such a controller. By defining `AIModelEndpoint` CRDs, a controller can effectively manage APIPark's configuration, bringing the entire AI model exposure and management workflow under Kubernetes' declarative paradigm. The platform's high performance, detailed API call logging, and powerful data analysis capabilities, all configurable through Kubernetes manifests and managed by controllers, further enhance its value proposition.
For example, a controller watching an `AIModelEndpoint` CR could leverage APIPark to:
1. Quickly integrate a new AI model by telling APIPark the `modelID` and `version`.
2. Enforce a unified API format for invocations via APIPark, simplifying client-side consumption, as declared in the `unifiedAPIFormat` field.
3. Encapsulate the `promptTemplate` into a dedicated REST API endpoint through APIPark's features, reducing the burden on application developers.
4. Apply `rateLimit` and `authenticationRequired` from the CRD to APIPark's traffic management rules, ensuring controlled and secure access.
5. Enable `costTrackingEnabled` to leverage APIPark's detailed logging and data analysis for AI model usage, offering insights into performance and expenditure.
This integration transforms Kubernetes into a powerful control plane for AI service delivery, with APIPark serving as the intelligent execution layer.
Table: Comparison of Traditional vs. CRD-Driven API/AI Gateway Management
| Feature | Traditional API/AI Gateway Management | CRD-Driven API/AI Gateway Management (with Controller like APIPark) |
|---|---|---|
| Configuration Model | Imperative (API calls, UI, manual config files) | Declarative (YAML manifests, Kubernetes API) |
| Version Control | Manual or separate tools | Git-native (GitOps), integrated with code |
| Automation | Scripting, CI/CD pipelines (imperative steps) | Kubernetes controllers (continuous reconciliation loop) |
| Scalability | Often requires manual intervention for new routes/policies | Highly automated, scales with Kubernetes resources |
| Consistency | Prone to human error, configuration drift | Enforced by controller, desired state always maintained |
| Rollbacks | Complex, manual process, potential downtime | Simple Git revert + controller reconciliation |
| Self-Service | Limited, often requires ops team intervention | Developers define policies alongside app code, controller applies |
| AI Model Management | Ad-hoc per model, provider-specific APIs | Unified CRDs for model endpoints, prompt encapsulation, routing |
| AI Gateway Specifics | Separate tools for cost, unified API, prompt mgmt | Integrated into CRD spec, managed by controller & APIPark |
| Observability | Gateway-specific dashboards | Kubernetes-native kubectl get <CRD> and logs, integrated metrics |
Case Study/Example Scenario: Dynamic AI Model Endpoint Management with APIPark
Let's expand on a concrete scenario demonstrating the power of a CRD controller for managing AI model endpoints through an AI Gateway like APIPark.
Scenario: A large enterprise develops multiple internal AI models for various business units (e.g., fraud detection, customer churn prediction, document summarization). They also consume external LLMs. Each model needs specific routing, authentication, rate limiting, and prompt engineering. Manually configuring these in a centralized API Gateway or even multiple specialized AI Gateway instances is a nightmare for consistency and agility.
Solution with CRDs, Controller, and APIPark:
- Define the `AIModelConfig` CRD: The platform team defines an `AIModelConfig` CRD (similar to `AIModelEndpoint` above) that specifies:
  - `modelName`: Unique identifier (e.g., `fraud-v2`, `llm-summarizer`).
  - `provider`: Internal, OpenAI, HuggingFace, etc.
  - `sourceEndpoint`: The internal Kubernetes Service or external URL of the actual AI model.
  - `promptTemplate`: A template for requests (e.g., `{"text": "{{.input}}", "prompt": "Summarize this: {{.text}}"}`).
  - `accessPolicies`: Kubernetes RBAC rules or internal team IDs.
  - `rateLimits`: Per-second or per-minute limits.
  - `costCode`: A billing code for tracking usage.
  - `exposeAsAPI`: Boolean, whether to expose via APIPark.
  - `externalPath`: The desired URL path on APIPark (e.g., `/ai/summarize`).
- Deploy the APIPark Controller: An `APIParkAIController` is deployed into the Kubernetes cluster. This controller is configured to watch for `AIModelConfig` CRs.
- Developer Action (Declarative): A development team, needing to expose their new `customer-churn-v1` model, simply creates an `AIModelConfig` CR:

  apiVersion: ai.apipark.com/v1
  kind: AIModelConfig
  metadata:
    name: customer-churn-model
    namespace: development
  spec:
    modelName: "customer-churn-v1"
    provider: "internal"
    sourceEndpoint: "customer-churn-service.development.svc.cluster.local:8080"
    promptTemplate: "Analyze customer data for churn risk: {{.customerData}}"
    accessPolicies: ["dev-team-a", "sales-team"]
    rateLimits: 50 # RPM
    costCode: "BU-SALES-001"
    exposeAsAPI: true
    externalPath: "/ai/customer-churn"

- Controller's Reconciliation Loop:
  - The `APIParkAIController` observes the new `customer-churn-model` `AIModelConfig` CR.
  - It reads the `spec` and determines the desired state.
  - It then makes API calls to APIPark (our AI Gateway) to configure a new route:
    - Route Setup: APIPark is instructed to create a new route matching `/ai/customer-churn` that forwards requests to `customer-churn-service.development.svc.cluster.local:8080`.
    - Unified API Format: APIPark automatically normalizes incoming requests to a consistent format and applies the `promptTemplate` before sending to the backend model.
    - Authentication/Authorization: APIPark applies the `accessPolicies` for `dev-team-a` and `sales-team`, ensuring only authorized users can invoke this AI endpoint.
    - Rate Limiting: APIPark configures a rate limit of 50 requests per minute for this specific route.
    - Cost Tracking: APIPark enables detailed logging for this endpoint, tagging requests with `BU-SALES-001` for later analysis.
  - Once APIPark confirms the configuration, the controller updates the `AIModelConfig` CR's `status` to `Ready` and populates an `externalURL` field (e.g., `https://apipark.yourcompany.com/ai/customer-churn`).
- LLM Gateway Specifics: For an LLM model, say `llm-summarizer-v1`, the `AIModelConfig` CR might include `modelType: LLM`, `maxTokens: 500`, and `temperature: 0.7`. The controller, interacting with APIPark (which can also act as an LLM Gateway), would configure APIPark to apply these LLM-specific parameters to the invocation, ensuring consistent and controlled usage of large language models.
Benefits:
- Rapid Deployment: New AI models (and even different versions of the same model with updated prompts/policies) can be exposed through the AI Gateway in minutes by simply applying a YAML file.
- Consistency and Compliance: All AI APIs adhere to enterprise-defined standards, security policies, and cost tracking mandates, reducing manual errors.
- Self-Service: Development teams are empowered to manage their AI API exposure without requiring manual intervention from a centralized operations team.
- Observability: The status of each AI model endpoint is visible directly within Kubernetes, and APIPark provides detailed logs and analytics for every API call.
- GitOps: The entire AI API configuration is version-controlled, enabling full auditability, easy rollbacks, and collaborative development.
This robust framework, leveraging Kubernetes CRDs and controllers with APIPark as the intelligent AI Gateway and API Gateway, transforms the once-complex task of managing AI service exposure into a streamlined, automated, and highly scalable process. The ability of APIPark to support over 100 AI models and provide unified API invocation makes it an ideal partner in such a CRD-driven architecture. The quick deployment and powerful data analysis features mentioned in the APIPark product description directly contribute to the success of this automated management paradigm.
Conclusion
The journey into implementing a controller to watch for changes to CRDs reveals the profound extensibility and automation potential of Kubernetes. Custom Resource Definitions empower users to mold the Kubernetes API to their specific domain, effectively turning Kubernetes into a universal control plane for any operational concern. Controllers, as the operational brain, continuously reconcile the desired state declared in these custom resources with the actual state of the cluster and any external systems.
We've dissected the core components of a controller – the informers for efficient event notification, the workqueue for robust, rate-limited processing, and the all-important reconciliation loop that drives desired-state convergence. Furthermore, we've explored advanced concepts like owner references, finalizers, and predicates, which are indispensable for building production-grade, resilient, and intelligent operators.
The real-world impact of this pattern is particularly evident in the sophisticated management of modern application infrastructure, especially for an API Gateway, an AI Gateway, or an LLM Gateway. By defining API routes, access policies, rate limits, and even AI model specific configurations (like prompt templates and model versions) as Kubernetes Custom Resources, organizations can achieve unparalleled levels of automation, consistency, and agility. The controller acts as the bridge, translating these declarative specifications into the dynamic configuration of the underlying gateway system.
This approach not only simplifies the management of complex, distributed systems but also fosters a GitOps-centric workflow, enabling version control, auditability, and collaborative development for infrastructure configurations. Products like APIPark, serving as an open-source AI Gateway and API Management Platform, perfectly complement this ecosystem by providing the robust, performant, and feature-rich backend that can be dynamically configured by these CRD-driven controllers. The synergy between Kubernetes' declarative power and specialized gateway solutions is not just an evolutionary step; it is a fundamental shift towards truly automated, intelligent, and scalable infrastructure management for the cloud-native era.
FAQ
- What is the fundamental difference between a Custom Resource Definition (CRD) and a Custom Resource (CR)? A CRD is the definition or schema for a new API extension in Kubernetes. It defines the `kind`, `group`, `scope`, and schema validation for a new type of object. Once a CRD is registered with the Kubernetes API server, you can then create instances of that defined type. These instances are called Custom Resources (CRs). Think of a CRD as a class blueprint and a CR as an object created from that class.
- Why do I need a Kubernetes controller to work with CRDs? CRDs are purely declarative; they define what a new type of object looks like and what desired state it represents. They are passive. A controller is the active component that watches for changes to CRs (or any Kubernetes resource), compares the desired state (from the CR's spec) with the current actual state, and then takes actions to reconcile them. Without a controller, your CRs would just sit in the Kubernetes API server without doing anything meaningful beyond storing data.
- What problem do CRDs and controllers solve for API Gateway management? For API Gateways, CRDs and controllers enable declarative, automated configuration. Instead of manually configuring routes, policies, and rate limits through a gateway's API or UI, these configurations can be defined as Kubernetes CRs. A controller watches these CRs and automatically updates the API Gateway, ensuring the gateway's state always matches the desired state defined in Kubernetes. This brings GitOps, version control, and self-service capabilities to API gateway management.
- How do CRDs and controllers benefit AI Gateway and LLM Gateway solutions like APIPark? For AI Gateways and LLM Gateways, CRDs and controllers allow for the declarative management of AI model endpoints, prompt templates, access controls, and cost tracking. An `AIModelConfig` CRD, for instance, can define all parameters for exposing an AI model. A controller then watches these CRs and configures an AI Gateway (like APIPark) to expose the model with the specified settings. This automates the integration of 100+ AI models, enforces unified API formats, encapsulates prompts into REST APIs, and centralizes lifecycle management, dramatically simplifying the operations of AI services.
- What are some best practices for building robust Kubernetes controllers? Key best practices include:
  - Idempotency: Ensure your reconciliation logic can be safely re-run multiple times without side effects.
  - Owner References: Use owner references to establish parent-child relationships for resources your controller creates, enabling automatic garbage collection.
  - Finalizers: Implement finalizers for cleanup of external resources when a CR is deleted.
  - Status Updates: Always update the `.status` field of your CRs to provide feedback on the actual state of the managed resources.
  - Error Handling and Retries: Gracefully handle transient errors and use workqueue retries with exponential backoff.
  - Observability: Ensure logging is informative, and expose metrics for monitoring.
  - Testing: Write comprehensive unit and integration tests (using `envtest`).
🚀 You can securely and efficiently call the OpenAI API on APIPark in just two steps:
Step 1: Deploy the APIPark AI gateway in 5 minutes.
APIPark is developed based on Golang, offering strong product performance and low development and maintenance costs. You can deploy APIPark with a single command line.
curl -sSO https://download.apipark.com/install/quick-start.sh; bash quick-start.sh

In my experience, you can see the successful deployment interface within 5 to 10 minutes. Then, you can log in to APIPark using your account.

Step 2: Call the OpenAI API.

