By apipark — 25 Nov 2025

Gloo AI Gateway: Simplify & Secure Your AI Services

gloo ai gateway

The digital landscape is undergoing a profound transformation, driven by the relentless advancement of Artificial Intelligence. From sophisticated machine learning models predicting market trends to the revolutionary capabilities of Large Language Models (LLMs) generating human-like text, images, and code, AI is no longer a futuristic concept but a present-day imperative for businesses striving for innovation and competitive advantage. However, the path to integrating and managing these powerful AI services within an enterprise infrastructure is fraught with complexities. Developers and operations teams grapple with a myriad of challenges, including securing sensitive data exchanged with AI models, ensuring high availability and performance, managing diverse AI endpoints, and controlling spiraling costs. This is where the concept of an AI Gateway emerges as a critical architectural component, providing the necessary abstraction, security, and management layer. Among the leading solutions in this burgeoning space, Gloo AI Gateway stands out as a robust, enterprise-grade platform designed to specifically address these intricate demands, offering unparalleled simplification and security for your burgeoning AI services.

The journey of AI from experimental labs to production environments has highlighted a significant gap in traditional infrastructure. While conventional API gateways have long served as the front door for RESTful APIs, the unique characteristics of AI workloads—such as prompt engineering, token-based billing, model versioning, and the critical need for data sanitization and guardrails—demand a more specialized and intelligent intermediary. Gloo AI Gateway rises to this challenge by building upon proven cloud-native technologies, extending their capabilities to cater specifically to the nuances of AI and LLM Gateway functionalities. This article will delve deep into the imperative for such a gateway, exploring its core features, architectural advantages, and the tangible benefits it delivers in streamlining the deployment, management, and securing of AI-powered applications, ultimately empowering organizations to harness the full potential of their intelligent automation initiatives without compromising on security or operational efficiency.

The AI Revolution and Its Infrastructure Demands: Navigating the New Frontier

The past few years have witnessed an explosion in AI capabilities, particularly with the advent of generative AI models. These models, often referred to as Large Language Models (LLMs) when dealing with text, have moved beyond simple classification and prediction tasks to generating creative content, summarizing complex information, assisting in code development, and even powering sophisticated chatbots that mimic human conversation. This rapid evolution and widespread adoption are not merely technological feats; they represent a paradigm shift in how businesses operate, interact with customers, and innovate. Enterprises are scrambling to integrate these powerful tools into their products and internal workflows, recognizing their potential to unlock unprecedented levels of productivity, personalization, and competitive differentiation.

However, the very power and versatility of AI, especially LLMs, introduce a new set of complex infrastructure and operational challenges that traditional systems are ill-equipped to handle. The sheer diversity of AI models available today—ranging from open-source models hosted on platforms like Hugging Face to proprietary services offered by tech giants like OpenAI, Google, and Anthropic—means that developers often need to interact with multiple APIs, each with its own quirks, data formats, authentication mechanisms, and rate limits. This fragmentation creates significant integration overhead, making it difficult to maintain a consistent and scalable approach to AI consumption.

Beyond integration, the security implications of AI are paramount. Sending sensitive enterprise data or customer information to external AI models raises serious concerns about data privacy, intellectual property protection, and compliance with regulations such as GDPR and HIPAA. The nascent field of "prompt engineering," while crucial for guiding AI behavior, also introduces new attack vectors like prompt injection, where malicious inputs can trick an AI into revealing confidential information or performing unintended actions. Without robust security controls at the perimeter, organizations risk exposing themselves to significant data breaches and reputational damage.

Furthermore, the operational aspects of managing AI services at scale present formidable hurdles. Performance and scalability are constant concerns; AI inference, especially with LLMs, can be computationally intensive and latency-sensitive. Ensuring that AI applications remain responsive under peak loads requires sophisticated traffic management, load balancing, and caching strategies. Cost management is another critical dimension, as many AI services are billed on a per-token or per-call basis, making it essential to monitor usage, enforce quotas, and optimize routing to the most cost-effective providers without sacrificing quality. Finally, gaining observability into AI interactions—understanding what prompts are being sent, how models are responding, and identifying errors or biases—is vital for debugging, auditing, and continuous improvement. Without a dedicated infrastructure layer to address these multifaceted challenges, the promise of AI can quickly turn into an operational nightmare, hindering innovation rather than accelerating it. This intricate web of demands underscores the urgent need for a specialized AI Gateway, a purpose-built solution that can mediate, secure, and optimize the flow of data to and from intelligent services.

What is an AI Gateway? The Essential Mediator for Intelligent Services

At its core, an AI Gateway serves as an intelligent intermediary layer positioned between AI-powered applications and the diverse array of AI models they consume. Conceptually, it extends the foundational principles of a traditional API Gateway but is specifically engineered to understand and manage the unique characteristics and requirements of Artificial Intelligence workloads, particularly those involving Large Language Models (LLMs). While a conventional API gateway focuses on providing a unified entry point, enforcing security, and managing traffic for general-purpose RESTful APIs, an AI Gateway adds a layer of AI-specific intelligence and functionality.

The primary role of an AI Gateway is to abstract away the inherent complexities and heterogeneity of the AI landscape. Imagine an application that needs to leverage multiple LLMs for different tasks – one for summarization, another for translation, and perhaps a third for code generation. Each of these models might have a different API, require distinct authentication tokens, and present varying latency characteristics. Without an AI Gateway, the application would be burdened with integrating directly with each of these disparate endpoints, leading to brittle code, increased development effort, and a nightmare for maintenance. The gateway steps in to provide a single, consistent interface for all AI interactions, regardless of the underlying model or provider.

Beyond this crucial abstraction, an AI Gateway imbues the AI interaction layer with critical functionalities that enhance security, performance, cost efficiency, and operational visibility. These capabilities are not merely nice-to-haves but essential for robust, scalable, and secure AI deployments in production environments.

Key Functions of an AI Gateway:

Unified Access Layer and Model Abstraction:
- The Problem: Developers face a fragmented ecosystem of AI models, each with its own API, authentication scheme, and data format. Integrating diverse models directly into applications is complex and time-consuming.
- The Solution: An AI Gateway provides a single, standardized API endpoint for all AI services. It normalizes requests and responses, allowing applications to interact with any AI model (e.g., OpenAI, Anthropic, custom fine-tuned models, open-source LLMs) through a consistent interface. This abstraction layer means that underlying AI model changes, updates, or even migrations between providers can happen transparently to the consuming application, significantly reducing development and maintenance overhead. This is particularly valuable for an LLM Gateway that needs to support multiple generative AI providers.
- Detail: This function often involves defining a common data schema for prompts and responses, translating application requests into the specific format required by the target AI model, and converting model responses back into a standardized format for the application. It acts as a universal adapter, making AI consumption as straightforward as calling a single, well-defined endpoint.
Advanced Security Enforcement:
- The Problem: AI interactions involve sensitive data (user prompts, proprietary information, generated content) and expose new attack surfaces like prompt injection. Traditional security measures are often insufficient.
- The Solution: An AI Gateway acts as a fortified perimeter for AI services. It provides robust authentication (e.g., API keys, OAuth, JWT, mTLS) and authorization (RBAC) to ensure only legitimate users and applications can access AI models. Crucially, it incorporates AI-specific security features such as Data Loss Prevention (DLP) to scan prompts and responses for sensitive information (PII, PCI, PHI) and redact or block them before they leave the enterprise boundary. It also implements prompt injection prevention mechanisms, analyzing incoming prompts for malicious patterns or attempts to manipulate the AI's behavior, acting as a specialized Web Application Firewall (WAF) for AI.
- Detail: Security policies can be granularly applied based on user roles, application identities, or the sensitivity level of the data involved. For example, specific AI models might only be accessible to certain internal teams, or prompts containing customer PII might be automatically sanitized before being sent to an external LLM.
Intelligent Traffic Management and Resilience:
- The Problem: AI workloads can be unpredictable, with fluctuating demand and varying performance characteristics across models. Ensuring high availability, low latency, and efficient resource utilization is challenging.
- The Solution: The gateway provides sophisticated traffic management capabilities tailored for AI. This includes intelligent routing (e.g., routing requests based on model availability, cost, performance, or specific prompt characteristics), load balancing across multiple instances of an AI model or across different AI providers, and implementing resilience patterns like circuit breakers, retries, and timeouts to prevent cascading failures. It can also manage versioning for AI models, enabling safe canary releases or A/B testing of new models or prompt engineering strategies.
- Detail: For instance, if an LLM provider experiences degraded performance, the gateway can automatically failover to an alternative provider or a locally hosted model. It can also route requests for cheaper, smaller models for less complex tasks, reserving more powerful (and expensive) models for critical or complex queries.
Comprehensive Observability and Analytics:
- The Problem: Understanding how AI models are being used, their performance, and identifying errors or problematic prompts is difficult without deep visibility into individual interactions.
- The Solution: An AI Gateway acts as a central point for logging, monitoring, and tracing all AI interactions. It captures detailed metadata about each request, including the prompt, the model used, the response received, latency, token usage, and any errors encountered. This rich data can then be exported to centralized logging systems (e.g., Elasticsearch, Splunk), monitoring dashboards (e.g., Prometheus, Grafana), and distributed tracing tools (e.g., Jaeger, Zipkin) to provide end-to-end visibility into the AI lifecycle.
- Detail: This level of observability is crucial for debugging AI applications, identifying prompt engineering issues, monitoring model drift, auditing usage for compliance, and performing cost analysis. For an LLM Gateway, tracking token usage is particularly important for billing and resource allocation.
Cost Optimization and Control:
- The Problem: AI services, especially LLMs, are often billed per token or per call, leading to potentially unpredictable and high operational costs.
- The Solution: The gateway enables fine-grained cost management by tracking usage metrics (e.g., token count for LLMs) per user, application, or team. It can enforce quotas, set budget alerts, and even implement policy-driven routing to cheaper models or providers for certain types of requests.
- Detail: For example, a development team might have a lower budget for LLM usage than a production customer-facing application. The gateway can automatically enforce these limits, potentially by returning an error, rate limiting requests, or routing to a less expensive, smaller model once a threshold is reached.
Prompt and Model Governance:
- The Problem: Managing and versioning prompts, ensuring safe AI outputs, and controlling which models are used for which tasks can be chaotic.
- The Solution: An AI Gateway can provide capabilities for storing, versioning, and managing prompts, ensuring consistency and enabling A/B testing of different prompt strategies. It can also implement guardrails to filter or transform AI outputs, preventing the generation of unsafe, biased, or inappropriate content. Furthermore, it can enforce policies around model selection, ensuring that only approved models are used for specific use cases or data types.
- Detail: This feature allows organizations to maintain a "golden set" of prompts for critical business functions, experiment with new prompts in a controlled environment, and ensure that AI outputs align with ethical guidelines and brand safety standards.

In essence, an AI Gateway transforms the complex, fragmented world of AI integration into a managed, secure, and observable ecosystem. It is the crucial piece of infrastructure that allows enterprises to confidently and efficiently leverage the power of AI, mitigating risks while maximizing innovation.

Gloo AI Gateway: A Deep Dive into Its Enterprise-Grade Capabilities

Gloo AI Gateway represents a sophisticated evolution of the API Gateway concept, specifically engineered to meet the demanding requirements of enterprise AI deployments. Built by Solo.io, a company renowned for its expertise in cloud-native application networking, Gloo AI Gateway leverages battle-tested technologies like Envoy Proxy and Istio (where applicable to its broader Gloo Platform offerings) to provide a robust, scalable, and highly performant foundation for managing AI services. This robust architecture empowers organizations to simplify AI integration, fortify security, optimize performance, and gain unparalleled control over their intelligent workloads.

The power of Gloo AI Gateway lies in its ability to encapsulate the complex nuances of AI interactions behind a declarative, policy-driven interface. This approach allows platform engineers and AI teams to define rules and configurations that govern how AI models are accessed and consumed, rather than requiring application developers to hard-code these intricacies. This separation of concerns significantly accelerates development cycles, reduces errors, and ensures consistency across diverse AI initiatives.

Simplified AI Integration: Unifying the Fragmented AI Landscape

One of the most immediate and profound benefits of Gloo AI Gateway is its ability to simplify the integration of heterogeneous AI models. The current AI ecosystem is a mosaic of different providers, open-source models, and custom-trained solutions, each with its own API contract, authentication methods, and specific invocation patterns.

Connecting to Diverse LLMs and AI Models: Gloo AI Gateway acts as a universal adapter, capable of connecting to a wide array of AI services. Whether your application needs to interact with OpenAI's GPT models, Anthropic's Claude, Google's Gemini, various models hosted on platforms like Hugging Face, or even internal, custom-built machine learning models, Gloo AI provides the necessary connectors and translation layers. This eliminates the need for applications to manage distinct SDKs or API clients for each AI provider, drastically reducing code complexity and integration effort. For instance, a single endpoint exposed by Gloo AI Gateway could dynamically route requests to different LLM providers based on the prompt's content, user permissions, or even real-time cost considerations.
Unified API for Heterogeneous AI Services: The gateway abstracts away the low-level details of each AI model's API. Instead of interacting with different REST endpoints, JSON structures, and parameter names, applications can send requests to a single, standardized API exposed by Gloo AI Gateway. The gateway then translates these standardized requests into the specific format required by the target AI model and converts the model's response back into a consistent format for the application. This unified interface is invaluable for promoting consistency across an enterprise's AI portfolio, making it easier to switch between models or providers without requiring application-level code changes. This is a cornerstone for any effective LLM Gateway, ensuring flexibility and future-proofing.
Declarative Configuration: Gloo AI Gateway adopts a declarative configuration model, leveraging Kubernetes-native Custom Resource Definitions (CRDs). This means that all aspects of AI service management—from routing rules and security policies to rate limits and observability settings—are defined as YAML files. These configurations can be version-controlled, reviewed, and deployed using standard GitOps workflows, bringing the best practices of modern software development to AI infrastructure. This approach enhances transparency, reduces human error, and facilitates automated deployments, ensuring that changes to AI infrastructure are managed with the same rigor as application code.

Enhanced Security Posture for AI: Guarding the Intelligent Frontier

Security is paramount when dealing with AI services, especially given the sensitive nature of data often processed by these models and the emergence of new AI-specific attack vectors. Gloo AI Gateway provides a robust suite of security features designed to protect AI workloads and prevent data breaches or misuse.

Robust Authentication and Authorization: The gateway provides enterprise-grade authentication and authorization mechanisms. It can integrate with existing identity providers (IdPs) through standards like OAuth 2.0, OpenID Connect, and JWT, ensuring that only authenticated users and services can access AI models. Fine-grained Role-Based Access Control (RBAC) allows administrators to define precise permissions, determining which users or applications can access specific AI models, perform certain operations, or consume a defined amount of tokens. This prevents unauthorized access and ensures compliance with internal security policies.
Data Loss Prevention (DLP) for Sensitive Input/Output: A critical security feature for AI gateways is the ability to inspect and protect sensitive data. Gloo AI Gateway can be configured to perform real-time scanning of prompts before they are sent to an AI model and responses before they are returned to the application. It can identify and redact, mask, or block Personally Identifiable Information (PII), Protected Health Information (PHI), payment card industry (PCI) data, or other proprietary secrets. This proactive DLP capability is essential for preventing sensitive data from leaving the enterprise boundary or being exposed through AI interactions, ensuring compliance with privacy regulations.
Prompt Injection Prevention (WAF-like Capabilities): As AI models become more sophisticated, so do the methods to exploit them. Prompt injection attacks, where malicious users craft prompts to manipulate the AI's behavior, extract confidential information, or bypass security rules, pose a significant threat. Gloo AI Gateway incorporates intelligent prompt analysis capabilities, acting as a specialized Web Application Firewall (WAF) for AI. It can detect and mitigate common prompt injection patterns, preventing the AI model from being coerced into performing unintended actions or revealing sensitive internal data. This layer of defense is crucial for maintaining the integrity and trustworthiness of AI applications.
Auditing and Compliance: For regulated industries or environments with strict compliance requirements, detailed auditing of AI interactions is essential. Gloo AI Gateway meticulously logs all AI requests, responses, and policy enforcement actions. This comprehensive audit trail includes details such as the requesting user, the AI model invoked, the full prompt and response (potentially redacted for sensitive data), token usage, latency, and any security policies applied. This detailed logging supports forensic analysis, compliance audits, and provides irrefutable evidence of how AI services are being consumed and secured.

Advanced Traffic Management and Resilience: Ensuring AI Reliability and Performance

The performance and reliability of AI services are critical, especially for customer-facing applications. Gloo AI Gateway provides sophisticated traffic management capabilities to ensure high availability, optimal performance, and efficient resource utilization for AI workloads.

Intelligent Routing (A/B Testing, Canary Deployments for AI Models): Gloo AI Gateway enables advanced routing strategies that go beyond simple load balancing. It can route requests based on various criteria, such as the originating application, user identity, specific attributes within the prompt, or even real-time metrics of the target AI model. This facilitates sophisticated deployment patterns like A/B testing of different AI models or prompt engineering strategies, allowing organizations to compare their performance and effectiveness in a controlled manner. Canary deployments for new AI model versions or configuration changes become seamless, gradually shifting traffic to the new version while closely monitoring its performance and behavior, minimizing risks.
Load Balancing Across Multiple AI Instances/Providers: To handle high volumes of AI requests and ensure resilience, Gloo AI Gateway can intelligently distribute traffic across multiple instances of an AI model, whether they are hosted internally or provided by external vendors. This horizontal scaling capability ensures that no single AI endpoint becomes a bottleneck. Furthermore, it can perform active health checks on upstream AI services, dynamically removing unhealthy instances from the load balancing pool and re-routing traffic to healthy ones, thereby enhancing overall service reliability.
Circuit Breakers, Retries, Timeouts: Resilience patterns are fundamental for robust distributed systems, and AI services are no exception. Gloo AI Gateway implements circuit breakers to prevent cascading failures by temporarily halting requests to an unhealthy or overloaded AI model. It supports configurable retry policies, automatically reattempting failed AI requests (with exponential backoff) to overcome transient network issues or temporary service unavailability. Configurable timeouts ensure that applications don't hang indefinitely waiting for an AI response, improving user experience and system stability.
Rate Limiting to Prevent Abuse and Manage Costs: Uncontrolled access to AI services can lead to excessive costs and potential denial-of-service scenarios. Gloo AI Gateway offers comprehensive rate limiting capabilities, allowing administrators to define the maximum number of requests a user, application, or IP address can make to an AI model within a specified timeframe. This prevents abuse, protects backend AI services from being overwhelmed, and is crucial for managing operational costs, especially with token-based billing models. Policies can be applied globally, per route, or per consumer, offering granular control.

Comprehensive Observability and Analytics: Illuminating AI Interactions

Understanding the behavior and performance of AI services is vital for debugging, optimization, and auditing. Gloo AI Gateway provides deep observability into every AI interaction, offering rich insights into how models are being used and how they are performing.

Detailed Logging of AI Requests/Responses: Every request and response passing through Gloo AI Gateway is meticulously logged. This includes not only standard HTTP metadata but also AI-specific details such as the full prompt (potentially redacted), the generated response (also potentially redacted), the AI model invoked, the number of input and output tokens consumed (for LLMs), the latency of the AI model's response, and any error codes. This granular logging is indispensable for troubleshooting, understanding user intent, and identifying potential prompt engineering issues.
Metrics for Performance and Usage: The gateway automatically generates a wealth of metrics related to AI service consumption. These metrics include request rates, error rates, latency distributions, and crucially, token usage statistics per model, user, or application. These metrics are exposed in standard formats (e.g., Prometheus) and can be easily collected by existing monitoring systems. By analyzing these metrics, teams can track the performance of their AI models over time, identify bottlenecks, forecast capacity needs, and monitor usage trends for cost analysis.
Distributed Tracing for AI Interactions: For complex applications involving multiple microservices and AI calls, understanding the end-to-end flow of a request is critical. Gloo AI Gateway integrates with distributed tracing systems (e.g., Jaeger, Zipkin, OpenTelemetry) by injecting and propagating trace contexts. This allows developers to visualize the entire journey of an AI request, from the application's initial call, through the gateway, to the upstream AI model, and back. This capability is invaluable for debugging performance issues, pinpointing latency sources, and understanding the causal chain of events in distributed AI systems.
Integration with Prometheus, Grafana, Splunk: All the observability data collected by Gloo AI Gateway—logs, metrics, and traces—can be seamlessly integrated with popular enterprise monitoring and logging stacks. This ensures that AI operational data is not siloed but becomes part of a unified observability strategy, enabling operations teams to use their familiar tools for monitoring, alerting, and analysis of AI services. Custom dashboards can be built in Grafana, for example, to visualize LLM token consumption, identify top-consuming applications, and track error rates for specific AI models.

Cost Optimization and Control: Managing the AI Budget

The variable and often substantial costs associated with consuming external AI services, especially LLMs, necessitate robust cost management capabilities. Gloo AI Gateway provides the tools to track, control, and optimize AI-related expenditures.

Tracking Token Usage Per User/Application: With LLMs typically billed on a per-token basis, granular tracking of token consumption is paramount. Gloo AI Gateway accurately measures and attributes token usage to specific users, applications, or teams. This provides transparency into who is spending what on AI services, enabling accurate chargebacks, budget allocation, and identification of high-usage patterns that might warrant optimization.
Policy-Based Routing to Cheaper Models/Providers: Cost optimization isn't just about tracking; it's about intelligent decision-making. Gloo AI Gateway allows administrators to define policies that dynamically route AI requests based on cost considerations. For example, less critical or routine tasks might be routed to a cheaper, smaller LLM or an open-source model hosted internally, while high-value or complex requests are directed to a premium, more capable (and expensive) provider. This intelligent routing ensures that the right model is used for the right task at the optimal cost.
Budget Enforcement: To prevent unexpected cost overruns, Gloo AI Gateway supports the enforcement of budget limits. Organizations can set daily, weekly, or monthly spending caps for individual teams, projects, or applications. Once a budget threshold is approached or exceeded, the gateway can trigger alerts, apply stricter rate limits, or even temporarily block further AI requests until the budget is reset or increased. This proactive financial control is vital for managing enterprise AI investments.

Prompt and Model Governance: Ensuring Consistency and Safety

As enterprises increasingly rely on AI, governing the use of prompts and the selection of models becomes a strategic imperative. Gloo AI Gateway offers features that facilitate structured prompt management and responsible AI deployment.

Version Control for Prompts: Effective prompt engineering is crucial for getting reliable and high-quality outputs from LLMs. Gloo AI Gateway can help manage prompts by allowing them to be versioned and stored centrally. This ensures that applications are using approved and tested prompts, prevents "prompt drift," and makes it easy to roll back to previous versions if a new prompt performs poorly. This also enables A/B testing of different prompt variations to optimize AI performance and output quality.
A/B Testing of Prompts: Beyond version control, the gateway facilitates controlled experimentation with prompts. Different versions of a prompt can be exposed to distinct user segments or traffic percentages, allowing AI teams to quantitatively assess which prompts yield the best results (e.g., higher accuracy, better user satisfaction, lower token count) before rolling them out to all users. This data-driven approach to prompt engineering significantly improves the efficacy of AI applications.
Guardrails for AI Output: The outputs generated by AI models, especially generative AI, can sometimes be unpredictable, unsafe, or undesirable. Gloo AI Gateway can implement guardrails to filter or transform AI responses before they reach the end-user. This might involve sanitizing content, checking for PII in the output, filtering out hate speech or inappropriate content, or ensuring that responses adhere to specific brand guidelines or compliance requirements. These guardrails are essential for responsible AI deployment and maintaining user trust.
Model Selection Based on Context/Cost/Performance: The gateway provides the intelligence to dynamically select the most appropriate AI model for a given request. This selection can be based on a multitude of factors, including the criticality of the task, the sensitivity of the data, the desired latency, the required level of accuracy, and crucially, the cost of invoking different models. For example, a quick internal query might use a cheaper, faster model, while a customer-facing financial advice application would always use a highly secure, audited, and potentially more expensive LLM. This dynamic model routing ensures optimal resource allocation and performance.

By integrating these advanced capabilities, Gloo AI Gateway transforms AI consumption from a complex, risky, and costly endeavor into a streamlined, secure, and highly manageable process. It empowers organizations to rapidly innovate with AI while maintaining stringent control over security, performance, and expenses, thereby unlocking the full transformative potential of intelligent automation.

APIPark is a high-performance AI gateway that allows you to securely access the most comprehensive LLM APIs globally on the APIPark platform, including OpenAI, Anthropic, Mistral, Llama2, Google Gemini, and more.Try APIPark now! 👇👇👇

Install APIPark – it’s free

Specific Use Cases and Scenarios for Gloo AI Gateway

The versatility and robustness of Gloo AI Gateway make it an indispensable component across a multitude of enterprise scenarios, addressing specific pain points and enabling new possibilities for AI adoption. Its capabilities are particularly impactful in environments where security, scalability, and control over diverse AI models are paramount.

Enterprise AI Adoption: Centralized Control for Diverse Teams

Large enterprises often have multiple business units and development teams, each experimenting with or deploying AI models from various providers. Without a centralized management layer, this leads to a fragmented, inconsistent, and difficult-to-govern AI landscape.

Scenario: A financial institution has several teams building AI-powered applications: one for fraud detection (using an internal ML model), another for customer service chatbots (using an external LLM like GPT-4), and a third for market analysis (leveraging another external LLM like Claude).
Gloo AI Gateway Solution: The gateway provides a single, unified control plane for all these AI services. Each team can access their designated AI models through the gateway, adhering to enterprise-wide security policies, rate limits, and data governance rules. The gateway ensures that sensitive financial data sent to the fraud detection model remains within the internal network, while customer interaction data sent to external LLMs is first sanitized for PII. It provides a consistent API experience for developers across teams, accelerating development and reducing shadow IT. Operations teams gain a centralized view of all AI traffic, performance, and costs, simplifying management and auditing. This dramatically reduces the "wild west" scenario of uncontrolled AI consumption, bringing order and security to enterprise AI initiatives.

Building AI-Powered Applications: Streamlining Development and Improving Resilience

Developers building applications that integrate AI face challenges related to integrating diverse AI APIs, ensuring application resilience against AI model outages, and managing prompt engineering complexities.

Scenario: A software company is developing a new content creation platform that relies heavily on LLMs for generating text, summarizing articles, and translating content. They want to integrate with multiple LLM providers for redundancy and specific task performance.
Gloo AI Gateway Solution: The gateway simplifies the application's codebase by abstracting away the specifics of each LLM provider. The application only needs to call a single gateway endpoint. Gloo AI Gateway can then intelligently route requests to the best-performing or most cost-effective LLM provider for each specific task (e.g., OpenAI for creative writing, Google Gemini for factual summarization). If one LLM provider experiences an outage, the gateway can automatically failover to another, ensuring the application remains operational and providing a seamless user experience. Furthermore, the gateway can manage and version prompts, allowing developers to iterate on prompt engineering strategies without modifying application code, streamlining the development process and improving the overall resilience of the AI-powered application.

Securing Sensitive AI Workloads: Financial, Healthcare, and Personal Data

Industries handling highly sensitive data (e.g., financial records, patient health information) have stringent regulatory requirements and face high risks if data is exposed through AI interactions.

Scenario: A healthcare provider wants to use an LLM for summarizing patient notes and assisting doctors with differential diagnoses. However, they cannot directly send raw patient data to an external, third-party LLM due to HIPAA compliance and data privacy concerns.
Gloo AI Gateway Solution: Gloo AI Gateway acts as a crucial security intermediary. Before any patient notes are sent to the external LLM, the gateway's Data Loss Prevention (DLP) capabilities automatically scan and redact or mask all Protected Health Information (PHI) such as patient names, dates of birth, social security numbers, and medical record numbers. Only sanitized, non-identifiable information is forwarded to the LLM. When the LLM generates a summary, the gateway can also apply guardrails to ensure the output adheres to medical ethics and does not contain potentially harmful or speculative advice. All these interactions are meticulously logged for auditing purposes, demonstrating compliance and providing an ironclad layer of protection for sensitive healthcare data.

Multicloud/Hybrid Cloud AI Deployments: Consistent Management Across Environments

Many enterprises operate in hybrid or multi-cloud environments, making it challenging to maintain consistent AI management and security policies across disparate infrastructure.

Scenario: A global retailer uses internal ML models hosted on-premises for inventory optimization and external LLMs in a public cloud for customer sentiment analysis from social media feeds. They need a unified way to manage and secure these diverse AI services.
Gloo AI Gateway Solution: Gloo AI Gateway can be deployed consistently across on-premises data centers and various public cloud environments, providing a single, coherent management plane for all AI interactions. It ensures that security policies (authentication, authorization, DLP) and traffic management rules (rate limiting, routing) are uniformly applied, regardless of where the AI model or the consuming application resides. This consistency simplifies operations, reduces the risk of policy misconfigurations, and enables the retailer to leverage the best AI solutions from any environment without creating management silos.

Managing LLMs at Scale: Handling High Traffic and Optimizing Costs

The scale and cost implications of LLM usage can be substantial, especially for applications with high user traffic. Efficient management of LLM resources is critical for economic viability.

Scenario: An e-commerce platform uses an LLM to generate product descriptions dynamically for millions of items. This results in extremely high token usage and varied performance requirements.
Gloo AI Gateway Solution: As an LLM Gateway, Gloo AI Gateway is perfectly suited for this. It provides sophisticated load balancing and intelligent routing to distribute the immense traffic across multiple LLM instances or providers, preventing any single point of failure and ensuring high throughput. Its cost optimization features track token usage per product category or even per user, allowing the platform to analyze spending patterns. Based on this analysis, the gateway can be configured to use a cheaper, smaller LLM for less critical product descriptions and reserve a more advanced, more expensive LLM for premium or high-visibility items. It also implements rate limiting to prevent individual users or rogue applications from incurring excessive costs, thereby keeping the LLM budget under control while maintaining performance.

AI Model Experimentation and Lifecycle Management: Canary Releases and A/B Testing

Innovating with AI often involves experimenting with new models, fine-tuning existing ones, or trying different prompt strategies. Managing the lifecycle of these experiments safely and efficiently is crucial.

Scenario: An R&D team wants to test a new version of their internal recommendation engine (an ML model) or a new prompt strategy for their content generation LLM without disrupting the production application.
Gloo AI Gateway Solution: Gloo AI Gateway facilitates safe experimentation through capabilities like canary releases and A/B testing. The team can deploy the new model or prompt variation behind the gateway and direct a small percentage of live traffic to it. The gateway meticulously collects metrics and logs for this canary release, allowing the team to monitor its performance, stability, and impact on user experience in real-time. If the new version performs well, traffic can be gradually shifted; if issues arise, it can be immediately rolled back. This controlled experimentation environment accelerates innovation while minimizing risks to the production system, allowing for continuous improvement of AI models and prompt engineering.

These examples illustrate how Gloo AI Gateway serves as a pivotal piece of infrastructure, transforming the challenges of AI integration and management into opportunities for streamlined operations, enhanced security, and accelerated innovation across diverse enterprise use cases.

Comparing AI Gateways and the Evolving Ecosystem

The emergence of the AI Gateway as a distinct product category underscores the specific and pressing challenges that AI workloads bring to enterprise infrastructure. While the fundamental principles might echo those of traditional API Gateways, the AI-specific functionalities differentiate them significantly. Traditional API gateways excel at routing, authenticating, and managing standard RESTful APIs, focusing on HTTP methods, resource paths, and general request/response schemas. They are less concerned with the semantic content of requests or responses, the intricacies of prompt engineering, token economics, or the specific security vulnerabilities unique to AI models like prompt injection.

Table 1: Traditional API Gateway vs. AI Gateway (LLM Gateway Focus)

Feature	Traditional API Gateway	AI Gateway (LLM Gateway)
Primary Focus	General HTTP API management, microservices	AI/LLM service management, AI-specific security
Core Functions	Routing, AuthN/AuthZ, Rate Limiting, Load Bal.	Above + Model Abstraction, Prompt Mgmt, Token Tracking, DLP, AI Guardrails
Data Awareness	Headers, URL, basic payload validation	Semantic content of prompts/responses, PII detection, sensitive data filtering
Security Scope	OWASP Top 10, general API attacks	Above + Prompt Injection, Model Manipulation, Data Exfiltration via AI
Cost Management	Request count, bandwidth	Token usage, model-specific billing, dynamic cost-based routing
Observability	HTTP metrics, request/response logs	AI-specific metrics (tokens, model ID), prompt/response logging (redacted), model tracing
Integration	General REST/gRPC endpoints	Heterogeneous AI APIs (OpenAI, Hugging Face, custom), unified invocation
Prompt Management	Not applicable	Versioning, A/B testing, guardrails, templating
Model Versioning	Not applicable	Canary releases of AI models, routing based on model version
Resilience	Circuit breakers, retries, timeouts	Above + intelligent failover between AI providers

Gloo AI Gateway, with its foundation in robust cloud-native technologies, positions itself at the forefront of this specialized category. It leverages the strengths of Envoy Proxy for high-performance traffic management and Istio (in its broader platform context) for powerful service mesh capabilities, extending these to intelligently manage AI workloads. This allows for deep integration with Kubernetes and modern infrastructure practices, offering a comprehensive solution for enterprises already invested in cloud-native ecosystems.

However, the evolving landscape of AI infrastructure also presents other innovative solutions. For instance, APIPark stands out as an open-source AI Gateway and API management platform that offers a compelling alternative or complementary tool for comprehensive AI and API governance. Developed under the Apache 2.0 license, APIPark is designed to help developers and enterprises manage, integrate, and deploy both AI and traditional REST services with remarkable ease and flexibility.

APIPark's Key Strengths, as an Open Source AI Gateway:

Quick Integration of 100+ AI Models: APIPark provides the capability to integrate a vast array of AI models with a unified management system. This simplifies the often-tedious process of connecting to different AI providers, offering a consistent approach for authentication and crucial cost tracking across diverse models. This means whether you're working with a cutting-edge LLM or a specialized image recognition model, APIPark can bring it under a single management umbrella quickly.
Unified API Format for AI Invocation: A significant challenge in multi-AI environments is the varying API formats. APIPark addresses this by standardizing the request data format across all integrated AI models. This standardization is a game-changer because it ensures that changes in underlying AI models or even prompt updates do not necessitate modifications to your application or microservices. This drastically simplifies AI usage and reduces long-term maintenance costs, providing unparalleled agility.
Prompt Encapsulation into REST API: One of APIPark's particularly innovative features is the ability for users to quickly combine AI models with custom prompts to create new, specialized REST APIs. For example, you could encapsulate a complex sentiment analysis prompt with an LLM and expose it as a simple POST /sentiment endpoint. This democratizes access to sophisticated AI functions, allowing developers to consume AI capabilities as standard REST services without needing deep AI expertise.
End-to-End API Lifecycle Management: Beyond AI, APIPark excels as a comprehensive API management platform. It assists with the entire lifecycle of APIs, from initial design and publication to invocation and eventual decommissioning. This includes regulating API management processes, managing traffic forwarding, sophisticated load balancing, and versioning of published APIs. This holistic approach ensures that both AI and traditional APIs are governed with the same level of rigor and control.
API Service Sharing within Teams: Collaboration is key in modern development. APIPark facilitates this by offering a centralized display of all API services. This makes it incredibly easy for different departments, teams, or even external partners to discover, understand, and use the required API services, fostering an ecosystem of internal and external API consumption.
Independent API and Access Permissions for Each Tenant: For organizations with multiple business units or external partners, APIPark enables the creation of multiple teams (tenants). Each tenant can have independent applications, data, user configurations, and security policies, all while sharing underlying applications and infrastructure. This multi-tenancy improves resource utilization and significantly reduces operational costs, offering strong isolation without excessive overhead.
API Resource Access Requires Approval: Security is paramount. APIPark allows for the activation of subscription approval features, ensuring that callers must subscribe to an API and await administrator approval before they can invoke it. This prevents unauthorized API calls and potential data breaches, adding an essential layer of human oversight to API access.
Performance Rivaling Nginx: Performance is not sacrificed for features. APIPark boasts impressive performance, capable of achieving over 20,000 Transactions Per Second (TPS) with just an 8-core CPU and 8GB of memory. It also supports cluster deployment, making it highly scalable to handle massive traffic loads, positioning it as a robust backbone for demanding enterprise environments.
Detailed API Call Logging: Comprehensive logging is essential for operational visibility and troubleshooting. APIPark provides granular logging capabilities, recording every detail of each API call. This feature is invaluable for businesses needing to quickly trace and diagnose issues, ensuring system stability and data security while supporting audit trails.
Powerful Data Analysis: Leveraging its detailed log data, APIPark analyzes historical call data to display long-term trends and performance changes. This predictive insight helps businesses perform preventive maintenance and identify potential issues before they impact services, contributing to a more proactive operational posture.

APIPark, being an open-source product from Eolink, a leader in API lifecycle governance, represents a powerful, community-driven solution that democratizes access to advanced AI Gateway and API management functionalities. While Gloo AI Gateway offers robust, commercially backed features often appealing to large enterprises deeply invested in the Solo.io ecosystem, APIPark provides a flexible, high-performance, and feature-rich open-source alternative or complement, especially for organizations prioritizing community engagement and extensive multi-model integration capabilities at a foundational level. The choice between solutions often hinges on existing infrastructure, specific feature requirements, and strategic alignment with open-source versus commercial offerings. Both signify the growing recognition that specialized gateways are indispensable for navigating the complexities of the modern AI and API landscape.

Implementation Considerations and Best Practices for Gloo AI Gateway

Successfully deploying and operationalizing Gloo AI Gateway, or any sophisticated AI Gateway, within an enterprise environment requires careful planning and adherence to best practices. Moving from conceptual understanding to practical implementation involves several key considerations that impact scalability, security, maintainability, and overall system resilience.

1. Deployment Strategies: Cloud-Native First

Gloo AI Gateway is built on cloud-native principles, making Kubernetes the ideal deployment platform. This alignment allows it to leverage Kubernetes' orchestration capabilities for scalability, high availability, and simplified management.

Kubernetes-Native Deployment: Deploy Gloo AI Gateway as a set of custom resources within your Kubernetes clusters. This allows you to define gateway configurations, routing rules, and security policies using familiar YAML manifests, integrating seamlessly with existing GitOps pipelines. For multi-cluster or multi-cloud environments, consider deploying Gloo AI Gateway instances in each cluster or region and potentially federating their management for a unified control plane. Ensure your Kubernetes clusters are adequately resourced (CPU, memory, network) to handle the expected AI traffic, especially during peak loads.
Infrastructure as Code (IaC): Treat your Gloo AI Gateway configurations, including its Custom Resource Definitions (CRDs), as code. Use tools like Terraform or Pulumi to manage the underlying infrastructure (Kubernetes clusters, network policies) and use Git for versioning and collaboration on gateway configurations. This ensures repeatability, auditability, and reduces manual errors during deployment and updates.
High Availability: Deploy Gloo AI Gateway components with appropriate replicas and anti-affinity rules to ensure they are spread across different nodes and availability zones. This minimizes the impact of node failures. Leverage Kubernetes' built-in health checks and self-healing capabilities to automatically restart or reschedule failed gateway pods.

2. Integration with CI/CD Pipelines: Automating AI Infrastructure

Integrating Gloo AI Gateway configurations into your existing Continuous Integration/Continuous Delivery (CI/CD) pipelines is crucial for agile development and reliable deployments.

Automated Configuration Deployment: Automate the deployment of gateway configurations (e.g., new AI routes, updated security policies, prompt versions) as part of your application release process. When an AI-powered application is deployed or updated, its corresponding gateway configurations should be automatically applied.
Testing Gateway Configurations: Include automated tests for your gateway configurations within your CI/CD pipeline. This could involve integration tests that send mock AI requests through the gateway to verify routing, authentication, and policy enforcement, ensuring that changes don't inadvertently break existing AI services.
Rollback Capabilities: Design your CI/CD pipelines with robust rollback mechanisms. If a new gateway configuration causes issues, you should be able to quickly revert to a previous stable version with minimal downtime. GitOps principles, where Git serves as the single source of truth for your configurations, greatly facilitate this.

3. Monitoring and Alerting: Staying Ahead of AI Issues

Comprehensive monitoring and proactive alerting are essential for maintaining the health, performance, and security of your AI services mediated by Gloo AI Gateway.

Centralized Observability Stack: Integrate Gloo AI Gateway's metrics, logs, and traces into your existing centralized observability platforms (e.g., Prometheus/Grafana for metrics, Elasticsearch/Splunk for logs, Jaeger/OpenTelemetry for traces). This provides a unified view of your entire application and AI infrastructure.
Key Metrics to Monitor: Pay close attention to AI-specific metrics like requests per second to AI models, error rates (e.g., 4xx, 5xx responses from upstream AI), latency distributions (p99, p95 latencies), and particularly for LLMs, input/output token counts. Monitor gateway resource utilization (CPU, memory) to identify potential bottlenecks.
Proactive Alerting: Set up alerts for critical conditions such as spikes in AI service error rates, unusually high latency, unexpected changes in token consumption, or security policy violations (e.g., DLP triggering frequently). Integrate these alerts with your incident management systems to ensure prompt notification and resolution.

4. Scalability Planning: Growing with Your AI Demands

As AI adoption grows, your AI Gateway must scale seamlessly to meet increasing demands without compromising performance.

Horizontal Scaling: Gloo AI Gateway components are designed to scale horizontally. Plan for automatic scaling (e.g., Kubernetes Horizontal Pod Autoscalers) based on CPU utilization, memory consumption, or custom metrics like AI request queue depth.
Capacity Planning: Regularly review your AI service usage patterns and forecast future demands. This includes anticipating peak loads, understanding how new AI features might impact traffic, and planning for sufficient underlying infrastructure resources (e.g., compute, network bandwidth) for both the gateway and the AI models it manages.
Caching Strategies: Explore implementing caching at the gateway level for frequently requested AI responses, especially for deterministic AI models or prompts. This can significantly reduce latency and offload upstream AI services, improving scalability and potentially reducing costs.

5. Security Hardening: A Multi-Layered Defense

Beyond the AI-specific security features of Gloo AI Gateway, apply general security best practices to the gateway itself and its surrounding environment.

Least Privilege: Configure Gloo AI Gateway with the principle of least privilege. Its service accounts in Kubernetes should only have the permissions absolutely necessary to perform its functions.
Network Segmentation: Deploy Gloo AI Gateway in a dedicated network segment (e.g., a specific Kubernetes namespace or VPC subnet) with strict network policies controlling ingress and egress traffic. Only allow necessary connections to and from the gateway.
Secrets Management: Securely manage API keys, authentication tokens, and other sensitive credentials used by the gateway to interact with AI models. Use a robust secrets management solution (e.g., Kubernetes Secrets, HashiCorp Vault) and ensure these secrets are encrypted at rest and in transit.
Regular Audits and Updates: Regularly audit your gateway configurations for security vulnerabilities or misconfigurations. Keep Gloo AI Gateway and its underlying components (e.g., Envoy Proxy, Kubernetes) updated to the latest stable versions to benefit from security patches and performance improvements.
AI-Specific Security Policies: Continuously refine and update your AI-specific security policies within the gateway, such as DLP rules and prompt injection prevention mechanisms, as new AI threats emerge or as the sensitivity of your AI workloads evolves.

By meticulously addressing these implementation considerations and adhering to these best practices, organizations can maximize the value of Gloo AI Gateway, transforming it from a mere piece of software into a strategic asset that underpins secure, scalable, and intelligent AI operations. This thoughtful approach ensures that your enterprise can fully realize the transformative potential of AI, confidently and efficiently.

Conclusion: The Indispensable Role of Gloo AI Gateway in the Intelligent Enterprise

The rapid proliferation of Artificial Intelligence, particularly the transformative capabilities of Large Language Models, has ushered in an era of unprecedented innovation and disruption. Yet, beneath the surface of these powerful technologies lies a complex web of infrastructure, security, and management challenges. Enterprises embracing AI must navigate a fragmented ecosystem of models, mitigate new security risks like prompt injection, ensure high performance and scalability, and meticulously control spiraling costs associated with AI consumption. The traditional API Gateway, while foundational for microservices, simply lacks the specialized intelligence and features required to effectively manage and secure these intelligent workloads.

This is precisely where Gloo AI Gateway emerges as an indispensable architectural component for the modern intelligent enterprise. By serving as an intelligent intermediary, it transforms the chaotic landscape of AI integration into a streamlined, secure, and observable ecosystem. Gloo AI Gateway provides a unified access layer that abstracts away the complexities of diverse AI models, allowing developers to consume AI services through a consistent interface, regardless of the underlying provider or technology. This simplification accelerates development cycles and fosters agility, enabling organizations to experiment and innovate with AI at an unprecedented pace.

Beyond simplification, Gloo AI Gateway significantly fortifies the security posture of AI services. Its robust authentication and authorization mechanisms, combined with AI-specific capabilities like Data Loss Prevention (DLP) for sensitive input/output and advanced prompt injection prevention, act as a critical last line of defense. These features protect proprietary data, ensure compliance with stringent regulations, and safeguard against novel AI-specific attack vectors, instilling confidence in the secure deployment of AI.

Furthermore, Gloo AI Gateway offers unparalleled control over AI operations and costs. Its advanced traffic management capabilities, including intelligent routing, load balancing, and resilience patterns, ensure that AI applications remain highly available and performant even under extreme loads. Crucially, its comprehensive observability tools provide deep insights into AI usage, performance, and errors, while granular cost optimization features (such as token usage tracking and policy-based routing to cheaper models) empower organizations to manage their AI budget effectively. For organizations seeking a powerful and flexible open-source alternative or complement, platforms like APIPark also demonstrate the breadth of innovation in this space, offering extensive multi-model integration and robust API management capabilities.

In essence, Gloo AI Gateway is not just a piece of infrastructure; it is a strategic enabler. It bridges the gap between the raw power of AI models and the practical demands of enterprise-grade deployment, allowing businesses to confidently and efficiently harness the full potential of intelligent automation. As AI continues to evolve and become more deeply embedded in every facet of business operations, the role of a specialized AI Gateway or LLM Gateway will only grow in importance, becoming the critical backbone for simplified, secure, and sustainable AI innovation. Embracing such a solution is no longer an option but a strategic imperative for any organization committed to leading in the intelligent future.

Frequently Asked Questions (FAQs)

1. What is the primary difference between a traditional API Gateway and an AI Gateway like Gloo AI Gateway? A traditional API Gateway primarily focuses on managing standard RESTful APIs, handling authentication, authorization, routing, and rate limiting based on HTTP protocols and general API contracts. An AI Gateway, such as Gloo AI Gateway, extends these capabilities with AI-specific intelligence. It understands the semantic content of prompts, manages token usage for LLMs, provides Data Loss Prevention (DLP) for sensitive AI data, offers prompt injection prevention, and enables intelligent routing based on AI model characteristics, cost, or performance. Essentially, it adds a layer of AI-aware management and security that traditional gateways lack.

2. How does Gloo AI Gateway help with managing costs for LLMs? Gloo AI Gateway offers several mechanisms for cost optimization. It meticulously tracks token usage for Large Language Models (LLMs) per user, application, or team, providing granular visibility into spending. It can implement policy-based routing, directing requests to cheaper LLM models or providers for less critical tasks, while reserving more expensive models for high-value operations. Additionally, it supports budget enforcement by setting quotas and triggering alerts or rate limits when spending thresholds are approached or exceeded, giving organizations proactive control over their AI expenditures.

3. Can Gloo AI Gateway secure sensitive data exchanged with external AI models? Absolutely. Security for sensitive data is a cornerstone of Gloo AI Gateway. It provides advanced Data Loss Prevention (DLP) capabilities that can scan prompts and responses in real-time for Personally Identifiable Information (PII), Protected Health Information (PHI), or other proprietary secrets. It can then redact, mask, or block this sensitive data before it ever leaves your enterprise boundary or before an AI response reaches a user, ensuring compliance with data privacy regulations like GDPR and HIPAA. This prevents inadvertent exposure of confidential information to external AI services.

4. How does Gloo AI Gateway support multi-cloud or hybrid-cloud AI deployments? Gloo AI Gateway is designed with cloud-native principles, allowing for consistent deployment and management across various environments, whether on-premises Kubernetes clusters or multiple public cloud providers. By providing a unified control plane, it ensures that security policies, traffic management rules, and observability configurations are uniformly applied to all AI services, regardless of where they or the consuming applications are hosted. This consistency simplifies operations, reduces configuration drift, and enables organizations to leverage AI resources across their entire distributed infrastructure.

5. What is "prompt injection prevention" and how does Gloo AI Gateway address it? Prompt injection is a security vulnerability where a malicious user crafts a prompt to manipulate an AI model into ignoring its intended instructions, revealing confidential information, or performing unintended actions. Gloo AI Gateway addresses this by acting as an intelligent intermediary. It incorporates specialized WAF-like capabilities to analyze incoming prompts for patterns indicative of injection attempts. By detecting and mitigating these malicious prompts before they reach the AI model, Gloo AI Gateway protects the integrity and security of your AI applications, preventing the AI from being coerced into undesirable behavior or exposing sensitive data.

🚀You can securely and efficiently call the OpenAI API on APIPark in just two steps:

Step 1: Deploy the APIPark AI gateway in 5 minutes.

APIPark is developed based on Golang, offering strong product performance and low development and maintenance costs. You can deploy APIPark with a single command line.

curl -sSO https://download.apipark.com/install/quick-start.sh; bash quick-start.sh

In my experience, you can see the successful deployment interface within 5 to 10 minutes. Then, you can log in to APIPark using your account.

Step 2: Call the OpenAI API.