As an e-commerce shopper on the hunt for a product, when you search for anything, your request is sent over to the application’s API gateway. This gateway is responsible for connecting with numerous backend services, including product catalog, user authentication, and inventory. And here’s the amazing thing—all that connection happens seamlessly and within a fraction of a second to a few seconds. The time it takes to bounce between these services and return a response to you is called latency.
To ensure a positive user experience, minimizing API gateway latency is key, and the first step is being able to measure and understand it. OpenTelemetry (OTel) is an observability framework built for exactly that. It offers standard instrumentation libraries, APIs, and tooling that help developers track and understand application performance, especially in setups like microservices. For instance, in an e-commerce app connected to multiple backend services via an API gateway, OpenTelemetry provides insights into how the various services interact, their latencies, their processing steps, and any bottlenecks within the application.
In this tutorial, you’ll learn how to use OpenTelemetry to observe, diagnose, and reduce latency at your API gateway, along with some best practices that can help you conquer your API gateway latency issues.
What is API gateway latency?
API gateway latency refers to the time it takes for an API gateway to receive a request, run it through the gateway’s various processing stages, aggregate or compose the final response, and send it back to the client. Latency is the result of various factors, including network communication, request processing, authentication, authorization, the volume of data being transmitted, response times from backend servers, and communication times between services in a microservices setup.
Consider an e-commerce platform again. In this context, the various services not only boost API functionality and security but also add extra processing steps that affect response time. The wait time you experience determines if your application is high latency or low latency. Low-latency applications provide quick responses with minimal communication and processing time. In contrast, high latency means longer response times, often resulting in a poor user experience.
High API gateway latency can be attributed to three main factors: network issues, inadequate server resources, and unoptimized code:
1. Network issues: This encompasses slow or unreliable internet connections, considerable geographical distance between the API gateway and various microservices, network congestion during peak usage, and delays in DNS resolution. These factors collectively extend the time requests and responses spend traversing the network before the API gateway can prepare a response.
2. Inadequate server resources: When an API gateway server or backend service lacks necessary resources, like RAM, storage, CPU, and network speed, issues arise. This can lead to delays in tasks like authentication, authorization, request validation, data retrieval, and processing—all resulting in higher latency.
3. Unoptimized code: This introduces problems such as excessive processing time due to redundant computations (e.g., unnecessary loops or inefficient algorithms), memory leaks, blocking operations without asynchronous handling, lack of data caching, inefficient data formats, and poor resource usage (the sketch below illustrates blocking versus concurrent backend calls). It can also hinder horizontal scaling efforts, affecting performance as traffic increases.
These factors collectively contribute to higher API latency and an inferior user experience.
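To make the “blocking operations” point concrete, here’s a minimal, self-contained Go sketch (not from the sample app) comparing a sequential fan-out to backend services with a concurrent one. The service names and delays are made up purely for illustration:

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

// fetchFromService simulates a call to one backend service behind the gateway.
func fetchFromService(name string) string {
    time.Sleep(300 * time.Millisecond) // stand-in for network + processing time
    return name + " response"
}

func main() {
    services := []string{"catalog", "inventory", "auth"}

    // Sequential (blocking) fan-out: total latency is the sum of all calls.
    start := time.Now()
    for _, s := range services {
        fetchFromService(s)
    }
    fmt.Println("sequential:", time.Since(start))

    // Concurrent fan-out: total latency is roughly the slowest single call.
    start = time.Now()
    var wg sync.WaitGroup
    for _, s := range services {
        wg.Add(1)
        go func(name string) {
            defer wg.Done()
            fetchFromService(name)
        }(s)
    }
    wg.Wait()
    fmt.Println("concurrent:", time.Since(start))
}
```

With three simulated 300 ms calls, the sequential version takes roughly 900 ms while the concurrent version takes roughly 300 ms, which is exactly the kind of difference that shows up as API gateway latency.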
Observe and measure API gateway latency
Now that you understand what API gateway latency is, dive into how you can set up your application or microservices to gather telemetry data. With this data, you are able to observe and identify what’s causing latency in your API services. Observability refers to the ability to use data produced by a system, application, or infrastructure to indicate its performance or current state. This data is usually captured in the form of logs, metrics, and traces:
* Logs are records of events in a software system, including errors, file access, and warnings.
* Metrics are numerical values used to measure system performance, such as CPU usage, memory, and bandwidth.
* Traces record the journey of a request through an application, showing processing times and interactions with services. Traces help you pinpoint errors and slow areas, especially in distributed systems. Individual operations within a trace are called spans (e.g., database operations or API calls); the sketch after this list shows what creating a span looks like in code.
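Here’s a minimal, hypothetical Go sketch of wrapping an operation in a span using the OpenTelemetry API. The `processOrder` operation and the `example-service` tracer name are placeholders, not part of the sample app:

```go
package main

import (
    "context"
    "fmt"

    "go.opentelemetry.io/otel"
)

// processOrder is a hypothetical operation. The span around it records how long
// the operation takes and becomes one step in the overall request trace.
func processOrder(ctx context.Context) error {
    tracer := otel.Tracer("example-service")
    _, span := tracer.Start(ctx, "process-order")
    defer span.End()

    // ... the actual work (a database call, an API call, etc.) would happen here ...
    return nil
}

func main() {
    // Without a configured tracer provider this uses a no-op tracer, but the
    // shape of the API is the same once a real provider is set up.
    if err := processOrder(context.Background()); err != nil {
        fmt.Println("operation failed:", err)
    }
}
```

Each such span ends up as one row in the trace views you’ll see later in this tutorial.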
Solution architecture and prerequisites
In this tutorial, all traces are captured from the application and sent to the OpenTelemetry Collector. Once the data reaches the Collector, you can push it to the observability backend of your choice.
Before you begin, you need to familiarize yourself with the following:
* Instrumented microservice/app: This refers to the various microservices you may be running in your environment. These can be written in any language of your choice. This guide uses a simple REST API service for a parcel delivery application, written in Go and intentionally made to simulate a high-latency application. It’s instrumented to send traces to the OpenTelemetry Collector, and you’ll run it in both its high-latency and low-latency versions.
* OpenTelemetry Collector: You need an installed and running instance of the contrib version of the OpenTelemetry Collector. If you’re familiar with Docker, you can easily get started with the Docker image; otherwise, download the package built for your operating system from the project’s Releases page. Later, you’ll provide your instance with a YAML configuration file that tells the Collector how to receive, process, and export data to the observability backend of your choice.
* Observability backends: This will be the observability platform of your choice, where you’ll visualize the incoming data from your instrumented application. This tutorial will use Logz.io, TelemetryHub, and SigNoz Cloud.
Set up the sample app
To begin, grab the sample application from this GitHub repository and set up a Go development environment. This repository has three branches:
* The `basic-app` branch holds the application with high latency and is not instrumented.
* The `instrumented` branch is an instrumented version of the basic app with high latency introduced using random wait times between operations.
* The `main` branch is an instrumented version of the API without the high-latency simulation.
This repository also contains a small program written to generate enough POST, PUT, and GET traffic to the API service so that you can have enough trace data to visualize on the respective backends. You can also find all the YAML configuration files for the respective backends in the repository.
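For reference, a rough sketch of what such a traffic generator could look like is shown below. It’s an illustrative approximation only; the base URL, route, and payload fields are assumptions, and the actual program in the repository may differ:

```go
package main

import (
    "bytes"
    "fmt"
    "net/http"
    "time"
)

func main() {
    // The base URL and payload below are assumptions for illustration only.
    base := "http://localhost:8080/parcels"
    payload := []byte(`{"sender":"Alice","recipient":"Bob","weight":2}`)

    for i := 0; i < 100; i++ {
        // Create a record.
        if resp, err := http.Post(base, "application/json", bytes.NewReader(payload)); err == nil {
            resp.Body.Close()
        }

        // Update a record (ID 1 assumed to exist after the first POST).
        req, _ := http.NewRequest(http.MethodPut, base+"/1", bytes.NewReader(payload))
        req.Header.Set("Content-Type", "application/json")
        if resp, err := http.DefaultClient.Do(req); err == nil {
            resp.Body.Close()
        }

        // Read the records back.
        if resp, err := http.Get(base); err == nil {
            resp.Body.Close()
        }

        fmt.Println("iteration", i, "complete")
        time.Sleep(200 * time.Millisecond) // pace the requests so traces spread out over time
    }
}
```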
Then check out the `instrumented` branch with this command:

```bash
$ git checkout instrumented
```
Now, you’re ready to follow along with the instrumentation process.
Instrument the app
To start collecting information from your API service, you need to instrument your service using the respective OpenTelemetry libraries for the programming language of your service. Here, the application is written in Go using the Gin web framework and the GORM ORM for database interactions. The otelgin and otelgorm packages will be used to instrument the application.
To find other available libraries, tracer implementations, and utilities for instrumenting your tech stack, check out the OpenTelemetry Registry. In this article, you’ll learn how to implement automatic, manual, and database instrumentation; the last of these helps you understand the performance of your database queries.
Automatic instrumentation
Automatic instrumentation enables you to easily collect data from your app by importing the instrumentation library and adding a few lines of code to your application. This may or may not be available and supported depending on your tech stack. You can check the OpenTelemetry Registry to find out. In this case, it’s partially supported.
The `opentelemetry.go` file contains the code for the instrumentation of the app, and the following `initTracer()` function from that file shows you how the tracer is initiated:
```go
func initTracer() func(context.Context) error {
    //Setting the Service name from the environmental variable if exists
    if strings.TrimSpace(os.Getenv("SERVICE_NAME")) != "" {
        serviceName = os.Getenv("SERVICE_NAME")
    }
    //Setting the Collector endpoint from the environmental variable if exists
    if strings.TrimSpace(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")) != "" {
        collectorURL = os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
    }
    //Setting up the exporter for the tracer
    exporter, err := otlptrace.New(
        context.Background(),
        otlptracegrpc.NewClient(
            otlptracegrpc.WithInsecure(),
            otlptracegrpc.WithEndpoint(collectorURL),
        ),
    )
    //Log a fatal error if exporter could not be setup
    if err != nil {
        log.Fatal(err)
    }
    //Setting up the resources for the tracer. This includes the context and other attributes
    //to identify the source of the traces
    resources, err := resource.New(
        context.Background(),
        resource.WithAttributes(
            attribute.String("service.name", serviceName),
            attribute.String("language", "go"),
        ),
    )
    if err != nil {
        log.Println("Could not set resources: ", err)
    }
    //Using the resources and exporter to set up a trace provider
    otel.SetTracerProvider(
        sdktrace.NewTracerProvider(
            sdktrace.WithSampler(sdktrace.AlwaysSample()),
            sdktrace.WithBatcher(exporter),
            sdktrace.WithResource(resources),
        ),
    )
    return exporter.Shutdown
}
```
In the `main.go` file, `initTracer()` is called to create a trace provider, which is then injected into the router as middleware with this line of code:
```go
//Adding otelgin as middleware to auto-instrument ALL API requests
router.Use(otelgin.Middleware(serviceName))
```
This automatically traces all the API requests made to the service.
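For context, here’s a hedged sketch of how `main.go` might wire these pieces together, assuming `initTracer()` and `serviceName` come from the `opentelemetry.go` file shown above. The route and port are placeholders; the actual file in the repository may differ:

```go
package main

import (
    "context"
    "net/http"

    "github.com/gin-gonic/gin"
    "go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin"
)

func main() {
    // initTracer() and serviceName are defined in opentelemetry.go (shown above).
    cleanup := initTracer()
    defer cleanup(context.Background())

    router := gin.Default()

    // Auto-instrument every incoming request with the otelgin middleware.
    router.Use(otelgin.Middleware(serviceName))

    // Placeholder route; the real handlers live in router.go in the sample repo.
    router.GET("/health", func(c *gin.Context) {
        c.JSON(http.StatusOK, gin.H{"status": "ok"})
    })

    router.Run(":8080")
}
```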
Manual instrumentation
In this use case, automatic instrumentation is limited: it only covers the request as a whole and doesn’t necessarily capture details about the individual operations or processes inside your application. (How much automatic instrumentation covers varies by language and library.) Manual instrumentation, by contrast, requires you to specify the sections of your application code that you want to instrument.
Navigate back to the `opentelemetry.go` file. You’ll notice another function there, `ChildSpan()`, which accepts the Gin context, an action and name string, and a task function, and returns an error:
```go
// ChildSpan A utility function to create child spans for specific operations
func ChildSpan(c *gin.Context, action, name string, task func() error) error {
    //Setting up a tracer either from the existing context or creating a new one
    var tracer trace.Tracer
    tracerInterface, ok := c.Get(tracerKey)
    if ok {
        tracer, ok = tracerInterface.(trace.Tracer)
    }
    if !ok {
        tracer = otel.GetTracerProvider().Tracer(
            tracerName,
            trace.WithInstrumentationVersion(otelgin.Version()),
        )
    }
    savedContext := c.Request.Context()
    defer func() {
        c.Request = c.Request.WithContext(savedContext)
    }()
    //Adding attributes to identify the operation captured in this span
    opt := trace.WithAttributes(attribute.String("service.action", action))
    _, span := tracer.Start(savedContext, name, opt)
    // Simulate delay in operation
    time.Sleep(time.Millisecond * time.Duration(randomDelay(400, 800)))
    //Running the function provided in the span to add it to the trace
    if err := task(); err != nil {
        //Recording an error into the span if there is any
        span.RecordError(err)
        span.SetStatus(codes.Error, fmt.Sprintf("action %s failure", action))
        span.End()
        return err
    }
    //Ending the span
    span.End()
    return nil
}
```
This function creates a span under the current request context to trace the execution of the function provided. The action and name strings are used to create attributes to identify the operation that the span was created for. If there is an error during the execution of the provided function, the error information is added to the span, and the span is closed or stopped.
Inside the `router.go` file, you’ll notice that the POST and PUT request handlers wrap their calls to the `GenerateFee()` function in a child span:
```go
//Running the generate fees call in a span to see how long this external service takes
if err := ChildSpan(c, "Generate Fee", "Generate Fee Call", parcel.GenerateFee); err != nil {
    c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to generate fee"})
    return
}
```
This function makes an external API call to simulate the generation of the delivery fee based on distance and parcel size.
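Note that `parcel.GenerateFee` can be passed directly because, as a method value, it matches the `func() error` signature that `ChildSpan()` expects. Any other expensive step can be wrapped the same way; the following snippet is purely hypothetical (there is no `notifyRecipient()` helper in the repository) and only illustrates the pattern:

```go
//Wrapping a hypothetical notification step in its own child span so its
//duration shows up separately in the trace
if err := ChildSpan(c, "Notify", "Send Notification Call", func() error {
    return notifyRecipient(parcel) // hypothetical helper, for illustration only
}); err != nil {
    c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to send notification"})
    return
}
```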
Database instrumentation
You can also track database calls using the otelgorm package. In the `main.go` file, you’ll see the following lines under the database setup, which add the instrumentation plugin to the database connection:
```go
//Adding the otelgorm plugin to GORM ORM for db instrumentation
if err := db.Use(otelgorm.NewPlugin()); err != nil {
    panic(err)
}
```
After that, every call to the database is modified to use the request context to create spans in the current request trace. This is done by calling the `WithContext()` method of the `db` object and passing the request context. For example, see the following line:
```go
//Saving record to database
db.Model(&parcel).Updates(&updatedParcel)
```
The preceding line is now updated with this:
```go
//Saving record to database
db.WithContext(c.Request.Context()).Model(&parcel).Updates(&updatedParcel)
```
This gives you insight into your database operations so you can identify nonperformant queries that increase latency.
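The same pattern applies to reads. Here’s a hedged sketch of what a lookup inside a handler might look like; the exact query, route parameter, and field names in the repository may differ:

```go
//Propagating the request context so this query shows up as a gorm span
//under the current request's trace (route parameter name is assumed)
var parcel Parcel
if err := db.WithContext(c.Request.Context()).First(&parcel, "id = ?", c.Param("id")).Error; err != nil {
    c.JSON(http.StatusNotFound, gin.H{"error": "Parcel not found"})
    return
}
```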
Measure API gateway latency
Now that your app is instrumented, you need to configure your Collector to send the data to the respective observability platform of your choice. While it’s possible to send data to more than one backend at the same time, here, you’ll configure the Collector for each service separately. More information about the YAML configuration file for the OpenTelemetry Collector is available in the official docs.
Here, you’ll measure the latency of the API, first simulating a high-latency situation and, second, measuring low latency.
This tutorial assumes you’re running the sample app, your OpenTelemetry Collector, and the traffic generator on the same machine. You should change the endpoint URLs in the code if this is not the case.
Logz.io
Logz.io is an observability backend built on open source technologies, including OpenSearch for logs analytics, Jaeger for trace analytics, and Prometheus for metric analytics. In addition, Logz.io provides a cloud SIEM to help you effectively monitor, detect, isolate, and analyze security threats on your cloud infrastructure. This is a good tool to use if you’re already familiar with these technologies but would prefer to have everything in one place.
You can sign up for a free trial and go through the Get started wizard to obtain the right configuration for your Collector instance.
Once you have the right configuration for your instance, start the API server with the following command:
```bash
$ go run .
```
Then from another terminal, change directories to the traffic directory and run the traffic program with the following command to start generating some traffic and trace data to Logz.io:
```bash
$ cd traffic && go run .
```
In about five minutes (or less), you should start seeing the graph populate in the Jaeger UI in the Traces section of your Logz.io dashboard:
Logz.io traces observability with high API latency
As you can see, the traces are filtered to show the last 500 traces in the last 15 minutes and sorted by the longest first. In this case, you’ll see that the trace with the longest time is 1.95 seconds. Clicking on that trace gives you a breakdown of all the spans:
High-latency trace details
Notice how the Generate Fee Call span takes most of the execution time. Additionally, beneath that, you’ll see that the database span gorm.Create is captured with various details, including the full query and the execution time. This helps you easily identify that Generate Fee Call may need further investigation and optimization.
TelemetryHub
TelemetryHub is a full-stack observability tool built from scratch and based solely on OpenTelemetry. As such, it aims to be the single most affordable destination for all your OTel data. It offers a clean UI with responsive, near-real-time charts and visualizations. Additionally, TelemetryHub offers all this as a service, so you don’t have to worry about setting up your own infrastructure.
You can sign up for a free account to get started.
After you’ve signed up, you need to set up a service group (if it isn’t automatically created). This provides you with an ingestion key that you need for your Collector’s configuration file to export your OTel data. You can use the Collector Setup page to obtain a basic working YAML configuration for running your Collector.
To start the API server and run the traffic generator to help generate enough traffic for the TelemetryHub dashboard, you’ll use the same process as before. This is what it should look like:
TelemetryHub traces observability dashboard with high API latency
In TelemetryHub, you’re provided with a visual representation of the performance of your API as well as which endpoints take the longest to fulfill requests. Beneath the graph is a traditional tabular structure with filters for the various trace records that have accumulated. Here, they’re sorted by the longest-running trace first, which is 3.6 seconds. Clicking on that trace takes you to the insights page, where the span is expanded:
High-latency trace detail screen on TelemetryHub
On the insights page, you’ll find details for each span in the trace, both for the Generate Fee Call span and the database query, along with the time they took to complete.
SigNoz
SigNoz is an open source observability solution that brings all your observability components, such as APM, logs, metrics, and exceptions management, into one place, with a powerful query builder for exploring it all. You can also set up alerts that trigger when various thresholds or conditions are met.
Unlike Logz.io, SigNoz can be self-hosted either within your local infrastructure or somewhere in the cloud. But if, for various valid reasons, you’d prefer not to manage an installation by yourself or within your infrastructure, you can always sign up for the SigNoz Cloud.
Once you sign up for the SigNoz Cloud, you’ll receive an email with a URL to your SigNoz instance, your password, and an ingestion key. You can use the YAML configuration in this documentation to configure and run your OpenTelemetry Collector instance. Be sure to replace the `<SIGNOZ_API_KEY>` with the ingestion key you received via email. Additionally, set the `{region}` to the region in your instance URL.
Just like the process described in the Logz.io section earlier, run the API server and then start the traffic program to generate enough traces for your SigNoz dashboard. In a few minutes, on the Traces page, you should have an output similar to this:
SigNoz Traces observability dashboard with high API latency
In this image, the traces are filtered to show only the traces in the last fifteen minutes and sorted by the longest first. In this case, you’ll notice that the trace with the longest time is 3.02 seconds. Clicking on that trace gives you a breakdown with all the spans under it:
High-latency trace detail on SigNoz
You’ll notice that clicking or selecting any of the spans in the trace provides more details about the span on the right pane of the page. Since the gorm.Update span is selected, the details are shown on the right. Again, you’ll find that the Generate Fee Call span takes a long time to complete.
Troubleshooting high latency
Up to this point, you’ve learned how to export the trace data collected using OpenTelemetry to a few observability platforms or tools for visualization and analysis. When combined with metrics and logs data, you can use this information to identify and enhance performance and latency within your application, as outlined in the following steps:
* Monitor API performance: Capture and observe the journey of API requests from the API gateway across the different services of your application and analyze the traces. This helps you identify which stages of processing are contributing the most to high latency. This includes the various request/response times, error rates, and the behavior of different components within the API gateway.
* Check server resources: Single out and observe the metrics of resource usage, such as CPU, memory, and disk usage on the server running the API gateway. Ensure that resources are not overutilized. You can compare resource consumption with expected values to determine if the server has enough capacity to handle peak traffic loads.
* Analyze the network: From the visualization of your system metrics data, you can identify traffic patterns from the network, such as network latency, packet loss, and peak traffic times. You may further analyze and check for DNS resolution delays and also ensure that data transmission is efficient between clients, API gateway, and backend interservices communications.
* Check database performance: Database queries are known to be a possible bottleneck contributing to high latency. You need to incorporate OpenTelemetry to trace database interactions and evaluate the performance of any databases or storage systems involved. Check query execution times, connection pools, and response times to ensure they’re not causing delays in data retrieval.
* Profile and optimize code: Trace the segments related to the various processing steps of your application per transaction or request and identify where code execution is taking longer than expected. Use profiling tools for your programming language to focus optimization efforts; in this case, that’s Go’s built-in profiler (see the sketch after this list).
* Test third-party services: Certain aspects of your application may depend on external or third-party services and APIs, such as payment gateways and notification services (e.g., email, SMS, push notifications). With OpenTelemetry’s distributed tracing, you can easily examine traces related to interactions and response times between the API gateway and these external services and APIs. If any of them exhibit high latency, it can impact your overall API performance.
* Perform load testing: Before identifying and after fixing potential issues, perform load testing using tools like Grafana Cloud k6, Apache JMeter, or Locust to simulate a realistic number of concurrent users and requests. Comparing results from before and after your changes confirms that your optimizations actually hold up under peak traffic.
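As a concrete example of the profiling step above, here’s a minimal sketch of exposing Go’s built-in pprof profiler in a service like the sample app. The port is an arbitrary choice, and this is not code from the repository:

```go
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
    // Expose the profiler on a separate port so it doesn't interfere with the API.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... start the API server as usual here ...
    select {} // placeholder to keep the process alive in this sketch
}
```

You can then capture a CPU profile with `go tool pprof http://localhost:6060/debug/pprof/profile` and compare the hot paths it reports against the slow spans in your traces.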