With the growing emphasis on the observability of application components, Apache APISIX has introduced a plugin mechanism to enrich its observability signals. However, these data are scattered across multiple stacks, creating data silos. This article explains how to leverage DeepFlow's zero-code, eBPF-based capabilities to construct an observability solution for APISIX, and, on this basis, integrate the rich data sources of the existing plugins to eliminate data silos and build an all-in-one platform for comprehensive observability of the gateway.
Through DeepFlow, APISIX can achieve comprehensive observability from traffic monitoring and tracing analysis to performance optimization, eliminating data dispersion and providing a centralized view. This accelerates fault diagnosis and performance tuning, making the work of DevOps and SRE teams more efficient. This article will focus on how APISIX's tracing data, metric data, access logs, and performance profiling data can be integrated with DeepFlow.
1. Install APISIX and DeepFlow
For convenience, this article deploys both DeepFlow and APISIX as Kubernetes services; the entire deployment process takes approximately 5 minutes. For detailed deployment steps, refer to the official DeepFlow and APISIX deployment documentation.
Note: To leverage DeepFlow's observability capabilities that utilize eBPF technology, your host's Linux kernel must be version 4.14 or higher.
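A quick way to verify this requirement on each node is sketched below. Note that `check_kernel` is a small helper written for this article, not a DeepFlow tool:

```shell
# Check whether the running kernel satisfies DeepFlow's eBPF requirement (>= 4.14).
# check_kernel is a helper for this article, not part of DeepFlow.
check_kernel() {
  ver="${1:-$(uname -r)}"   # e.g. "5.10.0-21-amd64"
  major="${ver%%.*}"        # "5"
  rest="${ver#*.}"
  minor="${rest%%.*}"       # "10"
  [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 14 ]; }
}

if check_kernel; then
  echo "kernel OK for eBPF-based observability"
else
  echo "kernel too old: $(uname -r)"
fi
```

Run it on every node that will host a deepflow-agent before proceeding.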
2. Distributed Tracing
There are two approaches to implementing distributed tracing for APISIX and its backend services using DeepFlow. First, DeepFlow leverages eBPF to provide out-of-the-box, RPC-level distributed tracing for APISIX and its backend services, requiring no code changes. Second, if the backend services have APM (Application Performance Monitoring) tools such as OpenTelemetry or SkyWalking enabled, you can integrate all tracing data into DeepFlow using APISIX's Tracers-type plugins. This enables comprehensive, end-to-end tracing at the application function level.
2.1 DeepFlow eBPF AutoTracing
DeepFlow offers out-of-the-box distributed tracing (AutoTracing) that requires no plugins or code changes; it only requires deploying the deepflow-agent on the server where APISIX is located. In Grafana, open the Distributed Tracing Dashboard provided by DeepFlow, where you can initiate a trace on a specific request and see the end-to-end path of that request through both APISIX and its backend services, as illustrated below:
- (1): The request reaches the gateway service via NodePort on the K8s node NIC.
- (2): It enters the NIC of the Pod backing the gateway service.
- (3): It reaches the OpenResty process inside the gateway service.
- (4): The OpenResty process forwards the request to the backend service.
- (5): The request leaves through the NIC of the Pod backing the gateway service.
- (6)/(7): It is forwarded on to the backend service.
2.2 DeepFlow eBPF + OpenTelemetry
This approach involves APISIX generating trace data via the OpenTelemetry plugin, while the backend service also has APM capabilities and can export its trace data in the OpenTelemetry format. When APISIX and the backend services both send trace data to DeepFlow, it can build a comprehensive trace tree without blind spots, incorporating APM application SPANs, eBPF system SPANs, and cBPF network SPANs.
This method is ideal for achieving function-level distributed tracing inside the application process or when the backend service uses a thread pool for call handling, which may disrupt DeepFlow AutoTracing.
2.2.1 Deploy Backend Services with APM Enabled
To demonstrate the full tracing effect, we first deploy a demo application behind the gateway that supports OpenTelemetry. For deployment of the demo application, refer to "DeepFlow Demo - One-click deployment of a WebShop application composed of five microservices written in Spring Boot". Create a route on APISIX to access the backend service, with the access domain being apisix.deepflow.demo.
```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: deepflow-apisix-demo
  namespace: deepflow-otel-spring-demo
spec:
  http:
    - name: deepflow-apisix-demo
      match:
        hosts:
          - apisix.deepflow.demo
        paths:
          - "/*"
      backends:
        - serviceName: web-shop
          servicePort: 18090
```
2.2.2 Enable the OpenTelemetry Plugin in APISIX
Add the OpenTelemetry plugin to the APISIX configuration:
```yaml
## vim ./apisix/values.yaml
plugins:
  - opentelemetry
#...
pluginAttrs:
  opentelemetry:
    resource:
      service.name: APISIX
    collector:
      ## Send data to deepflow-agent.
      ## Of course, you can also send it to otel-collector for processing,
      ## and then have otel-collector forward it to deepflow-agent.
      address: deepflow-agent.deepflow.svc.cluster.local/api/v1/otel/trace
      request_timeout: 3

## After adding, update the deployment:
## helm upgrade --install apisix -n apisix ./apisix
```
Enable OpenTelemetry functionality for a specific route:
```shell
## View the router id:
## find the router id for the apisix.deepflow.demo domain
curl -s http://10.109.77.186:9180/apisix/admin/routes \
  -H 'X-API-KEY: <apisix-admin-token>' | jq
```
```shell
## Enable the otel feature for a specific route
curl http://10.109.77.186:9180/apisix/admin/routes/${router_id} \
  -H 'X-API-KEY: <apisix-admin-token>' -X PUT -d '
{
  "name": "deepflow-apisix-demo",   ## Assign a name to this route
  "methods": ["GET"],
  "uris": ["/*"],
  "plugins": {
    "opentelemetry": {
      "sampler": { "name": "always_on" },
      "additional_attributes": [    ## Customize tags for spans through additional_attributes
        "deepflow=demo"
      ]
    }
  },
  "upstream": {
    "type": "roundrobin",           ## Round-robin load balancing
    "nodes": {                      ## Upstream addresses
      "10.1.23.200:18090": 1        ## Service access address: weight
    }
  }
}'
```
2.2.3 Using DeepFlow to Integrate OpenTelemetry Traces
The integration of OpenTelemetry span data through the DeepFlow Agent is enabled by default and requires no additional configuration.
```shell
## View the default configuration of deepflow-agent:
## deepflow-ctl agent-group-config example

## This parameter controls whether to enable receiving data from external
## sources, including Prometheus, Telegraf, OpenTelemetry, and SkyWalking:
##
## Data Integration Socket
## Default: 1. Options: 0 (disabled), 1 (enabled).
## Note: Whether to enable receiving external data sources such as Prometheus,
##   Telegraf, OpenTelemetry, and SkyWalking.
#external_agent_http_proxy_enabled: 1
```
2.2.4 OpenTelemetry Integration Showcase
We initiate a command from the client to access the WebShop service:
```shell
curl -H "Host: apisix.deepflow.demo" 10.1.23.200:44640/shop/full-test
## Here, the IP is the K8s cluster node IP,
## and port 44640 is the NodePort that APISIX exposes for port 9180.
```
Open the Distributed Tracing Dashboard provided by DeepFlow in Grafana, find the corresponding request, and initiate tracing. You will be able to see traces from both APISIX and the backend services. Moreover, the application SPANs generated by APM and the network and system SPANs generated by DeepFlow are all associated on one flame graph:
Note: In the flame graph, "A" represents the application SPAN generated by APM, while "N" and "S" represent the network SPAN and system SPAN generated by DeepFlow, respectively.
3. Performance Metrics
DeepFlow offers out-of-the-box metrics, featuring detailed RED (Rate, Error, Duration) performance metrics at the endpoint level, along with comprehensive TCP network performance metrics, including throughput, retransmissions, zero windows, and connection anomalies. Metrics from APISIX, including HTTP status codes, bandwidth, connections, and latency, captured by Metrics-type plugins such as prometheus and node-status, can also be integrated into DeepFlow. These data, at both instance and route granularity, are viewable on the APISIX-provided Grafana dashboards.
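To make the RED model concrete, here is a small, self-contained sketch (our own illustration, not DeepFlow code) that computes request count, error count, and average duration from access-log-like records of `status latency_ms` pairs:

```shell
# compute_red: a hypothetical helper summarizing RED-style metrics with awk.
# Input: one record per line, "HTTP_STATUS LATENCY_MS".
compute_red() {
  awk '{
    n++; sum += $2;              # count requests, accumulate latency
    if ($1 >= 500) err++;        # treat 5xx responses as errors
  } END {
    printf "requests=%d errors=%d avg_latency_ms=%.1f\n", n, err, sum / n
  }'
}

printf '200 12\n502 40\n200 8\n' | compute_red
# → requests=3 errors=1 avg_latency_ms=20.0
```

DeepFlow derives the equivalent statistics automatically from eBPF-captured traffic, per client, endpoint, and route.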
3.1 Out-of-the-Box eBPF Metrics
Once the deepflow-agent is deployed on the server hosting , it automatically gathers highly detailed application and network level metrics. This includes metrics such as request rates, response latencies, and error statuses for specific clients or endpoints, as well as TCP connection setup times, connection anomalies, and more. Detailed metrics can be found on the DeepFlow official website in the metrics section. By opening the Application - K8s Ingress Dashboard provided by DeepFlow in Grafana, you can view application layer performance metrics related to . Similarly, network-related metrics can be viewed in the Network - K8s Pod Dashboard.
3.2 Enable the Prometheus Plugin in APISIX
Add the Prometheus plugin to the APISIX configuration:
```yaml
## vim ./apisix/values.yaml
plugins:
  - prometheus
# ...
pluginAttrs:
  prometheus:
    export_uri: /metrics   ## The default URI is `/apisix/prometheus/metrics`
    export_addr:
      ip: 0.0.0.0          ## Scrape address
      port: 9091           ## Default port 9091
    metrics:
      http_status:
        extra_labels:
          - upstream_addr: $upstream_addr      ## For example, add the upstream server address (an NGINX variable)
          - upstream_status: $upstream_status  ## For example, add the upstream server status (an NGINX variable)
        ## APISIX built-in variables: https://apisix.apache.org/docs/apisix/3.2/apisix-variable/
        ## NGINX built-in variables: https://nginx.org/en/docs/varindex.html
```
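Once the plugin is active, the exporter serves plain Prometheus text on port 9091 (for example, `curl http://<apisix-pod-ip>:9091/metrics`). The snippet below is a sketch using a hypothetical sample of that output (real label sets will differ), showing how to pull one counter value out of the text format:

```shell
# Hypothetical sample of APISIX's Prometheus exposition output.
sample='apisix_http_status{code="200",route="deepflow-apisix-demo"} 42
apisix_http_status{code="502",route="deepflow-apisix-demo"} 3'

# Each line is "metric{labels} value"; filter by label, print the value field.
errors=$(echo "$sample" | awk '/code="502"/ {print $2}')
echo "5xx count: $errors"
# → 5xx count: 3
```

The same text format is what DeepFlow ingests when the exporter is scraped and forwarded to the deepflow-agent.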
Enable the Prometheus plugin for the route:
```shell
## Note: since the otel feature was enabled above, Prometheus must be enabled
## on top of the existing otel configuration.
curl http://10.109.77.186:9180/apisix/admin/routes/${router_id} \
  -H 'X-API-KEY: <apisix-admin-token>' -X PUT -d '
{
  "name": "deepflow-apisix-demo",   ## Assign a name to this route
  "methods": ["GET"],
  "uris": ["/*"],
  "plugins": {
    "opentelemetry": {              ## Keep the otel configuration from above
      "sampler": { "name": "always_on" },
      "additional_attributes": [
        "deepflow=demo"
      ]
    },
    "prometheus": {                 ## Enable Prometheus
      "prefer_name": true           ## When true, Prometheus metrics display the route/service name instead of the ID
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "10.1.23.200:18090": 1
    }
  }
}'
```