What is Red Hat RPM Compression Ratio? A Detailed Guide
In the intricate world of Linux systems, particularly within the Red Hat ecosystem, the efficient management and distribution of software are paramount. At the heart of this process lies the Red Hat Package Manager (RPM), a robust and venerable system that has shaped how software is installed, updated, and removed on countless servers and workstations globally. Yet, beneath the surface of seemingly straightforward package installation lies a complex interplay of technologies, chief among them data compression. The "Red Hat RPM Compression Ratio" is not merely a technical metric; it is a critical determinant of system performance, network efficiency, and overall operational agility, subtly influencing everything from quick patches to large-scale deployments across vast data centers.
This comprehensive guide will embark on an in-depth exploration of RPM compression. We will dissect the fundamental concepts of RPMs, delve into the various compression algorithms employed, unravel the methods for measuring and interpreting compression ratios, and critically assess the far-reaching implications of these choices on system resources, network bandwidth, and the speed of software deployment. Furthermore, we will examine the historical evolution of compression within the Red Hat sphere, offer practical insights for administrators, and even connect these foundational packaging efficiencies to modern infrastructure components like API gateways and Management Control Planes, demonstrating how seemingly low-level optimizations contribute to the seamless operation of high-level services. By the end of this journey, you will possess a profound understanding of why RPM compression ratio is more than just a number—it’s a cornerstone of reliable and efficient Linux system administration.
Understanding RPM Packages: The Foundation of Red Hat Software Management
Before diving into the nuances of compression, it is essential to grasp the fundamental nature of RPM packages. An RPM package is more than just an archive of files; it is a meticulously structured entity designed for robust software management in Linux distributions that use the RPM format, such as Red Hat Enterprise Linux (RHEL), CentOS, Fedora, openSUSE, and many others. These packages encapsulate all necessary components for a piece of software, ensuring consistency, reliability, and ease of maintenance.
The Anatomy of an RPM Package
At its core, an RPM package (.rpm file) is an archive format with a specific structure. It typically contains several key sections:
- Lead: This is the initial header that provides basic information about the package, such as the RPM format version and architecture. It's the first thing the `rpm` utility reads to confirm it's a valid RPM file.
- Signature Header: Crucially important for security, this section contains cryptographic signatures (often GPG/PGP) that verify the package's authenticity and integrity. This ensures that the package originates from a trusted source and has not been tampered with since it was signed. For any enterprise environment, validating these signatures is a non-negotiable step to prevent the introduction of malicious software or corrupted files into the system.
- Header Section (Metadata): This is perhaps the most information-rich part of an RPM. It houses an extensive array of metadata fields that describe the package in detail. This includes the package name, version, release, architecture (e.g., `x86_64`, `aarch64`), a summary description, a full description, license information, URL to the project, build host, build time, and, critically, dependency information. Dependencies are a list of other packages or capabilities that this package requires to function correctly, as well as those it provides. This metadata is what enables RPM to perform intelligent dependency resolution, preventing the "dependency hell" that plagued earlier manual software installations.
- Payload Section (Compressed Files): This is where the actual software files reside. This section is essentially a compressed archive (historically `cpio` archives, with the `tar` format also seen in some cases for source RPMs) that contains all the executables, libraries, configuration files, documentation, and other assets that make up the software. The efficiency and chosen algorithm for compressing this payload are the central focus of our discussion on compression ratios. The files within this payload are stored in their target directory structure relative to the root (/) of the filesystem, making installation a straightforward extraction process.
- Scripts (Pre-install, Post-install, Pre-uninstall, Post-uninstall): RPMs can include scriptlets that execute at various stages of the installation or uninstallation process. These scripts can perform tasks like creating user accounts, modifying system configuration files, starting or stopping services, or cleaning up temporary data. They are powerful tools for ensuring that software integrates seamlessly into the operating environment, handling specific setup or teardown requirements that a simple file copy cannot address.
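Each of these sections can be examined directly with standard tooling. The following is a minimal sketch using a hypothetical package file name; `rpm -qip` reads the header metadata, `--scripts` dumps the embedded scriptlets, `rpm -K` checks the signature header, and `rpm2cpio` converts the payload into a plain `cpio` stream:

```bash
# Read the header section: name, version, release, license, description, ...
rpm -qip example-package-1.0-1.x86_64.rpm

# Dump any pre/post install and uninstall scriptlets
rpm -qp --scripts example-package-1.0-1.x86_64.rpm

# Verify digests and signatures against imported GPG keys
rpm -K example-package-1.0-1.x86_64.rpm

# Convert the compressed payload to a cpio stream and list the files inside
rpm2cpio example-package-1.0-1.x86_64.rpm | cpio -t
```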
Why RPMs are Essential
The structured nature of RPMs offers profound advantages for system administrators, developers, and users alike:
- Consistency and Standardization: RPMs provide a standardized format for packaging software, ensuring that applications are installed and configured consistently across all Red Hat-based systems. This reduces variability and simplifies management tasks.
- Dependency Management: The robust dependency tracking mechanism prevents broken installations by ensuring that all prerequisites are met before a package is installed. This automated dependency resolution is a major time-saver and greatly enhances system stability.
- Ease of Installation and Uninstallation: With simple commands like `yum install` or `dnf install` (or `rpm -i`), software can be installed with minimal user intervention. Similarly, `yum remove` or `dnf remove` gracefully uninstalls packages, often cleaning up associated files and configurations, thus maintaining a tidy system (see the example session after this list).
- Upgrades and Downgrades: RPMs facilitate seamless upgrades, allowing new versions of software to replace older ones while attempting to preserve configuration files. Downgrades are also possible, providing flexibility in managing software versions.
- Integrity and Security: Cryptographic signatures embedded in RPMs provide a critical layer of trust. By verifying these signatures, administrators can be confident that the software they are installing is legitimate and has not been compromised, a fundamental requirement for secure computing environments.
- Version Control: Each RPM package carries precise version and release information, making it easy to track software versions and revert to previous states if necessary. This granular control is vital for auditing and maintaining compliance in regulated industries.
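As a concrete illustration of this lifecycle, the sketch below uses `httpd` as an arbitrary example package on a `dnf`-based system:

```bash
# Install a package and resolve its dependencies automatically
sudo dnf install httpd

# Upgrade it to the newest version available in the enabled repositories
sudo dnf upgrade httpd

# Roll back to the previous version still present in the repositories
sudo dnf downgrade httpd

# Query the exact installed version-release, e.g. for auditing
rpm -q httpd

# Remove the package again
sudo dnf remove httpd
```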
In essence, RPMs transform the complex process of software deployment into a manageable, reliable, and secure operation. They are the backbone of the Red Hat software ecosystem, enabling the efficient delivery and maintenance of applications that power everything from desktop environments to critical enterprise infrastructure. The efficiency with which these packages are created and distributed, heavily influenced by their compression strategy, directly impacts the operational overhead and performance characteristics of these systems.
The Concept of Compression in Software Distribution
Data compression is a fundamental technique in computer science, aimed at reducing the size of data while retaining its original information content. In the context of software distribution, particularly with RPM packages, compression is not merely a desirable feature but an absolute necessity. It profoundly impacts several critical aspects of system management and resource utilization.
General Principles of Data Compression
At a high level, compression algorithms work by identifying and encoding redundancies within data. There are two primary categories of compression:
- Lossless Compression: This type of compression allows the original data to be perfectly reconstructed from the compressed data. No information is lost in the process. Examples include ZIP, GZIP, BZIP2, and XZ, which are all relevant to RPMs. Lossless compression is mandatory for software packages because even a single bit of information loss could render an executable or library file unusable, leading to system instability or security vulnerabilities.
- Lossy Compression: This type of compression permanently removes some information from the original data, resulting in a smaller file size but with a degree of quality degradation. It's commonly used for multimedia files like images (JPEG), audio (MP3), and video (MPEG), where some loss of detail is acceptable or imperceptible to humans. This is never used for software packages.
The effectiveness of a lossless compression algorithm is measured by its compression ratio, typically expressed either as a reduction percentage, (Original Size - Compressed Size) / Original Size, or as a ratio, Original Size / Compressed Size. In both forms, a higher value indicates more effective compression and a smaller resulting file.
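To make these definitions concrete, the sketch below compresses the same arbitrary input file, here called `payload.bin`, with each of the algorithms discussed later and prints the resulting X:1 ratios. It assumes GNU `stat`; `-k` keeps the original file:

```bash
# Compress the same input with each algorithm at maximum level
gzip  -9 -k payload.bin    # -> payload.bin.gz
bzip2 -9 -k payload.bin    # -> payload.bin.bz2
xz    -9 -k payload.bin    # -> payload.bin.xz

# Print original_size / compressed_size for each result
orig=$(stat -c %s payload.bin)
for f in payload.bin.gz payload.bin.bz2 payload.bin.xz; do
  comp=$(stat -c %s "$f")
  awk -v o="$orig" -v c="$comp" -v f="$f" \
    'BEGIN { printf "%-16s %.2f:1\n", f, o / c }'
done
```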
Why Compression is Vital for Package Management
For RPM packages, the application of robust lossless compression techniques yields significant and tangible benefits across the entire software lifecycle:
- Reduced Storage Footprint:
- Server Storage: In environments with hundreds or thousands of packages, such as `yum` or `dnf` repositories, even marginal improvements in compression can translate into substantial savings in disk space. A typical RHEL repository might contain tens of thousands of RPMs, and multiplying the savings per package across this vast collection results in terabytes of reduced storage requirements. This is critical for repository mirrors and content delivery networks (CDNs) that host these packages globally.
- Client Storage: While individual user machines might not feel the impact as severely as servers, reducing the size of downloaded packages contributes to less temporary disk usage during installation and can be beneficial for systems with limited storage capacity, such as embedded devices or minimal installations.
- Faster Downloads (Network Bandwidth Conservation):
- Internet Traffic: For users and servers downloading packages over the internet, a smaller file size directly translates to faster download times. This is particularly crucial in regions with slower internet connections or during peak network usage. Faster downloads improve the user experience and reduce the time required for system updates, which can be a significant operational concern for fleets of machines.
- Internal Network Traffic: Within corporate networks or data centers, where large-scale rollouts of software updates are common, efficient compression minimizes internal network congestion. When hundreds or thousands of servers simultaneously fetch the same set of updates, the cumulative bandwidth savings become immense, preventing bottlenecks and ensuring that critical updates can be deployed rapidly without impacting other network-dependent services. This efficiency is especially relevant for modern microservices architectures and distributed systems, where frequent deployments and updates are the norm.
- Efficiency in Mirror Synchronization:
- Organizations often maintain internal mirrors of official Red Hat repositories for security, compliance, and performance reasons. Synchronizing these mirrors with upstream sources involves transferring vast quantities of data. Highly compressed RPMs drastically reduce the amount of data that needs to be transferred during synchronization processes, making mirror maintenance faster, less resource-intensive, and more resilient to network interruptions. This translates to quicker propagation of security patches and bug fixes across an enterprise's entire infrastructure.
- Impact on Deployment Times:
- Installation Speed: While decompression adds a small amount of CPU overhead during installation, the time saved during the download phase often outweighs this. For very large packages or slow networks, the net effect is a significantly faster overall deployment process. This is particularly important for automated deployments, where shaving minutes off installation times across a large fleet can add up to hours or even days of saved operational time.
- Container Image Sizes: In the era of containerization (Docker, Podman, Kubernetes), base images for containers are often built upon minimalistic Linux distributions that still leverage RPMs for their core components. Smaller RPMs contribute to smaller container image sizes, which in turn leads to faster image pulls, reduced storage for container registries, and quicker container startup times. This optimization is fundamental for cloud-native applications and serverless functions where rapid scaling and deployment are key performance indicators.
In summary, data compression within RPM packages is a silent workhorse, tirelessly optimizing the software distribution pipeline. It's a critical balancing act between maximizing storage and bandwidth efficiency and minimizing the computational cost of compression and decompression. The choices made regarding compression algorithms and levels have profound downstream effects, impacting the agility, cost-effectiveness, and reliability of Red Hat-based systems across all scales of deployment.
RPM Compression Mechanisms and Algorithms: A Deep Dive
The choice of compression algorithm within an RPM package is a significant technical decision, impacting the trade-offs between file size, compression speed, decompression speed, and memory usage. Historically, RPM has evolved its default compression methods to adapt to changing hardware capabilities and network bandwidths. Understanding these algorithms is key to appreciating the subtle complexities of RPM package management.
Historical Evolution of Compression in RPM
The journey of RPM compression has largely followed the advancements in general-purpose lossless data compression:
- `gzip` (GNU Zip): For a long time, `gzip` was the de facto standard for RPM payload compression. Based on the DEFLATE algorithm (a combination of LZ77 and Huffman coding), `gzip` offered a good balance of compression speed and ratio, making it suitable for earlier computing environments where CPU cycles were more precious than disk space or network bandwidth.
- `bzip2`: As network bandwidth improved and storage costs decreased, the demand for better compression ratios increased. `bzip2`, introduced later, provided significantly better compression than `gzip` at the cost of higher CPU usage for both compression and decompression. It gained popularity for applications where maximizing storage efficiency was a priority, and the increased computational overhead was deemed acceptable.
- `xz` (LZMA): `xz` is the newest and most advanced compression format commonly used for RPMs. It leverages the LZMA (Lempel-Ziv-Markov chain Algorithm) algorithm, which delivers superior compression ratios, often outperforming `bzip2` and `gzip` by a considerable margin. This excellent compression comes at the cost of even higher CPU and memory usage during compression, and sometimes during decompression, though modern CPUs handle it efficiently. `xz` has become the default for many newer Linux distributions, including recent versions of Red Hat Enterprise Linux and Fedora, for their binary RPM payloads.
Deep Dive into Compression Algorithms
Let's dissect each of these algorithms to understand their inner workings and characteristics.
1. gzip (DEFLATE Algorithm)
- How it Works: `gzip` utilizes the DEFLATE algorithm, a hybrid lossless data compression scheme that combines two fundamental techniques:
- LZ77 (Lempel-Ziv 1977): This component identifies repeated sequences of bytes (strings) in the data. Instead of storing the repeated string, it stores a back-reference to a previously encountered identical string. This reference consists of a "distance" (how far back the string was found) and a "length" (how long the repeated string is). For example, if "the quick brown fox" appears twice, the second instance might be replaced by a pointer to the first.
- Huffman Coding: After the LZ77 stage replaces repeated strings with references, Huffman coding is applied to both the literal bytes (those not replaced by references) and the LZ77 back-references. Huffman coding is a variable-length coding scheme that assigns shorter bit sequences to frequently occurring symbols (bytes or references) and longer sequences to less frequent ones, further reducing the overall data size.
- Advantages:
- Speed: `gzip` is generally the fastest of the three algorithms for both compression and decompression. This makes it suitable for scenarios where speed is critical, such as real-time data streaming or situations with limited CPU resources.
- Low Memory Usage: It requires relatively little memory during both compression and decompression, making it suitable for embedded systems or environments with tight memory constraints.
- Widespread Support: Being an older standard, `gzip` is universally supported across virtually all computing platforms and operating systems.
- Disadvantages:
- Lower Compression Ratio: Compared to `bzip2` and `xz`, `gzip` achieves the lowest compression ratios, meaning the compressed files are larger.
- Less Effective for Highly Redundant Data: While effective, its LZ77 dictionary size is limited, which means it might not find the longest or most distant matches in extremely large and highly redundant files as effectively as algorithms with larger dictionaries.
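For `gzip` specifically, the tool can report its own ratio without any extra arithmetic. A short sketch against the illustrative archive from the earlier example (the output values shown are made up):

```bash
# gzip -l lists compressed size, uncompressed size, and reduction percentage
gzip -l payload.bin.gz
#   compressed  uncompressed  ratio  uncompressed_name
#      4093218      12582912  67.5%  payload.bin
```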
2. bzip2 (Burrows-Wheeler Transform + MTF + RLE + Huffman)
- How it Works: `bzip2` employs a more sophisticated multi-stage compression process:
- Burrows-Wheeler Transform (BWT): This is the most distinctive feature of `bzip2`. BWT reorganizes the input data into blocks (typically 100-900 KB) in such a way that characters with similar contexts appear close to each other. This transformation doesn't compress the data itself but makes it much more amenable to subsequent compression stages by clustering identical characters. It sorts all cyclic shifts of a block of data and then stores the last character of each sorted row.
- Move-To-Front (MTF) Transform: The output of BWT still contains many repeated characters but might not have long runs of identical characters. MTF coding replaces each character with the number of unique characters that have occurred since its last occurrence. This helps convert character sequences into numerical sequences with smaller values for frequently occurring characters, making them easier to compress.
- Run-Length Encoding (RLE): After MTF, there are often long runs of identical (or zero) values. RLE detects these runs and replaces them with a count and the value, further compacting the data.
- Huffman Coding: Finally, similar to `gzip`, Huffman coding is applied to the output of the RLE stage to encode the most frequent symbols with the shortest bit sequences.
- Advantages:
- Better Compression Ratio: `bzip2` consistently achieves significantly better compression ratios than `gzip`, often reducing file sizes by an additional 10-30%. This makes it a preferred choice when disk space or network bandwidth is a primary concern.
- Disadvantages:
- Slower Speed: Both compression and decompression are substantially slower than `gzip`, demanding more CPU cycles. This can impact `rpmbuild` times and installation durations, especially on older or resource-constrained systems.
- Higher Memory Usage: `bzip2` requires more memory than `gzip`, particularly during compression due to the block processing for BWT.
3. xz (LZMA Algorithm)
- How it Works: `xz` utilizes the LZMA (Lempel-Ziv-Markov chain Algorithm) algorithm, which is highly optimized for compression ratios:
- LZ77-based Dictionary Compression: LZMA uses a large sliding dictionary (up to 4 GB, though typically much smaller in practice) to find and replace repeated data sequences, similar to the LZ77 component of DEFLATE but with significantly enhanced capabilities. The large dictionary size allows it to find longer and more distant matches, leading to better compression.
- Markov Chain Modeling / Range Coder: Instead of Huffman coding, LZMA employs a sophisticated entropy coder called a "range coder" (a form of arithmetic coder) in conjunction with Markov chain models. This allows it to model the probability of each symbol appearing based on its context, achieving near-optimal compression for the given symbol probabilities. This is significantly more effective than Huffman coding but also more computationally intensive.
- Advantages:
- Best Compression Ratio: `xz` generally achieves the highest compression ratios among the three, often outperforming `bzip2` by another 10-20%. This makes it ideal for archival purposes, long-term storage, and distribution of very large software packages where minimizing file size is the absolute priority.
- Disadvantages:
- Slowest Compression: `xz` is by far the slowest for compression, often taking significantly longer than `bzip2`. This can substantially increase the build times for RPM packages (`rpmbuild`).
- Potentially Slower Decompression: While often faster than `bzip2` for decompression on modern CPUs due to its simpler structure compared to `bzip2`'s multi-stage process, it can still be slower than `gzip`. Decompression speed is also influenced by the dictionary size used during compression; larger dictionaries lead to slower decompression.
- Highest Memory Usage: `xz` can demand substantial memory, especially during compression, and also during decompression if a large dictionary size was used by the compressor. This can be a concern for systems with limited RAM.
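Similarly, `xz` can report the ratio and the decompressor memory requirement of an existing archive, which is useful when sizing low-memory targets. A brief sketch against the illustrative file from earlier:

```bash
# Summarize size and ratio information for an .xz file
xz -l payload.bin.xz

# Verbose listing additionally reports the memory needed to decompress
xz -lv payload.bin.xz

# Show this machine's xz memory usage limits
xz --info-memory
```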
RPM Tooling and Configuration
The rpmbuild utility, used to create RPM packages from source code and spec files, allows package maintainers to specify which compression algorithm to use for the payload. This is typically configured in the RPM macro configuration files (e.g., /etc/rpm/macros or user-specific ~/.rpmmacros).
The relevant macros are:

- `%_source_payload`: Controls the compressor and level used for the payload of a source RPM (SRPM).
- `%_binary_payload`: Controls the compressor and level used for the binary RPM's payload (the `cpio` archive of files). This is the primary one for our discussion.

Both macros take a value of the form `w<level>.<compressor>dio`, where `<level>` is the compression level (e.g., `9` for best compression, `1` for fastest) and the compressor token is `gz`, `bz`, or `xz` (newer RPM versions also accept `zst` for Zstandard).

For example, to explicitly set `xz` with maximum compression for binary RPM payloads:

```
%_binary_payload w9.xzdio
```
Red Hat distributions have shifted their defaults over time. Modern RHEL and Fedora releases adopted `xz` for binary RPM payloads to leverage its superior compression (with the newest Fedora releases and RHEL 9 moving further to Zstandard), reflecting a calculated trade-off favoring reduced distribution size over faster build times or slightly faster installation.
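Whatever the defaults, the compressor actually used by a built package is recorded in its header and can be queried directly; the package name below is illustrative:

```bash
# Query the payload compressor and compression level flags from the header
rpm -qp --queryformat '%{PAYLOADCOMPRESSOR} %{PAYLOADFLAGS}\n' \
    example-package-1.0-1.x86_64.rpm
# e.g. "xz 9" (illustrative output)
```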
Understanding these algorithms and their characteristics is vital for anyone involved in building, distributing, or deploying software via RPMs. The choice of compression strategy is a subtle yet powerful lever that can optimize resource usage and influence the overall efficiency of an entire IT infrastructure, touching upon aspects as diverse as repository management and the rapid deployment of API gateway components.
Measuring and Understanding Compression Ratio
The compression ratio is a crucial metric that quantifies the effectiveness of a compression algorithm. For RPM packages, understanding this ratio allows administrators and developers to make informed decisions regarding package size, network transfer times, and storage requirements. It's not just about getting a smaller file; it's about optimizing the trade-offs.
Defining Compression Ratio
The compression ratio can be expressed in several ways, but the most common definitions are:
- Ratio of Original Size to Compressed Size: $$ \text{Compression Ratio} = \frac{\text{Original Size}}{\text{Compressed Size}} $$
- Example: If a 100 MB file compresses to 20 MB, the ratio is $100 / 20 = 5:1$. A higher number indicates better compression.
- Compression Percentage (Reduction Percentage): $$ \text{Compression Percentage} = \frac{\text{Original Size} - \text{Compressed Size}}{\text{Original Size}} \times 100\% $$
- Example: For the same 100 MB file compressed to 20 MB, the reduction is $(100 - 20) / 100 \times 100\% = 80\%$. A higher percentage indicates more data removed.
Both metrics convey similar information, but the "X:1" ratio is often more intuitive for comparing how many times smaller the file became. For the purpose of this guide, we will primarily refer to the "X:1" ratio or simply discuss the percentage reduction.
Factors Influencing Compression Ratio
The actual compression ratio achieved for an RPM package is not solely dependent on the chosen algorithm. Several other factors play a significant role:
- Nature of the Data (Redundancy):
- Highly Redundant Data: Files containing a lot of repetitive patterns, such as plain text files (logs, documentation), source code, or specific types of binary data (e.g., unoptimized bitmaps, executables with large sections of zeros), tend to compress very well. For instance, a text file with common words and phrases will have many recurring sequences that compression algorithms can efficiently encode.
- Less Redundant Data: Already compressed data (e.g., JPEG images, MP3 audio, video files, other `zip` or `gzip` archives) will not compress much further because they have already had most of their redundancy removed. Attempting to re-compress these files often results in marginal gains or even slight increases in size due to the overhead of the compression headers. Encrypted data also appears random and is generally incompressible.
- Binary Files: Executables and libraries often contain a mix of unique instructions and repetitive code sections or data structures. Their compressibility varies widely depending on how they were compiled and linked. Libraries that contain many identical symbols or data blocks will compress better.
- Mixed Content: An RPM package typically contains a heterogeneous mix of files: binaries, shared libraries, configuration files, documentation (often plain text), man pages, and possibly some images or other assets. The overall compression ratio of the package will be an average of how well each of these individual components compresses. Packages primarily consisting of text or highly repetitive data will show better overall ratios than those dominated by already compressed assets.
- Algorithm Chosen: As discussed in the previous section:
- `xz` typically yields the best ratios.
- `bzip2` offers better ratios than `gzip`.
- `gzip` provides the lowest ratios but the fastest operation.
- Compression Level: Most compression algorithms allow for different "compression levels," typically on a scale from 1 (fastest, least compression) to 9 (slowest, most compression).
- A higher compression level instructs the algorithm to spend more CPU time searching for optimal redundancies and encoding patterns. This generally results in a smaller file size but significantly increases the time required for compression.
- Conversely, a lower compression level sacrifices some compression efficiency for faster compression times.
- For RPMs, maintainers usually choose a high compression level for the binary payload (e.g., `xz -9`) because the package is compressed once during the build process but downloaded and decompressed many times. The one-time cost of compression is amortized over many deployments (see the timing sketch just after this list).
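The level trade-off is easy to observe directly. A minimal sketch timing `xz` at three levels against the same illustrative input, assuming GNU time is installed at `/usr/bin/time`:

```bash
# Compare compression time and output size across xz levels 1, 6, and 9
for level in 1 6 9; do
  /usr/bin/time -f "level ${level}: %e s" \
    xz -"${level}" -k -c payload.bin > "payload.bin.${level}.xz"
done
ls -l payload.bin.*.xz
```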
Practical Examples and Calculation
Let's consider a hypothetical RPM package containing the following:
- A large text file (logs): 50 MB
- An executable binary: 30 MB
- A shared library: 15 MB
- Already compressed image (JPEG): 5 MB
Total uncompressed size: 100 MB.
Now, let's look at how different algorithms and data types might compress:
| File Type | Uncompressed Size (MB) | Gzip (Ratio) | Gzip Compressed (MB) | Bzip2 (Ratio) | Bzip2 Compressed (MB) | XZ (Ratio) | XZ Compressed (MB) |
|---|---|---|---|---|---|---|---|
| Large Text File | 50 | 4:1 | 12.5 | 6:1 | 8.3 | 8:1 | 6.25 |
| Executable Binary | 30 | 2.5:1 | 12 | 3.5:1 | 8.5 | 4.5:1 | 6.67 |
| Shared Library | 15 | 3:1 | 5 | 4:1 | 3.75 | 5:1 | 3 |
| Compressed Image | 5 | 1.05:1 | 4.75 | 1.03:1 | 4.85 | 1.02:1 | 4.9 |
| Total Payload | 100 | ~2.9:1 | 34.25 | ~3.9:1 | 25.4 | ~4.8:1 | 20.82 |
| Overall Reduction | — | — | 65.75% | — | 74.6% | — | 79.18% |
(Note: These ratios are illustrative; actual ratios vary widely based on specific content.)
From this example, it's clear that:

- `xz` offers the best overall compression, reducing the package to roughly 21% of its original size.
- `bzip2` is a strong contender, achieving about 25% of the original size.
- `gzip` is less effective, resulting in a package closer to 34% of the original.
- Files that are already compressed (like the JPEG) see very little additional reduction, regardless of the algorithm. This highlights the diminishing returns of re-compressing already efficient data.
Tools to Inspect RPM Contents and Compression Details
Several utilities allow you to inspect the characteristics of an RPM package, including its compressed size and, indirectly, its compression efficiency:
- `rpm -qp --queryformat '%{SIZE}' <package.rpm>`: This command queries the total uncompressed size of all files contained within the RPM package payload. This gives you the Original Size component of the ratio calculation.

```bash
$ rpm -qp --queryformat '%{SIZE}' example-package-1.0-1.x86_64.rpm
123456789  # This is the total uncompressed size in bytes
```

- `ls -lh <package.rpm>`: This standard Linux command shows the compressed file size of the RPM itself. This gives you the Compressed Size component.

```bash
$ ls -lh example-package-1.0-1.x86_64.rpm
-rw-r--r--. 1 user group 25M Dec  1 10:00 example-package-1.0-1.x86_64.rpm
```

In this case, the compressed size is 25 MB. If the `rpm -qp` command revealed an uncompressed size of, say, 100 MB, then the compression ratio would be 4:1 ($100 / 25$).

- `rpm -qlp <package.rpm>`: This lists all files contained within the package payload, which can give you an idea of the types of content inside.
- `file <package.rpm>`: This command identifies the file as an RPM package, though it does not reliably state the compression type of the internal payload.
- `rpm -qp --queryformat '%{PAYLOADCOMPRESSOR}' <package.rpm>`: This queries the payload compressor recorded in the package header. For example:

```bash
$ rpm -qp --queryformat '%{PAYLOADCOMPRESSOR}\n' example-package-1.0-1.x86_64.rpm
xz
```

This clearly indicates that the payload uses `xz` compression.
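Putting the first two measurements together, the short helper below, a sketch assuming GNU `stat` and an arbitrary package path, prints the ratio in one step. It is slightly approximate, since the `.rpm` file size includes the lead, headers, and signatures in addition to the compressed payload:

```bash
#!/usr/bin/env bash
# Usage: ./rpm-ratio.sh <package.rpm>
pkg="$1"

# Uncompressed payload size, as recorded in the package header (bytes)
orig=$(rpm -qp --queryformat '%{SIZE}' "$pkg")

# On-disk size of the whole compressed .rpm file (bytes)
comp=$(stat -c %s "$pkg")

awk -v o="$orig" -v c="$comp" \
  'BEGIN { printf "%.2f:1 (%.1f%% reduction)\n", o / c, (o - c) * 100 / o }'
```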
By combining these tools, administrators can effectively measure and analyze the compression ratios of their RPM packages, providing critical data for capacity planning, network optimization, and performance tuning. This analytical approach ensures that the packaging strategy aligns with the operational requirements of the underlying infrastructure, whether for a standalone server or a sprawling distributed system managed by a sophisticated Management Control Plane (MCP).
Impact of Compression Choices on System Performance and Resource Management
The seemingly minor technical detail of RPM compression choice reverberates throughout the entire system lifecycle, significantly influencing core operational aspects like disk space, network bandwidth, CPU utilization, and overall deployment speed. Making an informed decision requires a deep understanding of these trade-offs, especially in diverse environments ranging from resource-constrained embedded systems to high-performance cloud infrastructures.
1. Disk Space
- Obvious Benefit: The most direct and universally appreciated benefit of effective compression is the reduction in disk space. For individual workstations, this might mean a few gigabytes saved over time. However, the impact scales dramatically in enterprise environments.
- Repository Servers: For servers hosting large `yum` or `dnf` repositories, or internal mirrors, aggregate savings can amount to terabytes. Imagine a repository for a Red Hat Enterprise Linux release, containing thousands of packages and their historical versions. If `xz` compression reduces each package by an additional 10-20% compared to `bzip2`, the cumulative storage savings for a repository containing 10,000 packages could be substantial, directly translating into lower hardware costs, reduced backup sizes, and faster disk operations.
- Storage Tiers: In multi-tier storage setups, efficient compression can allow more frequently accessed data (like recent package versions) to reside on faster, more expensive storage (e.g., SSDs), while older versions or less critical packages can be moved to slower, cheaper archival storage, all while maximizing the capacity of each tier.
2. Network Bandwidth
- Crucial for Large Deployments and Updates: Network bandwidth is often a bottleneck, especially when deploying software to numerous machines simultaneously or when relying on internet-based repositories. Smaller RPM files directly translate to less data traversing the network.
- Faster Downloads: Reduced file sizes mean packages download faster, minimizing the wait time for users and automated systems. In a continuous integration/continuous deployment (CI/CD) pipeline, where new versions of applications might be deployed multiple times a day across hundreds of servers, faster downloads contribute significantly to reducing overall deployment times.
- Reduced Congestion: When hundreds or thousands of servers initiate updates concurrently (e.g., during a security patch rollout), minimized file sizes prevent network links from becoming saturated. This is critical for maintaining the performance of other network-dependent services running in the same infrastructure.
- Cost Savings in Cloud Environments: In cloud platforms, egress bandwidth (data transferred out of a data center) is often a metered and costly resource. By reducing the size of RPM packages, organizations can significantly lower their monthly cloud networking bills, especially for widespread deployments or frequent updates across multiple availability zones or regions.
- Impact on Remote Sites: For organizations with many remote offices connected by limited bandwidth links, efficient RPM compression makes it feasible to distribute updates without overwhelming the network, ensuring that remote systems remain secure and up-to-date.
3. CPU Usage (Compression and Decompression)
This is where the major trade-offs emerge.
- During `rpmbuild` (Compression):
- Higher Compression = Higher CPU Cost: Algorithms like `xz` use more sophisticated mathematical models and larger dictionaries, demanding significantly more CPU cycles and time during the package creation phase (`rpmbuild`). Building a large application from source that generates many RPMs can take hours longer with `xz -9` compared to `gzip -1`.
- Developer Impact: For package maintainers and developers, longer build times can slow down iteration cycles, especially when repeatedly building and testing new packages.
- Build Infrastructure Costs: If an organization builds a vast number of custom RPMs, the increased CPU demand for compression can necessitate more powerful build servers or a larger build farm, impacting infrastructure costs.
- During Installation/Upgrade (Decompression):
- Decompression Overhead: When an RPM is installed, its compressed payload must be decompressed. This process consumes CPU resources on the target machine.
- `gzip` (Fast Decompression): `gzip` typically offers the fastest decompression speed, making it attractive for environments where rapid installation is paramount and CPU resources are limited.
- `bzip2` (Slower Decompression): `bzip2` decompression is generally slower than `gzip`, requiring more CPU time.
- `xz` (Variable Decompression): `xz` decompression speed can be surprisingly fast on modern CPUs, often competitive with or even faster than `bzip2`, especially for large files. However, it can also be slower than `gzip` in some scenarios and uses more memory. The performance depends heavily on the specific processor architecture and the decompression library implementation.
- Concurrent Installations: On systems performing multiple package installations or updates concurrently, the cumulative CPU load from decompression can become noticeable, potentially impacting other running services (a quick benchmark sketch follows this list).
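These decompression costs are easy to benchmark on the actual target hardware before standardizing on an algorithm. A minimal sketch against the illustrative archives created earlier:

```bash
# Time decompression of each format, discarding the output
time gzip  -dc payload.bin.gz  > /dev/null
time bzip2 -dc payload.bin.bz2 > /dev/null
time xz    -dc payload.bin.xz  > /dev/null
```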
4. Memory Usage (Decompression)
- Dictionary Size: The decompression process, particularly for `bzip2` and `xz`, requires a certain amount of memory to hold the dictionary or block data.
- `xz` and Memory: `xz` can be particularly memory-intensive during decompression if the package was compressed with a very large dictionary size (though common `xz` RPMs usually use reasonable dictionary sizes to avoid excessive memory demands). This could be a concern for very low-memory systems, like older embedded devices or small virtual machines with only a few hundred megabytes of RAM.
- Impact on System Stability: If a system's memory resources are extremely limited, a large-dictionary `xz` decompression could potentially lead to excessive swapping or even out-of-memory errors, compromising system stability. However, for most modern server environments, this is rarely a significant issue.
5. Deployment Speed (Overall Perceived Speed)
The "overall deployment speed" is a holistic metric that considers both download time and installation/decompression time.
- Network-Bound Environments: In environments with slow network connections (e.g., remote offices, satellite links, older internet connections), the reduction in download time achieved by higher compression ratios (like `xz`) almost always outweighs the increased decompression time. The package arrives faster, and the total perceived installation time is shorter.
- CPU-Bound Environments: In environments with very fast networks but limited CPU resources (e.g., specific embedded systems, very old servers), the faster decompression of `gzip` might lead to a quicker overall deployment, even if the download takes slightly longer.
- Modern Server Environments: For most modern data centers with high-speed internal networks and powerful multi-core CPUs, `xz` often strikes an excellent balance. The download savings are substantial, and modern CPUs can handle the decompression overhead efficiently enough that the total deployment time is still minimized compared to `gzip` or `bzip2`.
Considerations for Specialized Environments
- Embedded Systems: These often have limited CPU power, minimal RAM, and slow I/O. For such systems, `gzip` might be preferred due to its low resource footprint during decompression, even if it means larger package sizes.
- High-Performance Computing (HPC): In HPC clusters, rapid provisioning and updates are critical. While networks are usually fast, the sheer number of nodes means even small optimizations scale up. `xz` is typically favored to keep repository sizes down and ensure efficient distribution.
- Cloud-Native & Containerized Deployments: For container base images, smaller RPMs (and thus smaller layers) are highly desirable for faster image pulls, reduced registry storage, and quicker container startup times. Here, `xz` is often the preferred choice, as the underlying VM or host typically has ample CPU and memory to handle decompression.
In conclusion, the selection of an RPM compression algorithm is a nuanced decision that demands a careful evaluation of an organization's specific operational context, network characteristics, and hardware capabilities. There is no single "best" algorithm; rather, there is an optimal choice that balances file size, network efficiency, CPU consumption, and deployment speed to meet specific performance and resource management goals. This optimization is particularly relevant when deploying and managing critical infrastructure components like an API gateway, where the underlying efficiency of package distribution directly impacts the reliability and performance of services exposing APIs.
Evolution of RPM Compression in Red Hat Distributions
The history of RPM compression within Red Hat's ecosystem is a testament to the continuous drive for efficiency and adaptation to evolving technological landscapes. From the early reliance on gzip to the current widespread adoption of xz, each shift reflects a careful re-evaluation of the trade-offs between storage, bandwidth, and computational resources. This evolution has profound implications for anyone managing Red Hat-based systems.
From gzip to bzip2: Prioritizing Storage and Bandwidth
In the early days of Red Hat Linux and later Red Hat Enterprise Linux (RHEL), gzip (using the DEFLATE algorithm) was the standard for compressing RPM payloads. This choice was pragmatic for its time:
- Early 2000s Context: Network bandwidth, especially internet bandwidth, was significantly more constrained and expensive than today. Disk storage was also relatively more costly. CPU power, while improving, was still a valuable resource, and fast decompression was often prioritized for user experience.
- `gzip`'s Advantages: Its high decompression speed and low memory footprint made it an excellent fit. Build times for RPMs were also faster with `gzip`.
- The Shift: As network speeds slowly began to increase and hard drive capacities grew, the demand for even smaller package sizes became more pronounced. `bzip2` emerged as a viable alternative, offering superior compression ratios compared to `gzip`, albeit with a performance penalty. Red Hat and other distributions began to experiment with and gradually adopt `bzip2` for certain packages or as an optional default. This shift primarily targeted maximizing disk space savings and further reducing download times for large packages or entire repository synchronizations. The trade-off of slightly slower installation times was often considered acceptable for the benefit of smaller files. This transition typically happened in the mid-to-late 2000s for many distributions.
The Rise of xz (LZMA): Maximizing Compression for the Modern Era
The advent and maturation of the xz utility (and its underlying LZMA algorithm) marked another significant turning point. xz promised and delivered substantially better compression ratios than both gzip and bzip2.
- Mid-to-Late 2000s and 2010s Context: By this period, network bandwidth (both local and internet) had dramatically improved, and multi-core CPUs became standard even in modest server configurations. Disk storage became extremely cheap and capacious. The primary bottlenecks shifted: while download speeds were good, the sheer volume of data in repositories continued to grow, and the efficiency of data transfer remained critical for large-scale operations.
- `xz`'s Advantages: The unparalleled compression ratios of `xz` offered the greatest reduction in file sizes. This directly translated to:
- Minimal Repository Footprint: Crucial for mirrors and content distribution networks.
- Fastest Download Times (Overall): Despite its slower compression and sometimes slower decompression (depending on the specific CPU and `xz` parameters), the significantly smaller file size often ensured the quickest overall transfer-plus-install time in modern, high-bandwidth, high-CPU environments.
- Red Hat's Adoption: Fedora, as Red Hat's upstream testing ground, was an early adopter of `xz` for its RPM payloads. Red Hat Enterprise Linux, known for its stability and conservative approach, gradually transitioned to `xz` in later major releases (e.g., the RHEL 6, 7, and 8 series, with `xz` becoming the predominant default). This transition was carefully managed, balancing the performance benefits with compatibility and reliability considerations. The improved compression of `xz` was a compelling argument, especially for the distribution of base OS packages, kernels, and large application suites.
- Backward Compatibility: While new packages were built with `xz`, the `rpm` utility itself retained full backward compatibility, capable of installing packages compressed with `gzip`, `bzip2`, or `xz`. This ensured that older software or third-party packages built with legacy compression algorithms could still be installed on newer systems. This interoperability is a hallmark of the RPM system's robustness.
Current Defaults and Future Considerations
- Current RHEL/Fedora Defaults: In recent Red Hat Enterprise Linux releases, `xz` became the default compression algorithm for binary RPM payloads (RHEL 8 being the prominent example), and RHEL 9 along with current Fedora releases have since moved to Zstandard (`zstd`), which trades a little compression ratio for much faster decompression. In both cases the decision reflects the current balance of priorities: maximizing storage and bandwidth efficiency, leveraging modern CPU capabilities for decompression, and acknowledging that package build times (the primary penalty for stronger compressors) are often a one-time cost amortized over many deployments.
- Delta RPMs (DRPMs): It's important to note that compression of the full RPM payload operates independently of delta RPMs. Delta RPMs are an additional optimization layer designed for updates. Instead of downloading an entire new RPM, a DRPM contains only the differences (patches) between an old version and a new version of a package. These patches are then applied on the client side to reconstruct the new RPM. This reduces update sizes even further, often dramatically, on top of any gains from payload compression. Both payload compression and DRPMs work together to make Red Hat system updates remarkably efficient.
- Future Trends: While `xz` is highly effective, research into even more advanced compression algorithms continues. Future shifts might favor algorithms with even better ratios, improved parallelization for compression and decompression on multi-core CPUs, or specialized behavior for certain data types. Any new adoption, however, would need to demonstrate significant advantages while maintaining acceptable performance characteristics and, crucially, robust, open-source implementations.
The evolution of RPM compression mirrors the broader advancements in computing infrastructure. Each step—from gzip to bzip2 to xz—has been a strategic move to optimize the software delivery pipeline, catering to the changing demands of network speed, storage capacity, and CPU power. This continuous refinement ensures that Red Hat-based systems remain at the forefront of efficient and reliable software management, a fundamental requirement for the stable operation of any sophisticated IT environment, from general servers to specialized components like an API gateway.
Advanced Considerations and Best Practices for RPM Compression
While the default RPM compression settings are typically well-optimized for most Red Hat environments, there are situations where advanced considerations and custom configurations become beneficial. Understanding these nuances can help administrators and package maintainers fine-tune their software distribution strategies for specific use cases, whether it's optimizing build times, conserving resources in specialized deployments, or ensuring compatibility.
When to Override Default Compression Settings
The default xz compression with a high level (e.g., xz -9) is excellent for most binary RPMs intended for public distribution or large-scale enterprise deployments. However, there are scenarios where deviating from these defaults makes sense:
- Minimizing Build Times for Internal/Development Packages:
- Scenario: If you are frequently building custom RPMs for internal development, testing, or rapid prototyping, and the packages are primarily used within a high-bandwidth local network, the extra time taken by `xz -9` compression can become a bottleneck.
- Solution: You might opt for a faster compressor like `gzip` (e.g., `gzip -6`) or a lower compression level for `xz` (e.g., `xz -1`). This would reduce build times at the expense of slightly larger package sizes. The `rpmbuild` configuration can be modified locally (e.g., in `~/.rpmmacros`) to override system-wide defaults without affecting official packages.
- Example `~/.rpmmacros` entry:

```
%_binary_payload w6.gzdio
```
- Resource-Constrained Environments (e.g., Embedded Systems):
- Scenario: For highly specialized embedded devices with very limited CPU power, minimal RAM, or slow storage (e.g., older flash memory), the decompression overhead of `xz` or even `bzip2` might be too high. Slow decompression could delay boot times, application startup, or system updates, impacting the device's responsiveness.
- Solution: `gzip` might be the preferred choice due to its extremely fast decompression and low memory usage, even if it results in larger binaries. The slightly larger download might be acceptable if the network is less of a bottleneck than the CPU.
- Packages with Already Compressed Content:
- Scenario: If an RPM package primarily consists of files that are already highly compressed (e.g., image files like JPEGs or PNGs, audio/video files, or pre-compressed archives like `.tar.gz` within the payload), applying `xz -9` to the entire payload might offer negligible additional compression benefits while still incurring significant CPU time for the compression process.
- Solution: In such niche cases, using a faster but less aggressive compression algorithm or a lower compression level might be a more efficient trade-off, as the marginal gains from maximum compression on already optimized data are minimal.
- Consideration: However, most RPMs contain a mix of content, including many uncompressed binaries and text files, where `xz` still provides significant value. This is a very specific optimization for packages dominated by particular content types.
Impact on createrepo and Repository Sizes
The compression choice for RPM payloads directly impacts the size of local repositories and the performance of the createrepo utility.
- `createrepo` Performance: When `createrepo` (or `createrepo_c`) generates repository metadata (like `repomd.xml`, `filelists.xml.gz`, `primary.xml.gz`, etc.), it also compresses some of these metadata files. While this metadata compression is typically done with `gzip`, the overall repository size is dominated by the RPMs themselves. If RPMs are smaller due to `xz` compression, the repository will be smaller overall.
- Repository Synchronization: Smaller RPMs within a repository mean faster synchronization for `rsync` or other mirror tools, reducing bandwidth consumption and update times for internal mirrors. This is particularly relevant for large organizations maintaining geographically distributed `yum`/`dnf` mirrors (see the workflow sketch after this list).
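A typical internal-mirror workflow built on these tools might look like the sketch below; the repository ID and paths are placeholders:

```bash
# Sync a remote repository into a local mirror directory (dnf-plugins-core)
dnf reposync --repoid=baseos --download-path=/srv/mirror

# (Re)generate the repository metadata; --update reuses unchanged entries
createrepo_c --update /srv/mirror/baseos
```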
Security Implications (Indirect)
While RPM compression itself doesn't directly introduce security vulnerabilities (RPM's signature checking is the primary security mechanism for package integrity), there are indirect considerations:
- Speed of Patch Deployment: Efficient compression (leading to smaller, faster-downloading packages) indirectly enhances security by enabling quicker deployment of security patches. When a critical vulnerability is discovered, the ability to roll out updated packages rapidly across an entire fleet is paramount. Slow downloads due to inefficient compression could delay patching, leaving systems vulnerable for longer.
- Maliciously Crafted Compressed Files: Although RPM's signature verifies integrity, theoretically, a maliciously crafted compressed payload could attempt to exploit decompression vulnerabilities. However, the underlying decompression libraries (`zlib`, `libbz2`, `liblzma`) are heavily scrutinized and patched for such issues. The primary protection comes from ensuring the RPM itself is signed by a trusted entity.
Compression for Different Types of Payloads (Source vs. Binary RPMs)
It's important to distinguish between Source RPMs (SRPMs) and Binary RPMs:
- Binary RPMs (`.rpm`): These contain the compiled software and are what users typically install. Their payload compression (e.g., `xz`) is our main focus, as it directly affects distribution size and installation speed.
- Source RPMs (`.src.rpm`): These contain the original source code, patches, and the `.spec` file used to build the binary RPM. The source code itself is usually shipped as a compressed tarball (e.g., `tar.gz`, `tar.bz2`, `tar.xz`) referenced by the `Source0`, `Source1`, etc., directives in the `.spec` file; the compression of that tarball is chosen when the tarball is created, while the SRPM's own payload compressor is governed by the `%_source_payload` macro. For SRPMs, `xz` is also frequently used because build time is less critical than preserving bandwidth for potentially very large source trees (see the sketch after this list).
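To see both package types side by side, the short sketch below, using a hypothetical SRPM name, inspects the payload compressor and rebuilds the binary RPMs locally:

```bash
# The payload compressor is recorded in SRPMs and binary RPMs alike
rpm -qp --queryformat '%{PAYLOADCOMPRESSOR}\n' example-package-1.0-1.src.rpm

# Rebuild binary RPMs from the source RPM; results land under ~/rpmbuild/RPMS/
rpmbuild --rebuild example-package-1.0-1.src.rpm
```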
Using Delta RPMs (DRPMs) for Greater Efficiency
Delta RPMs (.drpm) are a powerful optimization that complements payload compression. They are not a replacement for payload compression but rather an additional layer of efficiency for updates.
- How they work: When an update is available, instead of downloading the full new RPM, the `yum` or `dnf` client can download a DRPM. This DRPM contains only the binary differences (deltas) between the old version of a package already installed on the system and the new version. The client then applies these deltas to the locally installed old RPM to reconstruct the new RPM.
- Benefits: This drastically reduces the amount of data transferred for updates, especially for minor version bumps where only small parts of the package have changed. DRPMs are particularly effective for large packages like the kernel or major applications.
- Interaction with Payload Compression: DRPMs effectively apply to the uncompressed contents of the RPMs. So, the efficiency of the underlying payload compression (e.g., `xz`) is still important, as it determines the size of the original new RPM that would be downloaded if a DRPM wasn't used or couldn't be generated. Both technologies work in concert to achieve maximum efficiency in the update process (a configuration sketch follows this list).
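On clients that support them, enabling delta RPMs is a one-line configuration change. The sketch below assumes a `dnf`-based system; the `deltarpm` option is documented in dnf.conf(5), and the configured repositories must actually publish DRPMs for it to have any effect:

```bash
# Enable delta RPM downloads for dnf; full packages are rebuilt locally
echo "deltarpm=True" | sudo tee -a /etc/dnf/dnf.conf
```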
Connecting to Modern Infrastructure: API Gateway and Management Control Plane
The foundational efficiencies derived from meticulous RPM compression strategies, while seemingly low-level, have a direct and tangible impact on the performance and reliability of high-level infrastructure components like an API gateway and a Management Control Plane (MCP).
For organizations managing complex API ecosystems, an efficient API gateway is paramount. Platforms like APIPark provide robust solutions for AI gateway and API management, ensuring seamless integration and deployment of services. The underlying infrastructure supporting such critical systems often relies on highly optimized package distribution. If an API gateway component, its dependencies, or the operating system it runs on are delivered via RPMs, then understanding and optimizing RPM compression directly contributes to:
- Faster Deployment/Updates of Gateway Components: Quicker downloads and installations of API gateway software or security patches mean less downtime, faster scaling, and more agile infrastructure changes. This is critical for maintaining high availability and responsiveness of exposed APIs.
- Reduced Operational Costs: Lower network bandwidth usage for gateway updates means reduced data transfer costs, particularly in cloud environments.
- Smaller Base Images for Containerized Gateways: If the API gateway is deployed in containers, smaller underlying RPMs contribute to leaner base images, leading to faster image pulls and quicker container startup times, which is essential for dynamic scaling of an API gateway.
Similarly, in sophisticated enterprise architectures, especially those leveraging cloud or hybrid environments, a robust Management Control Plane (MCP) is essential for orchestrating and governing a multitude of services. This MCP frequently interacts with underlying operating systems and application components, many of which are deployed and updated through RPM packages. The compression ratio of these RPMs plays a subtle yet critical role in the MCP's operational efficiency:
- Efficient Fleet Management: When an MCP pushes updates to thousands of nodes across a distributed system, the aggregate bandwidth savings from well-compressed RPMs are immense, preventing network congestion and ensuring updates are rolled out swiftly and consistently. This allows the MCP to maintain a desired state across the entire infrastructure with greater agility.
- Faster Provisioning: New nodes provisioned by the MCP download and install base packages faster when those packages are efficiently compressed, reducing the time-to-readiness for new compute resources.
- Resource Optimization: By reducing the data footprint, the MCP can manage its own storage requirements more efficiently and minimize its network egress costs when interacting with external repositories.
In essence, optimizing RPM compression is a foundational step that creates a ripple effect, enhancing the efficiency, security, and agility of the entire software delivery chain, ultimately supporting the reliable operation of critical services and infrastructure components such as API gateways and Management Control Planes. It's a testament to how low-level technical decisions underpin high-level strategic capabilities in modern IT.
Conclusion
The journey through the intricacies of Red Hat RPM compression ratio reveals a profound interplay of technology, operational efficiency, and strategic decision-making. Far from being a mere technical detail, the choice and implementation of compression algorithms within RPM packages are fundamental to the robust and agile management of Red Hat-based systems. We've dissected the foundational structure of RPMs, explored the historical evolution of compression from gzip to bzip2 and ultimately to the highly efficient xz, and delved into the detailed mechanics of each algorithm.
Our investigation highlighted that the "compression ratio" is a dynamic metric, influenced not only by the chosen algorithm but also by the inherent redundancy of the data and the specified compression level. Understanding these factors allows for intelligent optimization, balancing the tangible benefits of reduced storage footprint and expedited network transfers against the computational costs of compression and decompression. The impact extends across the entire software lifecycle, affecting everything from build times for package maintainers to deployment speeds for system administrators, and ultimately influencing the overall cost-effectiveness and responsiveness of IT infrastructure.
Furthermore, we've demonstrated how these seemingly low-level packaging efficiencies cascade upwards, directly affecting the performance and reliability of sophisticated, high-level components crucial to modern enterprise architectures. Efficient RPM compression enables faster, more secure, and more cost-effective deployments of critical services, including the very software that comprises an API gateway or components within a Management Control Plane (MCP). For instance, the seamless operation and rapid update cycles facilitated by optimized RPMs are indispensable for robust API management platforms like APIPark, which depend on resilient underlying infrastructure to deliver their promise of unified AI and REST service integration.
Ultimately, the Red Hat RPM compression ratio is a silent but potent force in the world of Linux system administration. It embodies a continuous quest for efficiency that has shaped the landscape of software distribution. For administrators, developers, and architects, a deep comprehension of these compression nuances is not just academic; it is a vital skill for building, managing, and maintaining secure, performant, and scalable Red Hat environments that can confidently meet the demands of today's dynamic digital infrastructure. As technology continues to evolve, the principles of efficient data packaging will remain a cornerstone, adapting to new challenges and continuing to optimize the intricate dance between software and the systems that host it.
Frequently Asked Questions (FAQs)
1. What is the primary purpose of compression in Red Hat RPM packages? The primary purpose of compression in Red Hat RPM packages is to reduce their file size. This reduction offers several key benefits: it minimizes storage requirements on repository servers and client machines, drastically decreases network bandwidth consumption during downloads and updates, and ultimately leads to faster overall software deployment times, especially for large-scale rollouts or in environments with limited network capacity. It's a critical factor in enhancing the efficiency and agility of software distribution across the Red Hat ecosystem.
2. Which compression algorithms are commonly used for RPM payloads, and what are their trade-offs? Historically, gzip was the default, offering fast compression and decompression with moderate file size reduction. bzip2 later provided significantly better compression ratios than gzip at the cost of slower compression and decompression and higher memory usage. xz (using the LZMA algorithm) then became the default across modern Red Hat distributions, delivering the best compression ratios and thus the smallest file sizes; this comes at the expense of the slowest compression times, though decompression is often competitive with (or faster than) bzip2 on modern CPUs, with potentially higher memory consumption. Newer Fedora releases have since moved the payload default to zstd, which gives up a little ratio in exchange for much faster decompression. The choice always balances desired file size against computational overhead and memory footprint.
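These trade-offs are easy to measure for yourself by running all three compressors over the same input; the loop below uses /usr/bin/rpm purely as a convenient sample file, so substitute any large file you like:

```bash
# A rough sketch: compare gzip, bzip2, and xz output sizes on one input.
# /usr/bin/rpm is just a handy sample file; any large file works.
for c in gzip bzip2 xz; do
    printf '%s: ' "$c"
    $c -9 -c /usr/bin/rpm | wc -c   # compressed size in bytes
done
```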
3. How does RPM compression impact network bandwidth and deployment speed? RPM compression significantly impacts network bandwidth by reducing the amount of data that needs to be transferred for package downloads and updates. Smaller package sizes mean faster downloads, which directly contributes to a quicker overall deployment speed. In scenarios with limited bandwidth or when updating a large number of systems simultaneously (such as in data centers or during security patch rollouts), efficient compression is crucial for minimizing network congestion and ensuring that software can be deployed rapidly and consistently across the infrastructure, thereby improving operational agility.
4. Can I change the compression algorithm used when building my own RPMs? Yes, package maintainers and administrators can specify the compression algorithm and level when building their own RPMs. This is typically done by configuring RPM macros in files like /etc/rpm/macros or ~/.rpmmacros. The key macro is %_binary_payload (with %_source_payload as its SRPM counterpart), whose value encodes both the level and the algorithm, e.g. w9.gzdio for gzip, w9.bzdio for bzip2, w7.xzdio for xz, or w19.zstdio for zstd. This flexibility allows for optimization based on specific requirements, such as prioritizing faster build times for development packages or maximizing compression for widely distributed releases (see the sketch below).
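A minimal sketch of both approaches, assuming an rpm build with zstd support and a hypothetical mypkg.spec:

```bash
# A minimal sketch: select xz level 7 for binary payloads, then build.
# "mypkg.spec" is a hypothetical spec file.
echo '%_binary_payload w7.xzdio' >> ~/.rpmmacros
rpmbuild -ba mypkg.spec

# One-off override without editing ~/.rpmmacros (zstd level 19,
# which requires an rpm built with zstd support):
rpmbuild -ba --define '_binary_payload w19.zstdio' mypkg.spec
```

The query shown earlier (`rpm -qp --qf '%{PAYLOADCOMPRESSOR}\n' ...`) can confirm what the built package actually used.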
5. How does RPM compression relate to modern infrastructure components like API Gateways or Management Control Planes? Efficient RPM compression, while a low-level detail, fundamentally underpins the performance and reliability of modern infrastructure. For components like an API gateway or a Management Control Plane (MCP), which are often deployed and updated via RPMs, optimized compression translates directly to faster deployments, quicker application of security patches, and reduced network bandwidth costs. This ensures that the underlying infrastructure supporting critical services (such as those managed by platforms like APIPark for API management) remains agile, secure, and responsive, ultimately enhancing the overall operational efficiency and stability of complex IT environments.
🚀 You can securely and efficiently call the OpenAI API on APIPark in just two steps:
Step 1: Deploy the APIPark AI gateway in 5 minutes.
APIPark is built with Golang, offering strong product performance and low development and maintenance costs. You can deploy APIPark with a single command line:
```bash
curl -sSO https://download.apipark.com/install/quick-start.sh; bash quick-start.sh
```

In my experience, you can see the successful deployment interface within 5 to 10 minutes. Then, you can log in to APIPark using your account.

Step 2: Call the OpenAI API.