The surprisingly rich history of making files smaller
From telegraph operators squeezing Morse code in the 1800s to the algorithms running silently inside every PDF you have ever emailed, compression is one of the oldest and most underappreciated arts in computing.
Before there was broadband, before there was Wi-Fi, before there was even a consumer internet to speak of, engineers were already obsessed with the same problem that frustrates you when you try to email a PDF: how do you send more information through a channel that was not built to carry it?
The answer, then as now, was compression: the art of representing the same information in fewer bits. It is a discipline that sits at the intersection of mathematics, linguistics, and engineering, and its history stretches back much further than most people realize.
It started with dots and dashes
The telegraph operators of the nineteenth century were, in a very real sense, the first data compression engineers. Samuel Morse's code was not designed randomly. The most common letters in the English language were assigned the shortest sequences. E, the most frequent letter, is a single dot. T is a single dash. Q, rarely used, requires four symbols.
This insight, that you should use shorter codes for more common things, became the mathematical foundation for nearly every compression algorithm that followed. It would take another century for Claude Shannon to formalize it, but the intuition was already baked into the way telegraph operators tapped their keys.
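The "shorter codes for more common things" idea can be made concrete with Huffman coding, which this article returns to later. The following is a minimal sketch rather than a production encoder: the function name `huffman_code_lengths` and the sample sentence are illustrative assumptions, and it only computes how many bits each symbol would get, not the actual bit patterns.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length, in bits, for each symbol in text."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {symbol: 1 for symbol in counts}

    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(weight, i, {symbol: 0})
            for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


sample = "the quick brown fox jumps over the lazy dog and the eel sees the tree"
lengths = huffman_code_lengths(sample)
counts = Counter(sample)
huffman_bits = sum(counts[s] * lengths[s] for s in counts)
fixed_bits = 8 * len(sample)  # one byte per character
print(f"fixed-width: {fixed_bits} bits, Huffman: {huffman_bits} bits")
```

Frequent symbols such as the space and the letter e end up with the shortest codes, much as E and T do in Morse code.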
Compression is not about losing information. It is about finding a smarter language to express it in.
Shannon's 1948 paper, "A Mathematical Theory of Communication," introduced the concept of entropy: a precise measure of how much information a message actually contains, as opposed to how many bits it happens to be stored in. The gap between those two numbers is redundancy, and redundancy is what compression eliminates.
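To see the gap between stored bits and information, here is a small sketch that estimates entropy from symbol frequencies. It assumes each character is independent of its neighbors (a simplification; real text has more structure), and the function name and sample message are made up for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message: str) -> float:
    """Shannon entropy H = -sum(p * log2(p)) over the symbol frequencies."""
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


message = "aaaaaaaabbbbccdd"                 # 16 symbols: a x8, b x4, c x2, d x2
h = entropy_bits_per_symbol(message)         # 1.75 bits per symbol
stored_bits = 8 * len(message)               # 128 bits as 8-bit characters
information_bits = h * len(message)          # 28 bits of actual information
redundancy_bits = stored_bits - information_bits
print(f"stored {stored_bits} bits, information {information_bits:.0f} bits, "
      f"redundancy {redundancy_bits:.0f} bits")
```

Under this simple model, no lossless code that treats symbols independently can average fewer than 1.75 bits per symbol for this message.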
The algorithms that built the internet
From the late 1970s into the early 1990s, compression moved from theory into practice at remarkable speed. The algorithms that emerged during this period became so fundamental that you are almost certainly using them right now without knowing it.
Lempel-Ziv (LZ77)
Abraham Lempel and Jacob Ziv publish the algorithm that underlies deflate, the compression used in ZIP files, PNG images, and most PDFs today. The key insight: replace repeated sequences with references to earlier occurrences.
Huffman Coding goes mainstream
David Huffman's 1952 algorithm, which assigns shorter bit sequences to more common symbols, becomes standard inside image and document formats. It remains one of the most elegant algorithms in computer science.
GIF and LZW compression
CompuServe introduces the GIF format using LZW compression, making images small enough to transmit over dial-up modems. The format war between GIF and PNG that followed shaped how the web handles images to this day.
JPEG changes photography forever
The Joint Photographic Experts Group releases JPEG, the first widely adopted lossy compression standard for images. By discarding visual information the human eye cannot easily detect, it can turn a 10MB photograph into a 500KB file with little perceptible loss of quality.
Adobe launches PDF
Adobe's Portable Document Format arrives, built on the PostScript imaging model and storing page content in compressed streams (deflate support follows in PDF 1.2). The format is designed to look identical on any device, a revolutionary idea at a time when documents routinely broke across different computers.
The email attachment problem
As PDFs become the standard for sharing documents, file size becomes a daily frustration. Email servers impose 10MB and 25MB attachment limits. A generation of workers learns to dread the words "your attachment was too large to deliver."
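The LZ77 entry in the timeline above describes replacing repeated sequences with references to earlier occurrences. A toy sketch of that idea follows. It is deliberately naive (a brute-force search over a small window, tokens kept as Python objects rather than packed bits), and the function names, window size, and minimum match length are all illustrative assumptions, not the parameters of any real format.

```python
def lz77_compress(data: bytes, window: int = 255, max_len: int = 255) -> list:
    """Greedy toy LZ77: emit literal bytes or (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (an overlapping copy).
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= 3:              # short matches are not worth a reference
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])     # literal byte
            i += 1
    return tokens


def lz77_decompress(tokens: list) -> bytes:
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            dist, length = token
            for _ in range(length):    # copy byte by byte so overlaps work
                out.append(out[-dist])
        else:
            out.append(token)
    return bytes(out)


tokens = lz77_compress(b"abcabcabcabc")
print(tokens)
```

Twelve input bytes collapse to three literals and one back-reference meaning "go back 3 bytes and copy 9". Deflate combines this kind of matching with Huffman coding of the resulting tokens.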
What is actually inside a PDF file?
A PDF is not a single thing. It is a container: a carefully structured archive that holds text, fonts, images, vector graphics, metadata, and interactive elements, all organized according to a specification that runs to roughly a thousand pages.
When you open a PDF in any viewer, that software is interpreting a stream of instructions. Draw this text at this position in this font. Place this image here. This page is 8.5 inches by 11 inches. The file contains both the content and the instructions for rendering it.
Text streams — the actual characters on the page, stored as compressed sequences of character codes referencing embedded fonts.
Font data — PDFs embed font files so they look correct on any device. A single embedded font can add 100KB to 500KB to a file.
Embedded images — photos, screenshots, and scanned pages stored as raw pixel data or compressed with JPEG.
Vector graphics — shapes, lines, and illustrations stored as mathematical instructions rather than pixels.
Metadata — creation date, author, software used, revision history, and other information invisible to the reader but stored in the file.
Cross-reference table — an index at the end of the file telling the PDF reader where to find each object. In old or poorly generated PDFs, this alone can be surprisingly large.
Why PDFs get large and how compression fixes it
The most common cause of bloated PDFs is embedded images. When you scan a document, take a screenshot, or export a presentation to PDF, the software often embeds those images at full resolution with little or no compression applied. A single scanned page at 300 DPI can weigh 5MB or more before any other content is added.
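A quick back-of-the-envelope calculation shows where scanned-page sizes come from. The sketch below assumes a US Letter page (8.5 by 11 inches) and three common color depths; the helper name `raw_scan_bytes` is made up for illustration, and real scanners may apply some compression before the image ever reaches the PDF.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int,
                   bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a scan at the given resolution and depth."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# US Letter at 300 DPI is 2550 x 3300 pixels.
letter_bw = raw_scan_bytes(8.5, 11, 300, 1)      # 1-bit black and white
letter_gray = raw_scan_bytes(8.5, 11, 300, 8)    # 8-bit grayscale
letter_rgb = raw_scan_bytes(8.5, 11, 300, 24)    # 24-bit color

for label, size in [("black and white", letter_bw),
                    ("grayscale", letter_gray),
                    ("color", letter_rgb)]:
    print(f"{label}: {size / 1_000_000:.1f} MB uncompressed")
```

Color depth matters as much as resolution: the same page is about 1MB in pure black and white, about 8MB in grayscale, and about 25MB in full color before any compression.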
The second most common cause is embedded fonts. Unless the generating software subsets them, every unique font face used in a document is embedded in its entirety. A document using four fonts could carry 1MB of font data for text that represents a few kilobytes of actual content.
The third cause is what engineers call unoptimized object streams. The internal structure of the PDF contains references, indexes, and metadata that accumulate over time, particularly in documents that have been edited, merged, annotated, or digitally signed multiple times.
Lossless compression
Lossless compression reduces file size without removing any information. The original data can be reconstructed perfectly. Deflate compression, a descendant of Lempel and Ziv's 1977 algorithm, is applied to text streams and vector graphics in PDFs. A compressed text stream decompresses to exactly the same bytes that went in.
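The round trip is easy to demonstrate with Python's standard `zlib` module, which implements deflate. The repeated line below only imitates the look of a PDF text stream; it is an invented sample, not output from a real PDF.

```python
import zlib

# An invented, highly repetitive stand-in for a PDF text stream.
text_stream = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 200

compressed = zlib.compress(text_stream, 9)   # 9 = maximum compression effort
restored = zlib.decompress(compressed)

print(f"original: {len(text_stream)} bytes, compressed: {len(compressed)} bytes")
print("identical after round trip:", restored == text_stream)
```

Because the sample repeats the same line 200 times, it compresses far better than typical text would. What matters is the second line of output: the restored bytes match the input exactly.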
Lossy compression
Lossy compression achieves much greater size reductions by permanently discarding information that is difficult for humans to perceive. JPEG compression breaks an image into 8x8 blocks of pixels and applies a mathematical transform, the discrete cosine transform, that separates each block's coarse structure from its fine detail. It then rounds away the detail that viewers are least likely to notice. The human visual system is much more sensitive to brightness than to color, and JPEG exploits this ruthlessly, typically by storing color at a lower resolution than brightness.
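The transform-then-discard step can be sketched on a single 8x8 block. This is a simplified illustration, not a JPEG encoder: it uses one uniform rounding step where real JPEG uses a table of per-frequency steps, it skips color conversion and entropy coding, and the gradient block and function names are invented for the example.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block of level-shifted samples."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step=16):
    """Divide by a step and round: this rounding is where detail is discarded."""
    return [[round(c / step) for c in row] for row in coeffs]


# A smooth gradient, like a patch of sky, shifted to be centered on zero.
gradient = [[4 * x + 2 * y - 21 for y in range(N)] for x in range(N)]
quantized = quantize(dct_2d(gradient))
nonzero = sum(1 for row in quantized for c in row if c != 0)
print(f"{nonzero} of {N * N} coefficients survive quantization")
```

For smooth image regions, most of the 64 coefficients round to zero, and long runs of zeros are cheap to store. A larger step discards more detail and shrinks the file further, which is roughly what a lower JPEG quality setting does.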
When you compress a PDF that contains scanned documents or photographs, the dominant technique is re-encoding those embedded images at lower JPEG quality. An image originally stored at quality 95, nearly indistinguishable from the raw scan, might be re-encoded at quality 60, with the file size cut to roughly a third and the visual difference hard to spot at normal reading distances.
What Ghostscript actually does
Ghostscript, the open-source engine that powers many professional PDF compression tools including SlimPDF, takes a more comprehensive approach. Rather than just recompressing images, it effectively rewrites the entire PDF. It interprets every instruction, recompresses streams, subsets fonts to include only the characters actually used, removes duplicate objects, rebuilds the cross-reference table from scratch, and, depending on the output settings, flattens transparency.
The result is a PDF that looks the same as the original at normal viewing sizes but is structurally far more efficient. A document that has been edited twelve times in four different applications and emailed back and forth across a company for three years can often be cut to a quarter of its size by Ghostscript with no visible difference.
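For readers who want to try this directly, the sketch below builds a typical Ghostscript command line from Python. The flags shown are standard options of Ghostscript's `pdfwrite` device, but the helper names are invented here, the executable is assumed to be installed as `gs` (on Windows it is usually named differently), and nothing in this example is claimed to match how SlimPDF itself invokes Ghostscript.

```python
import subprocess

# Quality presets accepted by -dPDFSETTINGS, roughly smallest output to largest.
PRESETS = ("/screen", "/ebook", "/printer", "/prepress", "/default")


def ghostscript_command(src: str, dst: str, preset: str = "/ebook") -> list[str]:
    """Build the argument list for rewriting src into dst with pdfwrite."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "gs",                          # assumed executable name
        "-sDEVICE=pdfwrite",           # write a new PDF rather than render pixels
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, preset: str = "/ebook") -> None:
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(ghostscript_command(src, dst, preset), check=True)


print(" ".join(ghostscript_command("input.pdf", "output.pdf")))
```

Calling `compress_pdf("input.pdf", "output.pdf")` would run the command. Building the argument list separately makes it easy to inspect before anything touches a real file.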
The best PDF compressor is one you never have to think about. You drop a file in. A smaller file comes out. That is the entire contract.
The target size problem and why nobody solved it
For most of the history of PDF compression, tools offered you a slider or a preset. Low quality. Medium quality. High quality. You compressed the file, checked the size, decided it was still too large, compressed it again at a lower setting, and repeated until you hit something acceptable.
What nobody offered was the ability to simply say: make this file under 2MB. The gap seems obvious in retrospect. Job application portals, government upload systems, and email servers all impose size limits in bytes, not quality levels. But the compression tools spoke in quality and the upload forms spoke in megabytes, and users translated between them manually.
SlimPDF's custom target mode closes this gap by searching the compression parameter space. Given a target size, it runs multiple compression attempts, trying different quality settings, different DPI thresholds, and different combinations of Ghostscript presets, until it finds the combination that produces the smallest file that still meets the target. If no combination can reach the target, it tells you honestly: this is the smallest this file can get.
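One simple way to search for a setting that meets a size target is a binary search over a single quality value. The sketch below is only an illustration of that idea. It assumes output size never shrinks as quality rises, it varies one parameter rather than the several combinations described above, and both `best_quality_under` and the stand-in `fake_size` function are hypothetical, not SlimPDF's actual code.

```python
from typing import Callable, Optional


def best_quality_under(target_bytes: int, size_at: Callable[[int], int],
                       lo: int = 10, hi: int = 95) -> Optional[int]:
    """Highest quality in [lo, hi] whose output fits target_bytes, else None.

    Assumes size_at(q) never decreases as q increases.
    """
    if size_at(lo) > target_bytes:
        return None                    # even the lowest quality is too large
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if size_at(mid) <= target_bytes:
            best = mid                 # fits: try for higher quality
            lo = mid + 1
        else:
            hi = mid - 1               # too big: lower the quality
    return best


def fake_size(quality: int) -> int:
    """Stand-in for a real compression run: bigger files at higher quality."""
    return 200_000 + 30_000 * quality


print(best_quality_under(2_000_000, fake_size))   # fits a 2MB target
print(best_quality_under(100_000, fake_size))     # target cannot be reached
```

In a real tool, `size_at` would run an actual compression pass and measure the output, so each probe is expensive. Binary search needs only about seven probes to cover the range 10 to 95, and the `None` result corresponds to the "this is the smallest this file can get" case.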
The next frontier
The compression algorithms in widespread use today are extraordinarily mature. Deflate, the algorithm inside ZIP files and PDF streams, is essentially unchanged from its 1993 specification. JPEG compression, approaching its fortieth anniversary, remains the dominant format for photographic images. The improvements in modern formats like JPEG 2000 and AVIF are real but modest for the use cases most people encounter day to day.
The genuinely new frontier is AI-assisted compression: using neural networks to identify which parts of an image contain information the human eye cares about, preserving those with higher fidelity, and compressing the rest aggressively. Early results from this approach suggest compression ratios that would have seemed impossible under classical algorithms. It is, in a sense, a perceptual extension of Shannon's original insight: for lossy compression, the limit is set less by any one algorithm than by human perception.
For now, the most practical compression remains the kind that has worked since 1993. Find the redundancy, remove it, and hand back a file that does the same job in less space. Simple in concept. Surprisingly deep in practice.