The SlimPDF Guide

The surprisingly rich history of making files smaller

From telegraph operators squeezing Morse code in the 1800s to the algorithms running silently inside every PDF you have ever emailed, compression is one of the oldest and most underappreciated arts in computing.

Figure (before and after, "The art of saying more with less"): The flow of data. Compression has been fundamental to digital communication since the earliest computers.

Before there was broadband, before there was Wi-Fi, before there was even a consumer internet to speak of, engineers were already obsessed with the same problem that frustrates you when you try to email a PDF: how do you send more information through a channel that was not built to carry it?

The answer, then as now, was compression: the art of representing the same information in fewer bits. It is a discipline that sits at the intersection of mathematics, linguistics, and engineering, and its history stretches back much further than most people realize.

80%: Average size reduction on image-heavy PDFs
3B+: PDFs created every year worldwide
1993: Year Adobe introduced the PDF format

It started with dots and dashes

Figure (letter frequency in English: common letters get shorter codes, rare letters get longer ones): Morse code assigned the shortest sequences to the most frequent letters. E is a single dot. Q requires four symbols. This is compression in its oldest form.

The telegraph operators of the nineteenth century were, in a very real sense, the first data compression engineers. Samuel Morse's code was not designed randomly. The most common letters in the English language were assigned the shortest sequences. E, the most frequent letter, is a single dot. T is a single dash. Q, rarely used, requires four symbols.

This insight, that you should use shorter codes for more common things, became the mathematical foundation for nearly every compression algorithm that followed. It would take another century for Claude Shannon to formalize it, but the intuition was already baked into the way telegraph operators tapped their keys.

Compression is not about losing information. It is about finding a smarter language to express it in.

Shannon's 1948 paper, A Mathematical Theory of Communication, introduced the concept of entropy: a precise measure of how much information a message actually contains, as opposed to how many bits it happens to be stored in. The gap between those two numbers is redundancy, and redundancy is what lossless compression eliminates.
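To make the gap between stored bits and information content concrete, here is a minimal sketch that estimates entropy from the symbol frequencies of a single message. The function name and the sample string are illustrative assumptions, and a per-symbol frequency count is only a rough estimate of true entropy, since it ignores patterns that span several symbols.

```python
import math
from collections import Counter


def shannon_entropy_bits_per_symbol(message: str) -> float:
    """Return the empirical Shannon entropy of `message` in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    # H = -sum(p * log2(p)) over the observed symbol frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Illustrative input: seven 'a' characters and one 'b'.
message = "aaaaaaab"
stored_bits = 8 * len(message)  # 8 bits per character as typically stored
entropy_bits = shannon_entropy_bits_per_symbol(message) * len(message)
print(f"stored: {stored_bits} bits, entropy estimate: {entropy_bits:.1f} bits")
```

The difference between the two printed numbers is the redundancy a lossless compressor can, in principle, squeeze out.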

The algorithms that built the internet

Figure (LZ77: replace repeated sequences with references; the example phrase "THE CAT SAT THE CAT THE HAT" shrinks from 7 tokens to 5, 29% smaller): How LZ77 works: repeated sequences are replaced with back-references to where they first appeared. The more repetition in a file, the greater the compression.

Through the 1970s and 1980s, compression moved from theory into practice at remarkable speed. The algorithms that emerged during this period became so fundamental that you are almost certainly using them right now without knowing it.

1977

Lempel-Ziv (LZ77)

Abraham Lempel and Jacob Ziv publish the algorithm that forms the basis of ZIP files, PNG images, and the deflate compression used inside most PDFs today. The key insight: replace repeated sequences with references to earlier occurrences.
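The back-reference idea can be sketched in a few lines. This is a toy illustration, not the deflate format itself: the token layout (a literal character or an `(offset, length)` pair), the window size, and the minimum match length are all assumptions chosen for readability.

```python
def lz77_compress(data: str, window: int = 255, min_match: int = 3):
    """Toy LZ77: emit literal characters or (offset, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the sliding window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decompress(tokens) -> str:
    """Rebuild the original text from literals and back-references."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):
                # Copy one symbol at a time so overlapping matches work.
                out.append(out[-offset])
        else:
            out.append(token)
    return "".join(out)


text = "THE CAT SAT THE CAT THE HAT"
tokens = lz77_compress(text)
print(len(text), "characters ->", len(tokens), "tokens")
```

Real implementations add hashing to find matches quickly and an entropy coder to pack the tokens into bits; the sketch only shows why repetition turns into savings.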

1985

Huffman Coding goes mainstream

David Huffman's 1952 algorithm, which assigns shorter bit sequences to more common symbols, becomes standard inside image and document formats. It remains one of the most elegant algorithms in computer science.
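A short sketch shows how Huffman's method arrives at those shorter bit sequences. It computes only the code length for each symbol rather than the actual bit patterns, and the function name and sample text are illustrative assumptions.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length (in bits) for each symbol in `text`."""
    counts = Counter(text)
    if len(counts) <= 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {symbol: 1 for symbol in counts}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in tree}).
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


text = "aaaabbc"
lengths = huffman_code_lengths(text)
total_bits = sum(lengths[ch] for ch in text)
print(lengths, total_bits, "bits")
```

The most frequent symbol ends up nearest the root of the tree and so gets the shortest code, which is the same principle Morse applied by hand.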

1987

GIF and LZW compression

CompuServe introduces the GIF format using LZW compression, making images small enough to transmit over dial-up modems. The format war between GIF and PNG that followed shaped how the web handles images to this day.

1992

JPEG changes photography forever

The Joint Photographic Experts Group releases JPEG, the first widely adopted lossy compression standard. By discarding visual information the human eye cannot easily detect, a 10MB photograph can become a 500KB file with little or no perceptible quality loss.

1993

Adobe launches PDF

Adobe's Portable Document Format arrives, built on PostScript and initially compressing text and streams with LZW; deflate compression follows in PDF 1.2 in 1996. The format is designed to look identical on any device, a revolutionary idea at a time when documents routinely broke across different computers.

2000s

The email attachment problem

As PDFs become the standard for sharing documents, file size becomes a daily frustration. Email servers impose 10MB and 25MB attachment limits. A generation of workers learns to dread the words "your attachment was too large to deliver."

What is actually inside a PDF file?

Figure (text streams: lossless deflate; embedded images: JPEG re-encoding; font data: subsetting unused glyphs; cross-reference table: object cleanup): A PDF contains multiple distinct layers, each compressed differently. Text uses lossless deflate. Images use lossy JPEG. Fonts get subsetted to only the characters actually used in the document.

A PDF is not a single thing. It is a container: a carefully structured archive that holds text, fonts, images, vector graphics, metadata, and interactive elements, all organized according to a specification that runs to thousands of pages.

When you open a PDF in any viewer, that software is interpreting a stream of instructions. Draw this text at this position in this font. Place this image here. This page is 8.5 inches by 11 inches. The file contains both the content and the instructions for rendering it.

What lives inside a PDF

Text streams — the actual characters on the page, stored as compressed sequences of character codes referencing embedded fonts.

Font data — PDFs embed font files so they look correct on any device. A single embedded font can add 100KB to 500KB to a file.

Embedded images — photos, screenshots, and scanned pages stored as raw pixel data or compressed with JPEG.

Vector graphics — shapes, lines, and illustrations stored as mathematical instructions rather than pixels.

Metadata — creation date, author, software used, revision history, and other information invisible to the reader but stored in the file.

Cross-reference table — an index at the end of the file telling the PDF reader where to find each object. In old or poorly generated PDFs, this alone can be surprisingly large.

Why PDFs get large and how compression fixes it

The most common cause of bloated PDFs is embedded images. When you scan a document, take a screenshot, or export a presentation to PDF, the software often embeds those images at full resolution with no compression applied. A single scanned page at 300 DPI can weigh 5MB before any other content is added.
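A quick back-of-the-envelope calculation shows why scans dominate file size. The sketch below assumes a US Letter page (8.5 by 11 inches) and counts raw, uncompressed pixel data only; the function name is illustrative, and real scanners usually apply some compression, so actual files vary.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a scanned page at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US Letter at 300 DPI.
for label, bpp in [("24-bit color", 24), ("8-bit grayscale", 8), ("1-bit black and white", 1)]:
    size = raw_scan_bytes(8.5, 11, 300, bpp)
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

At 2550 by 3300 pixels, even a grayscale page carries more than 8 million samples, which is why a handful of scanned pages can outweigh hundreds of pages of text.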

The second most common cause is embedded fonts. Unless the generating software subsets them, every unique font face used in a document is embedded in its entirety. A document using four fonts could carry 1MB of font data for text that represents a few kilobytes of actual content.

The third cause is what engineers call unoptimized object streams. The internal structure of the PDF contains references, indexes, and metadata that accumulate over time, particularly in documents that have been edited, merged, annotated, or digitally signed multiple times.

Lossless compression

Lossless compression reduces file size without removing any information. The original data can be reconstructed perfectly. Deflate compression, a descendant of Lempel and Ziv's 1977 algorithm, is applied to text streams and vector graphics in PDFs. A compressed text stream decompresses to exactly the same bytes that went in.
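The round trip is easy to check with Python's standard `zlib` module, which implements deflate. The content stream below is a hypothetical, simplified stand-in for a page of text-drawing instructions, not data taken from a real PDF.

```python
import zlib

# Hypothetical page content: 40 repeated text-drawing instructions,
# similar in spirit to what a PDF text stream holds.
lines = [
    f"BT /F1 12 Tf 72 {720 - 14 * i} Td (Line {i} of the report) Tj ET"
    for i in range(40)
]
stream = "\n".join(lines).encode("ascii")

compressed = zlib.compress(stream, level=9)
restored = zlib.decompress(compressed)

# Lossless means byte-for-byte identical after decompression.
assert restored == stream
print(len(stream), "bytes ->", len(compressed), "bytes")
```

Because the instructions repeat almost verbatim from line to line, the compressed stream is a fraction of the original, and nothing is lost.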

Lossy compression

Lossy compression achieves much greater size reductions by permanently discarding information that is difficult for humans to perceive. JPEG compression breaks an image into 8x8 blocks of pixels and applies a mathematical transform, the discrete cosine transform, that identifies which visual details can be removed without the image looking noticeably different. The human visual system is much more sensitive to brightness than to color, and JPEG exploits this ruthlessly.
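One common way JPEG encoders exploit the brightness-versus-color difference is chroma subsampling: brightness is kept at full resolution while each color channel is stored at half resolution in both directions. The sketch below illustrates that one step only (often called 4:2:0); it assumes even image dimensions, the function names are illustrative, and the transform and quantization stages are not shown.

```python
def subsample_420(plane):
    """Average each 2x2 block of a color plane (a list of equal-length rows)."""
    out = []
    for y in range(0, len(plane) - 1, 2):
        row = []
        for x in range(0, len(plane[0]) - 1, 2):
            total = (
                plane[y][x] + plane[y][x + 1]
                + plane[y + 1][x] + plane[y + 1][x + 1]
            )
            row.append(total // 4)
        out.append(row)
    return out


def samples_per_image(width: int, height: int, subsampled: bool) -> int:
    """Count stored samples: one brightness plane plus two color planes."""
    luma = width * height
    chroma = (width // 2) * (height // 2) if subsampled else width * height
    return luma + 2 * chroma


print(subsample_420([[10, 12], [14, 16]]))
print(samples_per_image(8, 8, False), "->", samples_per_image(8, 8, True))
```

Halving the color resolution in each direction cuts the total sample count in half before any further compression is applied, and most viewers never notice.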

When you compress a PDF that contains scanned documents or photographs, the dominant technique is re-encoding those embedded images at lower JPEG quality. An image originally stored at quality 95, nearly indistinguishable from the raw scan, might be re-encoded at quality 60 with file size cut to a third, and the visual difference invisible at normal reading distances.

What Ghostscript actually does

Ghostscript, the open-source engine that powers many professional PDF compression tools including SlimPDF, takes a more comprehensive approach. Rather than just recompressing images, it effectively re-renders the entire PDF. It interprets every instruction, re-optimizes every stream, subsets fonts to include only the characters actually used, flattens transparency, removes duplicate objects, and rebuilds the cross-reference table from scratch.
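For readers who want to try this directly, here is a minimal sketch of calling Ghostscript's `pdfwrite` device from Python. It is not SlimPDF's actual invocation, which the article does not describe. The helper names are hypothetical, the `gs` binary name is an assumption (Windows builds typically ship `gswin64c`), and `/ebook` is one of Ghostscript's standard presets alongside `/screen`, `/printer`, and `/prepress`.

```python
import subprocess


def build_gs_command(src: str, dst: str, preset: str = "/ebook", gs_binary: str = "gs"):
    """Build a Ghostscript command line that rewrites `src` into `dst`."""
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",         # re-emit the document as a new PDF
        f"-dPDFSETTINGS={preset}",   # quality/size preset
        "-dNOPAUSE",                 # do not prompt between pages
        "-dBATCH",                   # exit when processing finishes
        "-dQUIET",                   # suppress routine messages
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, preset: str = "/ebook") -> None:
    """Run Ghostscript; raises CalledProcessError if it fails."""
    subprocess.run(build_gs_command(src, dst, preset), check=True)


print(" ".join(build_gs_command("input.pdf", "output.pdf")))
```

Calling `compress_pdf("input.pdf", "output.pdf")` requires Ghostscript to be installed and on the PATH; lower presets such as `/screen` trade more image quality for smaller files.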

The result is a PDF that is visually identical to the original but structurally far more efficient. A document that has been edited twelve times in four different applications and emailed back and forth across a company for three years can often be cut to a quarter of its size by Ghostscript without losing a single visible pixel.

The best PDF compressor is one you never have to think about. You drop a file in. A smaller file comes out. That is the entire contract.

The target size problem and why nobody solved it

Figure (the gap between quality settings and byte limits; the old way: quality set to Medium, result 3.2 MB, need under 2MB, try again; the SlimPDF way: compress to under 2 MB, target met at 1.87 MB): The old approach required trial and error. SlimPDF's target mode searches the compression space automatically and hits the number you need.

For most of the history of PDF compression, tools offered you a slider or a preset. Low quality. Medium quality. High quality. You compressed the file, checked the size, decided it was still too large, compressed it again at a lower setting, and repeated until you hit something acceptable.

What nobody offered was the ability to simply say: make this file under 2MB. The gap seems obvious in retrospect. Job application portals, government upload systems, and email servers all impose size limits in bytes, not quality levels. But the compression tools spoke in quality and the upload forms spoke in megabytes, and users translated between them manually.

SlimPDF's custom target mode closes this gap by searching the compression parameter space. Given a target size, it runs multiple compression attempts, trying different quality settings, different DPI thresholds, and different combinations of Ghostscript presets, until it finds the combination that produces the smallest file that still meets the target. If no combination can reach the target, it tells you honestly: this is the smallest this file can get.
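The general idea of searching for a size target can be sketched with a single parameter. This is not SlimPDF's implementation, which by the description above also varies DPI thresholds and presets. The sketch assumes a hypothetical `compress(quality)` callable, assumes output size grows with quality, and assumes the goal is the highest quality whose output still fits.

```python
from typing import Callable, Optional, Tuple


def fit_to_target(
    compress: Callable[[int], bytes],
    target_bytes: int,
    min_quality: int = 10,
    max_quality: int = 95,
) -> Tuple[Optional[int], bytes]:
    """Binary-search the quality setting for the best result under the target."""
    smallest = compress(min_quality)
    if len(smallest) > target_bytes:
        # Target unreachable: report the smallest file we can produce.
        return None, smallest
    best_quality, best = min_quality, smallest
    lo, hi = min_quality + 1, max_quality
    while lo <= hi:
        mid = (lo + hi) // 2
        candidate = compress(mid)
        if len(candidate) <= target_bytes:
            best_quality, best = mid, candidate
            lo = mid + 1   # fits: try a higher quality
        else:
            hi = mid - 1   # too big: try a lower quality
    return best_quality, best


# Stand-in compressor for illustration: output size is 1000 bytes per quality step.
def fake_compress(quality: int) -> bytes:
    return b"x" * (quality * 1000)


quality, data = fit_to_target(fake_compress, target_bytes=60_500)
print(quality, len(data))
```

A binary search needs only a handful of compression runs to cover the whole quality range, which matters when each run re-renders an entire document.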

The next frontier

The compression algorithms in widespread use today are extraordinarily mature. Deflate, the algorithm inside ZIP files and PDF streams, is essentially unchanged since it was documented in RFC 1951 in 1996. JPEG compression, now roughly three decades old, remains the dominant format for photographic images. The improvements in newer formats like JPEG 2000 and AVIF are real but modest for the use cases most people encounter day to day.

The genuinely new frontier is AI-assisted compression: using neural networks to identify which parts of an image contain information the human eye cares about, and preserving those with higher fidelity while aggressively compressing the rest. Early results from this approach suggest compression ratios that would have seemed impossible under classical algorithms. It is, in a sense, an extension of Shannon's original insight: lossless compression is bounded by entropy, while lossy compression is bounded by what human perception will tolerate.

For now, the most practical compression remains the kind that has worked since 1993. Find the redundancy, remove it, and hand back a file that does the same job in less space. Simple in concept. Surprisingly deep in practice.

Try it yourself, free

Drop in a PDF under 5MB and see exactly how much smaller we can make it. No account needed.
