TL;DR: use TokuDB. Always. It massively reduces storage requirements and is massively
faster than InnoDB, even on low-end hardware. My tables were compressed to just 12% of
their original size, and that's fairly typical.
A very simple structure:
    CREATE TABLE test (
        date DATE NOT NULL,
        site VARCHAR(8) NOT NULL,
        order_id VARCHAR(64) NOT NULL,
        time TIME NOT NULL,
        kind ENUM(...) NOT NULL,
        account INTEGER,
        user INTEGER,
        client_price BIGINT,
        supplier_id INTEGER,
        supplier_cost BIGINT,
        client_ref VARCHAR(32),
        data JSON,
        PRIMARY KEY (date, site, order_id),
        INDEX (account, user, client_ref)
    ) DEFAULT CHARSET utf8
    PARTITION BY RANGE (TO_DAYS(date)) ( ...
This system is using the RDBMS as a document store; the order items are flattened into a
JSON document (the 'data' field).
The ETL is upserting order events as they are processed, using
multi-row INSERT INTO … ON DUPLICATE KEY UPDATE …
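For illustration, here's a two-row sketch of the kind of statement I mean (the ENUM
values, prices, refs and JSON payloads are all made up; the real batches are 1K-2K rows
per statement):

    -- hypothetical rows: 'created'/'updated' and all values are invented
    INSERT INTO test
      (date, site, order_id, time, kind, account, user,
       client_price, supplier_id, supplier_cost, client_ref, data)
    VALUES
      ('2017-06-01', 'uk', 'o-1001', '09:15:00', 'created', 42, 7,
       1999, 3, 1200, 'ref-a', '{"items": []}'),
      ('2017-06-01', 'uk', 'o-1002', '09:15:01', 'updated', 42, 7,
       2499, 3, 1500, 'ref-b', '{"items": []}')
    ON DUPLICATE KEY UPDATE
      time          = VALUES(time),
      kind          = VALUES(kind),
      client_price  = VALUES(client_price),
      supplier_cost = VALUES(supplier_cost),
      data          = VALUES(data);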
And I've been running performance tests with some real data: 27M rows of real data, all
on the same day (so all in the same partition). 37M upserts were issued to insert or
update those 27M rows, batched 1K-2K rows per multi-row statement. This is enough rows to
get a grasp of upsert performance and to extrapolate storage requirements.
The test box is an AWS t2.medium instance. This is
a very weak VM with 2 (virtual) CPUs and 4GB RAM. Not normal DB fare, so performance beware.
The
27M rows take 10.48GB uncompressed. It takes InnoDB 10 hours to load all the data.
In the old days, MySQL InnoDB offered 'row compression' (ROW_FORMAT=COMPRESSED), which,
despite the name, compresses pages into fixed-size blocks. This reduces disk requirements
to 5.73GB (45% space saved), so it roughly halves the disk space! Sadly, the load now
took 15 hours instead of 10.
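For reference, it's enabled per table like this (the KEY_BLOCK_SIZE of 8 is just an
illustrative choice, not a recommendation):

    -- classic InnoDB compression; needs innodb_file_per_table
    -- (and innodb_file_format=Barracuda on older versions)
    ALTER TABLE test ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;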
More recently, MySQL introduced 'table compression'. (MariaDB also added it, calling it
'page compression'; the CREATE TABLE syntax is, infuriatingly, slightly different.)
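To show what I mean about the syntax, both of these enable the hole-punching style of
compression described below:

    -- MySQL 5.7+
    ALTER TABLE test COMPRESSION='zlib';
    OPTIMIZE TABLE test;  -- existing pages stay uncompressed until rebuilt

    -- MariaDB (same feature, different spelling)
    ALTER TABLE test PAGE_COMPRESSED=1;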
Underneath, InnoDB is organized
as ‘pages’. This table compression intercepts these page reads and writes and, as pages are written to
disk, they are individually compressed and written as a ‘sparse’ file.
As these InnoDB
pages are larger than the file system pages, there is the opportunity for a ‘hole’ to be ‘punched’ in
each InnoDB page. Imagine an InnoDB page is 16KB, and the file system page size is 4KB.
The page may compress down to 1KB, meaning it only needs the first file system page
(4KB), and 12KB would be saved. The example also illustrates how wasteful this approach
can be: there is still a lot of slack, as that 1KB of compressed data occupies a whole
4KB file system page. The total file size as reported by ls is unchanged, but the actual
used disk space (as shown by e.g. ls -s) is reduced. Page compression compresses this
table to 4.07GB (61% savings).
The bad news is that performance sinks awfully! I aborted the run after 32 hours, as
upsert speed had dropped to under 100 orders/sec. I've seen this on MariaDB RDS and on my
little t2.medium. It's atrocious!
TokuDB is an alternative storage engine to InnoDB. It is nicely packaged by Percona, so
it's very straightforward to install.
TokuDB compresses databases with zlib by default. It compresses pages, but doesn't store
them as sparse files; the file size you see on disk is the space actually used.
TokuDB with zlib
compresses to 1.45GB (86% savings)! This is the same compression algorithm as InnoDB
page compression and operating at much the same logical page level, but conferring much bigger savings
because each ‘page’ is not rounded up to a multiple of the file system page size.
Because InnoDB puts the whole table (or, as in this case, partition) into a single file,
you cannot see how much of the storage is for the primary rows and how much is for
secondary indexes. TokuDB, however, has separate files for each index. So in this case
there are two files for the partition, and I can see that the 27M rows take 1.02GB and
the by-client index takes 0.44GB.
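Those per-index files are visible from SQL too; a sketch, assuming Percona's TokuDB
information_schema plugins are loaded:

    -- map TokuDB dictionaries (one per index) to their on-disk files
    SELECT dictionary_name, internal_file_name
    FROM information_schema.TokuDB_file_map
    WHERE table_name = 'test';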
TokuDB allows you to specify other compression algorithms. The weakest (thus fastest) is
'snappy', which is an LZ77 compressor without entropy coding. It compresses to 2.31GB
(78% savings; same ratio of rows to index). There is also QuickLZ (between snappy and
zlib in CPU requirements; 1.95GB; 81% savings) and LZMA (stronger than zlib; 1.23GB; 88%
savings!).
LZMA is a clear winner here.
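Whichever you pick, it's just a per-table attribute:

    -- TokuDB compression is selected via ROW_FORMAT
    ALTER TABLE test ENGINE=TokuDB ROW_FORMAT=TOKUDB_LZMA;
    -- alternatives: TOKUDB_ZLIB (the default), TOKUDB_QUICKLZ,
    -- TOKUDB_SNAPPY, TOKUDB_UNCOMPRESSED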
What's fascinating about TokuDB, though, is not just the space savings but also the
performance: it takes just 2 hours to run my ETL, and the choice of compression algorithm
has no noticeable effect on that!
Now, the dire uncompressed InnoDB performance is perhaps the classic "InnoDB is not good
with databases that don't fit into RAM". I was using Percona packages, so the defaults
were not as mad as they used to be out-of-the-box with other Linux sources, but I did up
the buffer pool to 2GB and configure a big log. Configuring InnoDB optimally is like
reading tea leaves, and no amount of doing it makes me confident I've hit the sweet spot
for any particular workload. This upsert workload seems to be particularly stressful.
The t2.medium has two (virtual) CPUs, and as upsert performance dropped I saw 'top'
reporting a load average over 2.0 even though the ETL was waiting on the DB. Not good.
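Concretely, the non-default InnoDB settings amounted to something like this (only the 2GB
buffer pool is the figure quoted above; the log size shown is an assumption, not a
recommendation):

    # my.cnf excerpt (t2.medium: 2 vCPUs, 4GB RAM)
    [mysqld]
    innodb_buffer_pool_size = 2G   # the "up it to 2GB" above
    innodb_log_file_size    = 1G   # "a big log"; exact value illustrative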
However, the dire compressed InnoDB performance was repeated on a MariaDB RDS m4.2xlarge
with the same data-set. There's something wrong with either InnoDB page compression or
with EBS sparse-file performance!
On the other hand, TokuDB shows how to do things properly! The out-of-the-box defaults
are very sane (it doesn't use O_DIRECT; instead, it takes 50% of RAM to store
uncompressed pages, and lets the OS cache the compressed pages in the normal file buffer
cache. Inspired!). And the performance is just so stable! On the t2.medium, which is a
piddling little box, I saw 5000 orders/sec sustained throughout the entire run.
I think I've heard that TokuDB is marginally slower than (uncompressed) InnoDB on small
tables. And perhaps most users have small tables. I don't think there's actually a
measurable gap when everything is tiny. But the moment you go over a million rows, or
contemplate partitioning, I think TokuDB is the obvious choice. In fact, it would make a
very good default.
Oh, and for kicks, try a COUNT(*) on a big InnoDB table. Then go make a coffee! TokuDB …
snap! It's not based on metadata like MyISAM's was, either; it's just that much faster.
TokuDB redefines what a ‘small’ database is. It
moves the goalposts. It means I can build bigger systems without scaling sideways. I don’t get why
everyone still uses InnoDB ;)