A number of our competitors make bold compression claims,
like “we provide 4X, or even 10X+ compression”. What they do not tell you is
that you have to choose between multiple different compression levels, and/or
compression libraries/types. They also do not mention that you cannot use their
compression on tables that have inserts, updates, and/or deletes occurring. Nor
do they mention the overhead that you can expect to see if you turn on their
compression.
Let’s look at these three points in a little more detail.
In today’s world of reduced
budgets, one of the easiest ways to save money with a data warehouse is to
reduce the amount of disk space required to store the data. To this end, nearly
all data warehouse solutions offer some form of database compression. In order
to use compression with many of our competitors the table must be an
Append-only table.
Append-only tables have the following limitations:
•Cannot UPDATE rows in an append-only table
•Cannot
DELETE rows from an append-only table
•Cannot
ALTER TABLE...ADD COLUMN to an append-only table
•Cannot
add indexes to an append-only table.
These limitations are because
of the way these vendors have implemented their ‘database compression’. While
Netezza has built in algorithms specifically designed for database usage,
others uses a compression library that “compresses” the data file as it is
being written to disk, and then uses the same library to un-compress the file
as it reads it from disk. For anyone that has used a tool like gzip, WinZip,
pkzip, WinRar, etc. you all know how slow these tools are, and how much CPU
cycles they use. This is the same consideration and overhead you will have with
these other vendors if you use their compression. In fact this overhead can be
so bad that some customers who have presented at our competitors’ conferences
have talked about tests where a single query running on a table with
compression used up over 95% of the CPU, while the same query against the same
data in a table that was not compressed used less than 5% of the CPU.
On top of the performance impact, there is also the DBA
overhead. With one competitor the DBA has to choose between 3 types of compression (i.e.
compression libraries) and 9 different levels of compression, each of which
work better for different data types. That is 27 different
combinations/permutations that the DBA has to choose between, for each table.
With
IBM Netezza compression is always on, there is no “better algorithm” for
different tables, and because of the way the Netezza architecture works, when
you get 4X compression with your data in Netezza, you see an associated 4X
improvement in performance, for all types of workloads, not just reads.
No comments:
Post a Comment