Among the other “claims” our competitor made about Netezza is that it can only load at a rate of 2TB/hr. First off, this is false. The current generation of the Netezza platform can load at over 5TB/hr. But the real question I ask is, “Does this really matter after you have your system up and running?”
After the initial loading of the database from the existing system(s), very few companies load more than a couple hundred GB to a couple of TB per day, and most do not even approach these volumes daily or even monthly. Even Netezza’s biggest customers, who have petabyte-sized systems, do not find Netezza’s load speed to be an issue.
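To put that in perspective, here is some back-of-the-envelope arithmetic using the rates quoted in this post (the 20 TB migration volume and 200 GB daily load are hypothetical round numbers, assumed to be sustained rates):

```python
# Time to load a given volume at a given sustained rate.
def load_hours(volume_tb: float, rate_tb_per_hr: float) -> float:
    return volume_tb / rate_tb_per_hr

# A one-time 20 TB migration (hypothetical volume):
initial_at_5 = load_hours(20, 5.0)   # hours at 5 TB/hr
initial_at_2 = load_hours(20, 2.0)   # hours at 2 TB/hr

# A typical 200 GB (0.2 TB) daily load, in minutes:
daily_at_5 = load_hours(0.2, 5.0) * 60
daily_at_2 = load_hours(0.2, 2.0) * 60
```

Even at the slower 2 TB/hr rate, a 200 GB daily load finishes in minutes, which is why load speed rarely matters once the initial migration is done.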
Now let’s look at this competitor’s claims in more detail and peel back the layers of the onion. This competitor claims that they can load at double Netezza’s load speed of 5TB/hr, but they leave out a number of important factors when they make this claim.
What about compression?
Netezza can load at a rate of 5TB/hr and compress the data at the same time. This competitor can only load at their claimed rate if compression is not used. So, if you want to compress the data, how fast can you really load on their platform? They use a library-based compression algorithm that essentially “zips” the data pages as they are written to disk, consuming significant CPU cycles in the system that cannot then be used to format the data into pages, build indexes, etc.
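A minimal sketch of that “zip the pages on the way to disk” pattern, using Python’s `zlib` as a stand-in for whatever library the competitor actually uses (the page size and function names here are illustrative assumptions):

```python
import zlib

# Hypothetical page-compression step on a load path: each page is compressed
# on the CPU before it is written, so every cycle spent in zlib is a cycle
# not available for formatting pages or maintaining indexes.
PAGE_SIZE = 128 * 1024  # assumed page size

def write_page(page: bytes, compress: bool = True) -> bytes:
    """Return the bytes that would actually hit disk for one page."""
    if compress:
        return zlib.compress(page, 6)  # CPU-bound step on the load path
    return page

# Repetitive warehouse-style data compresses well, but still costs CPU per page.
page = b"2023-01-01,widget,9.99\n" * (PAGE_SIZE // 23)
on_disk = write_page(page)
```

The point is architectural, not about `zlib` specifically: in-line compression trades load-path CPU for disk savings, so a load rate quoted without compression overstates the compressed-load rate.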
What about partitioned tables?
This competitor needs tables to be partitioned in order to provide good performance. But to load a partitioned table, this competitor must have a “rule” for each data partition, and each row being loaded must be compared against these rules to determine which partition it belongs in. If a row falls into one of the first couple of ranges, there is little extra processing, but all of the latest data must be checked against many rules, slowing down the processing of these rows and definitely slowing down the load.
What about indexes?
This competitor admits in their manuals that they also need indexes in order to perform well. But each index incrementally slows down load performance.
Netezza does not need indexes to perform well, so it does not suffer from decreased load speed due to indexes or table partitioning.
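Why each index slows the load can be shown with a toy sketch (the table and index structures here are illustrative, not the competitor’s actual implementation): every inserted row must also update every secondary index, so per-row load work grows with the number of indexes.

```python
# Toy load loop: each row appended to the table also triggers one update
# per secondary index, so index maintenance work scales with index count.
def load_rows(rows, index_columns):
    table = []
    indexes = {col: {} for col in index_columns}
    index_updates = 0
    for row in rows:
        table.append(row)
        for col in index_columns:  # one extra update per index, per row
            indexes[col].setdefault(row[col], []).append(len(table) - 1)
            index_updates += 1
    return table, index_updates

rows = [{"id": i, "region": i % 3, "sku": i % 5} for i in range(100)]
_, no_index_work = load_rows(rows, [])                  # no index maintenance
_, two_index_work = load_rows(rows, ["region", "sku"])  # 2 updates per row
```

With zero indexes the maintenance cost is zero; with two indexes it is two updates per loaded row, and so on for each additional index.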
What about pre-processing?
Netezza can load at the same 5TB/hr rate with no pre-processing of the input data file. This same competitor can only load at their claimed faster rate if their appliance includes an optional “integration” or ETL module, where those servers pre-process the data and then send it to the data modules to be loaded. Without the integration module, the load file must be placed on the shared file system (accessible from all modules in their appliance), and then the load speed is really only 2TB/hr, based on published validation reports of their architecture and procedures. And again, this 2TB/hr is without compression or table partitioning.