Monday, July 02, 2012

Addressing more crazy competitor claims

Among the other “claims” our competitor has made about Netezza is that it can only load at a rate of 2TB/hr. First off, this is false. The current generation of the Netezza platform can load at over 5TB/hr. But the real question I ask is, “Does this really matter once you have your system up and running?”

After the initial load of the database from the existing system(s), very few companies load more than a couple hundred GB to a couple TB per day, and most do not even approach those volumes daily or even monthly. Even Netezza’s biggest customers, who run petabyte-sized systems, do not find Netezza’s load speed to be an issue.
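To put those daily volumes in perspective, here is a quick back-of-the-envelope sketch using the figures above (the volumes are the article's rough estimates, not measurements):

```python
# Wall-clock minutes per day spent loading at Netezza's 5 TB/hr rate,
# using the daily volumes cited above.
RATE_TB_PER_HR = 5.0

def minutes_to_load(volume_tb, rate_tb_per_hr=RATE_TB_PER_HR):
    """Minutes needed to load `volume_tb` terabytes at the given rate."""
    return volume_tb / rate_tb_per_hr * 60

light_daily = minutes_to_load(0.2)  # a couple hundred GB -> 2.4 minutes
heavy_daily = minutes_to_load(2.0)  # a couple TB -> 24 minutes
```

Even an unusually heavy 2TB daily load finishes in under half an hour, which is why raw load speed stops mattering once the initial migration is done.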

Now let’s look at this competitor’s claims in more detail and peel back the layers of the onion. This competitor claims that they can load at double Netezza’s 5TB/hr load speed, but they leave out a number of important factors when they make this claim.

What about compression?
Netezza can load at a rate of 5TB/hr and compress the data at the same time. This competitor can only load at their claimed rate if compression is not used. So, if you want to compress the data, how fast can you really load on their platform? They use a library-based compression algorithm that essentially “zips” the data pages as they are written to disk, consuming significant CPU cycles that then cannot be used to format the data into pages, build indexes, etc.
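As a rough illustration of what library-based page compression involves (a sketch only, not the competitor's actual code; the page contents and deflate settings here are invented):

```python
import zlib

# A hypothetical 128 KB data page of repetitive row data.
page = b"2012-07-02|10042|WIDGET-A|99.95\n" * 4096

# "Zip"-style library compression: every page headed to disk passes
# through this CPU-bound deflate call first. The cycles it burns are
# cycles the loader cannot spend formatting pages or building indexes.
compressed = zlib.compress(page, 6)
```

The compression is lossless, but it is not free: on a loaded system the deflate work competes directly with the rest of the load pipeline for CPU.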

What about partitioned tables?
This competitor needs tables to be partitioned in order to provide good performance. But to load a partitioned table, this competitor has to have a “rule” for each data partition, and each row being loaded must be compared against those rules to determine which partition it belongs in. A row destined for one of the first few ranges requires little extra processing, but all of the latest data has to be checked against many rules, slowing the processing of those rows and definitely slowing down the load process.
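A minimal sketch of that rule-by-rule routing (hypothetical monthly range partitions; not the competitor's actual implementation) shows why the newest rows pay the most:

```python
from datetime import date

# Hypothetical monthly range partitions for 2012, oldest first:
# (exclusive upper bound, partition name).
RULES = [(date(2012, m + 1, 1) if m < 12 else date(2013, 1, 1),
          "p2012_%02d" % m) for m in range(1, 13)]

def route(row_date):
    """Scan the rules in order until one matches; count comparisons."""
    for checks, (upper, partition) in enumerate(RULES, start=1):
        if row_date < upper:
            return partition, checks
    raise ValueError("no partition for %s" % row_date)

# Old data matches the first rule; the latest data is checked against
# every rule, so the freshest rows are the slowest to route.
route(date(2012, 1, 15))   # -> ('p2012_01', 1)
route(date(2012, 12, 15))  # -> ('p2012_12', 12)
```

Since a daily load consists almost entirely of the newest data, nearly every incoming row takes the worst-case path through the rule list.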

What about indexes?
This competitor admits in their manuals that they also need indexes in order to perform well. But each index incrementally slows down load performance.

Netezza does not need indexes to perform well, so it does not suffer decreased load speed because of indexes or table partitioning.

What about pre-processing?
Netezza can load at the same 5TB/hr rate with no pre-processing of the input data file. This same competitor can only load at their claimed faster rate if their appliance includes an optional “integration” or ETL module, where those servers pre-process the data and then send it to the data modules to be loaded. Without the integration module, the load file must be placed on the shared file system (accessible from all modules in their appliance), and then the load speed is really only 2TB/hr, based on published validation reports of their architecture and procedures. And again, that 2TB/hr is without compression or table partitioning.

