Recently I was made aware of some FUD (fear, uncertainty,
and doubt) that a competitor has been sending to our current and prospective
clients. This FUD contained a number of gross inaccuracies, as well as some “points”
that really made me scratch my head and wonder how much this competitor really
understands data warehousing and analytics.
This competitor claimed that Netezza scanned the entire
table for all SQL / Analytic operations.
This claim is absolutely not true. While
Netezza does not have indexes that the DBA must create and maintain, it does
automatically build and maintain zone maps for all tables in the database.
These zone maps contain the minimum and maximum value for all columns in every
extent within each table. So, before the query starts reading from disk, it
looks at the predicates in the query, and compares them to the zone maps to
determine which table extents can be skipped and which need to be read.
For example, if you want to calculate
the total sales of Red Ford Mustangs in June 2011, Netezza can skip any table extent
that does not have data for June 2011. So, for a database with 7 years of sales
history, it can skip any extent that has a maximum that is less, or a minimum
that is greater than, June 2011. This eliminates 98% or more of the I/O
required.
Our competitor claims that their range partitioning would automatically eliminate processing on all of the months other than June 2011 and is a better solution. While their range partitioning will eliminate the I/O like Netezza, there is a whole bunch of effort that partitioning brings to their solution that they do not talk about. In their solution you create a table, and then you create "inheritance children", one per partition. So for a table with 7 years of data, that is 84 monthly partitions, and 84 tables (the base table plus the 83 inheritance children). That might not seem too bad, but there's more. If you have a primary key, foreign key, index, constraint, or permission on the table, you need to apply it to the table and each of its inheritance children, it is not a global operation. So, for these 84 tables with 3 user groups with different permissions, a primary key, a constraint, and one non-unique index, that would be 84 * (3 + 1 + 1 + 1) or 504 DDL statements to set this up, and to maintain over time.
And on top of that, their bulk loader is not partition aware, so you need to write a rule for each table/inheritance child, adding 84 more DDL statements to the list.
In Netezza you write one statement to create the table, vs. 588 DDL statements for the exact same table and data in this competitor.
Our competitor claims that their range partitioning would automatically eliminate processing on all of the months other than June 2011 and is a better solution. While their range partitioning will eliminate the I/O like Netezza, there is a whole bunch of effort that partitioning brings to their solution that they do not talk about. In their solution you create a table, and then you create "inheritance children", one per partition. So for a table with 7 years of data, that is 84 monthly partitions, and 84 tables (the base table plus the 83 inheritance children). That might not seem too bad, but there's more. If you have a primary key, foreign key, index, constraint, or permission on the table, you need to apply it to the table and each of its inheritance children, it is not a global operation. So, for these 84 tables with 3 user groups with different permissions, a primary key, a constraint, and one non-unique index, that would be 84 * (3 + 1 + 1 + 1) or 504 DDL statements to set this up, and to maintain over time.
And on top of that, their bulk loader is not partition aware, so you need to write a rule for each table/inheritance child, adding 84 more DDL statements to the list.
In Netezza you write one statement to create the table, vs. 588 DDL statements for the exact same table and data in this competitor.
I’ll respond to some more of the
claims this competitor has been making over my next few posts.