Dwaine Snow's Thoughts on Databases and Data Management: June 2012

Friday, June 29, 2012

Uncovering the truth about more outrageous competitor claims

Another of the claims this competitor made is that Netezza cannot handle the data ingest from a point of sale (POS) type system because it could not handle the data being sent from hundreds or thousands of POS systems at the same time. To give credence to this argument, they go on to claim that Netezza cannot handle more than a specific, ridiculously small number of concurrent writes.

In my opinion, this point shows a lack of knowledge of real data warehousing and analytics. IBM Netezza has a large number of retail customers who feed their POS data into Netezza all the time. But this data normally comes into the data center in some non-database type of communication protocol. The data is then extracted from these POS feeds, and has metadata management and identity assertion algorithms applied against it, because the data may be coming from many different stores, even different “named” stores where the same item may have different SKUs. Only then is the cleansed data loaded into the warehouse. It is not loaded directly from the hundreds/thousands of POS applications/registers.

IBM Netezza absolutely supports trickle feed and real/near-real time updates from this type of data stream process, as well as direct replication from other relational databases.

And, if you are looking for the ultimate in real time reporting and analytics on your POS data, IBM has the system for you. The IBM DB2 Analytics Accelerator is an IBM Netezza system connected directly to a DB2 for z/OS system using the zEnterprise connection. In this configuration, the transactional applications still run against the tried and true DB2 for z/OS system, and reporting/analytic queries get transparently routed through the DB2 for z/OS system to the Netezza system, which offloads the processing and resource usage and ultimately runs them much faster. DB2 for z/OS systems run many of the world’s largest scale OLTP applications, and this brings them the power of real time analytics without the need to create extra indexes, aggregates, etc. in their DB2 for z/OS system. Those extra objects would otherwise be needed to make the reports/analytics run quickly enough, but they also have a detrimental effect on transactional performance.

Thursday, June 28, 2012

FUD Competitors are Spreading on Netezza

Recently I was made aware of some FUD (fear, uncertainty, and doubt) that a competitor has been sending to our current and prospective clients. This FUD contained a number of gross inaccuracies, as well as some “points” that really made me scratch my head and wonder how much this competitor really understands data warehousing and analytics.

This competitor claimed that Netezza scanned the entire table for all SQL / Analytic operations.

This claim is absolutely not true. While Netezza does not have indexes that the DBA must create and maintain, it does automatically build and maintain zone maps for all tables in the database. These zone maps contain the minimum and maximum value for all columns in every extent within each table. So, before the query starts reading from disk, it looks at the predicates in the query, and compares them to the zone maps to determine which table extents can be skipped and which need to be read.

For example, if you want to calculate the total sales of Red Ford Mustangs in June 2011, Netezza can skip any table extent that does not have data for June 2011. So, for a database with 7 years of sales history, it can skip any extent that has a maximum that is less, or a minimum that is greater than, June 2011. This eliminates 98% or more of the I/O required.
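To make the extent-skipping idea concrete, here is a minimal sketch in Python. It is not Netezza code. It assumes a hypothetical table with 7 years of sales (2005 through 2011), loaded in date order so that each extent happens to hold exactly one month, and it keeps only the minimum and maximum sale date per extent. The names month_bounds, build_zone_map, and extents_to_read are made up for illustration.

```python
from datetime import date, timedelta

def month_bounds(year, month):
    """Return the first and last day of a calendar month."""
    first = date(year, month, 1)
    next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return first, next_first - timedelta(days=1)

def build_zone_map(first_year, years):
    """Hypothetical zone map: one (extent_id, min_date, max_date) row per extent.

    Assumption: data arrives in date order and each extent holds one month.
    """
    zone_map = []
    extent_id = 0
    for year in range(first_year, first_year + years):
        for month in range(1, 13):
            lo, hi = month_bounds(year, month)
            zone_map.append((extent_id, lo, hi))
            extent_id += 1
    return zone_map

def extents_to_read(zone_map, pred_lo, pred_hi):
    """Keep only extents whose [min, max] range overlaps the predicate range.

    An extent is skipped when its max is below the range or its min is above it.
    """
    return [ext for ext, mn, mx in zone_map if not (mx < pred_lo or mn > pred_hi)]

zone_map = build_zone_map(2005, 7)
june_lo, june_hi = month_bounds(2011, 6)
to_read = extents_to_read(zone_map, june_lo, june_hi)
skipped = 1 - len(to_read) / len(zone_map)
print(f"read {len(to_read)} of {len(zone_map)} extents, skipped {skipped:.1%}")
```

Under these assumptions only 1 of the 84 extents overlaps June 2011, so roughly 98.8% of the extents are never read. In a real system the extents will not line up with months this neatly, so the exact fraction will vary.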

Our competitor claims that their range partitioning would automatically eliminate processing on all of the months other than June 2011 and is a better solution. While their range partitioning will eliminate the I/O like Netezza, there is a whole bunch of effort that partitioning brings to their solution that they do not talk about. In their solution you create a table, and then you create "inheritance children", one per partition. So for a table with 7 years of data, that is 84 monthly partitions, and 84 tables (the base table plus the 83 inheritance children). That might not seem too bad, but there's more. If you have a primary key, foreign key, index, constraint, or permission on the table, you need to apply it to the table and to each of its inheritance children, because it is not a global operation. So, for these 84 tables with 3 user groups with different permissions, a primary key, a constraint, and one non-unique index, that would be 84 * (3 + 1 + 1 + 1), or 504 DDL statements to set up and to maintain over time.

And on top of that, their bulk loader is not partition aware, so you need to write a rule for each table/inheritance child, adding 84 more DDL statements to the list.

In Netezza you write one statement to create the table, vs. 588 DDL statements for the exact same table and data in this competitor.
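The arithmetic behind that comparison can be written out as a short Python sketch. The function name ddl_statement_count and its parameters are hypothetical, and the inputs are simply the counts from the example above: 84 tables, 3 permission groups, 1 primary key, 1 constraint, 1 non-unique index, and 1 loader rule per table.

```python
def ddl_statement_count(tables, permission_groups, primary_keys,
                        constraints, indexes, loader_rules_per_table=1):
    """Count DDL statements when nothing can be applied globally.

    Every permission, key, constraint, and index is repeated once per table,
    and each table also needs its own bulk-loader rule.
    """
    per_table = permission_groups + primary_keys + constraints + indexes
    setup = tables * per_table                 # 84 * (3 + 1 + 1 + 1)
    rules = tables * loader_rules_per_table    # one rule per table
    return setup, rules, setup + rules

setup, rules, total = ddl_statement_count(
    tables=84, permission_groups=3, primary_keys=1, constraints=1, indexes=1
)
print(setup, rules, total)  # 504 84 588
```

Changing any one input shows how quickly the count grows. For example, a fourth permission group adds another 84 statements.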

I’ll respond to some more of the claims this competitor has been making over my next few posts.