Tuesday, May 01, 2012

The Importance of Agility for Analytic Applications

In the context of data warehousing, agility means that the system can quickly and easily adapt to and accommodate changes in data volumes, new data sources, new subject areas, new applications and/or new users. For a data warehouse to be able to do this, it needs to be able to run any query against any data model/schema. It must also be able to process all of the data, no matter what the query is or which table(s) are being accessed, and do so quickly – without impacting the other users of the system.


You can accomplish this in a number of ways, including:
·       Over-building the original data warehouse to be able to handle some incremental growth
·       Restricting access to data or data sampling
·       Adding more resources for the new users, data and applications as they come on board. 
But normally there are a number of issues with these approaches:
·       Clients do not want to spend 2-3 times as much as they need to up front so they can accommodate some future growth that may or may not occur
·       Data sampling means that there is a good chance that the important data may be missed
·       Adding more resources to an existing Teradata system can be a long, arduous, and costly process[1]

In our opinion, it is far more effective, in terms of cost, effort, and overall system performance, to augment the Teradata system with IBM Netezza data warehouse appliances, where you can run the new applications without impacting the current users at all. Rather than wait weeks for the new system to arrive, the data model to be tweaked for the new application, the data to be moved, and the database to be tuned, why not roll in an IBM Netezza appliance, copy the data model (schema) as is, load the data, and be up and running in hours?

As my colleague Nancy Kopp-Hensley discussed in her article "Consolidate Smarter with the Data Warehouse Ecosystem", we had a client that, over time, became challenged with query performance on their applications, and yet they were anxious to roll out some new applications in their sales and marketing divisions. To top it off, the business needed these new applications online right away. Rather than frustrate the business with a long timeline, which would have included first tuning the enterprise data warehouse (EDW) to fix the existing problems before even starting the expansion, they chose to offload the new applications to a Netezza appliance. The result? Queries ran 24 times faster and they were able to achieve a much lower total cost of ownership (TCO).

You could also move your deep analytic applications to the IBM Netezza platform and run them against the entire data set, not just the last week’s data or a sample of the data from the last year, as in the EDW. This will provide more accurate results and predictions that will help drive more value to the organization. Consider an example: you are trying to predict what a shopper can be influenced to buy, given a coupon. Let’s say that the shopper has bought the following items in the past month:

1.      Topographical Map of Alaska
2.      The book “Hiking Alaska”
3.      Tent
4.      Backpack
5.      Sleeping bag
6.      Compass
7.      Portable GPS

In their current shopping expedition they are buying a pair of hiking boots. Looking at the list of what they are buying, we might hazard a guess that they are looking to start hiking, but we do not know where, or know what else they might need. So, let’s sample their historical purchases, and see what we can come up with. Even with a 20% sample (which is much larger than normal) we might retrieve the tent and compass. We still do not know where they are going, so we might offer them a coupon for a sleeping bag. But we see that they already have one.

If the sample had included the book and the backpack instead, we would have an idea that they might be going to Alaska, so maybe we should offer them a portable GPS for 20% off. This could be bad in a couple of ways. If the offer is for the same GPS they bought, they are likely to return the one they have and re-buy it, which just cuts into the profit. If the offer is for a newer, better GPS at a price close to what they just paid, then they may return the old one, or, if they bought it just outside of the 30-day return window, you are likely to have an unhappy customer on your hands. This example shows why it is important to have fast analytics on all of your data, not just a “representative sample”, and this is what you can get by augmenting your EDW with an IBM Netezza data warehouse appliance.
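
To put rough numbers on the shopper example, here is a minimal Python sketch. It assumes the sample is a uniform random pick of two of the seven purchases (about what the example above retrieves), and the function names are made up for illustration. It simply counts how many of the possible samples would tell us the two things we care about: that the destination is Alaska, and that the shopper already owns a GPS.

```python
from itertools import combinations

# The seven purchases from the example above.
purchases = [
    "Topographical Map of Alaska",
    "Hiking Alaska (book)",
    "Tent",
    "Backpack",
    "Sleeping bag",
    "Compass",
    "Portable GPS",
]


def fraction_of_samples(items, sample_size, predicate):
    """Fraction of all equally likely samples of `sample_size` items
    for which `predicate(sample)` is true (hypothetical helper)."""
    samples = list(combinations(items, sample_size))
    hits = sum(1 for sample in samples if predicate(sample))
    return hits / len(samples)


def reveals_destination(sample):
    # Only the map and the book mention Alaska.
    return any("Alaska" in item for item in sample)


def reveals_gps(sample):
    # Without this item in the sample, we might offer a GPS coupon.
    return "Portable GPS" in sample


if __name__ == "__main__":
    for size in (2, len(purchases)):
        dest = fraction_of_samples(purchases, size, reveals_destination)
        gps = fraction_of_samples(purchases, size, reveals_gps)
        print(f"sample of {size}: destination known {dest:.0%}, "
              f"GPS purchase seen {gps:.0%}")
```

Under these assumptions, a two-item sample reveals the destination in only 11 of the 21 possible samples (about 52%) and shows the GPS purchase in 6 of 21 (about 29%), while the full purchase history reveals both every time.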


[1] Teradata Customer Story - Overstock.com, retrieved 05/17/2011 from http://tinyurl.com/6mdktgm (since removed from the site)

1 comment:

Mitch Stinson said...

Great post Dwaine. I absolutely agree that agility is very important for all types of software and applications that incorporate analytics. I've recently been analyzing different types of DCIM solutions that incorporate analytics, and speed has been one thing that I've been evaluating. It's important that the software is user-friendly, fast, and allows for real-time analytical tracking.