You can accomplish this in a number of
ways, including:
· Over-building the original data warehouse to be able to
handle some incremental growth
· Restricting access to data or data sampling
· Adding more resources for the new users, data and
applications as they come on board.
But there are normally a number of
issues with these approaches:
· Clients do not want to spend 2-3 times as much as they
need to up front so they can accommodate some future growth that may or may not
occur
· Data sampling means that there is a good chance that the
important data may be missed
In our opinion, it is far more
effective in terms of cost, effort, and overall system performance
to augment the Teradata system with IBM Netezza data warehouse
appliances, where you can run the new applications without impacting the current
users at all. Rather than wait weeks for the new system to arrive, the data
model to be tweaked for the new application, the data to be moved, and the
database to be tuned, why not roll in an IBM Netezza appliance, copy the data
model (schema) as is, load the data, and be up and running in hours?
As my colleague Nancy Kopp-Hensley discussed in her article "Consolidate Smarter with the Data Warehouse Ecosystem", we had a client that, over time, became challenged with query performance on their applications, and yet they were anxious to roll out some new applications in their sales and marketing divisions. To top it off, the business needed these new applications online right away. Rather than frustrate the business with a long timeline, which would have included first tuning the EDW to fix the existing problems before even starting the expansion, they chose to offload the new applications to a Netezza appliance. The result? Queries ran 24 times faster and they were able to achieve a much lower total cost of ownership (TCO).
You could also move your deep analytic
applications to the IBM Netezza platform and run them against the entire data set,
not just the last week’s data or a sample of the data from the last year, as
in the EDW. This will provide more accurate results and predictions that will help
drive more value to the organization. Consider an example: you are trying to
predict what a shopper can be influenced to buy, given a coupon. Let’s say that
the shopper has bought the following items in the past month:
1. Topographical map of Alaska
2. The book “Hiking Alaska”
3. Tent
4. Backpack
5. Sleeping bag
6. Compass
7. Portable GPS
In their current shopping expedition
they are buying a pair of hiking boots. Looking at what they are
buying now, we might hazard a guess that they are looking to start hiking, but we
do not know where, or what else they might need. So, let’s sample their
historical purchases and see what we can come up with. Even with a 20% sample
(which is much larger than normal) we might retrieve only the tent and compass. We
still do not know where they are going, so we might offer them a coupon for a
sleeping bag. But they already have one, which the sample did not show us.
If the sample had included the book
and the backpack instead, we would have an idea they might be going to Alaska, so
maybe we should offer them a portable GPS for 20% off. This could be bad in a
couple of ways. If the offer is for the same GPS they bought, they are likely
to return the one they have and re-buy it, which just cuts into the profit. If
the offer is for a newer, better GPS at a price close to what
they just paid, then they may return the old one, or, if they bought it just
outside of the 30-day return window, you are likely to have an unhappy customer
on your hands. This example shows why it is important to have fast analytics on
all of your data, not just a “representative sample”, and this is what you can
get by augmenting your EDW with an IBM Netezza data warehouse appliance.
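To put rough numbers on the shopper example, here is a minimal Python sketch. It is only an illustration, not anything that runs on Teradata or Netezza. It assumes a 20% sample of the seven purchases rounds up to two items, and the coupon candidate list (including "bear spray" and "trekking poles") is made up for the example. It enumerates every possible two-item sample and counts how often the sample hides something the shopper already owns.

```python
from itertools import combinations

# Purchase history from the shopper example.
history = [
    "topographical map of Alaska",
    "book: Hiking Alaska",
    "tent",
    "backpack",
    "sleeping bag",
    "compass",
    "portable GPS",
]

# Hypothetical coupon candidates (an assumption, not from the post).
candidates = ["sleeping bag", "portable GPS", "bear spray", "trekking poles"]

SAMPLE_SIZE = 2  # assumption: a 20% sample of 7 items, rounded up


def offers(known_purchases):
    """Return the candidates the shopper does not appear to own already."""
    owned = set(known_purchases)
    return [item for item in candidates if item not in owned]


def fraction_of_samples(predicate):
    """Fraction of all equally likely samples for which predicate holds."""
    samples = list(combinations(history, SAMPLE_SIZE))
    return sum(1 for s in samples if predicate(s)) / len(samples)


# How often does a sample hide the sleeping bag?
miss_sleeping_bag = fraction_of_samples(lambda s: "sleeping bag" not in s)
# How often does a sample contain at least one Alaska item?
sees_alaska = fraction_of_samples(lambda s: any("Alaska" in item for item in s))
# How often would we offer a coupon for something already in the full history?
bad_offer = fraction_of_samples(lambda s: any(o in history for o in offers(s)))

print(f"samples that miss the sleeping bag: {miss_sleeping_bag:.0%}")
print(f"samples that hint at Alaska: {sees_alaska:.0%}")
print(f"samples that lead to an offer for an owned item: {bad_offer:.0%}")
print("offers from the full history:", offers(history))
```

Under these assumptions, 15 of the 21 possible samples miss the sleeping bag, and only the single sample containing both the sleeping bag and the GPS avoids offering something the shopper already owns. Working from the full history leaves only the items they have not bought.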
[1] Teradata Customer Story - Overstock.com, retrieved 05/17/2011 from http://tinyurl.com/6mdktgm (since removed from the site)