I took a
few days off to visit the Finger Lakes in New York (you really should go there
if you like hiking and/or wine) and came back to an overflowing email inbox.
Some of these emails were from clients who have been receiving more and more correspondence from our competitors claiming superiority over IBM Netezza.
One of these claims was that their solution is more nimble and able to handle broader workloads because it has indexes and aggregates. That one made me think for a few minutes, but I remain convinced that the Netezza approach, which needs no indexes at all, is the far better solution.
While an index or aggregate can improve or optimize performance for one or more queries, the upfront design and build phase for tables, aggregates, and indexes causes other systems to take many times longer to get up and running efficiently than IBM Netezza appliances. In fact, some of our competitors’ customers have openly talked about months-long implementation cycles, while IBM Netezza customers talk about being up and running within 24 hours.
Instead of spending days or weeks planning their data models, IBM Netezza appliance customers simply copy the table definitions from their existing tables, load the data, and start running their queries, reports, and analytics. There is absolutely no need to create any indexes (and then have to choose among as many as 19 different index types) or aggregates. Where other data warehouse solutions require weeks of planning, IBM Netezza appliances are designed to deliver results almost immediately.
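To make that contrast concrete, here is a minimal Python sketch of the load-and-go workflow, using the built-in sqlite3 module purely as a stand-in engine; the table and rows are made up for illustration, and nothing here is Netezza-specific syntax. The point is the step that never appears: there is no CREATE INDEX and no aggregate build between loading and querying.

```python
# A runnable sketch of the load-and-go workflow described above, using
# Python's built-in sqlite3 purely as a stand-in engine; the table and
# rows are invented for illustration, and this is not Netezza syntax.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Copy the table definition from the existing system as-is.
cur.execute("""
    CREATE TABLE sales (
        sale_id   INTEGER,
        store_id  INTEGER,
        sale_date TEXT,
        amount    REAL
    )
""")

# 2. Load the data (a bulk loader would do this on a real appliance).
rows = [
    (1, 10, "2011-06-01", 19.99),
    (2, 10, "2011-06-01", 5.00),
    (3, 20, "2011-06-02", 42.50),
]
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# 3. Query immediately. Note what never happens: no CREATE INDEX, no
#    aggregate table, no physical-design phase in between.
for store_id, total in cur.execute(
    "SELECT store_id, SUM(amount) FROM sales GROUP BY store_id"
):
    print(store_id, total)
```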
Today’s data warehouse technologies have made it possible to collect and merge very large amounts of data. Systems that require indexes are fine for historical reporting, because you simply run the same report over and over again. But today’s business users need answers promptly: the answer to one question determines the next question they ask, and that answer the next, and so on. This thought process is known as “train of thought” analysis, and it can deliver the competitive advantages required in today’s economy and tomorrow’s.
Outside of IBM Netezza data warehouse appliances, standard operating procedure is for users to extract a small sample of data, move it out of the data warehouse to a dedicated analytics server, and then run the analytics against that sample. This is necessary because those systems cannot support ad-hoc analytic reports without completely exhausting the system resources and impacting all other users. Even if these other systems had the right indexes all of the time, they would still need to move massive amounts of data into memory before processing it.
The small sample size allows the analysis to complete in a reasonable amount of time, but restricting the analysis to a small subset of the data makes it harder to spot (and act on) the trends hidden within it.
We discussed this above with the baseball example, but it applies to everyone. Credit cards are another example: with card numbers and identities being stolen every day, the card companies need to detect these misuses immediately in order to limit their liability and prevent losses. While there are quick “indicators” of fraud, like a new credit card being used at an airport pay phone for the first time, most indicators come from correlating more than one transaction. For example, a card cannot be used in Kansas and then in Orlando 7 minutes later, unless one of the transactions is a web transaction or the card number was entered manually because it was taken over the phone. So, if a card was physically swiped in Kansas and then swiped again in Orlando less than a few hours later, the account must have been compromised. Now, if the fraud detection application looked at only every 10th, or even every 100th, transaction, it would miss at least one, if not both, of these transactions most of the time.
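To ground the arithmetic, here is a small Python sketch of both points: the impossible-travel check and the reason sampling defeats it. The distance, the speed threshold, and the 1-in-10 sampling rate are illustrative assumptions, not a description of any real fraud system.

```python
# A hedged sketch of the two ideas above: (1) flagging a pair of
# card-present swipes whose implied travel speed is impossible, and
# (2) why inspecting only a sample of transactions misses such pairs.
# The distance, threshold, and sampling rate are illustrative only.
from datetime import datetime

def implied_speed_mph(miles_apart, t1, t2):
    """Average speed needed to make both swipes in person."""
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    return float("inf") if hours == 0 else miles_apart / hours

# Kansas to Orlando is roughly 1,200 miles (illustrative figure).
swipe_kansas = datetime(2011, 6, 1, 12, 0)
swipe_orlando = datetime(2011, 6, 1, 12, 7)   # 7 minutes later

speed = implied_speed_mph(1200, swipe_kansas, swipe_orlando)
if speed > 600:  # faster than a commercial jet: physically impossible
    print(f"Fraud suspected: implied travel speed is {speed:,.0f} mph")

# Why sampling breaks the check: with independent 1-in-10 sampling,
# each swipe has a 10% chance of being inspected, so the correlated
# pair is seen together only about 1% of the time.
p_each = 1 / 10
print(f"Chance both swipes are inspected: {p_each * p_each:.0%}")
```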