You can check out my latest post about the new PureData Systems at these links.
Hi, and welcome to my blog. I have been working for IBM, and with DB2, for the past 22 years, and I recently started to work with our new colleagues from Netezza. Although I work for IBM, the views expressed are my own and not necessarily those of IBM and its affiliates. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.
Wednesday, November 07, 2012
Tuesday, September 25, 2012
Sorry for the lack of posts
I was asked to write a series of posts for the IBM Expert Integrated System blog area. Check out my first post here
Monday, August 06, 2012
The Downside of Down Casing
The ANSI standard for naming in databases is to Upper Case the names of tables and columns in the database. So, for the table users (see the statement below), the table name and the two column names are stored in the database as USERS, USERID, and NAME.
create table users
(UserID int, Name Char(60))
Now, to
make things “easy”, ANSI standard databases also Upper Case references to
tables and columns automatically. So the statement select userid from users would be automatically converted to select USERID from USERS as it is
optimized, so that it will not fail.
In DB2
and Netezza if you run select userid
from users or select USERID from
users, or select UserID from Users
, you get all users in the table. No matter what mix of case you use for the
column named UserID, you get the same results, unless you enclose the table
name or column name in quotes. If the name is enclosed in quotes, then the case
is preserved, and must match exactly.
So, the table users2, created like this:
create table "USERS2"
("USERID" int, "NAME" char(60))
can be accessed in DB2 and Netezza using any of the following SQL statements, because of the way DB2 and Netezza adhere to the ANSI standard and Upper Case the names.
select USERID from USERS2
select userid from users2
select "USERID" from "USERS2"
select "USERID" from users2
etc.
For databases that down case the table and column names, 3 of the above 4 statements would fail on the USERS2 table, and only the statement select "USERID" from "USERS2" would run.
Isn’t the way that DB2 and Netezza work a lot more intuitive, and a lot easier? And since you do not need to worry about the way the SQL was written in your existing applications, it is a lot less work to get your existing applications and BI reports running.
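To make the migration cost concrete, here is a minimal sketch, assuming a hypothetical report query from an existing application and a table whose catalog names were carried over in upper case:

-- Query as written in an existing BI report:
select UserID, Name from Users
-- On DB2 and Netezza this runs unchanged, because the unquoted names fold to USERS, USERID, and NAME.
-- On a down-casing database, the same table (stored as "USERS", "USERID", "NAME") would force a rewrite:
select "USERID" as UserID, "NAME" as Name from "USERS"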
Why cause
more work for yourself?
Monday, July 16, 2012
Adding a 4th V to BIG Data - Veracity
I talked a week or so ago about IBM’s 3 V’s of Big Data. Maybe
it is time to add a 4th V, for Veracity.
Veracity deals with uncertain or imprecise data. In traditional
data warehouses there was always the assumption that the data is certain,
clean, and precise. That is why so much time was spent on ETL/ELT, Master Data
Management, Data Lineage, Identity Insight/Assertion, etc.
However, when we start talking about social media data like Tweets, Facebook posts, etc., how much faith can or should we put in the data? Sure, this data can be counted toward your sentiment measures, but you would not count it toward your total sales and report on that.
Two of the now 4 V’s of Big Data are actually working
against the Veracity of the data. Both Variety and Velocity hinder the ability
to cleanse the data before analyzing it and making decisions.
Due to the sheer velocity of some data (like stock
trades, or machine/sensor generated events), you cannot spend the time to
“cleanse” it and get rid of the uncertainty, so you must process it as is -
understanding the uncertainty in the data. And as you bring multi-structured
data together, determining the origin of the data, and fields that correlate
becomes nearly impossible.
When we talk Big Data, I think we need to define trusted
data differently than we have in the past. I believe that the definition of
trusted data depends on the way you are using the data and applying it to your
business. The “trust” you have in the data will also influence the value of the
data, and the impact of the decisions you make based on that data.
Friday, July 13, 2012
A discussion of data models for analytics
Some of our competitors recommend that you use 3rd Normal Form (3NF) for your data structures, as they believe that is the optimal architecture for the ad hoc queries that form the basis for decision support and analytical processing today. While 3NF can save storage space, it makes queries harder to write and slower to execute. A big downside of 3NF for data warehousing is that it
causes the database to join tables for most queries. “Joins” can be a
performance pitfall because they force large volumes of data to be moved around
the system. To speed up these queries, DBAs using these other databases create
and maintain aggregates and/or indexes across tables. In fact, some tables can
have 3, 4, 5 or even more aggregates/ indexes if they are joined to tables
using different columns. It is important to realize that these aggregates/indexes
require knowledge of the queries, reports and analytics that are going to be
run within the system, now and in the future.
Think of it
this way—if you are heading out for a Sunday drive, and have no destination in
mind, how can you use a navigation system to give you directions?
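To make the join burden concrete, here is a minimal sketch using hypothetical sales tables (the table and column names are assumptions, not from any real schema):

-- 3NF: answering one question touches four tables
select c.region, sum(s.quantity * p.unit_price) as revenue
from sales s
join products p on s.product_id = p.product_id
join customers c on s.customer_id = c.customer_id
join dates d on s.date_id = d.date_id
where d.year = 2011
group by c.region

-- The same question against a denormalized table needs no joins
select region, sum(quantity * unit_price) as revenue
from sales_flat
where sale_year = 2011
group by region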
Another
issue that many of our competitors’ customers report is that they need to
create a duplicate copy of their data in a dimensional model in order to meet
their desired response times. In many cases this dimensional model will contain
many star schemas, as well as a number of copies of the fact tables containing
different levels of data. The issue with this approach is that the application
developer and the business users must be fully aware of what data these new tables
and aggregates really contain. If they aren’t aware, they can inadvertently
make a business decision based on an old snapshot of the data, or a small,
skewed sample of the data. In addition, if a user mistakenly picks the wrong "table," that query might take 10 or even 100 times longer to run than it would if they completely understood the model.
Because IBM
Netezza appliances do not need indexes or pre-computed aggregates to perform
well, there is no need to tune the data model for IBM Netezza appliances. In
fact, many customers simply copy the data model directly from their old data
mart/warehouse or from their transactional system into the IBM Netezza
appliance and leave it as-is. They then simply load a single copy of the data
and start running their queries.
Wednesday, July 11, 2012
Why Analytics is like Wine Tasting
I mentioned yesterday that I was in the Finger Lakes last
week. Well, we stopped by some wineries while we were there and it is always
interesting to read the tasting notes on the back of the bottle, or on the
tasting sheet that the winery gives to you.
I was thinking last night about this and how great it would
be if all of the data we, as an organization, create or consume came with its
own “tasting notes”. Just imagine a new
set of transactions arriving from your OLTP systems, and they come with a tag that
says something like “This data shows a correlation between the purchase of
steaks and seasoning salt.” Wouldn’t that make the job of the data scientist /
data analyst so much easier?
We also went to one winery in particular, and noted that the
wine maker did not have any tasting notes, and only described his wine as “Like
your favorite pair of slippers” or something like that. After talking about
this for a while, we found that we actually liked this approach better. Rather
than tasting and searching for what the wine maker told us he or she tasted, we
were able to develop our own impressions and detect tastes on our own. Without being directed to a particular smell or taste, we used our own nose and palate to decide what we tasted, and what we liked. In the end we bought more bottles from this winery than we did from any of the other wineries that we visited.
You might be scratching your head and wondering where I am going with this, so let me explain. I believe that analytics needs to be more like
the second case above. You should not start with any preconceived notions on
your data based on what others tell you. Analyze the data, detect
correlations/patterns/trends on your own, and then check the tasting notes if
you want to.
The goal of analytics
should be to find NEW information that you can act upon, not simply find the
same thing that someone else already found.
Tuesday, July 10, 2012
IBM's Big Data Platform - Saving One Life at a Time
Having been working on parts of IBM’s Big Data platform for
the past year or more, I am continually impressed with the value that IBM
brings to our clients.
When we talk Big Data at IBM, we talk about the three V’s:
Variety, Velocity, and Volume.
Volume is pretty simple. We all understand that we’re going from terabytes to petabytes and into a zettabyte world. I think most of us understand today just how much data is out there now and what’s coming over the next few years.
The variety aspect is something kind of new to us in the data warehousing world, and it means that analytics is no longer just for structured data, and on top of that, analytics on structured data no longer has to happen in a traditional database. The Big Data era is characterized by the absolute need and desire to explore and analyze all of the data that organizations produce. Because most of the data we produce today is unstructured, we need to fold in unstructured data analytics as well as structured.
If you look
at a Facebook post or a Tweet, they may come in a structured format (JSON), but
the true value, and the part we need to analyze, is in the unstructured part.
And that unstructured part is the text of your tweet or your Facebook status/post.
Finally, there’s velocity. We at IBM consider velocity to be how fast the data arrives at the enterprise, and of course, that leads to the question: how long does it take you to analyze it and act on it?
It is important to keep in mind that a Big Data problem
could involve only one of these characteristics, or all of them. And in fact,
most of our clients see that a closed loop mechanism, normally involving more
than one of our Big Data solutions, is the best way to tackle their problem.
The neonatal ward at a well-known hospital is a prime example of this. Hospital equipment issues an alert when a vital sign goes out of range, prompting the hospital staff to take action immediately. However, many life-threatening conditions take hours or days to reach critical levels, delaying possible life-saving treatments. Often signs that something is wrong begin to appear long before the
situation becomes serious enough to trigger an alert, and even a skilled nurse
or physician might not be able to spot and interpret these trends in time to
avoid serious complications. Complicating this is the fact that many of these
warning indicators are hard to detect and it’s next to impossible to understand
their interaction and implications until a threshold has been breached.
Consider, for example, nosocomial infection, a life-threatening illness contracted in hospitals. Research has shown that signs of this infection can appear 12-24 hours before overt trouble/distress is spotted and normal ranges are exceeded. Making things more complex, in a baby where this infection has set in, the heart rate stays completely normal (i.e. it doesn’t rise and fall throughout the day like it does for a healthy baby). In addition, the pulse also stays within acceptable limits. The information needed to detect the infection is present, but it is very subtle and hard to detect. In a neonatal ward, the ability to absorb and reflect upon all of the data being presented is beyond human capacity; there is just too much data.
By analyzing historical data, and developing correlations and an understanding of the indicators of this and other health conditions, the doctors and researchers were able to develop a set of rules (or set of conditions) that indicate a patient is suffering from a specific malady, like nosocomial infection. The monitors (which can produce 1,000+ readings per second) feed their readings into IBM’s InfoSphere Streams, where they are checked on the fly. The data is checked against healthy ranges, and also against other values for the past 72 hours, and if any rules are breached, an alert is generated. For example, if a child’s heart rate has not changed for the past 4 hours and their temperature is above 99 degrees, then that is a good indicator that they may be suffering from nosocomial infection.
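As a rough illustration of that kind of rule, here is a hedged sketch written as SQL over a hypothetical vital_readings table; the actual implementation runs inside InfoSphere Streams rather than as a batch query, and the interval syntax varies by engine:

-- Flag patients whose heart rate has been flat for the past 4 hours
-- while their temperature is above 99 degrees
select r.patient_id
from vital_readings r
where r.reading_time >= current_timestamp - interval '4 hours'
group by r.patient_id
having max(r.heart_rate) = min(r.heart_rate)
   and max(r.temperature) > 99.0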
And as the researchers continue to study more and more historical data in their data warehouse and Hadoop clusters and detect more correlations, they can dynamically update the rules that are being checked against the real-time streaming data.
Monday, July 09, 2012
Indexes do NOT make a warehouse agile
I took a
few days off to visit the Finger Lakes in New York (you really should go there
if you like hiking and/or wine) and came back to an overflowing email inbox.
Some of these emails were from clients that have been receiving more
correspondence from our competitors with claims of their superiority over IBM
Netezza.
One of these claims was that their solution is more nimble and able to handle broader workloads because they have indexes and aggregates. This one made me think for a few minutes, but I still feel that the Netezza approach, where there is no need for indexes, is a far better solution.
While an
index or aggregate can be used to improve/optimize performance for one or more
queries, the upfront table, aggregate, and index design/build phase will cause
other systems to take MANY TIMES longer to get up and running efficiently than
IBM Netezza appliances. In fact, some of our competitors’ customers have openly
talked about months long implementation cycles, while IBM Netezza customers
talk about being up and running in 24 hours…
Instead of days or weeks of planning their data models, IBM Netezza
appliance customers simply copy their table definitions from their existing
tables, load the data, and start running their queries/reports/analytics. There
is absolutely no need to create any indexes (and then have to choose between up
to 19 different types of indexes) or aggregates. Where other data warehouse
solutions require weeks of planning, IBM Netezza appliances are designed to
deliver results almost immediately.
Today’s data warehouse technologies have made it possible to collect and merge very large amounts of data. Systems that require indexes are fine for creating historical reports, because you simply run the same report over and over again. But today’s business users need answers promptly. The answer to one question will determine the next question that they are going to ask, and that answer the next, and so on. This thought process is known as “train of thought” analysis, and it can lead to the competitive advantages required in the economy of today and tomorrow.
Outside of IBM
Netezza data warehouse appliances, standard operating procedure is for users to
extract a small sample of data, move it out of the data warehouse to another
server, and then run the analytics against that sample. This is required because those systems cannot support these ad hoc analytic reports without completely exhausting the system resources and impacting all other users. Even if these other systems had the right indexes all of the time, they would still need to move massive amounts of data into memory before processing it, which is why their users end up sampling the data and copying a small sample to a dedicated analytics server.
The small sample
size allows the analysis to complete in a reasonable amount of time, but by
restricting this analysis to a small subset of the data, it becomes harder to
spot (and act on) the trends found within it.
We discussed this above with the baseball example, but it applies to everyone.
Credit cards are another example, with credit card numbers and identities being
stolen every day, the card companies need to detect these misuses immediately
in order to limit their liability and prevent loss. While there are quick
“indicators” of fraud, like a new credit card being used at a pay phone at an
airport for the first time, most indicators come from correlating more than one
transaction. For example, a card cannot be used in Kansas and then in Orlando 7
minutes later, unless one of the transactions is a web transaction or the card
number was manually entered because it was taken over the phone. So, if a card
was physically swiped in Kansas and then in Orlando less than a few hours
later, the account must have been compromised. Now, if the fraud detection application only looked at every 10th, or 100th, transaction, it would miss at least one, if not both, of these transactions most of the time.
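Here is a minimal sketch of that kind of correlation, assuming a hypothetical card_transactions table with a state column and an entry-mode flag (the names and interval syntax are illustrative):

-- Find cards physically swiped in two different states
-- within a window too short to travel between them
select t1.card_number, t1.txn_time, t1.state, t2.txn_time, t2.state
from card_transactions t1
join card_transactions t2
  on t1.card_number = t2.card_number
 and t2.txn_time > t1.txn_time
 and t2.txn_time < t1.txn_time + interval '2 hours'
 and t1.state <> t2.state
where t1.entry_mode = 'SWIPED'
  and t2.entry_mode = 'SWIPED'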
Tuesday, July 03, 2012
Why should compression only work for read-only data?
A number of our competitors make bold compression claims,
like “we provide 4X, or even 10X+ compression”. What they do not tell you is
that you have to choose between multiple different compression levels, and/or
compression libraries/types. They also do not mention that you cannot use their
compression on tables that have inserts, updates, and/or deletes occurring. Nor
do they mention the overhead that you can expect to see if you turn on their
compression.
Let’s look at these three points in a little more detail.
In today’s world of reduced
budgets, one of the easiest ways to save money with a data warehouse is to
reduce the amount of disk space required to store the data. To this end, nearly
all data warehouse solutions offer some form of database compression. In order
to use compression with many of our competitors the table must be an
Append-only table.
Append-only tables have the following limitations:
• Cannot UPDATE rows in an append-only table
• Cannot DELETE rows from an append-only table
• Cannot ALTER TABLE...ADD COLUMN to an append-only table
• Cannot add indexes to an append-only table.
These limitations exist because of the way these vendors have implemented their ‘database compression’. While Netezza has built-in algorithms specifically designed for database usage, others use a compression library that “compresses” the data file as it is being written to disk, and then uses the same library to un-compress the file as it reads it from disk. Anyone who has used a tool like gzip, WinZip, pkzip, WinRAR, etc. knows how slow these tools are, and how many CPU cycles they use. This is the same consideration and overhead you will have with these other vendors if you use their compression. In fact, this overhead can be so bad that some customers who have presented at our competitors’ conferences have talked about tests where a single query running on a table with compression used up over 95% of the CPU, while the same query against the same data in a table that was not compressed used less than 5% of the CPU.
On top of the performance impact, there is also the DBA overhead. With one competitor, the DBA has to choose between 3 types of compression (i.e. compression libraries) and 9 different levels of compression, each of which works better for different data types. That is 27 different combinations/permutations that the DBA has to choose between, for each table.
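To give a feel for that choice, here is a hedged sketch in the style of such systems; the storage options shown are illustrative of this class of product, not any one vendor's exact syntax:

-- Competitor style: the DBA picks a compression library and level per table,
-- and the table becomes append-only as a side effect
create table sales (sale_id int, customer_id int, sale_amount decimal(9,2))
  with (appendonly = true, compresstype = zlib, compresslevel = 5)

-- Netezza: compression is always on, with no storage options to choose
create table sales (sale_id int, customer_id int, sale_amount decimal(9,2))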
With IBM Netezza, compression is always on and there is no “better algorithm” to choose for different tables. And because of the way the Netezza architecture works, when you get 4X compression of your data in Netezza, you see an associated 4X improvement in performance, for all types of workloads, not just reads.
Monday, July 02, 2012
Addressing more crazy competitor claims
Among some of the other “claims” that our competitor made about Netezza is that it can only load at the rate of 2TB/hr. First off, this is false. The current generation of the Netezza platform can load at over 5TB/hr. But the real question I ask is "Does this really matter after you have your system up and running?”
After the initial loading of the
database from the existing system(s), very few companies load more than a
couple hundred GB to a couple TB per day, and most do not even approach these
daily or even monthly load volumes. Even Netezza’s biggest customers, who have petabyte-sized systems, do not find Netezza’s load speed to be an issue.
Now let’s look at the claims this competitor is making in more detail, and peel back the layers of the onion. This competitor claims that they can load at double Netezza’s load speed of 5TB/hr. But they leave out a number of important factors when they make this claim.
What about compression?
Netezza can load at a rate of 5TB/hr and compress the data at the same time. This competitor can only load at their claimed rate if compression is not used. So, if you want to compress the data, how fast can you really load on their platform? They use a library-based compression algorithm that uses CPU cycles to basically “zip” the data pages as they are written to disk, consuming significant CPU cycles in the system that cannot then be used to format the data into pages, build indexes, etc.
What about partitioned tables?
This competitor needs tables to be
partitioned in order to provide good performance, but in order to load a table
this competitor has to have a “rule” for each data partition, and then each row
that is being loaded must be compared to the rules to know which partition it
should be loaded into. If the row should be loaded into one of the first couple
of ranges, then there is little extra processing, but all of the latest data
will have to be checked against many rules, slowing down the processing of
these rows, and definitely slowing down the load process.
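For readers unfamiliar with that mechanism, here is a hedged sketch of what such per-partition load rules look like; the table, partition, and rule names are hypothetical and the exact syntax varies:

-- One rule per monthly partition; every incoming row is tested
-- against these predicates until one matches
create rule sales_jun2011_insert as
  on insert to sales
  where new.sale_date >= date '2011-06-01' and new.sale_date < date '2011-07-01'
  do instead insert into sales_1_prt_jun2011 values (new.*)
-- ...and so on, one rule for every partition in the table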
What about indexes?
This competitor admits in their manuals that they also need indexes in order to perform well. But each index incrementally slows down load performance.
Netezza does not need indexes to perform well, so it does not suffer from decreased load speed because of indexes or table partitioning.
What about pre-processing?
Netezza can load at the same 5TB/hr rate with no pre-processing of the input data file. This same competitor can only load at their claimed faster rate if their appliance includes an optional “integration” or ETL module, where those servers pre-process the data and then send it to the data modules to be loaded. Without the integration module, the load file would need to be placed on the shared file system (accessible from all modules in their appliance), and then the load speed is really only 2TB/hr based on published validation reports of their architecture and procedures. And again, this 2TB/hr is without compression or table partitioning.
Friday, June 29, 2012
Uncovering the truth about more outrageous competitor claims
Another of the claims this competitor made is that Netezza cannot handle the data ingest from a point of sale (POS) type system, because it could not handle the data being sent from the hundreds or thousands of POS systems at the same time. They go on to claim that Netezza cannot handle more than a specific, ridiculously small, number of concurrent writes to try to give credence to this argument.
In my opinion, this point shows a lack of knowledge of real data warehousing and analytics. IBM Netezza has a large number of retail customers who feed their POS data into Netezza all the time. But this data normally comes into the data center in some non-database type of communication protocol. The data is then extracted from these POS feeds, and has metadata management and identity assertion algorithms applied against it, because the data may be coming from many different stores, even different “named” stores where the same item may have different SKUs. Only then is the cleansed data loaded into the warehouse; it is not loaded directly from the hundreds/thousands of POS applications/registers.
IBM
Netezza absolutely supports trickle feed and real/near-real time updates from
this type of data stream process, as well as direct replication from other
relational databases.
And, if
you are looking for the ultimate in real time reporting and analytics on your
POS data, IBM has the system for you. The IBM DB2 Analytics Accelerator is an
IBM Netezza system connected directly to a DB2 for z/OS system using the
zEnterprise connection. In this configuration, the transactional applications
still run against the tried and true DB2 for z/OS system, and
reporting/analytic queries get transparently routed through the DB2 for z/OS
system to the Netezza system, to offload the processing and resource usage, and
ultimately run much faster. DB2 for z/OS systems run many of the world’s largest scale OLTP applications, and this brings them the power of real time analytics without the need to create the extra indexes, aggregates, etc. in their DB2 for z/OS system that would otherwise be needed to allow the reports/analytics to run quickly enough, but that would also have a detrimental effect on transactional performance.
Thursday, June 28, 2012
FUD Competitors are Spreading on Netezza
Recently I was made aware of some FUD (fear, uncertainty,
and doubt) that a competitor has been sending to our current and prospective
clients. This FUD contained a number of gross inaccuracies, as well as some “points”
that really made me scratch my head and wonder how much this competitor really
understands data warehousing and analytics.
This competitor claimed that Netezza scanned the entire
table for all SQL / Analytic operations.
This claim is absolutely not true. While
Netezza does not have indexes that the DBA must create and maintain, it does
automatically build and maintain zone maps for all tables in the database.
These zone maps contain the minimum and maximum value for all columns in every
extent within each table. So, before the query starts reading from disk, it
looks at the predicates in the query, and compares them to the zone maps to
determine which table extents can be skipped and which need to be read.
For example, if you want to calculate the total sales of Red Ford Mustangs in June 2011, Netezza can skip any table extent that does not have data for June 2011. So, for a database with 7 years of sales history, it can skip any extent whose maximum is less than, or whose minimum is greater than, June 2011. This eliminates 98% or more of the I/O required.
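As a rough sketch of the kind of query this helps (the table and column names are hypothetical), the zone map on the date column means only extents whose min/max range overlaps June 2011 are ever read:

-- Zone maps track the min and max of sale_date per extent, so extents whose
-- range falls entirely outside June 2011 are skipped before any I/O happens
select sum(sale_amount)
from auto_sales
where make = 'Ford' and model = 'Mustang' and color = 'Red'
  and sale_date between date '2011-06-01' and date '2011-06-30'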
Our competitor claims that their range partitioning would automatically eliminate processing on all of the months other than June 2011 and is a better solution. While their range partitioning will eliminate the I/O like Netezza does, there is a whole bunch of effort that partitioning brings to their solution that they do not talk about.
In their solution you create a table, and then you create "inheritance children", one per partition. So for a table with 7 years of data, that is 84 monthly partitions, and 84 tables (the base table plus the 83 inheritance children). That might not seem too bad, but there's more.
If you have a primary key, foreign key, index, constraint, or permission on the table, you need to apply it to the table and each of its inheritance children; it is not a global operation. So, for these 84 tables with 3 user groups with different permissions, a primary key, a constraint, and one non-unique index, that would be 84 * (3 + 1 + 1 + 1) or 504 DDL statements to set this up, and to maintain over time.
And on top of that, their bulk loader is not partition aware, so you need to write a rule for each table/inheritance child, adding 84 more DDL statements to the list.
In Netezza you write one statement to create the table, vs. 588 DDL statements for the exact same table and data in this competitor.
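A hedged sketch of what that per-child DDL looks like, using hypothetical object names and showing only the first inheritance child:

-- Competitor style: every permission, index, and constraint is repeated per child
grant select on sales_1_prt_jan2006 to reporting_users
create index idx_sales_jan2006_cust on sales_1_prt_jan2006 (customer_id)
alter table sales_1_prt_jan2006 add constraint chk_amt_jan2006 check (sale_amount >= 0)
-- ...and so on for each of the remaining inheritance children

-- Netezza: one statement, no children, no indexes, no per-partition grants
create table sales (sale_id int, customer_id int, sale_date date, sale_amount decimal(9,2))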
I’ll respond to some more of the
claims this competitor has been making over my next few posts.
Sunday, May 20, 2012
Argumentative Colleagues and IBM Netezza reducing runtime from 7 hours to 8 seconds....
I had an “interesting discussion” with someone during my IBM
Netezza Futures presentation last week at the IDUG (International DB2 User’s
Group) conference. At the end of the presentation, one of my IBM colleagues was
trying to argue that Netezza’s streaming architecture must not be able to
handle multi-table joins because “it does all of its work in the FPGA”. I tried to explain that he was incorrect, and how Netezza does joins so efficiently, but my colleague didn’t seem to listen and asked the same thing again.
Although some people reading my post might think that I am writing this to try to explain to my IBM colleagues how they should act at customer events, I am not.
The best part of this discussion was when my colleague asked
the same question a second time, one of our customers stood up and said “What
Dwaine said about the performance of IBM Netezza is exactly right”. The
customer then went on to say how impressed his company is with the proof of
concept (PoC) they just completed with the IBM DB2 Analytics Accelerator. I will explain the IBM DB2 Analytics Accelerator soon, but what the customer went on to say is
that they have queries that were taking 7 hours on their DB2 for z/OS system (and
using MIPS all that time) that are now running on the IBM DB2 Analytics Accelerator in 8 seconds and
using basically no MIPS. I could not have paid for a better ending to my
session.
The IBM DB2 Analytics Accelerator is an IBM Netezza 1000 system piggybacking alongside a DB2 for z/OS system. The DB2 for z/OS optimizer will offload large, complex queries from the DB2 system onto the IBM DB2 Analytics Accelerator, much like it does with a materialized view or materialized query table, but keep the short transactional queries running on DB2 for z/OS.
So with absolutely no change to their application, data
loading procedures, or any extra tuning, this customer was able to run their
analytics over 3,000 times faster. That’s the power of the IBM Netezza
appliance and the IBM DB2 Analytics Accelerator.
And, it would still be nice if my IBM colleagues learned how
to act at customer events.
Tuesday, May 15, 2012
Increasing Profits with the "Best Offer"
My colleague James Kobielus has been talking about
“next best” action recently[1][2], and I also talked about this in my
post on the importance of agility for analytic applications[3]. I have been talking to a number of
customers recently who want to take this beyond next best, to the best offer.
In my blog I talked about a person who had made a number of purchases, and the retailer was trying to influence the customer’s next purchase by providing them with a coupon for an item that they might be interested in based on their purchasing history. But how much more could a company make if it could influence its customers’ purchases before they are made, pushing them to more profitable items during the ordering process?
Think of a distributor of auto parts as an example. As the body shops and garages are placing orders, whether online or over the phone, as they enter an item’s description or SKU, what if you could offer a different part that has the same specifications and the same look and feel, but costs less and has a higher profit margin for you? Well, that would be a win-win situation: the garage gets a less expensive part that they can sell to the customer for less (like this would really happen, right?), but the distributor also makes a higher profit.
This takes more than just a data warehouse / analytic application. And this type of example is where IBM is uniquely positioned to help you out. You cannot build this “system” with just a data warehouse.
You will need an analytics warehouse that can
determine the most profitable items. You may need IBM InfoSphere BigInsights to examine
your parts catalogs to find which parts have the same specifications and can be
substituted for each other. You will probably need a rules storage mechanism
that will record the compatible items from BigInsights and also from your
internal sales team, who have deep knowledge of the items they sell. Then both your online ordering system and your internal order entry system must be intelligent enough to pop up an offer whenever they see one of the item numbers or SKUs that have been identified as "replaceable".
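As a minimal sketch of the lookup the order entry systems would make (the substitution_rules and items tables, and the SKU shown, are hypothetical):

-- Given the SKU a customer just entered, find a compatible
-- substitute part that carries a better margin
select sub.substitute_sku, i.description, i.list_price, i.margin_pct
from substitution_rules sub
join items i on i.sku = sub.substitute_sku
where sub.original_sku = 'BRK-PAD-4412'
  and i.margin_pct > (select margin_pct from items where sku = 'BRK-PAD-4412')
order by i.margin_pct desc
limit 1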
While you might be able to call a customer
back if you can save them thousands of dollars or more, normally these offers
need to be made as the order is being entered/taken, not afterwards.
Bring the power of the IBM Big Data family and our other Information Management and WebSphere solutions to your business, and drive more profit.