Having worked on parts of IBM’s Big Data platform for the past year or more, I am continually impressed with the value that IBM brings to our clients.
When we talk Big Data at IBM, we talk about the three V’s:
Variety, Velocity, and Volume.
Volume is pretty simple. We all understand that we’re going from terabytes to petabytes and into a zettabyte world. I think most of us understand today just how much data is out there now and what’s coming over the next few years.
The variety aspect is something kind of new to us in the data warehousing world, and it means that analytics are no longer just for structured data; on top of that, analysis of structured data no longer has to happen in a traditional database. The Big Data era is characterized by the absolute need and desire to explore and analyze all of the data that organizations produce. Because most of the data we produce today is unstructured, we need to fold in unstructured data analytics alongside the structured.
If you look at a Facebook post or a tweet, it may come in a structured format (JSON), but the true value, and the part we need to analyze, is in the unstructured part: the text of your tweet or Facebook post itself.
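To make that concrete, here is a simplified sketch of a tweet’s structured wrapper (the field names loosely follow Twitter’s JSON format, but are trimmed and partly invented for illustration). The metadata parses trivially; the analytic value hides in the free-form text field:

    import json

    tweet = json.loads("""
    {
      "created_at": "Wed Sep 05 20:30:00 +0000 2012",
      "user": {"screen_name": "example_user", "followers_count": 42},
      "text": "Just landed in Toronto and the flight was delayed two hours again :("
    }
    """)

    # The structured fields are easy; the insight (an unhappy customer,
    # a recurring delay) is buried in the unstructured "text" field.
    print(tweet["user"]["screen_name"], "->", tweet["text"])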
Finally, there’s velocity. We at IBM consider velocity to be how fast the data arrives at the enterprise, and of course it leads to the question: how long does it take you to analyze it and act on it?
It is important to keep in mind that a Big Data problem could involve only one of these characteristics, or all of them. In fact, most of our clients find that a closed-loop mechanism, normally involving more than one of our Big Data solutions, is the best way to tackle their problem.
The neonatal ward at a well-known hospital is a prime example of this. Hospital equipment issues an alert when a vital sign goes out of range, prompting the hospital staff to take immediate action. However, many life-threatening conditions take hours or days to reach critical levels, delaying potentially life-saving treatments. Often, signs that something is wrong begin to appear long before the situation becomes serious enough to trigger an alert, and even a skilled nurse or physician might not be able to spot and interpret these trends in time to avoid serious complications. Complicating this is the fact that many of these warning indicators are hard to detect, and it is next to impossible to understand their interactions and implications until a threshold has been breached.
Take, for example, nosocomial infection, a life-threatening illness contracted in hospitals. Research has shown that signs of this infection can appear 12 to 24 hours before overt distress is spotted and normal ranges are exceeded. Making things more complex, in a baby where this infection has set in, the heart rate stays completely normal: it doesn’t rise and fall throughout the day the way it does for a healthy baby, and the pulse also stays within acceptable limits. The information needed to detect the infection is present, but it is very subtle and hard to spot. In a neonatal ward, the ability to absorb and reflect upon all of the data being presented is beyond human capacity; there is just too much data.
By analyzing historical data and developing correlations and an understanding of the indicators of this and other health conditions, the doctors and researchers were able to develop a set of rules (or sets of conditions) that indicate a patient is suffering from a specific malady, like nosocomial infection. The monitors (which can produce 1,000+ readings per second) feed their readings into IBM’s InfoSphere Streams, where they are checked on the fly. The data is checked against healthy ranges, and also against other values from the past 72 hours, and if any of the rules are breached, an alert is generated. For example, if a child’s heart rate has not changed for the past 4 hours and their temperature is above 99 degrees Fahrenheit, that is a good indicator that they may be suffering from nosocomial infection.
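To make the rule check concrete, here is a minimal sketch of how such a sliding-window rule might be evaluated. This is plain Python purely for illustration; the production system expresses its checks as InfoSphere Streams operators, and the tolerance value and data layout below are my own assumptions:

    from collections import deque
    from datetime import datetime, timedelta

    WINDOW = timedelta(hours=72)         # history retained, per the article
    FLATLINE_SPAN = timedelta(hours=4)   # "heart rate has not changed for 4 hours"
    FLATLINE_TOLERANCE = 2               # bpm of wiggle still counted as unchanged (assumption)
    FEVER_THRESHOLD_F = 99.0             # temperature threshold from the article

    readings = deque()                   # (timestamp, heart_rate_bpm, temperature_f)

    def add_reading(ts, heart_rate, temperature):
        # Append the new reading and evict anything older than 72 hours.
        readings.append((ts, heart_rate, temperature))
        while ts - readings[0][0] > WINDOW:
            readings.popleft()
        return check_nosocomial_rule(ts, temperature)

    def check_nosocomial_rule(now, current_temp):
        # The flatline test is only meaningful once 4 hours of history exist.
        if now - readings[0][0] < FLATLINE_SPAN:
            return False
        recent = [hr for ts, hr, _ in readings if now - ts <= FLATLINE_SPAN]
        flatlined = max(recent) - min(recent) <= FLATLINE_TOLERANCE
        return flatlined and current_temp > FEVER_THRESHOLD_F

    # A steady 120 bpm plus a mild fever trips the rule after four hours:
    start = datetime(2012, 9, 1, 12, 0)
    for minute in range(0, 241, 5):
        alert = add_reading(start + timedelta(minutes=minute), 120, 99.5)
    print(alert)   # True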
And as the researchers continue to study more and more historical data in their data warehouse and Hadoop clusters, whenever they detect new correlations they can dynamically update the rules that are checked against the real-time streaming data.
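As a minimal illustration of that feedback loop (plain Python again, not how InfoSphere Streams actually packages it; the rule names and thresholds are invented), the rules can be kept as named predicates that the research side re-registers while the stream keeps flowing:

    # Rules are data, not code baked into the stream, so a new correlation
    # mined from the warehouse can be registered or refined on the fly.
    rules = {}

    def register_rule(name, predicate):
        # Add or replace a rule; subsequent readings are checked against it.
        rules[name] = predicate

    def breached_rules(reading):
        # Return the names of all rules the latest reading violates.
        return [name for name, rule in rules.items() if rule(reading)]

    register_rule("fever", lambda r: r["temperature_f"] > 99.0)
    # New research arrives, so the threshold is tightened at runtime:
    register_rule("fever", lambda r: r["temperature_f"] > 98.6)

    print(breached_rules({"temperature_f": 98.8}))   # ['fever']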
1 comment:
Great piece Dwaine. Good to see IBM and the rest of the industry finally adopting the "3Vs" of big data over 11 years after Gartner first published them. For future reference, and a copy of the original article I wrote in 2001, see: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/. --Doug Laney, VP Research, Gartner, @doug_laney