Dwaine Snow's Thoughts on Databases and Data Management: IBM's Big Data Platform

Having worked on parts of IBM’s Big Data platform for the past year or more, I am continually impressed with the value that IBM brings to our clients.

When we talk Big Data at IBM, we talk about the three V’s: Variety, Velocity, and Volume.

Volume is pretty simple. We all understand that we’re moving from terabytes to petabytes and into a world of zettabytes. I think most of us understand just how much data is out there today and what’s coming over the next few years.

The variety aspect is something fairly new to us in the data warehousing world. It means that analytics are no longer just for structured data, and on top of that, analytics on structured data no longer have to run in a traditional database. The Big Data era is characterized by the need and desire to explore and analyze all of the data that organizations produce. Because most of the data we produce today is unstructured, we need to fold in unstructured data analytics alongside structured.

If you look at a Facebook post or a Tweet, they may come in a structured format (JSON), but the true value, and the part we need to analyze, is in the unstructured part. And that unstructured part is the text of your tweet or your Facebook status/post.

Finally, there’s velocity. We at IBM consider velocity to be how fast the data arrives at the enterprise, which of course leads to the question: how long does it take you to analyze it and act on it?

It is important to keep in mind that a Big Data problem could involve only one of these characteristics, or all of them. And in fact, most of our clients see that a closed loop mechanism, normally involving more than one of our Big Data solutions, is the best way to tackle their problem.

The neonatal ward at a well-known hospital is a prime example of this. Hospital equipment issues an alert when a vital sign goes out of range, prompting the hospital staff to take action immediately. However, many life-threatening conditions take hours or days to reach critical levels, delaying possible life-saving treatments. Often, signs that something is wrong begin to appear long before the situation becomes serious enough to trigger an alert, and even a skilled nurse or physician might not be able to spot and interpret these trends in time to avoid serious complications. Complicating this is the fact that many of these warning indicators are hard to detect, and it’s next to impossible to understand their interaction and implications until a threshold has been breached.

Take, for example, nosocomial infection, a life-threatening illness contracted in hospitals. Research has shown that signs of this infection can appear 12-24 hours before overt distress is spotted and normal ranges are exceeded. Making things more complex, in a baby where this infection has set in, the heart rate stays within the normal range but becomes unusually steady (i.e., it doesn’t rise and fall throughout the day like it does for a healthy baby). In addition, the pulse also stays within acceptable limits. The information needed to detect the infection is present, but it is very subtle and hard to spot. In a neonatal ward, absorbing and reflecting upon all of the data being presented is beyond human capacity; there is just too much data.

By analyzing historical data, and developing correlations and an understanding of the indicators of this and other health conditions, the doctors and researchers were able to develop a set of rules (or set of conditions) that indicate a patient is suffering from a specific malady, like nosocomial infection. The monitors (which can produce 1,000+ readings per second) feed their readings into IBM’s InfoSphere Streams, where they are checked on the fly. The data is checked against healthy ranges, and also against other values for the past 72 hours, and if any rules are breached, an alert is generated. For example, if a child’s heart rate has not changed for the past 4 hours and their temperature is above 99 degrees, then that is a good indicator that they may be suffering from nosocomial infection.
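To make the example rule concrete, here is a minimal Python sketch of a sliding-window check. It is not InfoSphere Streams code and not the hospital's actual rule: the class and field names are hypothetical, the temperature is assumed to be in Fahrenheit, and "has not changed" is interpreted as the heart rate varying by no more than a small tolerance (an assumed 2 beats per minute) across the window.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reading:
    t: float           # seconds since monitoring started
    heart_rate: float  # beats per minute
    temp_f: float      # degrees Fahrenheit (assumed unit)


class FlatHeartRateRule:
    """Illustrative rule: flat heart rate over a window plus elevated temperature."""

    def __init__(self, window_s=4 * 3600, hr_tolerance=2.0, temp_limit_f=99.0):
        self.window_s = window_s          # 4 hours, from the example in the text
        self.hr_tolerance = hr_tolerance  # assumed meaning of "has not changed"
        self.temp_limit_f = temp_limit_f  # 99 degrees, from the example in the text
        self.window = deque()
        self.first_t = None

    def update(self, r: Reading) -> bool:
        """Add one reading and return True if the rule is breached."""
        if self.first_t is None:
            self.first_t = r.t
        self.window.append(r)
        # Evict readings that have fallen out of the time window.
        while r.t - self.window[0].t > self.window_s:
            self.window.popleft()
        # Only alert once a full window of history has been observed.
        has_full_window = r.t - self.first_t >= self.window_s
        rates = [x.heart_rate for x in self.window]
        is_flat = max(rates) - min(rates) <= self.hr_tolerance
        return has_full_window and is_flat and r.temp_f > self.temp_limit_f
```

Scanning the whole window on every reading is fine for a sketch; at 1,000+ readings per second a real system would keep running minimum and maximum values instead.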

And as the researchers continue to study more and more historical data in their data warehouse and Hadoop clusters, when they detect more correlations, they can dynamically update the rules that are being checked on the real time streaming data.
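One way to picture rules being updated while the stream keeps running is a small registry of named predicates that can be replaced at any time. This is only a sketch under assumptions: `RuleSet`, the rule name, and the summary keys (`hr_range_4h`, `temp_f`) are all hypothetical, and the thresholds are placeholders rather than clinical guidance.

```python
from typing import Callable, Dict, List

# A rule takes a summary of recent vitals and returns True when it is breached.
Rule = Callable[[dict], bool]


class RuleSet:
    """Named rules that can be added, replaced, or removed between readings."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def upsert(self, name: str, rule: Rule) -> None:
        # Replacing a rule under the same name takes effect on the next check.
        self._rules[name] = rule

    def remove(self, name: str) -> None:
        self._rules.pop(name, None)

    def check(self, summary: dict) -> List[str]:
        """Return the names of all rules breached by this summary, sorted."""
        return sorted(name for name, rule in self._rules.items() if rule(summary))


rules = RuleSet()
rules.upsert("flat_hr_fever", lambda s: s["hr_range_4h"] <= 2.0 and s["temp_f"] > 99.0)

summary = {"hr_range_4h": 3.0, "temp_f": 99.4}
before = rules.check(summary)  # heart rate range of 3.0 is not "flat" under this rule

# Researchers find a new correlation and loosen the tolerance (hypothetical value).
rules.upsert("flat_hr_fever", lambda s: s["hr_range_4h"] <= 4.0 and s["temp_f"] > 99.0)
after = rules.check(summary)   # the same summary now breaches the updated rule
```

Keeping rules as data separate from the stream-processing loop is what lets new findings from the warehouse be applied without stopping the monitoring.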

1 comment:

Doug Laney said...: Great piece Dwaine. Good to see IBM and the rest of the industry finally adopting the "3V"s of big data over 11 years after Gartner first published them. For future reference, and a copy of the original article I wrote in 2001, see: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/. --Doug Laney, VP Research, Gartner, @doug_laney; 11:10 AM

Tuesday, July 10, 2012

IBM's Big Data Platform - Saving One Life at a Time