Tuesday, September 25, 2012

Sorry for the lack of posts

I was asked to write a series of posts for the IBM Expert Integrated System blog area.  Check out my first post here 

Monday, August 06, 2012

The Downside of Down Casing

ANSI standard naming in databases is to Upper Case the names of tables and columns in the database. So, for the table users (see the statement below), the table name and the two column names should be stored in the database as USERS, USERID, and NAME.

create table users (UserID int, Name Char(60))

Now, to make things “easy”, ANSI standard databases also Upper Case references to tables and columns automatically. So the statement select userid from users would be automatically converted to select USERID from USERS as it is optimized, so that it will not fail.

In DB2 and Netezza if you run select userid from users or select USERID from users, or select UserID from Users , you get all users in the table. No matter what mix of case you use for the column named UserID, you get the same results, unless you enclose the table name or column name in quotes. If the name is enclosed in quotes, then the case is preserved, and must match exactly.

So, for the table users2 created like

create table “USERS2” (“USERID” int, “NAME” char(60))

Could be accessed in DB2 and Netezza using any of the following SQL statement, because of the way DB2 adheres to ANSI standards and Upper Cases the names.

         select USERID from USERS2
         select userid from users2
         select “USERID” from “USERS2”
         select “USERID” from users2

For databases that down case the table and column names, 3 of the above 4 statements would fail on the USERS2 table, and only the statement select “USERID” from “USERS2” would run.

Isn’t the way that DB2 and Netezza work a lot more intuitive, and a lot easier? And since you do not need to worry about the way that the SQL was written in your existing application, this is a lot less work to make your existing applications and BI report run.

Why cause more work for yourself?

Tuesday, July 17, 2012

Nucleus research reports a 241% ROI using Big Data to enable larger, more complex analytics

Monday, July 16, 2012

Adding a 4th V to BIG Data - Veracity

I talked a week or so ago about IBM’s 3 V’s of Big Data. Maybe it is time to add a 4th V, for Veracity.

Veracity deals with uncertain or imprecise data. In traditional data warehouses there was always the assumption that the data is certain, clean, and precise. That is why so much time was spent on ETL/ELT, Master Data Management, Data Lineage, Identity Insight/Assertion, etc.

However, when we start talking about social media data like Tweets, Facebook posts, etc. how much faith can or should we put in the data. Sure, this data can be used as a count toward your sentiment, but you would not count it toward your total sales and report on that.

Two of the now 4 V’s of Big Data are actually working against the Veracity of the data. Both Variety and Velocity hinder the ability to cleanse the data before analyzing it and making decisions.

Due to the sheer velocity of some data (like stock trades, or machine/sensor generated events), you cannot spend the time to “cleanse” it and get rid of the uncertainty, so you must process it as is - understanding the uncertainty in the data. And as you bring multi-structured data together, determining the origin of the data, and fields that correlate becomes nearly impossible.

When we talk Big Data, I think we need to define trusted data differently than we have in the past. I believe that the definition of trusted data depends on the way you are using the data and applying it to your business. The “trust” you have in the data will also influence the value of the data, and the impact of the decisions you make based on that data.