You can check out my latest post about the new PureData Systems at these links.
Dwaine Snow's Thoughts on Databases and Data Management
Hi, and welcome to my blog. I have been working for IBM and working with DB2 for the past 22 years, and I recently started to work with our new colleagues from Netezza. Although I work for IBM, the views expressed are my own and not necessarily those of IBM and its affiliates. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.
Wednesday, November 07, 2012
Tuesday, September 25, 2012
Sorry for the lack of posts
I was asked to write a series of posts for the IBM Expert Integrated System blog area. Check out my first post here
Monday, August 06, 2012
The Downside of Down Casing
ANSI standard naming in databases is to upper-case the names of tables and columns in the database. So, for the table users (see the statement below), the table name and the two column names should be stored in the database as USERS, USERID, and NAME.
create table users
(UserID int, Name Char(60))
Now, to make things “easy”, ANSI standard databases also upper-case unquoted references to tables and columns automatically. So the statement select userid from users would be automatically converted to select USERID from USERS as it is optimized, so that it will not fail.
In DB2 and Netezza, if you run select userid from users, select USERID from users, or select UserID from Users, you get all users in the table. No matter what mix of case you use for the column named UserID, you get the same results, unless you enclose the table name or column name in quotes. If the name is enclosed in quotes, then the case is preserved, and must match exactly.
So the table users2, created like this:
create table "USERS2"
("USERID" int, "NAME" char(60))
can be accessed in DB2 and Netezza using any of the following SQL statements, because of the way they adhere to ANSI standards and upper-case unquoted names.
select USERID from USERS2
select userid from users2
select "USERID" from "USERS2"
select "USERID" from users2
etc.
For databases that down-case the table and column names, 3 of the above 4 statements would fail on the USERS2 table, and only the statement select "USERID" from "USERS2" would run.
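To make the difference concrete, here is a minimal Python sketch that mimics how a database resolves names under each folding rule. It is only a model, not how DB2, Netezza, or any other product is implemented: the fold and resolves functions, the CATALOG dictionary, and the QUERIES list are all hypothetical names made up for this illustration. The one rule it assumes is that quoted names keep their case, while unquoted names are folded to upper or lower case before lookup.

```python
def fold(identifier: str, fold_case: str) -> str:
    """Return the name the database would look up for this identifier."""
    quoted = len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"'
    if quoted:
        # Quoted identifiers keep their case exactly as written.
        return identifier[1:-1]
    # Unquoted identifiers are folded to the database's default case.
    return identifier.upper() if fold_case == "upper" else identifier.lower()


# Hypothetical catalog after: create table "USERS2" ("USERID" int, "NAME" char(60))
CATALOG = {"USERS2": {"USERID", "NAME"}}

# (column, table) pairs for the four select statements discussed in the post.
QUERIES = [
    ("USERID", "USERS2"),
    ("userid", "users2"),
    ('"USERID"', '"USERS2"'),
    ('"USERID"', "users2"),
]


def resolves(column: str, table: str, fold_case: str) -> bool:
    """True if both the table and the column are found after folding."""
    columns = CATALOG.get(fold(table, fold_case), set())
    return fold(column, fold_case) in columns


def count_failures(fold_case: str) -> int:
    """Count how many of the four statements fail under a folding rule."""
    return sum(not resolves(col, tab, fold_case) for col, tab in QUERIES)


if __name__ == "__main__":
    for mode in ("upper", "lower"):
        for col, tab in QUERIES:
            status = "ok" if resolves(col, tab, mode) else "fails"
            print(f"{mode:5} select {col} from {tab}: {status}")
```

Calling count_failures("upper") and count_failures("lower") gives the number of failing statements under each rule, which you can compare with the counts described above.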
Isn’t the way that DB2 and Netezza work a lot more intuitive, and a lot easier? And since you do not need to worry about how the SQL was written in your existing applications, it takes a lot less work to get your existing applications and BI reports running.
Why cause more work for yourself?
Monday, July 16, 2012
Adding a 4th V to BIG Data - Veracity
I talked a week or so ago about IBM’s 3 V’s of Big Data. Maybe it is time to add a 4th V, for Veracity.
Veracity deals with uncertain or imprecise data. In traditional data warehouses there was always the assumption that the data was certain, clean, and precise. That is why so much time was spent on ETL/ELT, Master Data Management, Data Lineage, Identity Insight/Assertion, etc.
However, when we start talking about social media data like Tweets, Facebook posts, etc., how much faith can or should we put in the data? Sure, this data can be used as a count toward your sentiment, but you would not count it toward your total sales and report on that.
Two of the now 4 V’s of Big Data are actually working against the Veracity of the data. Both Variety and Velocity hinder the ability to cleanse the data before analyzing it and making decisions.
Due to the sheer velocity of some data (like stock trades, or machine/sensor generated events), you cannot spend the time to “cleanse” it and get rid of the uncertainty, so you must process it as is, understanding the uncertainty in the data. And as you bring multi-structured data together, determining the origin of the data and which fields correlate becomes nearly impossible.
When we talk Big Data, I think we need to define trusted data differently than we have in the past. I believe that the definition of trusted data depends on the way you are using the data and applying it to your business. The “trust” you have in the data will also influence the value of the data, and the impact of the decisions you make based on that data.