“In Memory” seems to be the latest buzz term in the database and analytics arena. If in-memory could live up to the hype it would be a great thing, but there is still a lot of hype involved.
In-memory database vendors claim that “Reporting requirements have a large impact on I/O time and network resources when dealing with large volumes of data. In order to minimize this, a company needs optimized file and database access methods. In-memory overcomes this process by shifting all the data and workload into memory.” [1]
But this is not the only way to eliminate the I/O bottleneck that can plague traditional databases when they are used for data warehousing and analytics. IBM Netezza solves the I/O bottleneck by pushing the work down to the I/O channel, where it acts on the data as it comes off disk and sends only the data that matches the query across the server interconnect.
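To make that contrast concrete, here is a toy sketch in plain Python. The file name, column names, and CSV format are all illustrative assumptions, and this has nothing to do with Netezza's actual hardware implementation; it only shows the general idea of applying a predicate where the data lives instead of hauling everything into memory first.

```python
import csv

# Two ways of answering something like
#   SELECT * FROM sales WHERE region = 'EMEA'
# over data sitting on disk. (Illustrative sketch only.)

def load_then_filter(path, region):
    """In-memory style: pull every row off disk into RAM, then filter."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))        # the entire table now lives in memory
    return [r for r in rows if r["region"] == region]

def filter_at_source(path, region):
    """Push-down style: apply the predicate as rows stream off disk,
    so only qualifying rows ever travel upstream."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["region"] == region:       # predicate evaluated at the source
                yield row                     # only matching rows are shipped

# Usage (assumes a hypothetical sales.csv with a "region" column):
# emea_rows = list(filter_at_source("sales.csv", "EMEA"))
```

The second approach never materializes the full table, which is the spirit of acting on the data as it comes off disk and shipping only what the query actually needs.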
Because IBM Netezza operates on the data as it streams off of the disks, it is not plagued by data that does not fit in memory. In an in-memory database, the entire database must fit in memory, and any temporary tables, scratch tables, overflowed sorts, etc. must also fit entirely within the memory the system has been configured to use. As data volumes grow and the “warehouse” becomes more and more visible, there will be more users, and as the number of users increases, so does the chance that 2, 3, or even 20 or more people will be running queries at the same time, sorting or ordering data using temporary tables. Or a new user may point and click their way around Business Objects and build a huge query that joins 5 or 10 tables. All of these will cause large temporary tables to be created, all at once, potentially filling up the memory on the system.
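To put a rough number on that, here is a quick back-of-envelope sketch. Every figure in it is an assumption I made up for illustration, not a measurement of any particular in-memory product.

```python
# Back-of-envelope sketch: how concurrent queries eat into a fixed memory budget.
# All numbers below are illustrative assumptions.

ram_gb            = 1024   # memory the system is configured to use
base_tables_gb    = 800    # the database itself, held entirely in RAM
temp_per_query_gb = 30     # sort/temp/scratch space for one big report

for concurrent_queries in (1, 5, 10, 20):
    needed = base_tables_gb + concurrent_queries * temp_per_query_gb
    status = "fits" if needed <= ram_gb else "OVER BUDGET"
    print(f"{concurrent_queries:2d} concurrent queries -> {needed:5d} GB needed ({status})")

# With these assumptions, 800 GB of data leaves only 224 GB of headroom,
# so roughly 8 concurrent heavy queries are enough to exhaust the configured memory.
```

The exact numbers do not matter; the point is that the working set is not just the base data, it is the base data plus whatever every concurrent query needs for sorts and temporary tables.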
So, what happens if an in-memory database uses up all of the system memory (RAM) on a server? Well, according to an SAP employee, the system will crash. [2] Does that seem like a good solution to you?
Because of issues like this, most in-memory databases are used to run reports and/or analytics on a sample of the data, to restrict the data size to something manageable and affordable. And, as I explained a couple of weeks ago, using data sampling for analytics is a bad idea that will severely limit, if not completely eliminate, the value you can derive from your data. Data that you store (somewhere else, since it cannot fit in memory) and do not analyze is a liability; only when you analyze your data does it become an asset. IBM Netezza ensures that all of your data is a business asset that you can derive the most value from.
There are a number of other points that I want to make on “in memory” databases, but I’ll talk about them in my upcoming posts.