Monday, February 28, 2011

Why You Need to Partition the Database and Applications To Scale with Oracle RAC/Exadata

On Friday I talked about the fundamental difference between Oracle RAC / Exadata and DB2 pureScale. And now I want to dive deeper into why RAC applications need to be cluster-aware to perform and scale well.

Let’s use a small example to show the differences. Let’s consider a 2-server (node/member if you are RAC or pureScale) cluster and a database that is being accessed by applications connecting to these servers.

In the RAC case, if a user sends a request to server 1 to update a row, say for customer Smith, it must get that row from the database into it’s own memory, then it can work on that row (i.e. apply the transaction). Then another user sends a request to server 2 asking it to update the row for customer Jones in the database. First server 2 must read that row into memory and then it can work on it. So far there are no issues, but let’s go on.

Now what happens if another user wants to update the data for customer Jones, but is routed to server 1?  In this case server 1 doesn’t have the row, it only has the row for customer Smith.   So server one sends a message over to server two asking it to send the row for customer Jones over to server 1. Once server 1 has a copy of the row for customer Jones it can then work on that transaction. Now server 1 has both rows (Jones and Smith) so if a transaction affecting either customer comes to it, it can be processed right away.

The problem now is that any transaction (for customer Smith or Jones) that goes to server 2 requires that server to go to server 1 to get the resource since it has no rows that it can work on directly.

As transactions are randomly distributed amongst the two servers (in order to balance workload) the rows for the customers must be sent back and forth between the two servers. This results in very inefficient use of resources (too much network traffic and a lot of messages between the two servers to coordinate access to data).  This limits the scalability of a RAC cluster and also impacts performance. To make RAC scale you have to find the bottlenecks and remove them. In most cases the bottlenecks are too much data being shipped back and forth between nodes (difficult to find in the first place because you now have to look in many different places across the cluster to find the hot spots).  To solve the problem you have to repartition your application and your database to make it scale.

DB2 and pureScale on the other hand provide near linear scalability our to over 100 members (servers) with no partitioning of the application or the database.


odenysenko said...


You have missed concept of oracle block/buffer in your post - server doesn't operated at row level as you mentioned...
The Feature that provides possibility of accessing block from the buffer cache
of another instance was named Cache Fusion,
and it provides better access times in comparision with disk access.

BTW, You were absolutely right that single resource contention is worst thing in RAC,
but there are pretty many ways how to deal with it,
and actually You don't have to migrate to RAC to use Exadata -
stay single-instance, but use processing offloading and be protected by clusterware.

Oleksandr Denysenko

Mitch Stinson said...

I agree, you absolutely need to partition your databases if you want to scale. This is one of the primary principles for any data center software solution, such as Oracle, IBM, AlphaPoint, and Rackwise. These data center software solutions can help you easily partition your databases and other apps.

THRJG said...

I dont see how Purescale could avoid the inter-node traffic long as there are multiple nodes and a single instance of the shared data ..there will be inter-node communication, if not there will be a unbalanced workload in a given node. This issue is not specific to Oracle ...any NUMA systems will have as per my understanding. Please explain further ..if this is not the case.