Dwaine Snow's Thoughts on DB2
Tuesday, August 30, 2005
Sunday, August 28, 2005
Intro to DB2 Performance Tuning
Performance is one of the most important keys in any type of database system. This chapter focuses on a number of performance tuning tips with DB2 Universal Database (UDB). Although some tuning tips are the same, regardless of the database workload, there are differences between OLTP- and DSS-type workloads, and these differences will be noted where applicable.An OLTP workload typically consists of many applications concurrently running short transactions that include not only selects but also inserts, updates, and deletes. In contrast, DSS workloads are generally read-only transactions, but the transactions are much more complex and normally read much more data.
The performance of a database and the applications that access it are influenced by many factors. This chapter first discusses some fundamentals of performance tuning as it relates to DB2. It then concentrates on how to configure DB2 for optimal performance, using database and database manager configuration parameters, as well as DB2 registry variables. In the database configuration parameter monitoring and tuning section, this chapter will present a list of database manager and database configuration parameters in their order of importance and make recommendations for setting and monitoring these parameters.
The Magic Triangle of Performance
When considering performance and tuning, it is important to understand the performance triangle. The sides of the performance represent: CPU, memory, and I/O.
In an ideal world, the triangle would be an equilateral triangle and would be in perfect balance. While performance tuning on a DB2 UDB server, it is important to consider the trade-offs when making tuning decisions. For example, when tuning sorting within a database, if tuning only for memory and, therefore, reducing the sort heap, we would see an increase in both CPU usage and I/O for the sorting of the data using temporary tables. To attempt to minimize the I/O involved in sorting, the amount of memory assigned to the sort heaps would need to be increased, and this would also lead to increased CPU usage. These three things (CPU, I/O, and memory) must be balanced in order to tune for optimal performance on the database.
Ensure Enough Available Memory
An OLTP workload is much more dependent on memory for performance than a DSS workload; however, both workloads require sufficient memory in order to perform optimally. For OLTP workloads, the memory is normally used for database buffer pools, whereas a DSS workload normally requires much more memory for sorting.
In DB2 UDB Version 8 with full 64-bit support, there are no longer limits on the maximum size of the buffer pools. For an OLTP workload, a good starting point for the initial size of the buffer pool is 75% of the usable memory installed on the database server. For a DSS workload, a starting point for the size of the buffer pools is 50% of the usable memory installed on the server to leave enough memory for application sorting.
In many of today's database servers, a process or thread is dedicated to each client that connects to a database. For a typical OLTP workload that handles large numbers of connected users who perform relatively short-lived transactions with some delay between subsequent transactions, this puts a heavy load on the database server because system resources are being tied up by client connections that are not performing any work. DB2 UDB Version 8 has implemented a connection-multiplexing architecture, the Connection Concentrator, that will allow users to move from a configuration where the number of connected users is constrained by the physical limitations of the underlying hardware to a scenario where the limiting factor will be solely based on the transaction load and the machine's ability to handle such a load.
Ensure Sufficient I/O Handling Capability
No matter what type of disk subsystem is used on the database server, there must be enough physical disks to support a high volume of concurrent transactions for an OLTP system or the large amount of data read for a DSS system. As a general rule of thumb, there should be at least six to ten physical disks per CPU on the database server to ensure adequate throughput and to ensure that there are no bottlenecks.
The best way to estimate the I/O handling capability needed to ensure good performance for the database is to prototype the actual transactions and database to determine the number of I/O requests required per transaction and the number of transactions that will be processed per second. Then the I/O rate for the disk controllers and the disk subsystem can be used to determine how many controllers and disks are required to achieve the desired level of performance.
Use the DB2 Configuration Advisor for an Initial Set of Database Configuration Parameters
The Configuration Advisor will ask a series of questions about the database server, nature of the workload, transactions, priority, connections, and isolation level to determine a starting set of database configuration parameter values. These parameters can later be modified to suit the production workload and for additional fine-tuning.
To configure a database for performance using the Configuration Advisor:
1. Open the DB2 Control Center.
2. Select/right-click the database to be configured.
3. Choose Configuration Advisor.
4. Complete each of the applicable wizard pages.
5. Each page is discussed below.
6. The Finish button is available once enough information has been supplied for the Advisor to configure performance parameters for the database.
7. Click Finish to get a list of suggested configuration parameters for the database.
Saturday, August 27, 2005
How well do you know your database workload?
How well do you think you know the workload running against your database?I have worked with a few customers recently who have been looking at using Query Patroller (QP) to manage the workload running against their data warehouses. The first part of using QP is to set it up and capture the queries on the system for a couple weeks. At this point there is a history of the workload that can be analyzed to determine good query classes and system or user thresholds.
The intriguing part of the last few QP customers that I have worked with is the look of amazement and downright disbelief when we examine the query history (in the db2qp.track_query_info table) and can show them that on their large EDW:
- 85% of their queries are less than 10,000 timerons
- 90% of their queries run in less than 2 seconds
- their longest running queries have a cost of 12 timerons
NOTE: These are not made up, these are actual results that we have found in some cases.
The first thing I do when analyzing the queries that have run, is get a breakdown of the number of queries and their average, minimum and maximum execution time within specific cost ranges. The query that I use to do this is pretty long, so I put it at the end of this entry, and labeled it Query # 1.
There are a vast number of queries you can run against the QP tables to gain valuable insight into the queries running against your database, who is running these queries, the tables and indexes used and not used, etc. By understanding the real workload and object usage you can make better tuning recommendations/decisions.
Some other queries you can run to gain more insight into the system workload:
Determine how many queries executed in less than one second
select count(*) from db2qp.track_query_info sql2 where ( (decimal((julian_day(sql2.time_completed)-julian_day(sql2.time_started))*3600*24)) + (decimal((hour(sql2.time_completed)-hour(sql2.time_started))*3600)) + (decimal((minute(sql2.time_completed)-minute(sql2.time_started))*60)) + (decimal((second(sql2.time_completed)-second(sql2.time_started)))) + ((decimal((microsecond(sql2.time_completed)-microsecond(sql2.time_started)))) / 1000000 ) ) < 1.0
Determine how many queries ran more than 500 seconds
select count(*) from db2qp.track_query_info sql2 where ( (decimal((julian_day(sql2.time_completed)-julian_day(sql2.time_started))*3600*24)) + (decimal((hour(sql2.time_completed)-hour(sql2.time_started))*3600)) + (decimal((minute(sql2.time_completed)-minute(sql2.time_started))*60)) + (decimal((second(sql2.time_completed)-second(sql2.time_started)))) + ((decimal((microsecond(sql2.time_completed)-microsecond(sql2.time_started)))) / 1000000 ) ) > 500.0
This script will return the Query Patroller ID, SQL, run time, username, and application of the top 10 longest running queries based on the execution time in ascending order. The critical section of the query which restricts the criteria to a specified timeron range is in bold. This can be removed to do this analysis for all queries.
select id,db2qp.convertToString(statement) as statement,estimated_cost,
((decimal((julian_day(sql2.time_completed)-julian_day(sql2.time_started))*3600*24)) + (decimal((hour(sql2.time_completed)-hour(sql2.time_started))*3600)) + (decimal((minute(sql2.time_completed)-minute(sql2.time_started))*60)) + (decimal((second(sql2.time_completed)-second(sql2.time_started)))) + ((decimal((microsecond(sql2.time_completed)-microsecond(sql2.time_started)))) / 1000000 ) ) as exec_time_sec,substr(user_id,1,20)as user_id,substr(application,1,30) as applications from db2qp.track_query_info sql2 where id in (select id
from db2qp.track_query_info sql2
where estimated_cost between 40000 and 2000000
order by ((decimal((julian_day(sql2.time_completed)-julian_day(sql2.time_started))*3600*24)) + (decimal((hour(sql2.time_completed)-hour(sql2.time_started))*3600)) + (decimal((minute(sql2.time_completed)-minute(sql2.time_started))*60)) + (decimal((second(sql2.time_completed)-second(sql2.time_started)))) + ((decimal((microsecond(sql2.time_completed)-microsecond(sql2.time_started)))) / 1000000 ) )
fetch first 10 rows only);
Query # 1
select case when rslt.timeron < 1000 then 'bt_000000000_1K'
when rslt.timeron between 1000 and 2000 then 'bt_000001000_2K' when rslt.timeron between 2000 and 3000 then 'bt_000002000_3K' when rslt.timeron between 3000 and 4000 then 'bt_000003000_4K' when rslt.timeron between 4000 and 5000 then 'bt_000004000_5K' when rslt.timeron between 5000 and 10000 then 'bt_000005000_10K' when rslt.timeron between 10000 and 20000 then 'bt_000010000_20K' when rslt.timeron between 20000 and 30000 then 'bt_000020000_30K' when rslt.timeron between 30000 and 40000 then 'bt_000030000_40K' when rslt.timeron between 40000 and 50000 then 'bt_000040000_50K' when rslt.timeron between 50000 and 100000 then 'bt_000050000_100K' when rslt.timeron between 100000 and 150000 then 'bt_000100000_150K' when rslt.timeron between 150000 and 200000 then 'bt_000150000_200K' when rslt.timeron between 200000 and 300000 then 'bt_000200000_300K' when rslt.timeron between 300000 and 400000 then 'bt_000300000_400K' when rslt.timeron between 400000 and 500000 then 'bt_000400000_500K' when rslt.timeron between 500000 and 1000000 then 'bt_000500000_1M' when rslt.timeron between 1000001 and 2000000 then 'bt_001000000_2M' when rslt.timeron between 2000001 and 3000000 then 'bt_002000000_3M' when rslt.timeron between 3000001 and 4000000 then 'bt_003000000_4M' when rslt.timeron between 4000001 and 5000000 then 'bt_004000000_5M' when rslt.timeron between 5000001 and 6000000 then 'bt_005000000_6M' when rslt.timeron between 6000001 and 7000000 then 'bt_006000000_7M' when rslt.timeron between 7000001 and 8000000 then 'bt_007000000_8M' when rslt.timeron between 8000001 and 9000000 then 'bt_008000000_9M' when rslt.timeron between 9000001 and 10000000 then 'bt_009000000_10M' when rslt.timeron between 10000001 and 20000000 then 'bt_010000000_20M' when rslt.timeron between 20000001 and 30000000 then 'bt_020000000_30M' when rslt.timeron between 30000001 and 40000000 then 'bt_030000000_40M' when rslt.timeron between 40000001 and 50000000 then 'bt_040000000_50M' when rslt.timeron between 50000001 and 60000000 then 'bt_050000000_60M' when rslt.timeron between 60000001 and 70000000 then 'bt_060000000_70M' when rslt.timeron between 70000001 and 80000000 then 'bt_070000000_80M' when rslt.timeron between 80000001 and 90000000 then 'bt_080000000_90M' when rslt.timeron between 90000001 and 100000000 then 'bt_090000000_100M' when rslt.timeron between 100000001 and 500000000 then 'bt_100000000_500M' when rslt.timeron between 500000001 and 1000000000 then 'bt_500000000_1000M' when rslt.timeron > 1000000001 then 'gt_1000000001' end as timeron_range , avg(rslt.max_exec) as avg_exec , min(rslt.max_exec) as min_exec , max(rslt.max_exec) as maxm_exec , count(*) as cnt from (select sql2.id,sql2.time_created, sql2.estimated_cost as timeron, ( (decimal((julian_day(sql2.time_completed)-julian_day(sql2.time_started))*3600*24)) + (decimal((hour(sql2.time_completed)-hour(sql2.time_started))*3600)) + (decimal((minute(sql2.time_completed)-minute(sql2.time_started))*60)) + (decimal((second(sql2.time_completed)-second(sql2.time_started)))) + ((decimal((microsecond(sql2.time_completed)-microsecond(sql2.time_started)))) / 1000000 ) ) as max_exec from db2qp.track_query_info sql2 where sql2.type = 1 and sql2.estimated_cost > 1 and sql2.completion_status = 'D' order by sql2.estimated_cost desc,sql2.time_created,sql2.id ) as rslt group by case when rslt.timeron < 1000 then 'bt_000000000_1K' when rslt.timeron between 1000 and 2000 then 'bt_000001000_2K' when rslt.timeron between 2000 and 3000 then 'bt_000002000_3K' when rslt.timeron between 3000 and 4000 then 'bt_000003000_4K' when rslt.timeron between 4000 and 5000 then 'bt_000004000_5K' when rslt.timeron between 5000 and 10000 then 'bt_000005000_10K' when rslt.timeron between 10000 and 20000 then 'bt_000010000_20K' when rslt.timeron between 20000 and 30000 then 'bt_000020000_30K' when rslt.timeron between 30000 and 40000 then 'bt_000030000_40K' when rslt.timeron between 40000 and 50000 then 'bt_000040000_50K' when rslt.timeron between 50000 and 100000 then 'bt_000050000_100K' when rslt.timeron between 100000 and 150000 then 'bt_000100000_150K' when rslt.timeron between 150000 and 200000 then 'bt_000150000_200K' when rslt.timeron between 200000 and 300000 then 'bt_000200000_300K' when rslt.timeron between 300000 and 400000 then 'bt_000300000_400K' when rslt.timeron between 400000 and 500000 then 'bt_000400000_500K' when rslt.timeron between 500000 and 1000000 then 'bt_000500000_1M' when rslt.timeron between 1000001 and 2000000 then 'bt_001000000_2M' when rslt.timeron between 2000001 and 3000000 then 'bt_002000000_3M' when rslt.timeron between 3000001 and 4000000 then 'bt_003000000_4M' when rslt.timeron between 4000001 and 5000000 then 'bt_004000000_5M' when rslt.timeron between 5000001 and 6000000 then 'bt_005000000_6M' when rslt.timeron between 6000001 and 7000000 then 'bt_006000000_7M' when rslt.timeron between 7000001 and 8000000 then 'bt_007000000_8M' when rslt.timeron between 8000001 and 9000000 then 'bt_008000000_9M' when rslt.timeron between 9000001 and 10000000 then 'bt_009000000_10M' when rslt.timeron between 10000001 and 20000000 then 'bt_010000000_20M' when rslt.timeron between 20000001 and 30000000 then 'bt_020000000_30M' when rslt.timeron between 30000001 and 40000000 then 'bt_030000000_40M' when rslt.timeron between 40000001 and 50000000 then 'bt_040000000_50M' when rslt.timeron between 50000001 and 60000000 then 'bt_050000000_60M' when rslt.timeron between 60000001 and 70000000 then 'bt_060000000_70M' when rslt.timeron between 70000001 and 80000000 then 'bt_070000000_80M' when rslt.timeron between 80000001 and 90000000 then 'bt_080000000_90M' when rslt.timeron between 90000001 and 100000000 then 'bt_090000000_100M' when rslt.timeron between 100000001 and 500000000 then 'bt_100000000_500M' when rslt.timeron between 500000001 and 1000000000 then 'bt_500000000_1000M' when rslt.timeron > 1000000001 then 'gt_1000000001' end order by 1,2
Friday, August 26, 2005
2005 DB2 Technical Conference - Sept 12-16th
Next month I will be presenting at the DB2 Information Management Technical Conference in Orlando. The 2005 IBM DB2 Information Management Technical Conference will give you the opportunity to experience the best of DB2®, Business Intelligence, Informix, U2, and WebSphere Information Integration technologies in one dynamic event. Learn how DB2 Information Management Software simplifies and speeds development with embedded capabilities to integrate and analyze information, regardless of its format or location. In the 4 ½ days of the conference you'll gain new insights, address business issues, and enhance your technical knowledge so you can grow your businesses.When: September 12–16, 2005
Where: Walt Disney World Dolphin®, Orlando, Florida
Register: ibm.com/training/us/conf/db2
The conference is a great place to learn from the people who work with DB2 daily. You will hear from the experts from the IBM labs as well as customers and consultants about the experiences they have had with DB2, and how they have:
Used DB2 to solve a problem
Setup or configured DB2 to provide the function they need
Tune DB2 for optimal performance
It is also a great place to make contacts that you can use when you have questions. It is always nice to know other people who are doing the same things that you are, and who you know you can ask for advice.
Hope to see you there.
..Dwaine
Thursday, August 25, 2005
Understanding DB2 - Learning Visually with Examples
Last year a group of us (Raul Chong, Clara Liu, Sylvia Qi and myself) got together to write a DB2 book that we thought would be different than any other DB2 book available today. The concept of the book was to use a visual learning method that presents the ideas and concepts in a clear and concise manner. The book uses figures, examples, case studies, and review questions to present and reinforce the material.If you want to take a look at a couple sample chapters from the book, you can checkout the following chapters from the book - "Introduction to DB2 UDB" and "DB2 at a Glance: The Big Picture" .
While looking around at other BLOGs recently, I found the following review of our book that was quite complimentary.
If you have read the book please post any comments you have here.
Thanks,
Dwaine
Wednesday, August 24, 2005
Hi
Just wanted to say Hi for now. I am starting this blog to share my thoughts, ideas, advice, etc. on DB2, and Data Warehousing.Just to be clear...The postings on this site solely reflect my personal views and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.
