Use ASH for troubleshooting a long-running query
You won't believe how useful ASH (Active Session History) can be.
[Update]: In this newer post we describe how one can use ASH and SQL Monitoring to tune a complex query that does not even finish!
Step 1: Find the sql_id of the long-running query. The following script lists the top 100 busiest queries, i.e., the queries that appear in the most ASH samples, either on CPU or actively waiting on something:
nkarag@DWHPRD> host cat ash_top100_sql
cat: ash_top100_sql: No such file or directory
nkarag@DWHPRD> host cat ash_top100_sql.sql
set pagesize 999
set lines 999
col username format a13
col program format a10 trunc
col module format a10 trunc
col session_id format 9999
col session_serial# format 99999
col inst_id format 9
select *
from (
select
username, sql_id, SQL_CHILD_NUMBER,
sum(decode(ash.session_state,'ON CPU',1,0)) CPU,
sum(decode(ash.session_state,'WAITING',1,0)) - sum(decode(ash.session_state,'WAITING',decode(wait_class,'User I/O',1, 0 ), 0)) WAITING,
sum(decode(ash.session_state,'WAITING',decode(wait_class,'User I/O',1, 0 ), 0)) IO,
sum(decode(session_state,'ON CPU',1,1)) TOTAL
from (
select sample_time , c.username, SESSION_ID, SESSION_SERIAL#, INST_ID, PROGRAM,module, IS_AWR_SAMPLE, a.sql_id, SQL_CHILD_NUMBER, IS_SQLID_CURRENT,
SESSION_STATE, WAIT_TIME, WAIT_CLASS, EVENT, P1TEXT, P2TEXT, P3TEXT, TIME_WAITED/100 time_waited_secs,
BLOCKING_SESSION, BLOCKING_SESSION_SERIAL#, BLOCKING_INST_ID
from gv$active_session_history a,
dba_users c
where
a.user_id = c.user_id
) ash
where
SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
group by username, sql_id, SQL_CHILD_NUMBER
order by sum(decode(session_state,'ON CPU',1,1)) desc
)
where rownum < 101
/
nkarag@DWHPRD> @ash_top100_sql
Enter value for minutes_from_now: 20
old 19: SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
new 19: SAMPLE_TIME > sysdate - (20/(24*60))
USERNAME      SQL_ID        SQL_CHILD_NUMBER        CPU    WAITING         IO      TOTAL
------------- ------------- ---------------- ---------- ---------- ---------- ----------
SYS           a55a2732q80m7                0       1160          1          6       1167
SYS                                       -1         76        780          0        856
LSINOS        11dzaj21zthzr                0        345          0         30        375
SYS                                        0        108         46          0        154
LSINOS        5ufj8g6fw5dht                0          9          0         24         33
LSINOS        d1suttxfhnha0                0         22          0          0         22
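Before moving on to the wait events, it is worth confirming which statement the top sql_id really corresponds to. A minimal lookup against gv$sql (just a quick sanity check, not part of the ash_top100_sql.sql script) could look like this:
-- Show the text of the statement identified by the ASH ranking above
select inst_id, child_number, sql_text
from gv$sql
where sql_id = 'a55a2732q80m7'
/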
Step 2: Analyze the top wait events of this query for a period of time (e.g., for the last 20 minutes):
nkarag@DWHPRD> host cat ash_events.sql
set pagesize 999
set lines 999
select decode(grouping(event),1,'TOTAL',event) event,
decode(grouping(wait_class),1,'TOTAL', wait_class) wait_class,
sum(PCNT), sum(nosamples), sum(nosamplestot),
decode(grouping(owner),1,'TOTAL', owner)owner, decode(grouping(object_name),1,'TOTAL', object_name) object_name, decode(grouping(object_type),1,'TOTAL', object_type) object_type,
decode(grouping(P1TEXT),1,'TOTAL', P1TEXT) P1TEXT, decode(grouping(P2TEXT),1,'TOTAL', P2TEXT) P2TEXT, decode(grouping(P3TEXT),1,'TOTAL', P3TEXT) P3TEXT
from (
select event, wait_class,round((count(*)/cnttot)*100) PCNT, count(*) nosamples, cnttot nosamplestot,P1TEXT, P2TEXT, P3TEXT,
owner, object_name, object_type
from (
select event, wait_class, P1TEXT, P2TEXT, P3TEXT, owner, object_name, object_type, count(*) over() cnttot
from gv$active_session_history a join dba_objects b on(a.CURRENT_OBJ# = b.object_id)
where
SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
and session_state= 'WAITING' and WAIT_TIME = 0
and sql_id = nvl('&sql_id',sql_id)
and SQL_CHILD_NUMBER = nvl('&SQL_CHILD_NUMBER',0)
)t
group by event, wait_class, P1TEXT, P2TEXT, P3TEXT, owner, object_name, object_type, cnttot
--order by count(*) desc
)
group by rollup(event, (wait_class, P1TEXT, P2TEXT, P3TEXT, owner, object_name, object_type))
order by sum(PCNT) desc
/
nkarag@DWHPRD> @ash_events
Enter value for minutes_from_now: 20
old 13: SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
new 13: SAMPLE_TIME > sysdate - (20/(24*60))
Enter value for sql_id: a55a2732q80m7
old 15: and sql_id = nvl('&sql_id',sql_id)
new 15: and sql_id = nvl('a55a2732q80m7',sql_id)
Enter value for sql_child_number: 0
old 16: and SQL_CHILD_NUMBER = nvl('&SQL_CHILD_NUMBER',0)
new 16: and SQL_CHILD_NUMBER = nvl('0',0)
Formatted with Excel:
EVENT | WAIT_CLASS | SUM(PCNT) | SUM(NOSAMPLES) | SUM(NOSAMPLESTOT) | OWNER | OBJECT_NAME | OBJECT_TYPE |
cell single block physical read | User I/O | 100 | 9 | 9 | ICT_DW | ICT_USAGE_FCT_RAT_COMP_BIDX | INDEX SUBPARTITION |
TOTAL | TOTAL | 100 | 9 | 9 | TOTAL | TOTAL | TOTAL |
cell single block physical read | TOTAL | 100 | 9 | 9 | TOTAL | TOTAL | TOTAL |
It is obvious that the query is doing a lot of single-block reads on an index.
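To see which table and columns this index actually covers, a quick data dictionary lookup helps. The sketch below assumes the object name reported by ASH is the index name (for partitioned indexes dba_objects reports the index name, with the subpartition as the subobject):
-- Which table does the "hot" index belong to, and which columns does it index?
select i.table_owner, i.table_name, i.index_type, c.column_name, c.column_position
from dba_indexes i
join dba_ind_columns c on (c.index_owner = i.owner and c.index_name = i.index_name)
where i.owner = 'ICT_DW'
and i.index_name = 'ICT_USAGE_FCT_RAT_COMP_BIDX'
order by c.column_position
/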
We can go a step further (always with the help of ASH) by trying to locate the execution plan operation that has consumed the most DB time.
nkarag@DWHPRD> host cat ash_ops.sql
set pagesize 999
set lines 999
select SQL_PLAN_OPERATION, owner, object_name, object_type, round((count(*)/cnttot)*100) PCNT, count(*) nosamples, cnttot nosamplestot
from (
select A.SQL_PLAN_OPERATION, owner, object_name, object_type, count(*) over() cnttot
from gv$active_session_history a join dba_objects b on(a.CURRENT_OBJ# = b.object_id)
where
SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
and sql_id = nvl('&sql_id',sql_id)
)
group by SQL_PLAN_OPERATION, owner, object_name, object_type, cnttot
order by count(*) desc
/
nkarag@DWHPRD> @ash_ops
Enter value for minutes_from_now: 20
old 6: SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
new 6: SAMPLE_TIME > sysdate - (20/(24*60))
Enter value for sql_id: a55a2732q80m7
old 7: and sql_id = nvl('&sql_id',sql_id)
new 7: and sql_id = nvl('a55a2732q80m7',sql_id)
Formatted with Excel:
SQL_PLAN_OPERATION | OWNER | OBJECT_NAME | OBJECT_TYPE | PCNT | NOSAMPLES | NOSAMPLESTOT |
SORT | ICT_DW | ICT_USAGE_FCT_RAT_COMP_BIDX | INDEX SUBPARTITION | 98 | 910 | 926 |
BITMAP INDEX | ICT_DW | ICT_USAGE_FCT_RAT_COMP_BIDX | INDEX SUBPARTITION | 2 | 14 | 926 |
PARTITION COMBINED | ICT_DW | ICT_USAGE_FCT_RAT_COMP_BIDX | INDEX SUBPARTITION | 0 | 1 | 926 |
BITMAP CONVERSION | ICT_DW | ICT_USAGE_FCT_RAT_COMP_BIDX | INDEX SUBPARTITION | 0 | 1 | 926 |
We can deduce that the most DB-time-consuming operation is a SORT that uses a specific bitmap index.
Now that we have found the point in our SQL that consumes the most time, we can focus our tuning efforts there instead of making irrelevant changes. All this was accomplished thanks to the detailed session data provided by ASH (Active Session History).
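At this point it also helps to pull the full execution plan of the offending cursor from the shared pool, so the costly operation can be seen in its context. For example, assuming the cursor is still cached, something along these lines:
-- Display the cached plan for the sql_id / child number identified with ASH
select * from table(dbms_xplan.display_cursor('a55a2732q80m7', 0))
/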
Cheers,
OL.
Update (2/1/2013):
In the above query, apart from the column SQL_PLAN_OPERATION, you should also add the column SQL_PLAN_OPTIONS in order to get a clearer picture of the executed operation. For example, let's run the following query:
nikos@NIKOSDB> select count(amount_sold) from sh.sales
2 /
COUNT(AMOUNT_SOLD)
------------------
918843
Now let's get the execution plan operation that takes the most DB time for this query (we will use the column SQL_PLAN_OPTIONS to identify the exact operation):
nikos@NIKOSDB> host cat ash_ops.sql
set pagesize 999
set lines 999
select SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, owner, object_name, object_type, round((count(*)/cnttot)*100) PCNT, count(*) nosamples, cnttot nosamplestot
from (
select A.SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, owner, object_name, object_type, count(*) over() cnttot
from gv$active_session_history a join dba_objects b on(a.CURRENT_OBJ# = b.object_id)
where
SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
and sql_id = nvl('&sql_id',sql_id)
)
group by SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, owner, object_name, object_type, cnttot
order by count(*) desc
/
nikos@NIKOSDB> @ash_ops
Enter value for minutes_from_now: 5
old 6: SAMPLE_TIME > sysdate - (&minutes_from_now/(24*60))
new 6: SAMPLE_TIME > sysdate - (5/(24*60))
Enter value for sql_id: 3dp4d69xdfh1u
old 7: and sql_id = nvl('&sql_id',sql_id)
new 7: and sql_id = nvl('3dp4d69xdfh1u',sql_id)
SQL_PLAN_OPERATION             SQL_PLAN_OPTIONS               OWNER                          OBJECT_NAME                    OBJECT_TYPE               PCNT  NOSAMPLES NOSAMPLESTOT
------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------- ---------- ---------- ------------
BITMAP INDEX                   FAST FULL SCAN                 SH                             SALES_PROMO_BIX                INDEX PARTITION            100          1            1
So, the most time-consuming operation for this count was a fast full scan on the SALES_PROMO_BIX bitmap index.
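As a further refinement, on 11g and later ASH also exposes SQL_PLAN_LINE_ID, so you can group on the plan line id as well and match the result against the Id column of the DBMS_XPLAN output. A rough sketch:
-- Map ASH samples to specific execution plan lines (SQL_PLAN_LINE_ID is available from 11g onwards)
select sql_plan_line_id, sql_plan_operation, sql_plan_options, count(*) nosamples
from gv$active_session_history
where sample_time > sysdate - (&minutes_from_now/(24*60))
and sql_id = '&sql_id'
group by sql_plan_line_id, sql_plan_operation, sql_plan_options
order by count(*) desc
/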
OL