"Wait Event" Defined
At any given moment, every Oracle process is either busy servicing a
request or waiting for something specific to happen. By "busy" we mean
that the
process wishes to use the CPU. For example, a dedicated server process
might
be searching the buffer cache to see if a certain data block is in
memory.
This process would be said to be busy and not waiting. The ARC0
process,
meanwhile, might be waiting for LGWR to signal that an online redo log
needs
archiving. In this case, the ARC0 process is waiting.
The kernel developers at Oracle have defined a list of every
possible
event that an Oracle process could be waiting for. At any
moment,
every Oracle process that is not busy is waiting for one of these
events
to occur. Suppose an application has submitted a COMMIT statement and
the
server process is waiting for the LGWR process to signal that the redo
log
buffer has been flushed to disk. This wait event is called "log file
sync."
Another dedicated server process might be waiting for a row-level lock
on
the INVOICES table to be freed so that a SELECT FOR UPDATE statement
can
continue. That wait event is called "enqueue."
By querying v$ views, we can see what events
processes are waiting on to an amazing level of detail. For example, we
might learn that a dedicated server process has been waiting 30
milliseconds for the operating system
to read eight blocks from data file 42, starting at block 18042. We can
also see summary information of how much time each Oracle process has
spent
waiting on each type of wait event for the duration of the process. In
addition,
we can direct an Oracle process to write detailed wait event data to a
trace
file for later analysis using TKPROF.
Why Wait Event Information is Useful
Using the wait event interface, you can get insights into where time
is
being spent. If a report takes four hours to complete, for example, the
wait event interface will tell you how much of that four hours was
spent
waiting for disk reads caused by full table scans, disk reads caused by
index lookups, latch contention, and so on.
The wait event interface provides much more information to work with
than cache hit ratios do.You get data that can touch upon so many
different areas of your database such as disk I/O, latches, parallel
processing,
network traffic, checkpoints, and row-level locking. At the same time,
you
get incredible detail such as the file number and block number of a
block
being read from disk, or the name of a latch being waited on along with
the number of retries.
The wait event interface will not always give you all of the
information
you need in order to diagnose and solve a problem, but it will
certainly
point you in the right direction. You might think the buffer cache is
too
small because the cache hit ratio is only 70%, but in fact, the
application's
slow response time could be caused by latch contention in the shared
pool,
a bottleneck in the log writer, or any of a number of other things.
Prior to Oracle 10g, Oracle's default optimizer mode was called
“choose.” In the choose optimizer mode, Oracle will execute the
rule-based optimizer if there are no statistics present for the table;
it will execute the cost-based optimizer if statistics are present. The
danger with using the choose optimizer mode is that problems can occur
in cases where one Oracle table in a complex query has statistics and
the other tables do not.
Starting in Oracle 10g, the default optimizer mode is all_rows,
favoring full-table scans over index access. The all_rows
optimizer mode is designed to minimize computing resources and it
favors full-table scans. Index access (first_rows) adds
additional I/O overhead, but they return rows faster.
When only some tables contain CBO statistics, Oracle will use the
cost-based optimization and estimate statistics for the other tables in
the query at runtime. This can cause significant slowdown in the
performance of the individual query.
Common Wait Event Names and
What
They Mean
Although there are many different types of wait events, the majority of them come up very infrequently or tend not to be significant. In practice, only a few dozen wait events tend to be of interest to most DBAs. You'll see different wait events surfacing in different environments based on which Oracle features have been implemented and which capabilities of the database are being taxed the most. For example, the PX wait events won't appear if you aren't using parallel query, and the virtual circuit status wait event won't appear if you are not using the multi-threaded server architecture (or shared server architecture as it is more recently called). Along those lines, the log file sync and enqueue wait events probably won't be prevalent in a primarily read-only system. Here are some of the most common wait events and what they mean:
| Wait Event | Description |
| buffer busy waits | The session wants to access a data block that is either 1)
currently not in memory, but another process has already issued an I/O
request
to read the block into memory, or 2) in memory but in an incompatible
mode (current versus consistent, for example). For example another
session is using that block via an insert, update or delete. |
| control file parallel write | The session has issued multiple I/O requests in parallel to write blocks to all control files, and is waiting for all of the writes to complete. |
| control file sequential read | The session is waiting for blocks to be read from a control file. |
| db file parallel read | The session has issued multiple I/O requests in parallel to read blocks from data files into memory and is waiting for all requests to complete. This may occur during recovery or during regular activity when a session batches many single block I/O requests together and issues them in parallel. |
| db file parallel write | The process, typically DBWR, has issued multiple I/O requests in parallel to write dirty blocks from the buffer cache to disk and is waiting for all requests to complete. |
| db file scattered read | The session has issued an I/O request to read a series of
contiguous blocks from a data file into the buffer cache and is waiting
for the operation to complete. This typically happens during a full
table
scan or fast full index scan. Oracle reads up to
DB_FILE_MULTIBLOCK_READ_COUNT consecutive blocks at a time and scatters
them into buffers in the buffer cache |
| db file sequential read | The session has issued an I/O request to read one block from
a data file into the buffer cache and is waiting for the operation to
complete. This typically happens during an index lookup or a fetch from
a table
by ROWID when the required data block is not already in memory. This
call differs from "db file scattered read" in that a sequential read
reads data into contiguous memory (whilst a scattered read reads
multiple blocks and scatters them into different buffers in the SGA). |
| direct path read, direct path write |
The session has issued asynchronous I/O requests that bypass
the buffer cache and is waiting for them to complete. These wait events
often involve temporary segments, sorting activity, parallel query or
hash joins. |
| enqueue | The session is waiting on an enqueue (a lock you can see in v$lock). This commonly occurs when one user is trying to update a row in a table that is currently being updated by another user. |
| free buffer waits | The session needs a free buffer so it can bring a data block into the buffer cache and is waiting for a buffer that is not dirty to become available. This can occur if DBWR is not writing dirty buffers to disk fast enough. |
| latch free | The session is waiting for a latch held by another session. (This event does not apply to processes that are spinning while waiting for a latch; when a process is spinning, it is not waiting.) |
| library cache load lock | The session is waiting for the opportunity to load an object or a piece of an object into the library cache. (Only one process can load an object or a piece of an object at a time.) |
| library cache pin | The session wants to pin an object in memory in the library cache for examination, ensuring no other processes can update the object at the same time. This happens when you are compiling or parsing a PL/SQL object or a view. |
| log buffer space | The session is waiting for space in the log buffer. (Space becomes available only after LGWR has written the current contents of the log buffer to disk.) This typically happens when applications generate redo faster than LGWR can write it to disk. |
| log file parallel write | The session is waiting for blocks to be written to all online redo log members in one group. LGWR is typically the only process to see this wait event. It will wait until all blocks have been written to all members. |
| log file sequential read | The session is waiting for blocks to be read from the online redo log into memory. This primarily occurs at instance startup and when the ARCH process archives filled online redo logs. |
| log file switch completion | The session is waiting for a log file switch to complete, typically so more redo can be generated. |
| log file sync | The session is waiting for LGWR to finish flushing the log
buffer
to disk. This occurs when a user commits a transaction. (A transaction
is not considered committed until all of the redo to recover the
transaction
has been successfully written to disk). Probably you are commiting too
often |
| undo segment extension | The session is waiting for an undo segment to be extended or shrunk. |
| write complete waits | The session is waiting for a requested buffer to be written to disk; the buffer cannot be used while it is being written. |
There are several wait events that we call "idle events" because each of these wait events typically occurs when the Oracle process has nothing to do and is waiting for somebody to give it a task. Idle events are usually not very interesting from a tuning standpoint, so we usually overlook them when evaluating data extracted from the wait event interface. The common idle events are as follows:
|
|
|
| client message | PX Deq: Execute Reply |
| dispatcher timer | PX Deq: Execution Msg |
| gcs for action | PX Deq: Signal ACK |
| gcs remote message | PX Deq: Table Q Normal |
| ges remote message | PX Deque wait |
| i/o slave wait | PX Idle Wait |
| jobq slave wait | queue messages |
| lock manager wait for remote message | rdbms ipc message |
| null event | slave wait |
| parallel query dequeue | smon timer |
| pipe get | SQL*Net message from client |
| PL/SQL lock timer | SQL*Net message to client |
| pmon timer | SQL*Net more data from client |
| PX Deq Credit: need buffer | virtual circuit status |
| PX Deq Credit: send blkd | wakeup time manager |
What Wait Event Information
Does
Not Provide
If an Oracle process has work to do but, must wait for something to happen before it can continue, then the process will be waiting on a non-idle wait event. If a process has nothing to do, it will be waiting on an idle wait event. So what happens if a process has work to do and is busy doing it? When a process is busy, there will be no information in the wait event interface since the process is not waiting.
When we look at the wait event information extracted from an Oracle instance, we see detailed information about how many times and how much time was spent waiting for specific events to occur. But we do not see anything about the time periods in which the process requested use of the CPU. This means that the wait event interface is not able to provide information about the following:
This is important to keep in mind because there is an easy trap to fall into. You could be troubleshooting a very slow SELECT statement and learn from the wait event interface that the process does not have significant wait events when running the query. This could lead you to think that the statement is optimal, and that it just takes a long time to run. In fact, however, the query might be performing huge numbers of logical reads and the number of buffer gets could be reduced by rewriting the query.
When Oracle needs to access a data block and the block is already in
the buffer cache, a logical read occurs with no physical read. The
process is
able to read the block without the occurrence of any wait events. Large
amounts of CPU time could be consumed on significant numbers of logical
reads, and the wait event interface will have nothing to show for the
elapsed
time.
Statement parsing and spinning while waiting for a latch to become
available are two other examples of activities not accounted for by the
wait event interface. An Oracle process uses CPU time while parsing a
statement or
spinning on a busy latch; since no waiting is involved, the wait event
interface
does not have anything to report.
Beginning in Oracle 9i, the portion of the Statspack report
that lists wait event information also lists CPU usage. This is very
helpful information.
It allows us to easily compare time spent waiting to time spent
processing,
so we know where to focus our tuning efforts. However, it should be
pointed
out that the CPU usage information in this section of the Statspack
report
does not come from the wait event interface. Instead, Statspack merges
data
collected from the wait event interface with CPU usage information
collected
from the v$sysstat dynamic performance view.
A Note About the timed_statistics Parameter
The Oracle kernel is capable of timing many activities including wait events. In Oracle 9i Release 2 timed statistics are collected by default, and some timings are collected in units of microseconds. The timed_statistics instance parameter is used to enable and disable timed statistics collection. When timed_statistics is set to FALSE, all times in the wait event interface will appear as zero. You may enable timed statistics collection at the instance or the session level with the following commands:
You may enable timed statistics at the instance level on the next and all subsequent instance restarts by adding the following line to the instance parameter file:
timed_statistics = true
In practice, the overhead of collecting timed statistics is extremely small. In most cases, the benefit you'll get from having timing information at your disposal will outweigh the performance overhead. For several years, many DBAs have been running their production systems with timed statistics enabled at all times. With Oracle 9i Release 2, this is now the default behavior.
Collecting Wait Event
Information
There are two common ways of collecting wait event information: the
Oracle extended trace (10046 event) and using V$ views; which is the
main topic of this article. Any user with the
SELECT ANY
TABLE system privilege or the SELECT_CATALOG_ROLE role can query the v$
views. Only users who can connect to the database as SYSDBA or execute
the
DBMS_SUPPORT or DBMS_SYSTEM packages can activate wait event tracing in
other database sessions. In a typical environment, DBAs have access to
the
wait event interface but general users do not.
Oracle provides a number of views to aid in tuning; a few are:
* V$SYSTEM_EVENT, which gives you total system waits
for an event since database startup.
* V$SESSION_EVENT, which is at the session level.
* V$SESSION_WAIT gives you what an active session is
waiting on at that point in time.
* V$SESSION, which gives you session information and
has a column ROW_WAIT_OBJ# that tells you the object involved when
joined with OBJECT_ID or DATA_OBJECT_ID in DBA_OBJECTS.
Version 10g has some more V$ views for tracking history called
V$%_HISTORY, including V$ACTIVE_SESSION_HISTORY (ASH), which samples
non-idle wait events every second and new columns in V$SESSION to
combine information in V$SESSION_WAIT.
In addition to the four dynamic performance views, Oracle provides a tracing facility where extremely detailed wait event information is written to a trace file for later evaluation. In more recent releases of Oracle, a PL/SQL package is provided for activating the wait event tracing facility. In older releases of Oracle, the wait event tracing facility is activated by setting debug event number 10046 (not to be confused with a wait event).
The V$SYSTEM_EVENT View
The v$system_event view shows one row for each wait event name,
along
with the total number of times a process has waited for this event, the
number of timeouts, the total amount of time waited, and the average
wait
time. All of these figures are cumulative for all Oracle processes
since
the instance started. Wait events that have not occurred at least once
since
instance startup do not appear in this view. The v$system_event view
has the following columns:
Name Null? Type
------------------------------- -------- ----
EVENT VARCHAR2(64)
TOTAL_WAITS NUMBER
TOTAL_TIMEOUTS NUMBER
TIME_WAITED NUMBER
AVERAGE_WAIT NUMBER
TIME_WAITED_MICRO NUMBER
EVENT is the name of the wait event. You can see a list of all wait
event names known to your Oracle kernel with the following query:
SELECT
name FROM v$event_name;
TOTAL_WAITS is the total number of times a process has waited for
this
event since instance startup.
TOTAL_TIMEOUTS is the total number of times a process encountered a
timeout while waiting for an event. When a process begins to wait for
an event,
it specifies a timeout period after which the operating system should
wake
it up if the event has not yet transpired. For example, when an Oracle
process
issues an I/O request to read a block from a data file (the db file
sequential
read wait event), the process sets a timeout of one second. Usually the
I/O request will complete in less than one second and no timeout will
occur.
But if the read should take longer than one second for whatever reason,
a timeout will occur and the process will wake up. The process might do
some minor housekeeping, but it will likely just begin another timeout
period
of one second and continue waiting for the same event.
TIME_WAITED and AVERAGE_WAIT show the cumulative (sum) and average time
spent
by processes waiting for this event, in centiseconds. Divide these
figures
by 100 in order to get the wait time in seconds. These two columns will
show as zero if timed statistics are not enabled.
TIME_WAITED_MICRO is the same as TIME_WAITED, except that the time
is
in microseconds. Divide this figure by 1000000 in order to get the wait
time in seconds.
Consider the following query from v$system_event:
SELECT event, time_waitedSince instance startup, processes on this system have waited a total of 286.57 seconds while reading single data file blocks from disk, and over 1,596 seconds (26 minutes) while writing redo to the online redo logs. A huge amount of time has been spent waiting on the smon timer and SQL*Net message from client events, but these are both idle wait events so we don't worry about them. (The SMON process spends a lot of time sleeping between consecutive checks of the system, and many dedicated server processes spend a lot of their time waiting for the application to submit a SQL statement for processing.)
FROM v$system_event
WHERE event IN ('SQL*Net message from client', 'smon timer',
'db file sequential read', 'log file parallel write');
EVENT TIME_WAITED(sec)
------------------------------------------------------ ------------
log file parallel write 1596.92
db file sequential read 286.57
smon timer 1306738.37
SQL*Net message from client 165289.89
The V$SESSION_EVENT View
The v$session_event view is a lot like the v$system_event view,
except
that it shows separate rows of information for each Oracle process. As
with v$system_event, event names do not appear in this view if the
process has
not waited for them at least once. Also, when an Oracle process
terminates
(as in the case of when a user logs off the database) all of the rows
in v$session_event for that process permanently disappear. The
v$session_event view has the following columns:
Name Null? Type
------------------------------- -------- ----
SID NUMBER
EVENT VARCHAR2(64)
TOTAL_WAITS NUMBER
TOTAL_TIMEOUTS NUMBER
TIME_WAITED NUMBER
AVERAGE_WAIT NUMBER
MAX_WAIT NUMBER
TIME_WAITED_MICRO NUMBER
SID indicates the session ID of the process waiting for the event.
You
can query v$session in order to determine the SID of the session whose
wait
events you want to investigate. The next five columns in the
v$session_event
view are the same as in the v$system_event view, except that now they
pertain
to the one specific process instead of all processes.
MAX_WAIT indicates the maximum amount of time the process had to
wait
for the event. Like TIME_WAITED and AVERAGE_WAIT, the unit of measure
is
centiseconds and will display as zero if timed statistics are not
enabled.
Consider the following query which displays all wait event information for the current SQL*Plus session:
SELECT event, total_waits, time_waited_micro, max_wait
FROM v$session_event
WHERE SID =
(SELECT sid FROM v$session
WHERE audsid = USERENV ('sessionid') );
EVENT TOTAL_WAITS TIME_WAITED_MICRO MAX_WAIT
------------------------------ ----------- ----------------- ----------
log file sync 1 19629 2
db file sequential read 3 26128 1
db file scattered read 678 1154632 8
SQL*Net message to client 76 352 0
SQL*Net more data to client 1 63 0
SQL*Net message from client 75 1118656279 26930
SQL*Net break/reset to client 4 22281 2
You can see that the Oracle process serving this session has spent 1.180760 seconds waiting for disk I/O. Although there have been 76 instances where the process waited for the networking software to allow it to send a message to the client (the SQL*Plus program), each of these waits was shorter than the 0.01 second resolution of the MAX_WAIT column in v$session_event. (All 76 waits combined added up to only 0.000352 seconds.) Far and away, the Oracle process has spent the vast majority of its time (over 18 minutes) waiting for a SQL statement to be entered at the SQL*Plus prompt.
If you run the query against v$session_event shown above and get
back
no rows, it is possible that you are encountering Oracle bug number
2429929.
In some releases of Oracle 9i, including 9.2.0.1, the SID column in
v$session_event actually shows the SID minus one instead of the SID.
The V$SESSION_WAIT View
The v$session_wait view shows one row for each Oracle process. The
row indicates the name of the wait event and additional parameters that
provide further information about exactly what the process is waiting
for (or information
about the most recent wait event if the process is not currently
waiting).
While the v$system_event and v$session_event views show cumulative wait
event information, the v$session_wait view shows information as of the
present
moment only. The v$session_wait view has the following columns:
Name Null? TypeSID - Session ID.
------------------------------- -------- ----
SID NUMBER
SEQ# NUMBER
EVENT VARCHAR2(64)
P1TEXT VARCHAR2(64)
P1 NUMBER
P1RAW RAW(8)
P2TEXT VARCHAR2(64)
P2 NUMBER
P2RAW RAW(8)
P3TEXT VARCHAR2(64)
P3 NUMBER
P3RAW RAW(8)
WAIT_TIME NUMBER
SECONDS_IN_WAIT NUMBER
STATE VARCHAR2(19)
The following query shows the parameters associated with the db file scattered read wait event:
SELECT *
FROM v$event_name
WHERE name = 'db file scattered read';
EVENT# NAME PARAMETER1 PARAMETER2 PARAMETER3
---------- ---------------------- ------------ ----------- -----------
188 db file scattered read file# block# blocks
This tells us that when a process is waiting for a multi-block read from disk to complete (as in the case of a full table scan where the data blocks were not already in the buffer cache), we can see the file from which the blocks are being read as well as the starting block and number of blocks.
The following query was run while a session was performing a full table scan:
set linesize 32000
SELECT sid,seq#,substr(event,1,35) event, p1text, p1, p2text, p2, p3text, p3, wait_time, seconds_in_wait, state
FROM v$session_wait WHERE sid = 442;
SID SEQ# EVENT
---------- ---------- ------------------------------
P1TEXT P1 P1RAW
------------------------------ ---------- --------
P2TEXT P2 P2RAW
------------------------------ ---------- --------
P3TEXT P3 P3RAW WAIT_TIME SECONDS_IN_WAIT
------------------------------ ---------- -------- ---------- ---------------
STATE
-------------------
442 303 db file scattered read
file# 17 00000011
block# 2721 00000AA1
blocks 8 00000008 -1 0
WAITED SHORT TIME
You can see that the process was not waiting at the moment this
query
was run, but its last wait had been for an I/O request to read eight
blocks
from file 17 starting at block 2721. That I/O request had completed in
less
than 0.01 second. (Note the -1 in the WAIT_TIME column when the STATE
is
WAITED SHORT TIME.) Why did Oracle choose to read eight blocks at a
time?
Because the db_file_multiblock_read_count instance parameter was set to
eight.
For most wait events this view is sufficient, but it is hardly a robust
tuning tool for at least two important reasons:
* The view is a snapshot of the present. When the
waits cease to exist, the history of those waits experienced by the
session earlier disappears too, making after-effect diagnosis
difficult. V$SESSION_EVENT provides cumulative but not very detailed
data.
* V$SESSION_WAIT contains information only about
wait events; for all other relevant information such as the userid and
terminal you have to join it with the view V$SESSION.
In Oracle Database 10g, the wait
interface has been radically redesigned to provide more information
with less DBA intervention. In this article, we will explore those new
features and see how they aid us in the diagnosis of performance
problems. For most of the performance problems, you will get an
extended analysis from Automatic Database Diagnostic Manager (ADDM),
but for immediate problems not yet captured by ADDM, the wait interface
provides valuable data for diagnosis.
The V$SESSION View
Session details, the only listing column used in the scope of this
article.
ROW_WAIT_OBJ# - This is the object id of the object involved and is
joined to the DBA_OBJECTS.OBJECT_ID or DBA_OBJECTS.DATA_OBJECT_ID
column. This column is only populated in version 9i and above. A value
that is not -1 means that the session is waiting on a lock for this
object. Previous versions of the database will need to use P(n) values
from V$SESSION_WAIT to find objects in DBA_EXTENTS.
The v$event_name View
The v$event_name view shows reference information rather than
up-to-the-moment
information about instance performance. This view shows one row for
each
wait event known to the Oracle kernel. Associated with each wait event
are
up to three parameters-additional pieces of information that provide
more detail about a wait situation. This view displays the definition
of P(n) columns from V$SESSION_WAIT and can be joined on the NAME
column.
The v$event_name view has the
following columns:
Name Null? Type
------------------------------- -------- ----
EVENT# NUMBER
NAME VARCHAR2(64)
PARAMETER1 VARCHAR2(64)
PARAMETER2 VARCHAR2(64)
PARAMETER3 VARCHAR2(64)
EVENT# - Number of the wait event.
NAME - Name of the event.
PARAMETER1 - Description of the value for P1.
PARAMETER2 - Description of the value for P2.
PARAMETER3 - Description of the value for P3.
WAIT_CLASS_ID - ID of the wait class. Column is in 10g.
WAIT_CLASS# - Number of the class. Column is in 10g.
WAIT_CLASS - The name of the wait class (Idle, Network, System I/O,
etc.). Column is in 10g.
The v$system_event and v$session_event views show cumulative information about past waits in summary form, leaving out parameter information from each individual wait. As we will see in the next sections, wait event parameters come into play in the v$session_wait view and wait event trace files.
Tracing Wait Event Activity
(SYSTEM EVENT 10046)
DBAs should be familiar with the SQL trace facility built into
Oracle.
By using the commands below, you can enable SQL trace for your session
or
another session:
ALTER SESSION SET sql_trace = TRUE;
EXECUTE SYS.dbms_system.set_sql_trace_in_session (sid, serial#, TRUE);
When SQL trace is enabled for a session, the Oracle process writes
detailed trace information (including timing data if timed statistics
are enabled)
to a trace file in a directory specified by the user_dump_dest instance
parameter. These trace files are plain text files and human readable,
but
rather tedious and repetitive. You can optionally run trace files
through
a processor such as TKPROF instead of looking at them directly.
Oracle has the ability to direct a process to include additional
information in the trace file, including wait event information. In
earlier releases
of Oracle, we activated the wait event tracing facility by setting
debug
event 10046. Debug event 10046 still works in Oracle 9i, but recent
releases
of Oracle include a PL/SQL package built-in that gives a more friendly
means
for activating wait event tracing.
Setting debug events allows a DBA to instruct an Oracle instance to
take
on a special, atypical behavior. Debug events can be used, for example,
to cause Oracle to write a system level dump file whenever an ORA-0600
error
occurs or skip over corrupted blocks in a table when performing a full
table
scan. Most debug events should never be set unless you are directed to
do
so by Oracle Support. Some debug events can put your database at risk.
Debug event 10046 affects the amount of information written to trace
files. It is a very safe debug event, and one of the few that you are
allowed to
set without special permission from Oracle Support. Debug event 10046
can
be set to the following values:
| Debug Event Setting | Effect |
| 10046 trace name context forever, level 1 | Enables ordinary SQL trace |
| 10046 trace name context forever, level 4 | Enables SQL trace with bind variable values included in trace file |
| 10046 trace name context forever, level 8 | Enables SQL trace with wait event information included in trace file |
| 10046 trace name context forever, level 12 | Equivalent of level 4 and level 8 together |
| 10046 trace name context off | Turns off tracing |
You can set the 10046 debug event to trace your session and collect wait event information in the trace file with either of the following commands:
EXECUTE SYS.DBMS_SUPPORT.START_TRACE
ALTER SESSION SET events '10046 trace name context forever, level 8';
You can set the 10046 debug event to trace another session on the
database with any of the following commands:
Find the session:
set linesize 150
column Name format a14
column SID format 9999
column PID format 99999
column TERM format a15
column OSUSER format a15
column Program format a15
column Stats format a10
column Logon_time format a20
select a.username Name, a.sid SID, a.serial#, b.spid PID,
SUBSTR(A.TERMINAL,1,9) TERM, SUBSTR(A.OSUSER,1,9) OSUSER,
substr(a.program,1,10) Program, a.status Status,
to_char(a.logon_time,'MM/DD/YYYY hh24:mi') Logon_time
from v$session a, v$process b
where a.paddr = b.addr
and a.serial# <> '1'
and a.status = 'ACTIVE'
and a.username like
upper('%&user%') -- if you want to filter by username
order by a.logon_time;
Once
you have the SID and SERIAL# of the session you can use
some Oracle supplied packages.
-- set timed
statistics at user session level if not set at system level.
execute
sys.dbms_system.set_boo_param_in_session(&&SID,
&&SERIAL,'timed_statistics',true);
--set max dump
file size if not set at system level.
execute
sys.dbms_system.set_in_param_in_session(&&SID,
&&SERIAL, 'max_dump_file_size',10000000);
-- activate
level 8 extended SQL tracing.
execute
sys.dbms_system.set_ev(&&SID, &&SERIAL, 10046, 8, ' ');
or
execute sys.dbms_support.start_trace_in_session (&&SID, &&SERIAL)
******* run all of your processing here *******
--
disables extended SQL tracing.
execute
sys.dbms_system.set_ev(&&SID, &&SERIAL, 10046, 0, ' ');
or
execute sys.dbms_support.stop_trace_in_session (&&SID,
&&SERIAL)
Calling DBMS_SUPPORT as shown here is
equivalent to activating debug
event 10046 at level 8. You may include optional additional parameters
in the procedure call to activate level 4 or 12. The DBMS_SUPPORT
package is not installed in the database by default. You need to run
the dbmssupp.sql script found in the rdbms/admin directory as SYS
before you can call DBMS_SUPPORT.
It should also be pointed out that the SET_EV procedure in the
DBMS_SYSTEM package is not officially supported by Oracle
Corporation-they prefer that
you use DBMS_SUPPORT instead. The reason is that DBMS_SYSTEM.SET_EV
allows
you to set any debug event in any session. As mentioned above some
debug
events can potentially be dangerous, and DBMS_SYSTEM.SET_EV lets you
set
such events in any session.
Unlike timed statistics, tracing consumes a significant amount of
system resources. Therefore it is important to use the tracing facility
sparingly. Trace only the sessions you need to trace, and only for as
long as you need. Turn off tracing as soon as it is no longer needed
with any of the following commands:
EXECUTE SYS.DBMS_SUPPORT.STOP_TRACE
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
ALTER SESSION SET sql_trace = FALSE;
EXECUTE SYS.DBMS_SUPPORT.STOP_TRACE_IN_SESSION (sid, serial#)
oradebug setorapid [Oracle PID from v$process]
oradebug session_event 10046 trace name context forever, level 0
EXECUTE SYS.DBMS_SYSTEM.SET_EV (sid, serial#, 10046, 0, '')
When you set debug event 10046 to a level of 8 or 12 (or call
START_TRACE
or START_TRACE_IN_SESSION in DBMS_SUPPORT), the Oracle process will
write
a line into the trace file every time it finishes waiting for an event.
The line in the trace file will contain almost the same information
that
would have appeared in the v$session_wait view, but perhaps in a
slightly
less friendly format. You can also see in the trace file which cursor
(and
therefore which SQL statement) the wait event was associated with.
Here is an excerpt from a trace file generated by setting debug
event
10046 to level 12 on an Oracle 8i database:
=====================
PARSING IN CURSOR #1 len=80 dep=0 uid=502 oct=3 lid=502
tim=2293771931 hv=2293373707 ad='511dca20'
SELECT /*+ FULL */ SUM (LENGTH(notes))
FROM customer_calls
WHERE status = :x
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=0,tim=2293771931
BINDS #1:
bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0
size=24 offset=0
bfp=09717724 bln=22 avl=02 flg=05
value=43
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=2293771931
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=675562835 p2=1 p3=0
WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=923 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=931 p3=8
WAIT #1: nam='db file scattered read' ela= 2 p1=17 p2=939 p3=8
WAIT #1: nam='db file sequential read' ela= 0 p1=17 p2=947 p3=1
WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=1657 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=1665 p3=8
WAIT #1: nam='db file scattered read' ela= 2 p1=17 p2=1673 p3=8
WAIT #1: nam='db file scattered read' ela= 0 p1=17 p2=1681 p3=8
WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=1761 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=1769 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=1777 p3=8
WAIT #1: nam='db file scattered read' ela= 0 p1=17 p2=1785 p3=8
WAIT #1: nam='db file scattered read' ela= 2 p1=17 p2=1841 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=1849 p3=8
The "ela=" figures indicate the duration of the wait. In trace files
generated
by Oracle 8i and earlier, the elapsed time is shown in centiseconds.
Beginning
in Oracle 9i, the elapsed time is shown in microseconds.
You can see that when the session executed the query, there was a
wait shorter than one centisecond for a message to be sent to the
client, followed
by a bunch of waits for I/O requests against file 17. Most of the I/O
requests
were multi-block reads, reading eight blocks at a time while performing
a full table scan of the CUSTOMER_CALLS table.
You can probably imagine how large and overwhelming trace files can
get
when anything but a trivial application is traced. This is why Oracle
provides
the TKPROF utility. TKPROF takes a trace file as input and generates a
nicely formatted, easy to read report. Unfortunately, the TKPROF that
comes with
Oracle 8i and earlier releases ignores wait event information written
in
the trace file. To analyze the wait event information collected by a
trace
in an Oracle 8i or earlier environment, you will need to either pore
through
the trace file manually or write your own trace file parser and
formatter.
(We've heard of people writing perl scripts to do this, but we have
never
come across them.)
The TKPROF utility that comes with Oracle 9i is capable of reporting
wait event information summarized by distinct statement. By default
Oracle 9i
TKPROF ignores wait event information in the trace file as earlier
releases
did. However, if the "waits=yes" command line argument is provided, a
separate
table of wait event statistics will appear in the report for each
distinct
statement. Here is a sample excerpt from a TKPROF report showing wait
event
information:
********************************************************************************
SELECT A.customer_id, A.customer_name, COUNT (*) purchases,
MAX (B.transaction_date) last_purchase_date
FROM customers A, purchase_history B
WHERE B.customer_id = A.customer_id
GROUP BY A.customer_id, A.customer_name
ORDER BY A.customer_id
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.06 0.12 0 27 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1461 21.41 25.80 7801 5905 5 21893
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1463 21.47 25.93 7801 5932 5 21893
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 17
Rows Row Source Operation
------- ---------------------------------------------------
21893 SORT GROUP BY
1525494 HASH JOIN
31212 TABLE ACCESS FULL CUSTOMERS
1525494 INDEX FAST FULL SCAN PURCHASE_HISTORY_PK (object id 7824)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1461 0.00 0.00
db file sequential read 4 0.00 0.01
db file scattered read 750 0.07 2.25
direct path write 271 0.83 2.79
direct path read 254 0.00 0.01
SQL*Net message from client 1461 1.28 22.40
********************************************************************************
You can see that the query was parsed and executed once, and there
were
1461 fetches to read 21893 rows. The Oracle process used 21.47 CPU
seconds
and 25.93 seconds of elapsed time to parse, execute, and fetch the
query.
The elapsed time was greater than the CPU time because the Oracle
process
had to wait on several wait events. The listing at the bottom of the
report
shows the process waited on network roundtrips to the client and reads
and
writes to disk.
All of these waits seem to make sense: There was one network
roundtrip
to the client for each fetch call, and it took SQL*Plus a long time
(over
22 seconds) to display the 21,893 rows of data on my display. The db
file
sequential reads and db file scattered reads resulted from the table
access
and index fast full scan. The direct path writes and reads resulted
from
the hash join and sort operations. If you add the CPU time of 21.47
seconds
and the wait times (5.06 seconds not including the SQL*Net message from
client waits) you come up with an elapsed time of 26.53 seconds.
However,
the report shows the total elapsed time as 25.93 seconds. This is a
good
example of the round-off error that is not unusual when thousands of
very
short time intervals are added together.
In some situations you may not be able to identify exactly which
session
to trace. This happens frequently when your database is supporting a
web application that has frequent short connections, or an application
server
tier that maintains a pool of database connections all logged on as the
same Oracle user. You may have a specific query you want to collect
wait
event information for, but you may not know which session will be
executing
the query. One way to deal with this situation is to trace all of the
sessions
being started by your application for a brief period. This can be done
easily
with the following SQL*Plus script:
SPOOL traceall.sql
SET HEADING OFF FEEDBACK OFF
SELECT 'EXECUTE SYS.dbms_system.set_ev (' || TO_CHAR (sid) ||
', ' || TO_CHAR (serial#) || ', 10046, 8, '''')'
FROM v$session
WHERE username = 'WEB_USER';
SPOOL OFF
SET FEEDBACK ON
@traceall.sql
This script will start wait event tracing on all sessions connected
to
the database as the WEB_USER user. To stop tracing simply change the 8
after
the 10046 to a 0 and run the script again. You should think about how
many
sessions this script will trace and how that will affect server load
before
running on a production system.
After you've traced a number of sessions, you can scan the trace
files
for occurrences of a specific query with a command like the following:
grep -n 1234567890 *.trc
Replace 1234567890 with the hash value or address of the statement
you
are interested in. (You can get these values by querying v$sql.) The
output
of the above grep command will tell you which trace files the query
appears
in and on which lines. You can then go to that point in the trace files
and get detailed wait event information.
Trace files are easiest to work with when you use a dedicated server
connection
to the database. If you connect to the database using Oracle's shared
server
architecture then different SQL statements may be executed by different
server
processes. Each server process writes to its own trace file. Thus the
trace
information for the session can be spread over several trace files.
Wait Event Enhancements in Oracle 10g
Oracle 10g brings more
significant change to the
wait event interface than we have seen in years. Now there are over 800
wait events and names are more descriptive, wait events are categorized
into classes, several v$ views have been added, helpful columns have
been added to existing v$ views, built-in statistics collection by
Active Session History and the Automatic Workload Repository has been
introduced, a new time model concept for looking at how sessions spend
their time has appeared, and improvements have been made to the session
tracing facility.
In this section, we will introduce what we see as the top dozen
areas in which wait events and the wait event interface have been
enhanced in Oracle 10g. We’ve listed these enhancements in no
particular order.
More Descriptive Wait Event Names
Prior to Oracle 10g, some wait event names were quite vague and not very useful without looking at the parameter values for a specific occurrence of the event. For example, an enqueue wait could indicate various situations ranging from contention for a row in a table to waiting on a user-defined lock. Wait event names in Oracle 10g are more descriptive in the areas of latches, enqueues, and buffer busy waits.
There is still a latch free wait event in Oracle 10g, but there are also 26 more specific latch-related events. These cover the most common latches that experience contention. In the past, if we saw a session waiting on the latch free event, we would have output like the following:
SQL> SELECT event, state, p1, p2, p3
2 FROM v$session_wait
3 WHERE sid = 162;
EVENT STATE P1 P2 P3
------------- ------- ----------- ------ -----
latch free WAITING 15113593728 97 5
We would then have to query v$event_name to determine the meaning of the parameters p1, p2, and p3:
SQL> SELECT * FROM v$event_name WHERE name = 'latch free';
EVENT# NAME PARAMETER1 PARAMETER2 PARAMETER3
------ ---------- --------------- --------------- ---------------
3 latch free address number tries
And, seeing that p2 is the latch number, we would need to query v$latch to find out which latch was being waited upon:
SQL> SELECT name
2 FROM v$latch
3 WHERE latch# = 97;
NAME
--------------------
cache buffers chains
In Oracle 10g we simply see:
SQL> SELECT event, state
2 FROM v$session_wait
3 WHERE sid = 162;
EVENT STATE
------------------------------ -------
latch: cache buffers chains WAITING
The descriptive event name saves us two steps in determining which latch is causing the wait. The more detailed description enables one to more quickly drill down to the root cause of the wait.
The names of enqueue-related wait events have also been made more descriptive in Oracle 10g. There is no longer an enqueue wait event—it has been replaced by 184 events with more detailed names. In the past, if sessions were waiting on enqueues, we would have to decode the type of lock from the p1 parameter:
SQL> SELECT event, state, seconds_in_wait siw
2 FROM v$session_wait
3 WHERE sid = 96;
EVENT STATE SIW
----------------------------------- ------------------- ----------
enqueue WAITING 24
SQL> SELECT sid,
2 CHR (BITAND (p1,-16777216) / 16777215) ||
3 CHR (BITAND (p1, 16711680) / 65535) enq,
4 DECODE (CHR (BITAND (p1,-16777216) / 16777215) ||
5 CHR (BITAND (p1, 16711680) / 65535),
6 'TX', 'Transaction (RBS)',
7 'TM', 'DML Transaction',
8 'TS', 'Tablespace and Temp Seg',
9 'TT', 'Temporary Table',
10 'ST', 'Space Mgt (e.g., uet$, fet$)',
11 'UL', 'User Defined',
12 CHR (BITAND (p1,-16777216) / 16777215) ||
13 CHR (BITAND (p1, 16711680) / 65535)) enqueue_name,
14 DECODE (BITAND (p1, 65535), 1, 'Null', 2, 'Sub-Share',
15 3, 'Sub-Exclusive', 4, 'Share', 5, 'Share/Sub-Exclusive',
16 6, 'Exclusive', 'Other') lock_mode
17 FROM v$session_wait
18 WHERE sid = 96;
SID ENQ ENQUEUE_NAME LOCK_MODE
----- ---- ------------------------------ ----------
96 TX Transaction (RBS) Exclusive
In Oracle 10g we get more information directly from the enqueue name:
SQL> SELECT event, state, seconds_in_wait siw
2 FROM v$session_wait
3 WHERE sid = 143;
EVENT STATE SIW
----------------------------------- ------------------- ----------
enq: TX - row lock contention WAITING 495
Additionally, more information is available in some cases from the p1, p2, and p3 parameters, whose meanings vary with the different enqueue-related wait events. (See the Appendix for a listing of all wait events and their associated parameter meanings.)
Wait Event Classes
In Oracle 10g wait events are classified into categories which can help the DBA to more easily determine the likely root cause of the wait. The categories are:
| Administrative | Idle |
| Application | Network |
| Cluster | Scheduler |
| Commit | System I/O |
| Concurrency | User I/O |
| Configuration | Other |
While nearly 70% of the wait events are in the “Other” category (557 out of 811 in release 10.1.0.3), the most frequently encountered ones are in wait classes with helpful names. Let’s look at the wait class designations of some enqueue events as an example:
SQL> SELECT wait_class, name
2 FROM v$event_name
3 WHERE name LIKE 'enq%'
4 AND wait_class <> 'Other'
5 ORDER BY wait_class;
WAIT_CLASS NAME
------------------------------ ----------------------------------------
Administrative enq: TW - contention
Administrative enq: DB - contention
Application enq: PW - flush prewarm buffers
Application enq: RO - contention
Application enq: RO - fast object reuse
Application enq: TM - contention
Application enq: TX - row lock contention
Application enq: UL - contention
Concurrency enq: TX - index contention
Configuration enq: ST - contention
Configuration enq: TX - allocate ITL entry
Configuration enq: SQ - contention
Configuration enq: HW - contention
We see that TX enqueues (row locks) and TM enqueues (table locks) are in the Application class, which makes sense since these wait events generally occur because of application behavior. Meanwhile, ST (space management), HW (high-water mark extension), and SQ (sequence number) enqueues are in the Configuration class, as these can usually be alleviated by changes in object and database settings.
The User I/O class includes the db file scattered read, db file sequential read, direct path read, and direct path write events as one might expect, while the System I/O class includes many waits related to reading and writing of redo logs and archive logs. The Commit class has one member, log file sync, as that wait is caused by commits. And the Idle class is made up of various wait events which have traditionally been considered “idle events,” such as SQL*Net message from client. It should be noted that sometimes such “idle events” can actually be symptoms of the root cause of poor performance, so they should not be disregarded without consideration.
In general, the addition of wait classes helps direct the DBA more quickly toward the root cause of performance problems.
v$ View Enhancements
In Oracle 10g there are quite a few new v$ views that pertain to wait events, and helpful new columns have been added to existing v$ views. We will discuss several of the enhancements in this section, although some new v$ views are part of major new Oracle 10g functionality and will be discussed in separate sections later on.
v$event_name
Three columns have been added to the v$event_name view in Oracle 10g: wait_class_id, wait_class#, and wait_class. These columns show which wait class the wait event is part of. We saw in the previous section how this new information might be used. The columns in v$event_name now are:
SQL> DESCRIBE v$event_name
Name Null? Type
----------------------------------------- -------- ----------------------------
EVENT# NUMBER
EVENT_ID NUMBER
NAME VARCHAR2(64)
PARAMETER1 VARCHAR2(64)
PARAMETER2 VARCHAR2(64)
PARAMETER3 VARCHAR2(64)
WAIT_CLASS_ID NUMBER
WAIT_CLASS# NUMBER
WAIT_CLASS VARCHAR2(64)
v$sql and v$sqlarea
The v$sql and v$sqlarea views have six new columns in Oracle 10g that relate to wait events:
| application_wait_time |
| concurrency_wait_time |
| cluster_wait_time |
| user_io_wait_time |
| plsql_exec_time |
| java_exec_time |
Suppose we create a table called testtab with about a million rows. We then run the following statement from one session without committing, and then run the same statement from another session:
SQL> UPDATE testtab SET numcol = numcol + 1 WHERE ROWNUM < 1000;
Obviously the second session will wait on an enqueue wait event until the first session either commits or rolls back. After a while we roll back the first session and then the second session. Next, in a third session, we run the following statement:
SQL> UPDATE testtab SET numcol = numcol + 1;
The instance we’re using has a small buffer cache, so quite a bit of physical I/O is caused by the statement. After the UPDATE completes we look at v$sqlarea:
SQL> SELECT sql_id, application_wait_time appl, concurrency_wait_time concurr,
2 user_io_wait_time user_io
3 FROM v$sqlarea
4 WHERE sql_text LIKE 'UPDATE testtab SET numcol%';
SQL_ID APPL CONCURR USER_IO
------------- --------- --------- -----------
038m56cp4am0c 178500000 0 20000
fd5mxhdbf09ny 0 10000 105040000
SQL> SELECT sql_id, sql_text
2 FROM v$sqlarea
3 WHERE sql_id IN ('fd5mxhdbf09ny','038m56cp4am0c');
SQL_ID SQL_TEXT
------------- -------------------------------------------------------------
038m56cp4am0c UPDATE testtab SET numcol = numcol + 1 WHERE ROWNUM < 1000
fd5mxhdbf09ny UPDATE testtab SET numcol = numcol + 1
So we see that the first statement (locking rows) spent 178.5 seconds (178,500,000 microseconds) waiting on events in the Application wait class and 0.02 seconds waiting on events in the User I/O wait class. If we recall from the discussion of wait classes, the TX enqueue for row contention is in the Application wait class, and data file reads are in the User I/O wait class. The second statement, which required more disk reads, shows 105 seconds of User I/O waits and a very small amount of concurrency waits.
While the currently available documentation from Oracle does not provide much information about these new columns in v$sql and v$sqlarea, they appear to hold much promise for diagnosing query performance problems.
v$session_wait_history
Up through Oracle 9i, the v$ views show us only the most recent wait event for each session. Even though wait times can accumulate to greatly slow down a process, many waits are very short (from a human perspective) individually. So, it’s often difficult to grab information on a wait event as it is happening. The v$session_wait_history view, new in Oracle 10g, helps by showing the last ten wait events each session has experienced As an example, the following query shows the ten most recent wait events for session 154:
The seq# column is supposed to show the chronological sequence of the wait events, with 1 being the most recent wait event in the session. Note that this seq# value differs from the seq# column in v$session, which is incremented for each wait experienced by the session throughout the life of the session.SQL> SELECT sid, seq#, event, wait_time, p1, p2, p3
2 FROM v$session_wait_history
3 WHERE sid = 154
4 ORDER BY seq#;
SID SEQ# EVENT WAIT_TIME P1 P2 P3
--- ---- ------------------------ ---------- ------ ------ ------
154 1 db file sequential read 28 4 3547 1
154 2 log buffer space 18 0 0 0
154 3 log buffer space 36 0 0 0
154 4 db file sequential read 0 4 3559 1
154 5 db file sequential read 0 4 1272 1
154 6 db file sequential read 0 4 3555 1
154 7 log buffer space 9 0 0 0
154 8 db file sequential read 0 4 3551 1
154 9 db file sequential read 6 4 1268 1
154 10 log buffer space 8 0 0 0
In the above query, we see that the session’s most recent waits alternated between single-block disk reads and log buffer space. This makes sense, since the SQL that the session was performing looked like:
INSERT INTO table1 (column1, column2)
SELECT column1, column2
FROM table2
WHERE ...
From this list of recent waits, we can also drill down to get more detail. The p1 and p2 values for db file sequential read indicate the file and block numbers being read, so we can quickly determine what segment was being read.
The columns in v$session_wait_history are:
SQL> DESCRIBE v$session_wait_history
Name Null? Type
----------------------------------------- -------- ----------------------------
SID NUMBER
SEQ# NUMBER
EVENT# NUMBER
EVENT VARCHAR2(64)
P1TEXT VARCHAR2(64)
P1 NUMBER
P2TEXT VARCHAR2(64)
P2 NUMBER
P3TEXT VARCHAR2(64)
P3 NUMBER
WAIT_TIME NUMBER
WAIT_COUNT NUMBER
v$session
The v$session view has been enhanced in Oracle 10g with several new columns which are quite useful. The wait event columns from v$session_wait have been added to v$session. In previous releases of Oracle, to get more detailed information about a session experiencing waits (such as what SQL the waiting session is executing), we had to join v$session_wait with v$session, as in:
SQL> SELECT s.sid, w.state, w.event, w.seconds_in_wait siw,
2 s.sql_address, s.sql_hash_value hash_value, w.p1, w.p2, w.p3
3 FROM v$session s, v$session_wait w
4 WHERE s.sid = w.sid
5 AND s.sid = 154;
In Oracle 10g we can get all of this information from v$session:
SQL> SELECT sid, state, event, seconds_in_wait siw,
2 sql_address, sql_hash_value hash_value, p1, p2, p3
3 FROM v$session
4 WHERE sid = 154;
SID STATE EVENT SIW SQL_ADDRESS HASH_VALUE P1 P2 P3
--- ------- ----------------------- --- ---------------- ---------- --- ---- ---
154 WAITING db file sequential read 1 000000038551E820 3625097388 4 9813 1
Two additional columns have been added to v$session that are helpful for wait event analysis: blocking_session and blocking_session_status. The blocking_session column contains the session id (SID) of the holder of the resource that the waiting session is waiting for. The blocking_session_status column indicates the validity of the contents of the blocking_session column. If blocking_session_status is VALID, a valid SID is present in the blocking_session column. In the past, if we saw that a session was waiting for a resource, we would have to do additional queries to determine who was holding the resource. If the resource was an enqueue lock, we would have to query v$lock (sometimes a very expensive query) to determine who was holding the lock. Now we can find out who is holding the lock much more quickly:
SQL> SELECT sid, blocking_session, username,
2 blocking_session_status status
3 FROM v$session
4 WHERE blocking_session_status = 'VALID';
SID BLOCKING_SESSION USERNAME STATUS
--- ---------------- -------- -----------
154 157 TSUTTON VALID
If we combine this with the wait event information now available in v$session, we see:
SQL> SELECT sid, blocking_session, username,
2 event, seconds_in_wait siw
3 FROM v$session
4 WHERE blocking_session_status = 'VALID';
SID BLOCKING_SESSION USERNAME EVENT SIW
--- ---------------- -------- ------------------------------ ---
154 157 TSUTTON enq: TX - row lock contention 318
v$event_histogram
The v$system_event view shows the number of waits, total time waited, and average wait time for a given wait event name (system-wide since instance startup). However, this aggregation can cloud the picture, because a small number of long waits can skew the data. For example, consider the following query from v$system_event:
SQL> SELECT event, total_waits, time_waited, average_wait
2 FROM v$system_event
3 WHERE event = 'enq: TX - row lock contention';
EVENT TOTAL_WAITS TIME_WAITED AVERAGE_WAIT
----------------------------- ----------- ----------- ------------
enq: TX - row lock contention 17218 2101966 122
We see that there have been 17,218 waits and that the average wait time was 1.22 seconds, but we have no idea how the wait times are distributed. Were all of these waits roughly the same length? Were most of them under one second long and a few really long waits threw off the average? We can’t tell from v$system_event. However, in Oracle 10g we can look at v$event_histogram for a more complete picture:
SQL> SELECT event, wait_time_milli, wait_count
2 FROM v$event_histogram
3 WHERE event = 'enq: TX - row lock contention';
EVENT WAIT_TIME_MILLI WAIT_COUNT
----------------------------- --------------- ----------
enq: TX - row lock contention 1 833
enq: TX - row lock contention 2 635
enq: TX - row lock contention 4 372
enq: TX - row lock contention 8 395
enq: TX - row lock contention 16 781
enq: TX - row lock contention 32 3729
enq: TX - row lock contention 64 3050
enq: TX - row lock contention 128 410
enq: TX - row lock contention 256 47
enq: TX - row lock contention 512 46
enq: TX - row lock contention 1024 37
enq: TX - row lock contention 2048 3
enq: TX - row lock contention 4096 6880
We see that very few of the waits were anywhere near 1.22 seconds. Nearly 60% of the waits were less than 0.128 seconds (with most of those between 16 milliseconds and 64 milliseconds), and most of the remaining waits were between 2.048 seconds and 4.096 seconds (at which point some timed out and started new waits). In this way, the v$event_histogram view gives us a more accurate picture of wait times summarized by event name.
The columns of v$event_histogram are:
SQL> DESCRIBE v$event_histogram
Name Null? Type
----------------------------------------- -------- ----------------------------
EVENT# NUMBER
EVENT VARCHAR2(64)
WAIT_TIME_MILLI NUMBER
WAIT_COUNT NUMBER
v$system_wait_class and v$session_wait_class
The two new views v$system_wait_class and v$session_wait_class enable us to get system and session wait information summarized by wait classes. This gives us a higher level view of what is happening in the system or session, rather than focusing on individual events. The views are roughly equivalent to the views v$system_event and v$session_event, except that they roll up the events by wait class. The wait times are expressed in centiseconds since instance startup for v$system_wait_class and centiseconds since session connection for v$session_wait_class.
The following queries show how much time (in centiseconds) has been spent on waits in each class across the system since instance start and for one specific session since that session began:
SQL> SELECT wait_class, time_waited
2 FROM v$system_wait_class
3 ORDER BY time_waited DESC;
WAIT_CLASS TIME_WAITED
------------- -----------
Idle 777450022
System I/O 1261584
User I/O 116667
Configuration 116481
Application 72301
Other 12432
Commit 3496
Concurrency 319
Network 1
SQL> SELECT wait_class, time_waited
2 FROM v$session_wait_class
3 WHERE sid = 154
4 ORDER BY time_waited DESC;
WAIT_CLASS TIME_WAITED
------------- -----------
Idle 612453
User I/O 1500
Configuration 28
Commit 11
Other 0
Application 0
Network 0
Since the wait times shown in these views are aggregations since system or session startup, these views are best used by taking samples and comparing the results to determine waits over a period of time. For instance, you could get data for the entire instance at time T1:
DROP TABLE swc_snap;
CREATE TABLE swc_snap
AS
SELECT wait_class, total_waits, time_waited
FROM v$system_wait_class;
And then, at time T2 a while later, get a summary of waits between T1 and T2:
SELECT a.wait_class, (a.time_waited - b.time_waited) tm_waited
FROM v$system_wait_class a, swc_snap b
WHERE a.wait_class = b.wait_class
AND a.total_waits > NVL (b.total_waits, 0)
ORDER BY tm_waited DESC;
WAIT_CLASS TM_WAITED
--------------- ----------
Idle 255767
Application 171
System I/O 156
User I/O 44
Other 21
Commit 13
Network 1
Active Session History
In previous releases of Oracle, the detailed information displayed in v$session_wait could prove extremely helpful in diagnosing performance problems—if you queried the view at the right time. The v$session_wait_history view in Oracle 10g makes it a little easier to catch detailed information by preserving the last ten waits for each session. But what if you want detailed information about a session’s waits for a period further back in time? This is where the Active Session History feature of Oracle 10g—ASH for short—comes in handy. ASH makes detailed information about a sampling of waits encountered by all sessions available to us for a very long time.
In Oracle 10g, a new background daemon process called MMNL queries v$session once each second and stores information about all active sessions in a circular buffer in memory accessible by a new view called v$active_session_history. How far back you can look at sessions in this view depends on session activity and how much memory Oracle allocated for ASH. Oracle’s goal is to keep at least a few hours of session data available in this view. The v$active_session_history view includes much of the detailed wait event information shown in v$session:
SQL> DESCRIBE v$active_session_history
Name Null? Type
----------------------------------------- -------- ----------------------------
SAMPLE_ID NUMBER
SAMPLE_TIME TIMESTAMP(3)
SESSION_ID NUMBER
SESSION_SERIAL# NUMBER
USER_ID NUMBER
SQL_ID VARCHAR2(13)
SQL_CHILD_NUMBER NUMBER
SQL_PLAN_HASH_VALUE NUMBER
SQL_OPCODE NUMBER
SERVICE_HASH NUMBER
SESSION_TYPE VARCHAR2(10)
SESSION_STATE VARCHAR2(7)
QC_SESSION_ID NUMBER
QC_INSTANCE_ID NUMBER
EVENT VARCHAR2(64)
EVENT_ID NUMBER
EVENT# NUMBER
SEQ# NUMBER
P1 NUMBER
P2 NUMBER
P3 NUMBER
WAIT_TIME NUMBER
TIME_WAITED NUMBER
CURRENT_OBJ# NUMBER
CURRENT_FILE# NUMBER
CURRENT_BLOCK# NUMBER
PROGRAM VARCHAR2(48)
MODULE VARCHAR2(48)
ACTION VARCHAR2(32)
CLIENT_ID VARCHAR2(64)
As you can see, v$active_session_history captures the essential wait-related data from v$session. It also captures useful information about the SQL statement currently being executed, as well as current object number, file, and block being accessed. When a wait that was sampled by ASH completes, Oracle fills in the time_waited column for the row in v$active_session_history with the actual duration of the wait.
The Automatic Workload Repository, which we will discuss in the next section, writes data from v$active_session_history to disk at regular intervals, preserving one sample every ten seconds from each active session. So, active session information remains accessible—although with less detail—even after the data has aged out of v$active_session_history.
Because ASH is always “on,” you always have access to detailed information about waits encountered in sessions within the last few hours. This means that if a user complains about a performance problem, you may be able to query v$active_session_history and gain insight into the problem without having to initiate an extended SQL trace or start a close watch of v$session while they reproduce the problem.
An important thing to keep in mind about v$active_session_history, however, is that it is populated by sampling v$session once each second. A session may encounter many different waits during a one second period, but only the one wait that was in progress when ASH collected its sample will be recorded in v$active_session_history. For this reason, ASH is valuable for general aggregate queries but not for precise counting of individual events or determining minimum or maximum wait times. Statistically speaking, the data collected by ASH is probably more accurate over a larger time interval and/or number of sessions.
For example, you might query v$active_session_history to see what percentage of time over the last two hours a particular group of sessions spent waiting on disk reads. However, using this view to determine how many disk read waits a session encountered in the last minute probably will not yield very accurate results.
Even though ASH data is only a sampling of active sessions, the information can prove to be quite useful. For example, the following query shows that sessions running the ARXENV application over the last two hours encountered a great deal of row-level lock contention:
SQL> SELECT DECODE (session_state, 'WAITING', event, NULL) event,
2 session_state, COUNT(*), SUM (time_waited) time_waited
3 FROM v$active_session_history
4 WHERE module = 'ARXENV'
5 AND sample_time > SYSDATE - (2/24)
6 GROUP BY DECODE (session_state, 'WAITING', event, NULL),
7 session_state;
EVENT SESSION_STATE COUNT(*) TIME_WAITED
------------------------------ ------------- -------- -----------
ON CPU 124 0
log file sync WAITING 2 52686
db file scattered read WAITING 2 28254
db file sequential read WAITING 1 6059
control file sequential read WAITING 1 9206
SQL*Net break/reset to client WAITING 1 9140
enq: TX - row lock contention WAITING 922 930864016
In addition to running queries against the v$active_session_history view, you can use Enterprise Manager to run reports that will display ASH data.
Although ASH runs on all Oracle 10g databases by default, you are not allowed to query the v$active_session_history view (or run the corresponding reports in Enterprise Manager) unless you have purchased the Diagnostic Pack.
Automatic Workload Repository
Oracle 10g includes another significant facility relevant to the wait event interface. It’s known as the Automatic Workload Repository—or AWR for short. AWR is basically a next-generation Statspack. By default, AWR collects an hourly snapshot of database performance information, storing the data in tables in the SYS schema. AWR is configured automatically when you create an Oracle 10g database. You can call the dbms_workload_repository package to collect a snapshot on demand, purge a snapshot or range of snapshots, or change the snapshot interval or retention period. (By default snapshots are collected at the top of each hour and are purged after seven days.)
AWR collects the same type of data that Statspack does—including system-level statistics, resource-intensive SQL, and of course instance-wide wait event information. AWR also collects data that is new for Oracle 10g, such as time model statistics (which we will discuss in the next section). As an aside, the Oracle 10g version of Statspack also collects a lot of this new information, including time model statistics.
You can generate an AWR report of database activity between two snapshots by running the awrrpt.sql script in the rdbms/admin directory under $ORACLE_HOME. This script offers reports formatted as plain text or HTML. The reports will look familiar if you have used Statspack before. You can use Enterprise Manager to generate AWR reports as well.
AWR offers many benefits over Statspack. For one, it is more tightly integrated into the Oracle kernel, reducing resource requirements and overhead when collecting snapshots. AWR snapshots also include ASH data from v$active_session_history, providing session-level information to complement the system-level data collection familiar to Statspack users.
Data collected by AWR is made easily accessible via views with names that start DBA_HIST. This enables you to write your own reports that extract just the data you need to address a specific situation, if for some reason you don’t find what you need in the standard AWR report. For example, the following query displays the two most recent snapshot IDs:
SQL> SELECT snap_id, begin_interval_time, end_interval_time
2 FROM (
3 SELECT snap_id, begin_interval_time, end_interval_time
4 FROM dba_hist_snapshot
5 ORDER BY end_interval_time DESC
6 )
7 WHERE ROWNUM <= 2;
SNAP_ID BEGIN_INTERVAL_TIME END_INTERVAL_TIME
---------- ------------------------- -------------------------
362 10-MAR-05 04.00.02.018 PM 10-MAR-05 05.00.36.581 PM
361 10-MAR-05 03.00.25.885 PM 10-MAR-05 04.00.02.018 PM
Just like ASH, AWR runs on all Oracle 10g databases by default. Also like ASH, you are not allowed to query the AWR views (or run AWR reports) unless you have licensed the Diagnostic Pack. Because AWR consumes system resources when collecting snapshots and uses up storage in the SYSAUX tablespace, you may want to disable the collection of AWR snapshots if you are not licensed to use AWR. This may be done by using the dbms_workload_repository package. If AWR is not available to you, Statspack is still a good way to go in Oracle 10g.
Time Model Statistics
Oracle 10g introduces a new concept called Time Model Statistics. This information provides yet another way to see how time is spent, and with greater detail than was available previously. The v$sys_time_model view shows time model statistics for the entire system since instance startup, while the v$sess_time_model view shows time model statistics for each session since session start. The columns in these two views are as follows:
SQL> DESCRIBE v$sys_time_model
Name Null? Type
---------------------------------------- -------- ---------------------------
STAT_ID NUMBER
STAT_NAME VARCHAR2(64)
VALUE NUMBER
SQL> DESCRIBE v$sess_time_model
Name Null? Type
---------------------------------------- -------- ---------------------------
SID NUMBER
STAT_ID NUMBER
STAT_NAME VARCHAR2(64)
VALUE NUMBER
A sample query from v$sys_time_model shows the following:
SQL> SELECT stat_name, value / 1000000 seconds
2 FROM v$sys_time_model
3 ORDER BY seconds DESC;
STAT_NAME SECONDS
------------------------------------------------ ----------
DB time 80970.190
sql execute elapsed time 75057.271
DB CPU 44448.628
background elapsed time 29333.160
PL/SQL execution elapsed time 8824.538
background cpu time 5170.311
parse time elapsed 1270.147
hard parse elapsed time 838.068
PL/SQL compilation elapsed time 176.731
sequence load elapsed time 112.334
connection management call elapsed time 44.644
failed parse elapsed time 11.946
hard parse (sharing criteria) elapsed time 5.579
hard parse (bind mismatch) elapsed time 4.610
failed parse (out of shared memory) elapsed time 0.000
Java execution elapsed time 0.000
inbound PL/SQL rpc elapsed time 0.000
This query shows us a lot more information about how Oracle sessions have spent their time (categorically) than v$sysstat and v$sesstat do. Of course, we have to know how to interpret this information before we can put it to work for us. Values in these views are shown in microseconds, and they do not include background processes unless “background” appears in the statistic name. The “DB time” statistic shows elapsed time spent on database calls (user processes only). This amounts to time spent on the CPU or waiting on non-idle wait events. For a description of the other time model statistics, see the v$sess_time_model view listing in the Oracle 10g Database Reference manual.
From this query, among many other useful facts, we can see that no time has been spent executing Java, very little time has been spent hard parsing or compiling PL/SQL, background processes have used about 10% of the CPU time, and about 11% of the elapsed time for user sessions was spent on PL/SQL execution.
Tracing Facility Improvements
The extended SQL trace facility, also known as event 10046, allows us to capture in a trace file detailed information about every wait event encountered by a database session. This feature has been available in Oracle for a long time. However, Oracle 10g offers some helpful improvements in this area.
Enabling extended SQL trace has always been a bit of a nuisance. In earlier releases of Oracle, you had to use a clumsy ALTER SESSION SET EVENTS statement or—worse yet—call the undocumented dbms_system.set_ev procedure to set the 10046 event in another user’s session. In Oracle 8i the dbms_support package was introduced to make this step easier, but the package was missing from many releases of Oracle and usually was not installed by default.
Oracle 10g introduces the new dbms_monitor package. This package, among many other things, makes it very easy to turn extended SQL trace on and off in any Oracle session. With one easy to remember call, you can turn extended SQL trace on or off, with wait events and/or bind variables captured in the trace file:
SQL> DESCRIBE dbms_monitor
...
PROCEDURE SESSION_TRACE_DISABLE
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
SESSION_ID BINARY_INTEGER IN DEFAULT
SERIAL_NUM BINARY_INTEGER IN DEFAULT
PROCEDURE SESSION_TRACE_ENABLE
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
SESSION_ID BINARY_INTEGER IN DEFAULT
SERIAL_NUM BINARY_INTEGER IN DEFAULT
WAITS BOOLEAN IN DEFAULT
BINDS BOOLEAN IN DEFAULT
If the session_id parameter is not specified or set to NULL, your own session will be traced. Thus, the following two statements should be equivalent:
ALTER SESSION SET events '10046 trace name context forever, level 12';
EXECUTE dbms_monitor.session_trace_enable (waits=>TRUE, binds=>TRUE);
In Oracle 9i and earlier, extended SQL trace was easy to use if your application connected to the Oracle database via a dedicated server connection. If the shared server architecture was used, each shared server process that serviced a request from the session being traced would write its data to a separate trace file. Furthermore, tracing sessions in a connection pool environment became difficult because one Oracle session could actually process requests for many different end-user sessions.
The dbms_monitor package in Oracle 10g addresses this problem. Instead of enabling extended SQL trace for a specific Oracle session, you can enable it for a specific client identifier or combination of service, module, and action. Any time any Oracle session has the specified client identifier or combination of service, module, and action, the session will be traced. Sessions can set or clear their client identifier at will by calling the dbms_session package, and they can set their module and action by calling the dbms_application_info package.
Suppose a web-based application uses a pool of 30 database connections to serve user requests and maintains a current_sessions table to keep track of the state of each end-user session. When a user clicks a button in their browser window, the application server receives the HTTP request and hands it off to an application server process. The application server process grabs a free database connection from the pool and accesses the database as necessary to service the request. It is likely that subsequent requests from the same user will be processed using different database connections.
In Oracle 9i and earlier it would have been very difficult to trace the one user’s database activity in this environment. The user’s database accesses are spread among multiple Oracle server processes in the connection pool. Moreover, each of those Oracle processes is handling requests from many different end users.
With dbms_monitor in Oracle 10g, the application could be modified in a way to make extended SQL trace a whole lot easier. We mentioned that the application uses the current_sessions table to maintain state for each end user session. Each time the application server grabs a database connection from the pool, it could set the client identifier for the Oracle session to the session_id from the current_sessions table before doing any database access for that end user session. Then the application could clear the client identifier before returning the database connection to the pool. The Oracle calls could look like this:
EXECUTE dbms_session.set_identifier ('session_id174837492748');
...do the work for this end user session...
EXECUTE dbms_session.clear_identifier
To trace this end user session, we could now call the dbms_monitor package like this:
SQL> EXECUTE dbms_monitor.client_id_trace_enable -
> ('session_id174837492748', waits=>TRUE, binds=>TRUE);
This call to dbms_monitor will cause each Oracle process to write extended SQL trace data to a trace file for all calls that occur while the client identifier for the session is set to the specified value. However, each Oracle process will write to its own trace file. This will cause the trace data to be split over multiple files. To address this problem, Oracle 10g provides a new command-line utility called trcsess. The trcsess utility reads multiple trace files and consolidates entries from the various files that meet the specified criteria into one trace file that can be processed by TKPROF. To consolidate the trace data for our current example, we could use the following commands:
$ cd $ORACLE_BASE/admin/$ORACLE_SID/udump
$ trcsess output=/home/rschrag/case1403/case1403-trial1.trc \
clientid=session_id174837492748
In this way the dbms_monitor package and trcsess utility in Oracle 10g make it a lot easier to collect extended SQL trace data from an end user’s session when connection pooling or other session aggregation techniques are used by the application server tier.
Conclusion
Oracle 10g includes many enhancements to the wait event interface that should make performance management using wait event methodologies easier than ever. Some enhancements, such as descriptive wait event names, wait classes, and the session_trace_enable procedure in the dbms_monitor package, are simple conveniences that make our jobs easier. These enhancements don’t give us any information or power that we didn’t have before. Other enhancements, however, such as time model statistics, v$event_histogram, and new columns in v$sql and v$sqlarea, provide us with helpful timing and wait information that was previously unavailable.
Although as of this writing documentation on these new features is incomplete and discussion on Metalink is surprisingly sparse, many of these enhancements will likely prove quite valuable to the Oracle DBA responsible for performance management of a complex Oracle system.
Event Wait Analysis with STATSPACK
In STATSPACK, you can get a snapshot of Oracle wait events every hour and plot changes in wait behavior over time. You can also set thresholds, and report only on wait events that exceed your pre-defined threshold. Here is the script that is commonly used for exception reporting of wait events.
-- prompt |
Here is the output from this script. As we can see, we get a time-series result, showing us those days and hours when our thresholds are exceeded. If we look at this listing, we see that every evening between 10:00 p.m. and 11:00 p.m., we experience high waits on the redo logs.
Wed Aug 21 page 1 |
Using Queries Against v$event_wait
Here is the most commonly used script for displaying system-wide events, called current_waits.sql.
set pages 999 |
Below is the output from this script. As we see, the report shows all cumulative waits since instance startup time.
Tue Aug 20 page 1 |
This should give a list of "current" wait events, but some events are more important than others. These are the one's to watch-out for:
set pages 999 |
Here is the output from this script. Here we see each background process and the time each has spent waiting within Oracle.
Tue Aug 20 page 1 |
From all of the above reports, we see the following areas for improvement:
1 - Event waits for parallel query dequeues. We need to check the default degree of parallelism for database objects and turn-off parallelism at the system level.
2 - Event waits for "db file sequential reads." This is most likely due to segment header contention on the indexes, but it could also be due to disk contention. We will start by increasing the number of freelists on the indexes. If the waits persist, the offending index should be striped across multiple disk spindles.
These are very general wait conditions, but they can sometimes be fixed by changing parameters or object characteristics. Some possible solutions include:
wait_detail_process.sql script.
Tue Aug 20 page 1 |
Here is the output from this script. Here we see two types of wait conditions:
Individual process wait times |
Methods for Using Event
Wait Data
Because
this is a short article, let's confine our wait discussion to the most
onerous wait event, the db file sequential read wait event.
In our
example from the system-wide reports we noted that the number one wait
in v$system_event is "db file sequential reads." This wait event is
usually for index reads. If we look at the detail listing for these
wait events, we see that the script returns the file number and block
number where Oracle was waiting for the event.
SID Name Name Wait Time state P1 text Val Text |
Here we
see that the CPA1_USER was waiting on access to file number 16, block
number 51,253. Given this information, we can drill down into
dba_extents
and dba_data_files to see the data file that is causing the waits.
Here is
a simple script that will display the file name for any file number:
accept filenbr prompt 'Enter the file#: ' |
When we run the script we can quickly see the name of the file that is experiencing the "db file sequential reads" wait:
Enter the file#: 10 |
To properly locate the source of the wait we need more than just the file name We can use the get_object_by_block_nbr.sql script to display all object on the target block:
set pages 999 |
Here we see the exact source of the wait event is an index called IDX_EVENTCASE_STATUS_OVERAGE.
Enter the file number: 116 |
So, why are sessions waiting on index access? The most common cause is a freelist shortage. Prior to Oracle bitmap freelists (automatic extent management), segment header contention can happen if multiple tasks compete to simultaneously update the same index. The script can easily check the number of freelists for this object:
clear columns |
Of course, there could be other problems such as a high disk enqueue on the disk that contains the index, but we should always try to add freelists first. If more freelists do not solve the wait issue, then the index must be placed on a faster disk, a less busy disk, or striped across multiple disks.
Also, note that because we have the SID for the waiting task, we can use the SID to join into v$session and v$sql to see the SQL that was causing the wait condition.
How Wait Event Information Can Be Used to Boost System Performance
In the examples below, we will apply the techniques described above for collecting wait event information, and we will evaluate the data in order to better understand how the system is functioning and determine what we can do to boost system performance. Hopefully these examples will give you insights into researching performance and other types of problems on your systems.
Example #1: Buffer Busy Waits
A magazine publisher had developed a web site that offered web
access
to many of the articles in the magazine. At times, the website would
get
completely bogged down and response time would become very poor.
Operating
system statistics on the database server indicated that during busy
times,
server idle time would be near zero. We used Statspack to gather
database
statistics on an hourly basis. Excerpts from one Statspack report are
as
follows:
Snap Id Snap Time Sessions
------- ------------------ --------
Begin Snap: 61 11-Dec-02 13:00:52 145
End Snap: 71 11-Dec-02 14:00:26 145
Elapsed: 59.57 (mins)
Cache Sizes
~~~~~~~~~~~
db_block_buffers: 38400 log_buffer: 1048576
db_block_size: 8192 shared_pool_size: 80000000
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 6,390.41 5,731.32
Logical reads: 33,962.40 30,459.63
Block changes: 44.72 40.11
Physical reads: 536.77 481.41
Physical writes: 163.02 146.21
User calls: 397.75 356.73
Parses: 237.64 213.13
Hard parses: 23.84 21.38
Sorts: 248.38 222.77
Logons: 0.41 0.36
Executes: 213.52 191.50
Transactions: 1.11
% Blocks changed per Read: 0.13 Recursive Call %: 86.07
Rollback per transaction %: 0.23 Rows per Sort: 30.73
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 98.39 Redo NoWait %: 99.97
Buffer Hit %: 98.42 In-memory Sort %: 100.00
Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
buffer busy waits 1,962,372 1,278,649 50.03
db file sequential read 1,336,870 1,050,878 41.12
db file scattered read 47,717 49,326 1.93
direct path write 8,070 40,574 1.59
latch free 38,220 31,012 1.21
Buffer Pool Statistics for DB:
-> Pools D: default pool, K: keep pool, R: recycle pool
Free Write Buffer
Buffer Consistent Physical Physical Buffer Complete Busy
P Gets Gets Reads Writes Waits Waits Waits
- ----------- ------------- ----------- ---------- ------- -------- ----------
D 12,292 49,863,696 470 3,530 0 0 32
K 1,509,385 20,691,972 1,509,366 11 0 0 1,962,117
R 5 13 0 0 0 0 0
We noticed that the two dominant wait events were buffer busy waits and db file sequential read. These two wait events accounted for over 23,000 seconds of wait time during the one hour sample period. We decided to drill down to learn more about these wait events. Looking at v$event_name, we see that the parameters for the buffer busy waits event are as follows:
SELECT * FROM v$event_name WHERE name = 'buffer busy waits';
EVENT# NAME PARAMETER1 PARAMETER2 PARAMETER3
---------- -------------------- --------------- --------------- ---------------
75 buffer busy waits file# block# id
For this wait event, the parameter information from v$event_name
points
you in the right direction but does not necessarily give you the full
story.
The first and second parameters tell you the file and block number of
the
data block in the database that the session was trying to access when
the
wait event occurred. The third parameter is a numeric reason code
indicating
why the buffer in the buffer cache was busy and the process had to
wait.
Metalink bulletin 34405.1 explains what the different reason codes
mean.
The following query will list all sessions waiting for a given wait
event
at the split second the query is begun:
SELECT sid, event, state, seconds_in_wait, wait_time, p1, p2, p3
FROM v$session_wait
WHERE event = 'buffer busy waits'
ORDER BY sid;
Because only a couple (if any) sessions are likely to be encountering this wait at any one moment in time, we usually need to run this query several times in quick succession. One of the times, we got the following output:
SID EVENT STATE SECONDS_IN_WAIT WAIT_TIME P1 P2 P3
---- ----------------- ----- --------------- --------- ------ ------ ------
12 buffer busy waits WAITE 1 0 30 62157 130
31 buffer busy waits WAITE 1 0 30 23558 130
We performed the following query for each session wait found, substituting P1 for the file number and P2 for the block number:
SELECT owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = &absolute_file_number
AND &block_number BETWEEN block_id AND block_id + blocks -1;
Using this query we found that session 12 above was waiting on a
busy
buffer in the SAMPLES table in the PRODMGR schema:
Enter value for absolute_file_number: 30
Enter value for block_number: 62157
OWNER SEGMENT_NAME SEGMENT_TYPE
----------------- ------------------- ------------
PRODMGR SAMPLES TABLE
In order to figure out what reason code 130 means, we look at a reference table listed in Metalink bulletin 34405.1. For convenience this table appears below. We can see that the buffer is busy because the block is being read by another session.
| Reason Code (Id) P3 | Reason |
| >=8.1.6 | |
| 0 | A block is being read |
| 100 | We want to NEW the block but the block is currently being read by another session (most likely for undo). |
| 200 | We want to NEW the block but someone else has is using the current copy so we have to wait for them to finish. |
| 230 | Trying to get a buffer in CR/CRX mode , but a modification has started on the buffer that has not yet been completed. |
| - | A modification is happening on a SCUR or XCUR buffer, but has not yet completed |
| 231 | CR/CRX scan found the CURRENT block, but a modification has started on the buffer that has not yet been completed. |
| 130 | Block is being read by another session and no other suitable block image was found, so we wait until the read is completed. This may also occur after a buffer cache assumed deadlock. The kernel can't get a buffer in a certain amount of time and assumes a deadlock. Therefor it will read the CR version of the block. |
| 110 | We want the CURRENT block either shared or exclusive but the Block is being read into cache by another session, so we have to wait until their read() is completed. |
| 120 | We want to get the block in current mode but someone else is currently reading it into the cache. Wait for them to complete the read. This occurs during buffer lookup. |
| 210 | The session wants the block in SCUR or XCUR mode. If this is a buffer exchange or the session is in discrete TX mode, the session waits for the first time and the second time escalates the block as a deadlock and so does not show up as waiting very long. In this case the statistic: "exchange deadlocks" is incremented and we yield the CPU for the "buffer deadlock" wait event. |
| 220 | During buffer lookup for a CURRENT copy of a buffer we have found the buffer but someone holds it in an incompatible mode so we have to wait. |
Next we looked for sessions waiting on the db file sequential read event-sessions waiting on index lookups or table accesses by ROWID:
SELECT sid, event, state, seconds_in_wait, wait_time, p1, p2, p3
FROM v$session_wait
WHERE event = 'db file sequential read'
ORDER BY sid;
Following the same approach as used with the buffer busy waits, we then drilled down to find which segments in the database sessions were accessing with the single block I/Os. In this case it was again the SAMPLES table.
From this we concluded that the buffer busy waits appeared related to the single block disk reads-buffer block reads were waiting for other buffer block reads to complete, and those in turn were waiting for sequential reads from disk to complete. Put another way, session A was waiting on a single block disk I/O to bring a block into the buffer cache. Meanwhile, session B wanted to read the same block. Since the block was not yet in the buffer cache and another session was already reading it from disk, session B waited on a buffer busy waits event while session A waited on the db file sequential read event.
When we looked at the SAMPLES table further, we found that it contained a LONG column and there was a fair amount of row chaining-some rows were larger than the database block size. Many queries against multiple tables were doing nested loops joins with the SAMPLES table. This was causing lots of single block reads from the SAMPLES table.
The values stored in the LONG column of the SAMPLES table were quite big. One of the many weaknesses of the LONG data type is that data is stored inline in the table segment regardless of size. This led to a very large SAMPLES table segment even though the table did not have a large number of rows. While most queries against the SAMPLES table were not fetching the LONG column, Oracle still had to bring the LONG data into the buffer cache because it was stored in the same blocks as the other columns. This led to a lot more physical I/O and churn in the buffer cache than should have been necessary.
One way to reduce buffer cache churn and physical I/O would have been to replace the LONG column with a CLOB column. CLOB data can be stored in a separate segment from the rest of the table data. Thus the SAMPLES table segment would be a lot smaller and more likely to fit entirely in the buffer cache. This would reduce the activity rate in the buffer cache and the amount of physical I/O.
Unfortunately, changing the database schema and the application code to replace the LONG data type with CLOB was not possible in a quick enough timeframe to address the immediate performance problem. So we had to develop a quick fix that could be deployed temporarily until the data type could be changed. First we enlarged the KEEP buffer pool, reducing physical reads as all of the SAMPLES table could stay in the buffer cache. Next, we created an index on all of the columns in the SAMPLES table except the LONG column. This allowed Oracle to satisfy most queries against the SAMPLES table without having to read the excessively large table segment, which greatly reduced the amount of data accessed during each query.
These two easy and quick changes virtually eliminated the buffer busy waits and reduced db file sequential read wait events as total physical reads dropped 90%. Response time for the most common queries involving the SAMPLES table dramatically improved, and overall web site response times returned to an acceptable level. Excerpts from a Statspack report after the two changes were made are as follows:
Snap Id Snap Time Sessions
------- ------------------ --------
Begin Snap: 1192 20-Dec-02 13:00:49 102
End Snap: 1202 20-Dec-02 14:00:18 102
Elapsed: 59.48 (mins)
Cache Sizes
~~~~~~~~~~~
db_block_buffers: 38400 log_buffer: 1048576
db_block_size: 8192 shared_pool_size: 80000000
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
-------- ---------------
Redo size: 2,496.24 3,453.13
Logical reads: 30,270.86 41,874.69
Block changes: 17.08 23.62
Physical reads: 36.90 51.04
Physical writes: 79.62 110.14
User calls: 313.37 433.50
Parses: 173.91 240.57
Hard parses: 17.04 23.57
Sorts: 169.33 234.25
Logons: 0.36 0.50
Executes: 149.11 206.27
Transactions: 0.72
% Blocks changed per Read: 0.06 Recursive Call %: 84.55
Rollback per transaction %: 0.31 Rows per Sort: 62.86
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 99.98
Buffer Hit %: 99.88 In-memory Sort %: 99.89
Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
direct path write 6,467 13,545 30.61
log file sync 4,914 7,493 16.93
library cache pin 1,175 6,090 13.76
direct path read 5,488 3,428 7.75
latch free 14,528 2,931 6.62
Buffer Pool Statistics for DB
-> Pools D: default pool, K: keep pool, R: recycle pool
Free Write Buffer
Buffer Consistent Physical Physical Buffer Complete Busy
P Gets Gets Reads Writes Waits Waits Waits
- ----------- ------------- ----------- ---------- ------- -------- ----------
D 10,178 39,561,072 330 1,122 0 0 10
K 1 13,075,944 1 4 0 0 0
R 0 0 0 0 0 0 0
As you can see, with a similar amount of querying activity (roughly 300-400 user calls and 30,000-33,000 logical reads per second) we have reduced total wait event time to about 442 seconds during the one hour sample period. (Direct path writes totaled 13545 centiseconds and represented 30.61% of total waits; 135.45/0.3061 = 442 seconds of total wait time.) This is quite a difference from the 23,000 seconds of wait time during the sample period we were seeing before the KEEP pool was enlarged and the extra index was added to the SAMPLES table.
Over time, application changes were made so that even fewer queries
against the SAMPLES table required retrieving the LONG column.
Eventually, the LONG column was replaced with a CLOB.
Example #2: More Buffer Busy
Waits,
Plus Latch Contention for Good Measure
The magazine publisher's application demonstrated buffer busy waits
where multiple Oracle processes want to read the same data block from
disk at
the same time. The first Oracle process would issue the I/O request and
wait on a db file sequential read event, while subsequent Oracle
processes
would wait on a buffer busy waits event until the data block had made
its
way into the buffer cache. Now we will look at an example where
multiple
Oracle processes try to write to different copies of the same data
block
in the buffer cache at the same time. Since Oracle has row-level
locking,
multiple users can update different rows in the same table at the same
time
without locking each other out. Multiple users can even update
different
rows in the same data block concurrently. But, as we will see in this
example,
contention and buffer busy waits can sometimes occur.
A genetic research company used an Oracle database to store research data. Periodically, they would load large volumes of raw data into the database. Then they would run a processing application on multiple workstations. Each workstation would fetch raw data from the database, process it, and put the processed data back in the database. The company had planned to add more workstations to speed up the processing, but found that adding more workstations actually seemed to reduce the throughput. The Oracle database was being used merely as a repository, we were told. All Oracle had to do was fetch the raw data and then store the processed data. The actual processing was being performed on the workstations outside of the realm of Oracle. So it would seem that the Oracle database should not be the bottleneck.
We found about 100 active database sessions, each initiated by a separate workstation. We modified the application code running on workstation 30 to activate debug event 10046 for wait event tracing. We did this by inserting the following statement into the code just after connecting to the database:
ALTER SESSION SET events '10046 trace name context forever, level 8';
We also modified the application to exit after iterating through its main loop 500 times. We ran the modified application on the workstation and retrieved the trace file that Oracle had created in the directory specified by the user_dump_dest parameter. The application ran for less than one minute before exiting. Since this was an Oracle 9i database, we were able to run TKPROF to generate a report including wait event statistics with the following command:
tkprof prodgen_ora_16466.trc report_16466.prf waits=yes
An interesting excerpt from the TKPROF report was as follows:
UPDATE processing_stations
SET status = 'ACTIVE',
status_date = SYSDATE,
data_set_id_being_processed = :b1
WHERE station_id = 30
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 500 0.23 10.14 0 3616 1010 500
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 501 0.24 10.14 0 3616 1010 500
Optimizer goal: CHOOSE
Parsing user id: 40
Rows Row Source Operation
------- ---------------------------------------------------
500 UPDATE
500 INDEX UNIQUE SCAN PROCESSING_STATIONS_PK (object id 9372)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
buffer busy waits 26 0.71 7.87
latch free 17 0.57 2.08
log file switch completion 3 0.09 0.20
The UPDATE statement here seemed quite trivial-changing three column values on one row, given a primary key value. Oracle only needed to do about nine logical reads per execution, and no physical reads were required at all. Yet it still took Oracle over 10 seconds to execute the statement 500 times. If we had not specified the "waits=yes" TKPROF option, we might conclude that the database server was CPU-starved. But a quick look at the wait event portion of the TKPROF report shows that most of the 10 seconds of elapsed execution time was spent waiting on busy buffers and latches.
Some of the buffer busy waits listed in the trace file were as follows:
WAIT #2: nam='buffer busy waits' ela= 527727 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 498765 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 137611 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 124165 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 5237 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 264050 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 270177 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 330912 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 156317 p1=18 p2=10 p3=220
WAIT #2: nam='buffer busy waits' ela= 710696 p1=18 p2=10 p3=220
Since this trace file was generated by an Oracle 9i database, the elapsed times are shown in units of microseconds. The wait shown at the bottom of the list above had an elapsed time of 710,696 microseconds, or approximately 0.71 seconds. This was the longest buffer busy wait for this statement, as indicated by the "Max. Wait" figure in the TKPROF report.
From the previous example, recall that the parameters for the buffer busy waits event indicate file number, block number, and reason code. All of the buffer busy waits listed above involve block 10 in data file 18, with a reason code of 220. By querying the data dictionary, we found that block 10 in data file 18 was allocated to the PROCESSING_STATIONS table, the table being updated by the statement that encountered the buffer busy waits. According to Metalink bulletin 34405.1, reason code 220 indicates that the buffer is busy because another session is using a copy of the buffer in an incompatible mode.
From the TKPROF report we can see that the application updated the PROCESSING_STATIONS table 500 times during the period traced. Each update affected one row. While performing the updates, there were 26 times that the process had to wait on a buffer busy waits event. From looking at the trace file we know that all of the buffer busy waits occurred because data block 10 in data file 18 was already in the buffer cache, but in an incompatible mode. These 26 waits totaled 7.87 seconds.
A query against the data dictionary revealed the following statistics for the PROCESSING_STATIONS table:
SELECT SYSDATE - last_analyzed, blocks, avg_row_len, avg_space,
chain_cnt, num_rows
FROM user_tables
WHERE table_name = 'PROCESSING_STATIONS';
SYSDATE-LAST_ANALYZED BLOCKS AVG_ROW_LEN AVG_SPACE CHAIN_CNT NUM_ROWS
--------------------- ---------- ----------- ---------- ---------- ----------
2.132118056 1 62 1686 0 100
We can see that optimizer statistics were gathered recently and the table contains 100 very short rows, all stored in one data block. If the application session that we traced updated the PROCESSING_STATIONS table 500 times in less than a minute, we might suspect that the database sessions from the other 99 workstations are doing roughly the same thing. This adds up to over 50,000 updates to the same data block per minute. Since each session is updating a different row in the table, there is no waiting on row-level locks (which would show up in the wait event interface as enqueue waits). However, only one process at a time may write to a buffer in the buffer cache. Therefore, a lot of time is being wasted contending for this one very hot buffer.
All of this begs the question, "Why is the application updating a row in the PROCESSING_STATIONS table 500 times in less than one minute?" It turns out that the application uses this table to coordinate efforts between the workstations. There is one row in the table for each workstation. The application running on each workstation updates its row in PROCESSING_STATIONS to indicate which data set it is processing so that other workstations will not attempt to process the same data set. Also, each workstation periodically updates the status date so that if a workstation crashes, it will be easy to tell when it crashed and what data it was processing during the crash.
Processing of one data set could take anywhere from a fraction of a second to twenty minutes. The application code was written to not only update the PROCESSING_STATIONS table when a new data set was processed, but also to frequently update the timestamp during the processing of one data set. All stakeholders agreed that 500 updates per minute might be excessive. If the application code could be modified to update the table only once per second, then writes to the table would reduce by roughly 90% and contention for the hot buffer in the buffer cache would likely go away.
However, application changes needed to be tested thoroughly before deployment. A short-term fix was desired until the long-term fix of the application could be implemented. Luckily, a temporary fix was available that was simple to implement.
We rebuilt the PROCESSING_STATIONS table with the PCTFREE storage parameter set to 99 and PCTUSED set to 1. This caused Oracle to leave 99% of the available space in each data block empty for future row expansion. This change caused the 100 rows in the table to be spread over 100 data blocks instead of all rows being neatly packed into one data block. A query against the data dictionary confirmed this:
SELECT SYSDATE - last_analyzed, blocks, avg_row_len, avg_space,
chain_cnt, num_rows
FROM user_tables
WHERE table_name = 'PROCESSING_STATIONS';
SYSDATE-LAST_ANALYZED BLOCKS AVG_ROW_LEN AVG_SPACE CHAIN_CNT NUM_ROWS
--------------------- ---------- ----------- ---------- ---------- ----------
.130868056 100 62 8014 0 100
Tracing another 500 iterations of the main loop in the application on workstation 30 after rebuilding the PROCESSING_STATIONS table yielded the following TKPROF output:
UPDATE processing_stations
SET status = 'ACTIVE',
status_date = SYSDATE,
data_set_id_being_processed = :b1
WHERE station_id = 30
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 500 0.20 2.22 0 500 1009 500
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 501 0.20 2.22 0 500 1009 500
Optimizer goal: CHOOSE
Parsing user id: 40
Rows Row Source Operation
------- ---------------------------------------------------
500 UPDATE
500 INDEX UNIQUE SCAN PROCESSING_STATIONS_PK (object id 9379)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
latch free 2 0.35 0.61
The application is still updating the PROCESSING_STATIONS table very rapidly. But now each row in the table is stored in a separate data block, so each workstation is working with its own hot buffer in the buffer cache. The sessions are no longer competing with each other for the opportunity to write to the same buffer. Waits on the buffer busy waits event have been eliminated. As a side effect, logical reads have been reduced to one third of what they were originally.
The original TKPROF reports also indicated waits on the latch free wait event, indicating latch contention. A query against v$event_name shows that the parameters to the latch free wait event are as follows:
SELECT * FROM v$event_name WHERE name = 'latch free';
EVENT# NAME PARAMETER1 PARAMETER2 PARAMETER3
---------- -------------------- --------------- --------------- ---------------
3 latch free address number tries
Some of the latch free wait events in the trace file generated before the PROCESSING_STATIONS table was rebuilt were as follows:
WAIT #2: nam='latch free' ela= 47004 p1=15113593728 p2=97 p3=0
WAIT #2: nam='latch free' ela= 14629 p1=15113593728 p2=97 p3=1
WAIT #2: nam='latch free' ela= 20652 p1=15113593728 p2=97 p3=2
WAIT #2: nam='latch free' ela= 37737 p1=15113593728 p2=97 p3=3
A query against v$latch or v$latchname will give us the name of latch number 97:
SELECT name
FROM v$latch
WHERE latch# = 97;
NAME
-------------------------
cache buffers chains
So we can see that the four wait events listed from the trace file above amount to a wait of approximately 0.12 seconds for the cache buffer chains latch. (Remember that in Oracle 9i trace files, elapsed times are listed in microseconds while in trace files from Oracle 8i and earlier the elapsed times are in centiseconds.) This is not unexpected, since we had 100 sessions contending for the right to modify one buffer in the buffer cache.
Comparing the TKPROF report before rebuilding the
PROCESSING_STATIONS
table with after, we see that rebuilding the table reduced the latch
contention
but did not eliminate it. It is likely that the long-term solution of
modifying
the application to reduce the update frequency will eliminate or
further
reduce the latch contention.
Example #3: Log File Waits
An application performed occasional loads into a small data
warehouse. Unfortunately, the loads were taking a much longer time on
the production
system than they had in a test environment. To find out what was
happening
on the production database during a data load, we ran the following
script
that displays the wait event activity for the entire instance over a 30
second interval. The list of events in the NOT IN clause simply filters
out the idle events, making the output a little easier to read:
DROP TABLE previous_events;
CREATE TABLE previous_events
AS
SELECT SYSDATE timestamp, v$system_event.*
FROM v$system_event;
EXECUTE dbms_lock.sleep (30);
SELECT A.event,
A.total_waits - NVL (B.total_waits, 0) total_waits,
A.time_waited - NVL (B.time_waited, 0) time_waited
FROM v$system_event A, previous_events B
WHERE B.event (+) = A.event
AND A.event NOT IN
(
'client message', 'dispatcher timer', 'gcs for action',
'gcs remote message', 'ges remote message', 'i/o slave wait',
'jobq slave wait', 'lock manager wait for remote message',
'null event', 'Null event', 'parallel query dequeue', 'pipe get',
'PL/SQL lock timer', 'pmon timer', 'PX Deq Credit: need buffer',
'PX Deq Credit: send blkd', 'PX Deq: Execute Reply',
'PX Deq: Execution Msg', 'PX Deq: Signal ACK',
'PX Deq: Table Q Normal', 'PX Deque Wait', 'PX Idle Wait',
'queue messages', 'rdbms ipc message', 'slave wait',
'smon timer', 'SQL*Net message to client',
'SQL*Net message from client', 'SQL*Net more data from client',
'virtual circuit status', 'wakeup time manager'
)
ORDER BY time_waited;
Running this script during a data load yielded the following output:
EVENT TOTAL_WAITS TIME_WAITED
----------------------------------- ----------- -----------
LGWR wait for redo copy 0 0
SQL*Net break/reset to client 2 0
SQL*Net more data to client 0 0
buffer busy waits 0 0
db file scattered read 6 0
direct path read 27 0
direct path write 34 0
direct path write (lob) 0 0
file identify 6 0
file open 93 0
library cache pin 0 0
local write wait 0 0
process startup 0 0
rdbms ipc reply 0 0
refresh controlfile command 0 0
single-task message 0 0
switch logfile command 0 0
control file sequential read 61 1
latch free 2 1
db file sequential read 6 7
control file parallel write 41 31
log file single write 6 164
db file parallel write 13 220
enqueue 6 486
log buffer space 24 2007
log file sequential read 30 2655
log file switch completion 33 2883
log file parallel write 19 3561
log file sync 113 10249
During the 30 second sample interval, Oracle processes spent roughly
215 seconds waiting on events related to the online redo log. The
likely culprit
in this situation is some combination of slow disks and contention on
the
disks holding the online redo log files. After a little research, we
found
that the redo logs had been striped on a RAID array on the faster test
system.
On the production system, however, the redo logs were all on a single
disk
that was shared with another very active database. When the online redo
logs were moved to separate disks, the data loads ran more rapidly and
the
wait events pertaining to the redo log decreased significantly.
Example #4: Direct Path Read
and
Write Waits
A company experienced delays when querying summarized information
about
their orders. Querying individual orders was fast, but queries
involving
totals or rollups seemed to take unacceptably long. We ran our Database
Rx monitoring tool at regular intervals to collect performance
information
from the system while the users were performing their typical tasks.
(Database
Rx is a proactive monitoring and alert notification tool that we
developed
for use with our remote database administration clients. For a free
demo,
see our website at dbrx.dbspecialists.com/guest.)
Looking at the Top 5 Wait Events section, we see that the leading wait events are direct path write, db file scattered read, and direct path read. These three wait events combined accounted for roughly 99% of the non-idle wait time on the system during the sample interval. From our knowledge of the queries in the application, the db file scattered read (i.e., full table scan) events didn't surprise us. The absence of db file sequential read waits also did not surprise us as the relevant blocks for index lookups and row fetches by ROWID were likely present in the buffer cache.
We figured we might be able to reduce the db file scattered read waits with a larger buffer cache or judicious use of the KEEP pool in the short term and application code review in the longer term. What seemed amiss, however, were the large numbers of direct path read and write wait events. Querying v$session_wait while applications were running yielded the following:
SELECT sid, event, state, seconds_in_wait, wait_time, p1, p2, p3
FROM v$session_wait
WHERE event = 'direct path write'
ORDER BY sid;
SID EVENT STATE SECONDS_IN_WAIT WAIT_TIME P1 P2 P3
---- ----------------- ----- --------------- --------- ------ ------ ------
39 direct path write WAITI 0 0 201 65 7
47 direct path write WAITI 0 0 201 2248 7
Querying v$event_name shows the parameters for the direct path write event:
SELECT * FROM v$event_name WHERE name = 'direct path write';
EVENT# NAME PARAMETER1 PARAMETER2 PARAMETER3
---------- -------------------- --------------- --------------- ---------------
115 direct path write file number first dba block cnt
We see that the first parameter is supposed to be the file number of the data file containing the blocks being written, but we couldn't find data file 201 using the following query:
SELECT tablespace_name, file_id "AFN", relative_fno "RFN"
FROM dba_data_files
WHERE file_id = 201;
no rows selected
This is because the database is using temp files in the temporary tablespace, so we used the following query:
SELECT tablespace_name, file_id + value "AFN", relative_fno "RFN"
FROM dba_temp_files, v$parameter
WHERE name = 'db_files';
TABLESPACE_NAME AFN RFN
------------------------------ ---------- ----------
TEMP 201 1
This indicated that the TEMP tablespace was experiencing direct path write waits, which is not surprising as direct path reads and writes usually occur in connection with temporary segments. Direct path I/O operations were responsible for most of the waits on the system and appeared to be a bottleneck. Upon further examination, we discovered that the sort_area_size parameter was set to the default value of 65536. Since the database server had a fair amount of RAM free and there generally weren't too many sessions connected simultaneously, we increased the sort_area_size to 10 Mb, and the direct path waits disappeared as sorts no longer had to be written to disk.
If increasing the sort_area_size had not been an option, we might have looked at the disk storage where the TEMP tablespace temp files were located to see if there was I/O contention with other systems. We might also have looked at application code to see if the amount of sorting and hashing could be reduced.
This example also demonstrates something that might seem odd at
first
glance. During the Database Rx sample period, there were more physical
reads
than logical reads. How can this be-don't all physical reads also
require
logical reads? Well, this is where direct path reads get their name. As
Oracle says in reference note 50415.1 on Metalink, "Direct path reads
are
generally used by Oracle when reading directly into PGA memory (as
opposed
to into the buffer cache)." Since direct path reads bypass the buffer
cache,
they aren't counted as logical reads. This is the reason that some
people
question the traditional definition of the buffer cache hit ratio,
because
it doesn't take account of direct path reads. We won't get into that
debate,
but be aware that direct path reads could distort your cache hit ratio.
In this example, the cache hit ratio would be negative!
Example #5: Database Link Wait
Events
Several of the SQL*Net wait events are considered idle events. For
example, the "SQL*Net message from client" event indicates that the
process is waiting for the application to submit a SQL statement for
processing. However, the SQL*Net wait events pertaining to database
links are not idle events-they
come into play when queries use database links to access remote
databases.
The "SQL*Net message from dblink" event, for example, indicates that
the
local process has submitted a request to a remote database and is
waiting
for a response.
A company had deployed a data import application for receiving batches of customer transactions from its partners. There were five Oracle databases, one per geographic region. Customer transactions were sorted by geographic location, and loaded into the database serving that region. Due to human error, sometimes the same customer transaction would get imported into multiple Oracle databases. They had developed a reconciliation report to address this. The report was simply a query that compared transactions between two databases and listed records found on both. Unfortunately, the report took over 30 minutes to run and the runtime was expected to get much worse as the customer base grew.
Looking at the report code, we found the main query, which appeared to be the one using the lion's share of resources. We logged onto SQL*Plus as an application user in one window and logged onto SQL*Plus again as a DBA in another window. From the DBA session we queried v$session and found the SID corresponding to our application user SQL*Plus session. Then we launched the query from the application user session and monitored v$session_event from the other. Since the wait event interface does not account for CPU usage, we also monitored v$sesstat.
After the report query completed 31 minutes later, we queried v$session_event one last time and the results were as follows:
SELECT event, total_waits, time_waited, max_wait
FROM v$session_event
WHERE sid = 47
ORDER BY event;
EVENT TOTAL_WAITS TIME_WAITED MAX_WAIT
------------------------------ ----------- ----------- ----------
SQL*Net message from client 32 4435 2432
SQL*Net message from dblink 1525516 104919 31
SQL*Net message to client 33 0 0
SQL*Net message to dblink 1525516 466 9
db file sequential read 27199 8025 28
latch free 40 5 4
log file sync 1 2 2
SELECT A.name, B.value
FROM v$statname A, v$sesstat B
WHERE A.statistic# = 12
AND B.statistic# = A.statistic#
AND B.sid = 47;
NAME VALUE
------------------------------ ----------
CPU used by this session 67937
We can ignore the SQL*Net message from client wait event because
this
just indicates time Oracle spent waiting for us to submit the query in
SQL*Plus. However, there were over 1.5 million network roundtrips to a
database link totaling over 1053 seconds of wait time. (The 1053
seconds includes both
network latency and waiting on the remote database server to respond to
requests.) There were over 27,000 waits on single block disk I/Os
totaling
about 80 seconds of wait. There were also latch free and log file sync
wait
events, but the wait times were insignificant. The query from v$sesstat
shows that about 679 seconds of CPU time were used.
Knowing nothing about the report query or database size, the number
of network roundtrips seemed very high and the number of single block
disk
I/Os seemed like it might be high. Before we could go any further, we
really
needed to look at the query and the tables involved. The report query
was
as follows:
SELECT customer_id, batch_serial_number, batch_date, load_date,
batch_comment, control_total, control_amount
FROM customer_transfer_batches A
WHERE exists
(
SELECT 1
FROM customer_transfer_batches@prdwest B
WHERE B.customer_id = A.customer_id
AND B.batch_serial_number = A.batch_serial_number
)
ORDER BY customer_id, batch_serial_number;
Estimated statistics on the CUSTOMER_TRANSFER_BATCHES table showed that it had approximately 1,526,000 rows spanning approximately 21,825 blocks. Correlating the wait event information with the table size, it was clear that Oracle was reading the local table one block at a time and querying the remote table once per local row. The execution plan for the query confirmed this:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT
1 0 FILTER
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'CUSTOMER_TRANSFER_BATCHES'
3 2 INDEX (FULL SCAN) OF 'CUSTOMER_TRANSFER_BATCHES_PK' (UNIQUE)
4 1 REMOTE* PRDWEST
4 SERIAL_FROM_REMOTE SELECT "CUSTOMER_ID","BATCH_SERIAL_NUMBER"
FROM "CUSTOMER_TRANSFER_BATCHES" "B"
WHERE "BATCH_SERIAL_NUMBER"=:1
AND "CUSTOMER_ID"=:2
Performing 1.5 million network roundtrips and making the remote database execute 1.5 million queries did not seem to be the most efficient way to go. It would probably be more efficient for the local database to pull all of the required data from the remote database in one shot and then join it to the local table. We probably could have added an optimizer hint to the query to get Oracle to rewrite the subquery as a join, but instead we chose to rewrite the query ourselves in a way that would be easier for Oracle's query optimizer to do the right thing. The rewritten query was as follows:
SELECT customer_id, batch_serial_number, batch_date, load_date,
batch_comment, control_total, control_amount
FROM customer_transfer_batches
WHERE (customer_id, batch_serial_number) IN
(
SELECT customer_id, batch_serial_number
FROM customer_transfer_batches
INTERSECT
SELECT customer_id, batch_serial_number
FROM customer_transfer_batches@prdwest
)
ORDER BY customer_id, batch_serial_number;
We ran the revised query and monitored v$session_event and v$sesstat as before, with the following results:
SELECT event, total_waits, time_waited, max_wait
FROM v$session_event
WHERE sid = 49
ORDER BY event;
EVENT TOTAL_WAITS TIME_WAITED MAX_WAIT
------------------------------ ----------- ----------- ----------
SQL*Net message from client 46 3680 2481
SQL*Net message from dblink 24 31 18
SQL*Net message to client 47 0 0
SQL*Net message to dblink 24 0 0
SQL*Net more data from dblink 5978 1337 13
db file scattered read 3430 675 8
db file sequential read 182 60 2
direct path read 148 233 11
direct path write 920 3572 33
log file sync 1 2 2
SELECT A.name, B.value
FROM v$statname A, v$sesstat B
WHERE A.statistic# = 12
AND B.statistic# = A.statistic#
AND B.sid = 49;
NAME VALUE
------------------------------ ----------
CPU used by this session 3227
This time the query completed in 92 seconds. Both elapsed time and CPU usage were reduced by about 95%. The waits on the database link were reduced from about 1053 seconds to under 14 seconds, indicating less network overhead and a lot less demand being placed on the remote database server. Waits on db file sequential read and db file scattered read combined were significantly shorter than the waits on db file sequential read in the original query because large numbers of single block I/Os were replaced with fewer multi-block I/Os.
However, there were about 38 seconds of direct path I/O waits. As we've seen, direct path I/O usually occurs in connection with temporary segments. Oracle had to do quite a bit of sorting to implement the INTERSECT operation added to the rewritten query, so the direct path I/O waits were not unexpected.
The performance gains achieved by the query rewrite were significant, and we were tempted to end the tuning exercise right here. However, the company indicated that this report was critical to their operation and they wanted to make it run even faster if possible. So, our next step was to speed up the sort operation and thereby reduce the direct path I/O waits.
Looking at v$parameter, we saw that the sort_area_size was set to 1 Mb. A larger sort area would allow Oracle to perform more of the sort in memory and less on disk, thus reducing the direct path I/O. To see the effect of a larger sort area, we repeated the timing exercise, first setting sort_area_size to 100 Mb dynamically in the SQL*Plus session where the report query was run. The query completed in 55 seconds this time, and statistics collected from v$session_event and v$sesstat were as follows:
SELECT event, total_waits, time_waited, max_wait
FROM v$session_event
WHERE sid = 46
ORDER BY event;
EVENT TOTAL_WAITS TIME_WAITED MAX_WAIT
------------------------------ ----------- ----------- ----------
SQL*Net message from client 47 442 287
SQL*Net message from dblink 25 25 14
SQL*Net message to client 48 0 0
SQL*Net message to dblink 25 0 0
SQL*Net more data from dblink 6050 1378 26
db file scattered read 3430 945 8
db file sequential read 191 59 1
log file sync 1 3 3
SELECT A.name, B.value
FROM v$statname A, v$sesstat B
WHERE A.statistic# = 12
AND B.statistic# = A.statistic#
AND B.sid = 46;
NAME VALUE
------------------------------ ----------
CPU used by this session 3296
You can see that increasing the sort area size did not reduce the CPU time, database link wait time, or db file scattered/sequential read times. However, the direct path I/O waits disappeared completely. This indicates that all sorting was performed in memory. Of course, it may not be feasible to keep the sort entirely in memory as the CUSTOMER_TRANSFER_BATCHES table grows. However, a larger sort area will still speed up the sort and overall query speed even if the entire sort cannot be performed in memory.
This example demonstrated the practice of iterative tuning: Curing
one bottleneck often creates another or reveals another performance
issue that
was masked by the larger issue. Boosting performance of complex systems
is usually an iterative process.
Example #6: Slow Batch
Processing
The operations group noticed that its overnight batch processing was
taking significantly longer after an additional data feed program was
added to the nightly run schedule. The decision was made to add more
processors to the database server to speed things up. Unfortunately,
this did not do the trick; the programs ran at about the same overall
speed as before.
The v$system_event view can come in handy in situations like this in order to take a quick glance at the overall system. By making a copy of the view's contents at two separate points in time, you can get a high level summary of what processes were waiting for during the time interval. Here is an extremely basic SQL*Plus script that displays the wait event activity for the instance over a 30 second time interval:
CREATE TABLE previous_eventsOf course, you could use Statspack, utlbstat, or any number of other tools out there to capture this information as well. Running the above script while the batch programs were running generated the following output:
AS
SELECT SYSDATE timestamp, v$system_event.*
FROM v$system_event;
EXECUTE dbms_lock.sleep (30);
SELECT A.event,
A.total_waits - NVL (B.total_waits, 0) total_waits,
A.time_waited - NVL (B.time_waited, 0) time_waited
FROM v$system_event A, previous_events B
WHERE B.event (+) = A.event
ORDER BY A.event;
EVENT TOTAL_WAITS TIME_WAITEDThe idle events have been removed for clarity. What stands out here is the enqueue wait event. During the 30 second time interval, processes collectively spent over 56 seconds waiting for enqueues, or locks that you can see in v$lock. The most common types of locks to show up in an enqueue wait are the table-level lock (TM) and the transaction enqueue or row-level lock (TX).
------------------------------ ----------- -----------
LGWR wait for redo copy 115 41
buffer busy waits 53 26
control file parallel write 45 44
control file sequential read 25 0
db file scattered read 932 107
db file sequential read 76089 6726
direct path read 211 19
direct path write 212 15
enqueue 37 5646
file identify 2 0
file open 37 0
free buffer waits 11 711
latch free 52 44
log buffer space 2 8
log file parallel write 4388 1047
log file sequential read 153 91
log file single write 2 6
log file switch completion 2 24
write complete waits 6 517
The parameters for the enqueue wait event provide information about the type and location of the lock waited upon, but the information is fairly obscure. The first two bytes of P1RAW are the ASCII codes of the lock type, and the last two bytes are the requested lock mode. P2 and P3 correspond to the ID1 and ID2 columns in the v$lock view.
In order to get a sense of where the locking contention was coming from, I used the oradebug facility in SQL*Plus to activate system event 10046 for one of the batch programs as it was running. Since tracing incurs a lot of overhead, this significantly slowed down batch processing. But this was only done once, and with the greater good of eliminating the contention problems in mind.
To activate event 10046, I looked at v$session and v$process to find the Oracle PID of the dedicated server process running one of the batch programs. Then I ran the following commands in SQL*Plus:
SQL> oradebug setorapid 13This caused the dedicated server process to begin writing a trace file including wait event information. An excerpt from the trace file is as follows:
Unix process pid: 19751, image: oracle@dbserver.acme.com (TNS V1-V3)
SQL> oradebug session_event 10046 trace name context forever, level 8
Statement processed.
SQL>
EXEC #5:c=0,e=0,p=3,cr=2,cu=1,mis=0,r=1,dep=1,og=4,tim=2313020980We can see the enqueue wait events pretty clearly. Using the magical decoder ring on the P1 values, we can see that Oracle was waiting on TX locks requested in mode 6 (exclusive mode). (1415053318 is 54580006 in hexadecimal. The 54 corresponds to a "T", the 58 to an "X", and 0006 is exclusive mode. Look up the v$lock view in the Server Reference Guide for a list of lock types and modes.)
XCTEND rlbk=0, rd_only=0
WAIT #1: nam='write complete waits' ela= 11 p1=3 p2=2 p3=0
WAIT #4: nam='db file sequential read' ela= 4 p1=10 p2=12815 p3=1
WAIT #4: nam='db file sequential read' ela= 1 p1=10 p2=12865 p3=1
WAIT #4: nam='db file sequential read' ela= 5 p1=3 p2=858 p3=1
=====================
PARSING IN CURSOR #4 len=65 dep=1 uid=502 oct=6 lid=502
tim=2313021001 hv=417623354 ad='55855844'
UPDATE CUSTOMER_CALLS SET ATTR_3 = :b1 WHERE CUSTOMER_CALL_ID=:b2
END OF STMT
EXEC #4:c=1,e=10,p=3,cr=2,cu=3,mis=0,r=1,dep=1,og=4,tim=2313021001
WAIT #5: nam='db file sequential read' ela= 0 p1=10 p2=5789 p3=1
WAIT #5: nam='enqueue' ela= 307 p1=1415053318 p2=196705 p3=6744
WAIT #5: nam='enqueue' ela= 307 p1=1415053318 p2=196705 p3=6744
WAIT #5: nam='enqueue' ela= 53 p1=1415053318 p2=196705 p3=6744
WAIT #5: nam='db file sequential read' ela= 0 p1=10 p2=586 p3=1
WAIT #5: nam='db file sequential read' ela= 1 p1=3 p2=858 p3=1
EXEC #5:c=0,e=668,p=3,cr=5,cu=3,mis=0,r=1,dep=1,og=4,tim=2313021669
What is not clear from the excerpt above is what table and row the process was trying to lock when it had to wait. If we were looking at the database while the lock contention was happening, we could look at the v$locked_object view, or the ROW_WAIT_OBJ# and ROW_WAIT_ROW# columns of v$session. But since all I had to work with was the contents of the trace file, I could see that the waits occurred while processing cursor number five. Searching backwards through the trace file I found the exact SQL statement that corresponded to cursor number five at the time of the lock contention.
This turned out to be enough information to diagnose the contention
problem and develop a workaround. By using autonomous transactions, we
were able to modify the batch programs that were contending with each
other so that they held locks on critical path rows for extremely short
periods of time. Adjusting the schedules of the programs in the nightly
run schedule also helped.
Example #7: A Slow
Client/Server Application
A client/server application was taking several seconds to bring up a
certain screen. The users complained constantly, and the developers
were confounded because the delay was occurring before the user had any
chance to kick off a query. The only thing happening in the form on
startup was some fetching of basic reference data. All of the SQL had
been tuned and was known to work very quickly. Being frequently
accessed reference data, the data blocks were likely to be in the
buffer cache--eliminating slow I/O as a possible bottleneck.
Looking at v$session_event for a user who had just logged on to the application but not yet begun any activities showed almost 20,000 waits on the SQL*Net message from client event, totaling over 300 seconds of waiting. This is an idle event, which we would typically ignore, but something seemed wrong and warranted further investigation.
A process waits on the SQL*Net message from client event when it has completed a request from a user and now needs to wait for the next request. You typically see high wait times on this event in front-end applications that spend a lot of time waiting for a user to submit a query or initiate some other action that requires database activity. So the 300 seconds of wait time here did not seem unusual, but the high number of waits did not seem right.
The application code was modified in a test environment so that it would disable timed statistics for the session after all initialization was complete. This caused the time information in v$session_event to reflect wait times during the form startup phase only; the time spent waiting for a user to initiate an action would be excluded because timed statistics would be disabled at that point.
Running the modified application yielded the following information in v$session_event:
EVENT TOTAL_WAITS TIME_WAITEDA fetch of a few pertinent session statistics from v$sesstat showed the following:
------------------------------ ----------- -----------
SQL*Net message to client 18520 6
SQL*Net message from client 18519 1064
NAME VALUEThe data suggested that when the application started up, it was spending over 10 seconds performing more than 18,000 network roundtrips. The database server was using a little under one second of CPU time to perform 9,295 logical reads of data blocks that were already in the buffer cache.
------------------------------ ----------
session logical reads 9295
CPU used by this session 82
physical reads 0
A closer inspection of the reference data being retrieved by the application revealed 9,245 rows of reference data. The application code was fetching the data from cursors one row at a time. This resulted in two network roundtrips and one logical read for each row fetched.
The application was modified to use Oracle's array processing interface in order to fetch 100 rows at a time. This sped up the application dramatically by reducing network roundtrips. An additional benefit was that CPU usage on the database server was reduced because only about one logical read per data block was required, instead of one logical read per row retrieved.
When fetching 100 rows at a time, the waits and session statistics looked like this:
EVENT TOTAL_WAITS TIME_WAITEDBy fetching rows up to 100 at a time, we were able to dramatically improve response time by reducing network roundtrips--and reduce CPU usage on the database server at the same time.
------------------------------ ----------- -----------
SQL*Net message to client 200 0
SQL*Net message from client 199 28
NAME VALUE
------------------------------ ----------
session logical reads 135
CPU used by this session 3
physical reads 0
Conclusion
Because wait events touch all areas of Oracle server operation, the
wait event interface can provide insights into just about any kind of
performance problem you might experience. Other metrics, such as cache
hit ratios, can
give you valuable information, but it is the wait event interface that
ties everything together. In the examples discussed here, wait event
data directed
us to a swift diagnosis of each problem.
The wait event interface, with its four dynamic performance views and tracing facility, enables you to see how an Oracle process is spending its time when it is not using the CPU. While wait event information may not give you everything you might need to cure every performance problem, it does give you access to a wealth of information that can guide you on the path to better system performance.
The basic concepts behind the wait event interface have not changed
since Oracle 7. But each release of Oracle brings a few more
enhancements that
make the wait event interface easier to use or more informative. In
Oracle
9i, TKPROF has been enhanced to parse wait event information from trace
files and format it in a useful report. Also, some timing information
is
tracked in microseconds for more resolution, and the DBMS_SUPPORT
package
has been included for easier trace activation. However, these are just
bells
and whistles.