Tips to Write Effective Queries and Explain Plan
SQL
Statement Processing
Phases
The four statement processing phases in SQL are parsing binding,
executing and fetching.
BIND: It scans the statement for bind variables and assigns a value to each variable.
EXECUTE: The Server applies the parse tree to the data buffers, performs necessary I/O and sorts for DML statements.
FETCH: Retrieves rows for a SELECT
statement during
the fetch phase. Each fetch retrieves multiple rows, using an array
fetch.
A careful understanding of these steps will show
that real user data are being processed in the steps 2 through 4; and
that the step 1 merely is present for the Oracle engine to deal with
the SQL statement.
This first step may take considerable time and resources, and as it is
overhead seen from the data processing point of view, applications
should be written to minimize the amount of time spent during this
step. The most efficient way to do this is to avoid the
parse/optimization step as much as possible.
Optimizer Method and how to
know the Driving Table.
A small "golden rule" is that your driving table should be the table
that returns the smallest number of rows (so you need to look at the
where clause), and this is not always the table with the smallest
number of rows. But…. Where to specify the
driving Table?
Oracle processes result sets a table at a time. It starts by retrieving
all the data for the first (driving) table. Once this data is
retrieved it is used to limit the number of rows processed for
subsequent (driven) tables. In the case of multiple table joins, the
driving table limits
the rows processed for the first driven table. Once processed, this
combined set of data is the driving set for the second driven table
etc. Roughly translated into English, this means that it is best to
process tables that will retrieve a small number of rows
first.
The optimizer will do this to the best of its ability regardless of
the structure of the DML, but some factors may help.
Both the Rule and Cost based optimizers select a driving table for each
query.
In the RBO (Rule Based Optimizer) the driving table is the LAST TABLE in the FROM CLAUSE (chooses the driving
order by
taking the tables in the FROM clause RIGHT to LEFT).
In the CBO (Cost Based Optimizer)
the driving table is is determinated from costs derived from GATHERED STATISTICS. If
there are
no statistics or if the optimizer_mode IS COST then CBO chooses the
driving order
of tables from LEFT to RIGHT in the FROM clause, Place
the most limiting tables first in the FROM clause
If a decision cannot be made, the order of
processing is FROM the END of the
FROM clause to the START.
In RBO, we have a habit of ordering tables right-to-left in queries,
right being the driving table for the query.
In CBO, I had to adapt to ordering from left-to-right, left being the
driving table. The ORDERED hint used in CBO picks up tables
left-to-right for processing. Take a pick.
The WHERE clause is the main decision maker about
which
indexes to use. You should always try to
use your unique indexes first, and then if that is not possible then
use a
non-unique index. For a query to use an index, one or more fields from
that
index need to be mentioned in the WHERE clause. On
concatenated indexes the index will only be used if the first
field
in the index is mentioned.On 10g that in not needed any
more!!!
The more of its
fields are mentioned in the where clause,
the better an index is used.
So if you need to get statistics on
your schema quickly, you can perform:
BEGIN
dbms_stats.gather_schema_stats (ownname => 'SCOTT'
, estimate_percent => 10
, degree => 5
, cascade => true);
END;
/
OR
execute
dbms_stats.gather_schema_stats(ownname => 'SCOTT', estimate_percent
=> 10, degree => 5, cascade => true);
If you want to grab statistics for a Table and its indexes, then:
EXEC
DBMS_STATS.gather_table_stats('SCOTT', 'TEST', cascade => TRUE);
More information HERE
The TIPS to write better queries
Although two SQL statements may produce the same result,
Oracle may process one faster than the other. You can use the results
of the
EXPLAIN PLAN statement to compare the execution plans and costs of the
two
statements and determine which is more efficient. Following
are some tips that help in writing
efficient queries.
Before starting our discussion, once nice
parameter to know:
Flushing the
Buffer Cache
Prior to Oracle Database 10g, the only way to flush the database buffer
cache was to shut down the database and restart it. Oracle Database 10g
now allows you to flush the database buffer cache with the alter system
command using the flush buffer_cache parameter. The FLUSH Buffer Cache
clause is useful if you need to measure the performance of rewritten
queries or a suite of queries from identical starting points. Use the
following statement to flush the buffer cache.
ALTER SYSTEM FLUSH
BUFFER_CACHE; #This command flushed the buffer cache in the SGA
ALTER SYSTEM FLUSH
SHARED_POOL; #This command flushed
the shared pool
However, note that these clauses are intended for use only on a test
database. It is not advisable to use them on a production
database, because subsequent queries will have no hits, only misses.
Declare with
Care!!
The following table and then the sections after that offer some
concrete advice on potential issues you might encounter when declaring
variables in PL/SQL
NUMBER If you don’t specify a precision, as in NUMBER(12,2), Oracle
supports up to 38 digits of precision. If you don’t need this
precision, you’re wasting memory.
CHAR This is a fixed-length character string and
is mostly available for compatibility purposes with code written in
earlier versions of Oracle. The values assigned to CHAR variables are
right-padded with spaces, which can result in unexpected behavior.
Avoid CHAR unless it’s specifically needed.
VARCHAR2 The greatest challenge you will run into with VARCHAR2 is to
avoid the tendency to hard-code a maximum length, as in VARCHAR2(30).
Use %TYPE as described later in this sectoin.
INTEGER If your integer values fall within the range of –231+1 .. 231–1
(a.k.a. – 2147483647 .. 2147483647), you should declare your variables
as PLS_INTEGER. This is the most efficient format for integer
manipulation (until you get to Oracle Database 10g Release 2, at which
point BINARY_INTEGER, PLS_INTEGER and all the other subtypes of
BINARY_INTEGER offer the same performance).
Anchor
variables to database datatypes using %TYPE and %ROWTYPE.
When you declare a variable using %TYPE or %ROWTYPE, you “anchor” the
type of that data to another, previously defined element. If your
program variable has the same datatype as (and, as is usually the case,
is acting as a container for) a column in a table or view, use %TYPE to
define it from that column. If your record has the same structure as a
row in a table or view, use %ROWTYPE to define it from that
table. Your code will automatically adapts to underlying changes
in data structures.
1.
Existence
of a row
Do not use ‘Select count(*)…’ to test the existence of a
row. Instead, open an explicit cursor, fetch once, and then check
cursor%NOTFOUND :
2.
Avoid the use of NULL or IS NOT NULL.
Instead of:
Select * from clients where phone_number is null;
Use:
Select * from
clients where phone_number =
0000000000000000;
3. Select the data
that you need ONLY!!!
When selecting from a table, be sure to only select the data that you
need.
For example, if you only need 1 column from a 50 column table, be sure
to do a
'select fld from table' and
only retrieve what you need. If you do a
'select * from table' you
will be
fetching ALL columns of the table which increases network traffic and
causes the system to perform unnecessary work to retrieve data that is
not being used
4. Always use
table alias and prefix
The parse phase for statements can be decreased by efficient use of
aliasing. This helps the speed of parsing the statements in two ways:
5. IN and EXISTS
6.
Use Joins in place of EXISTS.
SELECT *
FROM emp e
WHERE
EXISTS
(SELECT d.deptno
FROM
dept d
WHERE
e.deptno = d.deptno
AND d.dname =
'RESEARCH');
SELECT *
FROM emp e, dept d
WHERE e.deptno = d.deptno
AND d.dname
= ‘RESEARCH’;
7.
Use EXISTS in place
of DISTINCT.
SELECT
DISTINCT d.deptno, d.dname ,
FROM dept d, emp e
WHERE d.deptno
= e.deptno;
SELECT d.deptno , d.dname
FROM dept d
WHERE
EXISTS
(SELECT 'X'
FROM emp e
WHERE d.deptno = e.deptno);
Another Example:
SELECT DISTINCT hetitle, hename
FROM helpfiles h , merchant m
WHERE m.merfnbr = h.hemenbr;
Much Better:
SELECT hetitle, hename
FROM helpfiles h WHERE EXISTS
(SELECT m.merfnbr
FROM merchant m);
48% Time Reduction could be achieved.
8.
Math Expressions.
The optimizer fully evaluates expressions whenever possible
and translates certain syntactic constructs into equivalent constructs.
This is
done either because Oracle can more quickly evaluate the resulting
expression
than the original expression or because the original expression is
merely a syntactic
equivalent of the resulting expression.
Any computation of constants is performed only once when the
statement is optimized rather than each time the statement is executed.
Consider these conditions that test for monthly salaries greater than
2000:
sal > 24000/12
sal > 2000
sal*12 > 24000
If a SQL statement
contains the first condition, the optimizer simplifies it into the
second
condition.
Note that the
optimizer does not simplify expressions across comparison operators.
The optimizer
does not simplify the third expression into the second.
For this reason, application developers
should write conditions that compare columns with constants whenever
possible,
rather than conditions with expressions involving columns.
The Optimizer
does
not use index for the following statement.
SELECT
*
FROM emp
WHERE sal*12
> 24000;
Instead use
the
following statement.
SELECT
*
FROM emp
WHERE sal
> 24000/12;
9. Never use NOT in an
indexed column. Whenever Oracle encounters a NOT in an index
column, it will perform full-table scan.
SELECT *
FROM emp
WHERE NOT
deptno = 0;
SELECT
*
FROM emp
WHERE deptno
> 0;
10. Never use a function / calculation on an indexed column (unless you are SURE that you are using an Index Function Based new in Oracle 8i). If there is any function is used on an index column, optimizer will not use index. Use some other alternative. If you don’t have another choice, keep functions on the right hand side of the equal sign. The Concatenate || symbol will also disable indexes. Examples:
/**
Do not use
**/
SELECT * FROM
emp WHERE SUBSTR (ENAME, 1,3) = ‘MIL’;
/** Suggested Alternative
**/
Note: Optimizer
uses the index only when optimizer_goal is set to FIRST_ROWS.
SELECT *
FROM emp WHERE ENAME LIKE 'MIL%’;
/**
Do not use
**/
SELECT *
FROM emp WHERE sal! = 0;
Note: Index can
tell you what is there in a table but not what is not in a table.
Note: Optimizer
uses the index only when optimizer_goal = FIRST_ROWS.
/** Suggested
Alternative **/
SELECT *
FROM emp WHERE sal > 0;
/**
Do not use
**/
SELECT *
FROM emp WHERE ename || job =
‘MILLERCLERK’;
Note: || is the
concatenate function. Like other
functions it disables index.
/** Suggested
Alternative **/
Note: Optimizer
uses the index only when optimizer_goal=FIRST_ROWS.
SELECT *
FROM emp
WHERE ename
= 'MILLER'
AND job
= ‘CLERK’;
11. Whenever possible try to use
bind variables
create or replace procedure
dsal(p_empno in number) as
begin
execute immediate 'update
emp set sal = sal*2 where empno =
'||p_empno;
commit;
end;
/
create or replace procedure
dsal(p_empno in number) as
begin
execute immediate 'update
emp set sal=sal*2 where empno =
:x' using p_empno;
commit;
end;
/
Just to give you a tiny idea of how huge of a difference this can make
performance wise, you only need to run a very small test:
SQL> alter system flush shared_pool;
SQL> set serveroutput on;
declare
type rc is ref
cursor;
l_rc rc;
l_dummy
all_objects.object_name%type;
l_start number
default dbms_utility.get_time;
begin
for i in 1 .. 1000
loop
open l_rc for
'select object_name from all_objects
where object_id = ' || i;
fetch l_rc into l_dummy;
close l_rc;
--
dbms_output.put_line(l_dummy);
dbms_output.put_line (round((dbms_utility.get_time-l_start)/100, 2) ||
' Seconds...' );
end;
/
101.71 Seconds...
... and here is the Performance Winner:
declare
type rc is ref
cursor;
begin
end;
/
1.9 Seconds...
12.
Use the same convention for all your queries.
Oracle will put all your SQL or PL/SQL code in memory and will
reuse statements that are the same (saving parse time). So remember that:
Select * from emp
where dept = :dept_no
Is different than
Select * from EMP
where dept = :dept_no
Even differing spaces in the statement will cause this lookup to fail.
Assuming the statement does not have a cached execution plan it must be
parsed before execution.
13.
Tuning the WHERE Clause:
- When using AND
Clauses in the
WHERE Clause, put the most stringent AND Clause furthest from the WHERE.
- When using
OR Clauses in the
WHERE Clause, put the most stringent OR Clause closest to the WHERE.
The HAVING clause filters selected rows only after all rows have been fetched. Using a WHERE clause helps reduce overheads in sorting, summing, etc. HAVING clauses should only be used when columns with summary operations applied to them are restricted by the clause.
| Given Query | Alternative |
|
SELECT d.dname, AVG
(e.sal) |
SELECT d.dname, AVG
(e.sal) |
Instead of this:
Update emp set
emp_cat =
(select max (category) from
emp_categories),
sal_range = (select max(sal_range) from
emp_categories);
Use:
Update emp set (emp_cat,
sal_range) = (Select max
(category), max (sal_range) from emp_categories) ;
Least Efficient:
Select ….
From location
Where loc_id =
10 or loc_id=20 or loc_id = 30
Most Efficient
Select ….
From location
Where loc_id
in (10,20,30)
17. Do not Commit inside a
Do not use a commit or DDL statements inside a loop or cursor, because
that will make the undo segments needed by the cursor unavailable.
Many applications commit more frequently than necessary, and their
performance suffers as a result. In isolation a commit is not a very
expensive operation, but lots of unnecessary commits can nevertheless
cause severe performance problems. While a few extra commits may not be
noticed, the cumulative effect of thousands of extra commits is very
noticeable. Look at this test. Insert 1,000 rows into a test table --
first as a single transaction, and then committing after every row.
Your mileage may vary, but these are my results, on an otherwise idle
system show a performance blowout of more than 100% when committing
after every row.
create table
t (n number);
--BAD METHOD
declare
start_time number;
begin
start_time := dbms_utility.get_time;
for i in 0..999 loop
insert into t values (i);
commit;
end loop;
dbms_output.put_line(dbms_utility.get_time - start_time ||
' centiseconds');
end;
/
102 centiseconds
truncate
table t;
--GOOD METHOD
declare
start_time number;
begin
start_time := dbms_utility.get_time;
for i in 0..999 loop
insert into t values (i);
end loop;
commit;
dbms_output.put_line(dbms_utility.get_time - start_time ||
' centiseconds');
end;
/
44 centiseconds
18. Use UNION ALL instead of UNION
The problem is that in a UNION, Oracle finds all the qualifying rows and then
"deduplicates" them. To see what I mean, you can simply compare the
following queries:
select
* from
dual
union
select * from dual;
D
---
X
select * from dual
union ALL
select * from dual;
D
---
X
X
Note how the first query returns only one record and the second returns two. A UNION forces a big sort and deduplication—a removal of duplicate values. Most of the time, this is wholly unnecessary. To see how this might affect you, I'll use the data dictionary tables to run a WHERE EXISTS query using UNION and UNION ALL and compare the results with TKPROF. The results are dramatic.
First, I'll do the UNION query:
SQL> select *
2 from dual
3 where exists
4 (select null from all_objects
5 union
6 select null from dba_objects
7 union
8 select null from all_users);
call cnt cpu ela query
---- --- ---- --- ------
Parse 1 0.01 0.00 0
Execute 1 2.78 2.75 192234
Fetch 2 0.00 0.00 3
----- ---- ---- ---- ------
total 4 2.79 2.76 192237
As you can see, that was a lot of work—more than 192,000 I/Os just to see if I should fetch that row from DUAL. Now I add a UNION ALL to the query:
SQL> select *
2 from dual
3 where exists
4 (select null from all_objects
5 union all
6 select null from dba_objects
7 union all
8 select null from all_users);
call cnt cpu ela query
------ ---- ---- ---- -----
Parse 1 0.00 0.00 0
Execute 1 0.01 0.00 9
Fetch 2 0.00 0.00 3
------ ---- ---- ---- -----
total 4 0.01 0.00 12
Quite a change! What happened here was that the WHERE EXISTS stopped running the subquery when it got the first row back, and because the database did not have to bother with that deduplicate step, getting the first row back was very fast indeed.
The bottom line: If you can use UNION
ALL, by all means use it over UNION to avoid a costly deduplication step—a step that is
probably not even necessary most of the time.
19. Check
that your application is using the existing indexes
This is a CRITICAL point. So make use of
Explain Plan!!!
SELECT ImportedDate, State
FROM IssueData
WHERE
TRUNC(ImportedDate ) = TRUNC(SYSDATE);
SELECT ImportedDate, State
FROM IssueData
WHERE
ImportedDate between trunc(SYSDATE) and TRUNC(SYSDATE) + .99999;
SELECT *
FROM dept
WHERE deptno NOT IN (SELECT deptno FROM EMP);SELECT dept.*
FROM dept, emp
WHERE dept.deptno = emp.deptno (+)
AND emp.ROWID IS NULL;SELECT *
FROM dept
WHERE NOT EXISTS (SELECT NULL FROM emp WHERE emp.deptno = dept.deptno);
Normally, an outer join of table A to table B would return every record in table A, and if it had a mate in table B, that would be returned as well. Every row in table A would be output, but some rows of table B might not appear in the result set. A full outer join would return ebery row in table A, as well as every row in table B. The syntax for a full outer join is new in Oracle 9i, but it is a syntactic convenience, it is possible to produce full outer joins sets using conventional SQL.
update
emp set deptno = 9 where deptno = 10;
commit;
| Conventional SQL | New Syntax |
|
SELECT empno, ename, dept.deptno, dname EMPNO ENAME DEPTNO DNAME |
SELECT empno, ename, |
23. Use
BETWEEN instead of IN.
The BETWEEN keyword is very useful for filtering out values in a
specific range. It is much faster than typing each value in the range
into an IN.
Example:
Before: SELECT crpcgnbr FROM
cgryrel WHERE crpcgnbr IN (508858, 508859, 508860, 508861,508862,
508863, 508864)
After: SELECT crpcgnbr FROM
cgryrel WHERE crpcgnbr BETWEEN 508858 and 508864
59% Time Reduction could be achieved.
The trace utility is very helpful to see the execution plan for a
specific query WITHOUT executing
it. We can obtain the execution plan and some additional statistics
on running a SQL command automatically using AUTOTRACE.
SET AUTOTRACE
<OPTIONS> <EXPLAIN or STATISTICS>
- Create the PLAN_TABLE as SYS by executing:
@$ORACLE_HOME/rdbms/admin/utlxplan.sql
create or replace public synonym
PLAN_TABLE
for PLAN_TABLE;
grant all on PLAN_TABLE to PUBLIC;
- Setup the PLUSTRACE role (to be used with AUTOTRACE options) as
SYS user:
@$ORACLE_HOME/sqlplus/admin/plustrce.sql
grant plustrace to public;
If granting the 'plustrace' role to public doesn't work, you could also
do the following:
alter user &USER_NAME default
role PLUSTRACE;
Note=
If you get problems with AUTOTRACE, then try the following as SYS:
grant select on v_$session to
plustrace;
Options to
execute it
OFF - Disables autotracing SQL statements
ON - Enables autotracing SQL Statements
TRACEONLY - Enables auto tracing SQL Statements, and Suppresses
Statement Output
EXPLAIN - Displays execution plans, but does not display
statistics
STATISTICS Displays statistics, but does not
display
execution plans.
The best option is to use SET
AUTOTRACE TRACE , this will not return
the selected data from the query, it will return the access path from
plan table and its statistics.
If you just want the execution plan, then you can use SET AUTOTRACE
TRACE EXP. These are the options:
set autotrace on explain;
-> only the explain plan and the query result
set autotrace on statistics;
-> only the result set and statistics. No explain plan
set autotrace traceonly;
-> only the explain plan and statistics . No query result
set autotrace traceonly
statistics; -> only the statistics. No query result or
explain plan
set autotrace traceonly
explain; -> only the explain plan. No query
result or statistics
Statistic
Explanation
• recursive calls:
The number of internal calls Oracle has made to execute the command.
Those additional calls(sql) executed by Oracle implicitly to process
your (user) sql statement. Can be many things, hard parses, trigger
executions , sort extent allocations , data dictionary lookups/updates
etc
• db block gets:
The number of blocks retrieved to answer the query. A: A 'db block get'
is a current mode get. That is, it's the most up-to-date copy of the
data in that block, as it is right now, or currently. There can only be
one current copy of a block in the buffer cache at any time. Db block
gets generally are used when DML changes data in the database. In that
case, row-level locks are implicitly taken on the updated rows. There
is also at least one well-known case where a select statement does a db
block get, and does not take a lock. That is, when it does a full table
scan or fast full index scan, Oracle will read the segment header in
current mode
• consistent gets:
The number of blocks retrieved that did not change the data and
therefore did not interfere with other users (i.e. by locking data). A
'consistent get' is when Oracle gets the data in a block which is
consistent with a given point in time, or SCN. The consistent get is at
the heart of Oracle's read consistency mechanism. When blocks are
fetched in order to satisfy a query result set, they are fetched in
consistent mode. If no block in the buffer cache is consistent to the
correct point in time, Oracle will (attempt to) reconstruct that block
using the information in the rollback segments. If it fails to do so,
that's when a query errors out with the much dreaded, much feared, and
much misunderstood ORA-1555 "snapshot too old".
• physical reads:
The number of blocks read from the disc. Basically those
that cannot be satisfied by the cache and those that are direct reads.
• redo size:
The number of redo entries. The redo entries are written out to the
online
redolog files from the log buffer cache by LGWR.
• bytes sent via
SQL*Net to client: The number of bytes sent across the network
from the server to the client.
• bytes received
via SQL*Net from client: The number of bytes sent across the
network from the client to the server.
• SQL*Net
roundtrips to/from client: The number of exchanges between
client and server.
• sorts (memory): The
number of data sorts performed in memory.
• sorts (disc):
The number of data sorts performed on disc.
• rows processed: The
number of rows processed by the query.
The db block gets, consistent gets and physical reads give the
number of blocks that were read to form the buffers or from the disc.
For many queries, the number of physical reads is low as the data is
already in the database buffers. If the number of physical reads is
high then the query will be expected to be slow as there will be many
disc accesses.
The bytes received/sent via SQL*Net indicate how much data is being
moved across the network. This is important as moving a lot of data
across the network may affect the network's performance.
The sorts indicate the amount of work done in sorting data during the
execution of the query. Sorts are important as sorting data is a slow
process.
The Explain Plan command uses a table to store
information
about the execution plan chosen by the optimizer.
Oracle provides an autotrace facility to provide execution plan and
some statistics.
Perform the following to check it:
EXPLAIN PLAN FOR
your query.
Example
EXPLAIN PLAN FOR
SELECT
* FROM emp e, dept d
WHERE e.deptno = d.deptno
AND e.ename = 'SMITH';
Finally use the DBMS_XPLAN.DISPLAY function to display the execution
plan:
SET LINESIZE 130
SET PAGESIZE 0
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY);
-------------------------------------------------------------------
| Id |
Operation
| Name | Rows | Bytes | Cost |
-------------------------------------------------------------------
| 0 | SELECT
STATEMENT
| | 1
| 57 | 3 |
| 1 | NESTED
LOOPS
| | 1
| 57 | 3 |
|* 2 | TABLE ACCESS
FULL |
EMP | 1 |
37 | 2 |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT |
1 | 20 | 1 |
|* 4 | INDEX UNIQUE
SCAN | PK_DEPT |
1 |
| |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("E"."ENAME"='SMITH')
4 - access("E"."DEPTNO"="D"."DEPTNO")
The DBMS_XPLAN.DISPLAY function can accept 3 parameters:
1 table_name - Name of plan table, default value
'PLAN_TABLE'.
2 statement_id - Statement id of the plan to be
displayed, default value NULL.
3 format - Controls the level of detail displayed,
default value 'TYPICAL'. Other values include 'BASIC', 'ALL', 'SERIAL'.
EXPLAIN
PLAN SET STATEMENT_ID='TSH' FOR
SELECT *
FROM emp e, dept d
WHERE e.deptno = d.deptno
AND e.ename = 'SMITH';
SET LINESIZE 130
SET PAGESIZE 0
SELECT *
FROM
TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','TSH','BASIC'));
---------------------------------------------
| Id |
Operation
| Name |
---------------------------------------------
| 0 | SELECT
STATEMENT
| |
| 1 | NESTED
LOOPS
| |
| 2 | TABLE ACCESS
FULL |
EMP |
| 3 | TABLE ACCESS BY INDEX ROWID| DEPT |
| 4 | INDEX UNIQUE
SCAN | PK_DEPT |
---------------------------------------------
Explain
Plan using the SQL ID
exec dbms_stats.set_table_stats(
user, 'T', numrows => 1000000, numblks => 100000 );
declare
l_x_number number;
l_x_string varchar2(30);
begin
execute immediate
'alter session set optimizer_mode=all_rows';
for x in (select *
from t look_for_me where x = l_x_number) loop null; end loop;
for x in (select *
from t look_for_me where x = l_x_string) loop null; end loop;
execute immediate
'alter session set optimizer_mode=first_rows';
for x in (select *
from t look_for_me where x = l_x_number) loop null; end loop;
for x in (select *
from t look_for_me where x = l_x_string) loop null; end loop;
end;
/
select sql_id, child_number,
sql_text
from v$sql
where upper(sql_text) like
'SELECT % FROM T%'
ORDER BY 2;
select * from
table(DBMS_XPLAN.DISPLAY_CURSOR('&cursor_id', 0) );
-- The 0 is the child_number of the query
Query Plan
-----------------------------------------
SELECT STATEMENT [CHOOSE]
Cost=1234
TABLE ACCESS FULL TPAIS [:Q65001]
[ANALYZED]
The rightmost uppermost operation of an explain plan is the first thing that the explain plan will execute. In this case TABLE ACCESS FULL TPAIS is the first operation. This statement means we are doing a full table scan of table TPAIS When this operation completes then the resultant row source is passed up to the next level of the query for processing. In this case it is the SELECT STATEMENT, which is the top of the query.
[CHOOSE] is an indication of the
optimizer_goal
for the query.
This DOES NOT necessarily indicates that plan has actually used this
goal. The
only way to confirm this is to check the cost= part of the explain plan
as
well. For example the following query indicates that the CBO has been
used
because there is a cost in the cost field:
SELECT STATEMENT [CHOOSE]
Cost=1234
However the explain plan below indicates the use
of the RBO
because the cost field is blank:
SELECT STATEMENT [CHOOSE] Cost=
The cost field is a comparative cost that is used internally to determine the best cost for particular plans. The costs of different statements are not really directly comparable.
[:Q65001]
indicates that this particular part of
the query is being executed in parallel. This number indicates that the
operation will be processed by a parallel query slave as opposed to
being executed serially.
[ANALYZED] indicates that the object in question has been analyzed and there are currently statistics available for the CBO to use. There is no indication of the 'level' of analysis done.
1- FULL TABLE SCAN (FTS) -
Read every row in the table, every block up
to the high water mark. The HWM marks the last block in the table that
has ever had data
written
to it. If you have deleted all the rows then you will still read up to
the HWM.
Truncate is the only way to reset the HWM back to the start of the
table. Buffers from FTS operations are placed on the
Least
Recently Used (LRU) end of the buffer cache so will be quickly aged
out. FTS is
not recommended for large tables unless you are reading >5-10% of it
(or so)
or you intends to run in parallel. Oracle uses multiblock reads where
it can.
2- CLUSTER - Access via an index cluster.
3- HASH - A hash key is issued to access one or more
rows in a table with a matching hash value.
4- BY ROWID - This is the quickest access
method available. Oracle simply
retrieves
the block specified and extracts the rows it is interested in. Access
by rowid
:
SQL>
explain plan for select * from dept where rowid =
':x';
Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID DEPT
[ANALYZED]
Another example where the table is accessed by rowid following index lookup:
SQL> explain plan for select empno,ename from emp where
empno=10;
Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE]
Cost=1
TABLE ACCESS BY ROWID EMP
[ANALYZED]
INDEX UNIQUE SCAN EMP_I1
5- INDEX LOOKUP -
The
data is accessed by looking up key values in an
index and
returning rowids. A rowid uniquely identifies an individual row in a
particular
data block. This block is read via single block I/O. In this example an
index
is used to find the relevant row(s) and then the table is accessed to
lookup
the ename column (which is not included in the index):
SQL> explain plan for select empno,ename
from emp where
empno=10;
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP
[ANALYZED]
INDEX UNIQUE SCAN EMP_I1
Note the 'TABLE ACCESS BY ROWID' section. This
indicates
that the table data is not being accessed via a FTS operation but
rather by a
rowid lookup. In this case looking up values in the index first has
produced
the rowid. The index is being accessed by an 'INDEX UNIQUE SCAN'
operation. This
is
explained below. The index name in this case is EMP_I1. If all the
required
data resides in the index then a table lookup may be unnecessary and
all you
will see is an index access with no table access.
In the next example all the columns (empno) are in the index. Notice that no table access takes place:
SQL> explain
plan for select empno from emp where empno=10;
Query Plan
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX UNIQUE SCAN EMP_I1
Indexes are presorted so sorting may be
unnecessary if the
sort order required is the same as the index. In the next example the
index is sorted so the rows will
be
returned in the order of the index hence a sort is unnecessary.
SQL> explain plan for select
empno,ename from emp where empno > 7876 order by empno;
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP
[ANALYZED]
In the next example we will forcing a full table scan. Because we have
forced a FTS the data is unsorted
and we
must sort the data after it has been retrieved.
SQL> explain plan for select /*+ Full(emp) */
empno,ename from emp where
empno> 7876 order by
empno;
Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=9
SORT ORDER
BY
TABLE ACCESS FULL EMP [ANALYZED]
Cost=1 Card=2 Bytes=66
1. Index unique scan
This is a method for looking up a single key value via a unique
index.
Always
returns a single value. Example:
SQL>
explain plan for select
empno,ename from emp where
empno=10;
------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP
[ANALYZED]
INDEX UNIQUE SCAN EMP_I1
2. Index range scan
This is a method for accessing multiple column values. You must supply
AT LEAST
the
leading column of the index to access data via the index. Can be used
for range
operations (e.g. > < <> >= <= between). e.g.
SQL>
explain plan for select
empno,ename from emp where empno > 7876 order by
empno;
Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
TABLE ACCESS BY ROWID EMP
[ANALYZED]
A non-unique index may return multiple values for the predicate col1 = 5 and will use an index range scan
SQL>
explain plan for select mgr
from emp where mgr = 5;
Query plan
--------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX RANGE SCAN EMP_I2
[ANALYZED]
3. Index Full Scan
In certain circumstances it is possible for the whole index to
be
scanned as opposed to a range scan (i.e. where no constraining
predicates
are provided for a table). Oracle chooses
an index Full Scan when you have statistics that indicate that it is
going to be
more efficient than a Full table scan and a sort. For example Oracle
may do
a Full
index scan when we do an unbounded scan of an index and want the data
to be
ordered in the index order. The optimizer may decide that selecting all
the
information from the index and not sorting is more efficient than doing
a FTS
or a Fast Full Index Scan and then sorting.
An Index full scan
will perform single block i/o's and so it
may prove to be inefficient. e.g. Index BE_IX is a concatenated index
on big_emp (empno,ename)
SQL>
explain plan for select
empno,ename from big_emp
order by empno,ename;
Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=26
INDEX FULL SCAN BE_IX [ANALYZED]
4. Index Fast Full Scan (not very used)
Scans all the block in the index. Rows are not returned in
sorted order. Note that INDEX FAST FULL SCAN is the mechanism behind
fast index
create and
recreate. E.g.
Index BE_IX is a concatenated index on big_emp (empno, ename)
SQL>
explain plan for select
empno,ename from big_emp;
Query Plan
------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=1
INDEX FAST FULL SCAN BE_IX
[ANALYZED]
There are three kinds of join conditions: nested
loops, merge joins, and hash joins. Each has specific performance
implications, and each should be used in different circumstances.
a. Nested loops work from one
table (preferably the smaller of the two),
looking up the join criteria in the larger table. It’s helpful if the
join column is indexed from the larger table. Nested loops are useful
when joining a smaller table to a larger table and performs very well
on smaller amounts of data. Nesting is when you perform the same
operation for every element in a data set: For each row in A do B
b. Hash joins read the smaller
tables into a hash table in memory so the
referenced records can be quickly accessed by the hash key. Hash joins
are great in data warehouse scenarios where several smaller tables
(with referential integrity defined) are being referenced in the same
SQL query as a single larger or very large table. The hash join has ab
initial overhead (of creating the hash tables) but
performs rather well no matter how many rows are involved.
c. Sort Merge or Merge joins
work by selecting the result set from each table,
and then merging these two (or more) results together. Merge
joins are useful when joining two relatively large tables of about the
same size together, the merge join starts out with more overhead but
remains rather consistent.
a. NESTED LOOPS JOIN - Nested Loops Joins
are
the most common and straightforward type of nesting in Oracle. When
joining two tables, for each row in one table Oracle looks up the
matching rows in the other table.Take the example of 2 tables joined as
follows:
Select *
From Table1 T1, Table2 T2
Where T1.Table1_Id = T2.Table1_id;
In the case of the Nested Loop Join, the rows will be accessed with an
outer table being chosen (say Table1 in this case) and for each row
in
the outer table, the inner table (Table2) will be accessed with
an
index to retrieve the matching rows. Once all matching rows from Table2
are found, then the next row on Table1 is retrieved and the matching to
Table2 is performed again
It's
important that efficient index access is used on the inner table
(Table2 in this example) or that the inner table be a very small table.
This is critical to prevent table scans from being performed on the
inner table for each row of the outer table that was retrieved.
Optimizer uses nested loop when we are joining tables containing small
number of rows with an efficient driving condition. It is the most
common join performed by
transactional (OLTP) systems
OUTER - A nested loops operation to perform
an outer
join statement.
Note: You will see more use of
nested loop when
using FIRST_ROWS optimizer mode as it works on model of showing
instantaneous results to user as they are fetched. There is no need for
selecting caching any data before it is returned to user. In case of
hash join it is needed and is explained below.
b. HASH JOIN - An operation that joins
two sets of rows
and returns the same result.
-ANTI - A hash anti-join.
-SEMI - A hash semi-join.
Hash joins are used when we are joining large
tables. The
optimizer uses the smaller of the 2 tables to build a hash table in
memory
and the scans the large tables and compares the hash value (of rows
from large table) with this hash table to find the joined rows.
The algorithm of hash join is divided in two parts
Unlike nested loop, the output of hash join result
is not instantaneous as hash joining is blocked on building up hash
table.
The Hash Join is is a very efficient join when used in the right
situation. With the hash join, one Table is chosen as the Outer table.
This is the larger of the two tables in the Join - and the other is
chosen as the Inner Table. Both tables are broken into sections and the
inner Tables join columns are stored in memory (if hash_area_size is
large enough) and 'hashed'. This hashing provides an algorithmic
pointer that makes data access very efficient. Oracle attempts to keep
the inner table in memory since it will be 'scanned' many times. The
Outer rows that match the query predicates are then selected and for
each Outer table row chosen, hashing is performed on the key and the
hash value is used to quickly find the matching row in the Inner Table.
This join can often outperform a Sort Merge join, particularly when 1
table is much larger than another. No sorting is performed and index
access can be avoided since the hash algorithm is used to locate the
block where the inner row is stored. Hash-joins are also only used for
equi-joins. Other important init.ora parms are: hash_join_enabled,
sort_area_size and hash_multiblock_io_count.
Note: You may see more hash joins used with ALL_ROWS optimizer mode, because it works on model of showing results after all the rows of at least one of the tables are hashed in hash table.
c. SORT MERGE JOIN or MERGE JOIN or Merge Scan
- An operation that accepts two sets of
rows, each sorted by a specific value, combines each row from one set
with the
matching rows from the other. Take an example of 2
tables being joined and returning a large number of rows (say,
thousands) as follows:
Select *
From Table1 T1, Table2 T2
Where T1.Table1_Id = T2.Table1_id;
The Merge Scan join will be chosen because the database has detected
that a large number of rows need to be processed and it may also notice
that index access to the rows are not efficient since the data is not
clustered (ordered) efficiently for this join. The steps followed to
perform this type of join are as follows:
1) Pick an inner and outer table
2) Access the inner table, choose the rows that match the predicates in
the Where clause of the SQL statement.
3) Sort the rows retrieved from the inner table by the joining columns
and store these as a Temporary table. This step may not be performed if
data is ordered by the keys and efficient index access can be
performed.
4) The outer table may also need to be sorted by the joining columns so
that both tables to be joined are sorted in the same manner. This step
is also optional and dependent on whether the outer table is already
well ordered by the keys and whether efficient index access can be
used.
5) Read both outer and inner tables (these may be the sorted temporary
tables created in previous steps), choosing rows that match the join
criteria. This operation is very quick since both tables are sorted in
the same manner and Database Prefetch can be used.
6) Optionally sort the data one more time if a Sort was performed (e.g.
an 'Order By' clause) using columns that are not the same as were used
to perform the join.
The Merge Join can be deceivingly fast due to database multi-block
fetch (helped by initialization parameter
db_file_multiblock_read_count) capabilities and the fact that each
table is accessed only one time each. These are only used for
equi-joins. The other init.ora parm that can be tuned to help
performance is sort_area_size.
OUTER - A merge join operation to perform an outer
join statement.
-ANTI - A merge anti-join.
-SEMI - A merge semi-join.
Important point to understand is, unlike nested loop where driven (inner) table is read as many number of times as the input from outer table, in sort merge join each of the tables involved are accessed at most once. So they prove to be better than nested loop when the data set is large.
When optimizer uses Sort merge join?
a) When the join condition is an inequality condition (like <, <=, >=). This is because hash join cannot be used for inequality conditions and if the data set is large, nested loop is definitely not an option.
b) If
sorting is anyways required due to some other attribute (other than
join) like “order by”, optimizer prefers sort merge join over hash join
as it is cheaper.
Note: Sort merge join can be seen with both ALL_ROWS and FIRST_ROWS optimizer hint because it works on a model of first sorting both the data sources and then start returning the results. So if the data set is large and you have FIRST_ROWS as optimizer goal, optimizer may prefer sort merge join over nested loop because of large data. And if you have ALL_ROWS as optimizer goal and if any inequality condition is used the SQL, optimizer may use sort-merge join over hash join
a. Sorts
There are a number of different operations that promote sorts
Sorts are expensive operations especially on large
tables
where the rows do not fit in memory and spill to disk. By default sort
blocks
are placed into the buffer cache. This may result in aging out of other
blocks
that may be reread by other processes.
b. Filter
Has a number of different meanings used to indicate partition
elimination may also indicate an actual filter step where one row
source is
filtering another functions such as min may introduce filter steps into
query
plans.In the next example there are 2 filter steps. The first is
effectively like
a NL
except that it stops when it gets something that it doesn't like (i.e.
a
bounded NL). This is there because of the not in. The second is
filtering out
the min value:
SQL> explain
plan for select * from emp where empno not in (select
min(empno) from big_emp group by empno);
Query Plan
------------------
SELECT STATEMENT [CHOOSE] Cost=1
FILTER
**** This is like a bounded nested loops
TABLE ACCESS FULL EMP [ANALYZED]
FILTER
**** This filter is introduced by the min
SORT GROUP BY NOSORT
INDEX FULL SCAN BE_IX
This example is also interesting in that it has a
NOSORT
function. The group by does not need to sort because the index row
source is
already pre sorted.
c. Views
When a view cannot be merged into the main query you will often see a
projection view operation. This indicates that the 'view' will be
selected from
directly as opposed to being broken down into joins on the base tables.
A
number of constructs make a view non mergeable. Inline views are also
non
mergeable.
In the following example the select contains an inline view that cannot
be
merged:
SQL>
explain plan for select ename,tot from emp, (select empno,sum(empno)
tot from
big_emp group by empno) tmp where emp.empno = tmp.empno;
Query Plan
------------------------
SELECT STATEMENT [CHOOSE]
HASH JOIN
TABLE ACCESS FULL EMP [ANALYZED]
VIEW
SORT GROUP BY
INDEX FULL SCAN BE_IX
In this case the inline view tmp that contains an aggregate function cannot be merged into the main query. The explain plan shows this as a view step