I’ve often wished that the Oracle database would provide a method to retain in memory the result set from a complex query that contains what I like to call reference information. These are data that hardly ever change, but must still be read and used across multiple applications - for example, a list of all country codes and their corresponding names for lookup when processing addresses for new international customers, or a list of all ZIP Codes in the Midwestern US.
Oracle Database 11g fills this gap with three new structures called result caches, and each structure has a different purpose:
By retaining result sets in these in-memory caches, the results are immediately available for reuse by any user session. For user sessions that connect to the database through an application server, the OCI client cache permits those sessions to simply share the results that are already cached on the application server without having to reissue a query directly against the database. These result caches therefore hold great promise for eliminating unnecessary “round trips” to the database server to collect relatively static reference data that still needs to be shared across many application servers or user sessions – a potentially immense improvement in overall database throughput.
If you’ve already experienced the advice for SQL performance improvements that Oracle Database 10g’s SQL Tuning Advisor and SQL Access Advisor provide, you’ll be pleasantly surprised with Oracle 11g’s enhanced SQL tuning capabilities. Here’s a brief sample:
As a DBA, one of the most bedeviling problems that I’ve regularly faced is to be able to predict accurately how the next set of changes to the database’s application code, database patch set, or hardware configuration will affect that database’s performance. That usually meant purchasing a relatively expensive third-party package (e.g. Mercury Interactive’s LoadRunner) to generate a sample workload against the database using the next version of the application code, and then comparing the results against baseline performance for the current application code version.
Fortunately, Oracle Database 11g has come to the rescue with two new utilities that offer monumental strides forward in system testing:
Oracle Database 10g introduced an impressive plethora of database performance advisors like the Segment Advisor, the Undo Advisor, the SQL Access Advisor, the SQL Tuning Advisor, the MTTR Advisor, and the ultimate expert system for tuning database performance: the Automatic Database Diagnostic Monitor (ADDM).
Oracle Database 11g expands this advisory framework with several new Database Repair Advisors. The chief goals of these new Advisors are to locate root causes of a failure, identify and present options for repairing these root causes, and even correct the identified problems with self-healing mechanisms. Oracle Database 11g also adds a series of improved fault diagnostics to make it extremely easy for even an inexperienced DBA to detect and quickly resolve problems with Oracle 11g databases. Here are the highlights of these new features:
Oracle Database 10g dramatically expanded database recoverability with the ability to perform an incomplete recovery of the database with Flashback Database. Oracle Database 10g also provided four new logical database recovery features: Flashback Table, Flashback Drop, Flashback Version Query, and Flashback Transaction Query. Oracle Database 11g expands this arsenal of recovery tools with two new Flashback features:
Oracle Database 11g provides a series of brand-new methods for storing large binary objects (also known as LOBs) inside the database. These new features, collectively called SecureFiles, will allow Oracle Database 11g to store images, extremely large text objects, and the more advanced datatypes introduced in prior Oracle releases (e.g. XMLType, Spatial, and medical imaging objects that utilize the DICOM (Digital Imaging and Communications In Medicine) format).
SecureFiles promises to offer performance that compares favorably with file system storage of these object types, as well as the ability to transparently compress and “deduplicate” these data. (Deduplication is yet another brand-new feature in Oracle Database 11g. It can detect identical LOB data in the same LOB column that’s referenced in two or more rows, and then stores just one copy of that data, thus reducing the amount of space required to store these LOBs.) Perhaps most importantly, Oracle Database 11g will also insure that these data can be encrypted using Transparent Data Encryption (TDE) methods – especially important (and welcome!) in the current security-conscious environments we inhabit today as database administrators.
Oracle Database 10gR2 dramatically improved the options for encrypting sensitive data both within Oracle database tables and indexes, as well as outside the database (i.e. RMAN backups and DataPump export files) with Transparent Data Encryption (TDE). Oracle Database 11g continues to expand the use of TDE within the database. For example, it’s now possible to encrypt data at the tablespace level as well as the table and index level. Also, logical standby databases can utilize TDE to protect data that’s been transferred from its corresponding primary standby database site. Moreover, secured storage of the TDE master encryption key is insured by allowing it to be stored externally from the database server in a separate Hardware Security Module.
Secure By Default. Oracle Database 11g also implements a new set of out-of-the-box security enhancements that are collectively called Secure By Default. These security settings can be enabled during database creation via the Database Configuration Assistant (DBCA), or they can be enabled later after the database has been created. Here’s a sample of these new security features:
Oracle Database 10g made a few important improvements to partitioned tables and indexes (e.g. hash-partitioned global indexes), but Oracle Database 11g dramatically expands the scope of partitioning with several new composite partitioning options: Range Within Range, List Within Range, List Within Hash, and List Within List. And that’s not all:
Oracle Database 10g introduced Automatic Storage Management (ASM) that at its essence is a file system specifically developed for Oracle database files. (See my article series on ASM for more details.) Oracle Database 11g expands the reach of ASM with several new features, including:
Last but most certainly not least, Oracle Database 11g adds plenty of enhancements to its flagship high-availability solution for site survivability, DataGuard:
Oracle Database 11g Release 1 continues to improve upon the massive paradigm shift in Oracle Database 10g toward self-managed, self-tuning, and self-healing databases. Though I believe I’ve thoroughly reviewed what I consider to be the major features of this massive new release, I’m sure there are areas I’ve yet to probe. These automatic database management features will be especially valuable to IT organizations that continue to struggle with ever-larger databases and ever-increasing computing workloads while attempting to answer the demand for lowered costs and value-added service.
Even though I’ve hopefully provided enough technical information in this article to encourage you to explore with these features, I also strongly suggest that you first review the corresponding detailed Oracle documentation before proceeding with any experiments. Actual implementation of these features should commence only after a crystal-clear understanding exists. Please note that I’ve drawn upon the following Oracle Database 11gR1 documentation for the deeper technical details of this article:
B28279-01 Oracle Database 11gR1 New Features
B28286-01 Oracle Database 11gR1 SQL Language Reference
B28294-01 Oracle Database 11gR1 DataGuard Concepts and Administration
B28310-01 Oracle Database 11gR1 Administrator’s Guide
B28318-01 Oracle Database 11gR1 Concepts
B28320-01 Oracle Database 11gR1 Reference
B31107-01 Oracle Database 11gR1 Storage Administrator’s Guide
B32024-01 Oracle Database 11gR1 VLDB and Partitioning Guide
As promised I will continue with my series of posts about top 11g new features.
I will continue with another Change Management functionality
Oracle database 11g ships with. It is called Replay
Database.
This is one of the very big highlights of Oracle database 11g. And this
is what many prodution DBAs have been hoping for since a long time.
Any change to a database system, such as an upgrade or changes to the configuration like new hardware or changed parameters, can have dramatic impacts on the performance of the application. It is necessary to test and validate these changed conditions under real life conditions in order to get around possible pitfalls before applying such changes to prodution. This is often a not easy to manage challenge for a DBA.
Oracle database 11g gives us the possiblity to capture a workload of a production system and run it in a testing environment. With this we can identify possible negative impacts of changes and analyze them under real conditions by comparing performance benchmarks for workload periodes in different systems.
How does it work?
On the prodution system we record all client activities for a typical
workload periode with a new built in struture in the RDBMS. In fact it
is a new built in package named DBMS_WORKLOAD_CAPTURE
which is used . The captured worklaod consists of binary
files which
hold metadata information for the workload and as well as replay files
which are stored in a special directory. It is possible to use filters
for certain session which should not be captured. The recording user as
well as the replaying user needs at least SYSOPER privileges because it
is necessary to restart the database before starting the capture
process.
After capturing the workload data we need a testing system which should be axactely the same state as our prodution system before the capture process. Now we start one or multiply replay clients (a binary called wrc) to run the pre processed workload against the testing system. Internally another new built in package named DBMS_WORKLOAD _REPLAY is use for this.
After the replay phase we can use analytic tools like ADDM to analyze the workload performance and compare it against teh original wokload performance.
There are a number of new DBA_WORKLOAD_* as well as V$WORKLOAD_* views that can be used to manage and manage the different steps (capture and replay).
Also Oracle Enterprise Manager comes with a very handy interface for the capture process as well as for the replay and analyze steps.
Some good scenarios for database replay could be:
- testing a migration from single instance to RAC
- reproducing of bugs for debugging
Oracle 11G : OLTP Table Compression : Advanced Compression Option
Oracle 11G introduces a new option to support compression of various structured, unstructured and other backup data. This option named Oracle Advanced Compression option provides multiple features including OLTP table compression ( this feature extends the basic table compression that was available since 9i ) , Secure File Deduplication, RMAN Compression and Data Pump Compression. Oracle will continue to provide the basic Table compression in the Enterprise Edition of its product, but customers will have to license the new features along with Oracle 11G to get OLTP Table Compression and others.
Introduced in Oracle 9i, Table compression quickly became a favourite feature of the database administrators. It was particularly useful in Data Warehouse environments where bulk loads were used in abundance and the compression feature allowed reduction in overall storage requirements.
In Oracle 11g, OLTP Table compression is introduced and this feature allows data to be compressed during all DML operations. Oracle 11g claims to use a unique algorithm that eliminates duplicate values within a single block even across multiple columns. Similar to Oracle 9i, each block contains a symbol table that maintains compression metadata and this allows better performance compared to the global symbol table based algorithms.
Wouldnt using a compression algorithm during OLTP introduce unnecessary overhead ?. I would guess so, but Oracle 11g claims that their threshold based compression, which only compresses the data within a block once a threshold is reached, eliminates the overhead to compress the data everytime its written. So, if the threshold was 35%, the contents of the block are compressed the first time the block reaches the 35% utilized state, again when it reaches the 70% threshold and once when its full to make sure that the maximum compression is achieved.
This process completes until the Oracle 11g database determines that no
further benefits can be achieved through compression. Lets
see how
well it lives us to the promises made by oracle around this feature.
Oracle 11G : SQL Performance Analyzer : Real Application Testing
Database Administrators spend countless hours troubleshooting SQL statements that have taken upon a new life after a simple system or environment change. Oracle 11G SQL Performance Analyzer provides a solution to this never ending regressive statement issues. This new feature of Oracle 11G allows the database administrator to get a granular view of the impact to sql execution plans and statistics due to environment changes and the best part is that it completely automates the impact analysis process of extremely large SQL workloads
Capture the SQL Workload on the Oracle 11G production system as a SQL Tuning set is accomplished from several sources including the cursor cache, AWR repository and then this SQL Tuning Set also referred to as an STS is transferred to a test system where the analysis will take place.
Baselining the oracle 11g database performance through measurement of the workload is the critical next step. This is accomplished by executing the SQL Performance Analyzer on the SQL tuning set that was captured from production in the last step
Database Administrator can now make the required changes including upgrades, schema changes, patches or optimizer statistics refresh that would normally be a risky step due to the unknown performance of the workload post upgrade.
Re execute the workload after the change by executing SQL Performance Analyzer on the same SQL tuning set that was used to baseline the performance in step 2. SPA can now be used to compare performance of the two runs of the tuning set to identify the statements that have either improved, refressed or unchanged.
The advantage of this tool, compared to the home grown approaches that
dba's have evolved over the years, is that it takes the number of
executions into consideration while measuring the impact. Oracle
11G
SQL Performance Analyzer takes all these factors into account while
predicting the overall performance improvement and regressions and
allows the oracle 11g database administrator to fix the SQL or fix the
regression through SQL Plan Baselines.
Oracle 11G : Database Replay : Real Application Testing
Oracle 11G introduces multiple new features around Real Application testing. This feature allows the database administrator to take control of the simulation of workload to test the impact of infrastructure and database level changes through the recreation of actual workloads. Enterprises today face a daunting task and cost of retesting their entire application for every infrastructure or database change thats rolled out in the Organization. This feature is particularly helpful during OS, Oracle patch rollouts, conversion to RAC, movement to low cost grid computing hardware etc.
Oracle 11G Database Replay allows the database administrator to capture the full production workload and then recreate the same in the test environment.
Workload Capture when enabled tracks and captures all external database requests and writes the data into a binary file called capture file.
Workload Processing is the step where the capture files are converted into replay files. Replay files contain all the necessary metadata needed for replaying the workload. This is the most resource intensive task and should be performed on a non-production server. Once the replay files are created, they can be reused any number of times without having to reprocess the workload.
Workload Replay is performed through the replay client which processes the replay files and submits the load / calls to the database with the exact timing and concurrency that it has recorded in the original system. Depending on the workload being simulated, you may have to use multiple replay clients and a calibration tool is provided to help determine the replay clients needed to replay the workload.
Workload Analysis and Reporting allows you to generate detailed analysis of the capture and replay. All errors encountered during replay are reported and since the workload is replayed including DML and SQL queries, the test system should be a snapshot of production just before the workload is captured. Any divergence in rows returned by DML or queries is shows and the basic comparisons between the capture and replay is shown.
This is a preview from the forthcoming book “Oracle 11g New Features”, by Rampant TechPress, featuring John Garmany, plus Oracle ACE’s Steve Karam, and Lutz Hartmann.
The declarative nature of the SQL syntax has always made it difficult to perform SQL tuning. The basic tenet of cost-based SQL optimization is that the person who writes a SQL query simply "declares" what columns they want to see (the SELECT clause), the tables where the columns reside (the FROM clause), and the filtering conditions (the WHERE clause). It's up to the SQL optimizer to always determine the optimal execution plan. This is a formidable challenge, especially in a dynamic environment, which is why Oracle introduced the 10g new feature of CBO dynamic sampling.
Oracle tuning consultants have know for many years that the best way to tune an Oracle system is to take a top-down approach, finding the optimal configuration for external factors (i.e. OS kernel settings, disk I/O subsystem), and determining the best overall setting for the Oracle instance (i.e. init.ora parameters).
Holistic tuning involves tuning a representative workload, adjusting global parameters in order to optimize as much SQL as possible. Only then, is it prudent to start tuning individual SQL statements. Many Oracle professional who adopt a bottom-up approach (tune the SQL first), find all of their hard-work un-done when a change is made to a global setting, such as one of the SQL optimizer parameters or recomputing optimizer statistics. Oracle’s holistic SQL tuning approach is new, and many Oracle professionals find it difficult to embrace, but this is about to change. The Oracle 11g SQL Performance Analyzer (SPA), is primarily designed to speed up the holistic SQL tuning process, automating much of the tedium.
Once you create a workload (called a SQL Tuning Set, or STS), Oracle will repeatedly execute the workload, using sophisticated predictive models (using a regression testing approach) to accurately identify the salient changes to execution plans, based on your environmental changes.
Using SPA, we can predict the impact of system changes on a workload, and we can forecast changes in response times for SQL after making any change, like parameter changes, schema changes, hardware changes, OS changes, or Oracle upgrades; any change that influence SQL plans is a good candidate for SPA.
Oracle had made a commitment to Decision Support Systems (DSS) Technology starting in Oracle 9i when they started to publish “advisory” utilities, the result of monitoring the Oracle instance and coming up with estimated benefits for making a change to the database configuration. In the world of applied artificial intelligence, an expert system (e.g. AMM, ASM) solves a well-structured problem for the DBA, while a decision support system solves a semi-structured problem with the DBA, who supplies the human intuition required to solve a complex problem.
Oracle has made a commitment to distinguishing themselves in the database marketplace, and this is one of the major reasons that they command a major market share. One of the most exciting areas of Oracle technology is automation, especially the self-management features. Oracle has now automated many critical components, including memory advisors (AMM), automated storage management (ASM), and Oracle is now working to enhance more intelligent utilities including ADDM, the Automated Database Diagnostic Monitor, and the brand new 11g SQL Performance Advisory (SPA).
The Oracle 11g SPA functions as a DSS, helping the DBA by automating the well-structured components of a complex tuning task, such as hypothesis testing. In SPA, the DBA defines a representative workload and then tests this workload empirically, running the actual queries against the database and collecting performance metrics. SPA allows the DBA to obtain real-world performance results for several types of environmental changes:
Instead of using theory and mathematical calculations, Oracle SPA tests the SQL Tuning Set (STS) workload in a real-world environment, running the workload repeatedly while using heuristic methods to tally the optimal execution plan for the SQL. The DBA can then review the changes to execution plans and tune the SQL (using the SQLTuning Advisor) to lock-in the execution plans using SQL profiles.
Let’s take a closer look at SPA and see how holistic SQL tuning can remove the tedium of tuning SQL statements.
The Oracle 11g SQL Performance Analyzer is a step in the direction of fully automated SQL tuning, allowing the database administrator to create a STS “workload”, a unified set of SQL which comes from either the cursor cache (Shared Pool) or from the AWR (the Automated Workload Repository). The DBA can use exception thresholds to select the SQL for each STS, based on execution criteria such as disk reads, consistent gets, executions, etc. Once the DBA has chosen their STS, SPA allows them to run the workload while changing Oracle environmental factors, namely the CBO release level, init.ora parameters and customized hypothesis testing using the guided workflow option.
The central question becomes which Oracle initialization parameters would be the most appropriate within the SQL performance analyzer? Because the SPA is used to measure changes in SQL execution plans, it only makes sense that we would want to choose those Oracle parameters which will influence the behavior of the Oracle optimizer.
These would include the basic Oracle optimizer parameters (including optimizer_index_cost_adj, optimizer_mode, optimizer_index_caching), as well as other important initialization parameters. We also have non-optimizer parameters which effect SQL execution plan decisions.
Of course, we can change any parameters we like. Let’s now see how the SPA captures changes in SQL execution plans.
Until the advent of the Oracle 10g intelligent SQL tuning advisors (The SQLAccess advisor and SQLTuning Advisor), SQL tuning was a time-consuming and tedious task. That all started to change in Oracle 10g, and it's even more exciting in Oracle 11g, where Oracle has promised "fully automated" SQL tuning, via the new SQL Performance Analyzer and improvements in the SQL advisories.
The Oracle 10g
automatic tuning advisor allowed us to implement tuning
suggestions in the form of SQL profiles that will improve
performance. Now with Oracle11g, the DBA can tell Oracle to
automatically apply SQL profiles for statements whenever the suggested
profile give 3-times better performance that the existing statement.
These performance comparisons are done by a new 11g administrative task
that is executed during a user-specified maintenance window.
In a nutshell, the 11g fully automated SQL tuning works like this:
1 - Define the SQL workload - The DBA defines a "set" of problematic SQL statements using exception thresholds (e.g. all SQL with > 100,000 disk reads), select from the cursor cache or the AWR. This is called the SQL Tuning set, or STS.
2 - Set-up a changed environment - Here you can chose to change your initialization parms, test your performance against a previous release of the CBO (a very useful features when testing upgrades) or conduct "custom experiments" on the effect of environmental changes on your SQL tuning set.
3 - Schedule & run your tests - The workload is scheduled for execution during "low usage" periods, so that an empirical sample of real-world execution times can be collected and compared, using different execution plans.
4 - Implement the changes – You can flag SQL statements for changes and tune them with the 10g SQLTuning advisor.
Tip: Also related is the Oracle 11g automated SQL tuning Advisor, whereby you can automatically implement changes that cause your SQL to run more than 3x faster. The Oracle 11g automated SQL tuning advisor will implement all execution plan changes via "SQL Profiles", a tool that is conceptually similar to stored outlines, a method to bypasses the generation of execution plans for incoming SQL, replacing it with a pre-tuned access plan. The automatic SQL tuning advisor also recommends restructuring badly-form SQL, and adding missing indexes and materialized views, but these require a manual decision.
Before we examine the nuances of the 11g fully automated SQL tuning features, let's briefly review the goals of SQL tuning.
Holistic tuning in Oracle 11g is a broad-brush approach that can save thousands of hours of tedious manual SQL tuning. By applying global changes, the DBA can tune hundreds of queries at once, and implement them via SQL profiles.
DBA’s who fail to do holistic SQL tuning first (especially those who tune SQL with optimizer directives), may find that subsequent global changes (e.g. optimizer parameter change) may un-tune their SQL By starting with system-level tuning, the DBA can establish an "optimal baseline", before diving into the tuning of individual SQL statements:
Now, Oracle 11g does not have all of the intelligence of a human SQL tuning expert, but the 11g SQL Performance Analyzer (SPA) is a great way to test for the effect of environmental changes to your Oracle environment.
Let's take a closer look at how Oracle has automated the SQL tuning process with SPA.
The SQL performance analyzer allows the DBA to define the SQL Tuning set (the STS), as a source for the test (usually using historical SQL from the AWR tables).
The SPA receives one or more SQL statements as input (via the SPA), and provides advice on which tuning conditions have the best execution plans, gives the proof for the advice, shows an estimated performance benefit, and allegedly has a facility to automatically implement changes that are more than 3x faster than the "before" condition".
The SQL workload (the STS) can be thought of as a container for conducting and analyzing many SQL statements. The STS is fed to the SPA for real-world execution with before-and-after comparisons of changes to holistic "environmental" conditions, specifically CBO levels or changed init.ora parameters.
Internally, the SPA is stored as a database object that contains one or more SQL statements combined with their execution statistics and context such as particular schema, application module name, list of bind variables, etc. The STS also includes a set of basic execution statistics such as CPU and elapsed times, disk reads and buffer gets, number of executions, etc.
When creating a STS, the SQL statements can be filtered by different
patterns such as application module name or execution statistics, such
as high disk reads. Once created, STS can be an input source for the
SQL Tuning Advisor.
Typically, the following steps are used to define the STS using the
dbms_sqltune package. The steps within the new 11g OEM screen
for "guided workflow" are simple and straightforward, and serve as an
online interface to the dbms_sqltune.create_sqlset procedure:
1 –
Options – Choose a name for your SQL tuning set (STS).
This encapsulated SQL workload is created using the dbms_sqltune.create_sqlset
procedure. For example, the following script can be used to create a
STS called SQLSET1:
exec
dbms_sqltune.create_sqlset (‘MYSET1’);
2 – Load methods - Here is where you can choose the source for your SQL workload, and to take historical SQL statements from AWR.
3 – Filter options - You can choose “filtering” conditions, based on your specific tuning needs. For example, if your database is disk I/O bound, you might choose only SQL statements that have more than 100k disk reads.
4 – Schedule – This is an interface to the dbms_scheduler package, allowing you to define and schedule a job.
5 – Review – Here you can see the actual source calls to dbms_sqltune.create_sqlset and the dbms_scheduler.create_job procedure call syntax.
There is an interface to the SQL Performance Analyzer in the enterprise manager in the OEM Advisor Central area, and a number of new to dba_advisor views have been added in 11g which will display information from the SQL Performance Advisor.
The technology behind SPA is encapsulated inside a new package called dbms_sqlpa. Here is an overview for the procedures of the dbms_sqlpa package:
In sum, the new 11g SQL Performance Analyzer is a great way to test for holistic tuning changes. Remember, the savvy Oracle DBA will always adjust their Oracle initialization parameters to optimizer as much of the workload as possible before diving into the tuning of specific SQL statements.
The OEM screen for the SPA “guided workflow” contains a pre-defined set of steps for holistic SQL workload tuning:
1 – Create SQL Performance Analyzer Task, based on SQL Tuning Set
2 – Replay SQL Tuning Set in Initial Environment
3 – Create replay Trial using changed environment
4 – Create Replay trial comparison (using trials from step 2 and step3)
5 – View trial comparison report
Using the guided workflow functionality, we can take our SQL tuning set and execute it twice (before and after), saving the SQL execution results (disk reads, buffer gets) using some of the common SQL execution metrics found in the dba_hist_sqlstat table:
DBA_HIST_SQLSTAT Columns:
FETCHES_TOTAL END_OF_FETCH_COUNT_TOTAL SORTS_TOTAL EXECUTIONS_TOTAL LOADS_TOTAL INVALIDATIONS_TOTAL PARSE_CALLS_TOTAL DISK_READS_TOTAL BUFFER_GETS_TOTAL ROWS_PROCESSED_TOTAL CPU_TIME_TOTAL ELAPSED_TIME_TOTAL
Guided Workflow Items
EXECUTE_ELAPSED_TIME
ELAPSED_TIME
PARSE_TIME
EXECUTE_ELAPSED_TIME
EXECUTE_CPU_TIME
BUFFER_GETS
DISK_READS
DIRECT_WRITES
OPTIMIZER_COST
Here it’s important to note that the guided workflow does not measure these important SQL execution metrics such as sorts and fetches.
The final step in SPA allows the DBA to quickly isolate sub-optimal SQL statement and tune them with the 11g SQLTuning Advisor. When viewing the results, you can use OEM for a visually display of all delta values between the execution run, but most important, you can do a side-by-side comparison of the before-and-after execution plans.
Oracle has always
been ahead of the curve in automating well-structured DBA tasks, and
the SPA is just the latest incarnation in real-world SQL tuning
tools. Tools such as SPA free-up the DBA to pursue other
important DBA tasks, relieving them of the tedium of individually
tuning SQL statements.
One of the exciting new features of Oracle 11g is the new inline data compression utility that promises these benefits:
Up to a 3x reduction in disk storage with a minimal overhead
Faster full-scan and range-scan operations
Smaller tables make solid-state flash disk more affordable, and are 100x faster than platters.
While it is true that data storage prices (disks) have fallen dramatically over the last decade (and continue to fall rapidly), Oracle data compression has far more appealing benefits than simply saving on disk storage cost. Because data itself can be highly compressed, information can be fetched off of the disk devices with less physical IO, which radically improves query performance under certain conditions.
Let's take a closer look at how one would implement Oracle 11g Data Compression in order to achieve the optimal results.
Data compression algorithms (such as the Huffman algorithm), have been around for nearly a century, but only today are they being put to use within mainstream information systems processing.
The legacy mainframe databases such as IDMS and DB2 allowed the DBA to choose any data compression algorithm they desired. One popular database compression program was offered by Clemson University (still a leader in data compression technologies), and their compression programs were very popular in the early 1980's with their CLEMCOMP and CLEMDCOM programs. In IDMS, a user exit in the DMCL allowed for the data compression routine to be invoked "before put" (writes), and "after get" (reads). Internally, all database compression routines try to avoid changing their internal software and rely on user exits to compress the data outbound and decompress the incoming data before it enters the database buffers. In general, data compression techniques follows this sequence:
1 - The database determines that a physical read is desired and issues an I/O request.
2 - Upon receipt of the data block from disk, Oracle uncompresses the data. This happens in RAM, very quickly.
3 - The uncompressed data block is moved into the Oracle buffer.
1 - The database determines that a physical write is desired and issues an I/O request.
2 - The database reads the data block from the buffer and calls a compression routine to quickly uncompresses the data. This happens in RAM, very quickly.
3 - The compressed data block is written to disk.
Of course, database compression has evolved dramatically since it was first introduced in the early 1980's, and Oracle 11g claims to have one of the best database compression utilities ever made. Let's take a closer look.
Oracle introduced a simple index index compression in feature in Oracle 8i, and we now see support for compression of tables and materialized views. Compression indexes with block-level compression is far easier because all index objects are treated the same way, whereas with the 11g table compression, a data block may contain both compressed and uncompressed row data.
For the official details on Oracle 11g table data compression, see the Oracle 11g concepts documentation. Oracle says that their software will perform a native disk read on the data block, only decompressing the row data after the physical I/O has been completed.
Within the data buffers, the fully uncompressed version of the data remains, even though the information remains compressed on the disk data blocks themselves. This leads to a discrepancy between the size of information on the data blocks and the size of the information within the data buffers. Upon applying Oracle data compression, people will find that far more rows will fit on a data block of a given size, but there is still no impact on the data base management system from the point of view of the SGA (system global area).
Because the decompression routine is called upon block fetch, the Oracle data buffers remain largely unchanged while the data blocks themselves tend to have a lot more data on them. This Oracle 11g data compression whitepaper describes the data compression algorithm:
Compressed blocks contain a structure called a symbol table that maintains compression metadata. When a block is compressed, duplicate values are eliminated by first adding a single copy of the duplicate value to the symbol table. Each duplicate value is then replaced by a short reference to the appropriate entry in the symbol table.
Through this innovative design, compressed data is self-contained within the database block as the metadata used to translate compressed data into its original state is contained within the block.
In today's Oracle database
management systems, physical disk I/O remains one of the
foremost bottlenecks. Even at relatively fast speeds of 10
milliseconds, many data intensive Oracle applications can still choke
on I/O by having disk enqueues, Oracle block read tasks waiting to pull
information from the spinning platters. Data compression is
certainly useful for reducing the amount of physical disk I/O but there
are some caveats that need to be followed by the Oracle database
administrator.
It is expected that Oracle data compression will eventually become a default for Oracle systems, much like Oracle implemented the move from dictionary managed tablespaces to locally managed tablespaces. Eventually, this data compression may become ubiquitous, a general part of the Oracle database management engine, but its important for the Oracle database administrator to understand the ramifications of the data compression and have the ability to turn-off compression at-will.
For example, super small servers (read PC's), may not possess enough horsepower to absorb the small (but measurable) overhead of the compress/decompress routines. Remember, there is always a tradeoff between these costs vs. the saving on disk storage and allowing for information to be retrieved with the minimum amount of physical disk I/O.
It's clear that 11g data compression offers these huge benefits, but the exact overhead costs remain unknown:
Up to a 3x disk savings -
Depending on the nature of your data, Oracle compression will result in
huge savings on disk space.
Cheaper solid-state disk - Because
compressed tables reside on fewer disk blocks, shops that might not
otherwise be able to afford Oracle flash disks can now enjoy I/O speeds
up to 300x faster than platter disk.
Faster full scan/range scan operations
- Because tables will reside on less data blocks, full table scans and
index range scans can retrieve the rows with less disk I/O.
Reduced network traffic - Because the data blocks are compressed/decompressed only within Oracle, the external network packets will be significantly smaller.
The internal machinations of Oracle have
always been a closely-guarded secret, Oracle's internal software, their
bread-and-butter "edge" that gives Oracle such a huge competitive
advantage over their competition. Because Oracle withholds many
internal details, we must we must discover the internals of 11g
compression with with real-world observations and conjecture.
First, Oracle hires some of the brightest
software engineers in the world (graduates of prestigious colleges like
MIT), and it's likely that overhead will be minimized by doing the data
compress/uncompress only once, at disk I/O time, and kept in
decompressed form somewhere within the RAM data buffers.
Some unknown issues (as of September 2007) with implementing Oracle11g
data compression include the amount of overhead. The
compress/decompress operations are computationally intensive but super
small (probably measured in microseconds). This CPU overhead
might be significantly measurable, but we can assume that the overhead
will be the same (or smaller) than data compression in legacy
databases (with the possible exception of PC-based Oracle
databases). In a perfect implementation, incoming data would only
be decompressed once (at read time) and the uncompressed copy of the
disk block would reside in RAM, thereby minimizing changes to the
Oracle kernel code. The overhead on DML must involve these operations:
Overhead at DML time - Whenever a SQL
update, insert of delete changes a data block in RAM, Oracle must
determine if the data block should be unlinked from the freelist (this
threshold is defined by the PCTFREE parameter).
Compression on write -
An outbound data block must be compressed to fit into it's tertiary
block size (as defined by db_block_size and the tablespace blocksize
keyword). For example, an uncompressed block in RAM might occupy
up to 96k in RAM and be compressed into it's tertiary disk blocksize of
32k upon a physical disk write.
Decompress on read - At physical read
time, incoming disk blocks must be expanded once and stored in the RAM
data buffer. The exact mechanism for this expansion is not
published in the Oracle11g documentation, but it's most likely a block
versioning scheme similar to the one used for maintaining read
consistency.
Increased RAM consumption?
- If we assume that compressed data blocks are
expanded once and saved, it's a safe bet that there may be increased
RAM demands within the data buffers (db_cache_size, db_nn_cache_size).
This is especially important for objects residing within the KEEP pool,
where the DBA must automate the pools adjustment to keep all objects
fully-cached. Unfortunately, this pool adjustment will be
difficult to automate without wasting RAM. We know how many disk
blocks a table occupies, but we may not know the compression factor,
(unless Oracle keep the compression details buried somewhere inside the
11g data dictionary views or AWR. Click
here for complete details and source code on the 9i and 10g scripts
to automatically adjust the KEEP pool to ensure full
caching.)
Increased likelihood of disk contention - Because the data is tightly compressed on the data blocks, more rows can be stored, thus increasing the possibility of "hot" blocks on disk. Of course, using large data buffers and/or solid-state disk (RAM-SAN) will alleviate this issue.
Remember, physical disk I/O against disk platters has become the major system bottleneck as the speed of processors increase. Until the widespread adoption of RAM disk (solid state disk), we can see this type of data compression being widely used in order to reduce the amount of physical disk I/O against Oracle systems.
According to the Oracle whitepaper on 11g data compression, the CPU overhead for the compress/decompress operations will be minimal. More important, Oracle11g data compression will be a Godsend for shops that are constrained by Federal regulation to archive their audit trails (HIPAA, SOX).
The 11g docs note that the new COMPRESS keyword works for tables, table partitions and entire tablespaces. Oracle has implemented their data compression at the table level, using new keywords within the "create table" DDL:
create table fred (col1 number) NOCOMPRESS;
create table fred (col1 number) COMPRESS FOR DIRECT_LOAD OPERATIONS;
create table fred (col1 number) COMPRESS FOR ALL OPERATIONS;
We also see syntax for creating a compressed tablespace:
CREATE TABLESPACE MYSTUFF . . . DEFAULT{ COMPRESS [ FOR { ALL | DIRECT_LOAD } OPERATIONS ]
| NOCOMPRESS
}
Several DBA views have been enhanced in 11g to show compression attributes. The dba_tables view has added the new columns COMPRESSED (enabled, disabled) and COMPRESSED_FOR (nocompress, compress for direct_load operations, compress for all operations).
While powerful, Oracle11g has made some important fundamental decisions about mixed-mode compression, a feature whereby it is possible for rows within the same table to be either compressed or uncompressed! The expected behaviors for the new compression syntax has a few surprises, features that require knowledge of how Oracle has chosen to implement their data compression utility.
While the "alter table" and "alter tablespace" clauses support changing the compression options, we would expect that Oracle would feel obligated to change all objects to match their new compression attributes. That is not the case, and the 11g compression docs note that a table may have multi-state rows, some compressed and others expanded:
You can alter the compression attribute for a table (or a partition or tablespace), and the change only applies to new data going into that table.
As a result, a single table or partition may contain some compressed blocks and some regular blocks. This guarantees that data size will not increase as a result of compression; in cases where compression could increase the size of a block, it is not applied to that block.
Without implementing this revolutionary "partial" row compression, making a table-wide or tablespace-wide compression change would require a massive update to blocks within the target tablespace. The 11g compression docs note that when changing to/from global compression features, the risk averse DBA would choose to rebuild the table or tablespace from scratch:
Existing data in the database can also be compressed by moving it into compressed form through
ALTERTABLEandMOVEstatements. This operation takes an exclusive lock on the table, and therefore prevents any updates and loads until it completes.If this is not acceptable, the Oracle Database online redefinition utility (the
DBMS_REDEFINITIONPL/SQL package) can be used.
Oracle now has many data compression utilities (the 8i index compression, 10g compression and the new 11g compression), so we must be careful to distinguish between the disparate tools. The 11g data compression is threshold-based and allows Oracle to honor the freelist unlink threshold (PCTFREE) for the compress rows, thereby allowing more rows per data block.
For more details on 11g data compression, see the book "Oracle 11gnew Features" by John Garmany, V. J Jain, with Oracle ACE's Steve Karam and Lutz Hartmann.
Roby Sherman published some negative test results about the 11g data compression utility, suggesting that there may be some performance issues, noting that his performance was "quite horrible". However, this does not match the experience of others, and it serves to underscore the issue that data compression is marginally CPU intensive, and 11g table compression may not be performant on smaller personal computers:
For anyone interested, I ran some very quick benchmarks on 11g's new Advanced Compression table option COMPRESS FOR ALL OPERATIONS that Oracle is claiming was "specifically written to create only the most 'minimal' of performance impacts in OLTP environments.
The results are here:
http://web.mac.com/tikimac/iWeb/silicon/Roby_Sherman/ Oracle_Certifiable/Entries/2007/8/16_11G_TABLE_COMPRESSION_- _Don’t_Believe_the_Hype.html
I guess their definition of minimal and my definition of minimal must be different... Anyway, if you're interested in this feature, feel free to take a quick look!
Other comments of Sherman's 11g compression test included:
Rather than commenting that the advanced compression is "quite horrible" I'd comment that your choice of tests are quite horrible.
I don't consider Roby's test cases are horrible. Anyone who criticizes the hype seems to be criticized. Look at the argument of Roby: "But, then again, if your operations are that minimal, you probably aren't creating enough data to need compression in the first place!"
Robert Freeman, a trainer for Burleson Consulting noted that his results did not show the same degradation and he offers insightful comments about the dangers of using a "negative proof":
I've read this particular post several times. I just have to believe that I'm not getting something here, because ..... I want to be charitable but the point that is being made is just asinine in my opinion. I hope, I really hope, that I've missed something obvious and that I'm the fool (would not be the first time - I freely confess my imperfections).
>> The common nonsense peddled is like: It all depends.
EXCUSE ME????? Common nonsense? The whole scientific method is all about Ceteris paribus. Research is influenced heavily on IT DEPENDS. Drop a rock and a feather on the Earth and then on the moon and tell me the results are not a matter of IT DEPENDS. I must have missed your point, because nobody could be so short sighted as to presuppose that there are no dependencies.
Can you explain negative cases? Sure. I can explain the negative case of the rock falling faster than the feather to the difference in location and criteria of the experiment. I can explain Roby's negative results in numerous ways, including accepting the *possibility* that his results reflect truth, and that compression is a performance killer. Did he provide sufficient evidence to review his results, of course not. How do we know the issue isn't one of the optimizer changing the plan, as opposed to the physical implementation of compression, for example? We don't because no execution plans were provided.
That being said, his results do not mirror mine. Explain that negative case. Oh, is it because my results are on Oracle Beta 5? Or is it that my results are on a faster CPU? Can we always explain negative cases? No. . .
Additionally, I argue that one can never, ever, systematically prove everything 100%. Perhaps to a high degree of confidence, but never for sure. Why? Because the conditions of that result and that analysis can change because the dependencies change. You can not control the entire environment, thus you can not 100% guarantee an outcome, ever. If you have never had the frustrating experience of having two different result sets and not being
able to figure out why they differ, then you are luckier than I (or younger, or you have more time or less experience).
. . .While I have not tested compression in the production code (yet, I'm running the book through production now), when I did my chapter on compression in Beta 5, I found the results to be very different from Roby's. Still, I'm glad to see him testing this stuff and reminding us that not every new feature is a panacea.