Large SGA On Linux

The current shipping version of Oracle is able to use about 1.7GB of address space for its SGA.
There are several ways to allocate more memory than this.
On a machine with 4 GB of RAM, the SGA (which uses shared memory) can be increased to about 2.7 GB. This requires changes to both Linux and Oracle.
On a machine with 8 GB of RAM, the SGA can be increased to about 7 GB by using the shared memory filesystem "shmfs". An SGA of up to 5.4 GB can be created using the "bigpages" feature for System V shared memory, where the page size is 4 MB instead of the regular 4 KB.
On a machine that supports Physical Address Extension (PAE), the SGA can theoretically have a size of 62 GB. The PAE mechanism allows addressing using 36 bits on IA-32 systems. But current hardware limitations and practical considerations limit the actual size of the SGA on such systems.

Red Hat Linux Advanced Server has several features and enhancements that don't exist in other Red Hat versions. Among other things, Red Hat AS provides:
- Asynchronous I/O
- Process scheduler with CPU affinity, cache affinity, and per CPU runqueues and locks that provide better performance
- "mapped base" (base address for shared libraries) can be changed dynamically, allowing larger sizes for the SGA
- Page frames of size 4 MB as opposed to 4 KB can be used for the SGA, which improves performance for large SGAs
- The kernel can also use the "high memory" pool (physical memory above 1 GB) for allocating page table entries (PTE), which allows a higher number of Oracle connections
- Elimination of copy to bounce buffer improves I/O performance


More information
Note 260152.1 - Linux Big SGA, Large Memory, VLM - White Paper
Note 225220.1 - OS Configuration for large SGA
Note 260152.1 - Summary About the Large SGA & Address Space on RH Linux
Note 275318.1 - The Bigpages Feature on Linux
Note 317055.1 - How to Configure RHEL 3.0 32-bit for Very Large Memory and HugePages
Note 317141.1 - How to Configure RHEL 4 32-bit for Very Large Memory with ramfs and HugePages
Note 200266.1 - Increasing Usable Address Space for Oracle on 32-bit Linux
Note 401749.1 - Shell Script to Calculate Values Recommended HugePages / HugeTLB Configuration


The most robust and scalable method to increase the SGA memory requires the use of a shared memory file system (shmfs). The procedure presented in this article assumes you are using Red Hat Advanced Server (AS) 2.1 with the enterprise kernel, which supports Physical Address Extension (PAE).

Create a shared memory file system (shmfs)
The shmfs is a memory file system, so it can be as large as the maximum allowable virtual memory supported by Red Hat Linux AS 2.1, currently 16 GB, although the enterprise kernel theoretically supports up to 64 GB of RAM.

The shmfs is created using the following command as the root user:
mount -t shm shmfs -o size=3g /dev/shm
The shared memory file system can be mounted automatically by adding the following line into /etc/fstab file:
shmfs /dev/shm shm size=3g 0 0
In the above example I've created the shmfs with a size of 3G as that is the size of the buffer cache I am planning to use. The other elements of the SGA are placed in regular memory, not this shared memory file system, so they should not be included when deciding on the size of the shmfs. It is advisable to size this slightly bigger than the actual size needed, but in this example I've used a 3G shmfs for a 3G buffer cache.

Enabling big pages
Big pages are enabled by adding bigpages=xMB to the relevant kernel entry in the boot loader file /boot/grub/grub.conf, where "x" is calculated as follows:
(Total SGA size in GB) x 1024
Then round this value up to the next hundred. So for a 4G SGA we would do the following:
4 x 1024 = 4096, rounded up to 4100
So the /boot/grub/grub.conf file entry might look like this:
kernel /vmlinuz-2.4.9-e.40enterprise ro root=/dev/cciss/c0d0p2 bigpages=4100MB
With this entry saved the system should be rebooted. Once the system is available you must perform the following command as the root user:
echo 2 > /proc/sys/kernel/shm-use-bigpages
Alternatively you can add the following entry into the /etc/sysctl.conf file so this value persists between reboots:
kernel.shm-use-bigpages = 2

Setting the SHMMAX value
The shmmax value should be set at half the physical memory up to a maximum of 4294967295. For a server with 6G of memory we can set this value to 3G (half physical memory) using the following command as the root user:
echo 3221225472 > /proc/sys/kernel/shmmax
Alternatively it can be set in the /etc/sysctl.conf file with the following entry:
kernel.shmmax = 3221225472
The contents of your /etc/sysctl.conf file may look something like this:
kernel.shmmax = 3221225472
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.sem = 1000 32000 100 150
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
kernel.shm-use-bigpages = 2
Alterations to the /etc/sysctl.conf file can be applied without a reboot by issuing the following command as root:
/sbin/sysctl -p
Instance Parameters
Some instance parameter changes are necessary to allow the Oracle instance to use the shared memory file system. The spfile parameters can be manipulated using the ALTER SYSTEM SET command in a running instance, or by modifying the spfile contents offline:
-- Change the parameter value in the spfile directly.
ALTER SYSTEM SET parameter = value SCOPE=spfile;

-- Create a pfile with the contents of the current spfile.
CREATE PFILE='/tmp/pfile' FROM SPFILE;

-- Manually manipulate the contents of the pfile.

-- Recreate the spfile from the amended pfile.
CREATE SPFILE FROM PFILE='/tmp/pfile';
The following parameter should be added to the spfile or pfile:
use_indirect_data_buffers=true
Also, any references to db_cache_size and db_xK_cache_size parameters should be removed and replaced with the old style db_block_buffers parameter entry:
# 3Gig for an 8K db_block_size.
db_block_buffers = 393216
This means that the multiple block size feature is not available when using this method. Remember that the buffer cache is only one part of the SGA.

The recommended kernel for Red Hat Enterprise Linux 2.1 is 2.4.9-e.25 or higher. This kernel has several fixes that are relevant to Oracle including fixes for memory problems and kswapd problems.

If the Linux server has <= 4 GB RAM, the kernel "kernel-smp" should be used for SMP machines, or the kernel "kernel" should be used for UP machines. If the Linux server has > 4 GB RAM, the enterprise kernel "kernel-enterprise" should be used for UP and SMP machines.

To check whether these kernels are installed, run the following command:
rpm -q kernel-smp kernel-enterprise
To check which kernel is currently running, execute the following command:
uname -a
To install the enterprise kernel, for example, download the "kernel-enterprise" RPM and execute the following command:
rpm -ivh kernel-enterprise-2.4.9-e.25.i686.rpm

To make sure that the right kernel is booted, check the /etc/grub.conf file if you use GRUB, and change the "default" attribute if necessary. Here is an example:
default=1
timeout=10
splashimage=(hd0,1)/boot/grub/splash.xpm.gz
title Red Hat Linux (2.4.9-e.25enterprise)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25enterprise ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25enterprise.img
title Red Hat Linux Advanced Server (2.4.9-e.25smp)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25smp ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25smp.img
title Red Hat Linux Advanced Server-up (2.4.9-e.25)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25 ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25.img
In this example, the "default" attribute is set to "1", which means that the 2.4.9-e.25smp kernel will be booted. If the "default" attribute were set to "0", the 2.4.9-e.25enterprise kernel would be booted.

After installing the new kernel and/or changing the /etc/grub.conf file, reboot the server.

Once you are sure you don't need the old kernel anymore, you can remove the old kernel by running:
su - root
rpm -e <OldKernelVersion>
When you remove the kernel, you don't need to make any changes to the /etc/grub.conf file.

NOTE: Be very careful when removing a kernel! Making a mistake could render the server unbootable.

Increasing Space for larger SGA (2.7 GB) to Fit Into Memory

If the size of SGA does not need to be increased from 1.7 GB to 2.7 GB, then the following steps can be skipped.

By default, the maximum size for the SGA is 1.7 GB on a 32-bit system without Physical Address Extension (PAE). You can also allocate a 1.7 GB SGA if you have less than 4 GB of RAM. In this case you have to make sure you have enough swap space; however, this will have an impact on the performance of the database. I was even able to bring up a database with an SGA size of 2.64 GB on a test PC that had 256 MB RAM.

Theoretically, the SGA can have a size of up to 62 GB on a system that supports Physical Address Extension (PAE). The PAE mechanism allows addressing using 36 bits on IA-32 systems. But current hardware limitations and practical considerations limit the actual size of the SGA on such a system. Since I do not have such a system, I will not cover the steps for creating SGAs larger than 2.7 GB via the tmpfs filesystem.

To increase the size of the SGA to 2.7 GB without using a shared memory filesystem (tmpfs), the following needs to be done:
  - The base address "mapped base" for Oracle's shared libraries has to be lowered at the Linux OS level.
  - Oracle needs to be relinked with a lower base address for SGA which uses shared memory segments.


Address Mappings on Linux - Shared Memory and Shared Library Mapping on Linux

Normally, the 4 GB linear address space (also known as virtual address space) for a 32-bit Linux system is split into four equal-sized sections for different purposes:
0GB-1GB  User space   - Used for the executable and brk/sbrk allocations (malloc uses brk for small chunks).
1GB-2GB  User space   - Used for mmaps (shared memory), shared libraries, and malloc allocations made with mmap (malloc uses mmap for large chunks).
2GB-3GB  User space   - Used for the stack.
3GB-4GB  Kernel space - Used for the kernel itself.
- The mmaps grow bottom up and the stack grows top down, so space left unused by one can be used by the other.
- The split between userspace and kernelspace can be changed by setting the kernel parameter PAGE_OFFSET and recompiling the kernel. By default, the PAGE_OFFSET macro yields the value 0xc0000000.
- The split between brk(2) and mmap(2) can be changed by setting the kernel parameter TASK_UNMAPPED_BASE and recompiling the kernel. However, on Red Hat AS this parameter can be changed for individual processes dynamically without reboot or kernel recompilation.

Usually, the portion of address space available for mapping shared libraries and shared memory segments consists of virtual addresses in the range 0x40000000 (1 GB) to 0xc0000000 (3 GB). On Red Hat AS, 0x40000000 is the default base address for shared libraries and shared memory segments. The default base address for mapping shared memory segments can be overridden by programs and applications run by non-root users. The default base address "mapped base" for loading shared libraries can be changed only by root.

The default base address that Oracle uses for the SGA (shared memory segment) is 0x50000000, not 0x40000000. Oracle reserves the space from 0x40000000 to 0x50000000 for loading Oracle shared libraries. As I mentioned before, 0x40000000 is the default base address on RH AS for loading shared libraries, and only root can change it. Oracle raised the base address for the SGA to prevent address range conflicts between the segments (shared memory segment and shared libraries).
If the base address for shared memory segments were 0x15000000 and the base address for shared libraries were 0x40000000, Oracle could not create an SGA larger than 0x2b000000 bytes (688 MB), even though there is address space available above the shared libraries portion. (According to Oracle, Oracle binaries will no longer work if the base address for shared memory segments is lower than the base address for shared libraries, as in this example. Even though I didn't experience any problems, I would not recommend it.)
If the base address for shared memory segments is 0x50000000 and the base address for shared libraries is 0x40000000, Oracle can create an SGA that starts at 0x50000000 and ends almost at 0xc0000000, the address where the kernel address space begins. This means that the SGA can have a size of almost 0x70000000 bytes (1792 MB); in practice it is about 100 MB less due to stack space and other use of memory.

Once again, Oracle increased the default base address for SGA to 0x50000000 so that all shared libraries can be loaded below 0x50000000, and the rest of the space up to almost 0xc0000000 can be used for shared memory.

You can verify the address mappings of Oracle processes by viewing the proc file /proc/<pid>/maps where <pid> stands for the Oracle process ID. The default mapping of an Oracle process might look like this:
08048000-0ab11000 r-xp 00000000 08:09 273078     /ora/product/9.2.0/bin/oracle
0ab11000-0ab99000 rw-p 02ac8000 08:09 273078 /ora/product/9.2.0/bin/oracle
0ab99000-0ad39000 rwxp 00000000 00:00 0
40000000-40016000 r-xp 00000000 08:01 16 /lib/ld-2.2.4.so
40016000-40017000 rw-p 00015000 08:01 16 /lib/ld-2.2.4.so
40017000-40018000 rw-p 00000000 00:00 0
40018000-40019000 r-xp 00000000 08:09 17935 /ora/product/9.2.0/lib/libodmd9.so
40019000-4001a000 rw-p 00000000 08:09 17935 /ora/product/9.2.0/lib/libodmd9.so
4001a000-4001c000 r-xp 00000000 08:09 16066 /ora/product/9.2.0/lib/libskgxp9.so
...
42606000-42607000 rw-p 00009000 08:01 50 /lib/libnss_files-2.2.4.so
50000000-50400000 rw-s 00000000 00:04 163842 /SYSV00000000 (deleted)
51000000-53000000 rw-s 00000000 00:04 196611 /SYSV00000000 (deleted)
53000000-55000000 rw-s 00000000 00:04 229380 /SYSV00000000 (deleted)
...
bfffb000-c0000000 rwxp ffffc000 00:00 0

As this address mapping shows, shared libraries start at base address 0x40000000. The address mapping also shows that Oracle uses the base address 0x50000000 for SGA (in this example System V shared memory for SGA). Here is a summary of all the entries:

The text (code) section is mapped at 0x08048000:
  08048000-0ab11000 r-xp 00000000 08:09 273078     /ora/product/9.2.0/bin/oracle
The data section is mapped at 0x0ab11000:
  0ab11000-0ab99000 rw-p 02ac8000 08:09 273078     /ora/product/9.2.0/bin/oracle
The uninitialized data segment .bss is allocated at 0x0ab99000:
  0ab99000-0ad39000 rwxp 00000000 00:00 0
The base address for shared libraries is 0x40000000:
  40000000-40016000 r-xp 00000000 08:01 16         /lib/ld-2.2.4.so
The base address for SGA (System V shared memory) is 0x50000000:
  50000000-50400000 rw-s 00000000 00:04 163842     /SYSV00000000 (deleted)
The stack is allocated at 0xbfffb000:
  bfffb000-c0000000 rwxp ffffc000 00:00 0

Now it should become clear what needs to be done to provide more space for SGA. To increase the space for SGA, two base addresses need to be changed. The base address "mapped base" for shared libraries needs to be lowered at the Linux OS level, and the base address for SGA (shared memory) needs to be lowered at the Oracle level (application level).

Note: Once the base addresses have been changed at the Linux OS level and at the Oracle level, all Oracle commands need to be executed with a lower "mapped base"! This means that every new shell must run with a lowered "mapped base". Further down I will show you how to automate this so that every Oracle user automatically gets a shell with a lowered "mapped base".


Changing the Base Address "mapped base" for Shared Libraries at the Linux OS Level

The default base address "mapped base" on RH 2.1AS is TASK_UNMAPPED_BASE = 0x40000000 (decimal 1073741824 or 1 GB). This is the address that splits the section between brk(2) and mmap(2), which defines the available space for shared libraries (if it hasn't been overridden at the application level) and for shared memory (e.g. the SGA).

To change "mapped base" for a Linux process, the file /proc/<pid>/mapped_base needs to be changed, where <pid> stands for the process ID. Note that this is not a system-wide parameter! So in order to change "mapped base" for the Oracle database (i.e. Oracle processes), the parent shell that starts the database needs to be modified at the Linux OS level so that its child processes inherit the change. The following procedure shows how this can be done.

Execute the following command to identify the process ID "pid" of the shell process used by the Oracle user that will start the database:
echo $$
As root in another shell, change "mapped base" to 0x10000000 (decimal 268435456 bytes or 256 MB) for the Oracle shell with the pid we identified above:
su - root
echo 268435456 > /proc/<pid>/mapped_base
This will tell the kernel to load shared libraries at the virtual address portion starting at 0x10000000. Now if Oracle is started with sqlplus in the shell used by the Oracle user for which we changed "mapped base", the Oracle processes will inherit the new base address.

Once the base address for shared memory has been changed at the Oracle level as well, more space will become available for the SGA. To accommodate the increased space for shared memory allocations by the Oracle processes, the maximum value of SHMMAX needs to be raised. This value defines the largest shared memory segment size allowed by the kernel. Since the SGA can be increased up to 2.7 GB with this method, SHMMAX can be rounded up to 3000000000. This will allow Oracle to allocate one large shared memory segment for the SGA, which is also what Oracle recommends.

The maximum size SHMMAX for a shared memory segment can be changed in the proc file system without reboot:
su - root
echo "3000000000" > /proc/sys/kernel/shmmax
Alternatively, you can use sysctl(8) to change it:
sysctl -w kernel.shmmax=3000000000
To make the change permanent, add or change the following line in the file /etc/sysctl.conf. This file is used during the boot process.
kernel.shmmax=3000000000

Changing the Base Address for Shared Memory at the Oracle Level

The previous steps showed how to lower the base address "mapped base" for Oracle's shared libraries to 0x10000000 (256 MB). The following steps show how to lower the base address for shared memory (SGA) for Oracle to 0x15000000 (336 MB).

The base address for the SGA (shared memory) should not be lowered to 0x10000000 at the Oracle level. As I explained in the section "Address Mappings on Linux - Shared Memory and Shared Library Mapping on Linux", to prevent address range conflicts between the segments (Oracle shared libraries and Oracle shared memory), the SGA should be attached at 0x15000000. It can be lowered to 0x12000000, but this would require thorough testing, so I would not recommend it.

The following calculation shows how large the SGA can be created:
   0xc0000000  (base address of the kernel space -> 3 GB)
 - 0x15000000  (base address of SGA -> 336 MB)
 -------------
   0xab000000  (decimal 2868903936 bytes, or 2736 MB)
 - stack space
 - other memory allocations
 -------------
 ~ 2.65 to 2.70 GB

To lower the base address at which the SGA (shared memory) should be attached, Oracle needs to be relinked. Changing the base address for SGA can be done on Linux with genksms, which is an Oracle utility:
  # shutdown Oracle
SQL> shutdown

su - oracle
cd $ORACLE_HOME/rdbms/lib


# Make a backup of the ksms.s file if it exists
[[ -f ksms.s ]] && cp ksms.s ksms.s_orig

# Modify the attach address in the ksms.s file before relinking Oracle
genksms -s 0x15000000 > ksms.s

Rebuild the Oracle executable in the $ORACLE_HOME/rdbms/lib directory by entering the following commands:
  # Create a new ksms object file
make -f ins_rdbms.mk ksms.o

# Create a new "oracle" executable ($ORACLE_HOME/bin/oracle):
make -f ins_rdbms.mk ioracle

# The last step will create a new Oracle kernel that loads the SGA at
# the address specified by sgabeg in ksms.s:
# .set sgabeg,0X15000000
# It also backs up the old oracle executable to $ORACLE_HOME/bin/oracleO,
# it sets the correct privileges for the new Oracle executable "oracle", and
# moves the new executable "oracle" into the $ORACLE_HOME/bin directory.

Now when Oracle is started, the lowered base addresses for Oracle's shared library and shared memory (SGA) can be seen with the following commands:
  # Get the pid of e.g. the Oracle database writer process (DBW0)
su - oracle
$ pgrep -f -x ora_dbw0_$ORACLE_SID -l
13519 ora_dbw0_test
# You can also use /sbin/pidof to get the process ID
$ /sbin/pidof ora_dbw0_$ORACLE_SID
13519
$ DBW0_PID=`pgrep -f -x ora_dbw0_$ORACLE_SID`
$ echo $DBW0_PID
13519

# Check the base addresses for shared libraries and shared memory for the
# process ID 13519 ($DBW0_PID):

$ grep '.so' /proc/$DBW0_PID/maps |head -1
10000000-10016000 r-xp 00000000 03:02 750738 /lib/ld-2.2.4.so

$ grep 'SYS' /proc/$DBW0_PID/maps |head -1
15000000-24000000 rw-s 00000000 00:04 262150 /SYSV3ecee0b0 (deleted)
$
Now you can increase the init.ora parameters db_cache_size or db_block_buffers to create a larger database buffer cache. If the size of the SGA is larger than 2.65 GB, I would test the database very thoroughly to make sure no other memory allocation problems arise.

For fun I tried these settings on a small test PC with 256 MB RAM and 4 GB swap space, to see whether I could bring up a database on such a little PC. I set db_block_buffers to 315000 and db_block_size to 8192 (2580480000 bytes), and I was able to bring up a database with a 2.654 GB (2850033824 bytes) SGA on this PC:
Total System Global Area 2850033824 bytes
Fixed Size 450720 bytes
Variable Size 268435456 bytes
Database Buffers 2580480000 bytes
Redo Buffers 667648 bytes


Giving Oracle Users the Privilege to Change the Base Address for Oracle's Shared Libraries Without Giving them root Access

As shown above, only root can change the base address "mapped base" for shared libraries. Using sudo we can give Oracle users the privilege to change "mapped base" for their own shells without giving them full root access. Here is the procedure:
su - root

# E.g. create a script called "/usr/local/bin/ChangeMappedBase"
# which changes the "mapped base" for the parent process,
# the shell used by the Oracle user where the "sudo" program
# is executed (forked). Here is an example:

#!/bin/sh
# Lowering "mapped base" to 0x10000000
echo 268435456 > /proc/$PPID/mapped_base


# Make sure that ownership and permissions are correct
chown root.root /usr/local/bin/ChangeMappedBase
chmod 755 /usr/local/bin/ChangeMappedBase


# Allow the Oracle user to execute /usr/local/bin/ChangeMappedBase via sudo
echo "oracle ALL=/usr/local/bin/ChangeMappedBase" >> /etc/sudoers

Now the Oracle user can run /usr/local/bin/ChangeMappedBase via sudo to change "mapped base" for its own shell:
$ su - oracle
$ cat /proc/$$/mapped_base; echo
1073741824
$ sudo /usr/local/bin/ChangeMappedBase
Password: # type in the password for the Oracle user account
$ cat /proc/$$/mapped_base; echo
268435456
$
When /usr/local/bin/ChangeMappedBase is executed the first time after an Oracle login, sudo will ask for a password. The password that needs to be entered is the password of the Oracle user account.


Changing the Base Address for Oracle's Shared Libraries Automatically During an Oracle Login

With the procedure in the previous section, sudo asks for a password the first time /usr/local/bin/ChangeMappedBase is executed after each Oracle login. To have "mapped base" changed automatically during an Oracle login without a password, the following can be done:

Edit the /etc/sudoers file with visudo:
su - root
visudo
Change the entry in /etc/sudoers from:
oracle   ALL=/usr/local/bin/ChangeMappedBase
to read:
oracle   ALL=NOPASSWD: /usr/local/bin/ChangeMappedBase
Make sure bash executes /usr/local/bin/ChangeMappedBase during the login process. You can use e.g. ~oracle/.bash_profile:
su - oracle
echo "sudo /usr/local/bin/ChangeMappedBase" >> ~/.bash_profile
The next time you log in as oracle, the base address for shared libraries will be set automatically.
$ ssh oracle@localhost
oracle@localhost's password:
Last login: Sun Apr 6 13:59:22 2003 from localhost
$ cat /proc/$$/mapped_base; echo
268435456
$

Important Notes

When the base address "mapped base" for Oracle's processes has changed, then every Linux shell that spawns Oracle processes (e.g. listener) must have the same "mapped base" as well. This means that even shells that are used to connect locally to the database need to have the same "mapped base". For example, if you run sqlplus to connect to the local database, then you will get the following error message if "mapped base" of this shell is not the same as for the Oracle processes:
SQL> connect scott/tiger
ERROR:
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux Error: 12: Cannot allocate memory

Additional information: 1
Additional information: 491524

SQL>

Using Large Memory Pages (Bigpages)

This feature is very useful for large SGA sizes. In the following example I will show how to configure and use the Linux bigpage memory area for System V shared memory segments. System V shared memory segments are allocated for the SGA if "shmfs" is not used or configured for the SGA.

A separate Linux memory area can be allocated to use 4 MB memory pages rather than the normal 4 kB pages. Large memory pages ("bigpages") are locked in memory and do not get swapped out, so a separate bigpage memory area can be allocated for the entire SGA to keep it from being swapped out of memory. It is therefore very important that the bigpage memory area be only as large as needed for the SGA, because unused memory in the bigpage pool won't be available for anything other than shared memory allocations, even if the Linux system starts swapping. It is also important to be aware that if bigpages is set to a high value, the memory available for user connections will be low.

Sizing Bigpages

Oracle says that the maximum value of Bigpages should be:
Maximum value of Bigpages = HighTotal / 1024 * 0.8 MB
The bigpage memory area is only available for shared memory, so if bigpages is set to a high value, the memory available for user connections will be low. If the memory consumption for the maximum number of user connections is known, then Oracle says that bigpages can be calculated as follows:
Maximum value of Bigpages = (HighTotal - Memory required by maximum user connections in KB) / 1024 * 0.8 MB
According to Oracle's white paper Linux Virtual Memory in Red Hat Advanced Server 2.1 and Oracle's Memory Usage Characteristics, the assumption is that 20% of memory is reserved for kernel bookkeeping.

The value for "HighTotal" can be obtained with the following command:
grep HighTotal /proc/meminfo
Note that highmem is all memory above approximately 860 MB of physical RAM, so "HighTotal" is the total amount of memory in the high memory region. Large memory pages should therefore only be configured if enough physical RAM is available. For instance, if the server has only 512 MB RAM, "HighTotal" will be 0 kB, and on my 1 GB RAM desktop PC, "HighTotal" shows 130992 kB.

Here are a few examples for bigpage sizes taken from Tips and Techniques: Install and Configure Oracle9i on Red Hat Linux Advanced Server:
SGA size    bigpages
2 GB        2100 MB
4 GB        4100 MB
The bigpages feature allows a maximum size of 5.4 GB SGA on a machine with 8 GB RAM.

Configuring Bigpages

The kernel needs to be told to use the bigpages pool for shared memory allocations. The bigpages feature can be enabled for System V shared memory in the proc file system without reboot with the following command:
su - root
echo "1" > /proc/sys/kernel/shm-use-bigpages
Alternatively, you can use sysctl(8) to change it:
sysctl -w kernel.shm-use-bigpages=1
To make the change permanent, append the setting to the file /etc/sysctl.conf, which is used during the boot process:
echo "kernel.shm-use-bigpages=1" >> /etc/sysctl.conf
Setting kernel.shm-use-bigpages=2 enables bigpages for "shmfs", which I'm not covering in this article. Setting kernel.shm-use-bigpages=0 disables the bigpages feature.

The kernel also needs to be told how large the bigpage pool should be. If you use GRUB, add the "bigpages" parameter to the /etc/grub.conf file and set the maximum value of bigpages as follows. In this example I set bigpages to 2100 MB for the SMP kernel 2.4.9-e.25 that is started on my database server:
default=1
timeout=10
splashimage=(hd0,1)/boot/grub/splash.xpm.gz
title Red Hat Linux (2.4.9-e.25enterprise)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25enterprise ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25enterprise.img
title Red Hat Linux Advanced Server (2.4.9-e.25smp)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25smp ro root=/dev/hda2 hdc=ide-scsi bigpages=2100MB
initrd /boot/initrd-2.4.9-e.25smp.img
title Red Hat Linux Advanced Server-up (2.4.9-e.25)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25 ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25.img

After this change the system needs to be rebooted:
su - root
shutdown -r now

After a system reboot, the "MemFree" value (free system memory) in /proc/meminfo is reduced by 2100 MB in this example. The 2100 MB now show up in "BigPagesFree", which means that 2100 MB are now in a separate allocation area:
grep MemTotal /proc/meminfo
grep BigPagesFree /proc/meminfo
Note that if you configure "bigpages" in the /etc/grub.conf file and reboot the system, "BigPagesFree" in /proc/meminfo will be 0 kB if "HighTotal" in /proc/meminfo is 0 kB and /proc/sys/kernel/shm-use-bigpages is set to "1".


Implement Advanced memory management techniques
Increasing Usable Address Space for Oracle on 32-bit Linux
To increase the address space available for the SGA beyond the default of about 1.7 GB, Oracle needs to be relinked with a lower SGA base, and Linux needs to have the mapped base lowered for processes running Oracle. Increasing the address space allows more database buffers or a larger indirect data buffer window to be used.
Changes need to be made both to the Oracle binary and to the Linux environment (the latter requiring root access), so the appropriate privileges are needed.
Currently, a solution exists only when running Oracle 9iR2 on Red Hat 2.1 Advanced Server. Red Hat provides an adjustable parameter in the /proc filesystem to allow more usable address space in processes. First, the SGA base address that Oracle uses must be lowered by relinking Oracle. Currently, Oracle ships with this base address set at 0x50000000 so that it is compatible with the defaults set by most distributions of Linux. Lowering this address allows Oracle to use more of the address space in the process, but it is important to note that the newly relinked Oracle binary will no longer work unless a corresponding modification is also made to Linux (Red Hat 2.1AS provides a way to do this at runtime).



Follow these steps to complete the first part of the solution:
1. Shutdown all instances of Oracle
2. cd $ORACLE_HOME/lib
3. cp -a libserver9.a libserver9.a.org (to make a backup copy)
4. cd $ORACLE_HOME/bin
5. cp -a oracle oracle.org (to make a backup copy)
6. cd $ORACLE_HOME/rdbms/lib
7. genksms -s 0x15000000 >ksms.s (lower SGA base to 0x15000000)
8. make -f ins_rdbms.mk ksms.o (compile in new SGA base address)
9. make -f ins_rdbms.mk ioracle (relink)

The relinked Oracle binary now has a lower SGA base and can use about 2.65 GB of address space if Linux is also modified to support this. Next, the Linux kernel's mapped base needs to be lowered below Oracle's new SGA base. Red Hat 2.1AS has a parameter in /proc that lowers the kernel's mapped base for each process. This parameter is not a system-wide parameter. It is a per-process parameter, but it is inherited by child processes. This parameter can only be modified by root. The following steps document how to lower the mapped base for a single bash terminal session. The default mapped base is 0x40000000. Once this session has been modified with the lower mapped base, this session (terminal window) will need to be used for all Oracle commands so that Oracle processes use the inherited (lower) mapped base:
1. Shut down the Oracle instance.
2. Open a terminal session (the Oracle session) and get its process id using "echo $$".
3. Open a second terminal session and su to root (the root session).
4. From the root session, run
echo 268435456 >/proc/<pid>/mapped_base
where <pid> is the process id determined in step 2. This lowers the mapped base for the Oracle session to 0x10000000 (268435456 in decimal).
5. Again from the root session, run
echo 3000000000 >/proc/sys/kernel/shmmax
This increases the value of shmmax so that Oracle will allocate the SGA in one segment.
6. From the Oracle terminal session, start up the Oracle instance.
The SGA now begins at a lower address, so more of the address space can be used by Oracle. You can now increase the init.ora values of db_cache_size or db_block_buffers to enlarge the database buffer cache. You can also write a small setuid-root program that sets /proc/<pid>/mapped_base for the shell that runs it. It would look something like this:
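#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define NEW_MAPPED_BASE 0x10000000UL

/* Lowers the mapped base of the calling shell (the parent process).
   Install setuid root: chown root:root lowermap; chmod 4711 lowermap */
int main(void) {
    pid_t ppid = getppid();
    unsigned long mapped_base = NEW_MAPPED_BASE;
    char buf[256];
    int ret = -1;

    snprintf(buf, sizeof(buf), "echo %lu >/proc/%d/mapped_base",
             mapped_base, (int)ppid);
    /* Set real and effective UID to 0 so the shell started by system()
       does not drop the privileges inherited from the setuid bit. */
    if (setuid(0) == 0)
        ret = system(buf);
    if (ret == 0)
        printf("Lowering mapped base of pid=%d to 0x%lX\n", (int)ppid, mapped_base);
    else
        printf("unable to lower mapped base. You might need to:\n"
               "  chown root:root lowermap\n"
               "  chmod 4711 lowermap\n");
    return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}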

If you are running with the init.ora parameter 'use_indirect_data_buffers=true' and already have a large buffer cache, you can use the above solution to increase the indirect buffer window size. The default is 512MB and should be fine for most applications. Increasing the window size may increase performance slightly under certain conditions because a larger indirect window reduces the overhead of mapping an indirect buffer into Oracle's address space.
To increase the indirect window size, set the environment variable VLM_WINDOW_SIZE to the window size in bytes before starting up the Oracle instance. For example: export VLM_WINDOW_SIZE=1073741824 to set the indirect window size to 1GB. Any value set should be a multiple of 64KB.
Notes:
1. Increasing the buffer cache size (or the indirect window size) too high can cause Oracle attach errors while starting up.
2. If you use an Oracle binary that has a lower SGA base but did not lower the /proc/<pid>/mapped_base value, you will experience unpredictable results while starting up, ranging from ORA-3113 errors to attach errors.
3. If you don't increase the shmmax value, you could get attach errors while starting up.
4. The address space is limited. So if you lower the SGA base and consume most of the address space with a larger SGA, there will be less room available for PGA memory. If your application uses a lot of PGA memory, you could get ORA-4030 errors (out of process memory). In this case, setting the SGA base to a higher value (and lowering the SGA size) will reserve more space for PGA memory.
5. If you lower the SGA base and your SGA size is below around 800MB, you may get attach errors. Lowering the SGA base is mainly a way to allocate a large SGA area. Sizes below 800MB should work without having to lower the SGA base.
6. It doesn't always help to increase VLM_WINDOW_SIZE. Also, keep in mind that increasing VLM_WINDOW_SIZE reduces the amount of SGA that can be allocated for other memory areas that might be needed (e.g. locks on RAC). It is best to raise this value as the very last step. This value could be increased once you know how much available address space is left after adjusting init.ora parameters.
7. If you get attach errors while starting up, you will probably need to clean up leftover shared memory segments and semaphores: list them with 'ipcs', then remove them with 'ipcrm shm <id>' or 'ipcrm sem <id>'.

Physical Address Extension
To use more than 4GB of physical memory on the IA-32 architecture, a technique known as PAE (Physical Address Extension) is used. It translates 32-bit linear addresses into 36-bit physical addresses. In the Linux kernel, support is provided through a compile-time option that produces two separate kernels: the SMP kernel, which supports only up to 4GB of physical memory, and the enterprise kernel, which can address up to 64GB (also called VLM capable). This means applications like Oracle can make use of the large memory and scale up to a large number of users without loss of performance or reliability.

Shared memory file-system (shmfs) support
It is a memory-based file system optimized for shared memory operations and for larger SGA size.
Oracle uses shmfs (based on /dev/shm) to memory-map the dynamic portions of the SGA. This can theoretically allow an SGA up to the size of the shmfs file system that is created. Since shmfs is a memory file system, its size can be as high as the maximum physical memory addressable with PAE, which is 64GB.
1. Mount the shmfs file system as root using the command:
      mount -t shm shmfs -o nr_blocks=8388608 /dev/shm
2. Set the shmmax parameter to about half of the RAM size, for example:
      echo 3000000000 >/proc/sys/kernel/shmmax
3. Set the init.ora parameter use_indirect_data_buffers=true.
4. Start up Oracle.

Bigpages feature
A page frame of 4MB is used instead of the regular 4KB. Oracle uses a large contiguous area in virtual memory to map the VLM window. This area holds the dynamic part of the SGA, whose size is specified by the db_block_buffers parameter. The pages backing this area can easily be larger than the default 4KB without the granularity problems usually associated with large page sizes. A 4MB page size reduces the number of page table entries (PTEs), which considerably reduces kernel overhead. Fewer TLB entries are needed as well, which reduces TLB thrashing. The result is better scalability in terms of the number of Oracle users. Performance also improves because big pages are not swapped out, so the entire buffer cache (db_block_buffers) stays in physical memory, and kswapd does not have to 'think' about swapping out these pages. Since swap space is not pre-allocated for these pages, more swap area is available and page cache handling is less complex.
Use the following steps to set Bigpages feature:
1. Calculate the bigpages value for your system with the following formula:

$$
\text{Bigpages (MB)} = \frac{\text{HighTotal} - \text{max memory required by user connections}}{1024} \times 0.8
$$

where
• HighTotal and the memory required by user connections are in KB; HighTotal is obtained from /proc/meminfo.
• The factor 0.8 assumes that 20% of memory is reserved for kernel bookkeeping.

For example, assume that a machine with 8 GB of memory and a HighTotal of 7208944 KB is estimated to have 2000 concurrent users, each occupying 30KB of memory.

Now,

$$
\text{Bigpages} = \frac{7208944 - 2000 \times 30}{1024} \times 0.8 \approx 5585 \text{ MB}
$$


There is a trade-off between the number of users and the bigpages value because, if the value for bigpages is set to a very high value, the memory available for user connections would be low. Hence, always estimate a high value for the maximum number of user connections and the memory that each will consume.

2. Add the following option to the kernel boot parameters in the boot loader configuration file (e.g. /etc/lilo.conf, inside the append= option of the kernel image entry):
      bigpages=<size>MB
where <size> is the value in MB calculated in the previous step.
3. Set /proc/sys/kernel/shm-use-bigpages to 2, for example with echo 2 >/proc/sys/kernel/shm-use-bigpages. The other possible values are 0 for no bigpages and 1 for bigpages using System V shared memory (as opposed to shmfs).