Large SGA On Linux
The current shipping version of Oracle is able to use about 1.7 GB of
address space for its SGA. There are several ways to allocate more
memory than this.
On a 4 GB RAM machine, the size of the SGA (which uses shared memory)
can be increased up to 2.7 GB. This requires changes in Linux and
Oracle.
On an 8 GB RAM machine, the size of the SGA can be increased up to 7 GB
by using the shared memory filesystem "shmfs".
A maximum size of 5.4 GB of SGA can be created using the "bigpages"
feature for System V shared memory where the page size
is 4 MB vs. the regular 4 KB.
On a machine that supports Physical Address Extension (PAE), the SGA
can theoretically have a size of 62 GB.
The PAE mechanism allows addressing using 36 bits on IA-32 systems,
but current hardware limitations and practical considerations limit the
actual size of the SGA on such systems.
Red Hat Linux Advanced Server has several features and enhancements
that don't exist in other Red Hat versions.
Among other things, Red Hat AS provides:
- Asynchronous I/O
- Process scheduler with CPU affinity, cache affinity, and per CPU
runqueues and locks that provide better performance
- "mapped base" (base address for shared libraries) can be changed
dynamically, allowing larger sizes for the SGA
- Page frames of size 4 MB as opposed to 4 KB can be used for the SGA,
which improves performance for large SGAs
- The kernel can also use the "high memory" pool (physical memory above
1 GB) for allocating page table entries (PTE), which allows a higher
number of Oracle connections
- Elimination of copy to bounce buffer improves I/O performance
More information
Note 260152.1 - Linux Big SGA, Large Memory, VLM - White Paper
Note 225220.1 - OS Configuration for large SGA
Note 260152.1 - Summary About the Large SGA & Address Space on RH Linux
Note 275318.1 - The Bigpages Feature on Linux
Note 317055.1 - How to Configure RHEL 3.0 32-bit for Very Large Memory and HugePages
Note 317141.1 - How to Configure RHEL 4 32-bit for Very Large Memory with ramfs and HugePages
Note 200266.1 - Increasing Usable Address Space for Oracle on 32-bit Linux
Note 401749.1 - Shell Script to Calculate Values Recommended HugePages / HugeTLB Configuration
The most robust and scalable method to increase the SGA memory requires
the use of a shared memory file system (shmfs). The procedure presented
in this article assumes you are using Red Hat Advanced Server (AS) 2.1
with the enterprise kernel, which supports Physical Address Extension
(PAE).
Create a shared memory file system (shmfs)
The shmfs is a memory file system so it can be as large as the maximum
allowable virtual memory supported by Red Hat Linux AS2.1, currently 16
GB, although the enterprise kernel theoretically supports up to 64 GB
of RAM.
The shmfs is created using the following command as the root user:
mount -t shm shmfs -o size=3g /dev/shm
The shared memory file system can be mounted automatically by adding
the following line into /etc/fstab file:
shmfs /dev/shm shm size=3g 0 0
In the above example I've created the shmfs with a size of 3G as that
is the size of the buffer cache I am planning to use. The other
elements of the SGA are placed in regular memory, not this shared
memory file system, so they should not be included when deciding on the
size of the shmfs. It is advisable to size this slightly bigger than
the actual size needed, but in this example I've used a 3G shmfs for a
3G buffer cache.
Enabling big pages
Big pages are enabled by adding the bigpages=xMB parameter to the
relevant kernel entry in the boot loader file /boot/grub/grub.conf,
where "x" is calculated as follows:
(Total SGA size in GB) x 1024
Then round this value up to the next multiple of 100. So for a 4G SGA
we would do the following:
4 x 1024 = 4096, rounded up to 4100
So the /boot/grub/grub.conf file entry might look like this:
kernel /vmlinuz-2.4.9-e.40enterprise ro root=/dev/cciss/c0d0p2 bigpages=4100MB
With this entry saved the system should be rebooted. Once the system is
available you must perform the following command as the root user:
echo 2 > /proc/sys/kernel/shm-use-bigpages
Alternatively you can add the following entry into the /etc/sysctl.conf
file so this value persists between reboots:
kernel.shm-use-bigpages = 2
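The sizing rule in this section can be sketched as a small shell function. This is only an illustration: the name bigpages_mb is hypothetical (it is not an Oracle or Red Hat tool), it takes the SGA size in whole GB, and it assumes "round" means rounding up to the next multiple of 100, as the 4096 -> 4100 example suggests.

```shell
#!/bin/bash
# Sketch: compute the value for bigpages=<x>MB from an SGA size in whole GB.
# bigpages_mb is a hypothetical helper name used only for illustration.
bigpages_mb() {
    local sga_gb=$1
    local mb=$(( sga_gb * 1024 ))
    # Round up to the next multiple of 100 MB (assumed rounding rule).
    echo $(( (mb + 99) / 100 * 100 ))
}

bigpages_mb 4    # prints 4100
bigpages_mb 2    # prints 2100
```

The result is what goes into the kernel line, for example bigpages=4100MB.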
Setting the SHMMAX value
The shmmax value should be set at half the physical memory up to a
maximum of 4294967295. For a server with 6G of memory we can set this
value to 3G (half physical memory) using the following command as the
root user:
echo 3221225472 > /proc/sys/kernel/shmmax
Alternatively it can be set in the /etc/sysctl.conf file with the
following entry:
kernel.shmmax = 3221225472
The contents of your /etc/sysctl.conf file may look something like this:
kernel.shmmax = 3221225472
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.sem = 1000 32000 100 150
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
kernel.shm-use-bigpages = 2
Alterations to the /etc/sysctl.conf file can be applied without a
reboot by issuing the following command as root:
/sbin/sysctl -p
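The shmmax rule above (half of physical memory, capped at 4294967295) can be written out as a short calculation. This is a sketch only; shmmax_for is a hypothetical helper name and the memory size is passed in bytes rather than read from the system.

```shell
#!/bin/bash
# Sketch: derive a shmmax value from physical memory in bytes.
# shmmax_for is a hypothetical helper, not part of any Oracle or Linux tool.
shmmax_for() {
    local phys_bytes=$1
    local cap=4294967295            # maximum stated in this article
    local half=$(( phys_bytes / 2 ))
    if [ "$half" -gt "$cap" ]; then
        half=$cap
    fi
    echo "$half"
}

# 6 GB server, as in the example above.
shmmax_for $(( 6 * 1024 * 1024 * 1024 ))    # prints 3221225472
```

The printed value is what would be written to /proc/sys/kernel/shmmax or to the kernel.shmmax entry in /etc/sysctl.conf.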
Instance Parameters
Some instance parameter changes are necessary to allow the Oracle
instance to use the shared memory file system. The spfile parameters
can be manipulated using the ALTER SYSTEM SET command in a running
instance, or by modifying the spfile contents offline:
-- Change the parameter value in the spfile directly.
ALTER SYSTEM SET parameter = value SCOPE=spfile;
-- Create a pfile with the contents of the current spfile.
CREATE PFILE='/tmp/pfile' FROM SPFILE;
-- Manually manipulate the contents of the pfile.
-- Recreate the spfile from the amended pfile.
CREATE SPFILE FROM PFILE='/tmp/pfile';
The following parameter should be added to the spfile or pfile:
use_indirect_data_buffers=true
Also, any references to db_cache_size and db_xK_cache_size parameters
should be removed and replaced with the old style db_block_buffers
parameter entry:
# 3Gig for an 8K db_block_size.
db_block_buffers = 393216
This means that the multiple block size feature is not available when
using this method. Remember that the buffer cache is only one part of
the SGA.
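The db_block_buffers value is simply the desired buffer cache size divided by db_block_size. A minimal sketch of that arithmetic follows; block_buffers is a hypothetical helper name, and both arguments are given in bytes.

```shell
#!/bin/bash
# Sketch: number of buffers needed for a buffer cache of a given size.
# block_buffers is a hypothetical helper used only for illustration.
block_buffers() {
    local cache_bytes=$1
    local block_size=$2
    echo $(( cache_bytes / block_size ))
}

# 3 GB buffer cache with an 8K db_block_size, as in the example above.
block_buffers $(( 3 * 1024 * 1024 * 1024 )) 8192    # prints 393216
```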
For further information see:
- Metalink Note:211424.1 (second half)
The recommended kernel for Red Hat Enterprise Linux 2.1 is 2.4.9-e.25
or higher.
This kernel has several fixes that are relevant to Oracle, including
fixes for memory problems and kswapd problems.
If the Linux server has <= 4 GB RAM, the kernel "kernel-smp" should
be used for SMP machines, or the kernel "kernel" should be used for UP
machines.
If the Linux server has > 4 GB RAM, the enterprise kernel
"kernel-enterprise" should be used for UP and SMP machines.
To check if these kernels are installed, execute e.g. the following
command:
rpm -q kernel-smp kernel-enterprise
To check which kernel is currently running, execute the following
command:
uname -a
To install e.g. the enterprise kernel, download the
"kernel-enterprise" RPM and execute
the following command:
rpm -ivh kernel-enterprise-2.4.9-e.25.i686.rpm
To make sure that the right kernel is booted, check the /etc/grub.conf
file if you
use GRUB, and change the "default" attribute if necessary.
Here is an example:
default=1
timeout=10
splashimage=(hd0,1)/boot/grub/splash.xpm.gz
title Red Hat Linux (2.4.9-e.25enterprise)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25enterprise ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25enterprise.img
title Red Hat Linux Advanced Server (2.4.9-e.25smp)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25smp ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25smp.img
title Red Hat Linux Advanced Server-up (2.4.9-e.25)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25 ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25.img
In this example, the "default" attribute is set to "1", which
means that the 2.4.9-e.25smp kernel will be booted (entries are counted
from 0).
If the "default" attribute were set to "0", then the
2.4.9-e.25enterprise kernel would be booted.
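To see how the "default" index maps to a title, the lookup can be sketched with awk. This is an illustration only: default_title is a hypothetical helper, it reads a grub.conf from standard input, and it assumes "default=" and "title" start in the first column as in the example above.

```shell
#!/bin/bash
# Sketch: print the title of the entry selected by "default=" in a
# legacy GRUB configuration read from stdin. Entries are counted from 0.
# default_title is a hypothetical helper name.
default_title() {
    awk '
        /^default=/ { sub(/^default=/, ""); want = $0 + 0 }
        /^title /   { if (count == want) { sub(/^title /, ""); print }
                      count++ }
    '
}

default_title <<'EOF'
default=1
timeout=10
title Red Hat Linux (2.4.9-e.25enterprise)
title Red Hat Linux Advanced Server (2.4.9-e.25smp)
title Red Hat Linux Advanced Server-up (2.4.9-e.25)
EOF
# prints: Red Hat Linux Advanced Server (2.4.9-e.25smp)
```

On a real system the function would be fed the actual file, for example default_title < /etc/grub.conf.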
After you have installed the new kernel and/or made changes to the
/etc/grub.conf file, reboot the server.
Once you are sure you don't need the old kernel anymore, you can remove
the old kernel by running:
su - root
rpm -e <OldKernelVersion>
When you remove the kernel, you don't need to make any changes to the
/etc/grub.conf file.
NOTE: Be very careful when removing a kernel! Making a mistake could
render the server unbootable.
If the size of the SGA does not need to be increased from 1.7 GB to
2.7 GB, then the following steps can be skipped.
By default, the maximum size for the SGA is 1.7 GB on a 32-bit system
without Physical Address Extension (PAE).
You will also be able to allocate a 1.7 GB SGA if you have less than
4 GB RAM.
In this case you have to make sure you have enough swap space; however,
this will have an impact on the performance of the database. I was even
able to bring up a database with an SGA size of 2.64 GB on a test PC
that had 256 MB RAM.
Theoretically, the SGA can have a size of up to 62 GB on a system that
supports Physical Address Extension (PAE). The PAE mechanism allows
addressing using 36 bits on IA-32 systems, but current hardware
limitations and practical considerations limit the actual size of the
SGA on such a system.
Since I do not have such a system, I will not cover the steps for
creating SGAs larger than 2.7 GB via the tmpfs filesystem.
To increase the size of the SGA to 2.7 GB without using a shared memory
filesystem (tmpfs), the following needs to be done:
- The base address "mapped base" for Oracle's shared
libraries has to be lowered
at the Linux OS level.
- Oracle needs to be relinked with a lower base address for
SGA which uses shared memory segments.
Address Mappings on Linux - Shared Memory and Shared Library Mapping on Linux
Normally, the 4 GB linear address space (also known as virtual address
space) for a 32-bit Linux system
is split into 4 equal sized sections for different purposes:
0GB-1GB User space - Used for the executable and brk/sbrk allocations (malloc uses brk for small chunks).
1GB-2GB User space - Used for mmaps (shared memory), shared libraries, and malloc (which uses mmap for large chunks).
2GB-3GB User space - Used for stack.
3GB-4GB Kernel Space - Used for the kernel itself.
- The mmaps grow bottom up and the stack grows top down. Space left
unused by one can be used by the other.
- The split between userspace and kernelspace can
be changed by setting the kernel parameter PAGE_OFFSET and recompiling
the kernel. By default,
the PAGE_OFFSET macro yields the value 0xc0000000.
- The split between brk(2) and mmap(2) can be
changed by setting the kernel parameter
TASK_UNMAPPED_BASE and recompiling the kernel. However, on Red Hat AS
this parameter can be changed for individual
processes dynamically without reboot or kernel recompilation.
Usually, the portion of address space available for mapping shared
libraries and shared memory
segments consists of virtual addresses in the range of 0x40000000 (1
GB) - 0xc0000000 (3 GB).
On Red Hat AS, 0x40000000 is the default base address for shared
libraries and shared memory segments. The default base address for
mapping shared memory segments can be changed and overridden for
programs and applications by non-root users.
The default base address "mapped base" for loading shared libraries for
programs and applications can be changed by the user root only.
The default base address that Oracle uses for SGA (shared memory
segment) is 0x50000000 and
not 0x40000000. Oracle uses or keeps the space from
0x40000000-0x50000000 for loading Oracle shared
libraries. As I mentioned before, 0x40000000 is the default base
address on RH AS for loading
shared libraries which can only be changed by the user root.
Oracle increased the base address for SGA to prevent address range
conflicts between the
segments (shared memory segment and shared libraries).
If the base address for shared memory segments were 0x15000000 and the
base address for shared libraries were 0x40000000, then Oracle could
not create an SGA larger than 0x2b000000 bytes or 688 MB, even though
there is address space available above the shared libraries portion.
(According to Oracle, Oracle binaries will no longer work if the base
address for shared memory segments is lower than the base address for
shared libraries, as in this example. Even though I didn't experience
any problems, I would not recommend it.)
If the base address for shared memory segments is 0x50000000 and the
base address for shared libraries is 0x40000000, then Oracle can create
an SGA that starts at 0x50000000 and ends almost at 0xc0000000;
0xc0000000 is the address where the kernel address space begins. This
means that the SGA can have a size of almost 0x70000000 bytes or
1792 MB (1.75 GB). In practice it is about 100 MB less due to stack
space and other use of memory.
Once again, Oracle increased the default base address for SGA to
0x50000000 so that all
shared libraries can be loaded below 0x50000000, and the rest of the
space up to almost
0xc0000000 can be used for shared memory.
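The address arithmetic in this section is easy to check with shell arithmetic. The following is a minimal sketch; span_mb is a hypothetical helper that prints the size in MB of the range between two hexadecimal addresses.

```shell
#!/bin/bash
# Sketch: size in MB of the address range [start, end).
# span_mb is a hypothetical helper; bash arithmetic accepts 0x... constants.
span_mb() {
    local start=$1
    local end=$2
    echo $(( (end - start) / 1024 / 1024 ))
}

span_mb 0x50000000 0xc0000000    # default SGA base to kernel space: prints 1792
span_mb 0x15000000 0x40000000    # SGA below the shared libraries:   prints 688
```

The same function shows what lowering the SGA base to 0x15000000 would gain: span_mb 0x15000000 0xc0000000 prints 2736.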
You can verify the address mappings of Oracle processes by viewing the
proc file /proc/<pid>/maps
where <pid> stands for the Oracle process ID. The default
mapping of an Oracle process might look like this:
08048000-0ab11000 r-xp 00000000 08:09 273078 /ora/product/9.2.0/bin/oracle
0ab11000-0ab99000 rw-p 02ac8000 08:09 273078 /ora/product/9.2.0/bin/oracle
0ab99000-0ad39000 rwxp 00000000 00:00 0
40000000-40016000 r-xp 00000000 08:01 16 /lib/ld-2.2.4.so
40016000-40017000 rw-p 00015000 08:01 16 /lib/ld-2.2.4.so
40017000-40018000 rw-p 00000000 00:00 0
40018000-40019000 r-xp 00000000 08:09 17935 /ora/product/9.2.0/lib/libodmd9.so
40019000-4001a000 rw-p 00000000 08:09 17935 /ora/product/9.2.0/lib/libodmd9.so
4001a000-4001c000 r-xp 00000000 08:09 16066 /ora/product/9.2.0/lib/libskgxp9.so
...
42606000-42607000 rw-p 00009000 08:01 50 /lib/libnss_files-2.2.4.so
50000000-50400000 rw-s 00000000 00:04 163842 /SYSV00000000 (deleted)
51000000-53000000 rw-s 00000000 00:04 196611 /SYSV00000000 (deleted)
53000000-55000000 rw-s 00000000 00:04 229380 /SYSV00000000 (deleted)
...
bfffb000-c0000000 rwxp ffffc000 00:00 0
As this address mapping shows, shared libraries start at base address
0x40000000.
The address mapping also shows that Oracle uses the base address
0x50000000 for SGA
(in this example System V shared memory for SGA). Here is a summary of
all the entries:
The text (code) section is mapped at 0x08048000:
08048000-0ab11000 r-xp 00000000 08:09 273078 /ora/product/9.2.0/bin/oracle
The data section is mapped at 0x0ab11000:
0ab11000-0ab99000 rw-p 02ac8000 08:09 273078 /ora/product/9.2.0/bin/oracle
The uninitialized data segment .bss is allocated at 0x0ab99000:
0ab99000-0ad39000 rwxp 00000000 00:00 0
The base address for shared libraries is 0x40000000:
40000000-40016000 r-xp 00000000 08:01 16 /lib/ld-2.2.4.so
The base address for SGA (System V shared memory) is 0x50000000:
50000000-50400000 rw-s 00000000 00:04 163842 /SYSV00000000 (deleted)
The stack is allocated at 0xbfffb000:
bfffb000-c0000000 rwxp ffffc000 00:00 0
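Picking the two interesting base addresses out of a maps listing can be sketched with awk. This is only an illustration: maps_bases is a hypothetical helper, it reads maps-format lines from standard input, and it assumes shared libraries have ".so" in their path and System V segments appear as /SYSV..., as in the listing above.

```shell
#!/bin/bash
# Sketch: report the first shared-library mapping and the first System V
# shared memory mapping found in /proc/<pid>/maps style input.
# maps_bases is a hypothetical helper name.
maps_bases() {
    awk '
        !seen_lib && $6 ~ /\.so($|\.)/ {
            split($1, range, "-"); print "libs 0x" range[1]; seen_lib = 1
        }
        !seen_sga && $6 ~ /^\/SYSV/ {
            split($1, range, "-"); print "sga 0x" range[1]; seen_sga = 1
        }
    '
}

maps_bases <<'EOF'
08048000-0ab11000 r-xp 00000000 08:09 273078 /ora/product/9.2.0/bin/oracle
40000000-40016000 r-xp 00000000 08:01 16 /lib/ld-2.2.4.so
50000000-50400000 rw-s 00000000 00:04 163842 /SYSV00000000 (deleted)
bfffb000-c0000000 rwxp ffffc000 00:00 0
EOF
# prints:
# libs 0x40000000
# sga 0x50000000
```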
Now it should become clear what needs to be done to provide more space
for the SGA.
To increase the space for SGA, two base addresses need to be changed.
The base address "mapped base" for shared libraries needs to be lowered
at the Linux OS level, and the base address for SGA (shared memory)
needs to be lowered at the Oracle level (application level).
Note: Once the base addresses have been changed at the Linux OS level
and at the Oracle level, all Oracle commands need to be executed with a
lower "mapped base"! This means that every new shell must run with a
lowered "mapped base". Further down I will show you how you can
automate this so that every Oracle user automatically gets a shell with
a lowered "mapped base".
Changing the Base Address "mapped base" for Shared Libraries at the Linux OS Level
The default base address "mapped base" on RH 2.1AS is
TASK_UNMAPPED_BASE = 0x40000000 (decimal 1073741824 or 1 GB).
This is the address that splits the section between brk(2) and mmap(2),
which defines the available space for shared libraries (if it hasn't
been changed and overridden at the application level) and for shared
memory (e.g. SGA).
To change "mapped base" for a Linux process, the file
/proc/<pid>/mapped_base needs to be changed, where <pid> stands for the
process ID. Note that this is not a system wide parameter!
So in order to change "mapped base" for the Oracle database (i.e.
Oracle processes), the parent shell that starts the database needs to
be modified at the Linux OS level to allow its child processes to
inherit the change. The following procedure shows how this can be done.
Execute the following command to identify the process ID "pid" of the
shell process used by the Oracle user
that will start the database:
echo $$
As root in another shell, change "mapped base" to 0x10000000
(decimal 268435456 bytes or 256 MB) for the Oracle shell
with the pid we identified above:
su - root
echo 268435456 > /proc/<pid>/mapped_base
This will tell the kernel to load shared libraries at the
virtual address portion starting at 0x10000000.
Now if Oracle is started with sqlplus in the shell used by
the Oracle user for which
we changed "mapped base", the Oracle processes will inherit the new
base address.
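The value written to mapped_base is the decimal form of the hexadecimal address. A minimal sketch of the conversion follows; mapped_base_decimal is a hypothetical helper name, and it relies only on the shell's printf accepting 0x-prefixed numbers.

```shell
#!/bin/bash
# Sketch: convert a hexadecimal base address to the decimal value
# that is echoed into /proc/<pid>/mapped_base.
# mapped_base_decimal is a hypothetical helper name.
mapped_base_decimal() {
    printf '%d\n' "$1"
}

mapped_base_decimal 0x10000000    # lowered mapped base: prints 268435456
mapped_base_decimal 0x40000000    # default mapped base: prints 1073741824
```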
Once the base address for shared memory has been changed at the Oracle
level as well, more space will become available for the SGA. To
accommodate the increased space for shared memory allocations by the
Oracle processes, the maximum value of SHMMAX needs to be raised. This
value defines the largest shared memory segment size allowed by the
kernel. Since the SGA can be increased up to 2.7 GB with this method,
the maximum size for SHMMAX can be rounded to 3000000000. This will
allow Oracle to allocate one large shared memory segment for the SGA.
This is also what Oracle recommends.
The maximum size SHMMAX for a shared memory segment can be changed in
the proc file system without reboot:
su - root
echo "3000000000" > /proc/sys/kernel/shmmax
Alternatively, you can use sysctl(8) to change it:
sysctl -w kernel.shmmax=3000000000
To make the change permanent, add or change the following line
in the file /etc/sysctl.conf.
This file is used during the boot process.
kernel.shmmax=3000000000
Changing the Base Address for Shared Memory at the Oracle Level
The previous steps showed how to lower the base address "mapped base"
for Oracle's shared libraries to 0x10000000 (256 MB).
The following steps show how to lower the base address for shared
memory (SGA) for Oracle to 0x15000000 (336 MB).
The base address for SGA (shared memory) should not be lowered to
0x10000000 at the Oracle level.
As I explained in the section "Address Mappings on Linux - Shared
Memory and Shared Library Mapping on Linux", to prevent address range
conflicts between the segments (Oracle shared libraries and Oracle
shared memory), the address at which the SGA should be attached is
0x15000000.
It can be lowered to 0x12000000, but this would require thorough
testing, so I would not recommend it.
The following calculation shows how large the SGA can be:
0xc0000000 (base address of the kernel space -> 3 GB)
- 0x15000000 (base address of SGA -> 336 MB)
-------------
0xab000000 (decimal 2868903936, or 2736 MB)
- stack space
- other memory allocations
-------------
~ 2.65 to 2.70 GB
To lower the base address at which the SGA (shared memory) should be
attached, Oracle needs to be relinked.
Changing the base address for SGA can be done on Linux with genksms,
which is an Oracle utility:
# shutdown Oracle
SQL> shutdown
su - oracle
cd $ORACLE_HOME/rdbms/lib
# Make a backup of the ksms.s file if it exists
[[ -f ksms.s ]] && cp ksms.s ksms.s_orig
# Modify the attach address in the ksms.s file before relinking Oracle
genksms -s 0x15000000 > ksms.s
Rebuild the Oracle executable in the $ORACLE_HOME/rdbms/lib
directory by
entering the following commands:
# Create a new ksms object file
make -f ins_rdbms.mk ksms.o
# Create a new "oracle" executable ($ORACLE_HOME/bin/oracle):
make -f ins_rdbms.mk ioracle
# The last step will create a new Oracle kernel that loads the SGA at
# the address specified by sgabeg in ksms.s:
# .set sgabeg,0X15000000
# It also backs up the old oracle executable to $ORACLE_HOME/bin/oracleO,
# it sets the correct privileges for the new Oracle executable "oracle", and
# moves the new executable "oracle" into the $ORACLE_HOME/bin directory.
Now when Oracle is started, the lowered base addresses for Oracle's
shared library and shared memory (SGA)
can be seen with the following commands:
# Get the pid of e.g. the Oracle database writer process
su - oracle
$ pgrep -f -x ora_dbw0_$ORACLE_SID -l
13519 ora_dbw0_test
# You can also use /sbin/pidof to get the process ID
$ /sbin/pidof ora_dbw0_$ORACLE_SID
13519
$ DBW0_PID=`pgrep -f -x ora_dbw0_$ORACLE_SID`
$ echo $DBW0_PID
13519
# Check the base addresses for shared libraries and shared memory for the
# process ID obtained above:
$ grep '.so' /proc/$DBW0_PID/maps |head -1
10000000-10016000 r-xp 00000000 03:02 750738 /lib/ld-2.2.4.so
$ grep 'SYS' /proc/$DBW0_PID/maps |head -1
15000000-24000000 rw-s 00000000 00:04 262150 /SYSV3ecee0b0 (deleted)
$
Now you can increase the init.ora parameters db_cache_size
or db_block_buffers to create a larger database buffer cache.
If the size of the SGA is larger than 2.65 GB, then I would test the
database very thoroughly to make sure no other memory allocation
problems arise.
For fun I tried these settings on a little test PC with 256 MB RAM and
4 GB swap space, to see if I was able to bring up a database on such a
little PC.
I set db_block_buffers to 315000 and db_block_size to 8192 (2580480000
bytes), and I was able to bring up a database with a 2.654 GB
(2850033824 bytes) SGA on this PC:
Total System Global Area 2850033824 bytes
Fixed Size 450720 bytes
Variable Size 268435456 bytes
Database Buffers 2580480000 bytes
Redo Buffers 667648 bytes
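The figures in this listing can be cross-checked with shell arithmetic: the four components add up to the reported total, and the database buffers equal db_block_buffers times db_block_size. The sketch below uses a hypothetical helper named sga_total.

```shell
#!/bin/bash
# Sketch: add up SGA component sizes given in bytes.
# sga_total is a hypothetical helper name used only for illustration.
sga_total() {
    local total=0
    local part
    for part in "$@"; do
        total=$(( total + part ))
    done
    echo "$total"
}

# Fixed Size, Variable Size, Database Buffers, Redo Buffers from the listing.
sga_total 450720 268435456 2580480000 667648    # prints 2850033824

# Database Buffers = db_block_buffers x db_block_size.
echo $(( 315000 * 8192 ))                        # prints 2580480000
```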
Giving Oracle Users the Privilege to Change the Base Address for Oracle's Shared Libraries Without Giving them root Access
As shown above, only root can change the base address "mapped base" for
shared libraries.
Using sudo we can give Oracle users the privilege to change
"mapped base" for their own shells without giving them full root
access. Here is the procedure:
su - root
# E.g. create a script called "/usr/local/bin/ChangeMappedBase"
# which changes the "mapped base" for the parent process,
# the shell used by the Oracle user where the "sudo" program
# is executed (forked). Here is an example:
#!/bin/sh
# Lowering "mapped base" to 0x10000000
echo 268435456 > /proc/$PPID/mapped_base
# Make sure that ownership and permissions are correct
chown root.root /usr/local/bin/ChangeMappedBase
chmod 755 /usr/local/bin/ChangeMappedBase
# Allow the Oracle user to execute /usr/local/bin/ChangeMappedBase via sudo
echo "oracle ALL=/usr/local/bin/ChangeMappedBase" >> /etc/sudoers
Now the Oracle user can run /usr/local/bin/ChangeMappedBase to change
"mapped base" for its own shell:
$ su - oracle
$ cat /proc/$$/mapped_base; echo
1073741824
$ sudo /usr/local/bin/ChangeMappedBase
Password: # type in the password for the Oracle user account
$ cat /proc/$$/mapped_base; echo
268435456
$
When /usr/local/bin/ChangeMappedBase is executed the first time after
an Oracle login, sudo will ask for a password. The password that needs
to be entered is the password of the Oracle user account.
Changing the Base Address for Oracle's Shared Libraries Automatically During an Oracle Login
The procedure in the previous section asks for a password the first
time /usr/local/bin/ChangeMappedBase is executed after an Oracle login.
To have "mapped base" changed automatically during an Oracle login
without a password, the following can be done:
Edit the /etc/sudoers file with visudo:
su - root
visudo
Change the entry in /etc/sudoers from:
oracle ALL=/usr/local/bin/ChangeMappedBase
to read:
oracle ALL=NOPASSWD: /usr/local/bin/ChangeMappedBase
Make sure bash executes /usr/local/bin/ChangeMappedBase
during the login
process. You can use e.g. ~oracle/.bash_profile:
su - oracle
echo "sudo /usr/local/bin/ChangeMappedBase" >> ~/.bash_profile
The next time you log in as the Oracle user, the base address for
shared libraries will be set automatically.
$ ssh oracle@localhost
oracle@localhost's password:
Last login: Sun Apr 6 13:59:22 2003 from localhost
$ cat /proc/$$/mapped_base; echo
268435456
$
Important Notes
When the base address "mapped base" for Oracle's processes has changed,
then every Linux shell
that spawns Oracle processes (e.g. listener) must have the same "mapped
base" as well.
This means that even shells that are used to connect locally to the
database need
to have the same "mapped base".
For example, if you run sqlplus to connect to the local
database, then you will
get the following error message if "mapped base" of this shell is not
the same as
for the Oracle processes:
SQL> connect scott/tiger
ERROR:
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 491524
SQL>
The bigpages feature is very useful for large SGA sizes.
In the following example I will show how to use and configure the Linux
bigpage memory area for System V shared memory segments. System V
shared memory segments are allocated for the SGA if "shmfs" is not used
or configured for the SGA.
A separate Linux memory area can be allocated to use 4 MB memory pages
rather than the normal 4 KB pages.
Large memory pages ("bigpages") are locked in memory and do not get
swapped out. This means that a whole separate bigpage memory area can
be allocated for the entire SGA so that it does not get swapped out of
memory.
It is therefore very important that the bigpage memory area is only as
large as needed for the SGA, because unused memory in the bigpage pool
won't be available for any use other than shared memory allocations,
even if the Linux system starts swapping. It is also important to be
aware that if bigpages is set to a high value, then the available
memory for user connections will be low.
Sizing Bigpages
Oracle says that the maximum value of Bigpages should be:
Maximum value of Bigpages = HighTotal / 1024 * 0.8 MB
The bigpage memory area is only available for
shared memory. So if bigpages is set to a high value,
then the available memory for user connection will be low. If the
memory consumption for the maximum number of user
connections is known, then Oracle says that bigpages can be calculated
as follows:
Maximum value of Bigpages = (HighTotal - Memory required by maximum user connections in KB) / 1024 * 0.8 MB
According to Oracle's white paper "Linux Virtual Memory in Red Hat
Advanced Server 2.1 and Oracle's Memory Usage Characteristics", the
assumption is that 20% of memory is reserved for kernel bookkeeping.
The value for "HighTotal" can be obtained with the following command:
grep HighTotal /proc/meminfo
Note that highmem is all memory above (approx) 860 MB of physical
RAM. This means that "HighTotal" is the total amount of memory in the
high memory region. It should now be clear that large memory pages
should only be configured if enough physical RAM is available.
For instance, if the server has only 512 MB RAM, then "HighTotal" will
be 0 kB. And on my 1 GB RAM desktop PC, "HighTotal" shows 130992 kB.
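The sizing formula quoted above can be sketched in integer shell arithmetic. This is an illustration only: max_bigpages_mb is a hypothetical helper, both inputs are in kB (as reported by /proc/meminfo), the second argument is optional, and the result is truncated to whole MB.

```shell
#!/bin/bash
# Sketch: maximum bigpages value in MB from HighTotal (kB) and, optionally,
# the memory needed by the maximum number of user connections (kB).
# Implements (HighTotal - connections) / 1024 * 0.8 with integer math.
# max_bigpages_mb is a hypothetical helper name.
max_bigpages_mb() {
    local hightotal_kb=$1
    local conn_kb=${2:-0}
    echo $(( (hightotal_kb - conn_kb) * 8 / 10 / 1024 ))
}

max_bigpages_mb 130992             # the 1 GB desktop PC above: prints 102
max_bigpages_mb 7340032 1048576    # assumed example values:    prints 4915
```

On a live system the first argument could be taken from the HighTotal line of /proc/meminfo.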
Here are a few examples for bigpage sizes taken from "Tips and
Techniques: Install and Configure Oracle9i on Red Hat Linux Advanced
Server":
2 GB SGA 2100 MB bigpages
4 GB SGA 4100 MB bigpages
The bigpages feature allows a maximum size of 5.4 GB SGA on a machine
with 8 GB RAM.
Configuring Bigpages
The kernel needs to be told to use the bigpages pool for shared memory
allocations.
The bigpages feature can be enabled for System V shared memory in the
proc file system without reboot
with the following command:
su - root
echo "1" > /proc/sys/kernel/shm-use-bigpages
Alternatively, you can use sysctl(8) to change it:
sysctl -w kernel.shm-use-bigpages=1
To make the change permanent, append the following line to the file
/etc/sysctl.conf, for example by running the command below. This file
is used during the boot process.
echo "kernel.shm-use-bigpages=1" >> /etc/sysctl.conf
Setting kernel.shm-use-bigpages=2 will enable bigpages for "shmfs",
which I'm not covering in this article. Setting
kernel.shm-use-bigpages=0 will disable the bigpages feature.
The kernel needs to be told how large the bigpage pool should be. If
you use GRUB, add the "bigpages" parameter in the /etc/grub.conf file
and set the maximum value of bigpages as follows.
In this example I will set bigpages to 2100 MB for the SMP kernel
2.4.9-e.25 that is started on my database server:
default=1
timeout=10
splashimage=(hd0,1)/boot/grub/splash.xpm.gz
title Red Hat Linux (2.4.9-e.25enterprise)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25enterprise ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25enterprise.img
title Red Hat Linux Advanced Server (2.4.9-e.25smp)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25smp ro root=/dev/hda2 hdc=ide-scsi bigpages=2100MB
initrd /boot/initrd-2.4.9-e.25smp.img
title Red Hat Linux Advanced Server-up (2.4.9-e.25)
root (hd0,1)
kernel /boot/vmlinuz-2.4.9-e.25 ro root=/dev/hda2 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25.img
After this change the system needs to be rebooted:
su - root
shutdown -r now
After a system reboot, the "MemFree" value (free system memory) in
/proc/meminfo is reduced by 2100 MB in this example. The 2100 MB now
show up in "BigPagesFree", which means that 2100 MB are now in a
separate allocation area:
grep MemFree /proc/meminfo
grep BigPagesFree /proc/meminfo
Note that if you configure "bigpages" in the /etc/grub.conf
file and reboot the system, "BigPagesFree" in /proc/meminfo
will be 0 KB if "HighTotal" in /proc/meminfo is 0 KB and if
/proc/sys/kernel/shm-use-bigpages is set to "1".
Implement Advanced memory management techniques
Increasing Usable Address Space for Oracle on 32-bit Linux
To increase the SGA beyond the default of about 1.7 GB, Oracle needs to
be relinked with a lower SGA base and Linux needs to have the mapped
base lowered for processes running Oracle. Increasing the address space
allows for more database buffers or a larger indirect data buffer
window to be used.
Changes need to be made to both the Oracle binary and the Linux
environment (the latter requiring root access), so the appropriate
privileges are needed.
Currently, a solution exists only when running Oracle 9iR2 on Red Hat
2.1 Advanced Server. Red Hat provides an adjustable parameter in the
/proc filesystem to allow more usable address space in processes.
First, the SGA base address that Oracle uses must be lowered by
relinking Oracle. Currently, Oracle ships with this base address set at
0x50000000 so that it is compatible with the defaults set by most
distributions of Linux. Lowering this address allows Oracle to use more
of the address space in the process, but it is important to note that
the newly relinked Oracle binary will no longer work unless a
corresponding modification is also made to Linux (Red Hat 2.1AS
provides a way to do this at runtime).
Follow these steps to complete the first part of the solution:
1. Shutdown all instances of Oracle
2. cd $ORACLE_HOME/lib
3. cp -a libserver9.a libserver9.a.org (to make a backup copy)
4. cd $ORACLE_HOME/bin
5. cp -a oracle oracle.org (to make a backup copy)
6. cd $ORACLE_HOME/rdbms/lib
7. genksms -s 0x15000000 >ksms.s (lower SGA base to 0x15000000)
8. make -f ins_rdbms.mk ksms.o (compile in new SGA base address)
9. make -f ins_rdbms.mk ioracle (relink)
The relinked Oracle binary now has a lower SGA base and is now able to
use about 2.65GB of address space if Linux is also modified to support
this. Next, the Linux kernel's mapped base needs to be lowered below
Oracle's new SGA base. Red Hat 2.1AS has a parameter in /proc that
lowers the kernel's mapped base for each process. This parameter is not
a system-wide parameter. It is a per-process parameter, but it is
inherited by child processes. This parameter can only be modified by
root. The following steps document how to lower the mapped base for a
single bash terminal session. The default mapped base is 0x40000000.
Once this session has been modified with the lower mapped base, this
session (terminal window) will need to be used for all Oracle commands
so that Oracle processes use the inherited (lower) mapped base:
1. Shut down the instance of Oracle.
2. Open a terminal session (Oracle session), and get the process id
using "echo $$".
3. Open a second terminal session and su to root (root session).
4. Now, from the root session,
echo 268435456 >/proc/<pid>/mapped_base,
where <pid> is the process id determined in step 2. This lowers
the mapped base for the Oracle session to 0x10000000.
5. Again, from the root session,
echo 3000000000 >/proc/sys/kernel/shmmax
This increases the value of shmmax so that Oracle will allocate the SGA
in one segment.
6. From the Oracle terminal session, startup the Oracle instance.
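The value written to mapped_base in step 4 is the decimal form of the hex address, and it has to sit below the SGA base chosen when relinking. A minimal sketch of that check follows. The function names are hypothetical helpers, and nothing here writes to /proc; the script only prints the command that root would run.

```shell
#!/bin/sh
# Convert a hex address such as 0x10000000 to the decimal form echoed into /proc.
to_decimal() {
    echo $(( $1 ))
}

# Succeed only if the mapped base ($1) lies below the SGA base ($2).
mapped_base_ok() {
    [ $(( $1 )) -lt $(( $2 )) ]
}

MAPPED_BASE=0x10000000
SGA_BASE=0x15000000
if mapped_base_ok "$MAPPED_BASE" "$SGA_BASE"; then
    echo "as root: echo $(to_decimal "$MAPPED_BASE") >/proc/<pid>/mapped_base"
else
    echo "mapped base must be below the SGA base"
fi
```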
The SGA now begins at a lower address, so more of the address space can
be used by Oracle. Now you can increase the init.ora values of
db_cache_size or db_block_buffers to increase the size of the database
buffer cache. You can also write a small program that uses setuid() to
set the /proc/<pid>/mapped_base. It would look something like
this:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define NEW_MAPPED_BASE 0x10000000UL

int main(void) {
    pid_t ppid = getppid();   /* the shell that launched this program */
    unsigned long mapped_base = NEW_MAPPED_BASE;
    char buf[256];
    int ret;

    snprintf(buf, sizeof(buf), "echo %lu >/proc/%ld/mapped_base",
             mapped_base, (long)ppid);
    /* Become root; this only works if the binary is setuid root. */
    if (setuid(0) != 0)
        ret = -1;
    else
        ret = system(buf);
    if (ret == 0)
        printf("Lowering mapped base of pid=%ld to 0x%lX\n",
               (long)ppid, mapped_base);
    else
        printf("unable to lower mapped base. You might need to:\n"
               "  chown root.root lowermap\n  chmod 4711 lowermap\n");
    return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
If you are running with the init.ora parameter
'use_indirect_data_buffers=true' and already have a large buffer cache,
you can use the above solution to increase the indirect buffer window
size. The default is 512MB and should be fine for most applications.
Increasing the window size may increase performance slightly under
certain conditions because a larger indirect window reduces the
overhead of mapping an indirect buffer into Oracle's address space.
To increase the indirect window size, set the environment variable
VLM_WINDOW_SIZE to the window size in bytes before starting up the
Oracle instance. For example: export VLM_WINDOW_SIZE=1073741824 to set
the indirect window size to 1GB. Any value set should be a multiple of
64KB.
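A small sketch of how the byte value might be derived and checked against the 64KB rule before exporting it. The helper names are illustrative and not part of Oracle or Linux.

```shell
#!/bin/sh
# Convert a window size in MB ($1) to bytes.
window_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Succeed only if the byte count ($1) is a multiple of 64KB (65536 bytes).
is_64k_multiple() {
    [ $(( $1 % 65536 )) -eq 0 ]
}

size=$(window_bytes 1024)   # 1GB window, as in the example above
if is_64k_multiple "$size"; then
    echo "export VLM_WINDOW_SIZE=$size"
else
    echo "window size $size is not a multiple of 64KB"
fi
```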
Notes:
1. Increasing the buffer cache size (or the indirect window size) too
high can cause Oracle attach errors while starting up.
2. If you try to use an Oracle binary that has a lower SGA base but did
not lower the /proc/<pid>/mapped_base value, you will experience
unpredictable results, such as ORA-3113 errors or attach errors,
while starting up.
3. If you don't increase the shmmax value, you could get attach errors
while starting up.
4. The address space is limited. So if you lower the SGA base and
consume most of the address space with a larger SGA, there will be less
room available for PGA memory. If your application uses a lot of PGA
memory, you could get ORA-4030 errors (out of process memory). In this
case, setting the SGA base to a higher value (and lowering the SGA
size) will reserve more space for PGA memory.
5. If you lower the SGA base and your SGA size is below around 800MB,
you may get attach errors. Lowering the SGA base is mainly a way to
allocate a large SGA area. Sizes below 800MB should work without having
to lower the SGA base.
6. It doesn't always help to increase VLM_WINDOW_SIZE. Also, keep in
mind that increasing VLM_WINDOW_SIZE reduces the amount of SGA that can
be allocated for other memory areas that might be needed (e.g. locks on
RAC). It is best to raise this value as the very last step. This value
could be increased once you know how much available address space is
left after adjusting init.ora parameters.
7. If you get attach errors while starting up, you will probably need
to clean up the shared memory segments by running 'ipcs' and then
removing segments via 'ipcrm shm XXX' or 'ipcrm sem XXX'.
Physical Address Extension
To use more than 4GB of memory on the IA-32 architecture, a
technique known as PAE (Physical Address Extension) is used. It
translates 32-bit linear addresses to 36-bit physical
addresses. In the Linux kernel, the support is provided through a
compile-time option that produces two separate kernels: the SMP kernel,
which supports only up to 4GB of memory, and the enterprise kernel, which
can go up to 64GB (also called VLM capable). This means applications like
Oracle can make use of the large memory and scale up to a large number
of users without loss of performance or reliability.
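The 4GB and 64GB limits follow directly from the address widths. A quick sketch of the arithmetic, assuming a shell with 64-bit integer arithmetic (the function name is illustrative):

```shell
#!/bin/sh
# Print the GB addressable with the given number of address bits ($1).
# Assumption: the shell evaluates arithmetic in 64-bit integers.
addressable_gb() {
    echo $(( (1 << $1) >> 30 ))
}

echo "32-bit addresses: $(addressable_gb 32) GB"
echo "36-bit addresses: $(addressable_gb 36) GB"
```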
Shared memory file-system (shmfs) support
It is a memory-based file system optimized for shared memory operations
and for larger SGA size.
The shmfs (/dev/shm based) is used by Oracle to memory map the dynamic
portions of the SGA. This can theoretically allow an SGA up to the size
of the shmfs file system that is created. Since shmfs is a memory file
system, its size can be as high as the maximum allowable VM size which
is 64GB.
1. Mount the shmfs file system as root using the command:
mount -t shm shmfs -o nr_blocks=8388608 /dev/shm
2. Set the shmmax parameter to half of RAM size:
echo 3000000000 >/proc/sys/kernel/shmmax
3. Set the init.ora parameter use_indirect_data_buffers=true.
4. Start up Oracle.
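The nr_blocks value in step 1 can be derived from a target file system size. The sketch below assumes that one shmfs block equals one 4KB page, which this document does not state, so verify it against your kernel before relying on it. The function name is hypothetical, and the script only prints the mount command rather than running it.

```shell
#!/bin/sh
# Assumption: one shmfs block equals one 4KB page.
BLOCK_KB=4

# Print the nr_blocks value for a file system of the given size in GB ($1).
shmfs_nr_blocks() {
    echo $(( $1 * 1024 * 1024 / $BLOCK_KB ))
}

echo "mount -t shm shmfs -o nr_blocks=$(shmfs_nr_blocks 32) /dev/shm"
```

Under that assumption, the nr_blocks=8388608 used in step 1 corresponds to a 32GB file system.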
Bigpages feature
The bigpages feature provides page frames of 4MB as opposed to the
regular 4KB. Oracle uses a large contiguous area in the VM for mapping
the VLM window. It is used for the dynamic part of the SGA, the size of
which is specified by the db_block_buffers parameter. The pages
corresponding to this area can easily be larger than the default 4KB
without the granularity problems usually associated with a large page
size. A page size of 4MB for these pages reduces the number of PTEs,
thus reducing kernel overhead considerably. Fewer TLB entries are needed
as well, which reduces TLB thrashing. The result is better scalability
in terms of the number of Oracle users.
Better performance is also achieved because big pages are not
swapped out, which means the entire db_block_buffers area stays in
physical memory. System performance also improves because kswapd does
not have to ‘think’ about swapping out these pages. Since swap space is
not pre-allocated for these pages, there is more swap area available
and less pagecache complexity.
Use the following steps to set Bigpages feature:
1. Calculate the bigpages value for your system with the following formula:
Bigpages (MB) = ((HighTotal - max memory required by user connections in KB) / 1024) * 0.8
where
• HighTotal is the value in KB obtained from /proc/meminfo.
• The factor 0.8 assumes that 20% of memory is reserved for kernel bookkeeping.
For example, assume that a machine with 8 GB memory and a HighTotal of
7208944 KB is estimated to have 2000 concurrent users, each occupying
30 KB of memory. Then:
Bigpages = ((7208944 - 2000*30) / 1024) * 0.8 MB = 5585 MB
There is a trade-off between the number of users and the bigpages value
because, if the value for bigpages is set to a very high value, the
memory available for user connections would be low. Hence, always
estimate a high value for the maximum number of user connections and
the memory that each will consume.
2. In the kernel boot options, add the following line to the boot
loader file (e.g. /etc/lilo.conf):
bigpages=<size>MB
where size is a value in MB calculated in the previous step.
3. Set the /proc/sys/kernel/shm-use-bigpages file to contain the value
2. The other possible values are 0 for no bigpages and 1 for bigpages
using sysV shared memory (as opposed to shmfs).
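The bigpages calculation from step 1 can be sketched as a small shell function. The function and variable names are illustrative, the inputs are the example figures from this section (HighTotal of 7208944 KB, 2000 users, 30 KB per user as in the 2000*30 term), and integer arithmetic rounds the result down.

```shell
#!/bin/sh
# Print the bigpages value in MB:
#   (HighTotal KB - users * KB per user) / 1024 * 0.8
# Arguments: $1 = HighTotal in KB, $2 = concurrent users, $3 = KB per user.
bigpages_mb() {
    hightotal_kb=$1
    users=$2
    kb_per_user=$3
    # Multiply by 8 and divide by 10 to apply the 0.8 factor in integers.
    echo $(( ($hightotal_kb - $users * $kb_per_user) * 8 / 10 / 1024 ))
}

# On a live system HighTotal could be read from /proc/meminfo instead
# of being hard-coded as it is here.
echo "bigpages=$(bigpages_mb 7208944 2000 30)MB"
```

The printed line has the form expected for the boot loader entry in step 2.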