Virtual Memory Subsystem in Linux Kernel 2.6
and Challenges
A. S. Sumant, P. M. Chawan
Computer Engineering Department, Veermata Jijabai Technological Institute
Matunga, Mumbai 400019, India
Abstract: The 2.6 Linux kernel employs a number of
techniques to improve the use of large amounts of
memory, making Linux more enterprise-ready than
ever before. This article outlines a few of the more
important changes, including reverse mapping, the
use of larger memory pages, storage of page-table
entries in high memory, and greater stability of the
memory manager.
Introduction
The memory management subsystem is one of the
most important parts of the operating system. Since
the early days of computing, there has been a need
for more memory than exists physically in a
system. Strategies have been developed to
overcome this limitation and the most successful of
these is virtual memory. Virtual memory makes the
system appear to have more memory than it
actually has by sharing it between competing
processes as they need it. Linux supports virtual
memory, that is, using a disk as an extension of
RAM so that the effective size of usable memory
grows correspondingly. The kernel will write the
contents of a currently unused block of memory to
the hard disk so that the memory can be used for
another purpose. When the original contents are
needed again, they are read back into memory. This
is all made completely transparent to the user;
programs running under Linux only see the larger
amount of memory available and don't notice that
parts of them reside on the disk from time to time.
As the Linux kernel has grown and matured,
more users are looking to Linux for running very
large systems that handle scientific analysis
applications or even enormous databases. These
enterprise-class applications often demand large
amounts of memory in order to perform well. The
2.4 Linux kernel had facilities to support fairly large amounts of memory, but many changes were made to the 2.5 kernel to make it handle larger amounts of memory more efficiently.
Reverse mappings
In the Linux memory manager, page tables
keep track of the physical pages of memory that are
used by a process, and they map the virtual pages
to the physical pages. Some of these pages might
not be used for long periods of time, making them
good candidates for swapping out. However, before
they can be swapped out, every single process
mapping that page must be found so that the page-
table entry for the page in that process can be
updated. In the Linux 2.4 kernel, this can be a
daunting task as the page tables for every process
must be traversed in order to determine whether or
not the page is mapped by that process. As the
number of processes running on the system grows,
so does the work involved in swapping out one of
these pages.
Reverse mapping, or RMAP, was
implemented in the 2.5 kernel to solve this
problem. Reverse mapping provides a mechanism
for discovering which processes are using a given
physical page of memory. Instead of traversing the
page tables for every process, the memory manager
now has, for each physical page, a linked list
containing pointers to the page-table entries (PTEs)
of every process currently mapping that page. This
linked list is called a PTE chain. The PTE chain
greatly increases the speed of finding those
processes that are mapping a page, as shown in
Figure 1.
Figure 1. Reverse-mapping in 2.6
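To make the mechanism concrete, the following is a minimal sketch, in plain C rather than kernel code, of how a per-page PTE chain could be represented and walked. The type and function names (pte_chain, rmap_add, rmap_for_each) are illustrative assumptions made for this example, not the kernel's actual identifiers.

#include <stdlib.h>

/* Illustrative stand-ins for kernel types; not the real definitions. */
typedef unsigned long pte_t;

/* One node in a page's PTE chain: it points at one page-table entry
   that currently maps the physical page. */
struct pte_chain {
    pte_t            *ptep;   /* the page-table entry mapping this page */
    struct pte_chain *next;   /* next mapping of the same physical page */
};

/* Per-physical-page descriptor (greatly simplified). */
struct page {
    struct pte_chain *chain;  /* head of the reverse-mapping chain */
};

/* Record that *ptep now maps the page (called when a mapping is set up). */
int rmap_add(struct page *page, pte_t *ptep)
{
    struct pte_chain *pc = malloc(sizeof(*pc));
    if (!pc)
        return -1;
    pc->ptep = ptep;
    pc->next = page->chain;
    page->chain = pc;
    return 0;
}

/* Visit every PTE that maps the page, for example to clear each entry
   before the page is swapped out. With the chain this is a simple list
   walk instead of a scan of every process's page tables. */
void rmap_for_each(struct page *page, void (*fn)(pte_t *ptep))
{
    for (struct pte_chain *pc = page->chain; pc != NULL; pc = pc->next)
        fn(pc->ptep);
}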
Nothing is free, of course: the performance gains
obtained by using reverse mappings come at a
price. The most notable and obvious cost of reverse
mapping is that it incurs some memory overhead.
Some memory has to be used to keep track of all
those reverse mappings. Each entry in the PTE
chain uses 4 bytes to store a pointer to the page-
table entry and an additional 4 bytes to store the
pointer to the next entry on the chain. This memory
must also come from low memory, which on 32-bit
hardware is somewhat limited. Sometimes this can
be optimized down to a single entry instead of
using a linked list. This method is called the page-
direct approach. If there is only a single mapping
to the page, then a single pointer called "direct" can
be used instead of a linked list. It is only possible to
use this optimization if that page is mapped by only
one process. If the page is later mapped by another
process, the page will have to be converted to a
PTE chain. A flag is set to tell the memory
manager when this optimization is in effect for a
given page.
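A hedged sketch of the page-direct optimization described above: a union holds either the single direct PTE pointer or the head of a PTE chain, and a flag records which form is in use. Again, the names and layout are assumptions made for illustration, not the kernel's actual data structures.

#include <stdbool.h>
#include <stdlib.h>

typedef unsigned long pte_t;
struct pte_chain { pte_t *ptep; struct pte_chain *next; };

/* Simplified page descriptor: either one direct PTE pointer (the common
   case of a page mapped by a single process) or a full PTE chain. */
struct page {
    bool direct;                       /* flag: which union member is live */
    union {
        pte_t            *pte_direct;  /* single mapping: no list node needed */
        struct pte_chain *chain;       /* shared page: full reverse-map chain */
    } u;
};

/* Add a reverse mapping, converting from "direct" to a chain when a second
   process maps the same page. Returns 0 on success, -1 on allocation failure.
   A new page is assumed to start with direct = true and pte_direct = NULL. */
int rmap_add(struct page *page, pte_t *ptep)
{
    if (page->direct && page->u.pte_direct == NULL) {
        page->u.pte_direct = ptep;     /* first and only mapping */
        return 0;
    }

    struct pte_chain *pc = malloc(sizeof(*pc));
    if (!pc)
        return -1;

    if (page->direct) {                /* second mapping: direct -> chain */
        struct pte_chain *first = malloc(sizeof(*first));
        if (!first) {
            free(pc);
            return -1;
        }
        first->ptep = page->u.pte_direct;
        first->next = NULL;
        page->u.chain = first;
        page->direct = false;
    }

    pc->ptep = ptep;
    pc->next = page->u.chain;
    page->u.chain = pc;
    return 0;
}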
There are also a few other complexities brought
about by reverse mappings. Whenever pages are
mapped by a process, reverse mappings must be
established for all of those pages. Likewise, when a
process unmaps pages, the corresponding reverse
mappings must also be removed. This is especially
common at exit time. All of these operations must
be performed under locks. For applications that
perform a lot of forks and exits, this can be very
expensive and add a lot of overhead.
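The unmap side of that bookkeeping might look roughly like the sketch below: for every page a process unmaps, the matching chain entry has to be found, unlinked, and freed while the page is locked. The per-page mutex and the rmap_remove name are stand-ins chosen for this example; the kernel uses its own locking primitives.

#include <pthread.h>
#include <stdlib.h>

typedef unsigned long pte_t;
struct pte_chain { pte_t *ptep; struct pte_chain *next; };

struct page {
    pthread_mutex_t   lock;    /* stand-in for the kernel's per-page locking */
    struct pte_chain *chain;   /* reverse-mapping chain for this page */
};

/* Called for every page a process unmaps (for example at exit time): the
   matching reverse mapping must be located and freed under the lock, which
   is where the fork/exit overhead described above comes from. */
void rmap_remove(struct page *page, pte_t *ptep)
{
    pthread_mutex_lock(&page->lock);

    struct pte_chain **pp = &page->chain;
    while (*pp != NULL) {
        if ((*pp)->ptep == ptep) {
            struct pte_chain *dead = *pp;
            *pp = dead->next;          /* unlink this mapping's entry */
            free(dead);
            break;
        }
        pp = &(*pp)->next;
    }

    pthread_mutex_unlock(&page->lock);
}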
Despite a few tradeoffs, reverse mappings
have proven to be a valuable modification to the
Linux memory manager. A serious bottleneck, locating the processes that map a page, is reduced to a simple list traversal with this approach. Reverse
mappings help the system continue to perform and
scale well when large applications are placing huge
memory demands on the kernel and multiple
processes are sharing memory. There are also more
enhancements for reverse mapping currently being
researched for possible inclusion in future versions
of the Linux kernel.
Large pages
Typically, the memory manager deals with
memory in 4 KB pages on x86 systems. The actual
page size is architecture dependent. For most uses,
pages of this size are the most efficient way for the
memory manager to deal with memory. Some
applications, however, make use of extremely large
amounts of memory. Large databases are a
common example of this. For every page mapped
by each process, page-table entries must also be
created to map the virtual address to the physical
address. If you have a process that maps 1 GB of
memory with 4 KB pages, it would take 262,144
page-table entries to keep track of those pages. If
each page-table entry consumes 8 bytes, then that
would be 2 MB of overhead for every 1 GB of
memory mapped. This is quite a bit of overhead by
itself, but the problem becomes even worse if you
have multiple processes sharing that memory. In
such a situation, every process mapping that same
1 GB of memory would consume its own 2 MB
worth of page-table entries. With enough
processes, the memory wasted on overhead might
exceed the amount of memory the application
requested for use.
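The overhead figures above can be reproduced with a short calculation. The page and entry sizes below are the same assumptions the example uses (4 KB pages, 8-byte page-table entries); they are not queried from a real system.

#include <stdio.h>

int main(void)
{
    const unsigned long map_bytes = 1UL << 30;   /* 1 GB mapped by the process */
    const unsigned long page_size = 4UL << 10;   /* 4 KB pages (x86 default)   */
    const unsigned long pte_size  = 8;           /* assumed bytes per page-table entry */

    unsigned long entries  = map_bytes / page_size;  /* 262,144 entries     */
    unsigned long overhead = entries * pte_size;     /* 2 MB of page tables */

    printf("page-table entries: %lu\n", entries);
    printf("overhead: %lu bytes (%.1f MB) per process mapping the region\n",
           overhead, overhead / (1024.0 * 1024.0));
    return 0;
}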
One way to help alleviate this situation is to use a
larger page size. Most modern processors support
at least a small and a large page size, and some
support even more than that. On x86, the size of a
large page is 4 MB, or 2 MB on systems with
physical address extension (PAE) turned on.
Assuming a large page size of 4 MB is used in the
same example from above, that same 1 GB of
memory could be mapped with only 256 page-table
entries instead of 262,144. This translates to only
2,048 bytes of overhead instead of 2 MB.
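As a usage sketch: on 2.6, an application typically obtains large pages through hugetlbfs by mapping a file that lives on a hugetlbfs mount. The /mnt/huge mount point below is an assumption (the administrator chooses where hugetlbfs is mounted and must reserve large pages beforehand, for example via /proc/sys/vm/nr_hugepages), and the mapping length must be a multiple of the large-page size.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define LARGE_PAGE (4UL << 20)   /* 4 MB large page (x86 without PAE) */

int main(void)
{
    /* Assumes hugetlbfs is mounted at /mnt/huge and pages were reserved. */
    int fd = open("/mnt/huge/example", O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* The length must be a multiple of the large-page size. */
    char *buf = mmap(NULL, LARGE_PAGE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    buf[0] = 1;                  /* touch the mapping so a large page is used */

    munmap(buf, LARGE_PAGE);
    close(fd);
    unlink("/mnt/huge/example");
    return 0;
}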
The use of large pages can also improve
performance by reducing the number of translation
lookaside buffer (TLB) misses. The TLB is a sort of
cache for the page tables that allows virtual to
physical address translation to be performed more
quickly for pages that are listed in the table. Of
course, the TLB can only hold a limited number of translations. Because each large page covers more memory than a small page, the same number of TLB entries can reach far more memory when large pages are used. For example, a TLB with 64 entries covers only 256 KB of address space with 4 KB pages, but 256 MB with 4 MB pages.
Storing page-table entries in high memory
Page tables can normally be stored only in low memory on 32-bit machines. This low memory is limited to the first 896 MB of physical memory and is also needed by most of the rest of the kernel. In a situation where applications use a large
number of processes and map a lot of memory, low
memory can quickly become scarce.
A configuration option called Highmem PTE (CONFIG_HIGHPTE) in the 2.6 kernel now allows the page-table entries to be placed in high memory, freeing more of the low memory area for the kernel data structures that must be placed there. In exchange, the
process of using these page-table entries is
somewhat slower. However, for systems in which a
large number of processes are running, storing page
tables in high memory can be enabled to squeeze
more memory out of the low memory area.
Figure 2. Memory regions
Stability of the memory manager
Better stability is another important improvement
of the 2.6 memory manager. When the 2.4 kernel
was released, users started having memory
management-related stability problems almost
immediately. Given the system wide impact of
memory management, stability is of utmost
importance. The problems were mostly resolved,
but the solution entailed essentially gutting the
memory manager and replacing it with a much
simpler rewrite. This left a lot of room for Linux
distributors to improve on the memory manager for
their own particular distribution of Linux. The
other side of those improvements, however, is that
memory management features in 2.4 can be quite
different depending on which distribution is used.
In order to prevent such a situation from happening
again, memory management was one of the most
scrutinized areas of kernel development in 2.6. The
new memory management code has been tested
and optimized on everything from very low end
desktop systems to large, enterprise-class, multi-
processor systems.
Conclusion
The memory management improvements in the
Linux 2.6 kernel go far beyond the features
mentioned in this article. Many of the changes are
subtle but equally important. These changes all
work together to produce a memory manager in the
2.6 kernel designed for better performance,
efficiency, and stability. Some changes, like
Highmem PTE and Large pages, work to reduce
the overhead caused by memory management.
Other changes, like reverse mappings, speed up
performance in certain critical areas. These specific
examples were chosen because they exemplify how
the Linux 2.6 kernel has been tuned and enhanced
to better handle enterprise-class hardware and applications.
The paper "Linux Memory Management on Larger
Machines" by Martin Bligh and David Hansen was
presented at the 2003 Linux Symposium.
Towards an O(1) VM.
[3]Mel.Gorman, Understanding the Linux Virtual
Memory Manager Published By Prentice Hall
ISBN 0-13-145348-3
[5] Large page support
in the Linux kernel and another on the Object-
based reverse-mapping VM,
ry/l-dev26/index.html Article by Paul Larson on
improvements to the process of kernel development
for the 2.6 kernel
get an idea of the speed improvements the 2.6
kernel offers
Mel.Gorman, Understanding the Linux Virtual Memory Manager Published By Prentice Hall ISBN 0-13-145348-3 [4] [5] Large page support in the Linux kernel and another on the Objectbased reverse-mapping VM, [6] ry/l-dev26/index.html Article by Paul Larson on improvements to the process of kernel development for the 2.6 kernel [7] get an idea of the speed improvements the 2.6 kernel offers