proc_sys_vm(5) File Formats Manual proc_sys_vm(5)
/proc/sys/vm/ - virtual memory subsystem
/proc/sys/vm/
This directory contains files for tuning memory management,
buffer management, and cache management.
/proc/sys/vm/admin_reserve_kbytes (since Linux 3.10)
This file defines the amount of free memory (in KiB) on the
system that should be reserved for users with the
capability CAP_SYS_ADMIN.
The default value in this file is the minimum of [3% of
free pages, 8MiB] expressed as KiB. The default is
intended to provide enough for the superuser to log in and
kill a process, if necessary, under the default overcommit
'guess' mode (i.e., 0 in /proc/sys/vm/overcommit_memory).
Systems running in "overcommit never" mode (i.e., 2 in
/proc/sys/vm/overcommit_memory) should increase the value
in this file to account for the full virtual memory size of
the programs used to recover (e.g., login(1), ssh(1), and
top(1)). Otherwise, the superuser may not be able to log in
to recover the system. For example, on x86-64 a suitable
value is 131072 (128MiB reserved).
Changing the value in this file takes effect whenever an
application requests memory.
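The default described above can be sketched in shell arithmetic. This is only an illustration of the rule as stated here: the function name admin_reserve_default_kib is hypothetical, and the free-memory figure is passed in by the caller rather than read from the kernel.

```shell
# Sketch: default admin reserve = min(3% of free memory, 8 MiB), in KiB.
# admin_reserve_default_kib is a hypothetical helper, not a kernel interface.
admin_reserve_default_kib() {
    # $1: free memory in KiB (supplied by the caller; an assumption here)
    three_percent=$(( $1 * 3 / 100 ))
    cap_kib=$(( 8 * 1024 ))            # 8 MiB expressed in KiB
    if [ "$three_percent" -lt "$cap_kib" ]; then
        echo "$three_percent"
    else
        echo "$cap_kib"
    fi
}

admin_reserve_default_kib 100000     # 3% applies: prints 3000
admin_reserve_default_kib 16000000   # cap applies: prints 8192
echo $(( 131072 / 1024 ))            # 131072 KiB in MiB: prints 128
```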
/proc/sys/vm/compact_memory (since Linux 2.6.35)
When 1 is written to this file, all zones are compacted
such that free memory is available in contiguous blocks
where possible. The effect of this action can be seen by
examining /proc/buddyinfo.
Present only if the kernel was configured with
CONFIG_COMPACTION.
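One way to compare /proc/buddyinfo before and after compaction is to count, per zone, the free pages held in blocks of at least a given order. The sketch below assumes the usual buddyinfo layout, in which each line reads "Node N, zone NAME" followed by the free-block counts for orders 0, 1, 2, and so on. The name buddy_pages_at_least is hypothetical and the sample line is made up.

```shell
# Sketch: per zone, sum the free pages held in blocks of order >= $1.
# Reads buddyinfo-formatted text on standard input.
buddy_pages_at_least() {
    # $1: minimum block order to count
    awk -v min_order="$1" '
        $1 == "Node" && $3 == "zone" {
            node = $2
            sub(/,$/, "", node)              # "0," -> "0"
            pages = 0
            for (i = 5 + min_order; i <= NF; i++)
                pages += $i * 2 ^ (i - 5)    # field i holds the count for order i-5
            printf "%s %s %d\n", node, $4, pages
        }'
}

# Made-up sample with orders 0..3; prints "0 DMA 8".
printf 'Node 0, zone DMA 1 1 0 1\n' | buddy_pages_at_least 2
```

On a live system the input would instead be redirected from /proc/buddyinfo, once before and once after writing 1 to compact_memory (which requires root privileges).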
/proc/sys/vm/drop_caches (since Linux 2.6.16)
Writing to this file causes the kernel to drop clean
caches, dentries, and inodes from memory, causing that
memory to become free. This can be useful for memory
management testing and performing reproducible filesystem
benchmarks. Because writing to this file causes the
benefits of caching to be lost, it can degrade overall
system performance.
To free pagecache, use:
echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes, use:
echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries, and inodes, use:
echo 3 > /proc/sys/vm/drop_caches
Writing to this file is a nondestructive operation, and
dirty objects are not freeable. To maximize the amount of
memory freed, the user should run sync(1) first.
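The sequence described above, with sync(1) run first, could be wrapped as follows. This is a minimal sketch: drop_caches is a hypothetical function name, it accepts only the three values documented here, and the write itself requires root privileges.

```shell
# Sketch: validate the argument, flush dirty data, then drop caches.
# drop_caches is a hypothetical wrapper, not a standard command.
drop_caches() {
    case "$1" in
        1|2|3) ;;                              # values documented above
        *) echo "usage: drop_caches 1|2|3" >&2
           return 2 ;;
    esac
    sync || return 1                           # write out dirty objects first
    echo "$1" > /proc/sys/vm/drop_caches       # needs root privileges
}
```

For example, running "drop_caches 3" as root frees pagecache, dentries, and inodes after syncing.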
/proc/sys/vm/hugetlb_shm_group (since Linux 2.6.7)
This writable file contains a group ID that is allowed to
allocate memory using huge pages. If a process has a
filesystem group ID or any supplementary group ID that
matches this group ID, then it can make huge-page
allocations without holding the CAP_IPC_LOCK capability;
see memfd_create(2), mmap(2), and shmget(2).
/proc/sys/vm/legacy_va_layout (since Linux 2.6.9)
If nonzero, this disables the new 32-bit memory-mapping
layout; the kernel will use the legacy (2.4) layout for all
processes.
/proc/sys/vm/memory_failure_early_kill (since Linux 2.6.32)
Controls how processes are killed when an uncorrected memory
error (typically a 2-bit error in a memory module) that
cannot be handled by the kernel is detected in the
background by hardware. In some cases (like the page still
having a valid copy on disk), the kernel will handle the
failure transparently without affecting any applications.
But if there is no other up-to-date copy of the data, it
will kill processes to prevent any data corruption from
propagating.
The file has one of the following values:
1 Kill all processes that have the corrupted-and-not-
reloadable page mapped as soon as the corruption is
detected. Note that this is not supported for a few
types of pages, such as kernel internally allocated
data or the swap cache, but works for the majority
of user pages.
0 Unmap the corrupted page from all processes and kill
a process only if it tries to access the page.
The kill is performed using a SIGBUS signal with si_code
set to BUS_MCEERR_AO. Processes can handle this if they
want to; see sigaction(2) for more details.
This feature is active only on architectures/platforms with
advanced machine check handling and depends on the hardware
capabilities.
Applications can override the memory_failure_early_kill
setting individually with the prctl(2) PR_MCE_KILL
operation.
Present only if the kernel was configured with
CONFIG_MEMORY_FAILURE.
/proc/sys/vm/memory_failure_recovery (since Linux 2.6.32)
Enable memory failure recovery (when supported by the
platform).
1 Attempt recovery.
0 Always panic on a memory failure.
Present only if the kernel was configured with
CONFIG_MEMORY_FAILURE.
/proc/sys/vm/oom_dump_tasks (since Linux 2.6.25)
Enables a system-wide task dump (excluding kernel threads)
to be produced when the kernel performs an OOM-killing.
The dump includes the following information for each task
(thread, process): thread ID, real user ID, thread group ID
(process ID), virtual memory size, resident set size, the
CPU that the task is scheduled on, oom_adj score (see the
description of /proc/pid/oom_adj), and command name. This
is helpful to determine why the OOM-killer was invoked and
to identify the rogue task that caused it.
If this contains the value zero, this information is
suppressed. On very large systems with thousands of tasks,
it may not be feasible to dump the memory state information
for each one. Such systems should not be forced to incur a
performance penalty in OOM situations when the information
may not be desired.
If this is set to nonzero, this information is shown
whenever the OOM-killer actually kills a memory-hogging
task.
The default value is 0.
/proc/sys/vm/oom_kill_allocating_task (since Linux 2.6.24)
This enables or disables killing the OOM-triggering task in
out-of-memory situations.
If this is set to zero, the OOM-killer will scan through
the entire tasklist and select a task based on heuristics
to kill. This normally selects a rogue memory-hogging task
that frees up a large amount of memory when killed.
If this is set to nonzero, the OOM-killer simply kills the
task that triggered the out-of-memory condition. This
avoids a possibly expensive tasklist scan.
If /proc/sys/vm/panic_on_oom is nonzero, it takes
precedence over whatever value is used in
/proc/sys/vm/oom_kill_allocating_task.
The default value is 0.
/proc/sys/vm/overcommit_kbytes (since Linux 3.14)
This writable file provides an alternative to
/proc/sys/vm/overcommit_ratio for controlling the
CommitLimit when /proc/sys/vm/overcommit_memory has the
value 2. It allows the amount of memory overcommitting to
be specified as an absolute value (in kB), rather than as a
percentage, as is done with overcommit_ratio. This allows
for finer-grained control of CommitLimit on systems with
extremely large memory sizes.
Only one of overcommit_kbytes or overcommit_ratio can have
an effect: if overcommit_kbytes has a nonzero value, then
it is used to calculate CommitLimit, otherwise
overcommit_ratio is used. Writing a value to either of
these files causes the value in the other file to be set to
zero.
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting
mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
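As an illustration, the current mode could be read and translated with a small helper. The function name overcommit_mode_name is hypothetical, and the descriptions are simply the ones listed above.

```shell
# Sketch: map an overcommit_memory value to its description.
overcommit_mode_name() {
    case "$1" in
        0) echo "heuristic overcommit" ;;
        1) echo "always overcommit, never check" ;;
        2) echo "always check, never overcommit" ;;
        *) echo "unknown mode: $1" >&2
           return 1 ;;
    esac
}

# Report the current mode if the file is readable on this system.
if [ -r /proc/sys/vm/overcommit_memory ]; then
    overcommit_mode_name "$(cat /proc/sys/vm/overcommit_memory)" || true
fi
```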
In mode 0, calls of mmap(2) with MAP_NORESERVE are not
checked, and the default check is very weak, leading to the
risk of getting a process "OOM-killed".
In mode 1, the kernel pretends there is always enough
memory, until memory actually runs out. One use case for
this mode is scientific computing applications that employ
large sparse arrays. Before Linux 2.6.0, any nonzero value
implied mode 1.
In mode 2 (available since Linux 2.6), the total virtual
address space that can be allocated (CommitLimit in
/proc/meminfo) is calculated as
CommitLimit = (total_RAM - total_huge_TLB) *
overcommit_ratio / 100 + total_swap
where:
• total_RAM is the total amount of RAM on the system;
• total_huge_TLB is the amount of memory set aside for
huge pages;
• overcommit_ratio is the value in
/proc/sys/vm/overcommit_ratio; and
• total_swap is the amount of swap space.
For example, on a system with 16 GB of physical RAM, 16 GB
of swap, no space dedicated to huge pages, and an
overcommit_ratio of 50, this formula yields a CommitLimit
of 24 GB.
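The example above can be checked with shell integer arithmetic. In this sketch, commit_limit is a hypothetical helper name, and all sizes must be given in the same unit (GB here).

```shell
# Sketch: (total_RAM - total_huge_TLB) * overcommit_ratio / 100 + total_swap
commit_limit() {
    # $1 total_RAM, $2 total_huge_TLB, $3 overcommit_ratio, $4 total_swap
    echo $(( ($1 - $2) * $3 / 100 + $4 ))
}

commit_limit 16 0 50 16    # the example above, in GB: prints 24
```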
Since Linux 3.14, if the value in
/proc/sys/vm/overcommit_kbytes is nonzero, then CommitLimit
is instead calculated as:
CommitLimit = overcommit_kbytes + total_swap
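The choice between the two formulas above can be sketched as a single function. The name commit_limit_kib and its argument order are assumptions made for illustration; all sizes are taken to be in the same unit as overcommit_kbytes.

```shell
# Sketch: pick the CommitLimit formula based on overcommit_kbytes.
commit_limit_kib() {
    # $1 total RAM, $2 huge-page memory, $3 total swap
    # $4 overcommit_ratio (percent), $5 overcommit_kbytes
    if [ "$5" -ne 0 ]; then
        echo $(( $5 + $3 ))                      # absolute limit plus swap
    else
        echo $(( ($1 - $2) * $4 / 100 + $3 ))    # ratio-based limit plus swap
    fi
}

# 16 GiB of RAM and swap, ratio 50, overcommit_kbytes unset: prints 25165824
commit_limit_kib 16777216 0 16777216 50 0
# Same system with overcommit_kbytes set to 4194304: prints 20971520
commit_limit_kib 16777216 0 16777216 0 4194304
```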
See also the description of
/proc/sys/vm/admin_reserve_kbytes and
/proc/sys/vm/user_reserve_kbytes.
/proc/sys/vm/overcommit_ratio (since Linux 2.6.0)
This writable file defines a percentage by which memory can
be overcommitted. The default value in the file is 50.
See the description of /proc/sys/vm/overcommit_memory.
/proc/sys/vm/panic_on_oom (since Linux 2.6.18)
This enables or disables a kernel panic in an out-of-memory
situation.
If this file is set to the value 0, the kernel's OOM-killer
will kill some rogue process. Usually, the OOM-killer is
able to kill a rogue process and the system will survive.
If this file is set to the value 1, then the kernel
normally panics when out-of-memory happens. However, if a
process limits allocations to certain nodes using memory
policies (mbind(2) MPOL_BIND) or cpusets (cpuset(7)) and
those nodes reach memory exhaustion status, one process may
be killed by the OOM-killer. No panic occurs in this case:
because other nodes' memory may be free, this means the
system as a whole may not have reached an out-of-memory
situation yet.
If this file is set to the value 2, the kernel always
panics when an out-of-memory condition occurs.
The default value is 0. The values 1 and 2 are intended
for failover in clustered setups; select either according
to your failover policy.
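The three settings described above can be summarized as a decision function. This is only a restatement of the text, not kernel code: oom_outcome is a hypothetical name, and the second argument is an assumed flag that the caller sets to "constrained" when the allocation is limited by a memory policy or cpuset.

```shell
# Sketch: outcome of an out-of-memory condition for each panic_on_oom value.
oom_outcome() {
    # $1: value of panic_on_oom (0, 1, or 2)
    # $2: "constrained" if limited by a memory policy or cpuset (assumed flag)
    case "$1" in
        0) echo "kill" ;;
        1) if [ "${2:-}" = "constrained" ]; then
               echo "kill"      # only some nodes are exhausted
           else
               echo "panic"
           fi ;;
        2) echo "panic" ;;
        *) echo "unknown value: $1" >&2
           return 1 ;;
    esac
}

oom_outcome 1 constrained   # prints kill
oom_outcome 2 constrained   # prints panic
```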
/proc/sys/vm/swappiness
The value in this file controls how aggressively the kernel
will swap memory pages. Higher values increase
aggressiveness, lower values decrease aggressiveness. The
default value is 60.
/proc/sys/vm/user_reserve_kbytes (since Linux 3.10)
Specifies an amount of memory (in KiB) to reserve for user
processes. This is intended to prevent a user from
starting a single memory-hogging process that leaves them
unable to recover (kill the hog). The value in this file has
an effect only when /proc/sys/vm/overcommit_memory is set
to 2 ("overcommit never" mode). In this case, the system
reserves an amount of memory that is the minimum of [3% of
current process size, user_reserve_kbytes].
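The reserve actually applied in "overcommit never" mode, as described above, can be sketched as follows. The function name user_reserve_effective_kib is hypothetical, and both inputs are supplied by the caller in KiB.

```shell
# Sketch: effective reserve = min(3% of current process size, user_reserve_kbytes)
user_reserve_effective_kib() {
    # $1: current process size in KiB, $2: user_reserve_kbytes
    three_percent=$(( $1 * 3 / 100 ))
    if [ "$three_percent" -lt "$2" ]; then
        echo "$three_percent"
    else
        echo "$2"
    fi
}

user_reserve_effective_kib 1000000 131072     # 3% applies: prints 30000
user_reserve_effective_kib 10000000 131072    # cap applies: prints 131072
```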
The default value in this file is the minimum of [3% of
free pages, 128MiB] expressed as KiB.
If the value in this file is set to zero, then a user will
be allowed to allocate all free memory with a single
process (minus the amount reserved by
/proc/sys/vm/admin_reserve_kbytes). Any subsequent
attempts to execute a command will result in "fork: Cannot
allocate memory".
Changing the value in this file takes effect whenever an
application requests memory.
/proc/sys/vm/unprivileged_userfaultfd (since Linux 5.2)
This (writable) file exposes a flag that controls whether
unprivileged processes are allowed to employ
userfaultfd(2). If this file has the value 1, then
unprivileged processes may use userfaultfd(2). If this
file has the value 0, then only processes that have the
CAP_SYS_PTRACE capability may employ userfaultfd(2). The
default value in this file is 1.
See also: proc(5), proc_sys(5)
This page is part of the man-pages (Linux kernel and C library
user-space interface documentation) project. Information about
the project can be found at
⟨https://www.kernel.org/doc/man-pages/⟩. If you have a bug report
for this manual page, see
⟨https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/CONTRIBUTING⟩.
This page was obtained from the tarball man-pages-6.15.tar.gz
fetched from
⟨https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/⟩ on
2025-08-11.
Linux man-pages 6.15 2025-05-31 proc_sys_vm(5)