This directory is present only if the kernel was configured with CONFIG_SECURITY.
In SELinux, this file is used to get the security context of a process. Prior to Linux 2.6.11, this file could not be used to set the security context (a write was always denied), since SELinux limited process security transitions to execve(2) (see the description of /proc/[pid]/attr/exec, below). Since Linux 2.6.11, SELinux lifted this restriction and began supporting "set" operations via writes to this node if authorized by policy, although use of this operation is only suitable for applications that are trusted to maintain any desired separation between the old and new security contexts. Prior to Linux 2.6.28, SELinux did not allow threads within a multi-threaded process to set their security context via this node as it would yield an inconsistency among the security contexts of the threads sharing the same memory space. Since Linux 2.6.28, SELinux lifted this restriction and began supporting "set" operations for threads within a multithreaded process if the new security context is bounded by the old security context, where the bounded relation is defined in policy and guarantees that the new security context has a subset of the permissions of the old security context. Other security modules may choose to support "set" operations via writes to this node.
In SELinux, this is needed to support role/domain transitions, and execve(2) is the preferred point to make such transitions because it offers better control over the initialization of the process in the new security label and the inheritance of state. In SELinux, this attribute is reset on execve(2) so that the new program reverts to the default behavior for any execve(2) calls that it may make. In SELinux, a process can set only its own /proc/[pid]/attr/exec attribute.
SELinux employs this file to support creation of a file (using the aforementioned system calls) in a secure state, so that there is no risk of inappropriate access being obtained between the time of creation and the time that attributes are set. In SELinux, this attribute is reset on execve(2), so that the new program reverts to the default behavior for any file creation calls it may make, but the attribute will persist across multiple file creation calls within a program unless it is explicitly reset. In SELinux, a process can set only its own /proc/[pid]/attr/fscreate attribute.
This is a write-only file, writable only by owner of the process.
The following values may be written to the file:
A further value can be written to affect a different bit:
The /proc/[pid]/clear_refs file is present only if the CONFIG_PROC_PAGE_MONITOR kernel configuration option is enabled.
$ cd /proc/20/cwd; /bin/pwd
Note that the pwd command is often a shell built-in, and might not work properly. In bash(1), you may use pwd -P.
In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
$ strings /proc/1/environ
Under Linux 2.0 and earlier, /proc/[pid]/exe is a pointer to the binary which was executed, and appears as a symbolic link. A readlink(2) call on this file under Linux 2.0 returns a string in the format:
For example, :1502 would be inode 1502 on device major 03 (IDE, MFM, etc. drives) minor 01 (first partition on the first drive).
find(1) with the -inum option can be used to locate the file.
For file descriptors for pipes and sockets, the entries will be symbolic links whose content is the file type with the inode. A readlink(2) call on this file returns a string in the format:
For example, socket: will be a socket and its inode is 2248868. For sockets, that inode can be used to find more information in one of the files under /proc/net/.
For file descriptors that have no corresponding inode (e.g., file descriptors produced by epoll_create(2), eventfd(2), inotify_init(2), signalfd(2), and timerfd(2)), the entry will be a symbolic link with contents of the form
In some cases, the file-type is surrounded by square brackets.
For example, an epoll file descriptor will have a symbolic link whose content is the string anon_inode:[eventpoll].
In a multithreaded process, the contents of this directory are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
Programs that will take a filename as a command-line argument, but will not take input from standard input if no argument is supplied, or that write to a file named as a command-line argument, but will not send their output to standard output if no argument is supplied, can nevertheless be made to use standard input or standard out using /proc/[pid]/fd. For example, assuming that -i is the flag designating an input file and -o is the flag designating an output file:
$ foobar -i /proc/self/fd/0 -o /proc/self/fd/1 ...
and you have a working filter.
/proc/self/fd/N is approximately the same as /dev/fd/N in some UNIX and UNIX-like systems. Most Linux MAKEDEV scripts symbolically link /dev/fd to /proc/self/fd, in fact.
Most systems provide symbolic links /dev/stdin, /dev/stdout, and /dev/stderr, which respectively link to the files 0, 1, and 2 in /proc/self/fd. Thus the example command above could be written as:
$ foobar -i /dev/stdin -o /dev/stdout ...
For regular files and directories, we see something like:
$ cat /proc/12015/fdinfo/4 pos: 1000 flags: 01002002 mnt_id: 21
The pos field is a decimal number showing the current file offset. The flags field is an octal number that displays the file access mode and file status flags (see open(2)). The mnt_id field, present since Linux 3.15, is the ID of the mount point containing this file. See the description of /proc/[pid]/mountinfo.
For eventfd file descriptors (see eventfd(2)), we see the following fields:
pos: 0 flags: 02 mnt_id: 10 eventfd-count: 40
eventfd-count is the current value of the eventfd counter, in hexadecimal.
For epoll file descriptors (see epoll(7)), we see the following fields:
pos: 0 flags: 02 mnt_id: 10 tfd: 9 events: 19 data: 74253d2500000009 tfd: 7 events: 19 data: 74253d2500000007
Each of the lines beginning tfd describes one of the file descriptors being monitored via the epoll file descriptor (see epoll_ctl(2) for some details). The tfd field is the number of the file descriptor. The events field is a hexadecimal mask of the events being monitored for this file descriptor. The data field is the data value associated with this file descriptor.
For signalfd file descriptors (see signalfd(2)), we see the following fields:
pos: 0 flags: 02 mnt_id: 10 sigmask: 0000000000000006
sigmask is the hexadecimal mask of signals that are accepted via this signalfd file descriptor. (In this example, bits 2 and 3 are set, corresponding to the signals SIGINT and SIGQUIT; see signal(7).)
# cat /proc/3828/io rchar: 323934931 wchar: 323929600 syscr: 632687 syscw: 632675 read_bytes: 0 write_bytes: 323932160 cancelled_write_bytes: 0
The fields are as follows:
# ls -l /proc/self/map_files/ lr--------. 1 root root 64 Apr 16 21:31 3252e00000-3252e20000 -> /usr/lib64/ld-2.15.so ...
Although these entries are present for memory regions that were mapped with the MAP_FILE flag, the way anonymous shared memory (regions created with the MAP_ANON | MAP_SHARED flags) is implemented in Linux means that such regions also appear on this directory. Here is an example where the target file is the deleted /dev/zero one:
lrw-------. 1 root root 64 Apr 16 21:33 7fc075d2f000-7fc075e6f000 -> /dev/zero (deleted)
This directory appears only if the CONFIG_CHECKPOINT_RESTORE kernel configuration option is enabled. Privilege (CAP_SYS_ADMIN) is required to view the contents of this directory.
The format of the file is:
address perms offset dev inode pathname 00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon 00651000-00652000 r--p 00051000 08:02 173521 /usr/bin/dbus-daemon 00652000-00655000 rw-p 00052000 08:02 173521 /usr/bin/dbus-daemon 00e03000-00e24000 rw-p 00000000 00:00 0 [heap] 00e24000-011f7000 rw-p 00000000 00:00 0 [heap] ... 35b1800000-35b1820000 r-xp 00000000 08:02 135522 /usr/lib64/ld-2.15.so 35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522 /usr/lib64/ld-2.15.so 35b1a20000-35b1a21000 rw-p 00020000 08:02 135522 /usr/lib64/ld-2.15.so 35b1a21000-35b1a22000 rw-p 00000000 00:00 0 35b1c00000-35b1dac000 r-xp 00000000 08:02 135870 /usr/lib64/libc-2.15.so 35b1dac000-35b1fac000 ---p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so 35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so 35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870 /usr/lib64/libc-2.15.so ... f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0 [stack:986] ... 7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0 [stack] 7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0 [vdso]
The address field is the address space in the process that the mapping occupies. The perms field is a set of permissions:
r = read w = write x = execute s = shared p = private (copy on write)
The offset field is the offset into the file/whatever; dev is the device (major:minor); inode is the inode on that device. 0 indicates that no inode is associated with the memory region, as would be the case with BSS (uninitialized data).
The pathname field will usually be the file that is backing the mapping. For ELF files, you can easily coordinate with the offset field by looking at the Offset field in the ELF program headers (readelf -l).
There are additional helpful pseudo-paths:
Under Linux 2.0, there is no field giving pathname.
36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue (1)(2)(3) (4) (5) (6) (7) (8) (9) (10) (11)
For more information on mount propagation see: Documentation/filesystems/sharedsubtree.txt in the Linux kernel source tree.
device /dev/sda7 mounted on /home with fstype ext3 [statistics] ( 1 ) ( 2 ) (3 ) (4)
See namespaces(7) for more information.
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from, based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.
There is an additional factor included in the badness score: root processes are given 3% extra memory over other tasks.
The amount of "allowed" memory depends on the context in which the OOM-killer was called. If it is due to the memory assigned to the allocating task's cpuset being exhausted, the allowed memory represents the set of mems assigned to that cpuset (see cpuset(7)). If it is due to a mempolicy's node(s) being exhausted, the allowed memory represents the set of mempolicy nodes. If it is due to a memory limit (or swap limit) being reached, the allowed memory is that configured limit. Finally, if it is due to the entire system being out of memory, the allowed memory represents all allocatable resources.
The value of oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX). This allows user space to control the preference for OOM-killing, ranging from always preferring a certain task or completely disabling it from OOM killing. The lowest possible value, -1000, is equivalent to disabling OOM-killing entirely for that task, since it will always report a badness score of 0.
Consequently, it is very simple for user space to define the amount of memory to consider for each task. Setting a oom_score_adj value of +500, for example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least 50% more memory. A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task's allowed memory from being considered as scoring against the task.
For backward compatibility with previous kernels, /proc/[pid]/oom_adj can still be used to tune the badness score. Its value is scaled linearly with oom_score_adj.
Writing to /proc/[pid]/oom_score_adj or /proc/[pid]/oom_adj will change the other with its scaled value.
In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
00400000-0048a000 r-xp 00000000 fd:03 960637 /bin/bash Size: 552 kB Rss: 460 kB Pss: 100 kB Shared_Clean: 452 kB Shared_Dirty: 0 kB Private_Clean: 8 kB Private_Dirty: 0 kB Referenced: 460 kB Anonymous: 0 kB AnonHugePages: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kBThe first of these lines shows the same information as is displayed for the mapping in /proc/[pid]/maps. The remaining lines show the size of the mapping, the amount of the mapping that is currently resident in RAM ("Rss"), the process' proportional share of this mapping ("Pss"), the number of clean and dirty shared pages in the mapping, and the number of clean and dirty private pages in the mapping. "Referenced" indicates the amount of memory currently marked as referenced or accessed. "Anonymous" shows the amount of memory that does not belong to any file. "Swap" shows how much would-be-anonymous memory is also used, but out on swap.
The "KernelPageSize" entry is the page size used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64K as a base page size may still use 4K pages for the MMU on older processors. To distinguish, this patch reports "MMUPageSize" as the page size used by the MMU.
The "Locked" indicates whether the mapping is locked in memory or not.
"VmFlags" field represents the kernel flags associated with the particular virtual memory area in two letter encoded manner. The codes are the following:
rd - readable
wr - writable
ex - executable
sh - shared
mr - may read
mw - may write
me - may execute
ms - may share
gd - stack segment grows down
pf - pure PFN range
dw - disabled write to the mapped file
lo - pages are locked in memory
io - memory mapped I/O area
sr - sequential read advise provided
rr - random read advise provided
dc - do not copy area on fork
de - do not expand area on remapping
ac - area is accountable
nr - swap space is not reserved for the area
ht - area uses huge tlb pages
nl - non-linear mapping
ar - architecture specific flag
dd - do not include area into core dump
sd - soft-dirty flag
mm - mixed map area
hg - huge page advise flag
nh - no-huge page advise flag
mg - mergeable advise flag
The /proc/[pid]/smaps file is present only if the CONFIG_PROC_PAGE_MONITOR kernel configuration option is enabled.
The fields, in order, with their proper scanf(3) format specifiers, are:
The format for this field was %lu before Linux 2.6.
Before Linux 2.6, this was a scaled value based on the scheduler weighting given to this process.
The format for this field was %lu before Linux 2.6.
The format for this field was %lu before Linux 2.6.22.
size (1) total program size (same as VmSize in /proc/[pid]/status) resident (2) resident set size (same as VmRSS in /proc/[pid]/status) share (3) shared pages (i.e., backed by a file) text (4) text (code) lib (5) library (unused in Linux 2.6) data (6) data + stack dt (7) dirty pages (unused in Linux 2.6)
$ cat /proc/$$/status Name: bash State: S (sleeping) Tgid: 3515 Pid: 3515 PPid: 3452 TracerPid: 0 Uid: 1000 1000 1000 1000 Gid: 100 100 100 100 FDSize: 256 Groups: 16 33 100 VmPeak: 9136 kB VmSize: 7896 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 7572 kB VmRSS: 6316 kB VmData: 5224 kB VmStk: 88 kB VmExe: 572 kB VmLib: 1708 kB VmPMD: 4 kB VmPTE: 20 kB VmSwap: 0 kB Threads: 1 SigQ: 0/3067 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: 0000000000010000 SigIgn: 0000000000384004 SigCgt: 000000004b813efb CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: ffffffffffffffff CapAmb: 0000000000000000 Seccomp: 0 Cpus_allowed: 00000001 Cpus_allowed_list: 0 Mems_allowed: 1 Mems_allowed_list: 0 voluntary_ctxt_switches: 150 nonvoluntary_ctxt_switches: 545
If the process is blocked, but not in a system call, then the file displays -1 in place of the system call number, followed by just the values of the stack pointer and program counter. If process is not blocked, then the file contains just the string "running".
This file is present only if the kernel was configured with CONFIG_HAVE_ARCH_TRACEHOOK.
In a multithreaded process, the contents of the /proc/[pid]/task directory are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
ID: 1 signal: 60/00007fff86e452a8 notify: signal/pid.2634 ClockID: 0 ID: 0 signal: 60/00007fff86e452a8 notify: signal/pid.2634 ClockID: 1
The lines shown for each timer have the following meanings:
(2^order) * PAGE_SIZE
The binary buddy allocator algorithm inside the kernel will split one chunk into two chunks of a smaller order (thus with half the size) or combine two contiguous chunks into one larger chunk of a higher order (thus with double the size) to satisfy allocation requests and to counter memory fragmentation. The order matches the column number, when starting to count at zero.
For example on a x86_64 system:
Node 0, zone DMA 1 1 1 0 2 1 1 0 1 1 3 Node 0, zone DMA32 65 47 4 81 52 28 13 10 5 1 404 Node 0, zone Normal 216 55 189 101 84 38 37 27 5 3 587
In this example, there is one node containing three zones and there are 11 different chunk sizes. If the page size is 4 kilobytes, then the first zone called DMA (on x86 the first 16 megabyte of memory) has 1 chunk of 4 kilobytes (order 0) available and has 3 chunks of 4 megabytes (order 10) available.
If the memory is heavily fragmented, the counters for higher order chunks will be zero and allocation of large contiguous areas will fail.
Further information about the zones can be found in /proc/zoneinfo.
cat /lib/modules/$(uname -r)/build/.config
Incidentally, this file may be used by mount(8) when no filesystem is specified and it didn't manage to determine the filesystem type. Then filesystems contained in this file are tried (excepted those that are marked with "nodev").
cache buffer size in KB capacity number of sectors driver driver version geometry physical and logical geometry identify in hexadecimal media media type model manufacturer's model number settings drive settings smart_thresholds in hexadecimal smart_values in hexadecimal
The hdparm(8) utility provides access to this information in a friendly format.
The total length of the file is the size of physical memory (RAM) plus 4KB.
Information in this file is retrieved with the dmesg(1) program.
0 - KPF_LOCKED
1 - KPF_ERROR
2 - KPF_REFERENCED
3 - KPF_UPTODATE
4 - KPF_DIRTY
5 - KPF_LRU
6 - KPF_ACTIVE
7 - KPF_SLAB
8 - KPF_WRITEBACK
9 - KPF_RECLAIM
10 - KPF_BUDDY
11 - KPF_MMAP (since Linux 2.6.31)
12 - KPF_ANON (since Linux 2.6.31)
13 - KPF_SWAPCACHE (since Linux 2.6.31)
14 - KPF_SWAPBACKED (since Linux 2.6.31)
15 - KPF_COMPOUND_HEAD (since Linux 2.6.31)
16 - KPF_COMPOUND_TAIL (since Linux 2.6.31)
16 - KPF_HUGE (since Linux 2.6.31)
18 - KPF_UNEVICTABLE (since Linux 2.6.31)
19 - KPF_HWPOISON (since Linux 2.6.31)
20 - KPF_NOPAGE (since Linux 2.6.31)
21 - KPF_KSM (since Linux 2.6.32)
22 - KPF_THP (since Linux 3.4)
For further details on the meanings of these bits, see the kernel source file Documentation/vm/pagemap.txt. Before kernel 2.6.29, KPF_WRITEBACK, KPF_RECLAIM, KPF_BUDDY, and KPF_LOCKED did not report correctly.
This 1GB is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 in IR /proc/sys/vm/overcommit_memory ), allocations which would exceed the CommitLimit will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated.
IP address HW type Flags HW address Mask Device 192.168.0.50 0x1 0x2 00:50:BF:25:68:F3 * eth0 192.168.0.250 0x1 0xc 00:00:00:00:00:00 * eth0
Here "IP address" is the IPv4 address of the machine and the "HW type" is the hardware type of the address from RFC 826. The flags are the internal flags of the ARP structure (as defined in /usr/include/linux/if_arp.h) and the "HW address" is the data link layer mapping for that IP address if it is known.
Inter-| Receive | Transmit face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed lo: 2776770 11307 0 0 0 0 0 0 2776770 11307 0 0 0 0 0 0 eth0: 1215645 2751 0 0 0 0 0 0 1782404 4324 0 0 0 427 0 0 ppp0: 1622270 5552 1 0 0 0 0 0 354130 5669 0 0 0 0 0 0 tap0: 7714 81 0 0 0 0 0 0 7714 81 0 0 0 0 0 0
indx interface_name dmi_u dmi_g dmi_address 2 eth0 1 0 01005e000001 3 eth1 1 0 01005e000001 4 eth2 1 0 01005e000001
sl local_address rem_address st tx_queue rx_queue tr rexmits tm->when uid 1: 01642C89:0201 0C642C89:03FF 01 00000000:00000001 01:000071BA 00000000 0 1: 00000000:0801 00000000:0000 0A 00000000:00000000 00:00000000 6F000100 0 1: 00000000:0201 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0
Num RefCount Protocol Flags Type St Path 0: 00000002 00000000 00000000 0001 03 1: 00000001 00000000 00010000 0001 01 /dev/printer
The fields are as follows:
1 4207 0 2 65535 0 0 0 1 (1) (2) (3)(4) (5) (6) (7) (8)
This file has been deprecated in favor of a new /proc interface for PCI (/proc/bus/pci). It became optional in Linux 2.2 (available with CONFIG_PCI_OLD_PROC set at kernel compilation). It became once more nonoptionally enabled in Linux 2.4. Next, it was deprecated in Linux 2.6 (still available with CONFIG_PCI_LEGACY_PROC set), and finally removed altogether since Linux 2.6.17.
You can also write to some of the files to reconfigure the subsystem or switch certain features on or off.
echo 'scsi add-single-device 1 0 5 0' > /proc/scsi/scsiwill cause host scsi1 to scan on SCSI channel 0 for a device on ID 5 LUN 0. If there is already a device known on this address or the address is invalid, an error will be returned.
Reading these files will usually show driver and host configuration, statistics, and so on.
Writing to these files allows different things on different hosts. For example, with the latency and nolatency commands, root can switch on and off command latency measurement code in the eata_dma driver. With the lockup and unlock commands, root can control bus lockups simulated by the scsi_debug driver.
cache-name num-active-objs total-objs object-size num-active-slabs total-slabs num-pages-per-slab
See slabinfo(5) for details.
String values may be terminated by either '\0' or '\n'.
Integer and long values may be written either in decimal or in hexadecimal notation (e.g. 0x3FFF). When writing multiple integer or long values, these may be separated by any of the following whitespace characters: ' ', '\t', or '\n'. Using other separators leads to the error EINVAL.
echo 100000 > /proc/sys/fs/file-max
Privileged processes (CAP_SYS_ADMIN) can override the file-max limit.
Starting with Linux 2.4, there is no longer a static limit on the number of inodes, and this file is removed.
nr_inodes is the number of inodes the system has allocated. nr_free_inodes represents the number of free inodes.
preshrink is nonzero when the nr_inodes > inode-max and the system needs to prune the inode list instead of allocating more; since Linux 2.4, this field is a dummy value (always zero).
Since Linux 3.19, the content of this file has no effect (because msgmni defaults to near the maximum value possible), and reads from this file always return the value "0".
# echo 'darkstar' > /proc/sys/kernel/hostname # echo 'mydomain' > /proc/sys/kernel/domainname
has the same effect as
# hostname 'darkstar' # domainname 'mydomain'
Note, however, that the classic darkstar.frop.org has the hostname "darkstar" and DNS (Internet Domain Name Server) domainname "frop.org", not to be confused with the NIS (Network Information Service) or YP (Yellow Pages) domainname. These two domain names are in general different. For a detailed discussion see the hostname(1) man page.
#5 Wed Feb 25 21:49:24 MET 1998
The "#5" means that this is the fifth kernel built from this source base and the date following it indicates the time the kernel was built.
Since Linux 4.1, the value that can be written to threads-max is bounded. The minimum value that can be written is 20. The maximum value that can be written is given by the constant FUTEX_TID_MASK (0x3fffffff). If a value outside of this range is written to threads-max, the error EINVAL occurs.
The value written is checked against the available RAM pages. If the thread structures would occupy too much (more than 1/8th) of the available RAM pages, threads-max is reduced accordingly.
To free pagecache, use:
echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes, use:
echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes, use:
echo 3 > /proc/sys/vm/drop_caches
Because writing to this file is a nondestructive operation and dirty objects are not freeable, the user should run sync(1) first.
The file has one of the following values:
This feature is active only on architectures/platforms with advanced machine check handling and depends on the hardware capabilities.
Applications can override the memory_failure_early_kill setting individually with the prctl(2) PR_MCE_KILL operation.
If this contains the value zero, this information is suppressed. On very large systems with thousands of tasks, it may not be feasible to dump the memory state information for each one. Such systems should not be forced to incur a performance penalty in OOM situations when the information may not be desired.
If this is set to nonzero, this information is shown whenever the OOM-killer actually kills a memory-hogging task.
The default value is 0.
If this is set to zero, the OOM-killer will scan through the entire tasklist and select a task based on heuristics to kill. This normally selects a rogue memory-hogging task that frees up a large amount of memory when killed.
If this is set to nonzero, the OOM-killer simply kills the task that triggered the out-of-memory condition. This avoids a possibly expensive tasklist scan.
If /proc/sys/vm/panic_on_oom is nonzero, it takes precedence over whatever value is used in /proc/sys/vm/oom_kill_allocating_task.
The default value is 0.
Only one of overcommit_kbytes or overcommit_ratio can have an effect: if overcommit_kbytes has a nonzero value, then it is used to calculate CommitLimit, otherwise overcommit_ratio is used. Writing a value to either of these files causes the value in the other file to be set to zero.
In mode 2 (available since Linux 2.6), the total virtual address space that can be allocated (CommitLimit in /proc/meminfo) is calculated as
CommitLimit = (total_RAM - total_huge_TLB) *
overcommit_ratio / 100 + total_swap
Since Linux 3.14, if the value in /proc/sys/vm/overcommit_kbytes is nonzero, then CommitLimit is instead calculated as:
CommitLimit = overcommit_kbytes + total_swap
If this file is set to the value 0, the kernel's OOM-killer will kill some rogue process. Usually, the OOM-killer is able to kill a rogue process and the system will survive.
If this file is set to the value 1, then the kernel normally panics when out-of-memory happens. However, if a process limits allocations to certain nodes using memory policies (mbind(2) MPOL_BIND) or cpusets (cpuset(7)) and those nodes reach memory exhaustion status, one process may be killed by the OOM-killer. No panic occurs in this case: because other nodes' memory may be free, this means the system as a whole may not have reached an out-of-memory situation yet.
If this file is set to the value 2, the kernel always panics when an out-of-memory condition occurs.
The default value is 0. 1 and 2 are for failover of clustering. Select either according to your policy of failover.
If enabled in the kernel (CONFIG_TIMER_STATS), but not used, it has almost zero runtime overhead and a relatively small data-structure overhead. Even if collection is enabled at runtime, overhead is low: all the locking is per-CPU and lookup is hashed.
The /proc/timer_stats file is used both to control sampling facility and to read out the sampled information.
The timer_stats functionality is inactive on bootup. A sampling period can be started using the following command:
# echo 1 > /proc/timer_stats
The following command stops a sampling period:
# echo 0 > /proc/timer_stats
The statistics can be retrieved by:
$ cat /proc/timer_stats
While sampling is enabled, each readout from /proc/timer_stats will see newly updated statistics. Once sampling is disabled, the sampled information is kept until a new sample period is started. This allows multiple readouts.
Sample output from /proc/timer_stats:
$ cat /proc/timer_stats Timer Stats Version: v0.3 Sample period: 1.764 s Collection: active 255, 0 swapper/3 hrtimer_start_range_ns (tick_sched_timer) 71, 0 swapper/1 hrtimer_start_range_ns (tick_sched_timer) 58, 0 swapper/0 hrtimer_start_range_ns (tick_sched_timer) 4, 1694 gnome-shell mod_delayed_work_on (delayed_work_timer_fn) 17, 7 rcu_sched rcu_gp_kthread (process_timeout) ... 1, 4911 kworker/u16:0 mod_delayed_work_on (delayed_work_timer_fn) 1D, 2522 kworker/0:0 queue_delayed_work_on (delayed_work_timer_fn) 1029 total events, 583.333 events/sec
Linux version 1.0.9 (quinlan@phaze) #1 Sat May 14 01:51:54 EDT 1994