As
The Linux Programming Interface
went to press in August 2010, it was up to date with the then-current
versions of the Linux kernel (2.6.35) and
glibc (2.12).
Because the developers of both the Linux kernel and
glibc
are committed to maintaining
ABI
compatibility,
virtually all of the details provided in TLPI should
remain accurate in the future.
However, new features are added to the kernel and
glibc
with each release.
As each new release of the Linux kernel and
glibc occurs,
this page will attempt to note new interface features that are
relevant to the subject area of the book.
POSIX clocks and timers now support a new clock,
CLOCK_TAI,
that measures against International Atomic Time.
…
See also:
LWN articles on the kernel 3.10 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.10 summary.
Linux 3.9 (28 Apr 2013)
API changes include the following:
A new
SO_REUSEPORT
socket option allows multiple sockets to be bound to the same
UDP or TCP port.
The option improves performance in some
network server designs.
More information can be found in my LWN.net article
The SO_REUSEPORT socket option.
A new
/proc/sys/kernel/sched_rr_timeslice_ms
file can be used to view and set the
SCHED_RR
(realtime scheduling round-robin)
quantum as a millisecond value.
The default value is 100.
Writing 0 to this file resets the quantum to the default value.
See also:
LWN articles on the kernel 3.9 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.9 summary.
Linux 3.8 (18 Feb 2013)
API changes include the following:
The
ptrace()
system call supports a new flag,
PTRACE_O_EXITKILL.
If a tracing process sets this flag, a
SIGKILL
signal will be sent to every traced process
if the tracing process exits.
On systems that provide multiple huge page sizes,
shmget()
and
mmap()
can now select the desired page size for an allocation.
For more information, see my LWN.net article
Supporting variable-sized
huge pages.
The user namespaces implementation
has been completed,
allowing unprivileged processes to create and work with
user namespaces via
clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), and
setns().
See also my series of LWN.net articles on namespaces starting
here
(you can find a directory of the follow-up articles
in the series at the end of that article).
The
setns()
and
unshare()
system calls add support for PID, mount, and user namespaces.
A new
finit_module()
system call is added.
This system call is like
init_module(),
but loads the module from an open file descriptor.
In addition, the new system call has a
flags
argument that can be used to modify the behavior of the system call.
For more information, see this
LWN article
and the
init_module(2)
manual page.
The
ext4
and
tmpfs
file systems add support for the
lseek() SEEK_HOLE
and
SEEK_DATA
operations.
See also:
LWN articles on the kernel 3.8 merge window
(1,
2)
and the Kernel Newbies
kernel 3.8 summary.
Linux 3.7 (10 Dec 2012)
API changes include the following:
The server-side implementation of the TCP Fast Open feature was merged.
This complements the implementation of the client-side functionality
that was merged in 3.6.
To enable server-side (i.e., passive) TCP Fast Open,
a TCP server must use
setsockopt()
to set the
TCP_FASTOPEN
socket option.
For more information, see my LWN.net article,
TCP
Fast Open: expediting web services.
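A minimal sketch (not from the book) of the server side, assuming an IPv4 listener on the loopback address with a kernel-chosen port. The option value is the length of the queue of pending Fast Open requests; the helper name tfo_listen() and the queue length passed by the caller are illustrative.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN 23     /* Value from <linux/tcp.h>, for older headers */
#endif

/* Illustrative helper: create a listening TCP socket on 127.0.0.1 with
   server-side TCP Fast Open enabled.  qlen bounds the number of pending
   Fast Open requests.  Returns the listening socket, or -1 on error. */
static int tfo_listen(int qlen)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = 0;                          /* Kernel picks the port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen)) == -1 ||
        listen(fd, SOMAXCONN) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Whether the kernel actually accepts Fast Open requests on this socket also depends on the server bit (value 2) in /proc/sys/net/ipv4/tcp_fastopen.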
The
epoll_ctl()
system call adds a new operation,
EPOLL_CTL_DISABLE,
that allows multithreaded applications to safely
disable monitoring of a file descriptor.
The
Btrfs
file system adds support for hole punching
(the fallocate(2) FALLOC_FL_PUNCH_HOLE
operation added in Linux 2.6.38).
See also:
LWN articles on the kernel 3.7 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.7 summary.
Linux 3.6 (30 Sep 2012)
API changes include the following:
The client-side implementation of the TCP Fast Open feature
was merged. This implements a new flag,
MSG_FASTOPEN,
used with either
sendto()
or
sendmsg()
to initiate a TCP fast open.
The new /proc/sys/net/ipv4/tcp_fastopen
file can be set to enable or disable client (and server) TCP Fast Open
functionality.
For more information, see my LWN.net article,
TCP
Fast Open: expediting web services.
Some restrictions on the creation of hard and soft links were added,
in order to improve security.
For more information, see
this LWN.net article
and the documentation of
/proc/sys/fs/protected_hardlinks
and
/proc/sys/fs/protected_symlinks
in the
proc(5)
manual page.
The
fcntl()
system call adds support for a new command,
F_GETOWNER_UIDS,
that can be used to retrieve the real and effective user IDs
associated with a previous call to
F_SETOWNER.
(Those UIDs determine the rules for sending
a signal to another process for signal-driven I/O.)
The third argument of the call is of type
uid_t *, and should point
to a two-element array that stores the real user ID and
effective user ID.
This feature is intended for use by the checkpoint/restore
facility and is only provided if the kernel was configured with the
CONFIG_CHECKPOINT_RESTORE option.
See also:
LWN articles on the kernel 3.6 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.6 summary.
Linux 3.5 (21 Jul 2012)
API changes include the following:
The seccomp filters mechanism was added. This feature
is designed to allow security-conscious applications
to limit the set of system calls that they can make.
For further information, see
this LWN.net article
and the kernel source file
Documentation/prctl/seccomp_filter.txt.
Among other things, the change adds a new
PTRACE_O_TRACESECCOMP
flag to
ptrace(2).
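To make the mechanism concrete, here is a minimal sketch (not from the book) of a BPF filter that makes getppid() fail with EPERM while allowing every other system call. It assumes an x86-64 or AArch64 build (the architecture check uses AUDIT_ARCH_X86_64 or AUDIT_ARCH_AARCH64) and installs the filter in a child process so that the caller is unaffected. The function names are illustrative, and a real policy would normally be an allow list rather than a single denied call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only x86-64 and AArch64 are handled in this sketch */
#if defined(__x86_64__)
#define THIS_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define THIS_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "Add the AUDIT_ARCH_* value for this architecture"
#endif

/* Illustrative helper: install a filter that makes getppid() fail with
   EPERM and allows every other system call.  Returns 0 or -1. */
static int install_getppid_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if it is not using the expected ABI */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, THIS_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* getppid() -> fail with EPERM; anything else -> allow */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short) (sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Without CAP_SYS_ADMIN, no_new_privs must be set first */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Illustrative helper: apply the filter in a child process.  Returns 1 if
   getppid() failed with EPERM, 0 if it did not, -1 if setup failed. */
static int demo_seccomp_in_child(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (install_getppid_filter() == -1)
            _exit(2);
        long r = syscall(SYS_getppid);      /* Invoke the system call directly */
        _exit(r == -1 && errno == EPERM ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}
```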
A new
kcmp()
system call can be used to determine whether various kernel objects are
shared between tasks.
This is useful for the checkpoint-restore facility.
Some information can be found in
this LWN.net article
and in the
kcmp(2)
manual page.
Some additional
PR_SET_MM_* flags were added for use with the
prctl() PR_SET_MM
operation that was added in Linux 3.3.
A new
epoll EPOLLWAKEUP
flag prevents system suspend while
epoll
events are ready.
Use of this flag requires that the caller have the newly added
CAP_BLOCK_SUSPEND
capability.
The
tmpfs
file system adds support for hole punching
(the fallocate(2) FALLOC_FL_PUNCH_HOLE
operation added in Linux 2.6.38)
and the
lseek() SEEK_HOLE
and
SEEK_DATA
operations.
The new
prctl() PR_SET_NO_NEW_PRIVS
operation prevents
execve()
from granting privileges.
For example,
a process will not be able to execute a set-user-ID binary to
change its UID or GID if this flag is set.
The same is true for file capabilities.
A corresponding
PR_GET_NO_NEW_PRIVS
operation can be used to retrieve the state of this
attribute for the caller.
Documentation can be found in the
prctl(2)
manual page.
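A minimal sketch (not from the book) that sets the attribute and then reads it back with PR_GET_NO_NEW_PRIVS; set_no_new_privs() is an illustrative name.

```c
#include <sys/prctl.h>

/* Illustrative helper: set the calling thread's no_new_privs attribute
   and read it back.  The attribute is inherited across fork() and
   preserved across execve(), and it cannot be cleared once set.
   Returns the value reported by PR_GET_NO_NEW_PRIVS (1 once set),
   or -1 on error. */
static int set_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A launcher could call this just before executing an untrusted program: a set-user-ID binary executed afterward still runs, but without the change of UID. Setting no_new_privs is also what allows an unprivileged process to install a seccomp filter.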
The new
prctl() PR_GET_TID_ADDRESS
operation allows the caller to retrieve its
clear_child_tid
address
(see set_tid_address(2)).
Documentation can be found in the
prctl(2)
manual page.
If the kernel was configured with
CONFIG_CHECKPOINT_RESTORE,
then a new
/proc/PID/task/TID/children
file lists the children of the corresponding thread.
See also:
LWN articles on the kernel 3.5 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.5 summary.
Linux 3.4 (20 May 2012)
API changes include the following:
The
prctl() PR_SET_CHILD_SUBREAPER
operation allows
a "service manager" process to mark itself as a sort of
"sub-init", able to stay as the parent for all orphaned processes
created by the started services.
All
SIGCHLD
signals will be delivered to the service manager.
There is a corresponding
prctl() PR_GET_CHILD_SUBREAPER
operation.
Documentation can be found in the
prctl(2)
manual page.
Planned users of this feature include
D-Bus
and
systemd.
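A minimal sketch (not from the book) of the reparenting behavior: the caller marks itself as a subreaper, creates a child that forks a grandchild and exits, and the orphaned grandchild reports its new parent through a pipe. subreaper_demo() is an illustrative name, and the 1 ms polling loop is a simplification.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper: returns 1 if the orphaned grandchild reported the
   caller as its new parent, 0 if it reported another process, -1 on error. */
static int subreaper_demo(void)
{
    int pfd[2];
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1 || pipe(pfd) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (child == 0) {                       /* Child */
        pid_t me = getpid();
        if (fork() == 0) {                  /* Grandchild */
            while (getppid() == me)         /* Wait until the child exits */
                usleep(1000);
            pid_t newparent = getppid();
            if (write(pfd[1], &newparent, sizeof(newparent)) !=
                    (ssize_t) sizeof(newparent))
                _exit(1);
            _exit(0);
        }
        _exit(0);                           /* Orphan the grandchild */
    }

    close(pfd[1]);
    waitpid(child, NULL, 0);

    pid_t reported;
    ssize_t n = read(pfd[0], &reported, sizeof(reported));
    close(pfd[0]);

    /* As the subreaper, we must also reap the grandchild */
    while (waitpid(-1, NULL, 0) > 0)
        continue;

    if (n != (ssize_t) sizeof(reported))
        return -1;
    return reported == getpid();
}
```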
The
madvise() MADV_DONTDUMP
operation can be used to specify that an address range should
be excluded from core dumps.
The
MADV_DODUMP
operation
reverses the effect of
MADV_DONTDUMP.
Documentation can be found in the
madvise(2)
manual page.
The
setsockopt() SO_PEEK_OFF
option allows controlling the offset for
peeking at data queued in a socket.
(Currently supported for UNIX domain sockets only.)
Documentation can be found in the
socket(7)
manual page.
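A minimal sketch (not from the book) on a UNIX domain stream socket pair: once a peek offset has been set, each MSG_PEEK call continues from where the previous peek stopped instead of rereading the start of the queue. peek_twice() is an illustrative name, and the 3-byte chunk size is arbitrary.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative helper: queue "abcdef", enable peek offsets starting at
   byte 0, then peek twice, 3 bytes each time.  The two chunks are stored
   nul-terminated in first and second.  Returns 0 on success, -1 on error. */
static int peek_twice(char first[4], char second[4])
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int off = 0;                    /* Enable peek offsets at byte 0 */
    int ok = write(sv[0], "abcdef", 6) == 6 &&
             setsockopt(sv[1], SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off)) == 0 &&
             recv(sv[1], first, 3, MSG_PEEK) == 3 &&     /* Offset 0 -> 3 */
             recv(sv[1], second, 3, MSG_PEEK) == 3;      /* Offset 3 -> 6 */

    first[3] = second[3] = '\0';
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

After these calls, the data is still queued: a normal recv() would return all six bytes.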
A new
prctl() operation,
PR_SET_PTRACER,
is used with the Yama Linux Security Module to control
which processes can ptrace()
the calling process.
Documentation can be found in the
prctl(2)
manual page
and in the kernel source file
Documentation/security/Yama.txt.
See also:
LWN articles on the kernel 3.4 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.4 summary.
Linux 3.3 (18 Mar 2012)
API changes include the following:
A new
prctl() operation,
PR_SET_MM,
intended for use by the checkpoint/restart facility,
allows text, data, and heap sizes to be set
to the values in effect at checkpoint time
when a process is restored.
The caller must have the
CAP_SYS_RESOURCE
capability.
This operation is only supported if the kernel is configured with the
CONFIG_CHECKPOINT_RESTORE
option.
Documentation can be found in the
prctl(2) manual page.
Two changes related to the /proc
filesystem, as reported on
LWN.net:
A new
/proc/PID/map_files
directory contains symbolic links
describing the file mappings of the process identified by PID;
documentation can be found in the
proc(5) manual page.
New mount options for the
/proc
file system can be used to control the visibility of the
/proc/PID
directories.
See also:
LWN articles on the kernel 3.3 merge window
(1,
2)
and the Kernel Newbies
kernel 3.3 summary.
Linux 3.2 (4 Jan 2012)
API changes include the following:
The new
process_vm_readv()
and
process_vm_writev()
system calls provide a technique for fast message passing.
Some information can be found on LWN.net
here and
here
(describes an early version of the API),
and in the
process_vm_readv(2)
manual page.
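A minimal sketch (not from the book) that copies a buffer out of another process with a single call, without an intermediate pipe or shared memory. It assumes the caller already knows the remote address (for example, because the target process reported it) and has ptrace-style access to the target; read_other_process() is an illustrative name.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

/* Illustrative helper: copy len bytes starting at remote_addr in process
   pid into buf.  Returns the number of bytes read, or -1 on error. */
static ssize_t read_other_process(pid_t pid, const void *remote_addr,
                                  void *buf, size_t len)
{
    struct iovec local = { .iov_base = buf, .iov_len = len };
    struct iovec remote = { .iov_base = (void *) remote_addr, .iov_len = len };

    /* One local and one remote iovec; the flags argument must be 0 */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

Calling it with getpid() as the target is a convenient self-test.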
Files under
/proc/sys
are now pollable, meaning
that applications can use
poll(),
select(),
and
epoll
to check for changes to
sysctl
parameters.
A new
/proc/sys/kernel/cap_last_cap
file exposes the numerical value of the highest capability
supported by the running kernel;
this can be used to determine the highest bit
that may be set in a capability set.
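For example, a program that drops every capability from its bounding set can use this file to learn how many capabilities to iterate over. A minimal sketch (not from the book); cap_last_cap() is an illustrative name.

```c
#include <stdio.h>

/* Illustrative helper: return the highest capability number supported by
   the running kernel, or -1 if /proc/sys/kernel/cap_last_cap cannot be
   read (for example, on kernels older than 3.2). */
static int cap_last_cap(void)
{
    FILE *fp = fopen("/proc/sys/kernel/cap_last_cap", "r");
    if (fp == NULL)
        return -1;

    int cap;
    int n = fscanf(fp, "%d", &cap);
    fclose(fp);
    return n == 1 ? cap : -1;
}
```

A loop over capabilities 0 through cap_last_cap() inclusive then covers exactly the capabilities the kernel knows about.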
See also:
LWN articles on the kernel 3.2 merge window
(1,
2)
and the Kernel Newbies
kernel 3.2 summary.
Linux 3.1 (24 Oct 2011)
API changes include the following:
Three new operations are added for the
ptrace()
system call:
PTRACE_SEIZE,
PTRACE_INTERRUPT,
and
PTRACE_LISTEN.
Documentation can be found in the
ptrace(2)
manual page.
Some further information can be found
here.
Two new flags for the
lseek()
system call,
SEEK_HOLE
and
SEEK_DATA,
provide the ability to search for holes in sparsely allocated files.
Some further information can be found in
this LWN.net article
and the
lseek(2)
manual page.
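A minimal sketch (not from the book) that creates a sparse file with a single byte of data at offset 1 MiB and then asks where the data begins and where the following hole begins. It assumes /tmp is writable; probe_sparse_file() is an illustrative name. On file systems without hole detection, the kernel reports the whole file as data, so the data offset is 0 and the hole is the implicit one at end of file.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative helper: returns 0 and fills *data_start and *hole_start on
   success, or -1 on error. */
static int probe_sparse_file(off_t *data_start, off_t *hole_start)
{
    char path[] = "/tmp/sparseXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* File disappears when fd is closed */

    /* Writing past end of file leaves a hole over bytes 0 .. 1 MiB - 1 */
    int ok = pwrite(fd, "x", 1, 1024 * 1024) == 1;
    if (ok) {
        *data_start = lseek(fd, 0, SEEK_DATA);
        *hole_start = (*data_start == -1) ? -1
                                          : lseek(fd, *data_start, SEEK_HOLE);
        ok = *data_start != -1 && *hole_start != -1;
    }
    close(fd);
    return ok ? 0 : -1;
}
```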
A new
/proc/sys/kernel/shm_rmid_forced
file can be used to control the handling of System V
shared memory segments that have no attached process.
The default value in this file is 0,
which provides the traditional behavior:
unattached segments remain in existence and
can be reattached at a later point in time by another process.
If the value in
shm_rmid_forced
is 1, then the effect is as though an
IPC_RMID
operation is performed on all shared memory segments
that currently exist and that are created in the future.
This means that those segments will be destroyed as soon
as the last process detaches from them.
This can be useful to ensure that shared memory segments
are counted against the resource usage and limits
of at least one process,
but it is nonstandard and has the potential to break
applications that depend on the traditional behavior.
Further details can be found in the
proc(5)
manual page.
See also:
LWN articles on the kernel 3.1 merge window
(1,
2)
and the Kernel Newbies
kernel 3.1 summary.
Linux 3.0 (21 Jul 2011)
API changes include the following:
A new
setns()
system call allows its caller to join the namespace
specified by its two arguments: a file descriptor referring to one of the files in a
/proc/PID/ns
directory, and a namespace type
(one of a subset of the
CLONE_*
constants given to
clone(2)).
Some further information can be found
here,
and in the
setns(2)
manual page contributed by Eric Biederman.
A new
sendmmsg()
system call provides multiple message sending facilities
(the analog of the
recvmmsg(2)
system call added in Linux 2.6.33).
For more information, see the
sendmmsg(2)
manual page.
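A minimal sketch (not from the book) that sends several datagrams with one system call. It assumes a connected datagram socket, so no destination address is needed in each message header, and caps a batch at 8 messages; send_batch() is an illustrative name.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_BATCH 8             /* Arbitrary limit for this sketch */

/* Illustrative helper: send each of the n strings in msgs as a separate
   datagram on connected socket fd.  Returns the number of datagrams sent
   (at most MAX_BATCH), or -1 on error. */
static int send_batch(int fd, const char *msgs[], unsigned int n)
{
    struct mmsghdr hdrs[MAX_BATCH];
    struct iovec iov[MAX_BATCH];

    if (n > MAX_BATCH)
        n = MAX_BATCH;
    memset(hdrs, 0, sizeof(hdrs));
    for (unsigned int i = 0; i < n; i++) {
        iov[i].iov_base = (void *) msgs[i];
        iov[i].iov_len = strlen(msgs[i]);
        hdrs[i].msg_hdr.msg_iov = &iov[i];
        hdrs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, hdrs, n, 0);
}
```

On return, the msg_len field of each struct mmsghdr holds the number of bytes sent for that message. The receiver can collect a batch in the same way with recvmmsg().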
The
timerfd_settime()
system call adds a
TFD_TIMER_CANCEL_ON_SET
flag.
If this flag is set for a
CLOCK_REALTIME
absolute
(TFD_TIMER_ABSTIME)
timer, then the timer is canceled if the clock is reset:
a read() from the timer file descriptor fails with the error
ECANCELED.
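A minimal sketch (not from the book) that arms such a timer. The fallback definition covers C library headers that predate the flag; arm_cancel_on_set_timer() is an illustrative name.

```c
#define _GNU_SOURCE
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

#ifndef TFD_TIMER_CANCEL_ON_SET
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)    /* Value from the kernel headers */
#endif

/* Illustrative helper: arm an absolute CLOCK_REALTIME timer that expires
   secs seconds from now and is canceled if the realtime clock is changed
   discontinuously.  Returns the timer file descriptor, or -1 on error. */
static int arm_cancel_on_set_timer(time_t secs)
{
    int fd = timerfd_create(CLOCK_REALTIME, TFD_CLOEXEC);
    if (fd == -1)
        return -1;

    struct itimerspec its = {0};            /* One-shot: it_interval is 0 */
    if (clock_gettime(CLOCK_REALTIME, &its.it_value) == -1) {
        close(fd);
        return -1;
    }
    its.it_value.tv_sec += secs;            /* Absolute expiry time */

    if (timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
                        &its, NULL) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A program that schedules work by wall-clock time can read() this descriptor; when read() fails with ECANCELED, it can recompute its deadlines and rearm the timer.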
Two new POSIX clocks were added:
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM.
According to the commit message,
these clocks behave identically to
CLOCK_BOOTTIME
and
CLOCK_REALTIME,
respectively, but the
_ALARM
suffixed clocks will wake the system if it is suspended.
Some further details can be found
here.
A new
CAP_WAKE_ALARM capability
governs the use of the
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM
clocks.
The
/proc/sys/kernel/core_pattern
file adds a new specifier, %E.
This specifier is replaced by the pathname of the executable,
with slashes replaced by exclamation marks
(so that the basename of the resulting core dump filename
does not contain slashes).
Documentation can be found in the
core(5)
manual page.
See also:
LWN articles on the kernel 3.0 merge window
(1,
2)
and the Kernel Newbies
kernel 3.0 summary.
Linux 2.6.39 (18 May 2011)
API changes include the following:
New
name_to_handle_at() and
open_by_handle_at()
system calls were added.
These system calls provide functionality that is useful for
file-system servers that run in user space.
Some details can be found
here and
here.
A new
O_PATH
flag is added for
open(2).
Some details can be found
here.
O_PATH
descriptors can be obtained for symbolic links,
and can be passed via
SCM_RIGHTS
datagrams.
A new
AT_EMPTY_PATH
flag allows empty relative pathnames for
linkat(2),
fchownat(2),
fstatat(2),
and
name_to_handle_at(),
in which case the calls operate on
their directory file descriptor argument.
In addition, an empty pathname can now be supplied to
readlinkat(2),
to produce the same behavior for that call.
A new
clock_adjtime()
system call, analogous to
adjtimex(2),
permits adjustments to POSIX clocks.
A new
syncfs()
system call is similar to
sync(2),
but flushes only the file system containing the file
referred to by its file-descriptor argument.
Details can be found in the
syncfs(2)
manual page.
A new POSIX clock,
CLOCK_BOOTTIME,
is identical to
CLOCK_MONOTONIC,
but includes time that the system has been suspended.
This clock is intended for applications that want a
monotonically increasing clock and also want to be aware of
time the system has been suspended.
Some background
here.
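Because CLOCK_BOOTTIME advances during suspend while CLOCK_MONOTONIC does not, the difference between them approximates the total time spent suspended since boot. A minimal sketch (not from the book); suspended_ns() and ts_to_ns() are illustrative names.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Illustrative helper: convert a timespec to nanoseconds */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t) ts->tv_sec * 1000000000 + ts->tv_nsec;
}

/* Illustrative helper: return the time (in nanoseconds) the system has
   spent suspended since boot, or -1 on error.  CLOCK_MONOTONIC is read
   first, so the result also includes the short interval between the two
   calls and is never negative. */
static int64_t suspended_ns(void)
{
    struct timespec mono, boot;
    if (clock_gettime(CLOCK_MONOTONIC, &mono) == -1 ||
        clock_gettime(CLOCK_BOOTTIME, &boot) == -1)
        return -1;
    return ts_to_ns(&boot) - ts_to_ns(&mono);
}
```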
A thread operating under the
SCHED_IDLE policy
is now allowed to upgrade itself to the
SCHED_BATCH
or
SCHED_OTHER
policy if its nice value falls within the range permitted by its
RLIMIT_NICE
resource limit.
See also:
LWN articles on the kernel 2.6.39 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 2.6.39 summary.
Linux 2.6.38 (14 Mar 2011)
API changes include the following:
A new
AT_NO_AUTOMOUNT
flag for
fstatat(2)
can be used to suppress automounting of the terminal
component of the pathname argument.
Further information can be found in the
fstatat(2)
manual page.
A new
CAP_SYSLOG capability
is used (instead of
CAP_SYS_ADMIN)
to govern privileged
syslog(2)
operations.
Documentation can be found in the manual pages.
A new
FALLOC_FL_PUNCH_HOLE
operation was added for
fallocate(2).
This operation creates a hole (see page 83 of TLPI) in the file
in the byte range indicated by the
offset
and
len
arguments.
(The file data in the specified range is lost.)
File system support is required for the
FALLOC_FL_PUNCH_HOLE
operation.
Among the file systems that support
FALLOC_FL_PUNCH_HOLE
are XFS and
(since Linux 3.0) ext4.
Btrfs is capable of supporting the operation,
and support is likely to be added in the future.
As currently implemented,
FALLOC_FL_PUNCH_HOLE
must be specified with
FALLOC_FL_KEEP_SIZE,
which means that the size of a file can't change,
even if a hole is punched at the end of the file.
Further information can be found in the
fallocate(2)
manual page.
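A minimal sketch (not from the book) that writes 64 KiB of nonzero data to a temporary file and punches a 16 KiB hole starting at offset 16 KiB. Because FALLOC_FL_KEEP_SIZE accompanies FALLOC_FL_PUNCH_HOLE, the file size is unchanged and the punched range reads back as zeros. It assumes /tmp is writable; punch_demo() and the sizes are illustrative.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Illustrative helper: on success, returns 0, stores the file size after
   punching in *size_after and a byte from inside the hole in
   *byte_in_hole.  Returns -1 on error (including EOPNOTSUPP on file
   systems that do not support hole punching). */
static int punch_demo(off_t *size_after, unsigned char *byte_in_hole)
{
    char path[] = "/tmp/punchXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* File disappears when fd is closed */

    static unsigned char buf[64 * 1024];
    memset(buf, 'A', sizeof(buf));

    struct stat sb;
    int ok = write(fd, buf, sizeof(buf)) == (ssize_t) sizeof(buf) &&
             fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                       16 * 1024, 16 * 1024) == 0 &&
             pread(fd, byte_in_hole, 1, 20 * 1024) == 1 &&  /* Inside hole */
             fstat(fd, &sb) == 0;
    if (ok)
        *size_after = sb.st_size;
    close(fd);
    return ok ? 0 : -1;
}
```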
New
MADV_HUGEPAGE
and
MADV_NOHUGEPAGE
flags were added for
madvise(2).
These flags enable and disable an attribute on the memory region
that indicates that it is important that the region be backed by
huge pages, when this is possible.
Further information on this feature can be found
in the kernel source file
Documentation/vm/transhuge.txt
as well as
here,
here,
and in the
madvise(2)
manual page.
The new
/proc/sys/kernel/kptr_restrict
file can be used to prevent exposure of kernel pointers via
/proc
files and other interfaces.
(This affects how pointers are printed when using the new
%pK
specifier
for the kernel-internal
printf()
function.)
See the
proc(5)
man page for further details.
See also:
LWN articles on the kernel 2.6.38 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.38 summary.
Linux 2.6.37 (4 Jan 2011)
API changes include the following:
The permissions on /proc/PID/limits
changed from readable for the owner only to readable for all users
on the system.
The
fanotify_init()
and
fanotify_mark()
system calls were added.
These system calls are designed for use in virus-scanning tools,
but may also serve other more general uses.
They provide functionality that is in some ways similar to
inotify(7).
Note, however, that the
fanotify
interface is not a superset of
inotify.
(The existence of two APIs with heavily overlapping functionality,
rather than a new API that is a superset of the earlier API,
is unfortunate.)
These two system calls were added in Linux 2.6.36,
but disabled while concerns about the API were resolved.
In Linux 2.6.37, the system calls have been enabled.
As yet, there are no manual pages for
fanotify;
in the meantime, see the
Kernel Newbies 2.6.36 changes page
and
this LWN.net article
for further information.
See also:
LWN articles on the kernel 2.6.37 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.37 summary.
Linux 2.6.36 (20 Oct 2010)
API changes include the following:
The new
prlimit()
system call is an enhancement of
setrlimit()
and
getrlimit().
It allows the caller to both set and retrieve its own resource limits
(including retrieving the old limit at the same time
as a new limit is set), and (with suitable permissions)
perform the same task for other processes.
This system call does not suffer
this kernel bug,
which affects
getrlimit()/setrlimit().
(See pages 759 and 760 of the book.)
Indeed, starting with version 2.13,
glibc provides library implementations
for
setrlimit()
and
getrlimit()
that employ
prlimit()
to work around the kernel bug.
I've added documentation of this system call to the
getrlimit(2)
man page.
The
inotify
API adds a new flag,
IN_EXCL_UNLINK,
that prevents children of a watched directory
from generating events for a directory after they have been
unlinked from that directory.
I've added documentation of this flag to the
inotify(7)
man page.
The OOM killer was rewritten (again).
In the process, the
/proc/PID/oom_adj
file became obsolete, in favor of the new
/proc/PID/oom_score_adj
file.
For further information, see the
proc(5)
manual page.
See also:
LWN articles on the kernel 2.6.36 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.36 summary.
glibc API changes
glibc 2.18 (Not yet released)
API changes include the following:
…
glibc 2.17 (25 Dec 2012)
Note: the minimum Linux kernel version to run
with this glibc version is Linux 2.6.16.
API changes include the following:
A new
secure_getenv()
function allows secure access to the environment.
It is similar to
getenv(3),
but returns
NULL
if running in a set-user-ID/set-group-ID process.
Documentation can be found in the
secure_getenv(3)
manual page.
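A minimal sketch (not from the book) of the typical use: a program that may be installed set-user-ID reads an override from the environment, but falls back to a fixed default when secure_getenv() refuses the lookup. The variable MYAPP_CONFIG_DIR, the default path /etc/myapp, and the name config_dir() are hypothetical.

```c
#define _GNU_SOURCE
#include <stdlib.h>

/* Illustrative helper: MYAPP_CONFIG_DIR and /etc/myapp are hypothetical.
   In a set-user-ID/set-group-ID process, secure_getenv() returns NULL, so
   an unprivileged user cannot redirect the program to another directory. */
static const char *config_dir(void)
{
    const char *dir = secure_getenv("MYAPP_CONFIG_DIR");
    return dir != NULL ? dir : "/etc/myapp";
}
```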
glibc 2.16 (30 Jun 2012)
Note: this and subsequent glibc versions
are not expected to work with Linux kernels older than version 2.6.
API changes include the following:
The glibc header files now handle the
_ISOC11_SOURCE
feature test macro,
as a mechanism for exposing declarations conforming to the
C11
standard.
A new
getauxval(3)
function allows retrieval of auxiliary vector
(AT_*)
key-value pairs passed from the Linux kernel.
Further information can be found in my LWN.net article
"getauxval()
and the auxiliary vector"
and in the
getauxval(3)
manual page that I wrote.
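A minimal sketch (not from the book) that looks up one auxiliary-vector entry, the system page size (AT_PAGESZ). getauxval() returns 0 when a key is absent, so the sketch falls back to sysconf() in that case; page_size_from_auxv() is an illustrative name.

```c
#include <sys/auxv.h>
#include <unistd.h>

/* Illustrative helper: return the page size that the kernel passed in
   the auxiliary vector at program startup */
static long page_size_from_auxv(void)
{
    unsigned long ps = getauxval(AT_PAGESZ);    /* 0 if key not present */
    return ps != 0 ? (long) ps : sysconf(_SC_PAGESIZE);
}
```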
glibc 2.15 (tagged 25 Dec 2011)
API changes include the following:
A new
scandirat()
function, which is to
scandir()
as
openat(2)
is to
open().
Documentation can be found in the
scandirat(3)
manual page.
glibc 2.14 (tagged 31 May 2011)
No API changes (other than simple
wrappers for recently added Linux system calls).
glibc 2.13 (tagged 17 Jan 2011)
API changes include the following:
Newly added library implementations of
setrlimit()
and
getrlimit()
bypass the system calls of the same name, instead using the
prlimit()
system call to work around the bug described above
in the API changes for Linux 2.6.36.