As
The Linux Programming Interface
went to press in August 2010, it was up to date with the then current
versions of the Linux kernel (2.6.35),
glibc (2.12),
and the POSIX.1/Single UNIX Specification standard (POSIX.1-2008/SUSv4).
Because the developers of both the Linux kernel and
glibc
are committed to maintaining
ABI
compatibility,
virtually all of the details provided in TLPI should
remain accurate in the future.
However, (a few) new features are added to the kernel and
glibc
with each release.
As each new release of the Linux kernel and
glibc occurs,
this page will attempt to note new interface features that are
relevant to the subject area of the book.
In addition, this page provides links to information
about subsequent updates to the POSIX/SUS standard.
See also:
LWN
articles on the kernel 4.18 merge window
(1,
2)
and the Kernel Newbies
kernel 4.18 summary.
Linux 4.17 (3 June 2018)
API changes include the following:
…
See also:
LWN
articles on the kernel 4.17 merge window
(1,
2)
and the Kernel Newbies
kernel 4.17 summary.
Linux 4.16 (1 April 2018)
API changes include the following:
The PowerPC architecture now supports the memory-protection keys
feature that first appeared in Linux 4.9
(which provided support only on the Intel x86 architecture).
The
pwritev2(2)
system call now supports the
RWF_APPEND
flag, which allows data to be appended to a file on
a per-call basis.
For further details, see the
pwritev2(2)
manual page.
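As a rough illustration of the per-call append behavior, the following C sketch writes at the end of a file even though the file offset has been rewound. The names append_bytes() and rwf_append_demo() are hypothetical, the fallback definition of RWF_APPEND is an assumption for older C library headers, and the temporary file location is arbitrary.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_APPEND
#define RWF_APPEND 0x00000010   /* Value from <linux/fs.h>; older glibc lacks it */
#endif

/* Hypothetical helper: append 'len' bytes to the file, ignoring the
   current file offset. Returns bytes written, or -1 with errno set. */
static ssize_t append_bytes(int fd, const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *) buf, .iov_len = len };

    /* With RWF_APPEND the data always goes to the end of the file; an
       offset of -1 additionally asks for the file offset to be updated. */
    return pwritev2(fd, &iov, 1, -1, RWF_APPEND);
}

/* Self-check: returns 1 if the data landed at the end of the file, 0 if
   the kernel or filesystem rejected RWF_APPEND, or -1 on other errors. */
int rwf_append_demo(void)
{
    char path[] = "/tmp/rwf_append_XXXXXX";
    char buf[7] = "";
    int result = -1;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);

    /* Write "abc", then rewind so a plain write() would overwrite it */
    if (write(fd, "abc", 3) == 3 && lseek(fd, 0, SEEK_SET) == 0) {
        if (append_bytes(fd, "def", 3) == 3) {
            if (pread(fd, buf, 6, 0) == 6 && memcmp(buf, "abcdef", 6) == 0)
                result = 1;
        } else if (errno == ENOSYS || errno == EOPNOTSUPP || errno == EINVAL) {
            result = 0;
        }
    }
    close(fd);
    return result;
}
```

The file descriptor here was not opened with O_APPEND, which is the point of the flag: the append behavior is chosen for one call only.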
The
membarrier()
system call adds support for the following new commands:
MEMBARRIER_CMD_GLOBAL_EXPEDITED,
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED,
MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE,
and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE.
Details can be found in the
membarrier(2)
manual page.
See also:
LWN
articles on the kernel 4.16 merge window
(1,
2)
and the Kernel Newbies
kernel 4.16 summary.
Linux 4.15 (28 Jan 2018)
API changes include the following:
The limit on the number of lines that can be written to the
/proc/PID/uid_map
and
/proc/PID/gid_map
files has been increased from 5 to 340.
Details can be found in the
user_namespaces(7)
manual page.
A
cpu
cgroups controller is now available for cgroups version 2.
The
/sys/kernel/cgroup/delegate
file exports a list of the files that must be made writable
when doing delegation in the cgroups v2 hierarchy.
Details can be found in the
cgroups(7)
manual page.
The
/sys/kernel/cgroup/features
file exports a list of the features supported by cgroups v2.
Details can be found in the
cgroups(7)
manual page.
The
mmap()
system call supports two new flags,
MAP_SHARED_VALIDATE
and
MAP_SYNC.
Details can be found in the
mmap(2)
manual page.
See also:
LWN
articles on the kernel 4.15 merge window
(1,
2)
and the Kernel Newbies
kernel 4.15 summary.
Linux 4.14 (12 Nov 2017)
API changes include the following:
The new
memfd_create()
MFD_HUGETLB
flag allows the creation of anonymous files
in the RAM-based hugetlbfs filesystem.
For details, see the
memfd_create(2)
manual page.
The new
madvise()
MADV_WIPEONFORK
and
MADV_KEEPONFORK
operations allow a process to set or clear the
"wipe on fork" attribute
for the pages in a specified private anonymous address range.
If this attribute is set,
then the pages in this range are cleared in
a child process created by
fork().
For details, see the
madvise(2)
manual page.
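The effect of the "wipe on fork" attribute can be sketched in C as follows. The function name wipeonfork_demo() and the sample string are hypothetical, and the fallback definition of MADV_WIPEONFORK is an assumption for systems whose headers predate the flag.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18      /* Value from the kernel's mman-common.h */
#endif

/* Returns 1 if the child saw a zero-filled page while the parent kept its
   data, 0 if the kernel rejected MADV_WIPEONFORK, or -1 on other errors. */
int wipeonfork_demo(void)
{
    size_t len = (size_t) sysconf(_SC_PAGESIZE);
    int status, result = -1;
    pid_t child;
    char *secret = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (secret == MAP_FAILED)
        return -1;
    strcpy(secret, "key material");

    if (madvise(secret, len, MADV_WIPEONFORK) == -1) {
        result = (errno == EINVAL) ? 0 : -1;    /* EINVAL: flag not supported */
    } else {
        child = fork();
        if (child == 0)
            _exit(secret[0] == '\0' ? 0 : 1);   /* Child: page reads as zeros */
        if (child > 0 && waitpid(child, &status, 0) == child &&
                WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
                strcmp(secret, "key material") == 0)    /* Parent copy intact */
            result = 1;
    }
    munmap(secret, len);
    return result;
}
```

A typical use would be keeping secrets or random-number-generator state out of child processes without having to clear them by hand after every fork().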
There are multiple additions to the seccomp facility,
all of which are documented in the
seccomp(2)
manual page:
The kernel now provides the ability to log the
actions returned by seccomp filters to the audit log.
All actions other than
SECCOMP_RET_ALLOW
can be logged.
The new
/proc/sys/kernel/seccomp/actions_logged
can be used to limit the set of actions that are logged
to the audit log.
The new
seccomp()
SECCOMP_FILTER_FLAG_LOG
flag allows a BPF filter to request that all return
actions (except
SECCOMP_RET_ALLOW)
are logged to the audit log.
The new
SECCOMP_RET_LOG
filter return action permits the system call (like
SECCOMP_RET_ALLOW),
but logs the action to the audit log.
The new
SECCOMP_RET_KILL_PROCESS
filter return action causes the kernel to terminate
all of the threads in a multithreaded process.
This contrasts with the preexisting
SECCOMP_RET_KILL_THREAD
filter return action, which terminates only the thread
that made the system call.
To clearly distinguish the new
SECCOMP_RET_KILL_PROCESS
filter return action from the older
SECCOMP_RET_KILL
action, the name
SECCOMP_RET_KILL_THREAD
has been added as a synonym for
SECCOMP_RET_KILL.
The default treatment for an unrecognized filter
action return value changes from
SECCOMP_RET_KILL_THREAD
to
SECCOMP_RET_KILL_PROCESS.
The new
/proc/sys/kernel/seccomp/actions_avail
file shows a list of the seccomp filter actions
that are supported by the kernel.
The new
seccomp()
SECCOMP_GET_ACTION_AVAIL
operation allows a program to ask the kernel whether it
supports a specified filter return action.
A range of new features appear in the cgroups version 2
implementation, all of which are documented in the
cgroups(7)
manual page:
Support is added for the so-called "thread mode",
whereby some restrictions that hitherto existed in
cgroups v2 are relaxed.
The implementation now allows for the creation of
"threaded subtrees", within which the threads of
a multithreaded process may be spread across
different cgroups. Within a threaded subtree,
the "no internal processes" rule is relaxed,
so that a cgroup inside a threaded subtree can both
have member processes and exercise control over
child cgroups.
Only so-called threaded controllers (currently,
cpu,
perf_event,
and
pids)
can be employed within the cgroups
of a threaded subtree.
A new
cgroup.type
file, which appears in each nonroot cgroup and
which was added to support the "thread mode" concept,
can be used to view and change the
"type" of a thread group.
A new
cgroup.threads
file is used with "thread mode" to view the threads that
are members of a cgroup and to move threads to new cgroups.
Two new files that appear in each cgroup,
cgroup.max.depth
and
cgroup.max.descendants,
can be used to limit the depth of a cgroup subtree and
the number of descendant cgroups in the subtree.
A new
cgroup.stat
file exports information about the number of
cgroups under a cgroup subtree.
Version 3 file capabilities were added, in order to allow
the implementation of namespaced file capabilities.
Namespaced file capabilities are a mechanism that
allows a process that has capabilities inside a
noninitial user namespace (but which has no
capabilities in the initial user namespace) to
attach capabilities to an executable file in a way that
means those capabilities will be conferred to a process that
executes the file only if the process resides inside
that user namespace.
Further information can be found in the
capabilities(7)
manual page.
The
membarrier()
system call adds an expedited option
(the
MEMBARRIER_CMD_PRIVATE_EXPEDITED
command).
For further details, see the
membarrier(2)
manual page
and Jonathan Corbet's LWN.net article
Expediting membarrier().
The
preadv2(2)
system call adds support for a new flag,
RWF_NOWAIT,
which can be used to avoid blocking for data that is
not immediately available.
For further details, see the
preadv2(2)
manual page.
See also:
LWN
articles on the kernel 4.14 merge window
(1,
2)
and the Kernel Newbies
kernel 4.14 summary.
Linux 4.13 (3 Sep 2017)
API changes include the following:
The new
kcmp()
KCMP_EPOLL_TFD
request can be used to discover whether a specified file descriptor
is present in an epoll instance.
Further details can be found in the
kcmp(2)
manual page.
A set of new
fcntl()
requests
(F_GET_RW_HINT,
F_SET_RW_HINT,
F_GET_FILE_RW_HINT,
F_SET_FILE_RW_HINT)
can be used to get and set file read/write hints
that are associated with open file descriptions or inodes.
Details can be found in the
fcntl(2)
manual page, in the section
"File read/write hints".
Given a file descriptor that refers to a pseudoterminal master,
the new
TIOCGPTPEER
ioctl()
operation opens and returns a new file descriptor that
refers to the peer pseudoterminal slave device.
This operation can be performed regardless of whether
the pathname of the slave device is accessible through the
calling process's mount namespace.
Details can be found in the
ioctl_tty(2)
manual page.
A new cgroups v2 mount option
nsdelegate
causes cgroup namespaces to automatically become delegation
boundaries.
Details can be found in the
cgroups(7)
manual page.
See also:
LWN
articles on the kernel 4.13 merge window
(1,
2)
and the Kernel Newbies
kernel 4.13 summary.
Linux 4.12 (2 Jul 2017)
API changes include the following:
The new
/proc/PID/ns/pid_for_children
file provides a handle that shows which PID namespace the children
of a process will be created in.
For details, see the
namespaces(7)
and
pid_namespaces(7)
manual pages.
The new
ioctl()
FS_IOC_GETFSMAP
operation retrieves physical extent mappings for a filesystem.
For details, see the
ioctl_getfsmap(2)
manual page.
See also:
LWN
articles on the kernel 4.12 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.12 summary.
Linux 4.11 (30 April 2017)
API changes include the following:
A new
statx()
system call has been added.
This system call provides a range of extensions to
the functionality of the older
stat()
system call.
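One of the extensions is that the caller states which fields it wants and the kernel reports which ones it actually supplied, including the file birth time where the filesystem records it. The following C sketch shows that pattern; the helper name size_and_birth_time() and its return convention are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Hypothetical helper: fetch the size and (if the filesystem records it)
   the birth time of 'path'. Returns 1 if *btime_sec was filled in, 0 if
   only *size was, or -1 with errno set on failure. */
int size_and_birth_time(const char *path, unsigned long long *size,
                        long long *btime_sec)
{
    struct statx stx;

    /* Request only the fields needed; stx_mask reports what was supplied */
    if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW,
              STATX_SIZE | STATX_BTIME, &stx) == -1)
        return -1;

    if (!(stx.stx_mask & STATX_SIZE)) {     /* Size was not supplied */
        errno = ENOTSUP;
        return -1;
    }
    *size = stx.stx_size;

    if (!(stx.stx_mask & STATX_BTIME))      /* No birth time on this filesystem */
        return 0;
    *btime_sec = stx.stx_btime.tv_sec;
    return 1;
}
```

This sketch assumes a C library that provides the statx() wrapper; on older systems the call would have to be made via syscall(2).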
Various enhancements have been made to the
userfaultfd mechanism that was added in Linux 4.3.
Details can be found in the
userfaultfd(2)
and
ioctl_userfaultfd(2)
manual pages.
Two new namespace
ioctl()
operations make it possible to
discover details of the namespace set-up on the system:
NS_GET_NSTYPE
can be used to discover the type of namespace referred to
by a file descriptor, and
NS_GET_OWNER_UID
can be used to discover the user ID of the owner of
a user namespace that is referred to by a file descriptor.
Details can be found in the
ioctl_ns(2)
manual page.
A new RDMA cgroups resource controller has been added
(for both version 1 and version 2 cgroups).
(RDMA stands for remote direct memory access,
a technique to copy data directly from the memory of
one computer to the memory of another computer.
RDMA can be used to implement zero-copy networking;
that is, no kernel-user-space buffer copying.)
See also:
LWN
articles on the kernel 4.11 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.11 summary.
Linux 4.10 (19 Feb 2017)
API changes include the following:
It is now possible to attach a BPF filter to a cgroup in order to
perform network filtering for all processes within the cgroup.
For further information, see Jonathan Corbet's LWN.net article,
Network filtering
for control groups.
Support for POSIX timers is now configurable.
Support is enabled by default, but can be disabled via the
CONFIG_POSIX_TIMERS option.
A process's "No new privileges" setting, set via the
prctl()
PR_SET_NO_NEW_PRIVS
operation added in Linux 3.5,
is now exposed in the
/proc/PID/status
file.
See also:
LWN
articles on the kernel 4.10 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.10 summary.
Linux 4.9 (11 Dec 2016)
API changes include the following:
The memory protection keys interface has been added.
Further details can be found in the following manual pages:
pkeys(7),
mprotect(2),
and
pkey_alloc(2).
See also the LWN.net articles
(1,
2)
by Jon Corbet.
Two new ioctl(2) operations,
NS_GET_USERNS
and
NS_GET_PARENT,
can be used to discover the relationships
between non-user namespaces and their associated user namespaces
and to find the parents of PID and user namespaces.
Details can be found in the
ioctl_ns(2)
manual page and in my blog post
Introspecting namespace relationships.
A set of files added in the
/proc/sys/user
directory can be used to view and modify limits
on the number of namespaces of each type that
can be created by each user inside a user namespace.
Details can be found in the
namespaces(7)
manual page.
The list of locks shown in
/proc/locks
is now filtered to show just the locks for the processes in
the PID namespace for which the
/proc
filesystem was mounted.
See also:
LWN
articles on the kernel 4.9 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.9 summary.
Linux 4.8 (2 Oct 2016)
API changes include the following:
A new
pids.events
interface file for the
pids
cgroup controller allows notification of events for this cgroup.
This is a key-value file that currently supports one key, named
max,
which shows the number of times that
fork()
failed because the
pids.max
limit for this cgroup was encountered.
This file can be monitored with
inotify(7)
(changes produce
IN_MODIFY
events)
and
poll()
(changes produce
POLLPRI
readiness notifications).
See also:
LWN
articles on the kernel 4.8 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.8 summary.
Linux 4.7 (24 July 2016)
API changes include the following:
The
sigaltstack(2)
system call adds a flag,
SS_AUTODISARM,
that disables the alternate signal stack while the signal handler
is running.
This allows the application to safely call
swapcontext(3)
from within the signal handler without
corrupting the stack when subsequent signals are delivered.
Details can be found in the
sigaltstack(2)
manual page.
The
waitid(2)
system call adds support for the
__WCLONE,
__WALL,
and
__WNOTHREAD
flags.
A new Umask field in the
/proc/PID/status
file can be used to inspect a process's umask.
Details can be found in the
umask(2)
manual page.
The
preadv2(2)
and
pwritev2(2)
system calls add support for two new flags,
RWF_SYNC
and
RWF_DSYNC,
although the flags are meaningful only for
pwritev2(2).
These flags provide the per-I/O equivalent of the
O_SYNC
and
O_DSYNC
file status flags (described in the
open(2)
manual page).
For further details, see the
pwritev2(2)
manual page.
See also:
LWN
articles on the kernel 4.7 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.7 summary.
Linux 4.6 (15 May 2016)
API changes include the following:
A new
clone()
flag,
CLONE_NEWCGROUP,
can be used to create a new process in a new control-group
namespace.
Further details can be found in the
commit message
for the patch that added this feature, as well as the
cgroup_namespaces(7),
clone(2),
unshare(2), and
setns(2)
manual pages.
Two new system calls,
preadv2()
and
pwritev2(),
are like
preadv()
and
pwritev(),
but add a flags argument.
For further information, see the
preadv2(2)
manual page, and Jon Corbet's LWN.net article,
The return of preadv2()/pwritev2().
See also:
LWN
articles on the kernel 4.6 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.6 summary.
Linux 4.5 (14 Mar 2016)
API changes include the following:
A new
copy_file_range(2)
system call has been added, permitting fast in-kernel copying
of data between two files without the need to shift data
through user-space buffers.
Details can be found in the
copy_file_range(2)
manual page.
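A minimal C sketch of an in-kernel copy between two files follows. The names copy_range() and copy_file_range_demo() are hypothetical, the temporary file names are arbitrary, and the set of errors treated as "not supported here" is an assumption that may need adjusting for a particular kernel or filesystem.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy up to 'len' bytes from fd_in to fd_out without
   passing the data through a user-space buffer. Returns the number of
   bytes copied, or -1 with errno set. */
static ssize_t copy_range(int fd_in, int fd_out, size_t len)
{
    size_t total = 0;

    while (total < len) {
        /* NULL offsets: use and advance each descriptor's file offset */
        ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL, len - total, 0);

        if (n == -1)
            return -1;
        if (n == 0)             /* End of input file */
            break;
        total += (size_t) n;
    }
    return (ssize_t) total;
}

/* Self-check: returns 1 if the copy was verified, 0 if copy_file_range()
   is unavailable for these files, or -1 on other errors. */
int copy_file_range_demo(void)
{
    char src[] = "/tmp/cfr_src_XXXXXX";
    char dst[] = "/tmp/cfr_dst_XXXXXX";
    char buf[6] = "";
    int result = -1;
    int in = mkstemp(src);
    int out = mkstemp(dst);

    if (in != -1 && out != -1 &&
            write(in, "hello", 5) == 5 && lseek(in, 0, SEEK_SET) == 0) {
        if (copy_range(in, out, 5) == 5) {
            if (pread(out, buf, 5, 0) == 5 && memcmp(buf, "hello", 5) == 0)
                result = 1;
        } else if (errno == ENOSYS || errno == EXDEV ||
                   errno == EINVAL || errno == EOPNOTSUPP) {
            result = 0;
        }
    }
    if (in != -1) { close(in); unlink(src); }
    if (out != -1) { close(out); unlink(dst); }
    return result;
}
```

The loop matters because, like read() and write(), a single call may transfer fewer bytes than requested.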
A new flag for the
madvise()
system call,
MADV_FREE,
allows a process to advise the kernel that
it no longer needs the pages in a specified address range.
The kernel is then at liberty to (destructively)
free these pages for reuse.
Further details can be found in the
madvise(2)
manual page.
A new event flag for use with the
epoll_ctl()
system call,
EPOLLEXCLUSIVE,
can be used in some circumstances
to avoid thundering herd problems
when multiple processes are monitoring the same file.
Further details can be found in the
epoll_ctl(2)
manual page.
The unified-hierarchy ("version 2") control-group interface,
which has been in development since Linux 3.16
but was hitherto marked as experimental,
is now considered to be officially released.
However, not all controllers support the new interface yet.
Information about the new interface can be found in
the kernel source file
Documentation/cgroup-v2.txt
and in the
cgroups(7)
manual page.
Mandatory file locking is now an optional feature, governed
by a kernel configuration option
(CONFIG_MANDATORY_FILE_LOCKING).
This is the first step toward eventually removing a feature
that is buggy and believed to be little or completely unused.
See also:
LWN
articles on the kernel 4.5 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.5 summary.
Linux 4.4 (10 Jan 2016)
API changes include the following:
A new
mlock2()
system call has been added, and a related
MCL_ONFAULT
flag has been added for the
mlockall()
system call.
Details can be found in the
mlock(2) manual page.
The new
ptrace()
PTRACE_SECCOMP_GET_FILTER
operation can be used to dump a process's seccomp filters.
Details can be found in the
ptrace(2)
manual page.
See also:
LWN
articles on the kernel 4.4 merge window
(1,
2)
and the Kernel Newbies
kernel 4.4 summary.
Linux 4.3 (1 Nov 2015)
API changes include the following:
A new membarrier()
system call has been added.
Information about this system call can be found in the
membarrier(2)
manual page and in the
commit message.
The motivation for adding this system call,
which has been under development for several years,
is discussed in Jon Corbet's LWN.net article,
sys_membarrier().
A new userfaultfd()
system call and some associated
ioctl()
operations have been added.
Further information can be found in Jon Corbet's LWN.net article,
Page faults
in user space
and in the
userfaultfd(2)
and
ioctl_userfaultfd(2)
manual pages.
The ambient capabilities feature has been merged.
Details can be found in the
capabilities(7) manual page
and the description of
PR_CAP_AMBIENT
in the
prctl(2) manual page.
Direct system calls are now provided for the sockets API on x86-32,
rather than multiplexing via the
socketcall(2)
system call
(which continues to be provided for backward compatibility).
This change facilitates
seccomp(2) filtering
of sockets system calls.
(In order to employ such filters, the filtered program
must have been compiled so as to employ the new system calls.)
A new
pids
cgroups controller can be used to limit the number of tasks
in a cgroup.
See also:
LWN
articles on the kernel 4.3 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.3 summary.
Linux 4.2 (30 Aug 2015)
API changes include the following:
The
splice()
system call now supports UNIX domain stream sockets.
The
ext4
and
f2fs
filesystems now support the
fallocate()
FALLOC_FL_INSERT_RANGE
operation (which first appeared in Linux 4.1).
The limit of 8 recursions while resolving a pathname
containing symbolic links has been lifted.
The only limit now imposed is the maximum of 40 dereferences
while resolving the entire pathname.
Further information can be found in the
path_resolution(7)
manual page.
See also:
LWN
articles on the kernel 4.2 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.2 summary.
Linux 4.1 (21 June 2015)
API changes include the following:
The
fallocate()
system call adds the
FALLOC_FL_INSERT_RANGE
command for inserting a hole into the middle of a file.
(The bytes past the point of insertion are shifted in order to
make room for the hole.)
In the initial implementation, this operation is supported
only by the XFS filesystem.
Details can be found in the
fallocate(2)
manual page.
The
/proc/PID/status
file adds four new fields:
NStgid,
NSpid,
NSpgid,
and
NSsid.
These fields show respectively the
process ID, (kernel) thread ID, process group ID, and session ID
in each of the PID namespaces of which the process is a member.
The leftmost entry shows the value with respect to the PID namespace
of the reading process,
followed by the value in successively nested inner namespaces.
Details can be found in the
proc(5)
manual page.
The XFS filesystem adds support for the
renameat2()
RENAME_WHITEOUT
flag.
See also:
LWN
articles on the kernel 4.1 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.1 summary.
Linux 4.0 (12 April 2015)
API changes include the following:
The
mount(2)
system call adds a new
MS_LAZYTIME
option that
minimizes the number of updates to
file timestamps in the on-disk i-node.
This can provide greatly improved performance in some circumstances.
Further details can be found in the
mount(2)
manual page.
The implementation of the
remap_file_pages(2)
system call, which had already been deprecated in Linux 3.16,
has been replaced by a slower in-kernel implementation.
For information on why this change was made,
see the LWN.net article,
The possible demise
of remap_file_pages().
See also:
LWN
articles on the kernel 4.0 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 4.0 summary.
Linux 3.19 (9 Feb 2015)
API changes include the following:
A new
execveat()
system call has been added.
This system call is to
execve()
what
openat()
is to
open().
The primary motivation for adding this system call is
to allow an implementation of the
fexecve()
library function that does not rely on the
/proc
filesystem being mounted.
Further information can be found in the
execveat(2)
manual page.
The default values for the System V semaphore limits,
SEMMSL,
SEMMNI,
and
SEMOPM,
have been increased.
Details can be found in the
semget(2)
and
semop(2)
manual pages.
A new /proc/PID/setgroups
file has been added, and the behavior of the
setgroups(2)
system call has been changed in order to close a security loophole
concerning the interaction of
setgroups(2)
and user namespaces.
The background story can be read in Jon Corbet's LWN.net article,
User namespaces and setgroups(),
and the full details on
/proc/PID/setgroups
and why it was needed can be found in the
user_namespaces(7)
manual page.
See also:
LWN
articles on the kernel 3.19 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.19 summary.
Linux 3.18 (7 Dec 2014)
API changes include the following:
The
renameat2()
system call adds a new flag,
RENAME_WHITEOUT,
that is used to support "whiteouts" when renaming
files on overlay/union filesystems.
Details can be found in the
renameat2(2)
manual page.
This operation requires filesystem support,
which is provided by the
ext4
and
shmem
filesystems in the initial implementation.
See also:
LWN
articles on the kernel 3.18 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.18 summary.
Linux 3.17 (5 October 2014)
API changes include the following:
A new getrandom()
system call has been added.
This system call returns randomness from the entropy pool.
Some details can be found in Jake Edge's LWN.net article,
A system call
for random numbers: getrandom()
and in the
getrandom(2)
manual page.
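A short C sketch of filling a buffer from this system call is shown below. The helper name fill_random() is hypothetical; the loop is there because a call may be interrupted by a signal handler or return fewer bytes than requested.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill 'buf' with 'len' random bytes.
   Returns 0 on success, or -1 with errno set. */
int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        /* flags == 0: block until the entropy pool has been initialized */
        ssize_t n = getrandom(p, len, 0);

        if (n == -1) {
            if (errno == EINTR)     /* Interrupted by a signal: retry */
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```

Compared with reading /dev/urandom, this approach needs no file descriptor and works even where the device node is not available (for example, inside a chroot).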
A new seccomp()
system call has been added,
for controlling seccomp filters.
For further information, see the
seccomp(2)
manual page.
A new file-sealing API is implemented for files
inside shared memory filesystems;
this API consists of two
new fcntl() operations,
F_GET_SEALS
and
F_ADD_SEALS,
which get and add seals to a file
(the current seals are
F_SEAL_SHRINK,
F_SEAL_GROW,
F_SEAL_WRITE,
and
F_SEAL_SEAL).
In addition, a new memfd_create()
system call has been added.
This system call can be used to
create anonymous shared memory mappings referred
to via a file descriptor; that file descriptor can be used
with the file-sealing API.
More information can be found in Jon Corbet's LWN.net article
that discusses an earlier version of this API,
Sealed files
and in the
memfd_create(2)
and
fcntl(2)
manual pages.
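How the two pieces fit together can be sketched in C as follows: create an anonymous file, fill it, then seal it so that no one (including the creator) can modify it afterward. The helper name make_sealed_memfd() is hypothetical. Note the MFD_ALLOW_SEALING flag, which memfd_create(2) requires if seals are to be added later.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous file holding 'len' bytes of
   'data', then seal it against any further change. Returns a file
   descriptor, or -1 with errno set. */
int make_sealed_memfd(const char *name, const void *data, size_t len)
{
    int saved;
    int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);

    if (fd == -1)
        return -1;

    /* Fill the file first; once F_SEAL_WRITE is set, writes fail.
       F_SEAL_SEAL then prevents any further changes to the seal set. */
    if (write(fd, data, len) != (ssize_t) len ||
            fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW |
                                   F_SEAL_WRITE | F_SEAL_SEAL) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A process receiving such a descriptor (say, over a UNIX domain socket) could call fcntl(fd, F_GET_SEALS) to confirm that the contents can no longer change before mapping them.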
A new kexec_file_load()
system call has been added.
This provides the ability to load a kexec kernel
and initrd filesystem
specified as file descriptors.
For details, see the
kexec_load(2)
manual page.
A new
/proc/thread-self
directory is added in the
/proc
filesystem.
This directory is the threads analog of
/proc/self;
in other words, it is a synonym for
/proc/self/task/TID,
where
TID
is the thread ID of the calling thread.
See also:
LWN
articles on the kernel 3.17 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.17 summary.
Linux 3.16 (3 Aug 2014)
API changes include the following:
A new capability,
CAP_AUDIT_READ,
allows reading the audit log via a multicast netlink socket.
Btrfs now supports the
open()
O_TMPFILE
flag that was added in Linux 3.11.
The default values for the System V shared memory limits,
SHMALL
and
SHMMAX,
have been increased.
Details can be found in the
shmget(2)
manual page.
A new
/proc/sys/kernel/sysctl_strict_writes
file determines the behavior when an application
tries to write into a
/proc/sys
file at a nonzero offset.
For details, see the
proc(5)
manual page.
See also:
LWN
articles on the kernel 3.16 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.16 summary.
Linux 3.15 (8 Jun 2014)
API changes include the following:
A new
renameat2()
system call has been added.
This system call extends the existing
renameat()
system call to allow two filenames to be swapped in an
atomic operation.
Further information can be found in Jon Corbet's LWN.net article,
Exchanging two files,
and also in my LWN.net article,
Flags as a system call API design pattern.
Documentation can be found in the
rename(2)
manual page.
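The atomic swap can be sketched in C as follows, assuming a C library that provides the renameat2() wrapper and the RENAME_EXCHANGE constant. The names swap_paths() and swap_demo() are hypothetical, and the temporary file names are arbitrary.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: atomically exchange the files named by 'a' and 'b'.
   Both pathnames must already exist. Returns 0, or -1 with errno set. */
int swap_paths(const char *a, const char *b)
{
    return renameat2(AT_FDCWD, a, AT_FDCWD, b, RENAME_EXCHANGE);
}

/* Self-check: returns 1 if the swap was verified, 0 if the kernel or
   filesystem does not support the operation, or -1 on other errors. */
int swap_demo(void)
{
    char a[] = "/tmp/swap_a_XXXXXX";
    char b[] = "/tmp/swap_b_XXXXXX";
    char first = '\0';
    int result = -1;
    int fa = mkstemp(a);
    int fb = mkstemp(b);

    if (fa != -1 && fb != -1 &&
            write(fa, "A", 1) == 1 && write(fb, "B", 1) == 1) {
        if (swap_paths(a, b) == 0) {
            int fd = open(a, O_RDONLY);     /* 'a' should now name B's data */

            if (fd != -1) {
                if (read(fd, &first, 1) == 1 && first == 'B')
                    result = 1;
                close(fd);
            }
        } else if (errno == ENOSYS || errno == EINVAL) {
            result = 0;
        }
    }
    if (fa != -1) { close(fa); unlink(a); }
    if (fb != -1) { close(fb); unlink(b); }
    return result;
}
```

At no point during the exchange is either pathname missing, which is what distinguishes this from a sequence of three rename() calls through a temporary name.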
Open file description (OFD) locks
(formerly known
as "file private locks") have been added.
This feature improves on some significant deficiencies
in the traditional byte-range locking API.
(That API is described in
Chapter 55 of TLPI, and the limitations are described in
Section 55.3.5.)
Further information on OFD locks can be found in
Jeffrey T. Layton's LWN.net article,
File-private POSIX locks,
and in the
fcntl(2)
manual page.
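One visible difference from traditional record locks is that OFD locks placed via different open file descriptions conflict with each other even inside a single process. The following C sketch demonstrates this; the names ofd_try_write_lock() and ofd_lock_demo() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: try to place an OFD write lock on the whole file.
   Returns 0 on success, or -1 with errno set (EAGAIN or EACCES if a
   conflicting lock is held). */
int ofd_try_write_lock(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,         /* 0 means "to end of file" */
        .l_pid = 0,         /* Must be zero for OFD lock commands */
    };

    return fcntl(fd, F_OFD_SETLK, &fl);
}

/* Self-check: returns 1 if a second open of the same file was refused the
   lock, 0 if F_OFD_SETLK is not supported, or -1 on other errors. */
int ofd_lock_demo(void)
{
    char path[] = "/tmp/ofd_lock_XXXXXX";
    int result = -1;
    int fd1 = mkstemp(path);
    int fd2 = (fd1 == -1) ? -1 : open(path, O_RDWR);

    if (fd1 != -1 && fd2 != -1) {
        if (ofd_try_write_lock(fd1) == -1) {
            result = (errno == EINVAL) ? 0 : -1;
        } else if (ofd_try_write_lock(fd2) == -1 &&
                   (errno == EAGAIN || errno == EACCES)) {
            result = 1;     /* Conflict detected within one process */
        }
    }
    if (fd2 != -1)
        close(fd2);
    if (fd1 != -1) { close(fd1); unlink(path); }
    return result;
}
```

With the traditional F_SETLK command, the second request would have succeeded, since those locks are owned by the process rather than by the open file description.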
The
ext4
and
XFS
filesystems implement two new flags for the
fallocate()
system call:
FALLOC_FL_ZERO_RANGE
and
FALLOC_FL_COLLAPSE_RANGE.
Further information on the
FALLOC_FL_COLLAPSE_RANGE
flag can be found in Jon Corbet's LWN.net article
Finding the proper scope of a file collapse operation.
Both new flags are documented in the
fallocate(2)
manual page.
Two new
prctl()
operations,
PR_SET_THP_DISABLE
and
PR_GET_THP_DISABLE,
set and get the value of the calling process's
"THP disable" flag.
Details can be found in the
prctl(2)
manual page.
XFS now supports the
open()
O_TMPFILE
flag that was added in Linux 3.11.
timerfd_create()
adds support for the
CLOCK_BOOTTIME
clock.
Details can be found in the
timerfd_create(2)
manual page.
See also:
LWN
articles on the kernel 3.15 merge window
(1,
2)
and the Kernel Newbies
kernel 3.15 summary.
Linux 3.14 (31 Mar 2014)
API changes include the following:
A new deadline scheduling policy
(SCHED_DEADLINE)
has been added.
In order to control the scheduling of processes under
this policy, two new system calls have been added:
sched_setattr()
and
sched_getattr().
These are more generalized versions of the
sched_setscheduler()
and
sched_getscheduler()
system calls: they allow setting scheduling policy and
parameters for all of the previously existing scheduling policies
as well as the new
SCHED_DEADLINE
policy.
Documentation can be found in the
sched_setattr(2)
and
sched(7)
manual pages.
See also Jonathan Corbet's LWN.net article,
Deadline scheduling: coming soon?,
and the kernel source file
Documentation/scheduler/sched-deadline.txt.
The user-space lockdep feature has been added.
See Jonathan Corbet's LWN.net article,
User-space lockdep
for details.
TCP has a new "autocorking" feature, controlled via
/proc/sys/net/ipv4/tcp_autocorking.
Documentation can be found in the
tcp(7)
manual page.
The
HARD_QUEUESMAX
ceiling (added in Linux 3.5) on the
/proc/sys/fs/mqueue/queues_max
limit is removed.
See also:
LWN
articles on the kernel 3.14 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.14 summary.
Linux 3.13 (20 Jan 2014)
API changes include the following:
The TCP Fast Open feature that was added in Linux 3.7
is now enabled by default.
Other new features (yet to be detailed):
SO_MAX_PACING_RATE
socket option;
keyctl()
KEYCTL_GET_PERSISTENT
operation.
See also:
LWN
articles on the kernel 3.13 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.13 summary.
Linux 3.12 (3 Nov 2013)
API changes include the following:
A new per-socket option,
TCP_NOTSENT_LOWAT
and a system-wide setting,
/proc/sys/net/ipv4/tcp_notsent_lowat,
can be used to limit the number of unsent bytes in TCP sockets,
in order to reduce usage of kernel memory.
Some details can be found in the
commit message
and the kernel source file
Documentation/networking/ip-sysctl.txt.
The
/proc/sys/kernel/core_pattern
file adds a new specifier, %d.
This specifier is replaced by the "dumpable" mode of the process
(the same value as is returned by the
prctl(2)
PR_GET_DUMPABLE
operation).
Details can be found in the
core(5)
manual page.
See also:
LWN
articles on the kernel 3.12 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.12 summary.
Linux 3.11 (2 Sep 2013)
API changes include the following:
A new socket option,
SO_BUSY_POLL,
and a
poll()
flag,
POLL_BUSY_LOOP,
allow for low latency, busy polling on sockets.
Some details can be found in the
socket(7)
manual page, the kernel source file
Documentation/sysctl/net.txt,
and Jonathan Corbet's LWN.net articles,
Ethernet polling and patch-pulling latency
and
Low-latency Ethernet device polling.
A new
open()
flag,
O_TMPFILE,
provides an improved method for race-free creation of temporary files.
Details can be found in the
open(2)
manual page.
As of kernel 3.11, support for this flag is provided in the
ext2,
ext3,
ext4,
UDF,
and
minix
filesystems.
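A minimal C sketch of creating such an unnamed temporary file follows. The names open_unnamed_file() and tmpfile_demo() are hypothetical, and the errors treated as "not supported" are an assumption based on the open(2) manual page.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create an unnamed regular file in the filesystem
   that contains directory 'dir'. Returns a descriptor, or -1 with errno. */
int open_unnamed_file(const char *dir)
{
    /* The pathname identifies a directory; no name is ever created in it */
    return open(dir, O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
}

/* Self-check: returns 1 if an unnamed file (link count 0) was created and
   written, 0 if O_TMPFILE is not supported here, or -1 on other errors. */
int tmpfile_demo(const char *dir)
{
    struct stat sb;
    int result = -1;
    int fd = open_unnamed_file(dir);

    if (fd == -1) {
        /* EOPNOTSUPP: filesystem lacks support; EISDIR: kernel predates
           the flag */
        return (errno == EOPNOTSUPP || errno == EISDIR) ? 0 : -1;
    }
    if (write(fd, "scratch", 7) == 7 && fstat(fd, &sb) == 0 &&
            sb.st_nlink == 0 && sb.st_size == 7)
        result = 1;
    close(fd);              /* The file's storage is released here */
    return result;
}
```

Because the file never has a name, there is no window in which another process could open or replace it, and nothing is left behind if the program crashes.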
Two new ptrace() commands,
PTRACE_GETSIGMASK
and
PTRACE_SETSIGMASK,
can be used to get and set a process's signal mask.
Details can be found in the
ptrace(2)
manual page.
timerfd_create()
adds support for the
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM
clocks.
Details can be found in the
timerfd_create(2)
manual page.
See also:
LWN
articles on the kernel 3.11 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.11 summary.
Linux 3.10 (30 Jun 2013)
API changes include the following:
POSIX clocks and timers now support a new clock,
CLOCK_TAI,
that measures against International Atomic Time.
The new
ptrace()
PTRACE_PEEKSIGINFO
request can be used to nondestructively retrieve pending
signals.
Signals can be retrieved either from the process-wide queue,
or from the per-thread queue.
Details can be found in the
ptrace(2)
manual page.
A test case for this feature can be found in the kernel source file
tools/testing/selftests/ptrace/peeksiginfo.c.
POSIX timer IDs are no longer guaranteed to be unique system-wide.
Each process's timers are now visible via the
/proc/PID/timers
file.
This change was made so that the checkpoint/restore
facility can restore a process's timers with the same IDs.
Details of
/proc/PID/timers
can be found in the
proc(5)
manual page.
Two new files
/proc/sys/vm/admin_reserve_bytes
and
/proc/sys/vm/user_reserve_bytes
influence the behavior of memory overcommitting.
For details, see the
proc(5)
manual page.
Other new features (yet to be detailed):
SO_SELECT_ERR_QUEUE socket option.
See also:
LWN
articles on the kernel 3.10 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.10 summary.
Linux 3.9 (29 Apr 2013)
API changes include the following:
A new
SO_REUSEPORT
socket option allows multiple sockets to be bound to a
UDP or TCP port.
The option improves performance in some
network server designs.
More information can be found in my LWN.net article,
The SO_REUSEPORT socket option
and in the
socket(7)
manual page.
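The basic usage pattern can be sketched in C as follows: each socket that is to share the port sets the option before calling bind(). The helper names bind_reuseport_udp() and bound_port() are hypothetical, and the use of UDP on the loopback address is just an assumption made to keep the example small.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket with SO_REUSEPORT set and bind
   it to 127.0.0.1:port (host byte order; 0 selects an ephemeral port).
   Returns the socket descriptor, or -1 with errno set. */
int bind_reuseport_udp(unsigned short port)
{
    int on = 1, saved;
    struct sockaddr_in addr = { .sin_family = AF_INET };
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* Every socket sharing the port must set the option before bind() */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) == -1 ||
            bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: return the port (host byte order) to which 'fd'
   is bound, or -1 on error. */
int bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *) &addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}
```

Calling bind_reuseport_udp() a second time with the port reported by bound_port() for the first socket would be expected to succeed, whereas without the option the second bind() fails with EADDRINUSE.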
A new
/proc/sys/kernel/sched_rr_timeslice_ms
file can be used to view and set the
SCHED_RR
(realtime scheduling round-robin)
quantum as a millisecond value.
The default value is 100.
Writing 0 to this file resets the quantum to the default value.
Other new features (yet to be detailed):
SO_LOCK_FILTER
and
TCP_TIMESTAMP
socket options.
See also:
LWN
articles on the kernel 3.9 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.9 summary.
Linux 3.8 (19 Feb 2013)
API changes include the following:
The
ptrace()
system call supports a new flag,
PTRACE_O_EXITKILL.
If a tracing process sets this flag, a
SIGKILL
signal will be sent to every traced process
if the tracing process exits.
Details can be found in the
ptrace(2)
manual page.
On systems that provide multiple huge page sizes,
shmget()
and
mmap()
can now select the desired page size for an allocation.
For more information, see my LWN.net article,
Supporting variable-sized huge pages.
The user namespaces implementation
has been completed,
allowing unprivileged processes to create and work with
user namespaces via
clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), and
setns().
For more information, see the manual pages of those system calls,
and the documentation of the
/proc/PID/ns/*,
/proc/PID/uid_map,
and
/proc/PID/gid_map
files in the
namespaces(7)
manual page.
See also my series of LWN.net articles on namespaces starting
here
(you can find a directory of the follow-up articles
in the series at the end of that article).
The
setns()
and
unshare()
system calls add support for PID, mount, and user namespaces.
Details can be found in the manual pages.
A new
finit_module()
system call is added.
This system call is like
init_module(),
but loads the module from an open file descriptor.
In addition, the new system call has a
flags
argument that can be used to modify the behavior of the system call.
For more information, see my LWN.net article,
Loading modules from file descriptors,
and the
init_module(2)
manual page.
The
msgrcv()
system call adds a new, Linux-specific flag,
MSG_COPY.
This flag causes the
msgtyp
argument to be interpreted as an ordinal position within the message queue,
and causes the call to nondestructively fetch a copy of the
message at that position in the queue.
Details can be found in the
msgrcv(2)
manual page.
The
ext4
and
tmpfs
filesystems add support for the
lseek() SEEK_HOLE
and
SEEK_DATA
operations.
Reads from
inotify(7)
file descriptors are now restarted if the
SA_RESTART
flag was specified when establishing the signal handler.
In addition, reads from
inotify(7)
file descriptors no longer demonstrate
the Linux-specific oddity of failing with the error
EINTR
when the calling process is resumed after a
stop signal plus
SIGCONT
(see page 445 of TLPI).
Other new features (yet to be detailed):
MPOL_LOCAL
and
MPOL_MF_LAZY
memory policy flags;
SO_GET_FILTER
socket option.
See also:
LWN
articles on the kernel 3.8 merge window
(1,
2)
and the Kernel Newbies
kernel 3.8 summary.
Linux 3.7 (11 Dec 2012)
API changes include the following:
The server-side implementation of the TCP Fast Open feature was merged.
This complements the implementation of the client-side functionality
that was merged in 3.6.
To enable server-side (i.e., passive) TCP Fast Open,
a TCP server must use
setsockopt()
to set the
TCP_FASTOPEN
socket option.
For more information, see my LWN.net article,
TCP Fast Open: expediting web services.
The
Btrfs
filesystem adds support for hole punching
(the fallocate(2) FALLOC_FL_PUNCH_HOLE
operation added in Linux 2.6.38).
The
/proc/sys/kernel/core_pattern
file adds a new specifier, %P.
This specifier is replaced by the PID of the process
as seen in the initial PID namespace
(whereas the existing %p specifier
is replaced by the PID in the PID namespace where the
process resides).
Details can be found in the
core(5)
manual page.
See also:
LWN
articles on the kernel 3.7 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.7 summary.
Linux 3.6 (1 Oct 2012)
API changes include the following:
The client-side implementation of the TCP Fast Open feature
was merged. This implements a new flag,
MSG_FASTOPEN,
used with either
sendto()
or
sendmsg()
to initiate a TCP fast open.
The new /proc/sys/net/ipv4/tcp_fastopen
file can be set to enable or disable client (and server) TCP Fast Open
functionality.
For more information, see my LWN.net article,
TCP Fast Open: expediting web services.
Some restrictions on the creation of hard and soft links were added,
in order to improve security.
For more information, see Jonathan Corbet's LWN.net article,
Tightening security: not for the impatient,
and the documentation of
/proc/sys/fs/protected_hardlinks
and
/proc/sys/fs/protected_symlinks
in the
proc(5)
manual page.
The
fcntl()
system call adds support for a new command,
F_GETOWNER_UIDS,
that can be used to retrieve the real and effective user IDs
associated with a previous call to
F_SETOWNER.
(Those UIDs determine the rules for sending
a signal to another process for signal-driven I/O.)
The third argument of the call is of type
uid_t *, and should point
to a two-element array that stores the real user ID and
effective user ID.
This feature is intended for use by the checkpoint/restore
facility and is only provided if the kernel was configured with the
CONFIG_CHECKPOINT_RESTORE option.
A new
hugetlb
cgroups controller can be used to limit HugeTLB usage per cgroup.
See also:
LWN
articles on the kernel 3.6 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.6 summary.
Linux 3.5 (21 Jul 2012)
API changes include the following:
The seccomp filters mechanism was added. This feature
is designed to allow security conscious applications
to limit the set of system calls that they can make.
For further information, see Jonathan Corbet's LWN.net article,
Yet another new approach to seccomp,
and the kernel source file
Documentation/prctl/seccomp_filter.txt.
Among other things, the change adds a new
PTRACE_O_TRACESECCOMP
flag to
ptrace(2).
A new
kcmp()
system call can be used to determine whether various kernel objects are
shared between tasks.
This is useful for the checkpoint-restore facility.
Some information can be found in Jonathan Corbet's LWN.net article,
Preparing for user-space checkpoint/restore,
and in the
kcmp(2)
manual page.
Some additional
PR_SET_MM_* flags were added for use with the
PR_SET_MM prctl()
operation added in Linux 3.3.
A new
epoll EPOLLWAKEUP
flag prevents system suspend while
epoll
events are ready.
Use of this flag requires that the caller have the newly added
CAP_BLOCK_SUSPEND
capability
(if the caller does not have this capability, then the
EPOLLWAKEUP flag is
silently ignored).
Details can be found in the
epoll_ctl(2)
and
epoll(7)
manual pages.
The
tmpfs
filesystem adds support for hole punching
(the fallocate(2) FALLOC_FL_PUNCH_HOLE
operation added in Linux 2.6.38).
The
XFS
filesystem adds support for the
lseek() SEEK_HOLE
and
SEEK_DATA
operations.
The new
prctl() PR_SET_NO_NEW_PRIVS
operation prevents
execve()
from granting privileges.
For example,
a process will not be able to execute a set-user-ID binary to
change its UID or GID if this flag is set.
The same is true for file capabilities.
A corresponding
PR_GET_NO_NEW_PRIVS
operation can be used to retrieve the state of this
attribute for the caller.
Details can be found in the
prctl(2)
manual page.
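The following is a minimal sketch of setting and then reading back the attribute. The function name enable_no_new_privs() is hypothetical. Note that, once set, the attribute cannot be cleared.

```c
#include <sys/prctl.h>

/* Set the no_new_privs attribute for the calling thread and read it back.
   Returns 1 if the attribute is set, 0 if it is not, or -1 on error. */
int enable_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;

    /* PR_GET_NO_NEW_PRIVS returns the attribute as the function result */
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A program would typically make this call before installing a seccomp filter or before executing an untrusted binary.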
The new
prctl() PR_GET_TID_ADDRESS
operation allows the caller to retrieve its
clear_child_tid
address
(see set_tid_address(2)).
Details can be found in the
prctl(2)
manual page.
If the kernel was configured with
CONFIG_CHECKPOINT_RESTORE,
then a new
/proc/PID/children
file lists the children of a process.
(In Linux 4.2,
the kernel option governing the presence
of this file was changed to
CONFIG_PROC_CHILDREN.)
Documentation can be found in the
proc(5)
manual page.
Two new
/proc
files can be used to read and modify the
values that are used to provide defaults when
a POSIX message queue is created using an
mq_open()
call in which the
attr
argument is specified as
NULL.
The
/proc/sys/fs/mqueue/msg_default
file defines the default value used for a new queue's
mq_maxmsg
attribute.
The default value in this file is 10.
The
/proc/sys/fs/mqueue/msgsize_default
file defines the default value used for a new queue's
mq_msgsize
attribute.
The default value in this file is 8192.
In addition,
Linux 3.5 changed the interpretation of various files in the
/proc/sys/fs/mqueue/
directory that specify message queue limits.
Full details can be found in the
mq_overview(7)
manual page.
Other new features (yet to be detailed):
TCP_REPAIR,
TCP_REPAIR_OPTIONS,
TCP_REPAIR_QUEUE,
and
TCP_QUEUE_SEQ
socket options;
keyctl() KEYCTL_INVALIDATE
operation.
See also:
LWN
articles on the kernel 3.5 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.5 summary.
Linux 3.4 (21 May 2012)
API changes include the following:
The
pipe2()
system call permits a new flag,
O_DIRECT,
that creates a pipe that operates in "packet" mode.
Each
write()
(of no more than
PIPE_BUF
bytes) to the pipe creates a packet,
and each
read()
reads exactly one packet
(discarding excess bytes if the supplied buffer is too small).
Details can be found in the
pipe(2)
manual page.
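The packet behavior can be observed with a sketch along the following lines. The function name packet_pipe_demo() is hypothetical. Two writes are made before any read, so a conventional pipe would return all 8 bytes in the first read, whereas a packet-mode pipe should return only the first 3-byte packet.

```c
#define _GNU_SOURCE             /* For pipe2() and O_DIRECT */
#include <fcntl.h>
#include <unistd.h>

/* Write two packets to a packet-mode pipe and read once.
   Returns the number of bytes returned by the first read(),
   or -1 on error. */
ssize_t packet_pipe_demo(void)
{
    int pfd[2];
    char buf[64];
    ssize_t n = -1;

    if (pipe2(pfd, O_DIRECT) == -1)
        return -1;

    /* Each write() creates a separate packet */
    if (write(pfd[1], "abc", 3) == 3 && write(pfd[1], "defgh", 5) == 5)
        n = read(pfd[0], buf, sizeof(buf));   /* Reads one packet only */

    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```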
The
PR_SET_CHILD_SUBREAPER prctl()
operation allows
a "service manager" process to mark itself as a sort of
'sub-init', able to stay as the parent for all orphaned processes
created by the started services.
All
SIGCHLD
signals will be delivered to the service manager.
There is a corresponding
PR_GET_CHILD_SUBREAPER prctl()
operation.
Details can be found in the
prctl(2)
manual page.
Planned users of this feature include
D-Bus
and
systemd.
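A minimal sketch of the set and get operations follows. The function name become_subreaper() is hypothetical. A real service manager would go on to install a SIGCHLD handler and reap the orphans that are reparented to it.

```c
#include <sys/prctl.h>

/* Mark the calling process as a child subreaper and read the setting back.
   Returns 1 if the attribute is set, 0 if it is not, or -1 on error. */
int become_subreaper(void)
{
    int is_subreaper = 0;

    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;

    /* The current setting is returned via the second argument */
    if (prctl(PR_GET_CHILD_SUBREAPER, &is_subreaper, 0, 0, 0) == -1)
        return -1;

    return is_subreaper;
}
```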
The
madvise() MADV_DONTDUMP
operation can be used to specify that an address range should
be excluded from core dumps.
The
MADV_DODUMP
operation
reverses the effect of
MADV_DONTDUMP.
Details can be found in the
madvise(2)
manual page.
The
setsockopt() SO_PEEK_OFF
option allows controlling the offset for
peeking at data queued in a socket.
(Currently supported for UNIX domain sockets only.)
Details can be found in the
socket(7)
manual page.
A new
prctl() operation,
PR_SET_PTRACER,
is used with the Yama Linux Security Module to control
which processes can ptrace()
the calling process.
Details can be found in the
prctl(2)
manual page
and in the kernel source file
Documentation/security/Yama.txt.
UNIX domain sockets now support the use of the
MSG_TRUNC
flag for
recv(2)
and related system calls.
Other new features (yet to be detailed):
SO_NOFCS
socket option.
See also:
LWN
articles on the kernel 3.4 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 3.4 summary.
Linux 3.3 (19 Mar 2012)
API changes include the following:
A new
prctl() operation,
PR_SET_MM,
intended for use by the checkpoint/restore facility,
allows text, data, and heap sizes to be set
to the values in effect at checkpoint time
when a process is restored.
The caller must have the
CAP_SYS_RESOURCE
capability.
This operation is only supported if the kernel is configured with the
CONFIG_CHECKPOINT_RESTORE
option.
Details can be found in the
prctl(2)
manual page.
Two changes related to the /proc
filesystem:
A new
/proc/PID/map_files
directory contains symbolic links
describing the file mappings of the process identified by PID;
documentation can be found in the
proc(5) manual page.
Two new mount options for the
/proc filesystem
(hidepid=
and gid=)
can be used to control the visibility of
/proc/PID
directories.
Documentation can be found in the
proc(5) manual page.
A new
net_prio
cgroups controller allows control of the priority of a cgroup's
outgoing network traffic.
Other new features (yet to be detailed):
SO_WIFI_STATUS
socket option.
See also:
LWN
articles on the kernel 3.3 merge window
(1,
2)
and the Kernel Newbies
kernel 3.3 summary.
Linux 3.2 (5 Jan 2012)
API changes include the following:
The
process_vm_readv()
and
process_vm_writev()
functions, which provide a technique for fast message passing.
Some information can be found on LWN.net
here and
here
(describes an early version of the API),
and in the
process_vm_readv(2)
manual page.
Files under
/proc/sys
are now pollable, meaning
that applications can use
poll(),
select(),
and
epoll
to check for changes to
sysctl
parameters.
A new
/proc/sys/kernel/cap_last_cap
file exposes the numerical value of the highest capability
supported by the running kernel;
this can be used to determine the highest bit
that may be set in a capability set.
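Reading the file is straightforward, as in the following sketch (the function name read_cap_last_cap() is hypothetical):

```c
#include <stdio.h>

/* Return the number of the highest capability supported by the
   running kernel, or -1 if the file could not be read or parsed. */
int read_cap_last_cap(void)
{
    int cap = -1;
    FILE *fp = fopen("/proc/sys/kernel/cap_last_cap", "r");
    if (fp == NULL)
        return -1;

    if (fscanf(fp, "%d", &cap) != 1)
        cap = -1;

    fclose(fp);
    return cap;
}
```

A program that manipulates capability sets could use this value, rather than a constant from the header files it was compiled against, as the upper bound when iterating over capabilities.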
Extensions to the
cpu
cgroup controller
(governed by CONFIG_CFS_BANDWIDTH)
make it possible to impose a quota on the amount of CPU time
that the processes in a cgroup may consume in each
scheduling period.
Unlike the "shares" mechanism already provided by the
cpu
controller, these quotas apply regardless of whether
there is competition for the CPU.
Within each cgroup, the allocation of the CPU to
processes scheduled under the
SCHED_OTHER
policy can be further controlled using the
nice values of the processes.
For further information, see the kernel source files
Documentation/scheduler/sched-bwc.txt
and
Documentation/scheduler/sched-design-CFS.txt,
the
cgroups(7)
manual page,
and Jonathan Corbet's LWN.net article,
CFS bandwidth control.
See also:
LWN
articles on the kernel 3.2 merge window
(1,
2)
and the Kernel Newbies
kernel 3.2 summary.
Linux 3.1 (24 Oct 2011)
API changes include the following:
Three new operations are added for the
ptrace()
system call:
PTRACE_SEIZE,
PTRACE_INTERRUPT,
and
PTRACE_LISTEN.
Details can be found in the
ptrace(2)
manual page.
Some further information can be found
here.
Two new flags for the
lseek()
system call,
SEEK_HOLE
and
SEEK_DATA,
provide the ability to search for holes in sparsely allocated files.
Some further information can be found in
Jonathan Corbet's LWN.net article
The return of SEEK_HOLE,
and the
lseek(2)
manual page.
For the 3.1 release,
only the Btrfs filesystem supports these operations.
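The following sketch shows one way of using SEEK_DATA to locate the first data in a sparse file. The function name first_data_offset() and the temporary file location under /tmp are assumptions made for illustration. The result depends on the filesystem: where these operations are not specially supported, the whole file is treated as data and the call returns 0; otherwise, the returned offset is typically rounded down to a block boundary.

```c
#define _GNU_SOURCE             /* For SEEK_DATA */
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file containing a hole followed by one byte of data
   at offset 'data_start', and return the offset of the first data as
   reported by SEEK_DATA, or -1 on error. */
off_t first_data_offset(off_t data_start)
{
    char path[] = "/tmp/seek_data_XXXXXX";
    off_t result = -1;
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* File disappears when fd is closed */

    /* Writing beyond the end of the file leaves a hole before the data */
    if (pwrite(fd, "x", 1, data_start) == 1)
        result = lseek(fd, 0, SEEK_DATA);

    close(fd);
    return result;
}
```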
A new
/proc/sys/kernel/shm_rmid_forced
file can be used to control the handling of System V
shared memory segments that have no attached process.
The default value in this file is 0,
which provides the traditional behavior:
unattached segments remain in existence and
can be reattached at a later point in time by another process.
If the value in
shm_rmid_forced
is 1, then the effect is as though an
IPC_RMID
operation is performed on all shared memory segments
that currently exist and that are created in the future.
This means that those segments will be destroyed as soon
as the last process detaches from them.
This can be useful to ensure that shared memory segments
are counted against the resource usage and limits
of at least one process,
but it is nonstandard and has the potential to break
applications that depend on the traditional behavior.
Further details can be found in the
proc(5)
manual page.
See also:
LWN
articles on the kernel 3.1 merge window
(1,
2)
and the Kernel Newbies
kernel 3.1 summary.
Linux 3.0 (21 Jul 2011)
API changes include the following:
A new
setns()
system call allows its caller to join the namespace
specified by its two arguments—a namespace type
(one of a subset of the
CLONE_*
constants given to
clone(2))
and a file descriptor referring to one of the files in a
/proc/PID/ns
directory.
Some further info
here,
and in the
setns(2)
manual page contributed by Eric Biederman.
A new
sendmmsg()
system call provides multiple message sending facilities
(the analog of the
recvmmsg(2)
system call added in Linux 2.6.33).
For more information, see the
sendmmsg(2)
manual page.
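A minimal sketch of sending two datagrams with a single call is shown below. The function name send_two_datagrams() is hypothetical, and a UNIX domain socket pair is used only to keep the example self-contained; the same approach applies to UDP sockets.

```c
#define _GNU_SOURCE             /* For sendmmsg() and struct mmsghdr */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send two datagrams over a socket pair using one sendmmsg() call.
   Returns the number of messages sent, or -1 on error. */
int send_two_datagrams(void)
{
    int sv[2], sent;
    struct iovec iov[2];
    struct mmsghdr msgs[2];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    iov[0].iov_base = "one";
    iov[0].iov_len = 3;
    iov[1].iov_base = "two";
    iov[1].iov_len = 3;

    /* Each mmsghdr wraps an ordinary msghdr, as used by sendmsg() */
    memset(msgs, 0, sizeof(msgs));
    msgs[0].msg_hdr.msg_iov = &iov[0];
    msgs[0].msg_hdr.msg_iovlen = 1;
    msgs[1].msg_hdr.msg_iov = &iov[1];
    msgs[1].msg_hdr.msg_iovlen = 1;

    sent = sendmmsg(sv[0], msgs, 2, 0);

    close(sv[0]);
    close(sv[1]);
    return sent;
}
```

On return, the msg_len field of each element records the number of bytes sent for that message.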
The
timerfd_settime()
system call adds a
TFD_TIMER_CANCEL_ON_SET
flag.
If this flag is set for a
CLOCK_REALTIME
absolute
(TFD_TIMER_ABSTIME)
timer, then the timer is expired if the clock is reset.
For more information, see the
timerfd_create(2)
manual page.
Two new POSIX clocks:
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM.
According to the commit message,
these clocks behave identically to
CLOCK_REALTIME
and
CLOCK_BOOTTIME,
but the
_ALARM
suffixed clocks will wake the system if it is suspended.
Some further details can be found
here.
A new
CAP_WAKE_ALARM capability
governs the use of the
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM
clocks.
The
/proc/sys/kernel/core_pattern
file adds a new specifier, %E.
This specifier is replaced by the pathname of the executable,
with slashes replaced by exclamation marks
(so that the basename of the resulting core dump filename
does not contain slashes).
Details can be found in the
core(5)
manual page.
The ext4 filesystem adds support for the
fallocate() FALLOC_FL_PUNCH_HOLE
operation.
See also:
LWN
articles on the kernel 3.0 merge window
(1,
2)
and the Kernel Newbies
kernel 3.0 summary.
Linux 2.6.39 (19 May 2011)
API changes include the following:
New
name_to_handle_at() and
open_by_handle_at()
system calls.
These system calls provide functionality that is useful for
file-system servers that run in user space.
Details can be found in the
open_by_handle_at(2)
manual page that I wrote.
Some details
here and
here.
A new
O_PATH
flag is added for
open(2).
Some details
here.
O_PATH
descriptors can be obtained for symbolic links,
and can be passed via
SCM_RIGHTS
datagrams.
Details can be found in the
open(2)
manual page.
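As described in the open(2) manual page, a descriptor obtained with O_PATH refers to a location in the filesystem but cannot be used for I/O. The following sketch (the function name opath_read_is_refused() is hypothetical) demonstrates this by checking that read() fails with EBADF.

```c
#define _GNU_SOURCE             /* For O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open 'path' with O_PATH and attempt to read from the descriptor.
   Returns 1 if read() fails with EBADF, 0 if it does not,
   or -1 if the file could not be opened. */
int opath_read_is_refused(const char *path)
{
    char c;
    ssize_t n;
    int refused;
    int fd = open(path, O_PATH);
    if (fd == -1)
        return -1;

    n = read(fd, &c, 1);
    refused = (n == -1 && errno == EBADF);

    close(fd);
    return refused;
}
```

Such a descriptor remains useful as the directory file descriptor argument of the *at() calls, or for passing to another process.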
A new
AT_EMPTY_PATH
flag allows empty relative pathnames for
linkat(2),
fchownat(2),
fstatat(2),
and
name_to_handle_at(),
in which case the calls operate on
their directory file descriptor argument.
In addition, an empty pathname can now be supplied to
readlinkat(2),
to produce the same behavior for that call.
Details can be found in the respective manual pages.
A new
clock_adjtime()
system call, analogous to
adjtimex(2),
permits adjustments to POSIX clocks.
A new
syncfs()
system call, which is similar to
sync(2),
but flushes only the filesystem containing the file
referred to by its file-descriptor argument.
Details in the
syncfs(2)
manual page.
A new POSIX clock,
CLOCK_BOOTTIME,
is identical to
CLOCK_MONOTONIC,
but includes time that the system has been suspended.
This clock is intended for applications that want a
monotonically increasing clock and also want to be aware of
time the system has been suspended.
Details can be found in the
timer_create(2)
man page;
some background can be found
here.
A thread operating under the
SCHED_IDLE policy
is now allowed to upgrade itself to the
SCHED_BATCH
or
SCHED_OTHER
policy if its nice value falls within the range permitted by its
RLIMIT_NICE
resource limit.
Other new features (yet to be detailed):
keyctl() KEYCTL_INSTANTIATE_IOV
and
KEYCTL_REJECT
operations.
A new inode flag,
FS_NOCOW_FL,
can be used to disable copy-on-write semantics on a filesystem
(such as Btrfs) that supports copy-on-write.
For details, see the
ioctl_iflags(2)
manual page.
A new
perf_event
cgroups controller makes it possible to do
perf
monitoring per cgroup.
See also:
LWN
articles on the kernel 2.6.39 merge window
(1,
2,
3)
and the Kernel Newbies
kernel 2.6.39 summary.
Linux 2.6.38 (15 Mar 2011)
API changes include the following:
A new
AT_NO_AUTOMOUNT
flag for
fstatat(2),
which can be used to suppress automounting of the terminal
component of the pathname argument.
Further information can be found in the
fstatat(2)
manual page.
A new
CAP_SYSLOG capability,
used (instead of
CAP_SYS_ADMIN)
to govern privileged
syslog(2)
operations.
Details can be found in the manual pages.
A new
FALLOC_FL_PUNCH_HOLE
operation for
fallocate(2).
This operation creates a hole (see page 83 of TLPI) in the file
in the byte range indicated by the
offset
and
len
arguments.
(The file data in the specified range is lost.)
Filesystem support is required for the
FALLOC_FL_PUNCH_HOLE
operation.
In the initial implementation, support is provided by just the
XFS filesystem.
As currently implemented,
FALLOC_FL_PUNCH_HOLE
must be specified with
FALLOC_FL_KEEP_SIZE,
which means that the size of a file can't change,
even if a hole is punched at the end of the file.
Further information can be found in the
fallocate(2)
manual page.
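The following sketch punches a hole in the first 4096 bytes of a temporary file and checks that the punched range reads back as zeros. The function name punch_hole_demo() and the file location under /tmp are assumptions, and the call can be expected to fail (typically with EOPNOTSUPP) on filesystems that lack support for the operation.

```c
#define _GNU_SOURCE             /* For fallocate() */
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_* constants */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write 8192 bytes of 'x' to a temporary file, punch a hole in the
   first 4096 bytes, and read the file back.
   Returns 1 if the punched range reads as zeros and the rest is intact,
   0 if it does not, or -1 on error (including lack of filesystem support). */
int punch_hole_demo(void)
{
    char path[] = "/tmp/punch_hole_XXXXXX";
    char buf[8192];
    int result = -1;
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* File disappears when fd is closed */

    memset(buf, 'x', sizeof(buf));

    if (pwrite(fd, buf, sizeof(buf), 0) == (ssize_t) sizeof(buf) &&
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, 4096) == 0 &&
        pread(fd, buf, sizeof(buf), 0) == (ssize_t) sizeof(buf))
        result = (buf[0] == '\0' && buf[4095] == '\0' && buf[4096] == 'x');

    close(fd);
    return result;
}
```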
New
MADV_HUGEPAGE
and
MADV_NOHUGEPAGE
flags for
madvise(2).
These flags enable and disable an attribute on the memory region
that indicates that it is important that the region be backed by
huge pages,
when this is possible.
Further information on this feature can be found
in the kernel source file
Documentation/vm/transhuge.txt
as well as
here,
here,
and in the
madvise(2)
manual page.
The new
/proc/sys/kernel/kptr_restrict
file can be used to prevent exposure of kernel pointers via
/proc
files and other interfaces.
(This affects how pointers are printed when using the new
%pK
specifier
for the kernel-internal
printf()
function.)
See the
proc(5)
manual page for further details.
The addition of the autogroup feature significantly changed
the semantics of the nice value.
For details, see the
sched(7)
manual page.
See also:
LWN
articles on the kernel 2.6.38 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.38 summary.
Linux 2.6.37 (5 Jan 2011)
API changes include the following:
The permissions on /proc/PID/limits
changed from readable for the owner only to readable for all users
on the system.
The
fanotify_init()
and
fanotify_mark()
system calls were added.
These system calls are designed for use in virus-scanning tools,
but may also serve other more general uses.
They provide functionality that is in some ways similar to
inotify(7).
Note, however, that the
fanotify
interface is not a superset of
inotify.
(The existence of two APIs with heavily overlapping functionality,
rather than a new API that is a superset of the earlier API,
is unfortunate.)
These two system calls were added in Linux 2.6.36,
but disabled while concerns about the API were resolved.
In Linux 2.6.37, the system calls have been enabled.
Documentation for these system calls can be found in the
fanotify_init(2)
and
fanotify_mark(2)
manual pages.
The
fanotify(7)
manual page provides an overview of the API.
The
TCP_USER_TIMEOUT
socket option specifies the maximum amount of time
(in milliseconds) that transmitted
data may remain unacknowledged before TCP will forcibly
close the corresponding connection and return
ETIMEDOUT
to the application.
Details can be found in the
tcp(7)
manual page.
See also:
LWN
articles on the kernel 2.6.37 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.37 summary.
Linux 2.6.36 (20 Oct 2010)
API changes include the following:
The new
prlimit()
system call is an enhancement of
setrlimit()
and
getrlimit().
It allows the caller to both set and retrieve
its own resource limits
(including retrieving the old limit at the same time
as a new limit is set), and (with suitable permissions)
perform the same task for other processes.
This system call does not suffer
this kernel bug,
which affects
getrlimit()/setrlimit().
(See pages 759 and 760 of TLPI.)
Indeed, starting with version 2.13,
glibc provides library implementations
for
setrlimit()
and
getrlimit()
that employ
prlimit()
to work around the kernel bug.
I've added documentation of this system call to the
getrlimit(2)
manual page.
The
inotify
API adds a new flag,
IN_EXCL_UNLINK,
that prevents children of a watched directory
from generating events for a directory after they have been
unlinked from that directory.
I've added documentation of this flag to the
inotify(7)
manual page.
The OOM killer was rewritten (again).
In the process, the
/proc/PID/oom_adj
file became obsolete, in favor of the new
/proc/PID/oom_score_adj
file.
For further information, see the
proc(5)
manual page.
As originally designed and implemented, the
inotify IN_ONESHOT
flag did not cause an
IN_IGNORED
event to be generated when a watch was dropped
after an event was triggered.
Starting with Linux 2.6.36, an
IN_IGNORED
event is generated in this case.
(This was almost certainly an unintended consequence of some
code reworking during the 2.6.36 development cycle.)
The
statfs
structure returned by
statfs()
adds a field,
f_flags,
that returns a bit mask indicating various mount options
that a filesystem was mounted with.
Details can be found in the
statfs(2)
manual page.
This allows the
statvfs()
library function to more efficiently populate the information
returned in the
f_flag
field, as described in the
statvfs(3)
manual page.
See also:
LWN
articles on the kernel 2.6.36 merge window
(1,
2)
and the Kernel Newbies
kernel 2.6.36 summary.
glibc API changes
glibc 2.28 (Not yet released)
API changes include the following:
…
glibc 2.27 (1 Feb 2018)
API changes include the following:
…
glibc 2.26 (2 Aug 2017)
API changes include the following:
A new
reallocarray()
function can be used to reallocate an array buffer.
This function is to
calloc()
what
realloc()
is to
malloc().
The purpose of this function (as opposed to the use of
realloc(ptr, nmemb*size))
is to safely handle the condition
where reallocating an array of
nmemb
items of
size
bytes would lead to an overflow when calculating the value
nmemb*size.
Details can be found in the
reallocarray(3)
manual page.
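A sketch of typical usage follows. The helper name grow_int_array() is hypothetical. As with realloc(), the result is assigned to a temporary so that the original buffer is not lost if the call fails.

```c
#define _GNU_SOURCE             /* Needed for reallocarray() in glibc 2.26 */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Resize the array of int pointed to by '*arr' to hold 'nmemb' elements.
   Returns 0 on success. On failure, returns -1 and leaves '*arr'
   untouched; errno is ENOMEM if nmemb * sizeof(int) would overflow. */
int grow_int_array(int **arr, size_t nmemb)
{
    int *tmp = reallocarray(*arr, nmemb, sizeof(int));
    if (tmp == NULL)
        return -1;

    *arr = tmp;
    return 0;
}
```

By contrast, realloc(ptr, nmemb * sizeof(int)) with a very large nmemb could silently wrap around and allocate a buffer that is far too small.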
glibc 2.25 (5 Feb 2017)
API changes include the following:
A new
getentropy(3)
function, which is implemented on top of
getrandom(2),
can be used to obtain a buffer of random data.
This function is nonstandard,
but is also present on at least OpenBSD.
Further details can be found in the manual page.
A new
explicit_bzero(3)
function performs the same task as
bzero(3),
but the call is guaranteed never to be optimized away
by the compiler.
Details can be found in the manual page.
A large number of new math functions and macros, specified in
ISO/IEC TR 24731-2:2010,
ISO/IEC TS 18661-1:2014,
and
ISO/IEC TS 18661-4:2015, are added.
glibc 2.24 (2 Aug 2016)
API changes include the following:
With the exception of 32-bit and 64-bit Intel architectures,
the minimum kernel version required for glibc 2.24 is Linux 3.2.
For Intel architectures, the minimum kernel version is 2.6.32.
The
LD_POINTER_GUARD environment variable
can no longer be used to disable pointer guarding, which is now
always enabled.
Details can be found in the
ld.so(8)
manual page.
glibc 2.22 (5 Aug 2015)
API changes include the following:
Numerous bugs in the implementation of
fmemopen(3)
were fixed.
glibc 2.21 (6 Feb 2015)
No API changes (other than simple
wrappers for recently added Linux system calls).
glibc 2.20 (7 Sep 2014)
Note: the minimum Linux kernel version to run
with this and later glibc versions is Linux 2.6.32.
API changes include the following:
The
_BSD_SOURCE
and
_SVID_SOURCE
feature test macros are deprecated.
They now have the same effect as
_DEFAULT_SOURCE,
but generate a compile-time warning if used.
For further information, see the
feature_test_macros(7)
manual page.
glibc 2.19 (7 Feb 2014)
API changes include the following:
The
_BSD_SOURCE
feature test macro no longer causes BSD
definitions to be favored in a few cases where
standards conflict.
The affected APIs here include
getpgrp(),
setpgrp(),
sigpause(),
and
setjmp(),
and source code changes may be needed to maintain historical
behavior in applications that use these APIs.
For further information, see the
feature_test_macros(7)
manual page.
A new feature test macro,
_DEFAULT_SOURCE,
has been added.
Defining this macro provides an effect similar to
the feature test macros that are defined by default—that is,
_BSD_SOURCE,
_SVID_SOURCE, and
_POSIX_C_SOURCE=200809.
This macro can be defined to ensure that the "default"
definitions are provided even when the defaults would otherwise
be disabled,
as happens when individual macros are explicitly defined,
or the compiler is invoked in one of its "standard" modes (e.g.,
cc -std=c99).
For further information, see the
feature_test_macros(7)
manual page.
glibc 2.18 (10 Aug 2013)
API changes include the following:
New (nonstandard)
pthread_getattr_default_np()
and
pthread_setattr_default_np()
functions are added.
These functions permit the caller to
get and set the default attributes that are used to create
new threads (i.e., the attributes used when the
attr
argument of
pthread_create()
is
NULL).
glibc 2.17 (25 Dec 2012)
Note: the minimum Linux kernel version to run
with this and later glibc versions is Linux 2.6.16.
API changes include the following:
A new
secure_getenv()
function allows secure access to the environment.
It is similar to
getenv(3),
but returns
NULL
if running in a set-user-ID/set-group-ID process.
Documentation can be found in the
secure_getenv(3)
manual page.
The functions
clock_getres(),
clock_gettime(),
clock_settime(),
clock_getcpuclockid(),
and
clock_nanosleep(),
moved from the realtime library
(librt) to the main C library.
Consequently, it is no longer necessary to link against
the realtime library
(cc -lrt)
when using these functions.
The rationale for this change is explained in
glibc bug 14743
("clock_gettime et al from -lrt always bring in libpthread").
glibc 2.16 (30 Jun 2012)
Note: this and subsequent glibc versions
are not expected to work with any Linux kernel less than version 2.6.
API changes include the following:
The glibc header files now handle the
_ISOC11_SOURCE
feature test macro,
as a mechanism for exposing declarations conforming to the
C11
standard.
A new
getauxval(3)
function allows retrieval of auxiliary vector
(AT_*)
key-value pairs passed from the Linux kernel.
Further information can be found in my LWN.net article
"getauxval() and the auxiliary vector"
and in the
getauxval(3)
manual page that I wrote.
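As a small illustration, the following sketch retrieves the AT_PAGESZ entry and compares it with the value reported by sysconf(). The function name auxv_page_size_matches() is hypothetical.

```c
#include <sys/auxv.h>
#include <unistd.h>

/* Returns 1 if the page size recorded in the auxiliary vector matches
   the value reported by sysconf(_SC_PAGESIZE), 0 otherwise. */
int auxv_page_size_matches(void)
{
    /* getauxval() returns 0 if the requested entry is not present */
    unsigned long auxv_pagesz = getauxval(AT_PAGESZ);

    return auxv_pagesz != 0 &&
           auxv_pagesz == (unsigned long) sysconf(_SC_PAGESIZE);
}
```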
glibc 2.15 (tagged 25 Dec 2011)
API changes include the following:
A new
scandirat()
function, which is to
scandir()
as
openat(2)
is to
open().
Documentation can be found in the
scandirat(3)
manual page.
A new
pldd
command lists the dynamic shared objects that are linked into
a process.
For further information, see the
pldd(1)
manual page.
glibc 2.14 (tagged 31 May 2011)
No API changes (other than simple
wrappers for recently added Linux system calls).
glibc 2.13 (tagged 17 Jan 2011)
API changes include the following:
Newly added library implementations of
setrlimit()
and
getrlimit()
bypass the system calls of the same name, instead using the
prlimit()
system call to avoid the bug described above
in the API changes for Linux 2.6.36.
POSIX/Single UNIX Specification updates
Since the last major release of the POSIX/SUS standard in 2008,
there have been some Technical Corrigenda—essentially
bug fix releases to the standard.
POSIX.1-2008 Technical Corrigendum 3 (in progress)