|
NAME | SYNOPSIS | DESCRIPTION | RETURN VALUE | ERRORS | COLOPHON |
|
|
|
io_uring_register(2) Linux Programmer's Manual io_uring_register(2)
io_uring_register - register files or user buffers for
asynchronous I/O
#include <liburing.h>
int io_uring_register(unsigned int fd, unsigned int opcode,
void *arg, unsigned int nr_args);
The io_uring_register(2) system call registers resources (e.g.
user buffers, files, eventfd, personality, restrictions) for use
in an io_uring(7) instance referenced by fd. Registering files
or user buffers allows the kernel to take long term references to
internal data structures or create long term mappings of
application memory, greatly reducing per-I/O overhead.
fd is the file descriptor returned by a call to
io_uring_setup(2). opcode can be one of:
IORING_REGISTER_BUFFERS
arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in
memory and charged against the user's RLIMIT_MEMLOCK
resource limit. See getrlimit(2) for more information.
Additionally, there is a size limit of 1GiB per buffer.
Currently, the buffers must be anonymous, non-file-backed
memory, such as that returned by malloc(3) or mmap(2) with
the MAP_ANONYMOUS flag set. It is expected that this
limitation will be lifted in the future. Huge pages are
supported as well. Note that the entire huge page will be
pinned in the kernel, even if only a portion of it is
used.
After a successful call, the supplied buffers are mapped
into the kernel and eligible for I/O. To make use of
them, the application must specify the
IORING_OP_READ_FIXED or IORING_OP_WRITE_FIXED opcodes in
the submission queue entry (see the struct io_uring_sqe
definition in io_uring_enter(2)), and set the buf_index
field to the desired buffer index. The memory range
described by the submission queue entry's addr and len
fields must fall within the indexed buffer.
It is perfectly valid to setup a large buffer and then
only use part of it for an I/O, as long as the range is
within the originally mapped region.
An application can increase or decrease the size or number
of registered buffers by first unregistering the existing
buffers, and then issuing a new call to
io_uring_register(2) with the new buffers.
Note that before 5.13 registering buffers would wait for
the ring to idle. If the application currently has
requests in-flight, the registration will wait for those
to finish before proceeding.
An application need not unregister buffers explicitly
before shutting down the io_uring instance. Available
since 5.1.
IORING_REGISTER_BUFFERS2
Register buffers for I/O. Similar to
IORING_REGISTER_BUFFERS but aims to have a more extensible
ABI.
arg points to a struct io_uring_rsrc_register, and nr_args
should be set to the number of bytes in the structure.
struct io_uring_rsrc_register {
__u32 nr;
__u32 resv;
__u64 resv2;
__aligned_u64 data;
__aligned_u64 tags;
};
The data field contains a pointer to a struct iovec array
of nr entries. The tags field should either be 0, then
tagging is disabled, or point to an array of nr "tags"
(unsigned 64 bit integers). If a tag is zero, then
tagging for this particular resource (a buffer in this
case) is disabled. Otherwise, after the resource had been
unregistered and it's not used anymore, a CQE will be
posted with user_data set to the specified tag and all
other fields zeroed.
Note that resource updates, e.g.
IORING_REGISTER_BUFFERS_UPDATE, don't necessarily
deallocate resources by the time it returns, but they
might be held alive until all requests using it complete.
Available since 5.13.
IORING_REGISTER_BUFFERS_UPDATE
Updates registered buffers with new ones, either turning a
sparse entry into a real one, or replacing an existing
entry.
arg must contain a pointer to a struct
io_uring_rsrc_update2, which contains an offset on which
to start the update, and an array of struct iovec. tags
points to an array of tags. nr must contain the number of
descriptors in the passed in arrays. See
IORING_REGISTER_BUFFERS2 for the resource tagging
description.
struct io_uring_rsrc_update2 {
__u32 offset;
__u32 resv;
__aligned_u64 data;
__aligned_u64 tags;
__u32 nr;
__u32 resv2;
};
Available since 5.13.
IORING_UNREGISTER_BUFFERS
This operation takes no argument, and arg must be passed
as NULL. All previously registered buffers associated
with the io_uring instance will be released. Available
since 5.1.
IORING_REGISTER_FILES
Register files for I/O. arg contains a pointer to an
array of nr_args file descriptors (signed 32 bit
integers).
To make use of the registered files, the IOSQE_FIXED_FILE
flag must be set in the flags member of the struct
io_uring_sqe, and the fd member is set to the index of the
file in the file descriptor array.
The file set may be sparse, meaning that the fd field in
the array may be set to -1. See
IORING_REGISTER_FILES_UPDATE for how to update files in
place.
Note that before 5.13 registering files would wait for the
ring to idle. If the application currently has requests
in-flight, the registration will wait for those to finish
before proceeding. See IORING_REGISTER_FILES_UPDATE for
how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring
instance is torn down. An application needs only
unregister if it wishes to register a new set of fds.
Available since 5.1.
IORING_REGISTER_FILES2
Register files for I/O. Similar to IORING_REGISTER_FILES.
arg points to a struct io_uring_rsrc_register, and nr_args
should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file
descriptors (signed 32 bit integers). tags field should
either be 0 or or point to an array of nr "tags" (unsigned
64 bit integers). See IORING_REGISTER_BUFFERS2 for more
info on resource tagging.
Note that resource updates, e.g.
IORING_REGISTER_FILES_UPDATE, don't necessarily deallocate
resources, they might be held until all requests using
that resource complete.
Available since 5.13.
IORING_REGISTER_FILES_UPDATE
This operation replaces existing files in the registered
file set with new ones, either turning a sparse entry (one
where fd is equal to -1 ) into a real one, removing an
existing entry (new one is set to -1 ), or replacing an
existing entry with a new existing entry.
arg must contain a pointer to a struct
io_uring_files_update, which contains an offset on which
to start the update, and an array of file descriptors to
use for the update. nr_args must contain the number of
descriptors in the passed in array. Available since 5.5.
File descriptors can be skipped if they are set to
IORING_REGISTER_FILES_SKIP. Skipping an fd will not touch
the file associated with the previous fd at that index.
Available since 5.12.
IORING_REGISTER_FILES_UPDATE2
Similar to IORING_REGISTER_FILES_UPDATE, replaces existing
files in the registered file set with new ones, either
turning a sparse entry (one where fd is equal to -1 ) into
a real one, removing an existing entry (new one is set to
-1 ), or replacing an existing entry with a new existing
entry.
arg must contain a pointer to a struct
io_uring_rsrc_update2, which contains an offset on which
to start the update, and an array of file descriptors to
use for the update stored in data. tags points to an
array of tags. nr must contain the number of descriptors
in the passed in arrays. See IORING_REGISTER_BUFFERS2 for
the resource tagging description.
Available since 5.13.
IORING_UNREGISTER_FILES
This operation requires no argument, and arg must be
passed as NULL. All previously registered files
associated with the io_uring instance will be
unregistered. Available since 5.1.
IORING_REGISTER_EVENTFD
It's possible to use eventfd(2) to get notified of
completion events on an io_uring instance. If this is
desired, an eventfd file descriptor can be registered
through this operation. arg must contain a pointer to the
eventfd file descriptor, and nr_args must be 1. Note that
while io_uring generally takes care to avoid spurious
events, they can occur. Similarly, batched completions of
CQEs may only trigger a single eventfd notification even
if multiple CQEs are posted. The application should make
no assumptions on number of events being available having
a direct correlation to eventfd notifications posted. An
eventfd notification must thus only be treated as a hint
to check the CQ ring for completions. Available since 5.2.
An application can temporarily disable notifications,
coming through the registered eventfd, by setting the
IORING_CQ_EVENTFD_DISABLED bit in the flags field of the
CQ ring. Available since 5.8.
IORING_REGISTER_EVENTFD_ASYNC
This works just like IORING_REGISTER_EVENTFD , except
notifications are only posted for events that complete in
an async manner. This means that events that complete
inline while being submitted do not trigger a notification
event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD. Available since 5.6.
IORING_UNREGISTER_EVENTFD
Unregister an eventfd file descriptor to stop
notifications. Since only one eventfd descriptor is
currently supported, this operation takes no argument, and
arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
IORING_REGISTER_PROBE
This operation returns a structure, io_uring_probe, which
contains information about the opcodes supported by
io_uring on the running kernel. arg must contain a
pointer to a struct io_uring_probe, and nr_args must
contain the size of the ops array in that probe struct.
The ops array is of the type io_uring_probe_op, which
holds the value of the opcode and a flags field. If the
flags field has IO_URING_OP_SUPPORTED set, then this
opcode is supported on the running kernel. Available since
5.6.
IORING_REGISTER_PERSONALITY
This operation registers credentials of the running
application with io_uring, and returns an id associated
with these credentials. Applications wishing to share a
ring between separate users/processes can pass in this
credential id in the sqe personality field. If set, that
particular sqe will be issued with these credentials. Must
be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
IORING_UNREGISTER_PERSONALITY
This operation unregisters a previously registered
personality with io_uring. nr_args must be set to the id
in question, and arg must be set to NULL. Available since
5.6.
IORING_REGISTER_ENABLE_RINGS
This operation enables an io_uring ring started in a
disabled state (IORING_SETUP_R_DISABLED was specified in
the call to io_uring_setup(2)). While the io_uring ring
is disabled, submissions are not allowed and registrations
are not restricted.
After the execution of this operation, the io_uring ring
is enabled: submissions and registration are allowed, but
they will be validated following the registered
restrictions (if any). This operation takes no argument,
must be invoked with arg set to NULL and nr_args set to
zero. Available since 5.10.
IORING_REGISTER_RESTRICTIONS
arg points to a struct io_uring_restriction array of
nr_args entries.
With an entry it is possible to allow an
io_uring_register(2) opcode, or specify which opcode and
flags of the submission queue entry are allowed, or
require certain flags to be specified (these flags must be
set on each submission queue entry).
All the restrictions must be submitted with a single
io_uring_register(2) call and they are handled as an
allowlist (opcodes and flags not registered, are not
allowed).
Restrictions can be registered only if the io_uring ring
started in a disabled state (IORING_SETUP_R_DISABLED must
be specified in the call to io_uring_setup(2)).
Available since 5.10.
IORING_REGISTER_IOWQ_AFF
By default, async workers created by io_uring will inherit
the CPU mask of its parent. This is usually all the CPUs
in the system, unless the parent is being run with a
limited set. If this isn't the desired outcome, the
application may explicitly tell io_uring what CPUs the
async workers may run on. arg must point to a cpu_set_t
mask, and nr_args the byte size of that mask.
Available since 5.14.
IORING_UNREGISTER_IOWQ_AFF
Undoes a CPU mask previously set with
IORING_REGISTER_IOWQ_AFF. Must not have arg or nr_args
set.
Available since 5.14.
IORING_REGISTER_IOWQ_MAX_WORKERS
By default, io_uring limits the unbounded workers created
to the maximum processor count set by RLIMIT_NPROC and the
bounded workers is a function of the SQ ring size and the
number of CPUs in the system. Sometimes this can be
excessive (or too little, for bounded), and this command
provides a way to change the count per ring (per NUMA
node) instead.
arg must be set to an unsigned int pointer to an array of
two values, with the values in the array being set to the
maximum count of workers per NUMA node. Index 0 holds the
bounded worker count, and index 1 holds the unbounded
worker count. On successful return, the passed in array
will contain the previous maximum valyes for each type. If
the count being passed in is 0, then this command returns
the current maximum values and doesn't modify the current
setting. nr_args must be set to 2, as the command takes
two values.
Available since 5.15.
IORING_REGISTER_RING_FDS
Whenever io_uring_enter(2) is called to submit request or
wait for completions, the kernel must grab a reference to
the file descriptor. If the application using io_uring is
threaded, the file table is marked as shared, and the
reference grab and put of the file descriptor count is
more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files,
this allow registration of the ring file descriptor
itself. This reduces the overhead of the io_uring_enter(2)
system call.
arg must be set to an unsigned int pointer to an array of
type struct io_uring_rsrc_register of nr_args number of
entries. The data field of this struct must point to an
io_uring file descriptor, and the offset field can be
either -1 or an explicit offset desired for the registered
file descriptor value. If -1 is used, then upon successful
return of this system call, the field will contain the
value of the registered file descriptor to be used for
future io_uring_enter(2) system calls.
On successful completion of this request, the returned
descriptors may be used instead of the real file
descriptor for io_uring_enter(2), provided that
IORING_ENTER_REGISTERED_RING is set in the flags for the
system call. This flag tells the kernel that a registered
descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file
descriptor directly by issuing this request.
The maximum number of supported registered ring
descriptors is currently limited to 16.
Available since 5.18.
IORING_UNREGISTER_RING_FDS
Unregister descriptors previously registered with
IORING_REGISTER_RING_FDS.
arg must be set to an unsigned int pointer to an array of
type struct io_uring_rsrc_register of nr_args number of
entries. Only the offset field should be set in the
structure, containing the registered file descriptor
offset previously returned from IORING_REGISTER_RING_FDS
that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if
the thread or task that previously registered a ring file
descriptor isn't exiting. It is recommended to manually
unregister any previously registered ring descriptors if
the ring is closed and the task persists. This will free
up a registration slot, making it available for future
use.
Available since 5.18.
IORING_REGISTER_PBUF_RING
Registers a shared buffer ring to be used with provided
buffers. This is a newer alternative to using
IORING_OP_PROVIDE_BUFFERS which is more efficient, to be
used with request types that support the
IOSQE_BUFFER_SELECT flag.
The arg argument must be filled in with the appropriate
information. It looks as follows:
struct io_uring_buf_reg {
__u64 ring_addr;
__u32 ring_entries;
__u16 bgid;
__u16 pad;
__u64 resv[3];
};
The ring_addr field must contain the address to the
memory allocated to fit this ring. The memory must be
page aligned and hence allocated appropriately using eg
posix_memalign(3) or similar. The size of the ring is the
product of ring_entries and the size of struct
io_uring_buf. ring_entries is the desired size of the
ring, and must be a power-of-2 in size. The maximum size
allowed is 2^15 (32768). bgid is the buffer group ID
associated with this ring. SQEs that select a buffer have
a buffer group associated with them in their buf_group
field, and the associated CQEs will have
IORING_CQE_F_BUFFER set in their flags member, which will
also contain the specific ID of the buffer selected. The
rest of the fields are reserved and must be cleared to
zero.
nr_args must be set to 1.
Also see io_uring_register_buf_ring(3) for more details.
Available since 5.19.
IORING_UNREGISTER_PBUF_RING
Unregister a previously registered provided buffer ring.
arg must be set to the address of a struct
io_uring_buf_reg, with just the bgid field set to the
buffer group ID of the previously registered provided
buffer group. nr_args must be set to 1. Also see
IORING_REGISTER_PBUF_RING .
Available since 5.19.
IORING_REGISTER_SYNC_CANCEL
Performs a synchronous cancelation request, which works in
a similar fashion to IORING_OP_ASYNC_CANCEL except it
completes inline. This can be useful for scenarios where
cancelations should happen synchronously, rather than
needing to issue an SQE and wait for completion of that
specific CQE.
arg must be set to a pointer to a struct
io_uring_sync_cancel_reg structure, with the details
filled in for what request(s) to target for cancelation.
See io_uring_register_sync_cancel(3) for details on that.
The return values are the same, except they are passed
back synchronously rather than through the CQE res field.
nr_args must be set to 1.
Available since 6.0.
IORING_REGISTER_FILE_ALLOC_RANGE
sets the allowable range for fixed file index allocations
within the kernel. When requests that can instantiate a
new fixed file are used with IORING_FILE_INDEX_ALLOC , the
application is asking the kernel to allocate a new fixed
file descriptor rather than pass in a specific value for
one. By default, the kernel will pick any available fixed
file descriptor within the range available. This
effectively allows the application to set aside a range
just for dynamic allocations, with the remainder being
used for specific values.
nr_args must be set to 1 and arg must be set to a pointer
to a struct io_uring_file_index_range:
struct io_uring_file_index_range {
__u32 off;
__u32 len;
__u64 resv;
};
with off being set to the starting value for the range,
and len being set to the number of descriptors. The
reserved resv field must be cleared to zero.
The application must have registered a file table first.
Available since 6.0.
On success, io_uring_register(2) returns either 0 or a positive
value, depending on the opcode used. On error, a negative error
value is returned. The caller should not rely on the errno
variable.
EACCES The opcode field is not allowed due to registered
restrictions.
EBADF One or more fds in the fd array are invalid.
EBADFD IORING_REGISTER_ENABLE_RINGS or
IORING_REGISTER_RESTRICTIONS was specified, but the
io_uring ring is not disabled.
EBUSY IORING_REGISTER_BUFFERS or IORING_REGISTER_FILES or
IORING_REGISTER_RESTRICTIONS was specified, but there were
already buffers, files, or restrictions registered.
EFAULT buffer is outside of the process' accessible address
space, or iov_len is greater than 1GiB.
EINVAL IORING_REGISTER_BUFFERS or IORING_REGISTER_FILES was
specified, but nr_args is 0.
EINVAL IORING_REGISTER_BUFFERS was specified, but nr_args exceeds
UIO_MAXIOV
EINVAL IORING_UNREGISTER_BUFFERS or IORING_UNREGISTER_FILES was
specified, and nr_args is non-zero or arg is non-NULL.
EINVAL IORING_REGISTER_RESTRICTIONS was specified, but nr_args
exceeds the maximum allowed number of restrictions or
restriction opcode is invalid.
EMFILE IORING_REGISTER_FILES was specified and nr_args exceeds
the maximum allowed number of files in a fixed file set.
EMFILE IORING_REGISTER_FILES was specified and adding nr_args
file references would exceed the maximum allowed number of
files the user is allowed to have according to the
RLIMIT_NOFILE resource limit and the caller does not have
CAP_SYS_RESOURCE capability. Note that this is a per user
limit, not per process.
ENOMEM Insufficient kernel resources are available, or the caller
had a non-zero RLIMIT_MEMLOCK soft resource limit, but
tried to lock more memory than the limit permitted. This
limit is not enforced if the process is privileged
(CAP_IPC_LOCK).
ENXIO IORING_UNREGISTER_BUFFERS or IORING_UNREGISTER_FILES was
specified, but there were no buffers or files registered.
ENXIO Attempt to register files or buffers on an io_uring
instance that is already undergoing file or buffer
registration, or is being torn down.
EOPNOTSUPP
User buffers point to file-backed memory.
This page is part of the liburing (A library for io_uring)
project. Information about the project can be found at
⟨https://github.com/axboe/liburing⟩. If you have a bug report for
this manual page, send it to io-uring@vger.kernel.org. This page
was obtained from the project's upstream Git repository
⟨https://github.com/axboe/liburing⟩ on 2022-12-17. (At that
time, the date of the most recent commit that was found in the
repository was 2022-12-12.) If you discover any rendering
problems in this HTML version of the page, or you believe there
is a better or more up-to-date source for the page, or you have
corrections or improvements to the information in this COLOPHON
(which is not part of the original manual page), send a mail to
man-pages@man7.org
Linux 2019-01-17 io_uring_register(2)
Pages that refer to this page: io_uring_enter2(2), io_uring_enter(2), io_uring_register(2), io_uring_setup(2), syscalls(2), io_uring_prep_accept(3), io_uring_prep_accept_direct(3), io_uring_prep_fadvise(3), io_uring_prep_files_update(3), io_uring_prep_madvise(3), io_uring_prep_multishot_accept(3), io_uring_prep_multishot_accept_direct(3), io_uring_prep_openat2(3), io_uring_prep_openat2_direct(3), io_uring_prep_openat(3), io_uring_prep_openat_direct(3), io_uring_prep_provide_buffers(3), io_uring_prep_remove_buffers(3), io_uring_prep_splice(3), io_uring_prep_tee(3), io_uring_register_iowq_aff(3), io_uring_register_iowq_max_workers(3), io_uring_unregister_iowq_aff(3), io_uring(7)