io_uring_setup_flags(7) — Linux manual page

io_uring_setup_flags(7) Linux Programmer's Manual io_uring_setup_flags(7)

NAME

       io_uring_setup_flags - io_uring ring setup flags overview

DESCRIPTION

       When creating an io_uring instance with
       io_uring_queue_init_params(3) or io_uring_setup(2), various flags
       control the ring's behavior. These flags are set in the flags
       field of struct io_uring_params.

       Choosing the right flags can significantly impact performance.
       This page provides an overview of available flags, their purposes,
       and common combinations.

   Polling flags
       These flags control how I/O completion and submission polling
       works.

       IORING_SETUP_IOPOLL
           Enable I/O polling mode for file descriptors that support it.
           Instead of relying on interrupts, the kernel polls for
           completions. This reduces latency for high-performance storage
           devices (NVMe, etc.) but requires:

           • Files opened with O_DIRECT (if using the
             IORING_OP_{READ,WRITE}(V)(_FIXED) opcodes)

           • Hardware and drivers that support polling

           • The application to call io_uring_enter(2) to reap
             completions (busy-polling)

           • Storage device configuration for polling support

           Only the following opcodes are allowed on IOPOLL rings:

           • IORING_OP_NOP(128)

           • IORING_OP_{READ,WRITE}(V)(_FIXED) (if the file supports
             busy-polling)

           • IORING_OP_FILES_UPDATE

           • IORING_OP_{PROVIDE,REMOVE}_BUFFERS

           • IORING_OP_MSG_RING

           • IORING_OP_URING_CMD(128)

           Since kernel 7.1, an IORING_OP_URING_CMD(128) request will use
           busy-polling if the file supports it (for example, NVMe
           passthrough I/O commands).  Previously,
           IORING_OP_URING_CMD(128) was only allowed on files that
           supported busy-polling.

           Using IOPOLL generally requires storage device setup. For NVMe
           devices, the kernel parameter nvme.poll_queues=X must be set,
           where X is the number of completion queues on the NVMe device
           to set aside for polling operations.
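
           Before enabling IOPOLL on an NVMe device, an application may
           want to confirm that poll queues were actually configured. The
           following is a minimal sketch, assuming the nvme.poll_queues
           parameter is exposed under /sys/module/nvme/parameters (a
           sysfs path that is an assumption here, not something this
           page documents). The helper name nvme_poll_queues() is
           hypothetical.

```c
#include <stdio.h>

/* Assumed sysfs location of the nvme.poll_queues module parameter. */
#define NVME_POLL_QUEUES_PATH "/sys/module/nvme/parameters/poll_queues"

/*
 * Return the configured number of NVMe poll queues, or -1 if the
 * parameter cannot be read (nvme not present, sysfs not mounted, ...).
 */
static int nvme_poll_queues(const char *path)
{
	FILE *f = fopen(path, "r");
	int n = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &n) != 1)
		n = -1;
	fclose(f);
	return n;
}
```

           A result of 0 or -1 from nvme_poll_queues(NVME_POLL_QUEUES_PATH)
           suggests that no queues are set aside for polling, in which
           case IORING_SETUP_IOPOLL is unlikely to help on that device.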

       IORING_SETUP_SQPOLL
           Create a kernel thread that polls the submission queue.
           Eliminates the need for system calls to submit I/O. See
           io_uring_sqpoll(7) for details.

       IORING_SETUP_SQ_AFF
           Pin the SQPOLL thread to a specific CPU. Requires
           IORING_SETUP_SQPOLL.  The CPU is specified in sq_thread_cpu of
           struct io_uring_params.

       IORING_SETUP_HYBRID_IOPOLL
           Enable hybrid polling mode. Instead of pure busy-polling, the
           kernel uses an adaptive approach that may sleep briefly,
           reducing CPU usage while still providing low latency. This is
           a middle ground between interrupt-driven and pure polling
           modes.

   Task run flags
       These flags control when and how completion processing runs.

       IORING_SETUP_COOP_TASKRUN
           Disable interrupting the application for completion
           processing. Normally, the kernel signals the application when
           completions are ready, which can interrupt system calls. With
           this flag, completions are only processed when the application
           returns to userspace from any system call, not just io_uring-
           related ones. This means completions may be processed after
           read(2), write(2), poll(2), or any other syscall returns.

           This improves performance by eliminating asynchronous
           interrupts but requires the application to regularly enter the
           kernel to process completions. Recommended for most
           applications that have an event loop.

       IORING_SETUP_TASKRUN_FLAG
           When completions are pending, set IORING_SQ_TASKRUN in the SQ
           ring flags. This allows applications to check if there is
           completion work to process without making a system call.
           Typically used with IORING_SETUP_COOP_TASKRUN.
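
           The check described above can be sketched as a single relaxed
           load of the SQ ring flags word. This is an illustration only:
           ring_has_task_work() is a hypothetical helper, the caller is
           assumed to already hold a pointer to the mapped SQ ring flags,
           and the fallback value for IORING_SQ_TASKRUN is an assumption
           used only when the installed headers lack it.

```c
#include <stdbool.h>
#include <linux/io_uring.h>

#ifndef IORING_SQ_TASKRUN
#define IORING_SQ_TASKRUN (1U << 2)	/* assumed fallback for old headers */
#endif

/*
 * Return true if the kernel has flagged pending completion work.
 * sq_flags points at the SQ ring flags word shared with the kernel.
 * No system call is made; this is only a memory read.
 */
static bool ring_has_task_work(unsigned *sq_flags)
{
	unsigned flags = __atomic_load_n(sq_flags, __ATOMIC_RELAXED);

	return (flags & IORING_SQ_TASKRUN) != 0;
}
```

           An event loop could call this before deciding whether to enter
           the kernel. With liburing, the flags word is typically
           reachable through the ring's SQ structure; with a raw
           io_uring_setup(2) ring it lives at sq_off.flags within the SQ
           ring mapping.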

       IORING_SETUP_DEFER_TASKRUN
           Defer completion task work to when the application explicitly
           enters the kernel via io_uring_enter(2).  Unlike
           IORING_SETUP_COOP_TASKRUN, completions are only processed
           during io_uring-related syscalls, not on return from arbitrary
           syscalls. This provides the tightest and most predictable
           control over when completion processing occurs, as well as
           optimal cache behavior since work runs in the application's
           context.

           This flag should be considered the default mode for
           applications setting up a ring. It requires
           IORING_SETUP_SINGLE_ISSUER and a ring created per-thread. The
           application must regularly call io_uring_enter(2) (via
           io_uring_submit(3), io_uring_wait_cqe(3), or similar) to
           process deferred work; failing to do so will stall
           completions.

           Some features require this flag:

           • Ring resizing (io_uring_register_resize_rings(3))

           • Zero-copy receive (IORING_OP_RECV_ZC)

       IORING_SETUP_SINGLE_ISSUER
           Hint that only one task will submit requests to this ring.
           Enables internal optimizations including reduced locking
           overhead. The first task to submit a request becomes the
           designated submitter; others attempting to submit will get
           -EEXIST.

           Each thread or task having its own ring is the idiomatic use
           case for io_uring. Sharing a ring between multiple threads or
           tasks is discouraged as it requires additional synchronization
           and prevents many optimizations. Applications should create a
           ring per thread rather than sharing rings.
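
           As an illustration of the per-thread setup described above,
           the following sketch creates a ring with
           IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_DEFER_TASKRUN
           through the raw io_uring_setup(2) syscall, and retries with no
           flags if the kernel rejects them with EINVAL. The helper name
           setup_thread_ring() and the fallback policy are assumptions
           made for this example; the fallback flag values are used only
           when the installed headers lack the definitions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER (1U << 12)	/* assumed fallback */
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN (1U << 13)	/* assumed fallback */
#endif

/*
 * Create a ring intended for use by the calling thread only.
 * Returns the ring file descriptor, or a negative errno value.
 * On return, *p holds the parameters of the last setup attempt.
 */
static int setup_thread_ring(unsigned entries, struct io_uring_params *p)
{
	long fd;

	memset(p, 0, sizeof(*p));
	p->flags = IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
	fd = syscall(__NR_io_uring_setup, entries, p);
	if (fd < 0 && errno == EINVAL) {
		/* Possibly an older kernel: retry without the flags. */
		memset(p, 0, sizeof(*p));
		fd = syscall(__NR_io_uring_setup, entries, p);
	}
	return fd < 0 ? -errno : (int)fd;
}
```

           The returned descriptor still needs the usual ring mappings
           before it can be used; io_uring_queue_init_params(3) performs
           both steps. Note that EINVAL can also indicate an invalid
           entries value, so a real application may want to distinguish
           the two cases.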

   Ring sizing flags
       These flags control the size and layout of the submission and
       completion queues.

       IORING_SETUP_CQSIZE
           Override the default completion queue size. By default, the CQ
           has twice as many entries as the SQ. Set cq_entries in struct
           io_uring_params to specify a custom CQ size. Must be a power
           of 2.

           Larger CQ sizes are useful when the application may submit
           many requests before processing completions, avoiding CQ
           overflow.
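
           A small sketch of filling in struct io_uring_params for a
           custom CQ size follows. It rounds the requested size up to a
           power of 2 in the application so that the requirement above is
           met. The helper names round_up_pow2() and cqsize_params() are
           hypothetical.

```c
#include <string.h>
#include <linux/io_uring.h>

#ifndef IORING_SETUP_CQSIZE
#define IORING_SETUP_CQSIZE (1U << 3)	/* assumed fallback for old headers */
#endif

/* Round n up to the next power of 2; returns 0 if that would overflow. */
static unsigned round_up_pow2(unsigned n)
{
	unsigned p = 1;

	if (n > (1U << 31))
		return 0;
	while (p < n)
		p <<= 1;
	return p;
}

/* Prepare params requesting a CQ with at least want_cq entries. */
static void cqsize_params(struct io_uring_params *p, unsigned want_cq)
{
	memset(p, 0, sizeof(*p));
	p->flags = IORING_SETUP_CQSIZE;
	p->cq_entries = round_up_pow2(want_cq);
}
```

           For example, a request for 3000 CQ entries becomes 4096. The
           prepared structure can then be passed to
           io_uring_queue_init_params(3) or io_uring_setup(2).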

       IORING_SETUP_CLAMP
           Clamp the SQ and CQ sizes to the maximum allowed values
           instead of returning -EINVAL if the requested sizes are too
           large. Useful when the application wants the largest possible
           rings without querying limits.

       IORING_SETUP_SQE128
           Use 128-byte SQEs instead of the default 64 bytes. Required
           for some operations that need extra space, such as
           IORING_OP_URING_CMD passthrough commands.

       IORING_SETUP_CQE32
           Use 32-byte CQEs instead of the default 16 bytes. Required for
           operations that return extra data, such as some passthrough
           commands or when using IORING_OP_MSG_RING.
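
           To make the memory cost of these two flags concrete, the
           sketch below computes the bytes occupied by the SQE and CQE
           arrays for a given flag set, using the entry sizes stated
           above. It ignores ring headers and any SQ array, and the
           helper names are hypothetical.

```c
#include <stddef.h>
#include <linux/io_uring.h>

#ifndef IORING_SETUP_SQE128
#define IORING_SETUP_SQE128 (1U << 10)	/* assumed fallback for old headers */
#endif
#ifndef IORING_SETUP_CQE32
#define IORING_SETUP_CQE32 (1U << 11)	/* assumed fallback for old headers */
#endif

/* Size of one SQE in bytes for the given setup flags. */
static size_t sqe_bytes(unsigned flags)
{
	return (flags & IORING_SETUP_SQE128) ? 128 : 64;
}

/* Size of one CQE in bytes for the given setup flags. */
static size_t cqe_bytes(unsigned flags)
{
	return (flags & IORING_SETUP_CQE32) ? 32 : 16;
}

/* Bytes used by the SQE and CQE arrays only (headers not included). */
static size_t entry_bytes(unsigned flags, unsigned sq_entries,
			  unsigned cq_entries)
{
	return (size_t)sq_entries * sqe_bytes(flags) +
	       (size_t)cq_entries * cqe_bytes(flags);
}
```

           With 256 SQ entries and 512 CQ entries, the arrays take 24576
           bytes by default and 49152 bytes when both flags are set.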

       IORING_SETUP_NO_SQARRAY
           Do not create the SQ array. The SQ array is a level of
           indirection that allows SQEs to be submitted in a different
           order than they appear in the ring. Most applications submit
           SQEs in order and do not need this.  This flag saves memory
           and is required for some modes like
           IORING_SETUP_REGISTERED_FD_ONLY.

       IORING_SETUP_SQ_REWIND
           Use non-circular submission queue mode. The kernel ignores the
           SQ head and tail pointers and instead fetches SQEs starting
           from index 0 on each submit. The application places all SQEs
           at the beginning of the ring before calling io_uring_enter(2),
           and the to_submit argument of that call determines how many
           SQEs are submitted.

           Requires IORING_SETUP_NO_SQARRAY.  Not compatible with
           IORING_SETUP_SQPOLL.

           This mode keeps SQEs hot in cache by always accessing the same
           memory locations at the start of the ring, improving
           performance for workloads that submit small batches
           frequently.

       IORING_SETUP_CQE_MIXED
           Allow the ring to return a mix of 16-byte and 32-byte CQEs,
           controlled per-request. When a request needs a 32-byte CQE, it
           sets IOSQE_BIG_CQE in its flags. Otherwise, a 16-byte CQE is
           used. Requires IORING_SETUP_CQE32.

           This is useful when certain operations require 32-byte CQEs
           (such as some passthrough commands) but most operations do
           not. Using mixed mode instead of IORING_SETUP_CQE32 alone
           provides efficiency benefits in terms of memory bandwidth and
           usage, since the smaller 16-byte CQEs are used for operations
           that do not need the extra space.

       IORING_SETUP_SQE_MIXED
           Allow the ring to accept a mix of 64-byte and 128-byte SQEs.
           When a request needs a 128-byte SQE, it sets IOSQE_BIG_SQE in
           its flags. Requires IORING_SETUP_SQE128.

           This is useful when certain operations require 128-byte SQEs
           (such as IORING_OP_URING_CMD) but most operations do not.
           Using mixed mode instead of IORING_SETUP_SQE128 alone provides
           efficiency benefits in terms of memory bandwidth and usage,
           since the smaller 64-byte SQEs are used for operations that do
           not need the extra space.

   Memory and setup flags
       These flags control memory allocation and ring initialization.

       IORING_SETUP_NO_MMAP
           The application provides its own memory for the rings instead
           of the kernel allocating and the application mmap'ing it. The
           application fills in sq_off.user_addr, cq_off.user_addr, and
           sq_sqes.user_addr in struct io_uring_params with addresses of
           application-allocated memory.

           This is useful for placing rings in specific memory (huge
           pages, shared memory, etc.) or for creating rings without
           mmap.

       IORING_SETUP_REGISTERED_FD_ONLY
           The ring file descriptor is not installed in the process's
           file descriptor table. Instead, a "registered ring" index is
           returned in ring_fd that can be used with io_uring_enter(2)
           when IORING_ENTER_REGISTERED_RING is set. This reduces per-
           operation overhead.

           Requires IORING_SETUP_NO_SQARRAY.  The application must use
           io_uring_register_ring_fd(3) to use the ring or access it via
           the registered index.

       IORING_SETUP_R_DISABLED
           Create the ring in a disabled state. The ring will not accept
           submissions until it is enabled via io_uring_enable_rings(3).
           This is useful when setting up restrictions or registered
           resources before allowing I/O. See
           io_uring_register_restrictions(3).

   Submission flags
       These flags control submission behavior.

       IORING_SETUP_SUBMIT_ALL
           Continue processing submissions even if one fails. Normally,
           if an SQE fails during submission (not execution), subsequent
           SQEs in the same submit call are not processed. With this
           flag, all SQEs are processed regardless of earlier failures.

           The failed SQE still generates a CQE with the error; this flag
           only affects whether subsequent SQEs are submitted. This is
           probably the behavior most applications expect, since CQEs are
           generated for failed submissions anyway and the application
           must handle them regardless.

   Workqueue flags
       These flags control the async worker threads.

       IORING_SETUP_ATTACH_WQ
           Share the async worker thread pool with another ring. Set
           wq_fd in struct io_uring_params to the file descriptor of the
           ring to share with. This reduces resource usage when an
           application uses multiple rings.

           When combined with IORING_SETUP_SQPOLL, the SQPOLL thread is
           also shared.

   Common flag combinations
       High-performance single-threaded application:

               .flags = IORING_SETUP_SINGLE_ISSUER |
                        IORING_SETUP_DEFER_TASKRUN |
                        IORING_SETUP_COOP_TASKRUN

           This combination provides the best latency and throughput for
           applications where each thread has its own ring and processes
           completions in a dedicated event loop.

       Low-latency storage with polling:

               .flags = IORING_SETUP_IOPOLL |
                        IORING_SETUP_SINGLE_ISSUER |
                        IORING_SETUP_DEFER_TASKRUN

           For NVMe or other devices that support polling, this
           combination eliminates interrupt overhead.
           IORING_SETUP_DEFER_TASKRUN is added so that completion
           processing runs only when the application enters the kernel.

       System call-free submission:

               .flags = IORING_SETUP_SQPOLL |
                        IORING_SETUP_SQ_AFF,
               .sq_thread_cpu = preferred_cpu,
               .sq_thread_idle = 1000,  /* idle timeout in milliseconds */

           This suits workloads that benefit from eliminating submission
           syscall overhead.  See io_uring_sqpoll(7).

       Multiple rings sharing resources:

               /* First ring */
               p1.flags = IORING_SETUP_SQPOLL;

               /* Subsequent rings */
               p2.flags = IORING_SETUP_SQPOLL | IORING_SETUP_ATTACH_WQ;
               p2.wq_fd = ring1_fd;

           This reduces kernel thread and workqueue overhead when using
           multiple rings.

NOTES

       • Not all flag combinations are valid. The kernel returns -EINVAL
         for incompatible combinations.

       • Some flags require specific kernel versions. Check
         io_uring_setup(2) for version requirements.

       • The io_uring_queue_init_params(3) function handles the
         complexity of ring setup. Using the raw io_uring_setup(2)
         syscall requires careful mmap setup.

       • For most applications with a proper event loop,
         IORING_SETUP_DEFER_TASKRUN combined with
         IORING_SETUP_SINGLE_ISSUER is the recommended default. This
         provides the best control over when completion work runs and
         optimal cache locality.
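
       Because invalid combinations are only reported as -EINVAL, it can
       help to check the documented dependencies before calling into the
       kernel. The sketch below covers just two of the requirements
       stated on this page (IORING_SETUP_SQ_AFF requires
       IORING_SETUP_SQPOLL, and IORING_SETUP_DEFER_TASKRUN requires
       IORING_SETUP_SINGLE_ISSUER). It is not a complete model of the
       kernel's validation, and check_setup_flags() is a hypothetical
       helper.

```c
#include <errno.h>
#include <linux/io_uring.h>

#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER (1U << 12)	/* assumed fallback */
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN (1U << 13)	/* assumed fallback */
#endif

/*
 * Return 0 if the flags pass the two dependency checks below,
 * or -EINVAL if a required companion flag is missing.
 */
static int check_setup_flags(unsigned flags)
{
	/* SQ_AFF pins the SQPOLL thread, so it needs SQPOLL. */
	if ((flags & IORING_SETUP_SQ_AFF) && !(flags & IORING_SETUP_SQPOLL))
		return -EINVAL;

	/* DEFER_TASKRUN is only valid together with SINGLE_ISSUER. */
	if ((flags & IORING_SETUP_DEFER_TASKRUN) &&
	    !(flags & IORING_SETUP_SINGLE_ISSUER))
		return -EINVAL;

	return 0;
}
```

       A check like this gives an earlier and more specific diagnostic
       than the bare -EINVAL from io_uring_setup(2), but the kernel
       remains the final authority on which combinations are accepted.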

SEE ALSO

       io_uring(7), io_uring_sqpoll(7), io_uring_setup(2),
       io_uring_queue_init_params(3), io_uring_register_restrictions(3),
       io_uring_enable_rings(3)

COLOPHON

       This page is part of the liburing (A library for io_uring)
       project.  Information about the project can be found at 
       ⟨https://github.com/axboe/liburing⟩.  If you have a bug report for
       this manual page, send it to io-uring@vger.kernel.org.  This page
       was obtained from the project's upstream Git repository
       ⟨https://github.com/axboe/liburing⟩ on 2026-05-24.  (At that time,
       the date of the most recent commit that was found in the
       repository was 2026-05-18.)  If you discover any rendering
       problems in this HTML version of the page, or you believe there is
       a better or more up-to-date source for the page, or you have
       corrections or improvements to the information in this COLOPHON
       (which is not part of the original manual page), send a mail to
       man-pages@man7.org

Linux                        January 18, 2025     io_uring_setup_flags(7)
