lxc.container.conf(5) lxc.container.conf(5)
lxc.container.conf - LXC container configuration file
LXC is the well-known and heavily tested low-level Linux container
runtime. It has been in active development since 2008 and has proven
itself in critical production environments world-wide. Some of its
core contributors are the same people that helped to implement
various well-known containerization features inside the Linux
kernel.
LXC's main focus is system containers. That is, containers which
offer an environment as close as possible to the one you'd get
from a VM but without the overhead that comes with running a
separate kernel and simulating all the hardware.
This is achieved through a combination of kernel security features
such as namespaces, mandatory access control and control groups.
LXC has support for unprivileged containers. Unprivileged
containers are containers that are run without any privilege. This
requires support for user namespaces in the kernel that the
container is run on. LXC was the first runtime to support
unprivileged containers after user namespaces were merged into the
mainline kernel.
In essence, user namespaces isolate given sets of UIDs and GIDs.
This is achieved by establishing a mapping between a range of UIDs
and GIDs on the host to a different (unprivileged) range of UIDs
and GIDs in the container. The kernel will translate this mapping
in such a way that inside the container all UIDs and GIDs appear
as you would expect from the host whereas on the host these UIDs
and GIDs are in fact unprivileged. For example, a process running
as UID and GID 0 inside the container might appear as UID and GID
100000 on the host. The implementation and working details can be
gathered from the corresponding user namespace man page. UID and
GID mappings can be defined with the lxc.idmap key.
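As an illustration of the mapping described above, the following
fragment is a minimal sketch that maps container UIDs and GIDs
0-65535 onto host IDs starting at 100000. The host range is an
assumption chosen to match the example in the text; the range
actually delegated to a user is site-specific.

```
# u = UID mapping, g = GID mapping
# fields: first ID in the container, first ID on the host, range length
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
```

With this mapping, a process running as UID 0 inside the container
appears as UID 100000 on the host.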
Linux containers are defined with a simple configuration file.
Each option in the configuration file has the form key = value
fitting on one line. The "#" character means the line is a
comment. List options, like capabilities and cgroups options, can
be used with no value to clear any previously defined values of
that option.
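A short sketch of these syntax rules follows. The hostname value is
hypothetical, and lxc.cap.drop is used only as a representative list
option.

```
# A comment line.
lxc.uts.name = web01
# A list option given with no value clears anything set earlier,
# for example by a previously included file.
lxc.cap.drop =
```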
LXC namespaces its configuration keys using single dots. This means
complex configuration keys such as lxc.net.0 expose various
subkeys such as lxc.net.0.type, lxc.net.0.link,
lxc.net.0.ipv6.address, and others for even more fine-grained
configuration.
CONFIGURATION
In order to ease administration of multiple related containers, it
is possible to have a container configuration file cause another
file to be loaded. For instance, network configuration can be
defined in one common file which is included by multiple
containers. Then, if the containers are moved to another host,
only one file may need to be updated.
lxc.include
Specify the file to be included. The included file must be
in the same valid lxc configuration file format.
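For instance, the shared network settings might be kept in one file
and pulled in by each container. This is only a sketch: the path
/etc/lxc/common-net.conf and the bridge name br0 are hypothetical.

```
# Contents of /etc/lxc/common-net.conf (hypothetical shared file):
#   lxc.net.0.type = veth
#   lxc.net.0.link = br0
#   lxc.net.0.flags = up

# In each container's own configuration file:
lxc.include = /etc/lxc/common-net.conf
```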
ARCHITECTURE
Allows one to set the architecture for the container. For example,
set a 32-bit architecture for a container running 32-bit binaries
on a 64-bit host. This fixes container scripts which rely on
the architecture to do some work, such as downloading packages.
lxc.arch
Specify the architecture for the container.
Some valid options are x86, i686, x86_64 and amd64.
HOSTNAME
The utsname section defines the hostname to be set for the
container. That means the container can set its own hostname
without changing the one from the system. That makes the hostname
private for the container.
lxc.uts.name
specify the hostname for the container
HALT SIGNAL
Allows one to specify the signal name or number sent to the
container's init process to cleanly shut down the container.
Different init systems may use different signals to perform a
clean shutdown sequence. This option allows the signal to be
specified in kill(1) fashion, e.g. SIGPWR, SIGRTMIN+14,
SIGRTMAX-10 or a plain number. The default signal is SIGPWR.
lxc.signal.halt
specify the signal used to halt the container
REBOOT SIGNAL
Allows one to specify the signal name or number used to reboot the
container. This option allows the signal to be specified in
kill(1) fashion, e.g. SIGTERM, SIGRTMIN+14, SIGRTMAX-10 or a plain
number. The default signal is SIGINT.
lxc.signal.reboot
specify the signal used to reboot the container
STOP SIGNAL
Allows one to specify the signal name or number used to forcibly
shut down the container. This option allows the signal to be
specified in kill(1) fashion, e.g. SIGKILL, SIGRTMIN+14,
SIGRTMAX-10 or a plain number. The default signal is SIGKILL.
lxc.signal.stop
specify the signal used to stop the container
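The three signal keys can be sketched together as follows. The
active lines simply write out the documented defaults; the
commented lines show the other accepted notations and are not a
recommendation for any particular init system.

```
# Documented defaults, written out explicitly
lxc.signal.halt = SIGPWR
lxc.signal.reboot = SIGINT
lxc.signal.stop = SIGKILL

# Other accepted notations (kill(1) style offsets, plain numbers):
# lxc.signal.halt = SIGRTMIN+14
# lxc.signal.stop = 9
```

Which halt and reboot signals a given init system actually honours
must be checked against that init system's own documentation.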
INIT COMMAND
Sets the command to use as the init system for the containers.
lxc.execute.cmd
Absolute path from container rootfs to the binary to run by
default. This mostly makes sense for lxc-execute.
lxc.init.cmd
Absolute path from container rootfs to the binary to use as
init. This mostly makes sense for lxc-start. Default is
/sbin/init.
INIT WORKING DIRECTORY
Sets the absolute path inside the container as the working
directory for the containers. LXC will switch to this directory
before executing init.
lxc.init.cwd
Absolute path inside the container to use as the working
directory.
INIT ID
Sets the UID/GID to use for the init system, and subsequent
commands. Note that using a non-root UID when booting a system
container will likely not work due to missing privileges. Setting
the UID/GID is mostly useful when running application containers.
Defaults to: UID(0), GID(0)
lxc.init.uid
UID to use for init.
lxc.init.gid
GID to use for init.
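For an application container, the init keys above might be combined
as in the following sketch. The binary path, working directory and
numeric IDs are all hypothetical and must exist inside the
container's rootfs.

```
# Run a single application instead of /sbin/init
lxc.init.cmd = /usr/local/bin/myapp
# Switch to this directory before executing it
lxc.init.cwd = /srv/myapp
# Run it as an unprivileged user and group inside the container
lxc.init.uid = 1000
lxc.init.gid = 1000
```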
CORE SCHEDULING
Core scheduling defines if the container payload is marked as
being schedulable on the same core. Doing so will cause the kernel
scheduler to ensure that tasks that are not in the same group
never run simultaneously on a core. This can serve as an extra
security measure to prevent the container payload from mounting
cross-hyperthread attacks.
lxc.sched.core
The only allowed values are 0 and 1. Set this to 1 to
create a core scheduling domain for the container or 0 to
not create one. If not set explicitly no core scheduling
domain will be created for the container.
PROC
Configure proc filesystem for the container.
lxc.proc.[proc file name]
Specify the proc file name to be set. The file names
available are those listed under /proc/PID/. Example:
lxc.proc.oom_score_adj = 10
EPHEMERAL
Allows one to specify whether a container will be destroyed on
shutdown.
lxc.ephemeral
The only allowed values are 0 and 1. Set this to 1 to
destroy a container on shutdown.
NETWORK
The network section defines how the network is virtualized in the
container. The network virtualization acts at layer two. In order
to use the network virtualization, parameters must be specified to
define the network interfaces of the container. Several virtual
interfaces can be assigned and used in a container even if the
system has only one physical network interface.
lxc.net
may be used without a value to clear all previous network
options.
lxc.net.[i].type
specify what kind of network virtualization to be used for
the container. Must be specified before any other
option(s) on the net device. Multiple networks can be
specified by using an additional index i after all
lxc.net.* keys. For example, lxc.net.0.type = veth and
lxc.net.1.type = veth specify two different networks of the
same type. All keys sharing the same index i will be
treated as belonging to the same network. For example,
lxc.net.0.link = br0 will belong to lxc.net.0.type.
Currently, the different virtualization types can be:
empty: will create only the loopback interface.
veth: a virtual ethernet pair device is created with one
side assigned to the container and the other side on the
host. lxc.net.[i].veth.mode specifies the mode the veth
parent will use on the host. The accepted modes are bridge
and router. The mode defaults to bridge if not specified.
In bridge mode the host side is attached to a bridge
specified by the lxc.net.[i].link option. If the bridge
link is not specified, then the veth pair device will be
created but not attached to any bridge. Otherwise, the
bridge has to be created on the system before starting the
container. lxc won't handle any configuration outside of
the container. In router mode static routes are created on
the host for the container's IP addresses pointing to the
host side veth interface. Additionally Proxy ARP and Proxy
NDP entries are added on the host side veth interface for
the gateway IPs defined in the container to allow the
container to reach the host. By default, lxc chooses a
name for the network device belonging to the outside of the
container, but if you wish to handle this name yourself,
you can tell lxc to set a specific name with the
lxc.net.[i].veth.pair option (except for unprivileged
containers where this option is ignored for security
reasons). Static routes can be added on the host pointing
to the container using the lxc.net.[i].veth.ipv4.route and
lxc.net.[i].veth.ipv6.route options. Several lines specify
several routes. The route is in format x.y.z.t/m, e.g.
192.168.1.0/24. In bridge mode untagged VLAN membership
can be set with the lxc.net.[i].veth.vlan.id option. It
accepts a special value of 'none' indicating that the
container port should be removed from the bridge's default
untagged VLAN. The lxc.net.[i].veth.vlan.tagged.id option
can be specified multiple times to set the container's
bridge port membership to one or more tagged VLANs.
vlan: a vlan interface is linked with the interface
specified by the lxc.net.[i].link and assigned to the
container. The vlan identifier is specified with the option
lxc.net.[i].vlan.id.
macvlan: a macvlan interface is linked with the interface
specified by the lxc.net.[i].link and assigned to the
container. lxc.net.[i].macvlan.mode specifies the mode the
macvlan will use to communicate between different macvlan
on the same upper device. The accepted modes are private,
vepa, bridge and passthru. In private mode, the device
never communicates with any other device on the same
upper_dev (default). In vepa (Virtual Ethernet Port
Aggregator) mode, it is assumed that the adjacent bridge
returns all frames where both source and
destination are local to the macvlan port, i.e. the bridge
is set up as a reflective relay. Broadcast frames coming in
from the upper_dev get flooded to all macvlan interfaces in
VEPA mode, local frames are not delivered locally. In
bridge mode, it provides the behavior of a simple bridge
between different macvlan interfaces on the same port.
Frames from one interface to another one get delivered
directly and are not sent out externally. Broadcast frames
get flooded to all other bridge ports and to the external
interface, but when they come back from a reflective relay,
we don't deliver them again. Since we know all the MAC
addresses, the macvlan bridge mode does not require
learning or STP like the bridge module does. In passthru
mode, all frames received by the physical interface are
forwarded to the macvlan interface. Only one macvlan
interface in passthru mode is possible for one physical
interface.
ipvlan: an ipvlan interface is linked with the interface
specified by the lxc.net.[i].link and assigned to the
container. lxc.net.[i].ipvlan.mode specifies the mode the
ipvlan will use to communicate between different ipvlan on
the same upper device. The accepted modes are l3, l3s and
l2. It defaults to l3 mode. In l3 mode TX processing up to
L3 happens on the stack instance attached to the dependent
device and packets are switched to the stack instance of
the parent device for the L2 processing and routing from
that instance will be used before packets are queued on the
outbound device. In this mode the dependent devices will
not receive nor can send multicast / broadcast traffic. In
l3s mode TX processing is very similar to the L3 mode
except that iptables (conn-tracking) works in this mode and
hence it is L3-symmetric (L3s). This will have slightly
less performance but that shouldn't matter since you are
choosing this mode over plain-L3 mode to make conn-tracking
work. In l2 mode TX processing happens on the stack
instance attached to the dependent device and packets are
switched and queued to the parent device to be sent out.
In this mode the dependent devices will RX/TX
multicast and broadcast (if applicable) as well.
lxc.net.[i].ipvlan.isolation specifies the isolation mode.
The accepted isolation values are bridge, private and vepa.
It defaults to bridge. In bridge isolation mode dependent
devices can cross-talk among themselves apart from talking
through the parent device. In private isolation mode the
port is set in private mode. i.e. port won't allow cross
communication between dependent devices. In vepa isolation
mode the port is set in VEPA mode. i.e. port will offload
switching functionality to the external entity as described
in 802.1Qbg.
phys: an already existing interface specified by the
lxc.net.[i].link is assigned to the container.
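The network types above can be sketched as three independent
networks in one configuration. The host interface names (br0, eth1,
eth2) and the veth pair name are hypothetical; the bridge and parent
interfaces are assumed to exist on the host already. Note that the
type key comes first for each index.

```
# Network 0: veth pair attached to an existing host bridge
lxc.net.0.type = veth
lxc.net.0.veth.mode = bridge
lxc.net.0.link = br0
# Ignored for unprivileged containers
lxc.net.0.veth.pair = veth-web01

# Network 1: macvlan on a host interface, bridge mode
lxc.net.1.type = macvlan
lxc.net.1.macvlan.mode = bridge
lxc.net.1.link = eth1

# Network 2: ipvlan in l3s mode with private isolation
lxc.net.2.type = ipvlan
lxc.net.2.ipvlan.mode = l3s
lxc.net.2.ipvlan.isolation = private
lxc.net.2.link = eth2
```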
lxc.net.[i].flags
Specify an action to do for the network.
up: activates the interface.
lxc.net.[i].link
Specify the interface to be used for real network traffic.
lxc.net.[i].l2proxy
Controls whether layer 2 IP neighbour proxy entries will be
added to the lxc.net.[i].link interface for the IP
addresses of the container. Can be set to 0 or 1. Defaults
to 0. When used with IPv4 addresses, the following sysctl
values need to be set: net.ipv4.conf.[link].forwarding=1
When used with IPv6 addresses, the following sysctl values
need to be set: net.ipv6.conf.[link].proxy_ndp=1
net.ipv6.conf.[link].forwarding=1
lxc.net.[i].mtu
Specify the maximum transmission unit (MTU) for this interface.
lxc.net.[i].name
The interface name is dynamically allocated, but if another
name is needed because the configuration files being used
by the container use a generic name, e.g. eth0, this option
will rename the interface in the container.
lxc.net.[i].hwaddr
The interface MAC address is dynamically allocated by
default to the virtual interface, but in some cases a fixed
address is needed to resolve a MAC address conflict or to
always have the same link-local IPv6 address. Any "x" in
the address will be replaced by a random value, which
allows setting hwaddr templates.
lxc.net.[i].ipv4.address
Specify the ipv4 address to assign to the virtualized
interface. Several lines specify several ipv4 addresses.
The address is in format x.y.z.t/m, e.g. 192.168.1.123/24.
You can optionally specify the broadcast address after the
IP address, e.g. 192.168.1.123/24 255.255.255.255.
Otherwise it is automatically calculated from the IP
address.
lxc.net.[i].ipv4.gateway
Specify the ipv4 address to use as the gateway inside the
container. The address is in format x.y.z.t, e.g.
192.168.1.123. Can also have the special value auto, which
means to take the primary address from the bridge interface
(as specified by the lxc.net.[i].link option) and use that
as the gateway. auto is only available when using the veth,
macvlan and ipvlan network types. Can also have the
special value of dev, which means to set the default
gateway as a device route. This is primarily for use with
layer 3 network modes, such as IPVLAN.
lxc.net.[i].ipv6.address
Specify the ipv6 address to assign to the virtualized
interface. Several lines specify several ipv6 addresses.
The address is in format x::y/m, e.g.
2003:db8:1:0:214:1234:fe0b:3596/64.
lxc.net.[i].ipv6.gateway
Specify the ipv6 address to use as the gateway inside the
container. The address is in format x::y, e.g.
2003:db8:1:0::1. Can also have the special value auto, which
means to take the primary address from the bridge interface
(as specified by the lxc.net.[i].link option) and use that
as the gateway. auto is only available when using the veth,
macvlan and ipvlan network types. Can also have the
special value of dev, which means to set the default
gateway as a device route. This is primarily for use with
layer 3 network modes, such as IPVLAN.
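The per-interface keys described above might be combined as in this
sketch for a single veth network. The bridge name br0 and the MAC
prefix are assumptions; the addresses are the example values used
in the text.

```
lxc.net.0.type = veth
lxc.net.0.link = br0
lxc.net.0.flags = up
# Name of the interface as seen inside the container
lxc.net.0.name = eth0
lxc.net.0.mtu = 1500
# Each "x" is replaced by a random value
lxc.net.0.hwaddr = 00:16:3e:xx:xx:xx
# Several lines assign several addresses
lxc.net.0.ipv4.address = 192.168.1.123/24
lxc.net.0.ipv4.address = 192.168.1.124/24
# Take the gateway from the primary address of br0
lxc.net.0.ipv4.gateway = auto
lxc.net.0.ipv6.address = 2003:db8:1:0:214:1234:fe0b:3596/64
lxc.net.0.ipv6.gateway = auto
```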
lxc.net.[i].script.up
Add a configuration option to specify a script to be
executed after creating and configuring the network used
from the host side.
In addition to the information available to all hooks, the
following information is provided to the script:
• LXC_HOOK_TYPE: the hook type. This is either 'up' or
'down'.
• LXC_HOOK_SECTION: the section type 'net'.
• LXC_NET_TYPE: the network type. This is one of the valid
network types listed here (e.g. 'vlan', 'macvlan',
'ipvlan', 'veth').
• LXC_NET_PARENT: the parent device on the host. This is
only set for network types 'macvlan', 'veth', 'phys'.
• LXC_NET_PEER: the name of the peer device on the host.
This is only set for 'veth' network types. Note that this
information is only available when lxc.hook.version is
set to 1.
Whether this information is provided in the form of environment
variables or as arguments to the script depends on the value of
lxc.hook.version. If set to 1 then information is provided in the
form of environment variables. If set to 0 information is provided
as arguments to the script.
Standard output from the script is logged at debug level.
Standard error is not logged, but can be captured by the hook
redirecting its standard error to standard output.
lxc.net.[i].script.down
Add a configuration option to specify a script to be
executed before destroying the network used from the host
side.
In addition to the information available to all hooks, the
following information is provided to the script:
• LXC_HOOK_TYPE: the hook type. This is either 'up' or
'down'.
• LXC_HOOK_SECTION: the section type 'net'.
• LXC_NET_TYPE: the network type. This is one of the valid
network types listed here (e.g. 'vlan', 'macvlan',
'ipvlan', 'veth').
• LXC_NET_PARENT: the parent device on the host. This is
only set for network types 'macvlan', 'veth', 'phys'.
• LXC_NET_PEER: the name of the peer device on the host.
This is only set for 'veth' network types. Note that this
information is only available when lxc.hook.version is
set to 1.
Whether this information is provided in the form of environment
variables or as arguments to the script depends on the value of
lxc.hook.version. If set to 1 then information is provided in the
form of environment variables. If set to 0 information is provided
as arguments to the script.
Standard output from the script is logged at debug level.
Standard error is not logged, but can be captured by the hook
redirecting its standard error to standard output.
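A configuration using both network scripts might look like the
following sketch. The script paths and the bridge name are
hypothetical. Setting lxc.hook.version to 1 is chosen here so that
the scripts receive LXC_NET_TYPE, LXC_NET_PARENT and LXC_NET_PEER as
environment variables rather than as arguments.

```
# Pass hook information via environment variables
lxc.hook.version = 1

lxc.net.0.type = veth
lxc.net.0.link = br0
# Run on the host after the network has been created and configured
lxc.net.0.script.up = /usr/local/bin/lxc-net-up
# Run on the host before the network is destroyed
lxc.net.0.script.down = /usr/local/bin/lxc-net-down
```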
NEW PSEUDO TTY INSTANCE (DEVPTS)
For stricter isolation the container can have its own private
instance of the pseudo tty.
lxc.pty.max
If set, the container will have a new pseudo tty instance,
making this private to it. The value specifies the maximum
number of pseudo ttys allowed for a pty instance (this
limitation is not implemented yet).
CONTAINER SYSTEM CONSOLE
If the container is configured with a root filesystem and the
inittab file is set up to use the console, you may want to specify
where the output of this console goes.
lxc.console.buffer.size
Setting this option instructs liblxc to allocate an in-
memory ringbuffer. The container's console output will be
written to the ringbuffer. Note that the ringbuffer must be at
least as big as a standard page size. When passed a value
smaller than a single page size liblxc will allocate a
ringbuffer of a single page size. A page size is usually
4KB. The keyword 'auto' will cause liblxc to allocate a
ringbuffer of 128KB. When manually specifying a size for
the ringbuffer the value should be a power of 2 when
converted to bytes. Valid size suffixes are 'KB', 'MB',
'GB'. (Note that all conversions are based on multiples of
1024. That means 'KB' == 'KiB', 'MB' == 'MiB', 'GB' ==
'GiB'. Additionally, the case of the suffix is ignored,
i.e. 'kB', 'KB' and 'Kb' are treated equally.)
lxc.console.size
Setting this option instructs liblxc to place a limit on
the size of the console log file specified in
lxc.console.logfile. Note that the size of the log file must be
at least as big as a standard page size. When passed a
value smaller than a single page size liblxc will set the
size of log file to a single page size. A page size is
usually 4KB. The keyword 'auto' will cause liblxc to place
a limit of 128KB on the log file. When manually specifying
a size for the log file the value should be a power of 2
when converted to bytes. Valid size suffixes are 'KB',
'MB', 'GB'. (Note that all conversions are based on
multiples of 1024. That means 'KB' == 'KiB', 'MB' == 'MiB',
'GB' == 'GiB'. Additionally, the case of the suffix is
ignored, i.e. 'kB', 'KB' and 'Kb' are treated equally.) If
users want to mirror the console ringbuffer on disk they
should set lxc.console.size equal to
lxc.console.buffer.size.
lxc.console.logfile
Specify a path to a file where the console output will be
written. Note that, in contrast to the on-disk ringbuffer
logfile, this file will keep growing, potentially filling up
the user's disk if not rotated and deleted. This problem
can also be avoided by using the in-memory ringbuffer
options lxc.console.buffer.size and
lxc.console.buffer.logfile.
lxc.console.rotate
Whether to rotate the console logfile specified in
lxc.console.logfile. Users can send an API request to
rotate the logfile. Note that the old logfile will have the
same name as the original with the suffix ".1" appended.
Users wishing to prevent the console log file from filling
the disk should rotate the logfile and delete it if
unneeded. This problem can also be avoided by using the in-
memory ringbuffer options lxc.console.buffer.size and
lxc.console.buffer.logfile.
lxc.console.path
Specify a path to a device to which the console will be
attached. The keyword 'none' will simply disable the
console. Note that when specifying 'none' and either creating
a device node for the console in the container at
/dev/console or bind-mounting the host's /dev/console into
the container at /dev/console, the container will have
direct access to the host's /dev/console. This is dangerous
when the container has write access to the device and
should thus be used with caution.
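As a sketch of the console options above, the following fragment
mirrors the in-memory ringbuffer on disk by giving the log file the
same size limit as the ringbuffer. The log file path is
hypothetical.

```
# In-memory ringbuffer; the size is a power of 2 in bytes
lxc.console.buffer.size = 128KB
# On-disk console log
lxc.console.logfile = /var/log/lxc/web01-console.log
# Same limit as the ringbuffer, so the file mirrors it
lxc.console.size = 128KB
```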
CONSOLE THROUGH THE TTYS
This option is useful if the container is configured with a root
filesystem and the inittab file is set up to launch a getty on the
ttys. The option specifies the number of ttys to be available for
the container. The number of gettys in the inittab file of the
container should not be greater than the number of ttys specified
in this option, otherwise the excess getty sessions will die and
respawn indefinitely giving annoying messages on the console or in
/var/log/messages.
lxc.tty.max
Specify the number of ttys to make available to the
container.
CONSOLE DEVICES LOCATION
LXC consoles are provided through Unix98 PTYs created on the host
and bind-mounted over the expected devices in the container. By
default, they are bind-mounted over /dev/console and /dev/ttyN.
This can prevent package upgrades in the guest. Therefore you can
specify a directory location (under /dev) under which LXC will
create the files and bind-mount over them. These will then be
symbolically linked to /dev/console and /dev/ttyN. A package
upgrade can then succeed as it is able to remove and replace the
symbolic links.
lxc.tty.dir
Specify a directory under /dev under which to create the
container console devices. Note that LXC will move any
bind-mounts or device nodes for /dev/console into this
directory.
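The tty options might be set together as in this sketch. The
directory name lxc is an assumption; any directory name under /dev
could be used.

```
# Make four ttys available; the container should run at most
# four gettys
lxc.tty.max = 4
# Create the console devices under /dev/lxc and symlink them to
# /dev/console and /dev/ttyN
lxc.tty.dir = lxc
```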
/DEV DIRECTORY
By default, lxc creates a few symbolic links
(fd, stdin, stdout, stderr) in the container's /dev directory but
does not automatically create device node entries. This allows the
container's /dev to be set up as needed in the container rootfs.
If lxc.autodev is set to 1, then after mounting the container's
rootfs LXC will mount a fresh tmpfs under /dev (limited to 500K by
default, unless defined in lxc.autodev.tmpfs.size) and fill in a
minimal set of initial devices. This is generally required when
starting a container containing a "systemd" based "init" but may
be optional at other times. Additional devices in the container's
/dev directory may be created through the use of the
lxc.hook.autodev hook.
lxc.autodev
Set this to 0 to stop LXC from mounting and populating a
minimal /dev when starting the container.
lxc.autodev.tmpfs.size
Set this to define the size of the /dev tmpfs. The default
value is 500000 (500K). If the parameter is used but
without value, the default value is used.
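A minimal sketch of the two keys, assuming the size value uses the
same unit as the documented default of 500000:

```
# Mount a fresh tmpfs on /dev and populate a minimal set of devices
lxc.autodev = 1
# Allow roughly twice the default size for the /dev tmpfs
lxc.autodev.tmpfs.size = 1000000
```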
MOUNT POINTS
The mount points section specifies the different places to be
mounted. These mount points will be private to the container and
won't be visible by the processes running outside of the
container. This is useful for mounting /etc, /var or /home, for
example.
NOTE - LXC will generally ensure that mount targets and relative
bind-mount sources are properly confined under the container root,
to avoid attacks involving over-mounting host directories and
files. (Symbolic links in absolute mount sources are ignored.)
However, if the container configuration first mounts a directory
which is under the control of the container user, such as
/home/joe, into the container at some path, and then mounts under
that path, then a TOCTTOU attack would be possible where the container
user modifies a symbolic link under their home directory at just
the right time.
lxc.mount.fstab
specify a file location in the fstab format, containing the
mount information. The mount target location can and in
most cases should be a relative path, which will become
relative to the mounted container root. For instance,
proc proc proc nodev,noexec,nosuid 0 0
Will mount a proc filesystem under the container's /proc,
regardless of where the root filesystem comes from. This is
resilient to block device backed filesystems as well as
container cloning.
Note that when mounting a filesystem from an image file or
block device the third field (fs_vfstype) cannot be auto as
with mount(8) but must be explicitly specified.
lxc.mount.entry
Specify a mount point corresponding to a line in the fstab
format. Moreover lxc supports mount propagation, such as
rshared or rprivate, and adds three additional mount
options:
• optional: don't fail if the mount does not work.
• create=dir or create=file: create the directory (or
file) at the mount point before mounting.
• relative: the source path is taken to be relative to the
mounted container root. For instance,
dev/null proc/kcore none bind,relative 0 0
Will expand dev/null to ${LXC_ROOTFS_MOUNT}/dev/null, and
mount it to proc/kcore inside the container.
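A few further entries are sketched below. All host paths are
hypothetical. Mount targets are written as relative paths so that
they resolve under the mounted container root.

```
# Load additional mounts from an fstab-format file
lxc.mount.fstab = /var/lib/lxc/web01/fstab
# Bind-mount a host directory; create the target directory if it
# is missing and do not fail if the mount does not work
lxc.mount.entry = /srv/data srv/data none bind,optional,create=dir 0 0
# A private tmpfs on the container's /tmp
lxc.mount.entry = tmpfs tmp tmpfs nodev,nosuid,size=64m 0 0
```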
lxc.mount.auto
specify which standard kernel file systems should be
automatically mounted. This may dramatically simplify the
configuration. The file systems are:
• proc:mixed (or proc): mount /proc as read-write, but
remount /proc/sys and /proc/sysrq-trigger read-only for
security / container isolation purposes.
• proc:rw: mount /proc as read-write
• sys:mixed (or sys): mount /sys as read-only but with
/sys/devices/virtual/net writable.
• sys:ro: mount /sys as read-only for security / container
isolation purposes.
• sys:rw: mount /sys as read-write
• cgroup:mixed: Mount a tmpfs to /sys/fs/cgroup, create
directories for all hierarchies to which the container is
added, create subdirectories in those hierarchies with
the name of the cgroup, and bind-mount the container's
own cgroup into that directory. The container will be
able to write to its own cgroup directory, but not the
parents, since they will be remounted read-only.
• cgroup:mixed:force: The force option will cause LXC to
perform the cgroup mounts for the container under all
circumstances. Otherwise it is similar to cgroup:mixed.
This is mainly useful when the cgroup namespaces are
enabled where LXC will normally leave mounting cgroups to
the init binary of the container since it is perfectly
safe to do so.
• cgroup:ro: similar to cgroup:mixed, but everything will
be mounted read-only.
• cgroup:ro:force: The force option will cause LXC to
perform the cgroup mounts for the container under all
circumstances. Otherwise it is similar to cgroup:ro.
This is mainly useful when the cgroup namespaces are
enabled where LXC will normally leave mounting cgroups to
the init binary of the container since it is perfectly
safe to do so.
• cgroup:rw: similar to cgroup:mixed, but everything will
be mounted read-write. Note that the paths leading up to
the container's own cgroup will be writable, but will not
be a cgroup filesystem but just part of the tmpfs of
/sys/fs/cgroup
• cgroup:rw:force: The force option will cause LXC to
perform the cgroup mounts for the container under all
circumstances. Otherwise it is similar to cgroup:rw.
This is mainly useful when the cgroup namespaces are
enabled where LXC will normally leave mounting cgroups to
the init binary of the container since it is perfectly
safe to do so.
• cgroup (without specifier): defaults to cgroup:rw if the
container retains the CAP_SYS_ADMIN capability,
cgroup:mixed otherwise.
• cgroup-full:mixed: mount a tmpfs to /sys/fs/cgroup,
create directories for all hierarchies to which the
container is added, bind-mount the hierarchies from the
host to the container and make everything read-only
except the container's own cgroup. Note that compared to
cgroup, where all paths leading up to the container's own
cgroup are just simple directories in the underlying
tmpfs, here /sys/fs/cgroup/$hierarchy will contain the
host's full cgroup hierarchy, albeit read-only outside
the container's own cgroup. This may leak quite a bit of
information into the container.
• cgroup-full:mixed:force: The force option will cause LXC
to perform the cgroup mounts for the container under all
circumstances. Otherwise it is similar to
cgroup-full:mixed. This is mainly useful when the cgroup
namespaces are enabled where LXC will normally leave
mounting cgroups to the init binary of the container
since it is perfectly safe to do so.
• cgroup-full:ro: similar to cgroup-full:mixed, but
everything will be mounted read-only.
• cgroup-full:ro:force: The force option will cause LXC to
perform the cgroup mounts for the container under all
circumstances. Otherwise it is similar to
cgroup-full:ro. This is mainly useful when the cgroup
namespaces are enabled where LXC will normally leave
mounting cgroups to the init binary of the container
since it is perfectly safe to do so.
• cgroup-full:rw: similar to cgroup-full:mixed, but
everything will be mounted read-write. Note that in this
case, the container may escape its own cgroup. (Note also
that if the container has CAP_SYS_ADMIN support and can
mount the cgroup filesystem itself, it may do so anyway.)
• cgroup-full:rw:force: The force option will cause LXC to
perform the cgroup mounts for the container under all
circumstances. Otherwise it is similar to
cgroup-full:rw. This is mainly useful when the cgroup
namespaces are enabled where LXC will normally leave
mounting cgroups to the init binary of the container
since it is perfectly safe to do so.
• cgroup-full (without specifier): defaults to
cgroup-full:rw if the container retains the CAP_SYS_ADMIN
capability, cgroup-full:mixed otherwise.
If cgroup namespaces are enabled, then any cgroup auto-mounting
request will be ignored, since the container can mount the
filesystems itself, and automounting can confuse the container
init.
Note that if automatic mounting of the cgroup filesystem is
enabled, the tmpfs under /sys/fs/cgroup will always be mounted
read-write (but for the :mixed and :ro cases, the individual
hierarchies, /sys/fs/cgroup/$hierarchy, will be read-only). This
is in order to work around a quirk in Ubuntu's mountall(8) command
that will cause containers to wait for user input at boot if
/sys/fs/cgroup is mounted read-only and the container can't
remount it read-write due to a lack of CAP_SYS_ADMIN.
Examples:
lxc.mount.auto = proc sys cgroup
lxc.mount.auto = proc:rw sys:rw cgroup-full:rw
ROOT FILE SYSTEM
The root file system of the container can be different than that
of the host system.
lxc.rootfs.path
specify the root file system for the container. It can be
an image file, a directory or a block device. If not
specified, the container shares its root file system with
the host.
For directory or simple block-device backed containers, a
pathname can be used. If the rootfs is backed by an nbd
device, then nbd:file:1 specifies that file should be
attached to an nbd device, and partition 1 should be mounted
as the rootfs. nbd:file specifies that the nbd device
itself should be mounted. overlayfs:/lower:/upper specifies
that the rootfs should be an overlay with /upper being
mounted read-write over a read-only mount of /lower. For
overlay, multiple /lower directories can be specified.
loop:/file tells lxc to attach /file to a loop device and
mount the loop device.
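As an illustration, the following lines show one possible value
for each backing type described above. They are alternatives (a
real configuration would contain only one of them), and every
path is a placeholder rather than a location LXC requires.

```
# Directory-backed rootfs given as a plain pathname
lxc.rootfs.path = /var/lib/lxc/c1/rootfs
# Overlay: the second directory is mounted read-write over the first
lxc.rootfs.path = overlayfs:/var/lib/lxc/base/rootfs:/var/lib/lxc/c1/delta0
# Image file attached to a loop device
lxc.rootfs.path = loop:/var/lib/lxc/c1/rootfs.img
# Image file attached to an nbd device, partition 1 used as rootfs
lxc.rootfs.path = nbd:/var/lib/lxc/c1/disk.img:1
```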
lxc.rootfs.mount
where to recursively bind lxc.rootfs.path before pivoting.
This is to ensure success of the pivot_root(2) syscall. Any
directory suffices; the default should generally work.
lxc.rootfs.options
Specify extra mount options to use when mounting the
rootfs. The format of the mount options corresponds to the
format used in fstab. In addition, LXC supports the custom
idmap= mount option. This option can be used to tell LXC to
create an idmapped mount for the container's rootfs. This
is useful when the user doesn't want to recursively chown
the rootfs of the container to match the idmapping of the
user namespace the container is going to use. Instead an
idmapped mount can be used to handle this. The argument
for idmap= can either be a path pointing to a user
namespace file that LXC will open and use to idmap the
rootfs or the special value "container" which will instruct
LXC to use the container's user namespace to idmap the
rootfs.
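A minimal sketch of an idmapped rootfs, assuming a kernel and
filesystem that support idmapped mounts. The id range and the
rootfs path are illustrative values only.

```
# Container ids 0..65535 map to host ids 100000..165535
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
lxc.rootfs.path = /var/lib/lxc/c1/rootfs
# Idmap the rootfs with the container's own user namespace
# instead of recursively chowning it
lxc.rootfs.options = idmap=container
```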
lxc.rootfs.managed
Set this to 0 to indicate that LXC is not managing the
container storage; LXC will then not modify the container
storage. The default is 1.
CONTROL GROUPS ("CGROUPS")
The control group section contains the configuration for the
different subsystems. lxc does not check the correctness of the
subsystem name. This has the disadvantage of not detecting
configuration errors until the container is started, but has the
advantage of permitting any future subsystem.
The kernel implementation of cgroups has changed significantly
over the years. With Linux 4.5, support was added for a new cgroup
filesystem, usually referred to as "cgroup2" or "unified hierarchy".
Since then the old cgroup filesystem is usually referred to as
"cgroup1" or the "legacy hierarchies". Please see the cgroups
manual page for a detailed explanation of the differences between
the two versions.
LXC distinguishes settings for the legacy and the unified
hierarchy by using different configuration key prefixes. To alter
settings for controllers in a legacy hierarchy, the key prefix
lxc.cgroup. must be used; to alter the settings for a
controller in the unified hierarchy, the lxc.cgroup2. key prefix
must be used. Note that LXC will ignore lxc.cgroup. settings on systems
that only use the unified hierarchy. Conversely, it will ignore
lxc.cgroup2. options on systems that only use legacy hierarchies.
At its core a cgroup hierarchy is a way to hierarchically organize
processes. Usually a cgroup hierarchy will have one or more
"controllers" enabled. A "controller" in a cgroup hierarchy is
usually responsible for distributing a specific type of system
resource along the hierarchy. Controllers include the "pids"
controller, the "cpu" controller, the "memory" controller and
others. Some controllers however do not fall into the category of
distributing a system resource, instead they are often referred to
as "utility" controllers. One utility controller is the device
controller. Instead of distributing a system resource it allows
one to manage device access.
In the legacy hierarchy the device controller was implemented like
most other controllers as a set of files that could be written to.
These files were named "devices.allow" and "devices.deny". The
legacy device controller allowed the implementation of both
"allowlists" and "denylists".
An allowlist is a device program that by default blocks access to
all devices. In order to access specific devices "allow rules" for
particular devices or device classes must be specified. In
contrast, a denylist is a device program that by default allows
access to all devices. In order to restrict access to specific
devices "deny rules" for particular devices or device classes must
be specified.
In the unified cgroup hierarchy the implementation of the device
controller has completely changed. Instead of files to read from
and write to, an eBPF program of type BPF_PROG_TYPE_CGROUP_DEVICE
can be attached to a cgroup. Even though the kernel implementation has
changed completely LXC tries to allow for the same semantics to be
followed in the legacy device cgroup and the unified eBPF-based
device controller. The following paragraphs explain the semantics
for the unified eBPF-based device controller.
As mentioned the format for specifying device rules for the
unified eBPF-based device controller is the same as for the legacy
cgroup device controller; only the configuration key prefix has
changed. Specifically, device rules for the legacy cgroup device
controller are specified via lxc.cgroup.devices.allow and
lxc.cgroup.devices.deny whereas for the cgroup2 eBPF-based device
controller lxc.cgroup2.devices.allow and lxc.cgroup2.devices.deny
must be used.
• A denylist device rule
lxc.cgroup2.devices.deny = a
will cause LXC to instruct the kernel to block access to all
devices by default. To grant access to devices allow device
rules must be added via the lxc.cgroup2.devices.allow key. This
is referred to as an "allowlist" device program.
• An allowlist device rule
lxc.cgroup2.devices.allow = a
will cause LXC to instruct the kernel to allow access to all
devices by default. To deny access to devices deny device rules
must be added via the lxc.cgroup2.devices.deny key. This is
referred to as a "denylist" device program.
• Specifying any of the aforementioned two rules will cause all
previous rules to be cleared, i.e. the device list will be
reset.
• When an allowlist program is requested, i.e. access to all
devices is blocked by default, specific deny rules for
individual devices or device classes are ignored.
• When a denylist program is requested, i.e. access to all devices
is allowed by default, specific allow rules for individual
devices or device classes are ignored.
For example the set of rules:
lxc.cgroup2.devices.deny = a
lxc.cgroup2.devices.allow = c *:* m
lxc.cgroup2.devices.allow = b *:* m
lxc.cgroup2.devices.allow = c 1:3 rwm
implements an allowlist device program, i.e. the kernel will block
access to all devices not specifically allowed in this list. This
particular program states that all character and block devices may
be created but only /dev/null may be read or written.
If we instead switch to the following set of rules:
lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm
then LXC would instruct the kernel to implement a denylist, i.e.
the kernel will allow access to all devices not specifically
denied in this list. This particular program states that no
character devices or block devices may be created and that
/dev/null is not allowed to be read, written, or created.
Now consider the same program but followed by a "global rule"
which determines the type of device program (allowlist or
denylist) as explained above:
lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm
lxc.cgroup2.devices.allow = a
The last line will cause LXC to reset the device list without
changing the type of device program.
If we specify:
lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm
lxc.cgroup2.devices.deny = a
instead then the last line will cause LXC to reset the device list
and switch from a denylist program to an allowlist program.
lxc.cgroup.[controller name].[controller file]
Specify the control group value to be set on a legacy
cgroup hierarchy. The controller name is the literal name
of the control group. The permitted names and the syntax of
their values are not dictated by LXC; instead they depend on
the features of the Linux kernel running at the time the
container is started, e.g. lxc.cgroup.cpuset.cpus
lxc.cgroup2.[controller name].[controller file]
Specify the control group value to be set on the unified
cgroup hierarchy. The controller name is the literal name
of the control group. The permitted names and the syntax of
their values are not dictated by LXC; instead they depend on
the features of the Linux kernel running at the time the
container is started, e.g. lxc.cgroup2.memory.high
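For illustration, the same kind of limits might be expressed for
each hierarchy as follows. The controller file names come from
the kernel, not from LXC, and the values are arbitrary. Only the
lines matching the hierarchy in use on the host take effect.

```
# Legacy (cgroup1) hierarchy: CPUs 0-1, 512 MiB memory limit
lxc.cgroup.cpuset.cpus = 0-1
lxc.cgroup.memory.limit_in_bytes = 536870912
# Unified (cgroup2) hierarchy: comparable limits
lxc.cgroup2.cpuset.cpus = 0-1
lxc.cgroup2.memory.max = 536870912
lxc.cgroup2.memory.high = 402653184
lxc.cgroup2.pids.max = 512
```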
lxc.cgroup.dir
specify a directory or path in which the container's cgroup
will be created. For example, setting lxc.cgroup.dir =
my-cgroup/first for a container named "c1" will create the
container's cgroup as a sub-cgroup of "my-cgroup". For
example, if the user's current cgroup "my-user" is located
in the root cgroup of the cpuset controller in a cgroup v1
hierarchy this would create the cgroup
"/sys/fs/cgroup/cpuset/my-user/my-cgroup/first/c1" for the
container. Any missing cgroups will be created by LXC. This
presupposes that the user has write access to its current
cgroup.
lxc.cgroup.dir.container
This is similar to lxc.cgroup.dir, but must be used
together with lxc.cgroup.dir.monitor and affects only the
container's cgroup path. This option is mutually exclusive
with lxc.cgroup.dir. Note that the final path the
container attaches to may be extended further by the
lxc.cgroup.dir.container.inner option.
lxc.cgroup.dir.monitor
This is the monitor process counterpart to
lxc.cgroup.dir.container.
lxc.cgroup.dir.monitor.pivot
On container termination the PID of the monitor process is
attached to this cgroup. This path should not be a subpath
of any other configured cgroup dir to ensure proper removal
of other cgroup paths on container termination.
lxc.cgroup.dir.container.inner
Specify an additional subdirectory where the cgroup
namespace will be created. With this option, the cgroup
limits will be applied to the outer path specified in
lxc.cgroup.dir.container, which is not accessible from
within the container, making it possible to better enforce
limits for privileged containers in a way they cannot
override them. This only works in conjunction with the
lxc.cgroup.dir.container and lxc.cgroup.dir.monitor options
and has otherwise no effect.
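A sketch of how the separate monitor and container cgroup
options might be combined. All cgroup names below are
hypothetical; the only constraint taken from the text above is
that the pivot cgroup is not a subpath of the other directories.

```
# Cgroup for the monitor process
lxc.cgroup.dir.monitor = lxc.monitor/c1
# Outer cgroup for the container; limits are applied here
lxc.cgroup.dir.container = lxc.payload/c1
# Inner subdirectory in which the cgroup namespace is created
lxc.cgroup.dir.container.inner = ns
# Cgroup the monitor is attached to on container termination
lxc.cgroup.dir.monitor.pivot = lxc.pivot
```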
lxc.cgroup.relative
Set this to 1 to instruct LXC to never escape to the root
cgroup. This makes it easy for users to adhere to
restrictions enforced by cgroup2 and systemd. Specifically,
this makes it possible to run LXC containers as systemd
services.
CAPABILITIES
Capabilities can be dropped in the container if it is run as
root.
lxc.cap.drop
Specify the capability to be dropped in the container. A
single line defining several capabilities with a space
separation is allowed. The format is the lower case of the
capability definition without the "CAP_" prefix, eg.
CAP_SYS_MODULE should be specified as sys_module. See
capabilities(7). If used with no value, lxc will clear any
drop capabilities specified up to this point.
lxc.cap.keep
Specify the capability to be kept in the container. All
other capabilities will be dropped. When a special value of
"none" is encountered, lxc will clear any keep capabilities
specified up to this point. A value of "none" alone can be
used to drop all capabilities.
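The two approaches might look as follows. The capability
selection is purely illustrative, and the keep variant is left
commented out on the assumption that a configuration uses one
approach or the other rather than both.

```
# Drop a few capabilities (lower case, without the CAP_ prefix)
lxc.cap.drop = sys_module mac_admin mac_override sys_time

# Alternative: keep only the listed capabilities, drop the rest
# lxc.cap.keep = chown setuid setgid net_bind_service
```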
NAMESPACES
A namespace can be cloned (lxc.namespace.clone), kept
(lxc.namespace.keep) or shared (lxc.namespace.share.[namespace
identifier]).
lxc.namespace.clone
Specify namespaces which the container is supposed to be
created with. The namespaces to create are specified as a
space separated list. Each namespace must correspond to one
of the standard namespace identifiers as seen in the
/proc/PID/ns directory. When lxc.namespace.clone is not
explicitly set all namespaces supported by the kernel and
the current configuration will be used.
To create a new mount, net and ipc namespace set
lxc.namespace.clone=mount net ipc.
lxc.namespace.keep
Specify namespaces which the container is supposed to
inherit from the process that created it. The namespaces to
keep are specified as a space separated list. Each
namespace must correspond to one of the standard namespace
identifiers as seen in the /proc/PID/ns directory. The
lxc.namespace.keep is a denylist option, i.e. it is useful
when enforcing that containers must keep a specific set of
namespaces.
To keep the network, user and ipc namespace set
lxc.namespace.keep=user net ipc.
Note that sharing pid namespaces will likely not work with
most init systems.
Note that if the container requests a new user namespace
and the container wants to inherit the network namespace it
needs to inherit the user namespace as well.
lxc.namespace.share.[namespace identifier]
Specify a namespace to inherit from another container or
process. The [namespace identifier] suffix needs to be
replaced with one of the namespaces that appear in the
/proc/PID/ns directory.
To inherit the namespace from another process set the
lxc.namespace.share.[namespace identifier] to the PID of
the process, e.g. lxc.namespace.share.net=42.
To inherit the namespace from another container set the
lxc.namespace.share.[namespace identifier] to the name of
the container, e.g. lxc.namespace.share.pid=c3.
To inherit the namespace from another container located in
a different path than the standard liblxc path set the
lxc.namespace.share.[namespace identifier] to the full path
to the container, e.g. lxc.namespace.share.user=/opt/c3.
In order to inherit namespaces the caller needs to have
sufficient privilege over the process or container.
Note that sharing pid namespaces between system containers
will likely not work with most init systems.
Note that if two processes are in different user namespaces
and one process wants to inherit the other's network
namespace it usually needs to inherit the user namespace as
well.
Note that without careful additional configuration of an
LSM, sharing user+pid namespaces with a task may allow that
task to escalate privileges to that of the task calling
liblxc.
lxc.time.offset.boot
Specify a positive or negative offset for the boottime
clock. The format accepts hours (h), minutes (m), seconds
(s), milliseconds (ms), microseconds (us), and nanoseconds
(ns).
lxc.time.offset.monotonic
Specify a positive or negative offset for the monotonic
clock. The format accepts hours (h), minutes (m), seconds
(s), milliseconds (ms), microseconds (us), and nanoseconds
(ns).
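For example, clock offsets using the unit suffixes listed above
could be written as shown here. The values are arbitrary, and the
sketch assumes a kernel with time namespace support.

```
# Shift the boottime clock forward by 24 hours
lxc.time.offset.boot = 24h
# Shift the monotonic clock forward by 90 seconds
lxc.time.offset.monotonic = 90s
```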
RESOURCE LIMITS
The soft and hard resource limits for the container can be
changed. Unprivileged containers can only lower them. Resources
which are not explicitly specified will be inherited.
lxc.prlimit.[limit name]
Specify the resource limit to be set. A limit is specified
as two colon separated values which are either numeric or
the word 'unlimited'. A single value can be used as a
shortcut to set both soft and hard limit to the same value.
The permitted names are the "RLIMIT_" resource names in
lowercase without the "RLIMIT_" prefix, eg. RLIMIT_NOFILE
should be specified as "nofile". See setrlimit(2). If used
with no value, lxc will clear the resource limit specified
up to this point. A resource with no explicitly configured
limitation will be inherited from the process starting up
the container.
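The value formats described above can be sketched as follows;
the limits chosen are arbitrary examples.

```
# soft:hard limit on open file descriptors
lxc.prlimit.nofile = 1024:4096
# A single value sets both the soft and the hard limit
lxc.prlimit.nproc = 512
# The word 'unlimited' is accepted in place of a number
lxc.prlimit.core = unlimited
```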
SYSCTL
Configure kernel parameters for the container.
lxc.sysctl.[kernel parameters name]
Specify the kernel parameters to be set. The parameters
available are those listed under /proc/sys/. Note that not
all sysctls are namespaced. Changing non-namespaced sysctls
will cause the system-wide setting to be modified. See
sysctl(8). If used with no value, lxc will clear the
parameters specified up to this point.
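As an illustration, the key suffix is the dotted sysctl name.
The two parameters below were picked as examples of settings
scoped to the network and IPC namespaces respectively, so they
assume the container has its own instances of those namespaces.

```
# Corresponds to /proc/sys/net/ipv4/ip_forward
lxc.sysctl.net.ipv4.ip_forward = 1
# Corresponds to /proc/sys/kernel/msgmax
lxc.sysctl.kernel.msgmax = 65536
```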
APPARMOR PROFILE
If lxc was compiled and installed with apparmor support, and the
host system has apparmor enabled, then the apparmor profile under
which the container should be run can be specified in the
container configuration. The default is lxc-container-default-cgns
if the host kernel is cgroup namespace aware, or
lxc-container-default otherwise.
lxc.apparmor.profile
Specify the apparmor profile under which the container
should be run. To specify that the container should be
unconfined, use
lxc.apparmor.profile = unconfined
If the apparmor profile should remain unchanged (i.e. if
you are nesting containers and are already confined), then
use
lxc.apparmor.profile = unchanged
If you instruct LXC to generate the apparmor profile, then
use
lxc.apparmor.profile = generated
lxc.apparmor.allow_incomplete
Apparmor profiles are pathname based. Therefore many file
restrictions require mount restrictions to be effective
against a determined attacker. However, these mount
restrictions are not yet implemented in the upstream
kernel. Without the mount restrictions, the apparmor
profiles still protect against accidental damage.
If this flag is 0 (default), then the container will not be
started if the kernel lacks the apparmor mount features, so
that a regression after a kernel upgrade will be detected.
To start the container under partial apparmor protection,
set this flag to 1.
lxc.apparmor.allow_nesting
If set to 1, this causes the following changes. When
generated apparmor profiles are used, they will contain the
necessary changes to allow creating a nested container. In
addition to the usual mount points, /dev/.lxc/proc and
/dev/.lxc/sys will contain procfs and sysfs mount points
without the lxcfs overlays, which, if generated apparmor
profiles are being used, will not be read/writable
directly.
lxc.apparmor.raw
A list of raw AppArmor profile lines to append to the
profile. Only valid when using generated profiles.
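A sketch combining the generated-profile options. The raw rule
is only an example of AppArmor rule syntax, not a recommended
policy.

```
# Let LXC generate the profile for this container
lxc.apparmor.profile = generated
# Allow creating nested containers under the generated profile
lxc.apparmor.allow_nesting = 1
# Append an extra rule to the generated profile
lxc.apparmor.raw = deny /sys/firmware/** rwklx,
```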
SELINUX CONTEXT
If lxc was compiled and installed with SELinux support, and the
host system has SELinux enabled, then the SELinux context under
which the container should be run can be specified in the
container configuration. The default is unconfined_t, which means
that lxc will not attempt to change contexts. See
/usr/share/lxc/selinux/lxc.te for an example policy and more
information.
lxc.selinux.context
Specify the SELinux context under which the container
should be run or unconfined_t. For example
lxc.selinux.context = system_u:system_r:lxc_t:s0:c22
lxc.selinux.context.keyring
Specify the SELinux context under which the container's
keyring should be created. By default this is the same as
lxc.selinux.context, or the context lxc is executed under
if lxc.selinux.context has not been set.
lxc.selinux.context.keyring = system_u:system_r:lxc_t:s0:c22
KERNEL KEYRING
The Linux Keyring facility is primarily a way for various kernel
components to retain or cache security data, authentication keys,
encryption keys, and other data in the kernel. By default lxc will
create a new session keyring for the started application.
lxc.keyring.session
Disable the creation of new session keyring by lxc. The
started application will then inherit the current session
keyring. By default, or when passing the value 1, a new
keyring will be created.
lxc.keyring.session = 0
SECCOMP CONFIGURATION
A container can be started with a reduced set of available system
calls by loading a seccomp profile at startup. The seccomp
configuration file must begin with a version number on the first
line, a policy type on the second line, followed by the
configuration.
Versions 1 and 2 are currently supported. In version 1, the policy
is a simple allowlist. The second line therefore must read
"allowlist", with the rest of the file containing one (numeric)
syscall number per line. Each syscall number is allowlisted, while
every unlisted number is denylisted for use in the container.
In version 2, the policy may be denylist or allowlist, supports
per-rule and per-policy default actions, and supports per-
architecture system call resolution from textual names.
An example denylist policy, in which all system calls are allowed
except for mknod, which will simply do nothing and return 0
(success), looks like:
2
denylist
mknod errno 0
ioctl notify
Specifying "errno" as action will cause LXC to register a seccomp
filter that will cause a specific errno to be returned to the
caller. The errno value can be specified after the "errno" action
word.
Specifying "notify" as action will cause LXC to register a seccomp
listener and retrieve a listener file descriptor from the kernel.
When a syscall is made that is registered as "notify" the kernel
will generate a poll event and send a message over the file
descriptor. The caller can read this message and inspect the
syscall including its arguments. Based on this information the
caller is expected to send back a message informing the kernel
which action to take. Until that message is sent the kernel will
block the calling process. The format of the messages to read and
send is documented in seccomp itself.
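As a further sketch in the same version 2 format, a denylist
policy that refuses kernel module and kexec related system calls
might look like this. The choice of system calls is illustrative;
errno value 1 corresponds to EPERM.

```
2
denylist
kexec_load errno 1
init_module errno 1
finit_module errno 1
delete_module errno 1
```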
lxc.seccomp.profile
Specify a file containing the seccomp configuration to load
before the container starts.
lxc.seccomp.allow_nesting
If this flag is set to 1, then seccomp filters will be
stacked regardless of whether a seccomp profile is already
loaded. This allows nested containers to load their own
seccomp profile. The default setting is 0.
lxc.seccomp.notify.proxy
Specify a unix socket to which LXC will connect and forward
seccomp events to. The path must be in the form
unix:/path/to/socket or unix:@socket. The former specifies
a path-bound unix domain socket while the latter specifies
an abstract unix domain socket.
lxc.seccomp.notify.cookie
An additional string sent along with proxied seccomp
notification requests.
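Put together, a container using a profile with "notify" rules
and an external proxy might be configured as follows. The profile
path, socket path and cookie string are all hypothetical.

```
# Seccomp profile to load before the container starts
lxc.seccomp.profile = /etc/lxc/c1.seccomp
# Path-bound unix socket of the process handling notifications
lxc.seccomp.notify.proxy = unix:/run/seccomp-proxy.sock
# Extra string sent along with each proxied request
lxc.seccomp.notify.cookie = c1
```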
PR_SET_NO_NEW_PRIVS
With PR_SET_NO_NEW_PRIVS active execve() promises not to grant
privileges to do anything that could not have been done without
the execve() call (for example, rendering the set-user-ID and
set-group-ID mode bits, and file capabilities non-functional). Once
set, this bit cannot be unset. The setting of this bit is
inherited by children created by fork() and clone(), and preserved
across execve(). Note that PR_SET_NO_NEW_PRIVS is applied after
the container has changed into its intended AppArmor profile or
SELinux context.
lxc.no_new_privs
Specify whether the PR_SET_NO_NEW_PRIVS flag should be set
for the container. Set to 1 to activate.
UID MAPPINGS
A container can be started in a private user namespace with user
and group id mappings. For instance, you can map userid 0 in the
container to userid 200000 on the host. The root user in the
container will be privileged in the container, but unprivileged on
the host. Normally a system container will want a range of ids, so
you would map, for instance, user and group ids 0 through 20,000
in the container to the ids 200,000 through 220,000.
lxc.idmap
Four values must be provided. First a character, either
'u', or 'g', to specify whether user or group ids are being
mapped. Next is the first id as seen in the user
namespace of the container. Next is the corresponding first
id as seen on the host. Finally, a range indicating the
number of consecutive ids to map.
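The mapping described in the introduction to this section (ids 0
through 20,000 in the container mapped to 200,000 through 220,000
on the host) can be written as shown below. The range counts ids
inclusively, so it is 20001 rather than 20000.

```
# type  first-id-in-container  first-id-on-host  count
lxc.idmap = u 0 200000 20001
lxc.idmap = g 0 200000 20001
```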
CONTAINER HOOKS
Container hooks are programs or scripts which can be executed at
various times in a container's lifetime.
When a container hook is executed, additional information is
passed along. The lxc.hook.version argument can be used to
determine if the following arguments are passed as command line
arguments or through environment variables. The arguments are:
• Container name.
• Section (always 'lxc').
• The hook type (e.g. 'clone' or 'pre-mount').
• Additional arguments. In the case of the clone hook, any extra
arguments passed will appear as further arguments to the hook.
In the case of the stop hook, paths to file descriptors for each
of the container's namespaces along with their types are passed.
The following environment variables are set:
• LXC_CGNS_AWARE: indicator whether the container is cgroup
namespace aware.
• LXC_CONFIG_FILE: the path to the container configuration file.
• LXC_HOOK_TYPE: the hook type (e.g. 'clone', 'mount',
'pre-mount'). Note that the existence of this environment variable is
conditional on the value of lxc.hook.version. If it is set to 1
then LXC_HOOK_TYPE will be set.
• LXC_HOOK_SECTION: the section type (e.g. 'lxc', 'net'). Note
that the existence of this environment variable is conditional
on the value of lxc.hook.version. If it is set to 1 then
LXC_HOOK_SECTION will be set.
• LXC_HOOK_VERSION: the version of the hooks. This value is
identical to the value of the container's lxc.hook.version
config item. If it is set to 0 then old-style hooks are used. If
it is set to 1 then new-style hooks are used.
• LXC_LOG_LEVEL: the container's log level.
• LXC_NAME: is the container's name.
• LXC_[NAMESPACE IDENTIFIER]_NS: path under /proc/PID/fd/ to a
file descriptor referring to the container's namespace. For each
preserved namespace type there will be a separate environment
variable. These environment variables will only be set if
lxc.hook.version is set to 1.
• LXC_ROOTFS_MOUNT: the path to the mounted root filesystem.
• LXC_ROOTFS_PATH: this is the lxc.rootfs.path entry for the
container. Note this is likely not where the mounted rootfs is
to be found, use LXC_ROOTFS_MOUNT for that.
• LXC_SRC_NAME: in the case of the clone hook, this is the
original container's name.
Standard output from the hooks is logged at debug level. Standard
error is not logged, but can be captured by the hook redirecting
its standard error to standard output.
lxc.hook.version
Set to 1 to pass the arguments in the new style via
environment variables; set to 0 to pass them as command
line arguments. This setting affects all hook arguments that
were traditionally passed as arguments to the script.
Specifically, it affects the container name, section (e.g.
'lxc', 'net') and hook type (e.g. 'clone', 'mount',
'pre-mount') arguments. If new-style hooks are used then the
arguments will be available as environment variables. The
container name will be set in LXC_NAME. (This is set
independently of the value used for this config item.) The
section will be set in LXC_HOOK_SECTION and the hook type
will be set in LXC_HOOK_TYPE. It also affects how the
paths to file descriptors referring to the container's
namespaces are passed. If set to 1 then for each namespace
a separate environment variable LXC_[NAMESPACE
IDENTIFIER]_NS will be set. If set to 0 then the paths will
be passed as arguments to the stop hook.
lxc.hook.pre-start
A hook to be run in the host's namespace before the
container ttys, consoles, or mounts are up.
lxc.hook.pre-mount
A hook to be run in the container's fs namespace but before
the rootfs has been set up. This allows for manipulation of
the rootfs, e.g. to mount an encrypted filesystem. Mounts
done in this hook will not be reflected on the host (apart
from mount propagation), so they will be automatically
cleaned up when the container shuts down.
lxc.hook.mount
A hook to be run in the container's namespace after
mounting has been done, but before the pivot_root.
lxc.hook.autodev
A hook to be run in the container's namespace after
mounting has been done and after any mount hooks have run,
but before the pivot_root, if lxc.autodev == 1. The
purpose of this hook is to assist in populating the /dev
directory of the container when using the autodev option
for systemd based containers. The container's /dev
directory is relative to the ${LXC_ROOTFS_MOUNT}
environment variable available when the hook is run.
lxc.hook.start-host
A hook to be run in the host's namespace after the
container has been set up, and immediately before starting
the container init.
lxc.hook.start
A hook to be run in the container's namespace immediately
before executing the container's init. This requires the
program to be available in the container.
lxc.hook.stop
A hook to be run in the host's namespace with references to
the container's namespaces after the container has been
shut down. For each namespace an extra argument is passed
to the hook containing the namespace's type and a filename
that can be used to obtain a file descriptor to the
corresponding namespace, separated by a colon. The type is
the name as it would appear in the /proc/PID/ns directory.
For instance for the mount namespace the argument usually
looks like mnt:/proc/PID/fd/12.
lxc.hook.post-stop
A hook to be run in the host's namespace after the
container has been shut down.
lxc.hook.clone
A hook to be run when the container is cloned to a new one.
See lxc-clone(1) for more information.
lxc.hook.destroy
A hook to be run when the container is destroyed.
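A minimal sketch of registering hooks with new-style argument
passing. The script paths are hypothetical; each must point to an
executable on the host, since all three of these hooks run in the
host's namespace.

```
# Pass hook information through environment variables
lxc.hook.version = 1
# Runs on the host before ttys, consoles or mounts are up
lxc.hook.pre-start = /usr/local/lib/lxc-hooks/prepare-storage
# Runs on the host immediately before the container init starts
lxc.hook.start-host = /usr/local/lib/lxc-hooks/notify-started
# Runs on the host after the container has been shut down
lxc.hook.post-stop = /usr/local/lib/lxc-hooks/cleanup
```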
CONTAINER HOOKS ENVIRONMENT VARIABLES
A number of environment variables are made available to the
startup hooks to provide configuration information and assist in
the functioning of the hooks. Not all variables are valid in all
contexts. In particular, all paths are relative to the host system
and, as such, not valid during the lxc.hook.start hook.
LXC_NAME
The LXC name of the container. Useful for logging messages
in common log environments. [-n]
LXC_CONFIG_FILE
Host relative path to the container configuration file.
This allows the container to reference the original, top
level, configuration file for the container in order to
locate any additional configuration information not
otherwise made available. [-f]
LXC_CONSOLE
The path to the console output of the container if not
NULL. [-c] [lxc.console.path]
LXC_CONSOLE_LOGPATH
The path to the console log output of the container if not
NULL. [-L]
LXC_ROOTFS_MOUNT
The mount location to which the container is initially
bound. This will be the host relative path to the
container rootfs for the container instance being started
and is where changes should be made for that instance.
[lxc.rootfs.mount]
LXC_ROOTFS_PATH
The host relative path to the container root which has been
mounted to the rootfs.mount location. [lxc.rootfs.path]
LXC_SRC_NAME
Only for the clone hook. Is set to the original container
name.
LXC_TARGET
Only for the stop hook. Is set to "stop" for a container
shutdown or "reboot" for a container reboot.
LXC_CGNS_AWARE
If unset, then this version of lxc is not aware of cgroup
namespaces. If set, it will be set to 1, and lxc is aware
of cgroup namespaces. Note this does not guarantee that
cgroup namespaces are enabled in the kernel. This is used
by the lxcfs mount hook.
LOGGING
Logging can be configured on a per-container basis. By default,
depending upon how the lxc package was compiled, container startup
is logged only at the ERROR level, and logged to a file named
after the container (with '.log' appended) either under the
container path, or under /var/log/lxc.
Both the default log level and the log file can be specified in
the container configuration file, overriding the default behavior.
Note that the configuration file entries can in turn be overridden
by the command line options to lxc-start.
lxc.log.level
The level at which to log. The log level is an integer in
the range of 0..8 inclusive, where a lower number means
more verbose debugging. In particular 0 = trace, 1 = debug,
2 = info, 3 = notice, 4 = warn, 5 = error, 6 = critical, 7
= alert, and 8 = fatal. If unspecified, the level defaults
to 5 (error), so that only errors and above are logged.
Note that when a script (such as either a hook script or a
network interface up or down script) is called, the
script's standard output is logged at level 1, debug.
lxc.log.file
The file to which logging info should be written.
lxc.log.syslog
Send logging info to syslog. It respects the log level
defined in lxc.log.level. The argument should be the syslog
facility to use, valid ones are: daemon, local0, local1,
local2, local3, local4, local5, local6, local7.
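For example, to log at debug level to both a file and syslog
(the log file path is a placeholder):

```
# 1 = debug, using the numeric levels listed above
lxc.log.level = 1
lxc.log.file = /var/log/lxc/c1.log
# Also send log messages to syslog using the daemon facility
lxc.log.syslog = daemon
```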
AUTOSTART
The autostart options support marking which containers should be
auto-started and in what order. These options may be used by LXC
tools directly or by external tooling provided by the
distributions.
lxc.start.auto
Whether the container should be auto-started. Valid values
are 0 (off) and 1 (on).
lxc.start.delay
How long to wait (in seconds) after the container is
started before starting the next one.
lxc.start.order
An integer used to sort the containers when auto-starting a
series of containers at once. A lower value means an
earlier start.
lxc.monitor.unshare
If not zero, the mount namespace will be unshared from the
host before initializing the container (before running any
pre-start hooks). This requires the CAP_SYS_ADMIN
capability at startup. Default is 0.
lxc.monitor.signal.pdeath
Set the signal to be sent to the container's init when the
lxc monitor exits. By default it is set to SIGKILL which
will cause all container processes to be killed when the
lxc monitor process dies. To ensure that containers stay
alive even if the lxc monitor dies, set this to 0.
lxc.group
A multi-value key (can be used multiple times) to put the
container in a container group. Those groups can then be
used (amongst other things) to start a series of related
containers.
AUTOSTART AND SYSTEM BOOT
Each container can be part of any number of groups or no group at
all. Two groups are special. One is the NULL group, i.e. the
container does not belong to any group. The other group is the
"onboot" group.
When the system boots with the LXC service enabled, it will first
attempt to boot any containers with lxc.start.auto == 1 that are
members of the "onboot" group. The containers are started in order
of lxc.start.order. If an lxc.start.delay has been specified, that
delay will be honored before attempting to start the next
container, to give the current container time to begin
initialization and to avoid overloading the host system. After
starting the members of the "onboot" group, the LXC system will
proceed to boot containers with lxc.start.auto == 1 which are not
members of any group (the NULL group), following the same procedure
as for the onboot group.
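The boot sequence described above can be sketched with two
hypothetical containers, "db" and "web". The names and values are
assumptions made for this illustration. Both are members of the
"onboot" group. Since a lower lxc.start.order means an earlier
start, "db" is started first, and its lxc.start.delay leaves 10
seconds before "web" is started.

```
# In the configuration file of container "db" (hypothetical name)
lxc.start.auto = 1
lxc.start.order = 10
lxc.start.delay = 10
lxc.group = onboot

# In the configuration file of container "web" (hypothetical name)
lxc.start.auto = 1
lxc.start.order = 20
lxc.group = onboot
```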
CONTAINER ENVIRONMENT
If you want to pass environment variables into the container (that
is, environment variables which will be available to init and all
of its descendants), you can use lxc.environment parameters to do
so. Be careful that you do not pass in anything sensitive; any
process in the container which doesn't have its environment
scrubbed will have these variables available to it, and
environment variables are always available via /proc/PID/environ.
This configuration parameter can be specified multiple times; once
for each environment variable you wish to configure.
lxc.environment
Specify an environment variable to pass into the container.
Example:
lxc.environment = APP_ENV=production
lxc.environment = SYSLOG_SERVER=192.0.2.42
It is possible to inherit host environment variables by
setting the name of the variable without a "=" sign. For
example:
lxc.environment = PATH
EXAMPLES
In addition to the few examples given below, you will find other
example configuration files in /usr/share/doc/lxc/examples.
NETWORK
This configuration sets up a container to use a veth pair device
with one side plugged to a bridge br0 (which has been configured
before on the system by the administrator). The virtual network
device visible in the container is renamed to eth0.
lxc.uts.name = myhostname
lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = br0
lxc.net.0.name = eth0
lxc.net.0.hwaddr = 4a:49:43:49:79:bf
lxc.net.0.ipv4.address = 10.2.3.5/24 10.2.3.255
lxc.net.0.ipv6.address = 2001:db8:1:0:214:1234:fe0b:3597
UID/GID MAPPING
This configuration will map both user and group ids in the range
0-9999 in the container to the ids 100000-109999 on the host.
lxc.idmap = u 0 100000 10000
lxc.idmap = g 0 100000 10000
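Each lxc.idmap value reads as: type (u for user ids, g for group
ids), first id in the container, first id on the host, and the
number of ids in the range. An id inside the range therefore
appears on the host as the host start plus its offset from the
container start. A worked sketch for the user id line above, with
the arithmetic given as comments:

```
# container uid 0    -> host uid 100000 + (0 - 0)    = 100000
# container uid 1000 -> host uid 100000 + (1000 - 0) = 101000
# container uid 9999 -> host uid 100000 + (9999 - 0) = 109999
lxc.idmap = u 0 100000 10000
```

The group id line follows the same arithmetic.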
CONTROL GROUP
This configuration sets up several control group settings for the
application: cpuset.cpus restricts the container to the listed
CPUs, cpu.shares sets the relative CPU priority of the control
group, devices.deny = a first denies access to all devices, and
devices.allow then makes the specified devices usable.
lxc.cgroup.cpuset.cpus = 0,1
lxc.cgroup.cpu.shares = 1234
lxc.cgroup.devices.deny = a
lxc.cgroup.devices.allow = c 1:3 rw
lxc.cgroup.devices.allow = b 8:0 rw
COMPLEX CONFIGURATION
This example shows a complex configuration that builds a complex
network stack, uses control groups, sets a new hostname, mounts
some locations and changes the root file system.
lxc.uts.name = complex
lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = br0
lxc.net.0.hwaddr = 4a:49:43:49:79:bf
lxc.net.0.ipv4.address = 10.2.3.5/24 10.2.3.255
lxc.net.0.ipv6.address = 2001:db8:1:0:214:1234:fe0b:3597
lxc.net.0.ipv6.address = 2001:db8:1:0:214:5432:feab:3588
lxc.net.1.type = macvlan
lxc.net.1.flags = up
lxc.net.1.link = eth0
lxc.net.1.hwaddr = 4a:49:43:49:79:bd
lxc.net.1.ipv4.address = 10.2.3.4/24
lxc.net.1.ipv4.address = 192.168.10.125/24
lxc.net.1.ipv6.address = 2001:db8:1:0:214:1234:fe0b:3596
lxc.net.2.type = phys
lxc.net.2.flags = up
lxc.net.2.link = random0
lxc.net.2.hwaddr = 4a:49:43:49:79:ff
lxc.net.2.ipv4.address = 10.2.3.6/24
lxc.net.2.ipv6.address = 2001:db8:1:0:214:1234:fe0b:3297
lxc.cgroup.cpuset.cpus = 0,1
lxc.cgroup.cpu.shares = 1234
lxc.cgroup.devices.deny = a
lxc.cgroup.devices.allow = c 1:3 rw
lxc.cgroup.devices.allow = b 8:0 rw
lxc.mount.fstab = /etc/fstab.complex
lxc.mount.entry = /lib /root/myrootfs/lib none ro,bind 0 0
lxc.rootfs.path = dir:/mnt/rootfs.complex
lxc.rootfs.options = idmap=container
lxc.cap.drop = sys_module mknod setuid net_raw
lxc.cap.drop = mac_override
SEE ALSO
chroot(1), pivot_root(8), fstab(5), capabilities(7)
lxc(7), lxc-create(1), lxc-copy(1), lxc-destroy(1), lxc-start(1),
lxc-stop(1), lxc-execute(1), lxc-console(1), lxc-monitor(1),
lxc-wait(1), lxc-cgroup(1), lxc-ls(1), lxc-info(1), lxc-freeze(1),
lxc-unfreeze(1), lxc-attach(1), lxc.conf(5)
COLOPHON
This page is part of the lxc (Linux containers) project.
Information about the project can be found at
⟨http://linuxcontainers.org/⟩. If you have a bug report for this
manual page, send it to lxc-devel@lists.linuxcontainers.org. This
page was obtained from the project's upstream Git repository
⟨https://github.com/lxc/lxc.git⟩ on 2025-08-11. (At that time,
the date of the most recent commit that was found in the
repository was 2025-07-29.) If you discover any rendering
problems in this HTML version of the page, or you believe there is
a better or more up-to-date source for the page, or you have
corrections or improvements to the information in this COLOPHON
(which is not part of the original manual page), send a mail to
man-pages@man7.org
2024-04-03 lxc.container.conf(5)