Docker Breakout
Mounted docker socket
If somehow you find that the docker socket is mounted inside the docker container, you will be able to escape from it. This usually happen in docker containers that for some reason need to connect to docker daemon to perform actions.
#Search the socket
find / -name docker.sock 2>/dev/null
#It's usually in /run/docker.sockIn this case you can use regular docker commands to communicate with the docker daemon:
#List images to use one
docker images
#Run the image mounting the host disk and chroot on it
docker run -it -v /:/host/ ubuntu:18.04 chroot /host/ bashContainer Capabilities
You should check the capabilities of the container, if it has any of the following ones, you might be able to scape from it: CAP_SYS_ADMIN, CAP_SYS_PTRACE, CAP_SYS_MODULE, DAC_READ_SEARCH, DAC_OVERRIDE
You can check currently container capabilities with:
capsh --printIn the following page you can learn more about linux capabilities and how to abuse them:
Linux Capabilities--privileged flag
--privileged flagThe --privileged flag allows the container to have access to the host devices.
I own Root
Well configured docker containers won't allow command like fdisk -l. However on missconfigured docker command where the flag --privileged is specified, it is possible to get the privileges to see the host drive.

So to take over the host machine, it is trivial:
And voilĂ ! You can now acces the filesystem of the host because it is mounted in the /mnt/hole folder.
The --privileged flag introduces significant security concerns, and the exploit relies on launching a docker container with it enabled. When using this flag, containers have full access to all devices and lack restrictions from seccomp, AppArmor, and Linux capabilities.
In fact, --privileged provides far more permissions than needed to escape a docker container via this method. In reality, the “only” requirements are:
We must be running as root inside the container
The container must be run with the
SYS_ADMINLinux capabilityThe container must lack an AppArmor profile, or otherwise allow the
mountsyscallThe cgroup v1 virtual filesystem must be mounted read-write inside the container
The SYS_ADMIN capability allows a container to perform the mount syscall (see man 7 capabilities). Docker starts containers with a restricted set of capabilities by default and does not enable the SYS_ADMIN capability due to the security risks of doing so.
Further, Docker starts containers with the docker-default AppArmor policy by default, which prevents the use of the mount syscall even when the container is run with SYS_ADMIN.
A container would be vulnerable to this technique if run with the flags: --security-opt apparmor=unconfined --cap-add=SYS_ADMIN
Breaking down the proof of concept
Now that we understand the requirements to use this technique and have refined the proof of concept exploit, let’s walk through it line-by-line to demonstrate how it works.
To trigger this exploit we need a cgroup where we can create a release_agent file and trigger release_agent invocation by killing all processes in the cgroup. The easiest way to accomplish that is to mount a cgroup controller and create a child cgroup.
To do that, we create a /tmp/cgrp directory, mount the RDMA cgroup controller and create a child cgroup (named “x” for the purposes of this example). While every cgroup controller has not been tested, this technique should work with the majority of cgroup controllers.
If you’re following along and get “mount: /tmp/cgrp: special device cgroup does not exist”, it’s because your setup doesn’t have the RDMA cgroup controller. Change rdma to memory to fix it. We’re using RDMA because the original PoC was only designed to work with it.
Note that cgroup controllers are global resources that can be mounted multiple times with different permissions and the changes rendered in one mount will apply to another.
We can see the “x” child cgroup creation and its directory listing below.
Next, we enable cgroup notifications on release of the “x” cgroup by writing a 1 to its notify_on_release file. We also set the RDMA cgroup release agent to execute a /cmd script — which we will later create in the container — by writing the /cmd script path on the host to the release_agent file. To do it, we’ll grab the container’s path on the host from the /etc/mtab file.
The files we add or modify in the container are present on the host, and it is possible to modify them from both worlds: the path in the container and their path on the host.
Those operations can be seen below:
Note the path to the /cmd script, which we are going to create on the host:
Now, we create the /cmd script such that it will execute the ps aux command and save its output into /output on the container by specifying the full path of the output file on the host. At the end, we also print the /cmd script to see its contents:
Finally, we can execute the attack by spawning a process that immediately ends inside the “x” child cgroup. By creating a /bin/sh process and writing its PID to the cgroup.procs file in “x” child cgroup directory, the script on the host will execute after /bin/sh exits. The output of ps aux performed on the host is then saved to the /output file inside the container:
--privileged flag v2
--privileged flag v2The previous PoCs work fine when the container is configured with a storage-driver which exposes the full host path of the mount point, for example overlayfs, however I recently came across a couple of configurations which did not obviously disclose the host file system mount point.
Kata Containers
Kata Containers by default mounts the root fs of a container over 9pfs. This discloses no information about the location of the container file system in the Kata Containers Virtual Machine.
* More on Kata Containers in a future blog post.
Device Mapper
I saw a container with this root mount in a live environment, I believe the container was running with a specific devicemapper storage-driver configuration, but at this point I have been unable to replicate this behaviour in a test environment.
An Alternative PoC
Obviously in these cases there is not enough information to identify the path of container files on the host file system, so Felix’s PoC cannot be used as is. However, we can still execute this attack with a little ingenuity.
The one key piece of information required is the full path, relative to the container host, of a file to execute within the container. Without being able to discern this from mount points within the container we have to look elsewhere.
Proc to the Rescue
The Linux /proc pseudo-filesystem exposes kernel process data structures for all processes running on a system, including those running in different namespaces, for example within a container. This can be shown by running a command in a container and accessing the /proc directory of the process on the host:Container
As an aside, the /proc/<pid>/root data structure is one that confused me for a very long time, I could never understand why having a symbolic link to / was useful, until I read the actual definition in the man pages:
/proc/[pid]/root
UNIX and Linux support the idea of a per-process root of the filesystem, set by the chroot(2) system call. This file is a symbolic link that points to the process’s root directory, and behaves in the same way as exe, and fd/*.
Note however that this file is not merely a symbolic link. It provides the same view of the filesystem (including namespaces and the set of per-process mounts) as the process itself.
The /proc/<pid>/root symbolic link can be used as a host relative path to any file within a container:Container
This changes the requirement for the attack from knowing the full path, relative to the container host, of a file within the container, to knowing the pid of any process running in the container.
Pid Bashing
This is actually the easy part, process ids in Linux are numerical and assigned sequentially. The init process is assigned process id 1 and all subsequent processes are assigned incremental ids. To identify the host process id of a process within a container, a brute force incremental search can be used:Container
Host
Putting it All Together
To complete this attack the brute force technique can be used to guess the pid for the path /proc/<pid>/root/payload.sh, with each iteration writing the guessed pid path to the cgroups release_agent file, triggering the release_agent, and seeing if an output file is created.
The only caveat with this technique is it is in no way shape or form subtle, and can increase the pid count very high. As no long running processes are kept running this should not cause reliability issues, but don’t quote me on that.
The below PoC implements these techniques to provide a more generic attack than first presented in Felix’s original PoC for escaping a privileged container using the cgroups release_agent functionality:
Executing the PoC within a privileged container should provide output similar to:
Runc exploit (CVE-2019-5736)
In case you can execute docker exec as root (probably with sudo), you try to escalate privileges escaping from a container abusing CVE-2019-5736 (exploit here). This technique will basically overwrite the /bin/sh binary of the host from a container, so anyone executing docker exec may trigger the payload.
Change the payload accordingly and build the main.go with go build main.go. The resulting binary should be placed in the docker container for execution.
Upon execution, as soon as it displays [+] Overwritten /bin/sh successfully you need to execute the following from the host machine:
docker exec -it <container-name> /bin/sh
This will trigger the payload which is present in the main.go file.
For more information: https://blog.dragonsector.pl/2019/02/cve-2019-5736-escape-from-docker-and.html
Docker API Firewall Bypass
In some occasions, the sysadmin may install some plugins to docker to avoid low privilege users to interact with docker without being able to escalate privileges.
disallowed run --privileged
run --privilegedIn this case the sysadmin disallowed users to mount volumes and run containers with the --privileged flag or give any extra capability to the container:
However, a user can create a shell inside the running container and give it the extra privileges:
Now, the user can escape from the container using any of the previously discussed techniques and escalate privileges inside the host.
Mount Writable Folder
In this case the sysadmin disallowed users to run containers with the --privileged flag or give any extra capability to the container, and he only allowed to mount the /tmp folder:
Unchecked JSON Structure
It's possible that when the sysadmin configured the docker firewall he forgot about some important parameter of the API (https://docs.docker.com/engine/api/v1.40/#operation/ContainerList) like "Binds". In the following example it's possible to abuse this misconfiguration to create and run a container that mounts the root (/) folder of the host:
Unchecked JSON Attribute
It's possible that when the sysadmin configured the docker firewall he forgot about some important attribute of a parametter of the API (https://docs.docker.com/engine/api/v1.40/#operation/ContainerList) like "Capabilities" inside "HostConfig". In the following example it's possible to abuse this misconfiguration to create and run a container with the SYS_MODULE capability:
Writable hostPath Mount
(Info from here) Within the container, an attacker may attempt to gain further access to the underlying host OS via a writable hostPath volume created by the cluster. Below is some common things you can check within the container to see if you leverage this attacker vector:
Containers Security Improvements
Seccomp in Docker
This is not a technique to breakout from a Docker container but a security feature that Docker uses and you should know about as it might prevent you from breaking out from docker:
SeccompAppArmor in Docker
This is not a technique to breakout from a Docker container but a security feature that Docker uses and you should know about as it might prevent you from breaking out from docker:
AppArmorgVisor
gVisor is an application kernel, written in Go, that implements a substantial portion of the Linux system surface. It includes an Open Container Initiative (OCI) runtime called runsc that provides an isolation boundary between the application and the host kernel. The runsc runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers.
Kata Containers
Kata Containers is an open source community working to build a secure container runtime with lightweight virtual machines that feel and perform like containers, but provide stronger workload isolation using hardware virtualization technology as a second layer of defense.
Use containers securely
Docker restricts and limits containers by default. Loosening these restrictions may create security issues, even without the full power of the --privileged flag. It is important to acknowledge the impact of each additional permission, and limit permissions overall to the minimum necessary.
To help keep containers secure:
Do not use the
--privilegedflag or mount a Docker socket inside the container. The docker socket allows for spawning containers, so it is an easy way to take full control of the host, for example, by running another container with the--privilegedflag.Do not run as root inside the container. Use a different user or user namespaces. The root in the container is the same as on host unless remapped with user namespaces. It is only lightly restricted by, primarily, Linux namespaces, capabilities, and cgroups.
Drop all capabilities (
--cap-drop=all) and enable only those that are required (--cap-add=...). Many of workloads don’t need any capabilities and adding them increases the scope of a potential attack.Use the “no-new-privileges” security option to prevent processes from gaining more privileges, for example through suid binaries.
Limit resources available to the container. Resource limits can protect the machine from denial of service attacks.
Use official docker images or build your own based on them. Don’t inherit or use backdoored images.
Regularly rebuild your images to apply security patches. This goes without saying.
References
Last updated