A Simple Introduction to Docker III
What is Docker?
Underlying Techniques
What do we need to do to build a container?
Isolate every container
We need to isolate every container. That means making the process in a container (a container usually runs only a single process) believe it is the only process on the system.
To do that, we use a feature provided by the Linux kernel called namespaces.
Restrict the resources a container can use
We may need to restrict the resources a container can use, since we don't want it to eat up all of our system resources such as CPU, RAM and disk.
We may also need to account for the resources a container has used, for billing or other purposes.
To do that, we use cgroups (short for control groups), a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.
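cgroups are managed through a filesystem interface rather than new system calls, and the kernel reports which cgroups a process belongs to in /proc/<pid>/cgroup, one "hierarchy-ID:controllers:path" line per hierarchy. Below is a minimal C sketch that reads the first such line for the calling process; the helper name first_cgroup_entry is our own, not a library function.

```c
#include <stdio.h>
#include <string.h>

/* Copy the first line of /proc/self/cgroup into buf, without the
 * trailing newline. Returns 0 on success, -1 on error. */
int first_cgroup_entry(char *buf, size_t len)
{
    FILE *f = fopen("/proc/self/cgroup", "r");
    if (f == NULL)
        return -1;

    /* Each line has the form hierarchy-ID:controller-list:cgroup-path. */
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (line == NULL)
        return -1;

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

On a system that only uses the unified cgroup v2 hierarchy there is a single entry, which typically starts with "0::".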
Provide these resources wisely
We may only have a couple of small, weak machines but still want to provide powerful services. Containers let us combine these machines into what looks like one single machine, so we get the resources we need. More importantly, this approach scales.
Linux Namespace
- Docker takes advantage of a technology called namespaces to provide the isolated workspace.[1]
Purpose of Namespace
To wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.
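You can see this wrapping from user space: every process's namespaces are exposed as symbolic links under /proc/<pid>/ns/, and two processes share a namespace exactly when the links have the same target. Below is a minimal C sketch that reads that target for the calling process; the helper name ns_identifier is our own, not a library function.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Store the identifier of the calling process's namespace of the given
 * type ("uts", "pid", "net", ...) in buf, something like "uts:[4026531838]".
 * Returns the identifier length, or -1 on error. */
ssize_t ns_identifier(const char *type, char *buf, size_t len)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/self/ns/%s", type);
    if (n < 0 || (size_t)n >= sizeof path || len == 0)
        return -1;

    /* The link target names the namespace; readlink() does not
     * NUL-terminate, so leave room for the terminator ourselves. */
    ssize_t r = readlink(path, buf, len - 1);
    if (r == -1)
        return -1;
    buf[r] = '\0';
    return r;
}
```

On the host and inside a container, `readlink /proc/self/ns/uts` should print different identifiers, because the container runs in its own UTS namespace.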
6 namespaces
At the time of writing, Linux provides 6 different types of namespaces. Docker uses five of them and is working on supporting the last one (the user namespace).
- The pid namespace
  - Used for process isolation (PID: Process ID).
- The net namespace
  - Used for managing network interfaces (NET: Networking).
- The ipc namespace
  - Used for managing access to IPC resources (IPC: InterProcess Communication).
- The mnt namespace
  - Used for managing mount-points (MNT: Mount).
- The uts namespace
  - Used for isolating kernel and version identifiers (UTS: Unix Timesharing System).
- Related System Calls
clone()
Creates a new process. Different flags can be passed to place the new process in new namespaces, isolating different resources.
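A minimal C sketch of clone() with the CLONE_NEWUTS flag, assuming a Linux host and root privileges (creating a namespace needs CAP_SYS_ADMIN). The child changes its hostname, which only affects its own UTS namespace. The names run_in_new_uts and uts_child, the 1 MiB stack and the hostname "ns-demo" are illustrative choices, not part of any API.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Runs in the child, which lives in its own UTS namespace. */
static int uts_child(void *arg)
{
    const char *name = arg;
    char host[256];

    /* Only the new namespace sees this hostname change. */
    if (sethostname(name, strlen(name)) == -1)
        return 1;
    if (gethostname(host, sizeof host) == 0)
        printf("hostname inside the new namespace: %s\n", host);
    fflush(stdout); /* the clone() child exits without flushing stdio */
    return 0;
}

/* Start a child in a new UTS namespace and wait for it. Returns the
 * child's exit status, or -1 if clone() failed (typically EPERM when
 * the caller lacks CAP_SYS_ADMIN). */
int run_in_new_uts(const char *name)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows down on common architectures, so pass its top.
     * SIGCHLD lets the parent wait for the child with waitpid(). */
    pid_t pid = clone(uts_child, stack + CHILD_STACK_SIZE,
                      CLONE_NEWUTS | SIGCHLD, (void *)name);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    free(stack);
    if (w == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Run as root, run_in_new_uts("ns-demo") should make the child print ns-demo while `hostname` on the host stays the same; as an ordinary user, clone() should fail and the function returns -1.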
| Namespace | Identifier | Usage | Kernel version |
|---|---|---|---|
| Mount namespaces | CLONE_NEWNS | Managing mount-points | Since Linux 2.4.19 |
| UTS namespaces | CLONE_NEWUTS | Isolating kernel and version identifiers | Since Linux 2.6.19 |
| IPC namespaces | CLONE_NEWIPC | Managing access to IPC resources | Since Linux 2.6.19 |
| PID namespaces | CLONE_NEWPID | Process isolation | Since Linux 2.6.24 |
| Network namespaces | CLONE_NEWNET | Managing network interfaces | Started in Linux 2.6.24 and largely completed by about Linux 2.6.29 |
| User namespaces | CLONE_NEWUSER | Managing user and group IDs | Started in Linux 2.6.23 and completed in Linux 3.8 |

unshare()
Disassociates parts of a process's execution context, such as the mount namespace, that are currently being shared with other processes (or threads).
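unshare() moves the calling process itself into a new namespace, so this minimal C sketch forks first to keep the parent where it was. The child compares its /proc/self/ns/uts link before and after the call. demo_unshare_uts and read_uts_link are hypothetical helper names, and the call again needs CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the /proc/self/ns/uts link into buf (NUL-terminated); -1 on error. */
static int read_uts_link(char *buf, size_t len)
{
    ssize_t r = readlink("/proc/self/ns/uts", buf, len - 1);
    if (r == -1)
        return -1;
    buf[r] = '\0';
    return 0;
}

/* In a forked child, leave the inherited UTS namespace with unshare()
 * and check that the namespace identifier changed. Returns 0 if the
 * child moved to a new namespace, 1 if unshare() failed (e.g. EPERM
 * without CAP_SYS_ADMIN), -1 on any other error. */
int demo_unshare_uts(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        char before[64], after[64];
        if (read_uts_link(before, sizeof before) == -1)
            _exit(2);
        if (unshare(CLONE_NEWUTS) == -1)
            _exit(1);
        if (read_uts_link(after, sizeof after) == -1)
            _exit(2);
        /* A new namespace comes with a new identifier. */
        _exit(strcmp(before, after) != 0 ? 0 : 2);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

Unlike clone(), no new process is needed for unshare() itself; the fork() here only protects the caller's own namespaces.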
setns()
Reassociates the calling thread with an existing namespace.
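setns() takes a file descriptor that refers to an existing namespace, usually obtained by opening one of the /proc/<pid>/ns/ links. A minimal C sketch, assuming Linux and CAP_SYS_ADMIN; join_namespace is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Move the calling thread into the namespace of the given type
 * ("uts", "net", "ipc", ...) that process pid belongs to.
 * Returns 0 on success, -1 with errno set on failure. */
int join_namespace(pid_t pid, const char *type)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/%d/ns/%s", (int)pid, type);
    if (n < 0 || (size_t)n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* The open file descriptor refers to the target namespace. */
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* An nstype of 0 accepts a namespace of any type. */
    int rc = setns(fd, 0);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```

The nsenter(1) utility is built around this call: it opens the namespace files of a target process and calls setns() for each one before running a command.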
Control groups
Control groups allow Docker to share the available hardware resources among containers and, if required, set up limits and constraints.[1]
- Features
- Resource limitation
- Prioritization
- Accounting
- Control
- CPU Restrictions
- Example
- mount cgroups
You can list your mounted cgroup hierarchies with this command:
mount -t cgroup
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,cpuset)
cpu on /sys/fs/cgroup/cpu type cgroup (rw,cpu)
memory on /sys/fs/cgroup/memory type cgroup (rw,memory)
If nothing is listed, you can mount them yourself:
sudo mkdir cgroup
sudo mount -t tmpfs cgroup_root ./cgroup
sudo mkdir cgroup/cpuset
sudo mount -t cgroup -ocpuset cpuset ./cgroup/cpuset/
sudo mkdir cgroup/cpu
sudo mount -t cgroup -ocpu cpu ./cgroup/cpu/
sudo mkdir cgroup/memory
sudo mount -t cgroup -omemory memory ./cgroup/memory/
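The same information is available in /proc/mounts. Below is a minimal C sketch using the getmntent() API from <mntent.h> to print the cgroup v1 hierarchies, roughly what `mount -t cgroup` shows; list_cgroup_mounts is a hypothetical name.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Print every cgroup v1 hierarchy listed in /proc/mounts to out and
 * return how many were found, or -1 if /proc/mounts can't be read. */
int list_cgroup_mounts(FILE *out)
{
    FILE *mounts = setmntent("/proc/mounts", "r");
    if (mounts == NULL)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(mounts)) != NULL) {
        /* Only type "cgroup"; the unified v2 hierarchy is "cgroup2". */
        if (strcmp(m->mnt_type, "cgroup") == 0) {
            fprintf(out, "%s on %s (%s)\n", m->mnt_fsname, m->mnt_dir, m->mnt_opts);
            count++;
        }
    }
    endmntent(mounts);
    return count;
}
```

On a host that only mounts the unified cgroup v2 hierarchy (type cgroup2) this prints nothing; there, the CPU limit used in this example lives in a single cpu.max file instead of cpu.cfs_quota_us and cpu.cfs_period_us.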
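Then you can see the control files in these directories (the commands below use /sys/fs/cgroup; if you mounted the hierarchies yourself, use ./cgroup instead):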
ls /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuset/
/sys/fs/cgroup/cpu:
cgroup.clone_children  cpu.cfs_period_us  cpu.shares         release_agent
cgroup.procs           cpu.cfs_quota_us   cpu.stat           tasks
cgroup.sane_behavior   cpu_group          notify_on_release

/sys/fs/cgroup/cpuset/:
cgroup.clone_children    cpuset.memory_pressure_enabled
cgroup.procs             cpuset.memory_spread_page
cgroup.sane_behavior     cpuset.memory_spread_slab
cpuset.cpu_exclusive     cpuset.mems
cpuset.cpus              cpuset.sched_load_balance
cpuset.mem_exclusive     cpuset.sched_relax_domain_level
cpuset.mem_hardwall      notify_on_release
cpuset.memory_migrate    release_agent
cpuset.memory_pressure   tasks
Create a cpu group
mkdir /sys/fs/cgroup/cpu/temp
- Running a process in a cpu group
C program
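/* deadloop.c: spins forever to keep one CPU busy */
int main(void)
{
    unsigned int i = 0;  /* unsigned, so wrap-around is well defined */
    for (;;)
        i++;
    return 0;            /* never reached */
}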
top
results
As the output below shows, the process keeps one CPU almost 100% busy.
PID    USER     PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+    COMMAND
107539 dsdshcym 20  0   4196  628  548  R  99.8  0.1   0:09.80  deadloop
- Restrict CPU usage
Restrict a cgroup's CPU usage
cat /sys/fs/cgroup/cpu/temp/cpu.cfs_quota_us
echo 20000 > /sys/fs/cgroup/cpu/temp/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu/temp/cpu.cfs_quota_us
-1
20000
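cpu.cfs_quota_us is only meaningful relative to cpu.cfs_period_us in the same directory (100000 µs unless changed): in each period the group may run for at most quota microseconds, and -1 means no limit. A small helper makes the arithmetic explicit; cfs_cpu_percent is a hypothetical name.

```c
/* Share of one CPU, in percent, that a cgroup v1 group may use:
 * quota_us of CPU time per period_us of wall-clock time.
 * A negative quota (the kernel uses -1) means "no limit"; an invalid
 * period is also reported as -1.0. */
double cfs_cpu_percent(long quota_us, long period_us)
{
    if (quota_us < 0 || period_us <= 0)
        return -1.0;
    return 100.0 * (double)quota_us / (double)period_us;
}
```

With the values used here, cfs_cpu_percent(20000, 100000) gives 20.0, matching the roughly 20% seen in top below. A quota larger than the period, such as 200000, allows up to two CPUs' worth of time per period.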
Add the process to a cgroup
# Add the process's PID to the cgroup's tasks file
echo 107539 >> /sys/fs/cgroup/cpu/temp/tasks
top
results
As the output below shows, the CPU usage has dropped to about 20%. That is the limit set by writing 20000 to cpu.cfs_quota_us: the group may run for 20000 µs in every cpu.cfs_period_us period, which defaults to 100000 µs.
PID    USER     PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+    COMMAND
107539 dsdshcym 20  0   4196  628  548  R  19.9  0.1   1:25.30  deadloop
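The shell steps above (mkdir, writing cpu.cfs_quota_us, appending the PID to tasks) can also be driven from a program. A minimal C sketch assuming the cgroup v1 layout used in this example; limit_cpu and write_value are hypothetical names, and on a real system cpu_root would be /sys/fs/cgroup/cpu and the program must run as root.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write one value to a cgroup control file ("w" truncates, "a" appends). */
static int write_value(const char *path, const char *mode, long value)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    /* stdio buffers the data, so a value rejected by the kernel
     * shows up as an error from fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Create <cpu_root>/<group>, set its CFS quota and move pid into it.
 * Returns 0 on success, -1 on error. */
int limit_cpu(const char *cpu_root, const char *group, long quota_us, pid_t pid)
{
    char dir[512], path[600];

    snprintf(dir, sizeof dir, "%s/%s", cpu_root, group);
    /* Like `mkdir /sys/fs/cgroup/cpu/temp`; an existing group is reused. */
    if (mkdir(dir, 0755) == -1 && errno != EEXIST)
        return -1;

    /* Like `echo 20000 > .../cpu.cfs_quota_us`. */
    snprintf(path, sizeof path, "%s/cpu.cfs_quota_us", dir);
    if (write_value(path, "w", quota_us) == -1)
        return -1;

    /* Like `echo 107539 >> .../tasks`. */
    snprintf(path, sizeof path, "%s/tasks", dir);
    return write_value(path, "a", (long)pid);
}
```

limit_cpu("/sys/fs/cgroup/cpu", "temp", 20000, 107539) should reproduce the session above. On a real cgroup file system the kernel creates the control files when the directory is made, so the code only writes to them.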
- Memory Restrictions
- IO Speed Restrictions
- Disk Capacity Restrictions
Union File Systems
Docker uses union file systems to provide the building blocks for containers.[1]
UFS
Unionfs is a filesystem service for Linux, FreeBSD and NetBSD which implements a union mount for other file systems. It allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system. Contents of directories which have the same path within the merged branches will be seen together in a single merged directory, within the new, virtual filesystem.
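The lookup rule can be modeled in a few lines of C: branches are searched from top to bottom, and the first one that contains a path wins, hiding entries with the same path further down. This is a simplified illustration, not the actual Unionfs or AUFS code (union_lookup is a hypothetical name); it ignores whiteouts, which real implementations use to hide deleted lower files.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Return the index of the first branch (0 = top) that contains name,
 * or -1 if no branch has it. */
int union_lookup(const char *const branches[], int nbranches, const char *name)
{
    char path[4096];
    struct stat st;

    for (int i = 0; i < nbranches; i++) {
        snprintf(path, sizeof path, "%s/%s", branches[i], name);
        /* lstat() so that a symlink entry counts as present too. */
        if (lstat(path, &st) == 0)
            return i;
    }
    return -1;
}
```

With the branches from the AUFS example below (fruits above vegetables), tomato resolves to fruits and carrots to vegetables.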
Usage in docker
Each Docker image consists of a series of layers. Docker makes use of union file systems to combine these layers into a single image.
- AUFS
Prepare
tree
.
|-- fruits
|   |-- apple
|   `-- tomato
`-- vegetables
    |-- carrots
    `-- tomato

2 directories, 4 files
Mount
mkdir mnt
sudo mount -t aufs -o dirs=./fruits:./vegetables none ./mnt
tree ./mnt
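./mnt
|-- apple
|-- carrots
`-- tomato

0 directories, 3 files
Only one tomato appears: ./fruits comes first in dirs=, so its tomato hides the one in ./vegetables.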
Modify
echo mnt_carrots > ./mnt/carrots
Result in vegetables (the original file is untouched):
cat ./vegetables/carrots
Result in fruits (AUFS copied carrots up into fruits, the writable top branch, and wrote there):
cat ./fruits/carrots
mnt_carrots
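What happened is copy-up: carrots existed only in the lower branch, so the write was redirected to a copy of the file in the top branch (fruits), while vegetables kept the original. Below is a simplified C model of that rule for two branches, not the AUFS implementation; union_write and copy_file are hypothetical names. It copies the lower file up first so that appends keep the old content, then writes to the upper copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Copy src to dst byte by byte. Returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    int c, rc = 0;
    while ((c = fgetc(in)) != EOF)
        if (fputc(c, out) == EOF) {
            rc = -1;
            break;
        }
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Write data to name through a two-branch union. If name exists only
 * in the read-only lower branch, copy it up to the writable upper
 * branch first; the lower file is never modified. append = 0 behaves
 * like `echo data > file`, append = 1 like `echo data >> file`. */
int union_write(const char *upper, const char *lower, const char *name,
                const char *data, int append)
{
    char up[4096], low[4096];
    struct stat st;

    snprintf(up, sizeof up, "%s/%s", upper, name);
    snprintf(low, sizeof low, "%s/%s", lower, name);

    if (stat(up, &st) == -1) {
        if (errno != ENOENT)
            return -1;
        /* Copy-up: only needed when the lower branch has the file. */
        if (stat(low, &st) == 0 && copy_file(low, up) == -1)
            return -1;
    }

    FILE *f = fopen(up, append ? "a" : "w");
    if (f == NULL)
        return -1;
    int ok = fputs(data, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

union_write("fruits", "vegetables", "carrots", "mnt_carrots\n", 0) mirrors the echo above: fruits/carrots ends up containing mnt_carrots and vegetables/carrots is left as it was.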