A Simple Introduction to Docker III

What is Docker?

Underlying Techniques

What do we need to do to build a container?

  1. Isolate every container

    We need to isolate every container, which means making the process in a container (a container usually runs only a single process) think it is the only process on the system.

    To do that, we use a feature provided by the Linux kernel called namespaces.

  2. Restrict resources a container can use

    We may need to restrict the resources a container can use, since we don't want containers to eat up all our system resources such as CPU, RAM, and disk.

    We may also need to account for the resources a container has used, for billing or other purposes.

    To do that, we use cgroups (short for control groups), a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.

  3. Provide these resources wisely

    We may only have a handful of small, weak machines but still want to provide a great service. Containers let us combine these machines so that they look like one single machine with enough resources for our needs. More importantly, this setup is scalable.
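
The first two requirements above map onto kernel state that can be inspected for any process. The following is a minimal sketch, assuming a Linux system with /proc mounted; describe_process is a hypothetical helper name, not part of Docker or any standard tool.

```shell
#!/usr/bin/env bash
# Show which namespaces and cgroups a process belongs to.
# Defaults to the current shell ($$) when no pid is given.
describe_process() {
    local pid="${1:-$$}" ns
    echo "namespaces of $pid:"
    # Each entry under /proc/<pid>/ns is a symlink such as "uts:[4026531838]".
    for ns in /proc/"$pid"/ns/*; do
        [ -L "$ns" ] || continue
        printf '  %s\n' "$(readlink "$ns")"
    done
    echo "cgroups of $pid:"
    # /proc/<pid>/cgroup lists the cgroup the process sits in per hierarchy.
    if [ -r "/proc/$pid/cgroup" ]; then
        sed 's/^/  /' "/proc/$pid/cgroup"
    fi
}

describe_process
```

Two processes that print the same namespace identifiers share those namespaces; a containerized process would normally print different ones from the host shell.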

Linux Namespace

  • Docker takes advantage of a technology called namespaces to provide the isolated workspace. [1]
  • Purpose of Namespace

    To wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.

  • 6 namespaces

    There are 6 different types of namespaces. Docker uses five of them and, at the time of writing, is still working on the last one (the user namespace).

    The pid namespace
    Used for process isolation (PID: Process ID).
    The net namespace
    Used for managing network interfaces (NET: Networking).
    The ipc namespace
    Used for managing access to IPC resources (IPC: InterProcess Communication).
    The mnt namespace
    Used for managing mount-points (MNT: Mount).
    The uts namespace
    Used for isolating kernel and version identifiers (UTS: Unix Timesharing System).
  • Related System Calls
    • clone()

      Creates a new process. The flags passed to it select which resources are isolated in new namespaces.

      Namespaces          Identifier     Usage                                     Related Kernel Versions
      Mount namespaces    CLONE_NEWNS    Managing mount-points                     Since Linux 2.4.19
      UTS namespaces      CLONE_NEWUTS   Isolating kernel and version identifiers  Since Linux 2.6.19
      IPC namespaces      CLONE_NEWIPC   Managing access to IPC resources          Since Linux 2.6.19
      PID namespaces      CLONE_NEWPID   Process isolation                         Since Linux 2.6.24
      Network namespaces  CLONE_NEWNET   Managing network interfaces               Started in Linux 2.6.24 and largely completed by about Linux 2.6.29
      User namespaces     CLONE_NEWUSER  Managing user and group IDs               Started in Linux 2.6.23 and completed in Linux 3.8
    • unshare()

      Disassociates parts of a process's execution context, such as the mount namespace, that are currently being shared with other processes (or threads).

    • setns()

      Reassociates the calling thread with an existing namespace.
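
To see these system calls in action without writing C, the sketch below uses the unshare command from util-linux, which wraps unshare(). It assumes a Linux system where unprivileged user namespaces are enabled; ns_id and in_new_uts are hypothetical helper names, and the script only reports a message if the call is not permitted.

```shell
#!/usr/bin/env bash
# Print the identifier of one namespace of the calling process,
# e.g. "uts:[4026531838]". Prints nothing if /proc is unavailable.
ns_id() {
    readlink "/proc/self/ns/$1" 2>/dev/null
}

# Run a command in new user + UTS namespaces.
# --map-root-user maps the caller to root inside the new user namespace,
# so no sudo is needed where unprivileged user namespaces are allowed.
in_new_uts() {
    unshare --user --map-root-user --uts "$@"
}

echo "parent: $(ns_id uts)"
if child="$(in_new_uts readlink /proc/self/ns/uts 2>/dev/null)"; then
    echo "child:  $child"
    # Change the hostname inside the new namespace only.
    in_new_uts sh -c 'hostname demo-container && uname -n' \
        || echo "could not set the hostname inside the namespace"
    # The host's own name is unaffected by the change above.
    echo "host:   $(uname -n)"
else
    echo "unshare is missing or not permitted on this system"
fi
```

The nsenter command from the same package plays the corresponding role for setns(): it joins the namespaces of an already running process instead of creating new ones.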

Control groups

Control groups allow Docker to share available hardware resources to containers and, if required, set up limits and constraints. [1]

  • Features
    • Resource limitation
    • Prioritization
    • Accounting
    • Control
  • CPU Restrictions
    • Example
      1. mount cgroups
        • You can see your mounted cgroups using this command

          mount -t cgroup
          
          systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
          cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,cpuset)
          cpu on /sys/fs/cgroup/cpu type cgroup (rw,cpu)
          memory on /sys/fs/cgroup/memory type cgroup (rw,memory)
          
          
        • If you can't see them, you can mount them yourself

          sudo mkdir cgroup
          sudo mount -t tmpfs cgroup_root ./cgroup
          sudo mkdir cgroup/cpuset
          sudo mount -t cgroup -ocpuset cpuset ./cgroup/cpuset/
          sudo mkdir cgroup/cpu
          sudo mount -t cgroup -ocpu cpu ./cgroup/cpu/
          sudo mkdir cgroup/memory
          sudo mount -t cgroup -omemory memory ./cgroup/memory/
          
        • Then you can see the config files under these directories

          ls /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuset/
          
          /sys/fs/cgroup/cpu:
          cgroup.clone_children  cpu.cfs_period_us  cpu.shares       release_agent
          cgroup.procs         cpu.cfs_quota_us   cpu.stat       tasks
          cgroup.sane_behavior   cpu_group    notify_on_release
          
          /sys/fs/cgroup/cpuset/:
          cgroup.clone_children cpuset.memory_pressure_enabled
          cgroup.procs    cpuset.memory_spread_page
          cgroup.sane_behavior  cpuset.memory_spread_slab
          cpuset.cpu_exclusive  cpuset.mems
          cpuset.cpus   cpuset.sched_load_balance
          cpuset.mem_exclusive  cpuset.sched_relax_domain_level
          cpuset.mem_hardwall notify_on_release
          cpuset.memory_migrate release_agent
          cpuset.memory_pressure  tasks
          
      2. Create a cpu group

        mkdir /sys/fs/cgroup/cpu/temp
        
      3. Run a process in a cpu group
        • C program

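          /* deadloop.c: spin forever to keep one CPU core busy */
          int main(void) {
              /* volatile keeps the compiler from optimizing the loop away;
                 unsigned makes the wrap-around well defined */
              volatile unsigned int i = 0;
              for (;;) i++;
              return 0; /* never reached */
          }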
          
        • top results

          We can see in the result table below that the CPU usage of this process is almost 100%.

          PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
          107539 dsdshcym 20 0 4196 628 548 R 99.8 0.1 0:09.80 deadloop
        • Restrict CPU usage
          1. Restrict a cgroup's CPU usage

            cat /sys/fs/cgroup/cpu/temp/cpu.cfs_quota_us
            echo 20000 > /sys/fs/cgroup/cpu/temp/cpu.cfs_quota_us
            cat /sys/fs/cgroup/cpu/temp/cpu.cfs_quota_us
            
            -1
            20000
            
            
          2. Add the process to a cgroup

            # Add the process's pid to the cgroup's tasks file
            echo 107539 >> /sys/fs/cgroup/cpu/temp/tasks
            
          3. top results

            As the results below show, the CPU usage has dropped to about 20%: the quota of 20000 µs written to cpu.cfs_quota_us is 20% of the default scheduling period of 100000 µs (cpu.cfs_period_us).

            PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
            107539 dsdshcym 20 0 4196 628 548 R 19.9 0.1 1:25.30 deadloop
  • Memory Restrictions
  • IO Speed Restrictions
  • Disk Capacity Restrictions
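
The arithmetic behind the CPU restriction example above can be sketched as follows. This is only an illustration, assuming the cgroup v1 layout shown in the listings; cpu_limit_percent, show_cpu_limit and the CPU_CGROUP_ROOT variable are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Assumed location of the cgroup v1 cpu controller, as in the listing above.
CPU_CGROUP_ROOT="${CPU_CGROUP_ROOT:-/sys/fs/cgroup/cpu}"

# Convert a quota/period pair (both in microseconds) into a percentage
# of one CPU. A quota of -1 means that no limit is set.
cpu_limit_percent() {
    local quota="$1" period="$2"
    if [ "$quota" -lt 0 ]; then
        echo "unlimited"
    else
        echo $(( quota * 100 / period ))
    fi
}

# Print the limit currently configured for one cgroup, e.g. "temp".
show_cpu_limit() {
    local dir="$CPU_CGROUP_ROOT/$1"
    if [ -r "$dir/cpu.cfs_quota_us" ] && [ -r "$dir/cpu.cfs_period_us" ]; then
        cpu_limit_percent "$(cat "$dir/cpu.cfs_quota_us")" \
                          "$(cat "$dir/cpu.cfs_period_us")"
    else
        echo "no cgroup v1 cpu files under $dir" >&2
        return 1
    fi
}

cpu_limit_percent 20000 100000   # the quota set above: prints 20
cpu_limit_percent -1 100000      # the value before the limit: prints unlimited
show_cpu_limit temp || true      # reads the real files if the cgroup exists
```

Integer division is used on purpose to keep the sketch dependency-free; a quota larger than the period simply yields more than 100, which corresponds to more than one full CPU.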

Union File Systems

Docker uses union file systems to provide the building blocks for containers. [1]

  • UFS

    Unionfs is a filesystem service for Linux, FreeBSD and NetBSD which implements a union mount for other file systems. It allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system. Contents of directories which have the same path within the merged branches will be seen together in a single merged directory, within the new, virtual filesystem.

  • Usage in docker

    Each Docker image consists of a series of layers. Docker makes use of union file systems to combine these layers into a single image.

  • AUFS
    • Prepare

      tree
      
      .
      |-- fruits
      |   |-- apple
      |   `-- tomato
      `-- vegetables
          |-- carrots
          `-- tomato
      
      2 directories, 4 files
      
      
    • Mount

      mkdir mnt
      sudo mount -t aufs -o dirs=./fruits:./vegetables none ./mnt
      tree ./mnt
      
      ./mnt
      |-- apple
      |-- carrots
      `-- tomato
      
      0 directories, 3 files
      
      
    • Modify

      echo mnt_carrots > ./mnt/carrots
      
      • results in vegetables (the file is unchanged, so nothing is printed)

        cat ./vegetables/carrots
        
      • results in fruits (the write went to the first branch, which aufs mounts read-write by default)

        cat ./fruits/carrots
        
        mnt_carrots
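
The behaviour seen in the AUFS example can be imitated with plain directories, which may help when aufs is not available. The sketch below is not aufs and does not mount anything; it only mimics the lookup and write rules under the assumption that the first branch is writable and the others are read-only. union_ls, union_resolve and union_write are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Merged view: every name found in any branch, listed once.
union_ls() {
    local branch
    for branch in "$@"; do
        ls -A "$branch"
    done | sort -u
}

# Lookup: the first branch (in the given order) that has the name wins.
union_resolve() {
    local name="$1"; shift
    local branch
    for branch in "$@"; do
        if [ -e "$branch/$name" ]; then
            echo "$branch/$name"
            return 0
        fi
    done
    return 1
}

# Write: always lands in the top branch, leaving lower branches untouched.
union_write() {
    local name="$1" content="$2" top="$3"
    printf '%s\n' "$content" > "$top/$name"
}

# Rebuild the fruits/vegetables layout from the example in a temp dir.
demo="$(mktemp -d)"
mkdir "$demo/fruits" "$demo/vegetables"
touch "$demo/fruits/apple" "$demo/fruits/tomato" \
      "$demo/vegetables/carrots" "$demo/vegetables/tomato"

union_ls "$demo/fruits" "$demo/vegetables"               # apple, carrots, tomato
union_write carrots mnt_carrots "$demo/fruits"
union_resolve carrots "$demo/fruits" "$demo/vegetables"  # now found in fruits
```

A real union file system copies the existing file up into the writable branch before modifying it; because the example overwrites the whole file, the simplified write here ends in the same state.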