Containers have been available on Linux since 2001, through initiatives such as the following:
Linux-VServer (2001)
OpenVZ (2005)
LXC (2008)
Let Me Contain That for You (lmctfy) (2013)
The creation of several container image formats and container engines has significantly simplified adoption, but competing standards exist.
Container Images, Image Formats and Repositories
Image Formats
Historically, each container engine had its own container image format: LXD, rkt and Docker all used different ones. Some were made up of a single layer, while others were made up of several layers in a tree structure.
Today, almost all major tools and engines have moved to the format defined by the Open Container Initiative (OCI), which used the Docker v2 image format as its basis. The OCI image format defines the layers and metadata within a container image: essentially, a tar file for each layer plus a JSON manifest with the metadata.
Images or Repositories?
When people use the term container image, they are often referring to a repository, that is, a bundle of multiple image layers and metadata. In fact, on the command line you specify a repository, not an image:
docker pull rhel7
This is expanded automatically to docker pull registry.access.redhat.com/rhel7:latest. This can be confusing, because many people refer to rhel7 as an image or a container image. However, the first column in the output of docker images is REPOSITORY:
REPOSITORY                              TAG      IMAGE ID       CREATED       VIRTUAL SIZE
registry.access.redhat.com/rhel7        latest   6883d5422f4e   4 weeks ago   201.7 MB
registry.access.redhat.com/rhel         latest   6883d5422f4e   4 weeks ago   201.7 MB
registry.access.redhat.com/rhel6        latest   05c3d56ba777   4 weeks ago   166.1 MB
registry.access.redhat.com/rhel6/rhel   latest   05c3d56ba777   4 weeks ago   166.1 MB
...
When you specify the repository on the command line, the container engine does some extra work for you:
It searches for the repository in a list of registry servers
It defaults the tag to latest
To express the full URL ourselves, we would use the format REGISTRY/NAMESPACE/REPOSITORY[:TAG], for example docker pull registry.access.redhat.com/rhel7/rhel:latest
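The expansion described above can be sketched in a few lines of Python. This is only an illustration of the REGISTRY/NAMESPACE/REPOSITORY[:TAG] format, not the logic of any real engine: the function name expand_reference, the ImageRef type, and the rule used to recognise a registry host name are all assumptions made for this example, and the default registry is simply the one used in the example above.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    namespace: str
    repository: str
    tag: str


def expand_reference(name: str,
                     default_registry: str = "registry.access.redhat.com",
                     default_tag: str = "latest") -> ImageRef:
    """Expand a short reference such as 'rhel7' into its four parts.

    Hypothetical helper: real engines apply more rules than this sketch.
    """
    # A colon only marks a tag if it appears after the last slash;
    # otherwise it belongs to a registry port such as localhost:5000.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        path, tag = name[:colon], name[colon + 1:]
    else:
        path, tag = name, default_tag

    parts = path.split("/")
    # Assumption: the first component is a registry if it looks like a host.
    looks_like_host = "." in parts[0] or ":" in parts[0] or parts[0] == "localhost"
    if len(parts) > 1 and looks_like_host:
        registry, parts = parts[0], parts[1:]
    else:
        registry = default_registry

    # The last component is the repository, anything before it the namespace.
    return ImageRef(registry, "/".join(parts[:-1]), parts[-1], tag)


print(expand_reference("rhel7"))
print(expand_reference("registry.access.redhat.com/rhel7/rhel:latest"))
```

Both calls return the same registry and tag; only the second one carries an explicit namespace.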
Image Layer
Image layers in a repository are connected together in a parent-child relationship. Each image layer represents changes between itself and the parent layer.
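To make the parent-child relationship concrete, here is a minimal Python sketch. It models each layer as a set of changes relative to its parent and rebuilds the final file system by replaying the chain from the base layer upwards. The Layer class, the flatten function, and the use of None to mark a deleted path are assumptions made for illustration; real image formats store layers as tar archives.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Layer:
    layer_id: str
    parent: Optional["Layer"] = None
    # path -> file content; None marks a path deleted in this layer
    changes: Dict[str, Optional[str]] = field(default_factory=dict)


def flatten(layer: Optional[Layer]) -> Dict[str, str]:
    """Replay every layer from the base up to `layer` and return the result."""
    chain = []
    while layer is not None:
        chain.append(layer)
        layer = layer.parent

    filesystem: Dict[str, str] = {}
    for current in reversed(chain):  # base layer first
        for path, content in current.changes.items():
            if content is None:
                filesystem.pop(path, None)  # child removed the file
            else:
                filesystem[path] = content  # child added or replaced the file
    return filesystem


base = Layer("base", changes={"/etc/os-release": "rhel7", "/tmp/junk": "x"})
app = Layer("app", parent=base,
            changes={"/app/run.sh": "echo hi", "/tmp/junk": None})
print(flatten(app))
```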
Since Docker 1.7, there has been no native tooling to inspect image layers in a local repository (there are tools for online registries). With the help of a tool called Dockviz, you can quickly inspect all of the layers: each layer has a tag and a Universally Unique Identifier (UUID).
Image tags
Tags are a way for image builders to tell consumers which image layers are best to use.
Info
This is only a convention: neither OCI nor any other standard mandates what tags should be used for.
One can list the tags available for a specific repository like so:
Namespaces separate groups of repositories from one another. For example, on Docker Hub the namespace is the username of the person sharing the image, while Red Hat uses the product name (rhel7, openshift, etc.).
Tip
There might be a default repository for a given namespace, so the following commands are the same
A container engine is the piece of software that accepts user requests, including command-line options, and, from the user's perspective, runs the container. There are many container engines:
Docker
RKT
CRI-O
LXD
Engines built by PaaS and container platforms for internal use
Container engines do not run the containers themselves; they delegate that to an OCI runtime such as runc. They are still responsible for:
Handling user input
Handling input over an API often from a Container Orchestrator
Pulling the Container Images from the Registry Server
Decompressing and expanding the container image on disk using a Graph Driver (block or file, depending on the driver)
Preparing a container mount point, typically on copy-on-write storage (again block or file, depending on the driver)
Preparing the metadata which will be passed to the Container Runtime to start the container correctly
Using some defaults from the container image (e.g. ArchX86)
Using user input to override defaults in the container image (e.g. CMD, ENTRYPOINT)
Using defaults specified by the container engine (e.g. seccomp rules)
Calling the Container Runtime
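The metadata preparation step in the list above can be sketched as a simple merge. The function name build_runtime_metadata and the key names are hypothetical, and the precedence between engine defaults and image defaults is an assumption of this sketch; the only rule taken from the text is that user input overrides the defaults in the image.

```python
from typing import Any, Dict


def build_runtime_metadata(image_config: Dict[str, Any],
                           user_options: Dict[str, Any],
                           engine_defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the three sources of settings into one metadata dictionary."""
    metadata = dict(engine_defaults)   # e.g. a seccomp profile chosen by the engine
    metadata.update(image_config)      # e.g. architecture, CMD, ENTRYPOINT from the image
    # User input wins over image defaults; None means "option not given".
    metadata.update({k: v for k, v in user_options.items() if v is not None})
    return metadata


metadata = build_runtime_metadata(
    image_config={"architecture": "amd64", "cmd": ["/bin/bash"]},
    user_options={"cmd": ["sleep", "60"], "entrypoint": None},
    engine_defaults={"seccomp": "default"},
)
print(metadata)
```

Here the user-supplied command replaces the image's CMD, while the architecture and the seccomp setting pass through unchanged.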
Container runtime
A container runtime is a lower-level component, typically used by a Container Engine, but it can also be used by hand for testing. The reference implementation of the Open Container Initiative (OCI) Runtime Standard is runc. It is the most widely used container runtime, but there are other OCI-compliant runtimes, such as crun, railcar, and Kata Containers. Docker, CRI-O, and many other Container Engines rely on runc.
The container runtime is responsible for:
Consuming the container mount point provided by the Container Engine (can also be a plain directory for testing)
Consuming the container metadata provided by the Container Engine (can also be a manually crafted config.json for testing)
Communicating with the kernel to start containerized processes (clone system call)
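As an illustration of the metadata a runtime consumes, the sketch below builds a heavily abbreviated config.json in Python. The field names (ociVersion, process, root, linux.namespaces) come from the OCI runtime specification, but the helper minimal_config and the values passed to it are assumptions for this example, and a real configuration contains many more settings.

```python
import json
from typing import Dict, List, Optional


def minimal_config(rootfs: str, args: List[str],
                   env: Optional[List[str]] = None, cwd: str = "/") -> Dict:
    """Return an abbreviated OCI-style runtime configuration as a dict."""
    return {
        "ociVersion": "1.0.2",
        # What to run inside the container
        "process": {"args": list(args), "env": list(env or []), "cwd": cwd},
        # The mount point (or plain directory) prepared for the container
        "root": {"path": rootfs, "readonly": True},
        # Namespaces the runtime should create for the process
        "linux": {
            "namespaces": [{"type": t}
                           for t in ("pid", "mount", "network", "ipc", "uts")]
        },
    }


config = minimal_config("rootfs", ["sleep", "60"], env=["PATH=/usr/bin:/bin"])
print(json.dumps(config, indent=2))
```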
When the Docker engine was first created it relied on LXC as the container runtime. Later, the Docker team developed their own library called libcontainer to start containers. This library was written in Golang, and compiled into the original Docker engines.
Kernel namespaces
Container runtimes make use of kernel namespaces, a feature that allows each process to have its own mount points, network interfaces, user identifiers, process identifiers, etc.
Instead of creating the new process with a plain fork(), the runtime uses the clone() system call, whose flags allow the new process to be placed in its own namespaces.
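On Linux, the namespaces a process belongs to are visible as symbolic links under /proc/PID/ns, each pointing to a target such as pid:[4026531836]. The following Python sketch reads those links and reports which namespaces two processes share. The function names and the proc_root parameter (which only exists so the code can be pointed at a different directory) are assumptions of this example.

```python
import os
import re
from typing import Dict, List

# Link targets look like "pid:[4026531836]": a type and a namespace identifier
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def read_namespaces(pid: str = "self", proc_root: str = "/proc") -> Dict[str, int]:
    """Map each entry under <proc_root>/<pid>/ns to its namespace identifier."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    namespaces = {}
    for entry in sorted(os.listdir(ns_dir)):
        target = os.readlink(os.path.join(ns_dir, entry))
        match = NS_LINK.match(target)
        if match:
            namespaces[entry] = int(match.group(2))
    return namespaces


def shared_namespaces(a: Dict[str, int], b: Dict[str, int]) -> List[str]:
    """Return the namespace types in which both processes are in the same namespace."""
    return sorted(name for name in a if name in b and a[name] == b[name])
```

Comparing read_namespaces() for a host process and for a containerized process would typically show different identifiers for the namespaces the runtime created.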
Container Runtime Interface
When Google released Kubernetes in 2015, the individual nodes of the cluster used Docker’s runtime to run containers and manage container images. In late 2016, developers introduced an abstraction between Kubernetes and the container runtime it uses: the Container Runtime Interface, or CRI for short.
To plug a new container runtime into Kubernetes, all that is needed is a small piece of code called a shim that translates requests made by Kubernetes into requests understandable by the runtime. In theory, each additional runtime would need a custom shim, but a generic one exists for all container runtimes that implement the OCI Specification.
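The idea of a generic shim can be sketched as an adapter. In the Python example below, the method names are loosely modelled on the CRI CreateContainer and StartContainer calls, and the class OciCliShim is hypothetical: it only builds the command lines that an OCI-compatible runtime binary such as runc or crun accepts, without executing them. Real shims are considerably more involved.

```python
from typing import List, Protocol


class RuntimeShim(Protocol):
    """The small interface the orchestrator side is assumed to call."""

    def create_container(self, container_id: str, bundle_dir: str) -> List[str]: ...

    def start_container(self, container_id: str) -> List[str]: ...


class OciCliShim:
    """Translate shim calls into command lines for an OCI runtime binary."""

    def __init__(self, runtime_binary: str = "runc") -> None:
        self.runtime_binary = runtime_binary

    def create_container(self, container_id: str, bundle_dir: str) -> List[str]:
        # The bundle directory holds config.json and the root file system
        return [self.runtime_binary, "create", "--bundle", bundle_dir, container_id]

    def start_container(self, container_id: str) -> List[str]:
        return [self.runtime_binary, "start", container_id]


shim: RuntimeShim = OciCliShim("crun")
print(shim.create_container("web-1", "/tmp/bundles/web-1"))
print(shim.start_container("web-1"))
```

Because the command lines follow the same shape for any OCI-compatible binary, swapping the runtime only requires changing the runtime_binary argument.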
CRI-O is a minimal runtime implementation that adheres to CRI and allows Kubernetes to run containers without Docker.
Container Host
The container host is the system that runs the containerized processes, often simply called containers. It could be your laptop, a VM instance in your public cloud, etc. Container hosts typically cache images after they are pulled from the registry server.
Registry servers
Registry servers are fancy file servers that store Docker repositories. When a container engine doesn’t have a locally cached copy of a repository, it pulls it from a registry server. By default, docker.io is configured, but others can be added.
Warning
Docker trusts the registry server, so be careful: you might be pulling licensed software, insecure software, etc.
The Graph Driver
The graph driver is the piece of software that maps the necessary image layers to local storage. Depending on the driver, the image layers are mapped to a directory or to block storage; available drivers include aufs, devicemapper, btrfs, zfs and overlayfs.
When a new container process is started, the image layers are mounted read-only within a kernel namespace, and a copy-on-write layer is created on top to allow the container to write data.
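The copy-on-write behaviour can be illustrated with a small Python sketch built on collections.ChainMap, which looks keys up through a stack of mappings but only ever writes to the first one. The function mount_container and the dictionaries standing in for layers are assumptions for this example; a real graph driver works on directories or block devices rather than dictionaries.

```python
from collections import ChainMap
from types import MappingProxyType
from typing import Dict, List, Tuple


def mount_container(image_layers: List[Dict[str, str]]) -> Tuple[ChainMap, Dict[str, str]]:
    """Stack read-only image layers under a fresh writable layer.

    image_layers is ordered base layer first.
    """
    # Read-only views: the image layers themselves can never be modified
    read_only = [MappingProxyType(layer) for layer in reversed(image_layers)]
    upper: Dict[str, str] = {}  # the container's own copy-on-write layer
    return ChainMap(upper, *read_only), upper


base = {"/etc/motd": "hello"}
app = {"/app/main": "v1"}
fs, upper = mount_container([base, app])

fs["/etc/motd"] = "changed"   # the write lands in the upper layer only
print(fs["/etc/motd"], base["/etc/motd"], upper)
```

Several containers started from the same image would each get their own upper layer while sharing the read-only ones.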
Container orchestration
The need for container orchestration emerges once teams have installed a container host and pulled some repositories. Soon they want to use a cluster of container hosts to schedule work and to standardize how applications are defined.
A container orchestrator really does two things:
Dynamically schedules container workloads within a cluster of computers. This is often referred to as distributed computing.
Provides a standardized way to define applications.
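To give a feel for what scheduling means here, the following Python sketch places workloads on the node with the most free CPU. It is a deliberately naive illustration and not the algorithm of Kubernetes or any other orchestrator; the function schedule, the node names, and the single CPU dimension are all assumptions.

```python
from typing import Dict, Optional


def schedule(workloads: Dict[str, int], nodes: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign each workload (name -> CPUs needed) to a node (name -> free CPUs).

    Workloads that fit nowhere are mapped to None.
    """
    free = dict(nodes)
    placement: Dict[str, Optional[str]] = {}
    # Place the largest workloads first; ties are broken by name
    for name, cpu in sorted(workloads.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = [node for node in sorted(free) if free[node] >= cpu]
        if not candidates:
            placement[name] = None
            continue
        target = max(candidates, key=lambda node: free[node])
        free[target] -= cpu
        placement[name] = target
    return placement


placement = schedule(
    workloads={"web": 2, "db": 3, "batch": 2},
    nodes={"node-a": 4, "node-b": 2},
)
print(placement)
```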
Kubernetes has become the de facto standard in container orchestration, similar to Linux before it, while alternatives such as Swarm and Mesos are losing traction. If you are looking at container orchestration, Red Hat recommends its enterprise distribution, OpenShift.
Container use cases
Today most containers are application containers (e.g. MySQL), but containers can also be used to run full operating systems, as with LXC and LXD. Super Privileged Containers (SPC) can be used for monitoring and other administrative tasks, such as loading kernel modules on Kubernetes or OpenShift.
OCI
To make sure that all container runtimes could run images produced by any build tool, the community started the Open Container Initiative (OCI) to define industry standards around container image formats and runtimes.
Docker’s original image format has become the OCI Image Specification, and various open-source build tools support it, including:
BuildKit, an optimized rewrite of Docker’s build engine;
Podman, an alternative implementation of Docker’s command-line tool that doesn’t need a daemon;
Buildah, a command-line alternative to writing Dockerfiles;
Skopeo, a CLI tool to interact with registries.
Given an OCI image, any container runtime that implements the OCI Runtime Specification can unbundle the image and run its contents in an isolated environment. Docker donated its runtime, runc, to the OCI to serve as the first implementation of the standard.
Other open-source implementations
Kata Containers, an implementation that uses virtual machines rather than Linux namespaces. Namespaces allow applications to escape their containers under certain circumstances, and specific use cases, such as running untrusted workloads, require stronger security guarantees;
gVisor, a.k.a. runsc, released by Google in 2018, which focuses on security and efficiency. Applications running inside the gVisor sandbox rarely interact with the underlying Linux kernel directly, which reduces the attack surface that untrusted workloads may exploit. The sandbox implements many Linux system calls in userspace.
Firecracker, a runtime optimized for serverless workloads. This container technology powers AWS Lambda and AWS Fargate. Firecracker runs containerized applications inside MicroVMs: lightweight virtual machines optimized for running single applications instead of entire operating systems.