What is Docker? Advantages and Challenges, Differences from Virtualization Infrastructure and Architecture Overview


Summary

From the Docker Practical Guide. In this article, we will give an overview of Docker.

Introduction

Docker provides a mechanism to run applications immediately as containers, eliminating as much as possible of the complex manual work, such as OS installation, that conventional hypervisor-based virtualization infrastructures require. In addition, it offers features such as automated application builds, coordinated operation of multiple containers, disposable software components, and shorter build times for development and production environments, enabling a level of efficiency unmatched by conventional virtualization infrastructures.

Thus, the so-called “container revolution” that began in 2013 is now riding a major “post-virtualization” wave in Europe and the United States, and various leading vendors are hard at work developing peripheral software and providing services based on Docker.

Container infrastructure based on Docker is becoming an essential elemental technology in fields where software development capability is key, such as artificial intelligence, IoT, and big data. Adopting “advanced technology that brings certainty together with flexibility and speed” is essential to winning the global development race, and the open source community and leading companies in Europe and the United States are making daily efforts to use proven IT such as Docker to generate profits for their companies. They fully understand that their businesses cannot grow if they keep using the same technology as before.

What is Docker?

Docker is “container”-based infrastructure software for developing and deploying applications and OS environments, aimed at software developers and IT department managers. A container is an application environment that runs as an independent process on the host OS; the technology packages the entire execution environment, including basic OS commands, application binaries, and libraries, and executes it in a separate OS space. Docker was developed by dotCloud (now Docker, Inc.) and released as open source software in 2013. Its ease of use quickly made it popular among developers and IT department managers, and it has been adopted in IT infrastructures around the world.

Through the use of container technology, Docker can realize highly consolidated IT systems (with lower hardware resource consumption and less performance degradation) compared to hypervisor-based virtualization software. However, Docker is attracting attention not only because of its performance advantages over virtual environments, but also because IT engineers have recognized its usefulness as a tool for responding to recent rapid business changes.

Benefits of using Docker in software development

From a software development perspective, the rapid development and flexibility demanded of services provided over the Internet require a focus on the essential aspects of software development. If developers can be freed from the troublesome tasks of securing hardware and installing development environments and can concentrate on application development, man-hours can be reduced, the unit cost of deliverables can be lowered, and the price competitiveness of the software can be further enhanced.

To meet such demands from application developers, a mechanism is needed that allows separate development environments to be easily created and disposed of for each application; this also simplifies application maintenance. Of course, conventional hypervisor-based virtual environments such as KVM and public cloud services already give software developers means to improve efficiency, but Docker’s distinctive packaging and repository features provide an even more convenient development environment.

Operational and Administrative Benefits

As the businesses (customer needs) supported by IT systems become more diverse and globalized, IT managers are under pressure to respond to those needs. The proliferation of virtualization software and the emergence of cloud services such as IaaS and PaaS have been of great benefit to IT managers. In the field of IT infrastructure, Docker enables rapid application development, deployment, and operation, contributing greatly to the realization of “DevOps environments” and “immutable infrastructures.” For software developers and IT department managers, Docker provides an environment in which applications can be quickly launched and disposed of without regard to hardware resources.

What Docker brings – packaged development and execution environments

Speaking of environments prepared by IT departments that can be used immediately over a network, cloud computing offers IaaS (Infrastructure as a Service) and PaaS. One of the reasons Docker is attracting attention is that it “packages the application development and execution environments,” providing a user-friendly, highly customizable environment that can be rapidly deployed and disposed of.

For example, if a developer were to prepare multiple versions of a development environment running on multiple Linux operating systems from scratch, the work involved in acquiring, building, using, and disposing of the development environment would be a significant amount of man-hours. With Docker, however, various development environments can be easily built by using “Docker images,” which package the application environment and execution environment as mentioned above.

A Docker image is the file system required to create a Docker container; it contains the executables, libraries, and commands to be invoked when the container runs. When a container is launched from a Docker image, the built-in applications are automatically configured and their basic functionality is immediately available. This ability to make an application usable as soon as its Docker image is obtained can greatly reduce developer man-hours.
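
As a minimal illustration of this workflow (the nginx image published on Docker Hub is used here purely as an example):

```bash
# Obtain a packaged application image from a registry
docker pull nginx:latest

# Launch a container from the image; the web server inside is
# configured automatically and is usable immediately
docker run -d --name web -p 8080:80 nginx:latest

# The application responds without any manual installation steps
curl http://localhost:8080/
```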

What Docker brings – heterogeneous OS environments

The ability to easily create heterogeneous OS environments without building a cloud infrastructure such as IaaS is one of Docker’s major attractions. Being able to easily deploy application development and heterogeneous OS execution environments without relying on cloud software is a great benefit for both developers and administrators involved in IT systems with limited initial investment. Here, “heterogeneous OS” means, in the case of a Linux environment, different Linux distributions: for example, a CentOS Docker image, an Ubuntu Docker image, a SUSE-based Docker image, or even a Debian-based Docker image bundled with web applications.
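
For instance, a single Docker host can run containers whose userlands come from different distributions (the image tags below are illustrative):

```bash
# Each container reports a different distribution, all on one host OS
docker run --rm centos:7 cat /etc/os-release
docker run --rm ubuntu:20.04 cat /etc/os-release
docker run --rm debian:11 cat /etc/os-release
```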

Registry provision and utilization of deliverables

Docker provides not only a development and execution environment for high-performance applications, but also a web service for sharing those OS and application packages (Docker images) around the world and for automating the processes IT systems need to achieve their goals. That web service is a cloud-based Docker image registry (repository) service called Docker Hub, a public cloud service for building and distributing application and service containers.
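
The basic registry workflow looks like the following sketch (the account name “myorg” is hypothetical):

```bash
# Find and reuse images shared on Docker Hub
docker search httpd
docker pull httpd:2.4

# Publish a deliverable of your own under your account
docker tag httpd:2.4 myorg/httpd:2.4
docker login
docker push myorg/httpd:2.4
```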

Immutable infrastructure for transition to new IT infrastructure

One of the most popular system designs built on Docker is an IT infrastructure architecture called Immutable Infrastructure, which has been adopted by service providers. The term “immutable” derives from the fact that the production system is never modified and remains in an “unchanged” state.

The Immutable Infrastructure has two systems, the production system and the development system, as shown in the figure below.

After software is developed on the development system, the old system is disconnected from the load balancer, and the development system is connected to the load balancer and operated as the new production environment. In an immutable infrastructure, when new services or applications are needed to meet new requirements, a new server is created for the development system and the existing old system (old server), which is no longer needed, is discarded. Such discarded old systems are called disposable components in immutable infrastructure. For example, a new server is created for each iteration of application testing, and when all testing is completed, the server is disposed of.
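
Expressed with plain Docker commands, the swap-and-discard flow might look like this sketch (the image tag, container names, and port are all hypothetical, and the actual load balancer switch depends on your environment):

```bash
# Stand up the new release alongside the running old one
docker run -d --name app-v2 -p 8081:80 myapp:2.0

# Verify the new system before switching traffic to it
curl -f http://localhost:8081/

# After repointing the load balancer at app-v2, discard the old
# system instead of patching it in place
docker stop app-v1 && docker rm app-v1
```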

In the case of a conventional, individually optimized system, even if it runs stably when first introduced, a large number of patches will have accumulated by the time it has been operated and extended with new functions. Even if the administrator could strictly track the history of all the patches applied and changes made, it would be hard to judge whether the system can still operate normally, and no one would be able to accurately grasp its behavior. If keeping track of the current state of the server environment, such as patch application status, is too complicated for the administrator, the Immutable Infrastructure approach is to adopt an operational method that does not manage server state at all, freeing both developers and administrators from complicated system management. This is the concept of immutable infrastructure.

Systems suitable for Docker, systems unsuitable for Docker

Systems for which Docker is not suitable include mission-critical systems (such as online systems for banks, mail-order systems for mass retailers, telecommunications infrastructure, and control systems for nuclear power plants) whose requirements are strictly defined to achieve a specific business objective and for which it is absolutely essential that the system never stop. In such systems, even a small configuration change can have a significant impact on business operations, so even if the business changes somewhat, the IT system itself is unlikely to be changed significantly.

Conversely, Docker is best suited for service providers, hosting, and other systems where IT services change on a daily basis, where applications are frequently developed on lightweight containers, and where innovative services are offered on a continuous basis.

Docker Challenges

Docker is still in the process of evolution and has some challenges in its use. These are listed below.

  • Reduction of management man-hours

Docker is used by a wide variety of companies, and many users cite the man-hours required to manage large numbers of Docker containers as an issue. The ecosystem around Docker provides GUI management tools, automatic deployment of Docker containers, resource management, and other tooling; in other words, Docker by itself does not complete the entire system, and some peripheral software must be in place to improve efficiency.

Another issue is the guideline for how many applications should be placed in a container. If too many applications are placed in one container, portability is compromised; on the other hand, if only one application is placed per container, the number of containers increases sharply and management becomes more complex. Container granularity must therefore be designed with the intended use in mind.

In Docker operations, orchestration software such as Kubernetes is gaining attention as a way for multiple containers to work together. Since this orchestration software is primarily based on open source software, it is necessary to clearly understand the scope of vendor support.

If Docker itself frequently adds functions or changes specifications over a short period, careful consideration is needed before adopting a combination of such orchestration software and Docker for business production systems. The integration of HA cluster software and Docker, which is essential for mission-critical business systems, must also be examined: given that vendor-supported commercial HA cluster software often does not officially support Docker containers, adoption of Docker in the mission-critical domain may have to be avoided.

In systems where virtual environments are already in place, live migration of guest OSs is often used for non-disruptive daily maintenance, but Docker does not officially support live migration at this time. If live migration of the guest OS (moving it to another physical server without disruption) is a requirement of the virtual infrastructure on which your current operations run, you should either continue to use fault-tolerant servers (commonly known as FT servers) or hypervisor-based virtualization software, or consider adopting a container product with live migration support, such as OpenVZ.

  • Operating System Restrictions

If the Docker host OS is Linux, the containers running on it are also limited to Linux; Linux and Windows containers cannot be mixed on the same host. To run Windows containers, the “Windows Server Container” functionality that comes with Windows Server must be used.

Docker Container Architecture

In IT systems, one traditional solution for responding quickly to changes in development and operations has been the use of virtualization software. Virtualization software handles entire OS environments and applications as single files, providing an extremely portable infrastructure. Compared to Docker, however, it has suffered from performance degradation when multiple OSs are consolidated and from the difficulty of isolating failures, because the virtualization layer sits between the OS and the applications.

Docker, on the other hand, differs from conventional hypervisor-type virtualization in that it allows for the immediate creation of separate spaces, called containers, within a single OS environment, and each of these separate spaces can contain a different OS environment. Since containers can realize multiple heterogeneous Linux OS environments, they have the advantage of consolidating IT systems that require multiple OS versions into a single OS environment.

Containers provide an isolated space for applications. Because processes can be separated even though they share a single OS, multiple heterogeneous Linux OS environments can be realized. For example, Docker can run on a CentOS 7 host OS while multiple CentOS 6 Docker containers and Ubuntu Server 14.04 LTS Docker containers run simultaneously.

Containers themselves have existed since before the advent of Docker and have been in use for a long time. They all have the characteristic of providing multiple isolated spaces on a single OS and are not hypervisor-type virtualization software.

In general, in hypervisor-type virtualization software, software called a hypervisor provides a “virtual machine,” which is virtual hardware. The virtual BIOS, virtual CPU, virtual memory, virtual disk, virtual NIC, etc. provided by the virtual machine are shown to the guest OS, making it appear to the guest OS as if it were running on a physical machine.

For this reason, the guest OS needs to be operated no differently from a normal OS startup and shutdown. For example, if the guest OS does not install a boot loader in the master boot record area of its own virtual disk, the guest OS will naturally not operate properly. Also, if normal shutdown procedures are not followed when the OS terminates, the guest OS installed on the virtual disk may itself become corrupted, just as is the case with an OS running on a physical server.

In a container environment, on the other hand, the isolated space corresponding to a guest OS has no general “OS boot procedure.” For this reason, container environments are characterized by extremely fast container startup and shutdown, with less overhead than a guest OS on a hypervisor-type virtualization infrastructure.

In addition, while hypervisor-type software emulates hardware, a container environment uses namespaces and a resource management mechanism called cgroups, which allow multiple containers to run as processes within a single OS. The components and resources required for each separated OS environment (container) can therefore be kept small. Compared to hypervisor-based virtualization technologies, Linux containers can thus dramatically improve the consolidation ratio, because they consume fewer hardware resources such as CPU, memory, storage, and network, and incur less overhead. In terms of performance, application processes in a container environment are separated per container but execute directly on the host OS, so CPU utilization in a container is equivalent to that on the host OS (see figure below).
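
The point that containers are ordinary host processes can be confirmed directly (a sketch reusing the hypothetical nginx container from the earlier examples):

```bash
# "web" is the nginx container started in the earlier sketch
docker top web            # container processes as Docker reports them

# The same nginx processes appear directly in the host's process table
ps aux | grep [n]ginx
```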

What are namespaces in Docker?

Within a single OS environment, Docker can create multiple isolated spaces, which are realized by namespaces. Namespaces provide process isolation: for example, a process in one isolated space A is invisible from another isolated space B. When a Docker container is created, the namespaces listed below are created.

  • ipc namespace: Also called the Inter-Process Communication namespace. It isolates inter-process communication.
  • mnt namespace: Isolates the file system mount information visible to processes; it works similarly to the chroot command.
  • net namespace: Used to control networking. Each net namespace can have its own network interfaces, allowing network communication between multiple containers and hosts.
  • pid namespace: Used for process isolation. It is controlled by the kernel and manages child PIDs under their parent PID.
  • user namespace: Isolates user IDs and group IDs. Each user namespace can hold its own user IDs and group IDs.
  • uts namespace: Called the UTS (Unix Time-Sharing System) namespace. Used to isolate host names, NIS domain names, and so on.

Using these features, containers are realized by creating multiple spaces where processes, file systems, user IDs, etc. are separated.
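
The namespaces assigned to a running container can be observed from the host through /proc (a sketch; “web” is the hypothetical container from the earlier examples, and root privileges are assumed):

```bash
# Find the host-side PID of the container's main process
pid=$(docker inspect --format '{{.State.Pid}}' web)

# Each symlink below is one namespace: ipc, mnt, net, pid, user, uts, ...
sudo ls -l /proc/$pid/ns
```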

Separation of pid namespace

Namespaces can be used to isolate system resources, such as file systems and networking, for each container. When a user runs a Docker container, the Docker engine creates namespaces for that container. Each container runs in its own namespaces, and its access is limited to them. From the perspective of the host OS running the Docker engine, processes belonging to multiple namespaces all appear to be running side by side, but within an individual namespace (i.e., within a container), only the application processes belonging to that namespace are visible (see figure below).

For example, suppose the host OS running the Docker engine starts two containers, one running the httpd service for a web server and the other running the vsftpd service for an FTP server, and that the host OS assigns the httpd daemon a process ID (PID) of 100 and the vsftpd daemon a PID of 200. From the host OS, both httpd and vsftpd are visible as running processes. Inside the container running the httpd service, however, httpd is assigned PID 1, and inside the container running the vsftpd service, vsftpd is likewise assigned PID 1: the two daemons can hold the same PID because each container has its own PID namespace. Since the container in which httpd runs and the container in which vsftpd runs are separate PID namespaces, vsftpd is not visible from within the httpd container, and httpd is not visible from within the vsftpd container. Because process IDs are assigned in a form closed within each namespace, the host OS process space is separated from the application process space.
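
This can be verified with the hypothetical “web” container from the earlier sketches: the main process is PID 1 inside its own namespace but has an ordinary PID on the host:

```bash
# Inside the container's PID namespace, the main process sees itself as PID 1
docker exec web cat /proc/1/comm    # prints "nginx"

# From the host's PID namespace, the same process has a normal host PID
docker inspect --format '{{.State.Pid}}' web
docker top web
```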

File System Separation

It is not only process IDs that are separated per container (process): file systems are separated by namespaces as well. The container infrastructure on which the Docker engine runs gives each container a separate file system namespace, and the file system within one container cannot see the file system of another container. For example, suppose a file named test1.html is created in the /data directory of a CentOS-based container, and a file named test2.html is created in the /data directory of an Ubuntu-based container. In this case, from the host OS on which the Docker engine runs, both the test1.html and test2.html files are visible under the /var/lib/docker directory (see figure below).

Since the file system on the host OS has a single namespace, both files are visible there; in the CentOS container, however, only the test1.html file is visible and the test2.html file is not, while in the Ubuntu container only the test2.html file is visible and the test1.html file is not. Because each file system is allocated within a closed namespace, the file system space of the host OS is separated from the file system space of each container.
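
A sketch of this experiment with Docker commands (container names and image tags are illustrative, and the exact path under /var/lib/docker depends on the storage driver in use):

```bash
# Create a file inside each container's own /data directory
docker run -d --name c1 centos:7 sleep infinity
docker run -d --name c2 ubuntu:20.04 sleep infinity
docker exec c1 sh -c 'mkdir -p /data && touch /data/test1.html'
docker exec c2 sh -c 'mkdir -p /data && touch /data/test2.html'

# Each container sees only its own file
docker exec c1 ls /data    # -> test1.html only
docker exec c2 ls /data    # -> test2.html only

# The host's single file system namespace holds both files
sudo find /var/lib/docker -name 'test*.html'
```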

cgroups (Control Groups)

In an environment where containers run as multiple isolated spaces on a single host, it is very important to limit the use of finite hardware resources. In a Docker environment, containers will by default consume as much of the hardware resources as they can. To prevent a container from exhausting hardware resources, the resources available to each container must be limited. This is achieved by using cgroups (Control Groups) (see figure below).

cgroups is a resource control mechanism implemented in the Linux kernel. cgroups limits the amount of hardware resources, such as CPU and memory, used by each container and allocates the configured resources to each separated namespace. CPU, memory, communication bandwidth, and other computer resources can be combined and assigned to user-defined groups of tasks, and resource usage limits can be set and released for these groups.

The Linux implementation of cgroups is a powerful resource management tool that has been used in Linux server systems for many years, even before the advent of Docker, and is well known for the very fine-grained control it offers over hardware resource allocation. cgroups limit the usage of various physical resources, mainly CPU, memory, and block I/O, by application processes.

cgroups are exposed as a virtual file system (specifically, under the /sys/fs/cgroup directory on the host OS), and various resources can be controlled by changing the parameters provided in this file system. The main resource controllers managed through the cgroups file system are shown below.

  • blkio : Displays I/O statistics of block devices and controls I/O
  • cpuacct : Generates reports of the CPU time consumed
  • cpuset : Configures the CPU cores and memory placement on which processes run
  • devices : Configures device access control
  • freezer : Pauses and resumes tasks (processes)
  • hugetlb : Makes large virtual memory pages (huge pages) available
  • memory : Reports the memory resources consumed by tasks and sets upper limits on memory usage
  • perf_event : Enables observation with the perf tool
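
In Docker, these cgroups limits are exposed as options to docker run. A minimal sketch (the container name and limit values are illustrative; the path shown is for cgroup v1 with the cgroupfs driver and differs under cgroup v2):

```bash
# Limit the container to 512 MB of memory and one CPU
docker run -d --name capped -m 512m --cpus=1 nginx:latest

# Confirm the limit Docker wrote into the memory cgroup
cat /sys/fs/cgroup/memory/docker/$(docker inspect -f '{{.Id}}' capped)/memory.limit_in_bytes
```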

In the next article, we will discuss the preparations to make before introducing Docker, including a pre-introduction checklist, which Docker edition to use, OS selection, and Docker Desktop.
