Web Server Overview


Summary

From WEB+DB PRESS vol.122. In the previous article, we discussed the HTTP protocol, which connects clients and servers. This time, we will discuss web server technology. For more information on IT infrastructure technology in general, see “IT Infrastructure Technology”.

Web Server Overview

The basic processing flow of a web server is as follows.

First of all, the web server is always connected to the network and is always ready to receive data from clients. When data arrives from a client, the server receives it, passes it to the HTTP request parsing process, and returns response data according to the parsed request.
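Before looking at a concrete request and response, the following is a minimal sketch of this receive, parse, and respond loop, written in Python with the standard socket module. It is an illustration only, not a description of any particular server's implementation, and the port number and fixed response body are arbitrary example values.

import socket

HOST, PORT = "0.0.0.0", 8080  # example values only

# Listen for clients, read each raw HTTP request, and return a response.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen()
    while True:
        conn, addr = srv.accept()       # block until a client connects
        with conn:
            request = conn.recv(65536)  # raw HTTP request bytes
            # A real server would parse the request line and headers here.
            body = b"<html><body>hello!!</body></html>"
            headers = (
                "HTTP/1.1 200 OK\r\n"
                "Content-Type: text/html\r\n"
                f"Content-Length: {len(body)}\r\n"
                "Connection: close\r\n"
                "\r\n"
            )
            conn.sendall(headers.encode() + body)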

As a concrete example, when the web server receives the following HTTP request:

POST /index.html HTTP/1.1
Host: www.example.com
Content-Type: application/x-www-form-urlencoded

name=foo

it returns the following HTTP response:

HTTP/1.1 200 OK
Content-Type: text/html
Connection: close

<html>
  <body>hello foo!!</body>
</html>

A specific implementation example is described in “Linkage Between Web Server and DB”.
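As a rough illustration of the exchange above, the following sketch uses Python's standard http.server module to parse the urlencoded body (name=foo) and return the “hello foo!!” page. It is only a simplified example; the linked article describes an actual implementation, and the address, port, and class name here are arbitrary choices.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

class HelloHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the urlencoded body (e.g. "name=foo") using Content-Length.
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode())
        name = form.get("name", ["world"])[0]

        body = f"<html>\n  <body>hello {name}!!</body>\n</html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), HelloHandler).serve_forever()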

One of the features of a web server is that it can handle requests from multiple clients in parallel. There are two main approaches to handling multiple requests: the “prefork type” and the “event-driven type”.

The prefork type processes requests from multiple clients in parallel by using the OS's ability to switch between multiple programs (processes) quickly enough that they appear to run simultaneously. The web server spawns multiple processes, and each process handles a single connection. Apache HTTP Server is a typical server that uses the prefork type. The details of the Apache server and its specific implementation are described in “Installing and Operating the Apache Server and LAMP”.
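The following is a minimal sketch of the prefork idea in Python on a POSIX system: a parent process opens the listening socket and forks a fixed number of worker processes, each of which accepts and handles one connection at a time. Apache's actual prefork MPM manages its worker pool far more dynamically; the worker count and response here are arbitrary example values.

import os
import socket

NUM_WORKERS = 4  # example value; real servers tune this

# The parent creates the listening socket once, then forks workers that
# all accept() on the shared socket. Each worker serves one connection
# at a time, so concurrency is limited by the number of processes.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))
srv.listen()

for _ in range(NUM_WORKERS):
    if os.fork() == 0:  # child (worker) process
        while True:
            conn, _ = srv.accept()
            with conn:
                conn.recv(65536)
                conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")

for _ in range(NUM_WORKERS):
    os.wait()  # parent waits for its workers (simplified supervision)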

Since the prefork type uses one process per connection, handling many simultaneous requests requires launching many processes. When the number of processes grows too large, the OS spends more and more time switching between them and overall processing slows down (the C10K problem, or 10,000 clients problem).

The “event-driven type” was devised to solve this problem: it allows a single process to handle multiple requests.

The event-driven type uses asynchronous I/O to process multiple requests in a single process: while one request is waiting on slow I/O such as network or disk access, the CPU works on another request. A typical web server that uses event-driven processing is nginx. The details of nginx and its specific implementation are described in “Overview and Installation of nginx Server”.
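The sketch below illustrates the event-driven style with Python's standard asyncio module: a single process runs an event loop, and while one connection is waiting on network I/O the loop switches to another. This only illustrates the idea, not nginx's internals; the address, port, and response body are arbitrary example values.

import asyncio

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    await reader.read(65536)  # wait for request bytes without blocking other connections
    body = b"<html><body>hello!!</body></html>"
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    writer.write(headers.encode() + body)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # One process, one event loop: each connection becomes a lightweight task.
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()

asyncio.run(main())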

HTTP was originally implemented as a protocol for a server to return HTML. Therefore, the basic behavior is to return the same response to the same request every time. However, as the web developed, there was a growing need to vary the response depending on the situation, even for the same resource. For example, on an e-commerce site where each user's shopping cart retains different products, the content displayed must differ per user even when the same resource is accessed.

A server that behaves this way is called a web server that delivers dynamic content.

In contrast, a web server that returns the same resource every time is called a static web server, and it delivers static content. Originally, web servers were implemented to return this static content. Since the content to be returned is fixed in advance, another server can return the static content on behalf of the original web server. Such methods include having clients and intermediaries cache the content via the Cache-Control header field of the HTTP response, or using a CDN.
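For example, a static response can carry a Cache-Control header such as the following (the max-age value of 3600 seconds is an arbitrary example), telling browsers and intermediate caches, including CDNs, how long they may reuse the response without contacting the origin server:

HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: public, max-age=3600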

A CDN (Content Delivery Network) is a service that delivers static content efficiently. CDN providers place web servers at many locations on the Internet, and by distributing the servers they avoid concentrating access from large numbers of users on one point and spread the load. Some companies specialize in providing CDNs, such as Akamai and Fastly, while others provide them as part of cloud services, such as AWS's CloudFront.

In contrast, for dynamic content delivery there are two main approaches to efficient processing: (1) adding dynamic processing capabilities to the web server itself, and (2) linking the web server with an external program that performs the dynamic processing. The typical method for (1) is SSI (Server Side Includes), which is rarely used now. There are a number of methods for (2).

One of the methods for (2) is CGI (Common Gateway Interface), which is rarely used now but was implemented in early servers as a way to work with external programs. With CGI, the web server exchanges data with an external program via standard input and output to produce dynamic content, and a new program process is launched for every request. As a result, as the number of accesses grew, it became difficult to process them quickly.
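A minimal CGI-style script in Python looks roughly like the following: the web server passes request metadata through environment variables and the request body on standard input, and the script writes headers and a body to standard output. This is a simplified sketch of the CGI convention, not an example taken from the article.

#!/usr/bin/env python3
import os
import sys
from urllib.parse import parse_qs

# CGI convention: metadata in environment variables, body on stdin,
# response headers and body written to stdout. A new process like this
# is launched for every request, which is why CGI scales poorly.
length = int(os.environ.get("CONTENT_LENGTH") or 0)
form = parse_qs(sys.stdin.read(length))
name = form.get("name", ["world"])[0]

print("Content-Type: text/html")
print()  # blank line separates headers from the body
print(f"<html><body>hello {name}!!</body></html>")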

To address this problem, Apache modules that embed the Perl and PHP language processors in the server (such as mod_perl and mod_php) were developed, but because Apache is prefork-based, this approach could not be used at larger scales.

FastCGI was introduced to solve this problem. The bottleneck in CGI was launching a program for every request, so FastCGI avoids the slowdown by keeping the connection between the web server and the external program alive instead of launching the program each time.

In FastCGI, the web server and the external program communicate over a UNIX domain socket or local TCP/IP. The external program is resident, just like the web server, and is always ready to respond to requests, maintaining the connection between the web server and the external program.
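The sketch below illustrates only the “resident external program listening on a UNIX domain socket” part of this arrangement in Python; it does not implement the actual FastCGI binary protocol, which real deployments get from a FastCGI library or application server. The socket path is an arbitrary example.

import os
import socket

SOCK_PATH = "/tmp/app.sock"  # example path; real setups choose their own

# A resident worker: started once, it keeps listening on a UNIX domain
# socket so the web server can hand over requests without launching a
# new process each time (the core idea behind FastCGI).
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
    srv.bind(SOCK_PATH)
    srv.listen()
    while True:
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(65536)  # data forwarded by the web server
            conn.sendall(b"<html><body>hello from the resident worker</body></html>")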

In HTTP, a proxy is a server that sits between clients and external servers, relaying requests and responses and caching or filtering content on the clients' behalf; a reverse proxy does the opposite, receiving requests from clients and relaying them to internal servers.

In recent years, it has become increasingly common to link web servers and external programs through reverse proxies that operate at the application layer using HTTP.
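As a minimal illustration of the reverse-proxy idea, the following Python sketch accepts HTTP requests from clients, relays them to an internal application server, and copies the response back. It has no error handling or header passthrough, and the upstream address and port are arbitrary example values; production setups use servers such as nginx for this role.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

UPSTREAM = "http://127.0.0.1:8081"  # example backend address

class ReverseProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Relay the client's request to the internal server and copy back its response.
        upstream = urlopen(Request(UPSTREAM + self.path), timeout=5)
        body = upstream.read()
        self.send_response(upstream.status)
        self.send_header("Content-Type", upstream.headers.get("Content-Type", "text/html"))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("127.0.0.1", 8080), ReverseProxy).serve_forever()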

As interfaces for connecting to web servers, such as WSGI, Rack, and PSGI, have appeared in web application frameworks, it has become easier to run the web application itself as an HTTP server. As a result, the web server can now connect to it over HTTP, an existing mechanism, instead of FastCGI, a dedicated one.
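A WSGI application, for example, is just a callable that the server invokes per request; the sketch below uses Python's standard wsgiref reference server to run one over plain HTTP. The address, port, and response are arbitrary example values, and a real deployment would place an application server like this behind the front web server or reverse proxy.

from wsgiref.simple_server import make_server

# A minimal WSGI application: the interface between server and
# application is just this callable.
def app(environ, start_response):
    body = b"<html><body>hello from WSGI!!</body></html>"
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

if __name__ == "__main__":
    # wsgiref's reference server speaks plain HTTP, so a front web server
    # or reverse proxy can talk to it over HTTP instead of FastCGI.
    make_server("127.0.0.1", 8081, app).serve_forever()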

The most prominent OSS operating system on which these web servers run is Linux.

In the next article, we will discuss browsers, HTML, CSS, and JavaScript.
