Overview of web technologies, HTTP protocol connecting web servers and browsers

Summary

This article is based on WEB+DB EXPRESS vol. 122. As described in the previous article, HTTP is an application layer protocol used for communication between clients and servers on the Web. Hypertext, the “H” in HTTP, refers to the ability to link multiple documents together. HTML realizes hypertext through its link function.

HTTP protocol overview

Currently, HTTP/1.1 is the mainstream version of HTTP. Its successors, HTTP/2 (header compression and multiplexing of requests over a single connection) and HTTP/3 (which runs over QUIC), are designed to increase speed.

HTTP/1.1 is designed to exchange data in text format. Therefore, by using the browser's developer tools, it is possible to see the actual contents of the data being exchanged.

In HTTP, the roles of the client and the server differ greatly, and the data to be sent differs between them. HTTP requests are sent by the client, and HTTP responses are sent by the server in response.

HTTP works by issuing a request for each resource to be retrieved and receiving a response. In the simplest case, the connection is closed after each response, and even when a connection is reused, the server handles each request independently of the previous ones. The protocol itself therefore does not maintain the state of communication, which is why HTTP is called a stateless protocol. Real web applications often need to keep state, so a mechanism such as cookies is used to add state to HTTP.

An HTTP request is sent from the client to the server and consists of three parts: the request line, the header fields, and the request body. The header fields and the request body are separated by a blank line, and the request body can be omitted (in HTTP/1.1, the Host header field is required). The following is an example of a simple HTTP request:

POST /index.html HTTP/1.1  ← request line
Host: www.example.com  ← header field
Content-Type: application/x-www-form-urlencoded  ← header field
  ← blank line separating the header fields from the body
foo=hoge&bar=fuga  ← request body
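
To make the layout of a request concrete, the following is a minimal Python sketch that assembles the same request as text. The helper name `build_request` is hypothetical and only for illustration; a real client would also send a Content-Length field for a request with a body, which is added here as an assumption.

```python
def build_request(method, target, host, fields=None, body=""):
    """Assemble a raw HTTP/1.1 request as text (illustrative helper)."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    for name, value in (fields or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Assumption: the body is ASCII, so its length in bytes equals len(body).
        lines.append(f"Content-Length: {len(body)}")
    # Lines end with CRLF; an empty line separates the header fields from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "POST",
    "/index.html",
    "www.example.com",
    {"Content-Type": "application/x-www-form-urlencoded"},
    "foo=hoge&bar=fuga",
)
print(request)
```

The request line comes first, the header fields follow one per line, and everything after the empty line is the request body.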

The request line indicates what kind of request is to be sent to the server. The data format is described in the order of “method”, “request target”, and “HTTP version”.

GET is a method to retrieve data from a specified request target, while POST is used to send data to a server.

The request target specifies the path of the resource to be retrieved. In the example, it is the part of the URL that follows the host name. (The host name is resolved to an IP address, which is used to send packets over the Internet, so it does not appear in the request target.)
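
As a rough illustration of how a URL is divided into the host name and the request target, the following sketch uses Python's standard `urllib.parse.urlsplit`. The URL itself is an assumed example.

```python
from urllib.parse import urlsplit

url = "http://www.example.com/index.html?foo=hoge"  # assumed example URL
parts = urlsplit(url)

# The host name is used to find the server to connect to.
host = parts.hostname

# The request target is the path (plus the query string, if any).
target = parts.path or "/"
if parts.query:
    target += "?" + parts.query

print(host)
print(target)
```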

Header fields are composed of two elements: “field name” and “value”. There are many types of these (e.g., Accept-Language, Authorization, Content-Type, Host, etc.)

The request body is used when you want to send data to the web server. The format of the data is specified by Content-Type in the header fields. Various formats can be specified in Content-Type; a commonly used one is application/x-www-form-urlencoded, in which data is represented as follows.

key1=value1&key2=value2

Specify the parameter name and value in the form of key=value, and connect multiple parameters with &.
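
A minimal sketch of this encoding with Python's standard `urllib.parse` module follows. The parameter names and values are made up for illustration; the second value contains a space and an `&` to show that such characters are escaped so they cannot be confused with the separators.

```python
from urllib.parse import parse_qs, urlencode

params = {"key1": "value1", "key2": "value 2&more"}  # illustrative data

# Encode the parameters as key=value pairs joined with "&".
body = urlencode(params)
print(body)

# Decoding on the server side restores the original values
# (each key maps to a list because a key may appear more than once).
decoded = parse_qs(body)
print(decoded)
```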

The HTTP response that the server returns for such a request consists of three elements: the “status line,” “header fields,” and “response body.” As in the request, the header fields and the response body are separated by a blank line, and the header fields and the response body may be omitted. A simple HTTP response is shown below.

HTTP/1.1 200 OK  ← status line
Content-Type: text/html  ← header field
Connection: close  ← header field
  ← blank line separating the header fields from the body
<html>  ← response body
   <body>Hello World!!</body>  ← response body
</html>  ← response body

The first line of an HTTP response is always the status line. The status line indicates what type of response it is. It is composed of three elements: “HTTP version”, “status code”, and “message”.

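The HTTP version is HTTP/1.1 in the example above. The status code is a predetermined three-digit code that indicates the result of the request. Its first digit gives the class of the response: 1xx is informational, 2xx is success, 3xx is redirection, 4xx is a client error, and 5xx is a server error.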

The message can contain any string (e.g., OK). Since it is mainly used to convey information to the developer, the string will vary depending on the server.
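
The grouping of status codes by their first digit can be sketched as follows. The function name `describe` is hypothetical; `http.HTTPStatus` is Python's standard table of registered codes, and its phrases are only the conventional messages, since servers are free to send a different string.

```python
from http import HTTPStatus

# Class of the response, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return (class, conventional message) for a status code."""
    status_class = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not in Python's table; the class is still known.
        phrase = ""
    return status_class, phrase


print(describe(200))
print(describe(404))
```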

The header fields are the same as in the HTTP request: each is a pair of “field name” and “value”. Commonly used field names include “Cache-Control”, “Content-Type”, and “Location”.

The format of the data to be sent in the response body is specified in the header field Content-Type, for example, text/html for HTML, image/png for PNG images, etc.
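
Putting the pieces together, the following is a minimal sketch of splitting a raw response into its status line, header fields, and body. The function `parse_response` is a hypothetical helper that assumes the whole response is already available as one text string; real clients read from a socket and handle more cases.

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into its three parts (illustrative)."""
    # The first empty line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *field_lines = head.split("\r\n")

    # Status line: HTTP version, status code, and an optional message.
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    message = parts[2] if len(parts) > 2 else ""

    # Header fields: "name: value"; field names are case-insensitive.
    fields = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return version, code, message, fields, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Connection: close\r\n"
    "\r\n"
    "<html><body>Hello World!!</body></html>"
)
version, code, message, fields, body = parse_response(raw)
print(version, code, message)
print(fields["content-type"])
```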

As mentioned above, cookies add a state management function to HTTP, which is itself a stateless protocol. The main data flow is shown in the figure below.

Cookies provide a state management mechanism in the following flow.

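① The web server returns a response that sets a cookie value on the client.

② The client stores the cookie value and includes it in its requests to the web server.

③ The web server reads the cookie value and returns content, setting a new cookie value if necessary.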

Cookie and Set-Cookie are the header fields used to realize the above mechanism: the server sends Set-Cookie in a response, and the client sends Cookie in its subsequent requests. If a large number of values are stored in a cookie, communication efficiency deteriorates, so usually only data such as identifiers are stored in the cookie, and most of the state is managed on the web server.
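
The flow can be sketched in Python with the standard `http.cookies` module. This is only a simulation under stated assumptions: the cookie name `session_id`, the `sessions` dictionary, and the `handle` function are hypothetical, and no real network communication takes place.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side state, keyed by the identifier stored in the cookie.
sessions = {}


def handle(cookie_header):
    """Take the request's Cookie value; return (Set-Cookie value or None, body)."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is None or morsel.value not in sessions:
        # Step 1: no known identifier, so issue one with Set-Cookie.
        session_id = secrets.token_hex(8)
        sessions[session_id] = {"visits": 1}
        out = SimpleCookie()
        out["session_id"] = session_id
        return out["session_id"].OutputString(), "visit 1"
    # Step 3: read the identifier and look up the state kept on the server.
    state = sessions[morsel.value]
    state["visits"] += 1
    return None, f"visit {state['visits']}"


# First request: the client has no cookie yet.
set_cookie, first_body = handle(None)
# Step 2: the client stores the value and sends it back in the Cookie field.
second_set_cookie, second_body = handle(set_cookie)
print(first_body, second_body)
```

Only the identifier travels in the cookie; the visit counter itself stays in `sessions` on the server side.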

When a client connects to a web server to receive response contents, some contents are changed more frequently than others, and it is inefficient to send contents that are changed less frequently every time. If the content retrieved last time is cached on the client side, and the local cache is used when the content has not changed, waste can be avoided.

For this purpose, HTTP has a function to determine whether the content is the same as the content retrieved in the previous request and, if it is, to omit sending the content. There are two ways to determine whether the content is the same: using the last modified time (the Last-Modified and If-Modified-Since header fields) or using a unique identifier (the ETag and If-None-Match header fields).

In the simplest setup, one web server serves one web application. “Virtual hosting” is a feature that allows a single web server to serve multiple web applications.
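
A minimal sketch of the identifier-based check follows. Deriving the identifier from a hash of the content is one common choice and is an assumption here, as are the function names `make_etag` and `respond`; servers may generate the identifier in other ways.

```python
import hashlib


def make_etag(content):
    """Derive an identifier from the content (assumed scheme: truncated SHA-256)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content, if_none_match=None):
    """Return (status code, header fields, body) for a GET of `content`."""
    etag = make_etag(content)
    if if_none_match == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content


page = b"<html><body>Hello World!!</body></html>"

# First request: the full content is returned together with its identifier.
status1, fields1, body1 = respond(page)
# Second request: the client sends the identifier back in If-None-Match.
status2, fields2, body2 = respond(page, if_none_match=fields1["ETag"])
# After the content changes, the old identifier no longer matches.
status3, fields3, body3 = respond(b"<html><body>Updated</body></html>",
                                  if_none_match=fields1["ETag"])
print(status1, status2, status3)
```

Status code 304 (Not Modified) tells the client to use its local cache.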

When a request is sent, the host name is resolved to an IP address, which is used for packet communication over the Internet. The request line contains only the path, so when several host names share one IP address, the web server cannot tell from the connection or the request line which host name was used to access it. HTTP/1.1 solves this by having the client send the host name in the Host header field.

An example of an HTTP request containing a Host header field is shown below.

GET /index.html HTTP/1.1
Host: www.example.com

The web server reads the value of the Host header field to determine which host name was used for the access, and decides what content to return.
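
The selection can be sketched as a lookup keyed by the Host value. The site table, the second host name `blog.example.com`, and the function names are hypothetical; the sketch also assumes the host is not an IPv6 literal, so that a port can be removed by cutting at the first colon.

```python
# Hypothetical table mapping host names to the content each site serves.
SITES = {
    "www.example.com": "main site",
    "blog.example.com": "blog site",
}


def host_from_request(raw):
    """Return the host name from the Host header field, or None if absent."""
    head = raw.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port" suffix; host names are case-insensitive.
            return value.strip().split(":")[0].lower()
    return None


def select_site(raw):
    """Choose which application handles the request."""
    return SITES.get(host_from_request(raw), "default site")


request = "GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
print(select_site(request))
```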

The next article will discuss web servers.