About Web Technologies
Web technology is the platform on which technologies such as machine learning, artificial intelligence, and digital transformation are built.
This article gives an overview of web technology (Internet technology, the HTTP protocol, web servers, web browsers, web applications, and programming technologies such as JavaScript and React), implementation technologies and techniques (JavaScript, React, Clojure, Python, etc.), concrete applications (MAMP with MediaWiki and WordPress as a CMS (Content Management System), and setting up Fess and Elasticsearch as search platforms), and various applications presented at conferences and on the web.
Implementation
This section describes examples of how servers described in “Server Technology” can be used in various programming languages. Server technology here refers to technology related to the design, construction, and operation of server systems that receive requests from clients over a network, execute requested processes, and return responses.
Server technologies are used in a variety of systems and services, such as web applications, API servers, database servers, and mail servers. Server technology implementation methods and best practices differ depending on the programming language and framework.
Rust is a programming language developed by Mozilla Research for systems programming, designed with an emphasis on high performance, memory safety, parallelism, and multi-threaded processing. It is also a language focused on bug prevention through strong static type checking at compile time.
This section provides an overview of Rust, its basic syntax, various applications, and concrete implementations.
Database technology refers to technology for efficiently managing, storing, retrieving, and processing data, and is intended to support data persistence and manipulation in information systems and applications, and to ensure data accuracy, consistency, availability, and security.
The following sections describe implementations in various languages for actually handling these databases.
Raspberry Pi is a single-board computer (SBC), a small computer developed by the Raspberry Pi Foundation in the UK. The name combines “raspberry,” in the tradition of naming computers after fruit, with “Pi,” a reference to the Python programming language.
This section provides an overview of the Raspberry Pi and describes various applications and concrete implementation examples.
- Linking Web Server and DB (1) Setting up a server in Clojure
In this article, we will discuss setting up a server using Ring, a Clojure web server technology.
- Linking Web Server and DB (2) Routing using Compojure
In this article, we describe the implementation of routing using Compojure, a routing library for Clojure web servers.
- Linking Web Server and DB (3) DB connection and control
This time, we will discuss setting up a database (PostgreSQL) to be connected to the web system that was set up in the previous article.
- Linking Web Server and DB (4) Linking Server and DB
In this article, we will discuss building a web application in Clojure by connecting the database and the web server that were set up in the previous articles.
A search system searches databases and other information sources based on a given query and returns relevant results. It can target various types of data, as in text, image, and voice search. Implementing a search system involves elements such as database management, search algorithms, indexing, ranking models, and user interfaces, and the appropriate technologies and algorithms are selected according to the specific requirements and data types.
This section discusses specific implementation examples, focusing on Elasticsearch.
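The indexing and ranking elements mentioned above can be illustrated with a very small inverted index. This is only a sketch under simple assumptions: the whitespace tokenizer, the term-frequency score, and the sample documents are all hypothetical, and it does not reflect how Elasticsearch works internally.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, docs, query):
    """Return ids of documents containing every query term, ranked by term frequency."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    def score(doc_id):
        words = tokenize(docs[doc_id])
        return sum(words.count(t) for t in terms)

    # Highest score first; ties are broken by document id.
    return sorted(matches, key=lambda d: (-score(d), d))

# Hypothetical sample documents, used only for illustration.
DOCS = {
    1: "Elasticsearch is a distributed search engine",
    2: "A web server returns responses to clients",
    3: "Search engine ranking orders search results",
}
INDEX = build_index(DOCS)
```

Calling `search(INDEX, DOCS, "search engine")` ranks document 3 ahead of document 1 because the query terms occur more often in it.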
An Elasticsearch-based search engine that can be up and running in a short period of time (crawling, automatic indexing, word registration, etc.).
Steps to set up the environment for launching Elasticsearch, the de facto standard search engine.
I will describe the construction of a simple UI for Elasticsearch (launching ReactiveSearch) built on Node.js and React.
- Search tool Elasticsearch UI (collaboration with ReactiveSearch: application)
Continuing from the previous article, we will discuss a more complex search UI configuration as an application of the ReactiveSearch UI.
Elasticsearch is an open source distributed search engine that provides many features to enable fast text search and data analysis. Various plug-ins are also available to extend the functionality of Elasticsearch. This section describes these plug-ins and their specific implementations.
Web crawling is a technology to automatically collect information on the Web. This section describes an overview of web crawling, its applications, and concrete implementations using Python and Clojure.
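One core step of a crawler, extracting the links on a page so that they can be visited next, can be sketched with the Python standard library alone. The sample HTML and base URL are hypothetical, and a real crawler would also need to fetch pages, respect robots.txt, and avoid revisiting URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

# Hypothetical page content, used only for illustration.
SAMPLE = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
```

A crawler would push the returned URLs onto a queue and repeat the fetch-and-extract cycle for each one.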
I will describe a simple chatbot implementation using Node.js, a server-side JavaScript runtime, and React.
The basic operation is rule-based: after writing the rules of the conversation, we store them in a JSON file, and the conversation proceeds along the flow of the written sequence.
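The rule-based flow described above can be sketched as a small state machine driven by JSON. The article itself uses Node.js and React; the Python version below, the rule format, and the state names are all assumptions made for illustration.

```python
import json

# Hypothetical rule file contents: each state has a line to say and
# a table of transitions keyed by the user's input ("*" matches anything).
RULES_JSON = """
{
  "start": {"say": "Hello! Do you want to search or to quit?",
            "next": {"search": "ask_query", "quit": "bye"}},
  "ask_query": {"say": "What would you like to search for?",
                "next": {"*": "bye"}},
  "bye": {"say": "Goodbye.", "next": {}}
}
"""

def load_rules(text):
    return json.loads(text)

def step(rules, state, user_input):
    """Return the next state for user_input, staying in place if no rule matches."""
    transitions = rules[state]["next"]
    key = user_input.strip().lower()
    if key in transitions:
        return transitions[key]
    return transitions.get("*", state)

def run(rules, inputs, state="start"):
    """Feed a list of user inputs through the rules and collect the bot's lines."""
    said = [rules[state]["say"]]
    for user_input in inputs:
        state = step(rules, state, user_input)
        said.append(rules[state]["say"])
    return said
```

Because the conversation logic lives entirely in the JSON data, the dialogue can be changed without touching the code.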
CMSs (Content Management Systems) such as MediaWiki and WordPress are platforms used in a wide range of situations. To run them in production, it is necessary to set up a server such as Apache and a database such as PostgreSQL, but XAMPP and MAMP can be used for a simple start-up.
In this article, I will describe how to set up MAMP and MediaWiki (and WordPress).
Semantic MediaWiki (SMW) is an extension to Wikimedia’s MediaWiki software that adds semantic capabilities based on Semantic Web principles and enhances information retrieval, filtering, and visualization.
Semantic MediaWiki provides a special markup and query language for structuring information on wikipages and defining relationships between them. This allows wiki pages to be treated like a database within a wiki, providing the ability to categorize, tag, associate, and define properties of information.
CSS (Cascading Style Sheets) is a style sheet language used to specify the style and layout of documents created in markup languages such as HTML and XML. Bootstrap is a framework for building responsive websites and web applications using HTML, CSS, and JavaScript.
This section mainly discusses various implementations using Bootstrap.
WoT (Web of Things) is a standardized architecture and set of protocols for interconnecting various devices on the Internet and enabling communication and interaction between them. WoT is intended to extend the Internet of Things (IoT), simplify interactions with devices, and increase interoperability.
This article describes general implementation procedures, libraries, platforms, and concrete examples of WoT implementations in Python and C.
Overview
The Internet is a global network of interconnected computer networks, and web technologies are the technologies for transmitting and viewing information and content on it. In other words, the Internet is the foundation of information and communication, and web technology provides the specific tools and methods for transmitting, sharing, and browsing information on it. Beyond the Web as a means of disseminating, sharing, and browsing information, Web 3.0 focuses on Semantic Web technologies and improved data semantics, whereas Web3 can be described as a new architecture and philosophy of the distributed Web that emphasizes distributed ledger technology, data ownership, and privacy. Both concern the future of the web and overlap in some respects, but they are different concepts.
First, let’s look at Internet technology, which is the base of web technology. The Internet began with ARPANET, a system for communication among multiple computers, which was first studied in the late 1960s and realized in the United States in the early 1970s.
Initially started for military purposes to build fault-tolerant networks, this work led to the Internet Protocol Suite (“suite” means a set), which became a standard technology used around the world. The protocols in the suite are divided into several layers, and each layer wraps the data handed down from the layer above with its own header, so that higher-level protocols do not need to know the details of the lower-level ones. This is called encapsulation. Encapsulation ensures modularity among the layers and enables the interconnection of various networks.
HTTP is an application layer protocol used when a client and server communicate on the Web. Hypertext, the H of HTTP, refers to the ability to link multiple documents.
HTTP/1.1 is still widely used. Newer versions include HTTP/2 (header compression and multiplexing) and HTTP/3 (use of QUIC), which are designed for higher speed.
HTTP/1.1 is a specification for exchanging data as text. Therefore, by using browser development tools, it is possible to see the actual contents of the data being exchanged.
In HTTP, the roles of the client and server are very different, and the data sent by each differs. The HTTP request is sent by the client, and the HTTP response is sent by the server in response.
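Because HTTP/1.1 messages are plain text, both sides of the exchange can be illustrated with a few lines of string handling. The sketch below parses a request as a client would send it and builds a response as a server would return it; the function names and the sample request are hypothetical, and real servers handle many more cases (chunked bodies, repeated headers, malformed input).

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, path, version, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}

def build_response(status, reason, body):
    """Build a minimal plain-text HTTP/1.1 response as bytes."""
    payload = body.encode("utf-8")
    head = (f"HTTP/1.1 {status} {reason}\r\n"
            "Content-Type: text/plain; charset=utf-8\r\n"
            f"Content-Length: {len(payload)}\r\n"
            "\r\n")
    return head.encode("ascii") + payload

# Hypothetical request text, as a browser might send it.
RAW_REQUEST = ("GET /index.html HTTP/1.1\r\n"
               "Host: www.example.com\r\n"
               "Accept: text/html\r\n"
               "\r\n")
```

The request carries a method and a target path, while the response carries a status code, which reflects the different roles of client and server.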
The web server is always connected to the network and is ready to receive requests from clients at any time. When data arrives from a client, the server passes it to the HTTP request parsing process and returns response data according to the parsed request.
One of the characteristics of a web server is that it handles requests from multiple clients concurrently. There are two main approaches to handling multiple requests: the “prefork type” and the “event-driven type.”
The prefork type uses the OS’s ability to switch among multiple programs (processes) at high speed, so that they appear to run in parallel, to process requests from multiple clients concurrently. In this case, the web server spawns multiple processes, and each process handles one connection. The OS handles the process switching, so each process only needs to handle its own request.
In the event-driven model, asynchronous I/O is used to handle multiple requests in a single process. This means that other requests are processed while the server waits for slow data exchanges, such as network or disk accesses. nginx is a typical web server that uses event-driven processing.
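The event-driven idea can be sketched with Python's `asyncio`, where a single process interleaves requests while each one waits on I/O. This is an illustration only, not how nginx is implemented; the request names and delays are hypothetical, and `asyncio.sleep` stands in for a slow network or disk access.

```python
import asyncio

async def handle_request(name, io_delay, log):
    """Simulate one request whose only cost is waiting on slow I/O."""
    log.append(f"start {name}")
    await asyncio.sleep(io_delay)  # stands in for a network or disk wait
    log.append(f"done {name}")
    return f"response for {name}"

async def serve_all(requests):
    """Run all requests concurrently in one process and one thread."""
    log = []
    responses = await asyncio.gather(
        *(handle_request(name, delay, log) for name, delay in requests)
    )
    return responses, log

def demo():
    # The "slow" request starts first but finishes last, because the
    # event loop switches to the "fast" request while "slow" is waiting.
    return asyncio.run(serve_all([("slow", 0.05), ("fast", 0.01)]))
```

In the prefork model the same overlap would be achieved by the OS switching between processes instead of by an event loop inside one process.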
In this article, we will discuss browsers, HTML, CSS, and JavaScript.
The browser is the client that receives and processes data from the server described above. Typical browsers include Chrome, Safari, Edge, and Firefox. A browser operates as follows: (1) parse the URL, (2) query a DNS server to obtain an IP address from the host name, (3) generate an HTTP request, (4) send the HTTP request to the obtained IP address, (5) receive an HTTP response from the web server, (6) parse the HTTP response, (7) if necessary, retrieve related content through additional HTTP communication, and (8) display the received content based on the results of the analysis.
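Steps (1) and (3) above can be sketched offline with the standard library. The helper name `plan_request` and the sample URL are hypothetical. Step (2) would typically be a call such as `socket.getaddrinfo(host, port)`, which is left out here because it needs network access, and the Host header is simplified (it omits the port even when a non-default port is used).

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def plan_request(url):
    """Parse a URL (step 1) and generate the HTTP request text (step 3)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # An empty path means the root document.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    request = (f"GET {target} HTTP/1.1\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n"
               "\r\n")
    return host, port, request
```

The returned host and port are what the browser would resolve and connect to before sending the request text in step (4).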
HTML, the input for rendering, stands for HyperText Markup Language. Because HTML handles text, it could originally specify both the abstract structure of text, such as headings and paragraphs, and the appearance of fonts, such as size and color.
CSS (Cascading Style Sheets) was developed to separate these appearance functions from HTML.
JavaScript is a language that runs in the browser. It was implemented as a language to run on Netscape Navigator, a browser used in the early days of the Web, and was given the name JavaScript because Java was attracting a lot of attention at the time. Despite the similarity in name, it is a completely different language from Java.
The Web was built to link documents from around the world through HTML hyperlinks. As their use increased, the Web, which originally provided only static content, evolved to provide dynamic applications, which evolved into software called Web applications.
However, because such software consists of dynamic mechanisms built on top of static mechanisms, it requires various techniques that differ from those of general applications.
Because HTTP, the protocol used on the Web, is stateless, the application must maintain state in some form in order to manage user state. Applications for ordinary PCs and smartphones keep state in memory, so they do not need to make any special effort to maintain it.
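One common way to maintain state on top of a stateless protocol is a session: the server issues an identifier, the client sends it back with each request (typically in a cookie), and the server looks up the stored state. The sketch below is a minimal in-memory illustration; the class, the `session_id` cookie name, and the visit counter are hypothetical.

```python
import secrets

class SessionStore:
    """Keep per-client state on the server, keyed by a random session id."""

    def __init__(self):
        self._sessions = {}

    def create(self):
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = {}
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)

def handle_request(store, cookies):
    """Count visits per client; `cookies` is the dict of cookies sent with the request."""
    session_id = cookies.get("session_id")
    session = store.get(session_id) if session_id else None
    if session is None:
        # Unknown or missing id: start a new session for this client.
        session_id = store.create()
        session = store.get(session_id)
    session["visits"] = session.get("visits", 0) + 1
    # The returned cookies are what the client should send back next time.
    return {"session_id": session_id}, f"visit number {session['visits']}"
```

Each request is still independent at the HTTP level; the continuity comes only from the identifier that the client sends back.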
REST stands for Representational State Transfer and defines an architectural style for distributed hypermedia systems. Applications that support REST do not store any information about the client’s state on the server side. Such applications require the client to maintain state itself and to transfer state between the client and server through a REST-style interface, such as an HTTP API exposed by the application. The client and server can be synchronized by querying the server for the latest state and expressing the same state on the client side.
Web Development Programming Languages
Front-end development in JavaScript
JavaScript is designed to feel familiar. The language’s syntax is similar to Java, and its functions, arrays, dictionaries (or associative arrays), and regular expressions are common to many scripting languages. Thus, JavaScript can be learned quickly by anyone with a little experience writing programs. In addition, the small number of concepts that form the core of the language allows even novice programmers to begin writing programs with little training.
Despite this approachability, becoming truly proficient in JavaScript takes a long time and requires a deep understanding of the language’s semantics, idiosyncrasies, and most effective idioms.
Like most successful technologies, JavaScript has evolved over time. Initially touted as a complement to Java for programming interactive web pages, JavaScript eventually overtook Java as the dominant programming language on the Web. In 1997, a standard (officially called ECMAScript) was established for JavaScript. Nowadays, there are many JavaScript implementations that conform to various versions of ECMAScript.
The following pages of this blog provide an overview of the JavaScript language, specific implementations used for front-end development, and frameworks such as React.
Web Design with CSS
Both HTML and CSS specifications are developed by the World Wide Web Consortium (W3C), a non-profit organization.
The W3C has many companies and organizations from around the world as members, and the several stages leading up to the final recommended specification are made public. Standardization by a non-profit organization has the advantage that the specification can be used globally over the long term, compared with a specification controlled by a single company. It has the disadvantage that specifications take time to finalize, so to keep up with the latest technology, browsers and authors sometimes have to work from the intermediate stages of a specification.
CSS is used in conjunction with HTML, where the emphasis is on clarifying the structure of the document. Here, “document structure” refers to the role of each part of the content, such as headings, body text, quotations, bullets, and tables. By properly defining the structure of the document with HTML elements, the structure can be identified and handled by the program.
In the following pages of this blog, we provide an overview of CSS, concrete implementation examples, and an introduction to Bootstrap, a popular framework.
PHP and Web Frameworks
PHP (PHP: Hypertext Preprocessor) is a server-side script execution environment that runs on major web servers such as Apache and IIS (Internet Information Services). Its easy-to-understand function-based syntax allows even novice programmers to learn it easily, and its user base has expanded rapidly in the years since 2000.
In recent years, the “for beginners” image has faded somewhat, and with the strengthening of the environment, including object-oriented syntax, type declarations, standard libraries, and application frameworks, the groundwork is now in place to support so-called large-scale development. In addition, support for PHP is being strengthened mainly through Microsoft Azure, a cloud environment, and PHP, which previously seemed to be a technology mainly for the Linux environment, can now be used under a multi-platform environment.
PHP is, at best, a “flexible” language, and at worst, a “vague and lazy” language. It is a language in which bugs and security problems are easily introduced, especially when code is written simply by imitating someone else’s.
In the following pages of this blog, we describe PHP, its language overview, and actual web construction using Composer and frameworks such as Laravel.
Clojure and functional programming
Clojure is a relatively new language, created by Rich Hickey and introduced in 2007. Although new, it is a dialect of Lisp, a language introduced in 1958. It is thus both old and new, runs on the JVM, and can use existing Java code.
One of the features of Clojure is that it is a functional language. Functional programming is one of the more recent trends in the history of programming languages: programs are composed of building blocks called functions, in contrast to conventional languages such as Python and JavaScript, in which programs are typically written as procedures.
One of the perspectives in programming language development is to improve reusability. Object-oriented languages, which dominated the market before functional languages, were developed from this perspective, but the idea of structuring programs in blocks of functions further improves reusability.
In the following pages of this blog, we introduce Clojure, give an overview of the language, introduce some general implementation examples, and describe some examples of implementations in web applications and machine learning, as well as its application to AI technology.
Implementations
Laravel is a PHP framework developed by Taylor Otwell. One of the features of Laravel is its low learning cost.
Laravel Sail is an official development environment provided by Laravel, and it is easy to build a Laravel development environment by using a tool called Docker.
In this article, we will describe how to install and configure Docker, and how to download and run Laravel Sail.
Pedestal is an API-first Clojure framework, a data-driven extensible framework that provides a set of libraries for building reliable concurrent services with dynamic properties, implemented using protocols to reduce coupling between components.
Apache Spark is an open source parallel and distributed processing platform. Based on Spark Core, the engine of parallel and distributed processing, Spark consists of a set of application-specific libraries: Spark SQL for SQL processing, Spark Streaming for stream processing, MLlib for machine learning processing, and GraphX for graph processing.
Spark Core can accept HDFS (Hadoop Distributed File System) as well as Hive, HBase, PostgreSQL, MySQL, CSV files, and other inputs as data sources.
Spark provides fast parallel distributed processing of large amounts of data, and after reading data from the data source, it processes the data with minimal storage and network I/O. This makes Spark suitable for cases where the same data needs to be transformed in succession, or where the result set needs to be iterated over multiple times, such as in machine learning. Spark’s features are described below in terms of “machine learning processing,” “business processing,” and “stream processing,” respectively.
In this article, we will discuss how to realize an on-premise system consisting of one web server and one database server on AWS.
Amazon Web Services (AWS) provides virtual systems configured in the cloud. The computers, storage, and network are all virtual, and when you first sign up for AWS, there are no servers, let alone a network.
In order to create a system, it is necessary to start by building a virtual network. While the basic concept of building a virtual system is the same as building a conventional physical system, there are many differences, and it is necessary to understand them fully before building a virtual system or network in AWS. This article therefore provides an overview of the differences between legacy physical infrastructure and the data center environment in AWS.
Amazon VPC (Virtual Private Cloud) is a service that configures a virtual network in the cloud. When deploying resources such as servers, it is first necessary to create a VPC area. To run virtual machines in a VPC area, users need to create subnets in the VPC and configure several networking settings. For users, this is one of the most basic AWS services.
Because a VPC is a virtual network environment, it can be configured remotely through a web browser interface without touching any hardware. However, building an AWS virtual network involves many AWS-specific settings, so in this section we will actually create a VPC area and a subnet within it, paying close attention to these points.
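The relationship between a VPC address range and its subnets can be worked out offline with Python's `ipaddress` module before anything is created in AWS. The CIDR blocks below are hypothetical. The figure of five unusable addresses per subnet reflects AWS reserving the first four addresses and the last address of each subnet.

```python
import ipaddress
from itertools import islice

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one

def plan_subnets(vpc_cidr, new_prefix, count):
    """Carve `count` subnets of size /new_prefix out of the VPC range.

    Returns (subnet CIDR, number of assignable addresses) pairs.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = islice(vpc.subnets(new_prefix=new_prefix), count)
    return [(str(s), s.num_addresses - AWS_RESERVED_PER_SUBNET) for s in subnets]
```

For a hypothetical VPC of `10.0.0.0/16`, `plan_subnets("10.0.0.0/16", 24, 2)` yields two /24 subnets with 251 assignable addresses each.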
When an EC2 instance is placed on a subnet, one or more private IP addresses are automatically assigned. First, we will describe the allocation rules and mechanism.
Based on the IP address allocation rules described above, let’s actually place an EC2 instance on the subnet and see how it works. Here, we will discuss an example of placing an EC2 instance on the subnet “mysubnet01” described above.
AWS provides various EC2 instance types with different performance levels. EC2 instances are charged on a pay-per-use basis per hour of operation, and higher-performance instances are more expensive. For this experiment, a low-specification, inexpensive instance type is sufficient. However, when actually operating a system, it is necessary to select an instance type that balances cost and performance against the system requirements. Instance performance is mainly determined by seven items.
- Introduction to Amazon Web Services Networking (4) Connecting and Checking Instances to the Internet
To connect to an EC2 instance from the Internet, a public IP address assignment and an Internet gateway are required. This in itself is no different from a normal network environment, but public IP addresses in AWS are handled a little differently. Instead of truly assigning a public IP address to an instance, communication is done by converting the private IP address using NAT.
In this article, we will discuss how to assign a public IP and set up an Internet Gateway. We will then describe how the assigned IP address looks from the instance by logging in to the EC2 instance via SSH and checking the network interface settings.
To connect an EC2 instance on a VPC to the Internet, simply assigning a public IP address is not sufficient. An Internet gateway must be provided and the route table must also be changed.
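The role of the route table can be illustrated with a small longest-prefix-match lookup. This is a conceptual sketch, not AWS code: the CIDR blocks and the `igw-example` target name are hypothetical, and the table simply pairs a destination range with a target.

```python
import ipaddress

# Hypothetical route table: traffic inside the VPC stays local,
# everything else goes to the Internet gateway.
ROUTE_TABLE = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "igw-example"),
]

def next_hop(route_table, destination):
    """Return the target of the most specific route that matches the destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for cidr, target in route_table:
        network = ipaddress.ip_network(cidr)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, target)
    return best[1] if best else None
```

Without the `0.0.0.0/0` entry, an address outside the VPC matches no route, which is why a public IP address alone is not enough to reach the Internet.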
The AWS firewall function has two mechanisms: “security groups” and “network ACLs.” The former is set for each EC2 instance, and the latter is set for each subnet.
The reason for having mechanisms with different security levels is that it is necessary to use the two differently. Roughly speaking, network ACLs are used for security on a subnet basis, and security groups are used to control ports that need to be handled individually for each instance.
In this article, we will discuss these two firewall functions and describe how to change security groups, which is necessary when web server software such as Apache HTTP Server or nginx is installed on an EC2 instance.
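The difference in how the two mechanisms evaluate rules can be sketched as follows. This is a simplified model with hypothetical rules: network ACL rules are numbered and checked in ascending order with the first match deciding, whereas security groups contain only allow rules. Statefulness (security groups track connections, network ACLs do not) is not modeled here.

```python
import ipaddress

def acl_allows(rules, port, source_ip):
    """Network ACL: evaluate numbered rules in ascending order; first match decides."""
    addr = ipaddress.ip_address(source_ip)
    for number, rule_port, cidr, action in sorted(rules, key=lambda r: r[0]):
        if rule_port in (port, "*") and addr in ipaddress.ip_network(cidr):
            return action == "allow"
    return False  # nothing matched: implicit deny

def security_group_allows(rules, port, source_ip):
    """Security group: allow rules only; any matching rule admits the traffic."""
    addr = ipaddress.ip_address(source_ip)
    return any(rule_port == port and addr in ipaddress.ip_network(cidr)
               for rule_port, cidr in rules)

# Hypothetical rules: (rule number, port, source CIDR, action)
NACL = [
    (100, 22, "203.0.113.0/24", "deny"),
    (200, "*", "0.0.0.0/0", "allow"),
]
# Hypothetical rules: (port, source CIDR)
SECURITY_GROUP = [
    (80, "0.0.0.0/0"),
    (22, "198.51.100.0/24"),
]
```

Opening port 80 in the security group, as in the second rule list, corresponds to the change needed when a web server is installed on the instance.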
For websites with back-end databases, it is a common practice to place the database server on a subnet that is not directly accessible from the Internet for security reasons. You can also set up an EC2 instance on a private subnet.
To do any work on the EC2 instance, you need to connect to it remotely using SSH or other means. However, an instance that is assigned only a private IP address cannot be reached from the Internet, so it cannot be operated directly from a remote location. There are two ways to deal with this: (1) use a stepping-stone (bastion) server, and (2) configure a VPN. In this article, we describe the concrete implementation of (1) using WordPress and MySQL as an example.
Web servers used for business are usually operated using URLs with their own domains, such as “http://www.example.co.jp/”. In this case, a “public static IP address” and a “DNS server” are required.
In an on-premises environment, it is common practice to use BIND when a DNS server is required, and BIND can likewise be installed on an EC2 instance in AWS. Name resolution can certainly be built this way, but on AWS it is more usual to use Route 53, a managed service (a service whose operation and management are handled by AWS) that provides DNS.
Practical Use
This article describes a tool for preprocessing (data partitioning, normalization, cleansing, etc.), which is important in machine learning. In particular, it has features such as edit distance processing to deal with spelling variants in natural language processing, and the ability to convert various types of data as input and output.
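Edit distance processing of the kind mentioned above can be sketched with the standard Levenshtein dynamic-programming algorithm. This is not the tool's own implementation; the function names and the sample vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def closest(word, vocabulary):
    """Return the vocabulary entry with the smallest edit distance to `word`."""
    return min(vocabulary, key=lambda v: (edit_distance(word, v), v))
```

For example, `closest("Pyhton", ["Python", "Clojure", "Rust"])` maps the misspelling back to `"Python"`, which is the kind of normalization used to absorb spelling variants.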
D3.js and React, which are based on JavaScript, can be used as tools for visualizing relational data such as graph data. In this article, we will discuss specific implementations using D3 and React for 2D and 3D graph displays, and heat maps as a form of displaying relational data.
- Platform for fast processing of streamed data and large amounts of data: Apache Spark Overview
- Introduction to Amazon Web Services Networking (5) Security Groups and Network ACLs
- Introduction to Amazon Web Services Networking (7) Operation of your own domain
Applications
This section describes WoT (Web of Things) technology used in artificial intelligence and IoT technologies. WoT was defined by the W3C, the web standards organization, to solve existing IoT issues.
WoT addresses one of the challenges of the IoT, the lack of compatibility (at present, sensors, platforms, and operating systems often work only with particular systems), by using existing web technologies and protocols that are already widely used (HTML, JavaScript, JSON, etc.) to provide IoT services and applications. This increases the interoperability of devices and adds features such as security and access control at the application level, as well as semantic use of data in combination with Semantic Web technologies. The goal is to enable the creation of a wide variety of services.
Semantic Web Technologies
Semantic Web technology is “a project to improve the convenience of the World Wide Web by developing standards and tools that make it possible to handle the meaning of Web pages,” and it will evolve web technology from the current WWW “web of documents” to a “web of data.”
The data handled there is not the Data of the DIKW (Data, Information, Knowledge, Wisdom) pyramid but Information and Knowledge, expressed in ontologies, RDF, and other frameworks for representing knowledge, and used in various DX and AI tasks.
In the following pages of this blog, I discuss Semantic Web technology, ontology technology, and conference papers, including those from ISWC (International Semantic Web Conference), the world’s leading conference on Semantic Web technology.
Other Technologies
- A Semantic Web Resource Protocol: XPointer and HTTP
- On the Emergent Semantic Web and Overlooked Issues
- Metadata-Driven Personal Knowledge Publishing
- Guidelines for Benchmarking the Performance of Ontology Management APIs
- A Semantic Context-Aware Access Control Framework for Secure Collaborations in Pervasive Computing Environments
- A Framework for Ontology Evolution in Collaborative Environments
- Information Integration Via an End-to-End Distributed Semantic Web System