Rule base, knowledge base, expert system and relational data


This article gives an overview of knowledge-data technology within artificial intelligence, covering the basic knowledge required for product and service development.

Human beings are constantly making choices in their lives, and when we do, we make decisions by comparing things in our minds. In the same way, computers solve problems by repeatedly making comparisons. Each comparison is a conditional branch, and the answer to an input is derived by performing conditional branching. In the sense that the machine is doing something humans recognize as intelligence, this too can be called artificial intelligence. Very early artificial intelligence consisted of programs that performed this kind of conditional branching, and this approach has carried over to the present (Figure 1).

What conditional branching requires is the setting of conditions, that is, rules. A system that performs conditional branching using rules is called a rule-based system, and the branching is often written in IF-THEN form. Considering that the structure of a program or algorithm can be expressed as a flowchart, it is clear that rule-based systems are a natural fit for flowcharts (Figure 2).

When building a rule-based system, the content of the conditional branches is first laid out as a flowchart. The rules must be decided in advance by humans, so a rule-based system naturally cannot deal with unknown problems for which humans do not know the correct answer. When setting the conditions, attention must also be paid to their order and priority.

For example, consider a simple process that determines the airflow of an air conditioner from the temperature: very strong at 33 degrees Celsius or higher, strong at 30 degrees or higher, and weak at 28 degrees or higher. If the temperature is 34 degrees and the airflow is decided by first checking whether the temperature is 28 degrees or higher, the airflow will be set to weak even though it is very hot. To avoid this, the hotter conditions must be judged first.
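
As a minimal sketch of this example in Python (the thresholds are those from the text; the function name and the behavior below 28 degrees are illustrative assumptions), note how the hottest condition is checked first:

```python
def airflow(temperature_c: float) -> str:
    """Decide the air-conditioner airflow from the temperature.

    The hotter conditions are checked first: if the 28-degree rule
    were checked first, 34 degrees would be judged "weak".
    """
    if temperature_c >= 33:
        return "very strong"
    elif temperature_c >= 30:
        return "strong"
    elif temperature_c >= 28:
        return "weak"
    return "off"  # assumption: the text does not say what happens below 28 degrees

print(airflow(34))  # -> very strong
```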

Another example is name-based record matching. This is the process of determining whether one record refers to the same person as another and, if so, outputting the same ID. Recent examples include matching insurance subscribers in medical databases against medical fee statements (receipts) received at hospitals for drug safety measures, and, in pension record management, matching the 50 million employees' pension numbers and national pension numbers that could not be integrated into the basic pension number against individual pension recipients (Figure 3). The rules have to be worked out while handling cases such as fluctuations in name notation and typographical errors in the records. Rules are also needed so that an ID that has already been assigned is not assigned again.
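
A hedged sketch of such rule-based matching, using only Python's standard difflib module; the normalization steps and the 0.9 similarity threshold are illustrative assumptions, not the rules used in the actual pension-record work:

```python
import difflib
import unicodedata

def normalize(name: str) -> str:
    """Absorb notation fluctuations: character width, case, and spaces."""
    name = unicodedata.normalize("NFKC", name)
    return name.replace(" ", "").lower()

def same_person(name_a: str, name_b: str, threshold: float = 0.9) -> bool:
    """Treat two records as the same person when the normalized names
    are similar enough; the 0.9 threshold is an illustrative assumption."""
    ratio = difflib.SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
    return ratio >= threshold

print(same_person("Yamada Taro", "YAMADA  TARO"))  # True: notation fluctuation
print(same_person("Yamada Taro", "Yamda Taro"))    # True: a single-typo record
```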

Clarifying the problem and its solution at the rule-design stage is called problem formulation. There has been much discussion about whether artificial intelligence will eliminate human jobs, but if this problem formulation itself could be done by artificial intelligence, it would make things considerably easier for humans.

If we draw a flowchart based on rules, we can construct a binary tree from that information. This tree structure is called a decision tree and is often used in statistical data processing and analysis. Conversely, if a rule is unknown and you want to find it, you may be able to discover it by processing statistical data; in such cases, decision trees are an important tool to consider.
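
As a hedged illustration of finding an unknown rule from data, the sketch below assumes scikit-learn is available and uses made-up training data; a decision-tree learner recovers thresholds close to the airflow rules above:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up observations: temperatures and the airflow a human operator chose.
temperatures = [[25], [27], [28], [29], [30], [32], [33], [35]]
labels = ["off", "off", "weak", "weak", "strong", "strong",
          "very strong", "very strong"]

tree = DecisionTreeClassifier().fit(temperatures, labels)
print(export_text(tree, feature_names=["temperature"]))
# The printed splits fall near the 28 / 30 / 33 degree thresholds above.
```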

When constructing a rule-based program, if the conditional branches that serve as rules are based on fixed information, there is no problem with hard-coding them (embedding the values directly in the program). As long as the program can be rewritten at no cost whenever the condition settings change, this is also fine. In the past, when external storage devices were very expensive, it was sometimes cheaper to rewrite the program; but when the condition settings change easily, for example when you want to adjust them to your preference, rewriting the program again and again becomes very costly (see the figure below).

Therefore, a method was devised to deal with this problem by separating the main body of the program, which processes data and produces output, from the data that defines the condition settings. The separated set of data is called a knowledge base. When the program needs to make a conditional branch, it uses an ID to extract the rule, reads the setting value, and makes the decision.

The knowledge base may be stored as a text file on a file system, or in a database management system (DBMS) such as SQLite. In some systems, the contents of the knowledge base can be updated with a text editor, a dedicated configuration screen, or a query language (see the figure below).
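
A minimal sketch of this separation, assuming SQLite through Python's standard sqlite3 module; the table layout and rule IDs are illustrative:

```python
import sqlite3

# A tiny knowledge base: the rules are data, separated from the program body.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (id TEXT PRIMARY KEY, threshold REAL, airflow TEXT)")
conn.executemany("INSERT INTO rules VALUES (?, ?, ?)", [
    ("very_hot", 33, "very strong"),
    ("hot", 30, "strong"),
    ("warm", 28, "weak"),
])

def airflow(temperature_c: float) -> str:
    # The program only reads setting values; changing a threshold means
    # updating a row in the knowledge base, not rewriting the program.
    rows = conn.execute("SELECT threshold, airflow FROM rules ORDER BY threshold DESC")
    for threshold, value in rows:
        if temperature_c >= threshold:
            return value
    return "off"

print(airflow(31))  # -> strong
```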

In addition to knowledge bases whose data is read out by programs as settings, there are also systems that store vast amounts of information for humans to explore.

For example, the database system UniProtKB is one of the knowledge bases used in the life sciences. European institutions collaborate to collect protein information and, through annotation and curation (collecting data, examining it, integrating it, and organizing it with annotations and other information), provide the UniProt (The Universal Protein Resource, http://www.uniprot.org/) catalog databases and analysis tools.

UniProtKB is a catalog that organizes, from the gene and amino acid sequences collected and registered in major databases around the world, the amino acid sequences that make up proteins and those proteins' characteristics, and makes this information public. Because it also includes information on species and pathways (data showing interactions with other proteins and compounds), it can be used, for example, to narrow down which proteins in humans and mice are similar and what roles they are expected to play. The rule-based mechanisms described above can be used for this narrowing down.

As they developed from the 1960s onward, the rule-based systems described above came to be used in large systems. In particular, systems that support or replace the work of experts in classification and discrimination are called expert systems. Most current systems that present analysis results, such as production systems, are descendants of expert systems.

For example, one of the earliest expert systems was Dendral, a project begun at Stanford University in 1965, which estimated the chemical structure of a substance from the peak positions (molecular weights) obtained by mass spectrometry. The language used was LISP.

Specifically, since H = 1.01 and O = 16.00, the molecular weight of a water molecule (H2O) is about 18, and mass spectrometry shows a peak near 18 (mass spectrometers using gas chromatography have a resolution of roughly integer values, so great precision is not required).

The system derives the answer by computing the combinations of atoms for a chemical substance with a molecular weight of 18. As the molecular weight increases, the number of possible atom combinations grows, and it takes more time to compute the answer.
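
The following is a hedged sketch of the combinatorial core of that idea, not of Dendral itself (which was written in LISP and used chemical constraints): it enumerates atom counts whose integer masses sum to the target molecular weight.

```python
from itertools import product

# Integer atomic masses, matching the resolution of GC mass spectrometry.
MASSES = {"H": 1, "C": 12, "N": 14, "O": 16}

def candidate_formulas(target: int):
    """Enumerate atom combinations whose total mass equals the target."""
    atoms = list(MASSES)
    ranges = [range(target // MASSES[a] + 1) for a in atoms]
    for counts in product(*ranges):
        if sum(c * MASSES[a] for a, c in zip(atoms, counts)) == target:
            yield {a: c for a, c in zip(atoms, counts) if c}

# For a molecular weight of 18, the candidates include {'H': 2, 'O': 1}, i.e. H2O.
print(list(candidate_formulas(18)))
```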

The Dendral system consists of two parts: Heuristic Dendral, which performs the heuristic analysis, and Meta-Dendral, which registers sets of molecular structures and their mass spectra in a knowledge base and feeds them back to the heuristic analysis. Meta-Dendral can be regarded as a learning system.

MYCIN, a system derived from Dendral and developed in the 1970s, is an expert system that diagnoses patients with infectious blood diseases and proposes the antibiotics to administer along with their dosages. The name "MYCIN" comes from the antibiotic suffix "-mycin".

MYCIN makes judgments based on about 500 rules. Based on the answers to questions (which may be answered in formats other than yes or no), it not only lists candidate bacteria in order of likelihood (confidence) of being the cause, with reasons, but also suggests a course of treatment based on information such as body weight.

MYCIN's diagnostic accuracy of 65% was better than that of doctors who did not specialize in bacterial infections, but below the roughly 80% of specialists (a survey by Stanford University School of Medicine).

The MYCIN project itself performed well and was a successful development project, but the system was never actually used in the field. Since the 2000s, such medical expert systems based on confidence factors have been described as "unusable," and it has been difficult to develop and deploy similar systems. The reasons are ethical and legal issues, such as unclear liability when a wrong computer-aided diagnosis is adopted, together with resistance from doctors. Incidentally, such systems are often required to achieve a diagnostic accuracy of 85-90% or higher with as few false positives and false negatives as possible (a high positive predictive value).

An expert system returns its judgments using an inference engine, a program that makes inferences using rules. Rules that humans handle can be understood when expressed in words, but for a computer to interpret and process them, they must be expressed in a form suitable for computation. The academic field concerned with such representation is called symbolic logic.

The most basic representation used is propositional logic, which deals with statements that take truth values. Propositional logic consists of propositional variables and logical connectives (operators).

The purpose of propositional logic is to express and grasp the relationships between propositions by connecting them with "and," "or," "then," and so on, without asking for the meaning of each proposition itself. Therefore, while propositional logic cannot analyze the meaning of a proposition, logics that extend it, such as predicate logic, can capture more of that meaning.

By extending propositional logic, the logics shown in the figure below, including quantified ones, have been formulated and established as the basis of inference engines.

These inference engines allow us to increase the number of ways we can respond to “questions”.

In propositional logic and predicate logic, sentences are expressed using symbols. In propositional logic, logical formulas are composed of propositional variables, which are atomic formulas, and logical connectives, while predicate logic expands the range of expression by adding further symbols. The symbols are as follows.

For example, when there are two propositions P and Q whose truth values (represented as true/false or 1/0) are determined, the values of ¬P, P∧Q, P∨Q, P⇒Q, and P⇔Q are as shown below, depending on the values of P and Q.

We can say that P ⇒ Q is equivalent to (¬P) ∨ Q, and that P ⇔ Q is equivalent to (P ⇒ Q) ∧ (Q ⇒ P). This table is called a truth table.
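
Since the truth-table figure is not reproduced here, the following sketch prints the table and checks the two equivalences just stated (implication is defined as (¬P) ∨ Q, per the text):

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """P => Q, defined as (not P) or Q."""
    return (not p) or q

print("P Q | ~P P&Q P|Q P=>Q P<=>Q")
for p, q in product([True, False], repeat=2):
    values = [not p, p and q, p or q, implies(p, q),
              implies(p, q) and implies(q, p)]
    print(int(p), int(q), "|", *(int(v) for v in values))

# P <=> Q agrees with (P => Q) and (Q => P) for every assignment.
assert all((p == q) == (implies(p, q) and implies(q, p))
           for p, q in product([True, False], repeat=2))
```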

A logical formula that is always true is called a tautology, while a logical formula that is always false is called a contradiction, and between logical formulas there hold the equivalence relations shown in the figure below.

The inference rules presented as combinations of these formulas can be converted to clause form. By converting to clause form, even complex logical formulas can be grouped together and handled more easily. For propositional logic formulas, this conversion is to conjunctive normal form; for predicate logic formulas, it is to Skolem normal form. In conjunctive normal form, a clause is a logical formula in which literals are joined by disjunction. The figure below shows an example of how this is done.
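
As a hedged illustration (assuming the sympy library is available; the formula itself is made up), conversion to conjunctive normal form can be tried as follows:

```python
from sympy import symbols
from sympy.logic.boolalg import Implies, to_cnf

P, Q, R = symbols("P Q R")

# Convert (P & Q) => R into conjunctive normal form.
formula = Implies(P & Q, R)
print(to_cnf(formula))  # a single clause: R | ~P | ~Q
```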

When transforming to Skolem normal form, Skolem functions are used to remove existential quantifiers. In ∀x1∃x2¬P(x1,x2), x2 can be given as a mapping of x1, so it can be replaced by f(x1); in ∃x3∀x4R(x3,x4), x3 can be replaced by a constant. In the final step of making the variables independent, it is more convenient for x4 and x1 in the clauses C1 and C2, obtained by applying the distributive law, to be independent of each other, so they are renamed to x2 and x3.
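
Written out, the two transformations described above are as follows, where $f$ is a Skolem function, $a$ is a Skolem constant, and the arrow notation is informal:

$$\forall x_1 \exists x_2\, \neg P(x_1, x_2) \;\longrightarrow\; \forall x_1\, \neg P(x_1, f(x_1))$$

$$\exists x_3 \forall x_4\, R(x_3, x_4) \;\longrightarrow\; \forall x_4\, R(a, x_4)$$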

These inference engines and transformations of inference rules make it possible to query the knowledge base more efficiently.

In this way, one aspect of artificial intelligence is the question of how much of what inference engines achieve can be done without human assistance. In the 1970s, however, it was said that even with these inference engines there are limits to how far we can go in creating an artificial intelligence that can handle all kinds of problems.

Another application of expert systems, alongside programs that infer the chemical structure of matter from mass spectra, is the recommendation engine, a system in wide use today.

A recommendation engine is a system used on e-commerce sites to make suggestions to visitors such as "people who viewed this product also bought these products." It can be described as an expert system that displays similar products, using the information the visitor is currently looking at as keywords.

Recommendation engines can be divided into two main types: those that make recommendations based on content, and those that make recommendations using information specific to the site visitor, such as browsing history and purchase history.

A content-based recommendation engine does not use information about the visitor; it simply calculates relevant content from the information at hand (product information for e-commerce sites, articles for news sites).

The information at hand in the knowledge base contains elements that make it up, such as titles and genres, as well as other data representations derived by calculation. These elements and derived representations are collectively called features, and the process of deriving them by calculation is called feature extraction.

For example, suppose Mr. A is reading a news article about the Kumamoto earthquake. At this point, the "problem to be solved by the recommendation engine" is what to suggest to Mr. A as the next article to read (figure below).

Suppose that each article has a set of keywords. From these keywords, features can be created.

When multiple components, such as keywords or words, frequently appear together across data such as multiple articles or sentences, they are said to co-occur, and an expression of that state is called a co-occurrence pattern or co-occurrence expression (figure below).

With the data of these co-occurrence expressions in hand, we can calculate the relevance of the articles (figure below).

For example, if we decide to measure the relevance of articles a and b by the proportion of keywords they have in common (the number of common keywords divided by the total number of keywords of the two), the relevance between articles can be calculated pairwise.
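
A hedged sketch of this calculation: the keyword sets are made up, since the figure's actual keywords are not reproduced, and the metric implemented (common keywords divided by the distinct keywords of the pair) is one reading of the definition above:

```python
def relevance(keywords_a: set, keywords_b: set) -> float:
    """Number of common keywords divided by the distinct keywords of the pair."""
    return len(keywords_a & keywords_b) / len(keywords_a | keywords_b)

# Illustrative keyword sets for articles a to d.
a = {"Kumamoto", "earthquake", "evacuation", "damage"}
b = {"Kumamoto", "tourism", "recovery", "damage"}
c = {"earthquake", "Tokyo", "preparedness", "damage"}
d = {"Kumamoto", "earthquake", "damage", "aftershock"}

for name, keywords in [("b", b), ("c", c), ("d", d)]:
    print(f"relevance(a, {name}) = {relevance(a, keywords):.2f}")
# With these sets, d scores highest: article d > article b = article c.
```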

This process makes it possible to order articles by similarity to article a. In this case, the articles closest to article a are ordered as follows: article d > article b = article c.

In this example, we assumed that keywords were already set on the articles, but features can also be extracted by computational text processing. In addition, simply collecting articles with similar content yields articles with almost identical content, so a separate method is needed to prevent the recommendations from becoming overly similar.

An algorithm called collaborative filtering is used to make recommendations better suited to the visitor by using visitor-specific data such as browsing history and purchase history. A typical example is the engine used by Amazon.

The content-based method described above extracts related content by deriving the co-occurrence of keywords between articles and defining relevance. In contrast, individualized recommendations are made through correlation analysis using the co-occurrence between the visitor's own history and the histories of other visitors. In other words, the recommendation is based on the hypothesis that "if there are people who behave and evaluate things as I do, I will also behave and evaluate things as they do."

Suppose we have data showing whether the target site visitor X and five other visitors A to E purchased certain products after viewing the product pages (figure below).

Where there is no data, a hyphen is shown. The problem here is to "calculate the degree of recommendation of each product," that is, "which product is most suitable to recommend to Mr. X next, given his purchase record."

First, correlation coefficients are calculated over the 0/1 values for the five products (2, 3, 8, 9, and 10) recorded for Mr. X, restricted to the products also recorded for each of the other five visitors. Here we use Pearson's product-moment correlation coefficient, which is commonly used. For example, the correlation coefficient between Mr. X and Mr. A is calculated as shown below.
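
Since the figure's actual values are not reproduced here, the following sketch computes Pearson's product-moment correlation coefficient over illustrative 0/1 purchase vectors (products with missing data are assumed to have been excluded beforehand):

```python
import math

def pearson(xs: list, ys: list) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Illustrative 0/1 records of Mr. X and Mr. A over the five commonly
# recorded products (entries shown as hyphens would be dropped first).
x = [1, 0, 1, 1, 0]
a = [1, 0, 0, 1, 0]
print(round(pearson(x, a), 2))  # -> 0.56 with these illustrative values
```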

The result is shown in the figure below; the correlation coefficients for the other four visitors can be calculated in the same way. We then see that three people (C, D, and E) have a positive correlation of 0.5 or higher with Mr. X, that is, a similar purchase history (see below).

The next step is to select the products to recommend to Mr. X. The five products that Mr. X has not yet seen are 1, 4, 5, 6, and 7.

For these five products, we calculate the average of the values recorded by Mr. C, Mr. D, and Mr. E, and use it as the degree of recommendation. The average is used rather than the total in order to account for missing entries. In this way, from the data of the three people whose purchasing behavior is similar to Mr. X's, we can compute which products Mr. X has not yet seen but is most likely to purchase. In this case, product 5 has the highest recommendation value of 1.00, so product 5 is the next product to recommend to Mr. X.
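
A hedged sketch of this final step; the record values below are illustrative, since the original table is in a figure, and None marks a missing entry:

```python
# Records (1 = purchased, 0 = not) of the three similar visitors C, D, E
# for the five products Mr. X has not yet seen; None marks a missing entry.
records = {
    1: [0, None, 0],
    4: [0, 0, 1],
    5: [1, None, 1],
    6: [0, 1, 0],
    7: [None, 0, 0],
}

def recommendation(values: list) -> float:
    """Average over recorded values only, so missing entries do not
    drag the score down the way a plain total would."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known)

scores = {product: recommendation(vals) for product, vals in records.items()}
print(scores)  # with these values, product 5 scores 1.00
print("recommend:", max(scores, key=scores.get))  # -> recommend: 5
```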

In this example, products were recorded as 0 or 1, but well-known recommendation engines use five-level ratings instead.

 
