Natural Language Processing Technology


Overview of Natural Language Processing Technology

Natural Language Processing (NLP) is a general term for technologies that mechanically process natural language used by humans for applications as diverse as text classification, document summarization, machine translation, sentiment analysis, and question answering.

The technology belongs to the field of artificial intelligence and draws on theories and techniques from fields such as machine learning, statistics, linguistics, and computer science. Basic NLP tasks include word segmentation, morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, named entity recognition, and collocation analysis; these tasks form the preprocessing needed to handle text mechanically and to apply machine learning algorithms.

A number of machine learning techniques are used in NLP, including

  • Word Embedding: Word embedding is a technique for vectorizing words in natural language so that the relationships between words can be quantified from their vectors. Typical algorithms include Word2Vec and GloVe (a minimal sketch follows this list).
  • Part-of-Speech Tagging: Part-of-speech tagging is a technique for assigning parts of speech to the words in a sentence. Typical algorithms include hidden Markov models (HMMs) and conditional random fields (CRFs).
  • Named Entity Recognition: Named Entity Recognition is a technique for recognizing proper nouns such as names of people, places, and organizations in text. Typical algorithms include Conditional Random Fields (CRF) and Recurrent Neural Networks (RNN).
  • Parsing: Parsing is a technique for breaking sentences into meaningful units and analyzing how these units are connected. Typical algorithms include methods for generating parse trees and analyzing dependency relations.
  • Classification: Classification is a technique for classifying sentences into predefined categories. Typical algorithms include Support Vector Machines (SVM) and Naive Bayes.
  • Regression: Regression is a technique for predicting numerical values based on textual input. Typical algorithms include linear regression and logistic regression.
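
As a minimal illustration of the word-embedding item above, the following sketch trains Word2Vec with the gensim library on a toy corpus; the corpus and hyperparameter values are arbitrary choices for illustration, not recommendations.

```python
# A minimal word-embedding sketch using gensim's Word2Vec.
# The toy corpus and hyperparameters are purely illustrative.
from gensim.models import Word2Vec

corpus = [
    ["natural", "language", "processing", "is", "fun"],
    ["word", "embeddings", "map", "words", "to", "vectors"],
    ["vectors", "capture", "relationships", "between", "words"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # dimensionality of the word vectors
    window=2,         # context window size
    min_count=1,      # keep every word in this tiny corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
)

print(model.wv["words"][:5])                   # first few components of one word vector
print(model.wv.most_similar("words", topn=3))  # nearest neighbours in the vector space
```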

Recently, methods based on deep learning have become mainstream and are widely used, especially in the fields of natural language generation, natural language understanding, and dialogue systems, where significant progress has been made. For example, in the field of neural machine translation, translation accuracy has improved dramatically and now provides natural translation results.

NLP has a wide range of applications and is used in all fields, including business, medicine, entertainment, and education. Some examples are listed below.

  • Machine Translation: One of the most well-known areas of NLP is machine translation. Machine translation technology facilitates communication between different languages, with Google Translate and DeepL being prime examples.
  • Document Classification: NLP is also commonly used in the field of document classification. This includes, for example, spam filtering, automatic classification of news articles, and classification of product reviews.
  • Sentiment Analysis: NLP can be used to analyze sentiment in text. For example, the sentiment of a product review can be analyzed, which enables companies to understand customer opinions and improve their products.
  • Question answering: NLP is also being used in the area of question answering. This includes, for example, voice assistants such as Siri and Alexa, and automated FAQ response systems.
  • Dialogue systems: NLP is also being used in the field of dialogue systems to enable natural interaction with humans. These include automated customer response systems and chatbots.
  • Natural Language Generation: NLP is also used in the field of natural language generation. Examples include automatic summarization, sentence generation, and sentence proofreading.

This section describes various theories, applications, and implementations of natural language processing techniques.

About Natural Language Processing Technology

From the preface of Iwanami Data Science Series Vol. 2 “Statistical Natural Language Processing – Machines for Handling Words”.

Language is a tool used for communication between people. It is easy for humans to acquire language; it requires no special talent or long, steady training. Yet it is nearly impossible for anything other than a human to command language, which makes language a very mysterious thing.

Natural language processing is the study of using computers to handle such language. It began in the 1940s, around the same time the first computers were born, and using computers to handle language has been a dream ever since.

In the beginning, natural language processing was realized by hand-writing rules that spelled out how each word and expression should be handled. However, language is extremely diverse, constantly changing, and can be interpreted differently depending on the person and the context. It is not practical to write all of this down as rules, and since the late 1990s statistical inference based on data, i.e., actual natural language data, has replaced rule-based processing as the mainstream. Statistical natural language processing is, to put it crudely, solving a problem by building a model of how words are actually used.

Natural language processing has made great strides under the statistical approach, and especially in the last 5-10 years remarkable progress has been made on problems that were previously unsolvable. The methods used in natural language processing have one thing in common: they not only solve problems, but also infer what lies "on the other side of the invisible words."

In this blog, I first discuss (2) what natural language is from the viewpoints of philosophy, linguistics, and mathematics, then (3) natural language processing technology in general, and, most importantly, (4) word similarity. I then discuss (5) various tools for handling natural language on computers and (6) their concrete programming and implementation, sharing information that can be used for real-world tasks.

As described above, natural language processing is a fundamental technology for digital transformation, which enables computers to handle real-world information, and for building various artificial intelligence applications, which can be found in other sections.

In addition, the recent development of statistical natural language processing is inseparable from the development of machine learning technology, and this blog is structured in such a way that detailed information about them can be found in separate sections.

The following is a summary of the natural language processing-related items in this blog.

Implementation

Natural Language Processing (NLP) is a generic term for technologies for processing human natural language on computers, with the goal of developing methods and algorithms for understanding, interpreting, and generating textual data.

This section describes the various algorithms used for natural language processing, the libraries and platforms that implement them, and specific examples of their implementation in various applications (document classification, named entity recognition, summarization, language modeling, sentiment analysis, and question answering).

Natural language processing (NLP) preprocessing is the process of preparing text data into a form suitable for machine learning models and analysis algorithms. Since machine learning models and analysis algorithms cannot ensure high performance for all data, the selection of appropriate preprocessing is an important requirement for the success of NLP tasks. Typical NLP preprocessing methods are described below. These methods are generally performed on a trial-and-error basis based on the characteristics of the data and task.
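
As a minimal sketch of such preprocessing, the following uses NLTK for lowercasing, tokenization, stopword removal, and lemmatization. It assumes the relevant NLTK resources (punkt, stopwords, wordnet) have been downloaded, and the particular sequence of steps is only one illustrative choice.

```python
# A minimal NLP preprocessing sketch with NLTK.
# Requires: nltk.download("punkt"), nltk.download("stopwords"), nltk.download("wordnet").
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

def preprocess(text):
    tokens = nltk.word_tokenize(text.lower())         # normalize case, split into tokens
    tokens = [t for t in tokens if t.isalpha()]       # drop punctuation and numbers
    stop = set(stopwords.words("english"))
    tokens = [t for t in tokens if t not in stop]     # remove very frequent function words
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]  # reduce words to a base form

print(preprocess("The cats were sitting on the mats, watching the birds."))
# e.g. ['cat', 'sitting', 'mat', 'watching', 'bird']
```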

  • Metaverse control by natural language processing and generative AI

Metaverse manipulation by natural language is a technology that allows users to intuitively control objects, the environment, and avatar movements in the metaverse using natural language.

Various models for emotion recognition have been proposed, as described in “Emotion recognition, Buddhist philosophy and AI”. In addition, a number of AI technologies such as speech recognition, image recognition, natural language processing and bioinformation analysis have been used to extract emotions. This section describes the details of these technologies.

  • NLP Processing of Long Sentences by Sentence Segmentation

Sentence segmentation is an important step in the NLP (natural language processing) processing of long sentences. By segmenting long sentences into sentences, the text can be easily understood and analyzed, making it applicable to a variety of tasks. Below is an overview of sentence segmentation in NLP processing of long sentences.
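
The following is a minimal sentence-segmentation sketch using NLTK's Punkt tokenizer; the sample text is made up for illustration and the punkt resource must be downloaded beforehand.

```python
# A minimal sentence-segmentation sketch (requires nltk.download("punkt")).
# Splitting long text into sentences first makes per-sentence processing easier.
from nltk.tokenize import sent_tokenize

long_text = (
    "Natural language processing handles long documents. "
    "Splitting them into sentences simplifies later analysis. "
    "Each sentence can then be tokenized, parsed, or classified separately."
)

for i, sentence in enumerate(sent_tokenize(long_text), start=1):
    print(i, sentence)
```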

Self-Supervised Learning (SSL) is a field of machine learning, an approach to learning from unlabeled data, and SSL is widely used for training language models and learning representations. The following is an overview of the self-supervised learning approach to language processing.

Word Sense Disambiguation (WSD) is one of the key challenges in the field of Natural Language Processing (NLP). The goal of this technique is to accurately identify the meaning of a word in a sentence when it is used in multiple senses. In other words, when the same word has different meanings in different contexts, WSD tries to identify the correct meaning of the word, which is an important preprocessing step in various NLP tasks such as machine translation, information retrieval, and question answering systems. If the system can understand exactly which meaning is being used for a word in a sentence, it is more likely to produce more relevant and meaningful results.

The main approaches to using artificial intelligence techniques to extract emotions include (1) natural language processing, (2) speech recognition, (3) image recognition, and (4) biometric analysis. These methods are combined with algorithms such as machine learning and deep learning, and are basically detected using large amounts of training data. Approaches that combine different modalities (text, voice, images, biometric information, etc.) to comprehensively understand emotions are also more accurate methods.

Methods for extracting emotion from textual data specifically include dividing sentences into tokens, using machine learning algorithms to understand word meaning and context, and training models on an emotion analysis dataset to predict the emotional content of unknown text.

Sentiment Lexicons (Sentiment Polarity Lexicons) are used to indicate how positive or negative a word or phrase is. There are several statistical methods to analyze sentiment using this dictionary, including (1) simple count-based methods, (2) weighted methods, (3) combined TF-IDF methods, and (4) machine learning approaches.
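
A minimal sketch of the simple count-based method (approach (1) above) follows. The tiny polarity lexicon used here is hypothetical; real lexicons such as SentiWordNet or VADER are far larger and often weighted.

```python
# A toy count-based sentiment scorer using a hypothetical polarity lexicon.
toy_lexicon = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}

def lexicon_score(tokens):
    # Sum the polarity of every token found in the lexicon.
    return sum(toy_lexicon.get(t.lower(), 0) for t in tokens)

review = "I love this phone , the battery is great but the camera is bad".split()
score = lexicon_score(review)
print("positive" if score > 0 else "negative" if score < 0 else "neutral", score)
```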

The evaluation of text using natural language processing (NLP) is the process of quantitatively or qualitatively evaluating the quality and characteristics of textual data, a method that is relevant to a variety of NLP tasks and applications. This section describes various methods for evaluating documents.

Lexical learning using natural language processing (NLP) is the process by which a program understands the vocabulary of a language and learns the meaning and context of words. Lexical learning is at the core of NLP tasks: it extracts the meaning of words and phrases from text data and is an important step in enabling a model to understand natural language more effectively. This section provides an overview of lexical learning, various algorithms, and implementation examples.

Dealing with polysemous words (homonyms) in machine learning is one of the key challenges in tasks such as natural language processing (NLP) and information retrieval. Polysemy refers to cases where the same word has different meanings in different contexts, and various approaches exist to solve the problem of polysemy.

Multilingual NLP in machine learning is the field of developing natural language processing (NLP) models and applications for multiple languages. It is a key challenge in machine learning and natural language processing and a component of serving different cultural and linguistic communities.

Language detection algorithms are methods for automatically determining which language a given text is written in. Language detection is used in a variety of applications, including multilingual processing, natural language processing, web content classification, and machine translation preprocessing. This section describes common language detection algorithms and methods.

Translation models in machine learning are widely used in the field of natural language processing (NLP) and are designed to automate text translation from one language to another. These models use statistical methods and deep learning architectures to understand sentence structure and meaning and to perform translation.

GNMT (Google Neural Machine Translation) is a neural machine translation system developed by Google that uses neural networks to provide natural translation between multiple languages.

OpenNMT (Open-Source Neural Machine Translation) is an open source platform for neural machine translation that supports translation model building, training, evaluation and deployment.

Multilingual Embeddings is a technique for embedding text data in different languages into a vector space. This embedding represents the language information in the text data as a numerical vector and allows text in different languages to be placed in the same vector space, making multilingual embeddings a useful approach for natural language processing (NLP) tasks such as multilingual processing, translation, class classification, and sentiment analysis.

The Lesk algorithm is a method for determining the meaning of words in the field of natural language processing, and in particular, it is an approach used for Word Sense Disambiguation (WSD). Word sense disambiguation is the problem of selecting the correct meaning of a word when it has multiple different senses, depending on the context.
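
The sketch below uses NLTK's built-in implementation of the Lesk algorithm on the classic "bank" example; it assumes the wordnet and punkt resources are downloaded, and, because Lesk is a simple gloss-overlap heuristic, the chosen sense may not always match intuition.

```python
# A minimal Lesk sketch with NLTK (requires nltk.download("wordnet"), nltk.download("punkt")).
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sentence = "I went to the bank to deposit my money"
context = word_tokenize(sentence)

# Pick the WordNet synset whose gloss overlaps the context the most.
sense = lesk(context, "bank")
print(sense, "-", sense.definition() if sense else "no sense found")
```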

The Aho-Hopcroft-Ullman algorithm is known as an efficient algorithm for string processing problems such as string search and pattern matching. It combines basic data structures in string processing, the trie and the finite automaton, to search efficiently for patterns in strings; it is mainly used for string matching, but it also has applications in a wide range of fields, including compilers and text search engines.

Subword-level tokenization is a natural language processing (NLP) approach that divides text data into subwords (parts of words) that are smaller than words. This is used to facilitate understanding of the meaning of sentences and to alleviate lexical constraints. There are several approaches to subword-level tokenization.

Byte Pair Encoding (BPE) is a text encoding method used to compress and tokenize text data. BPE is widely used in Natural Language Processing (NLP) tasks in particular and is known as an effective tokenization method.
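
The following toy sketch shows the core BPE merge loop in the style of the original Sennrich et al. reference implementation: words are sequences of symbols with an end-of-word marker, and the most frequent adjacent pair is merged repeatedly. The vocabulary and number of merges are illustrative only.

```python
# A toy sketch of Byte Pair Encoding merges.
import collections
import re

def get_stats(vocab):
    # Count how often each adjacent symbol pair occurs, weighted by word frequency.
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_vocab(pair, vocab):
    # Replace every occurrence of the chosen pair with a single merged symbol.
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for step in range(5):
    best = get_stats(vocab).most_common(1)[0][0]
    vocab = merge_vocab(best, vocab)
    print("merge", step + 1, ":", best)
```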

SentencePiece is an open source library and toolkit for tokenizing text data that is widely used in natural language processing (NLP) tasks.

InferSent is a method for learning semantic representations of sentences in natural language processing (NLP) tasks. The following is a summary of the main features of InferSent.

Skip-thought vectors are neural network models that generate semantic representations of sentences and are designed to learn context-aware sentence embeddings; they were proposed by Kiros et al. in 2015. The model aims to embed a sentence into a continuous vector space, taking into account the context before and after the sentence. The main concepts and structure of skip-thought vectors are described below.

The Unigram Language Model Tokenizer (UnigramLM Tokenizer) is a tokenization algorithm used in natural language processing (NLP) tasks. Unlike conventional algorithms that tokenize words, the Unigram Language Model Tokenizer focuses on tokenizing partial words (subwords).

WordPiece is one of the tokenization algorithms used in natural language processing (NLP) tasks, especially in models such as BERT (Bidirectional Encoder Representations from Transformers), which is described in "Overview of BERT and Examples of Algorithms and Implementations."

GloVe (Global Vectors for Word Representation) is a type of algorithm for learning word embeddings. GloVe is designed to capture the meaning of words and is particularly good at capturing semantic relationships between them. This section provides an overview, the algorithm, and an example implementation of GloVe.
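
As a minimal usage sketch, pretrained GloVe vectors can be loaded through gensim's downloader API; "glove-wiki-gigaword-100" is assumed here to be one of the pretrained sets distributed via gensim-data, and the download is large on first use.

```python
# A minimal sketch of querying pretrained GloVe vectors via gensim's downloader.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")   # 100-dimensional GloVe vectors (downloads on first use)

print(glove.most_similar("king", topn=3))     # semantically related words
print(glove.similarity("cat", "dog"))         # cosine similarity of two word vectors
```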

FastText is an open source library for natural language processing (NLP) developed by Facebook that can be used to learn word embeddings and perform NLP tasks such as text classification. Here we describe the FastText algorithm and an example implementation.

ELMo (Embeddings from Language Models) is one of the methods of word embeddings (Word Embeddings) used in the field of natural language processing (NLP), which was proposed in 2018 and has been very successful in subsequent NLP tasks. In this section, we provide an overview of this ELMo, its algorithm and examples of its implementation.

The Seq2Seq (Sequence-to-Sequence) model is a deep learning model that takes sequence data as input and outputs sequence data; in particular, it can handle input and output sequences of different lengths, and it is widely used in a variety of natural language processing tasks such as machine translation and dialogue systems.

BERT (Bidirectional Encoder Representations from Transformers) was presented by Google researchers in 2018; it is a deep neural network model pre-trained on a large text corpus and is one of the most successful pre-training models in the field of natural language processing (NLP). The main features and an overview of BERT are described below.

GPT (Generative Pre-trained Transformer) is a pre-trained model for natural language processing developed by OpenAI, based on the Transformer architecture and trained by unsupervised learning on large data sets.

ULMFiT (Universal Language Model Fine-tuning) is an approach proposed by Jeremy Howard and Sebastian Ruder in 2018 for effectively fine-tuning pre-trained language models in natural language processing (NLP) tasks. It aims to achieve high performance on a variety of NLP tasks by combining transfer learning with fine-tuning at each stage of training.

The Transformer was proposed by Vaswani et al. in 2017 and is one of the neural network architectures that have led to revolutionary advances in machine learning and natural language processing (NLP). This section provides an overview of the Transformer model, its algorithm, and its implementation.
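
At the core of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The following NumPy sketch computes it for random toy matrices; the shapes are arbitrary and the full model of course adds multi-head projections, positional encodings, and feed-forward layers on top.

```python
# A minimal NumPy sketch of scaled dot-product attention.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted sum of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, dimension 8
K = rng.normal(size=(5, 8))   # 5 key positions
V = rng.normal(size=(5, 8))   # 5 value vectors
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```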

Transformer-XL is an extended version of the Transformer, a deep learning model that has proven successful in tasks such as natural language processing (NLP). Transformer-XL is designed to model long-term dependencies in context more effectively and can process longer text sequences than previous Transformer models.

The Transformer-based causal language model is a type of model that has been very successful in natural language processing (NLP) tasks and is based on the Transformer architecture described in "Overview of the Transformer Model and Examples of Algorithms and Implementations." The following is an overview of the Transformer-based causal language model.

Relative Positional Encoding (RPE) is a method for neural network models that use the transformer architecture to incorporate relative positional information of words and tokens into the model. Although transformers have been very successful in many tasks such as natural language processing and image recognition, they are not good at directly modeling the relative positional relationships between tokens. Therefore, RPE is used to provide relative location information to the model.

User-customized learning aids utilizing natural language processing (NLP) are being offered in a variety of areas, including the education field and online learning platforms. This section describes the various algorithms used and their specific implementations.

Automatic summarization technology is widely used in information retrieval, information processing, natural language processing, machine learning, and other fields to compress large text documents and sentences into a short, to-the-point form that is easy to understand. This section provides an overview of this automatic summarization technology, various algorithms and implementation examples.
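
As a naive baseline for the extractive flavor of summarization, the sketch below scores each sentence by the sum of its TF-IDF weights and keeps the top-scoring sentences in their original order; it assumes NLTK's punkt resource and scikit-learn are available, and it is only an illustrative toy, not a production summarizer.

```python
# A minimal extractive summarization sketch: rank sentences by total TF-IDF weight.
from nltk.tokenize import sent_tokenize                      # requires nltk.download("punkt")
from sklearn.feature_extraction.text import TfidfVectorizer

def extractive_summary(text, n_sentences=2):
    sentences = sent_tokenize(text)
    tfidf = TfidfVectorizer().fit_transform(sentences)
    scores = tfidf.sum(axis=1).A1                            # one importance score per sentence
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_sentences])
    return " ".join(sentences[i] for i in top)               # keep original sentence order

document = ("Automatic summarization compresses long documents. "
            "Extractive methods select existing sentences. "
            "Abstractive methods generate new sentences. "
            "Both are widely used in information retrieval.")
print(extractive_summary(document))
```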

  • Abstraction-based approaches in summarisation and AI-based communication support

‘Overview of automatic summarisation technology, algorithms and examples of implementation’ describes AI-based summarisation technology. Automatic summarisation technology is widely used in information retrieval, information processing, natural language processing, machine learning and other fields to compress large text documents and texts into short and to the point forms, and to facilitate the understanding of summarised information. It can be broadly divided into two types: extractive summarisation and abstractive summarisation. Here, we would like to consider a qualitative approach to abstractive summarisation based on the ‘one-word summarisation technique’.

Monitoring and supporting online discussions using natural language processing (NLP) is an approach used in online communities, forums, and social media platforms to improve the user experience, facilitate appropriate communication, and detect problems early. This section describes various algorithms and implementations for monitoring and supporting online discussions with NLP.

Search Algorithm (Search Algorithm) refers to a family of computational methods used to find a target within a problem space. These algorithms have a wide range of applications in a variety of domains, including information retrieval, combinatorial optimization, game play, route planning, and more. This section describes various algorithms, their applications, and specific implementations with respect to these search algorithms.

Multi-Objective Search Algorithm (Multi-Objective Optimization Algorithm) is an algorithm for optimizing multiple objective functions simultaneously. Multi-objective optimization aims to find a balanced solution (Pareto optimal solution set) among multiple optimal solutions rather than a single optimal solution, and such problems have been applied to many complex systems and decision-making problems in the real world. This section provides an overview of this multi-objective search algorithm and examples of algorithms and implementations.

Automatic machine learning (AutoML) refers to methods and tools for automating the process of designing, training, and optimizing machine learning models. AutoML is particularly useful for users with limited machine learning expertise or those seeking to develop models efficiently, with the following main goals. This section provides an overview of AutoML and examples of various implementations.

Similarity is a concept that describes the degree to which two or more objects or things have common features or properties and are considered similar to each other, and plays an important role in evaluating, classifying, and grouping objects in terms of comparison and relatedness. This section describes the concept of similarity and general calculation methods for various cases.

The problem of having only a small amount of training data (small data) appears in various tasks as a factor that reduces the accuracy of machine learning. Machine learning with small data can be approached in various ways that take into account data limitations and the risk of overfitting. This section discusses the details of each approach and implementation examples.

Transfer learning, a type of machine learning, is a technique for applying a model or knowledge learned in one task to a different task. Transfer learning is usually useful when a new task requires little data or high performance. This section provides an overview of transfer learning and various algorithms and implementation examples.

Self-Supervised Learning is a type of machine learning and can be considered as a type of supervised learning. While supervised learning uses labeled data to train models, self-supervised learning uses the data itself instead of labels to train models. This section describes various algorithms, applications, and implementations of self-supervised learning.

Support Vector Machine (SVM) is a supervised learning algorithm widely used in pattern recognition and machine learning. Its basic idea is to find the best separating hyperplane between the classes in the feature vector space, chosen so that its margin to the data points is maximal. The margin is defined as the distance between the separating hyperplane and the nearest data points (the support vectors), and in SVM the optimal separating hyperplane is found by solving the margin maximization problem.

This section describes various practical examples of the support vector machine and their implementation in Python.
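
A minimal scikit-learn sketch of the idea follows, training a linear-kernel SVM on the iris dataset; the dataset, kernel, and C value are chosen purely for illustration.

```python
# A minimal scikit-learn SVM sketch on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="linear", C=1.0)   # C trades off margin width against classification errors
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```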

LightGBM is a Gradient Boosting Machine (GBM) framework developed by Microsoft, a machine learning tool designed to build fast and accurate models for large data sets. Here we describe its implementation in Python, R, and Clojure.

This section provides an overview of the Python library Keras and examples of its application to basic deep learning tasks (handwritten digit recognition using MNIST, autoencoders, CNNs, RNNs, LSTMs).

An RNN (Recurrent Neural Network) is a type of neural network for modeling time-series and sequence data; it can retain past information and combine it with new information, and it is widely used for tasks such as speech recognition, natural language processing, video analysis, and time-series prediction.

LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN), which is a very effective deep learning model mainly for time series data and natural language processing (NLP) tasks. LSTM can retain historical information and model long-term dependencies, making it a suitable method for learning long-term information as well as short-term information.

Bidirectional LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) that is widely used for modeling sequence data such as time series data and natural language processing. Bidirectional LSTM is characterized by its ability to simultaneously learn sequence data from the past to the future direction and to capture the context of the sequence data more richly.

The GRU (Gated Recurrent Unit) is a type of recurrent neural network (RNN) that is widely used in deep learning models, especially for processing time-series and sequence data. The GRU is designed to model long-term dependencies in the same way as the LSTM (Long Short-Term Memory) described in "Overview of LSTM and Examples of Algorithms and Implementations," but it is characterized by a lower computational cost than the LSTM.

The Bidirectional Recurrent Neural Network (BRNN) is a type of recurrent neural network (RNN) model that can consider past and future information simultaneously. BRNNs are particularly useful for processing sequence data and are widely used in tasks such as natural language processing and speech recognition.

A Deep RNN (Deep Recurrent Neural Network) is a type of recurrent neural network (RNN) built by stacking multiple RNN layers. Deep RNNs help model complex relationships in sequence data and extract more sophisticated feature representations. Typically, a deep RNN consists of several RNN layers stacked on top of one another, each unrolled in the temporal direction.

A Stacked RNN (Stacked Recurrent Neural Network) is a type of recurrent neural network (RNN) architecture that stacks multiple RNN layers on top of each other, enabling the modeling of more complex sequence data and the effective capture of long-term dependencies.

The Echo State Network (ESN) is a type of reservoir computing, a family of recurrent neural networks (RNNs) used for the prediction, analysis, and pattern recognition of time-series and sequence data, and it can perform well on a variety of such tasks.

The Pointer-Generator network is a type of deep learning model used in natural language processing (NLP) tasks, and is particularly suited for tasks such as abstract sentence generation, summarization, and information extraction from documents. The network is characterized by its ability to copy portions of text from the original document verbatim when generating sentences.

  • Overview of BERT and Examples of Algorithms and Implementations

BERT (Bidirectional Encoder Representations from Transformers) was presented by Google researchers in 2018; it is a deep neural network model pre-trained on a large text corpus and is one of the most successful pre-training models in the field of natural language processing (NLP). This section provides an overview of BERT, its algorithms, and examples of implementations.

Sparse modeling is a technique that takes advantage of sparsity in the representation of signals and data. Sparsity refers to the property that non-zero elements in data or signals are limited to a very small portion. The purpose of sparse modeling is to efficiently represent data by utilizing sparsity, and to perform tasks such as noise removal, feature selection, and compression.

This section provides an overview of sparse modeling algorithms such as Lasso, compressed sensing, Ridge regularization, elastic nets, Fused Lasso, group regularization, message passing algorithms, and dictionary learning, and describes their implementation in various applications such as image processing, natural language processing, recommendation, machine learning, signal processing, and brain science.

Overlapping group regularization (Overlapping Group Lasso) is a type of regularization method used in machine learning and statistical modeling for feature selection and estimation of model coefficients. In this case, the feature is allowed to belong to more than one group at the same time. This section provides an overview of this overlapping group regularization and various implementations.

A topic model is a statistical model for automatically extracting topics (themes or categories) from large amounts of text data. Examples of text data here include news articles, blog posts, tweets, and customer reviews. The topic model is a principle that analyzes the pattern of word occurrences in the data to estimate the existence of topics and the relevance of each word to the topic.

This section provides an overview of this topic model and various implementations (topic extraction from documents, social media analysis, recommendations, topic extraction from image information, and topic extraction from music information), mainly using the python library.

Submodular optimization is a type of combinatorial optimization that solves the problem of maximizing or minimizing a submodular function, a function with specific properties. This section describes various algorithms, their applications, and their implementations for submodular optimization.

A knowledge graph is a graph structure that represents information as a set of related nodes (vertices) and edges (connections), and is a data structure used to connect information on different subjects or domains and visualize their relationships. This section outlines various methods for the automatic generation of knowledge graphs and describes specific implementations in Python.

A knowledge graph is a graph structure that represents information as a set of related nodes (vertices) and edges (connections), and is a data structure used to connect information on different subjects or domains and visualize their relationships. This section describes various applications of the knowledge graph and concrete examples of its implementation in python.

Causal Forest is a machine learning model for estimating causal effects from observed data, based on Random Forest and extended based on conditions necessary for causal inference. This section provides an overview of the Causal Forest, application examples, and implementations in R and Python.

There are open source tools such as text-generation-webui and AUTOMATIC1111 that allow codeless use of generation modules such as ChatGPT and Stable Diffusion. In this article, we describe how to use these modules for text generation and image generation.

Online prediction is a technique that uses models to make predictions in real time as data arrive sequentially. Online learning, as described in "Overview of Online Learning, Various Algorithms, Application Examples, and Specific Implementations," is characterized by models being learned sequentially without the immediacy of model application being clearly defined, whereas online prediction is characterized by predictions being made, and their results used, as soon as new data arrive.

This section discusses various applications and specific implementation examples of online prediction.

Structural Learning is a branch of machine learning that refers to methods for learning structures and relationships in data, usually in the framework of unsupervised or semi-supervised learning. Structural learning aims to identify and model patterns, relationships, or structures present in the data to reveal the hidden structure behind the data. Structural learning targets different types of data structures, such as graph structures, tree structures, and network structures.

This section discusses various applications and concrete implementations of structural learning.

Multimodal search integrates multiple different information sources and data modalities (e.g., text, images, audio, etc.) to enable users to search for and retrieve information. This approach effectively combines information from multiple sources to provide more multifaceted and richer search results. This section provides an overview and implementation of this multimodal search, one using Elasticsearch and the other using machine learning techniques.

Elasticsearch is an open source distributed search engine for search, analysis, and data visualization that also integrates machine learning (ML) technology and can be leveraged as a platform for data-driven insights and predictions. This section describes various uses and specific implementations of machine learning technology in Elasticsearch.

Elasticsearch is an open source distributed search engine that provides many features to enable fast text search and data analysis. Various plug-ins are also available to extend the functionality of Elasticsearch. This section describes these plug-ins and their specific implementations.

As an example of binary classification (two-class classification), the task of dividing a movie review into positive and negative reviews based on the content of the movie review text is described.

The data are collected from the IMDb (Internet Movie Database) dataset (preprocessed and included in Keras): 50,000 reviews labeled "positive" or "negative," split 50/50 between the two classes, of which 25,000 reviews (50%) are used as training data.

The actual computation using Dense layers and the sigmoid function in Keras is described.
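
A minimal Keras sketch of this setup is shown below: the reviews are multi-hot encoded into fixed-size vectors and fed to a small stack of Dense layers with a sigmoid output. The layer sizes, epochs, and batch size are illustrative choices, not the definitive configuration.

```python
# A minimal Keras sketch of IMDB binary classification with Dense layers and a sigmoid output.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(train_data, train_labels), _ = keras.datasets.imdb.load_data(num_words=10000)

def multi_hot(sequences, dimension=10000):
    # Turn each review (a list of word indices) into a 10,000-dimensional 0/1 vector.
    results = np.zeros((len(sequences), dimension))
    for i, seq in enumerate(sequences):
        results[i, seq] = 1.0
    return results

x_train = multi_hot(train_data)
y_train = np.asarray(train_labels, dtype="float32")

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # probability that the review is positive
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=4, batch_size=512, validation_split=0.2)
```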

We will build a network that classifies the Reuters newswire data (packaged as part of Keras) into mutually exclusive topics (classes). Due to the large number of classes, this problem is an example of multiclass classification. Since each data point can be classified into only one category (topic), it is specifically a single-label multiclass classification problem. If each data point could be classified into multiple categories (topics), we would be dealing with a multilabel multiclass classification problem.

We implement and evaluate this problem using Keras, mainly with Dense layers and the ReLU activation function.
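
Compared with the binary IMDB sketch above, the essential changes for this multiclass case are a 46-way softmax output (one unit per Reuters topic) and a categorical cross-entropy loss; the sketch below only shows the model definition under the assumption that the inputs are encoded the same way as before.

```python
# A minimal sketch of the multiclass counterpart for the Reuters newswire data.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(46, activation="softmax"),            # one probability per Reuters topic
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",  # integer topic labels
              metrics=["accuracy"])
```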

The two basic deep learning algorithms for processing sequences of natural language (text) are recurrent neural networks (RNNs) and one-dimensional convolutional neural networks (CNNs).

These models can map the statistical structure of words and sentences at a level sufficient to solve many simple text-processing tasks. Deep learning for natural language processing (NLP) is pattern recognition applied to words, sentences, and paragraphs, in the same way that computer vision is pattern recognition applied to pixels.

Text vectorization can be done in several ways: (1) divide the text into words and convert each word into a vector, (2) divide the text into characters and convert each character into a vector, or (3) extract n-grams of words or characters and convert each n-gram into a vector.

The vectors can take the form of one-hot encodings or word embeddings. There are various pre-trained word embedding resources available (Word2Vec, Global Vectors for Word Representation (GloVe), the IMDb dataset).

A feature common to fully connected networks and convolutional neural networks is that they have no memory. Each input passed to these networks is processed separately, and no state is maintained across inputs. When processing sequences or time-series data with such networks, the entire sequence must be provided to the network at once so that it can be treated as a single data point. Such networks are called feedforward networks.

In contrast, when people read a text, they follow the words with their eyes and memorize what they see. This allows the meaning of the sentence to be grasped in a fluid manner. Biological intelligence processes information incrementally while maintaining an internal model of what it is processing. This model is built from past information and is updated whenever new information arrives.

Recurrent Neural Networks (RNNs) work on the same principle, though in a much simpler way. In this case, the processing of a sequence is done by iteratively processing the elements of the sequence. The information related to what is detected in the process is then maintained as state. In effect, an RNN is a kind of neural network with an inner loop.

In this article, I describe the implementation of the SimpleRNN, a basic RNN, using Keras, as well as the LSTM and GRU, which are more advanced RNNs.
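
A minimal Keras sketch of such a model is shown below: an embedding layer followed by a SimpleRNN layer and a sigmoid classifier. The vocabulary size and sequence length are illustrative, and layers.LSTM or layers.GRU can be swapped in for the more advanced variants discussed here.

```python
# A minimal Keras sketch: Embedding + SimpleRNN (swap in layers.LSTM or layers.GRU as desired).
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len = 10000, 100   # illustrative sizes

model = keras.Sequential([
    keras.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 32),   # map word indices to dense vectors
    layers.SimpleRNN(32),               # process the sequence step by step, keeping a state
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```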

We describe an advanced method to improve the performance and generalization power of RNNs. In this paper, we take the problem of predicting temperature as an example, and access time-series data such as temperature, pressure, and humidity sent from sensors installed on the roof of a building. Using these data, we solve the difficult problem of predicting the temperature 24 hours after the last data point, and discuss the challenges we face when dealing with time series data.

Specifically, I describe an approach that uses recurrent dropout, recurrent layer stacking, and other techniques for optimization, and uses GRU (Gated Recurrent Unit) layers.

The last method we will discuss is the bidirectional RNN. Bidirectional RNNs are among the most common RNN variants and can perform better than regular RNNs on certain tasks. They are often used in natural language processing (NLP) and can be thought of as the Swiss Army knife of deep learning for NLP.

The feature of RNN is that it depends on the order (time). Therefore, shuffling the time increments or reversing the order may completely change the representation that the RNN extracts from the sequence. Bidirectional RNNs are built to capture patterns that are overlooked in one direction by processing sequences in the forward and reverse directions, taking advantage of the order-sensitive nature of RNNs.

In this article, we will discuss text generation using LSTM as generative deep learning with python and Keras.

As for data generation using deep learning, in 2015 Google's DeepDream algorithm was proposed, transforming images into psychedelic patterns of dog eyes and pareidolic artifacts; in 2016 the short film "Sunspring" was made from a script (complete with dialogue) generated by an LSTM algorithm; and various kinds of music have also been generated.

These are achieved by using a deep learning model to extract samples from the statistical latent space of the learned images, music, and stories.

In this article, I first describe a method for generating sequence data using a recurrent neural network (RNN). I use text data as an example, but exactly the same method can be applied to all kinds of sequence data (e.g., music, or handwriting and drawing stroke data). It can also be used for speech synthesis and for dialogue generation in chatbots such as Google's Smart Reply.
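
A common ingredient of such character-level generation is temperature-based sampling: the model's predicted next-character distribution is reweighted before sampling, so that a low temperature yields conservative text and a high temperature yields more surprising text. The sketch below illustrates this with a hypothetical three-character distribution.

```python
# A minimal sketch of temperature-based sampling from a predicted distribution.
import numpy as np

def sample(preds, temperature=1.0):
    preds = np.asarray(preds, dtype="float64")
    preds = np.log(preds + 1e-10) / temperature   # low T sharpens, high T flattens the distribution
    probs = np.exp(preds) / np.sum(np.exp(preds))
    return np.random.choice(len(probs), p=probs)

next_char_probs = [0.1, 0.6, 0.3]                  # hypothetical model output over 3 characters
print(sample(next_char_probs, temperature=0.5))    # usually picks the most likely character
print(sample(next_char_probs, temperature=1.5))    # more diverse choices
```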

Specific implementations and applications of evolving deep learning techniques (OpenPose, SSD, AnoGAN, Efficient GAN, DCGAN, Self-Attention, GAN, BERT, Transformer, GAN, PSPNet, 3DCNN, ECO) using PyTorch.

Tools

In this article, we discuss various tools that are indispensable for applying natural language processing. As tools for handling raw text, there are data cleansing tools such as OpenRefine and tools for similarity evaluation. Various other OSS tools are also discussed.

From the Machine Learning Professional Series volume "Natural Language Processing with Deep Learning": one feature of natural language that differs greatly from image recognition and speech recognition is that the processing targets are discrete symbols. The contents of a neural network, on the other hand, are continuous values expressed as vectors and matrices (optimization is also performed as continuous function computation), so the "discrete" symbols such as words and sentences that are the processing units of natural language processing must be converted into real-valued continuous data such as vectors and matrices in order to be handled by deep learning/neural networks.

  • OpenNLP Open source natural language processing tools

Apache OpenNLP is an open source product maintained under the Apache Software Foundation and is a set of supervised learning tools for natural language processing. Almost all the basic natural language processing functions are provided, such as "Language Detector", "Sentence Detector", "Tokenizer", "Name Finder", "Document Categorizer", "Part-of-Speech Tagger", "Lemmatizer", "Chunker", and "Parser". Japanese was not supported in older versions, but is now officially supported as of version 1.9.0.

  • Juman and KNP: Japanese Morphological and Dependency Analysis Tools

Juman and KNP are systems for morphological, syntactic, case, and coreference analysis developed at Kyoto University. KNP takes the analysis results of Juman as input and outputs the dependency, case, and coreference relations between clauses and basic phrases. The case and coreference relations are determined by a probabilistic model based on a large-scale case frame dictionary constructed automatically from the Web.

In general, when performing machine learning or statistical processing, if data is corrupted, inaccurate, or irrelevant to the purpose of the processing (existence of so-called garbage data), the results will be inaccurate and the purpose will not be achieved. In order to prevent these problems, a method called data cleansing is used to process data accurately and cleanly. Data cleansing is necessary in the pre-processing of machine learning and post-processing of natural language processed data.

One of the tools for data cleansing is OpenRefine. This is an open source tool that was originally maintained by Google under the name Google Refine, but was renamed OpenRefine in 2012 when it became an open source project.

Natural Language from the Perspective of Philosophy, Linguistics, and Mathematics

In answering the question "Does such a thing as meaning exist, and if so, in what way?", this blog begins with a question that goes one step beyond the frame problem: "How can we build a robot or computer that understands meaning?"

Thinking about such a robot is worthwhile because, if a robot, which is merely a thing obeying physical laws, can realize an "understanding of meaning," this gives us a hint toward answering the question of how meaning can be described in a world consisting only of things.

Structure, according to Wikipedia, is "the way in which the parts that make up a thing are combined": a general term for the relationships of opposition, contradiction, and dependence among the elements that make up a whole, or the arrangement and relationship of the parts and elements of a complex thing. In the world of mathematics, the basic approach is to abstract the "parts that make up a thing" as much as possible and to find the relationships between them.

The programming language I mentioned earlier is a kind of language called a formal language. A formal language is a set of strings (words) that can be generated from a set of base symbols (alphabet, etc.) and generation rules (grammar), and is theoretically based on mathematics called mathematical logic.

Mathematical logic is the foundation of mathematics, and is the study of defining and proving all kinds of things in mathematics using set theory and proof theory. One of the most famous examples is the proof of classical mathematical systems using the ZFC axiomatic system. Roughly speaking, what is done here is to define the basic parts and then combine them to construct a large world.

To "become aware" means to observe or perceive something carefully; when a person notices a situation or thing, it means that he or she perceives some information or phenomenon and has a feeling or understanding about it. Becoming aware is an important process of gaining new information and understanding by paying attention to changes and events in the external world. In this article, I discuss this notion of awareness and the application of artificial intelligence technology to it.

Paul Nurse, the author of this book, saw a butterfly fluttering into his garden one early spring day and felt that, although very different from himself, the butterfly was unmistakably alive, just like himself: able to move, feel, and react, and moving toward its "purpose." What does it mean to be alive? His book "What Is Life?" is a tribute to the physicist Erwin Schrödinger's "What Is Life?".

One test for determining whether a machine is intelligent is the Turing test, described in "Conversation and AI (Thinking from the Turing Test)." The basic idea of the Turing test rests on the hypothesis that if an AI is so intelligent that it is indistinguishable from a human in conversation with a human, then the AI can be considered as intelligent as a human. Against this, Searle argues that a computational system that follows an algorithm cannot be intelligent, because computation is by definition a formal symbolic operation and involves no understanding of meaning.

Computers do two things (and only two things). One is to do calculations, and the other is to remember the results of calculations. For most of human history, the speed of computation has been limited by the speed of the human brain, and the recording of computation results has been limited by the ability of the human hand to write. This corresponds to the fact that only very small problems could be solved computationally. With the use of modern computers, this problem solving capability has been greatly expanded.

Now let’s think about “computational thinking” to solve problems computationally.

All knowledge can be classified as either declarative or imperative. Declarative knowledge consists of statements of fact. For example, "the square root of x is a y that satisfies y × y = x" is declarative knowledge: it is a statement of fact and says nothing about how to find the square root.

In contrast, imperative knowledge is “how-to” knowledge, which is a recipe for deriving information.
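
To make the contrast concrete, the following sketch encodes imperative ("how-to") knowledge for the square-root example above: a bisection procedure that actually finds an approximation of the root, rather than merely stating what a root is. The tolerance value is an arbitrary illustrative choice.

```python
# A minimal sketch of imperative knowledge: a recipe that finds a square root,
# in contrast to the declarative statement that y is a square root of x when y * y = x.
def sqrt_bisection(x, epsilon=1e-6):
    low, high = 0.0, max(1.0, x)
    guess = (low + high) / 2.0
    while abs(guess * guess - x) >= epsilon:
        if guess * guess < x:
            low = guess            # the answer lies in the upper half
        else:
            high = guess           # the answer lies in the lower half
        guess = (low + high) / 2.0
    return guess

print(sqrt_bisection(25.0))   # approximately 5.0
```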

The world around us is made up of two opposing concepts, the "concrete" and the "abstract." The word "concrete" is most often used when explaining something in an easy-to-understand way, as in "In concrete terms...", or when you do not understand what the other person is saying, as in "Could you be more specific?" The word "abstract," on the other hand, is used in contexts such as "I don't understand what that person is talking about because it's so abstract."

In this way, the generally accepted impression of these concepts is that “concrete” means easy to understand and “abstract” means difficult to understand. As you can see, the word “abstract” is often associated with a negative impression, but in fact it is the fundamental basis of human thought, and it is the concept that makes us human and definitely different from animals.

Artificial Intelligence (AI) has great influence in the field of education and has the potential to transform teaching methods and learning processes. Below we discuss several important aspects of AI and education.

  • General Linguistics
  • Saussure’s Linguistics
  • Language behavior

Theory

  • Overview of Multi-Task Learning and Examples of Applications and Implementations

Multi-Task Learning is a machine learning method that simultaneously learns multiple related tasks. Usually, each task has a different data set and objective function, but Multi-Task Learning aims to incorporate these tasks into a model at the same time so that they can complement each other by utilizing their mutual relevance and shared information.

Here, we provide an overview of methods for multi-task learning such as shared-parameter models, model distillation, transfer learning, and multi-objective optimization, and discuss examples of applications in natural language processing, image recognition, speech recognition, and medical diagnosis, as well as a simple implementation in Python.

Ontology matching is a technique that aims to find correspondences between semantically related entities of different ontologies.

These correspondences can represent equivalences between ontology entities or other relations such as consequences, subsumption, disjointness, etc. Many different matching solutions have been proposed from various perspectives such as databases, information systems, artificial intelligence, etc.

Various methods have been proposed for ontology matching, starting from simple string matching, various machine learning approaches, data interlinking, ontology partitioning and pruning, context-based matching, matcher tuning, alignment debugging, and user participation in matching.

In the following pages of this blog, we discuss these various techniques for ontology matching.

Meaning can be thought of as some kind of information conveyed by text (or sound or images), as described previously in "Two approaches to the meaning of language (fusion of symbolic and distributed representations)."

This information is certain to exist, but it is like dark matter that no one has ever seen, and we cannot see it directly.

When there is a symbol B, in order to know that its meaning is A, it cannot be confirmed by itself, but only by its relative relationship (connotation, paraphrase, identical, similar, etc.) with other symbols that have the same meaning (or are judged by people to have the same meaning). In other words, “meaning,” which is dark matter, cannot be observed by itself, and the meaning of a symbol can only be defined when there is an object to compare it to.

An introduction to early dialogue engines (artificial incompetence) that do not understand the meaning of Eliza’s lineage, and an analysis of the relationship between the meaning of words and dialogue through an introduction to Wittgenstein’s philosophy of logic, James Joyce’s meta-literature, and the Ten Ox Diagrams leading to Zen enlightenment and Zen questions and answers, as well as an introduction to the recently developed BERT-based BuddhaBot.

This book was written by Professor Hidenori Kawamura, who conducts AI research at Hokkaido University. Professor Kawamura has also created a system that generates AI senryu (comic haiku-like poems), which can be found on Twitter at the Kawamura Lab (Harmony Lab). For example, there is a poem about world peace, "Laughing together in prayer for peace among mankind," and another about the piano, "My wife hasn't played the piano for a long time."

Professor Kawamura has created an artificial intelligence, "Issa-kun," that uses deep learning to learn haiku written by a haiku poet (Kai Otsuka), and has asked Mr. Otsuka to select the best of the generated haiku, discussing the results with the haiku poet who provided the training data. The purpose of this book is to consider the following issues in artificial intelligence research through these exercises.

In many areas of language processing, language models have emerged as the key to handling language. In some textbooks, the discussion of the term "language model" starts with a mathematical definition such as "a language is a subset L ⊆ Σ* of the set of all strings over an alphabet of characters x ∈ Σ."

In a more concrete sense, a language model is something that is familiar to all speakers of a language, as if they are unconsciously using it continuously.

As an example, let’s consider the following document.

“Good day, Misan, how are you?  We are in good health.”

If you try to read the above, you can read it without much trouble, but if you look closely you will see that the characters have been changed in places. It is obviously odd, and yet the content is understandable (readable). This is thanks to the language model in your head.

A topic model estimates the distribution of words in a text from a latent structure (model) called a "topic." The topic model is based on the hypothesis that each genre of text has its own word probability distribution; for example, words such as "tie-up" or "Nikkei 225" appear in the economics column, whereas words such as "rice" or "knit" appear in the home column.

In this article, we will discuss these topic models, starting with the naive Bayes method, Latent Dirichlet Allocation (LDA), supervised LDA, ideal point topic models, and Boltzmann machines from a deep learning perspective.
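As a small, hedged illustration of one member of this family, the sketch below fits LDA with scikit-learn on a handful of toy documents; the documents, the number of topics, and all parameters are invented for illustration and are not from the article.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy documents; real topic modeling needs a much larger corpus.
docs = [
    "the bill passed parliament and the prime minister spoke",
    "the prime minister proposed a new bill in parliament",
    "the player scored a goal in the crowded stadium",
    "fans filled the stadium to watch the player and the goal",
]

# Bag-of-words counts.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# LDA with two latent topics (a deliberately small illustrative setting).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)

# Show the top words of each topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"topic {k}: {top}")

print(doc_topic.round(2))  # per-document topic proportions
```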

To think about the various models that represent natural language, recall that we humans master a foreign language by acquiring knowledge of the meanings of its words and of its grammar. How does this translate into a machine’s understanding of natural language? How can we teach the meaning of a word to a machine? These are the questions we will consider.

A simple answer to this question is to give the computer a dictionary made for humans, such as a Japanese or English dictionary. The next step is to consider a dictionary designed to teach the meaning of words to a computer rather than to a person. For English, WordNet, PropBank, and FrameNet are well-known examples of such resources.
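Before moving on, here is a small example of querying one such machine-readable dictionary, WordNet, through the NLTK library; it assumes the NLTK WordNet data has been downloaded, and the example words are arbitrary.

```python
import nltk
nltk.download("wordnet", quiet=True)  # fetch the WordNet data once
from nltk.corpus import wordnet as wn

# List some senses (synsets) of the word "bank" with their glosses.
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

# Hypernyms ("is-a" parents) of the first sense of "dog".
dog = wn.synsets("dog")[0]
print([h.name() for h in dog.hypernyms()])
# e.g. ['canine.n.02', 'domestic_animal.n.01']
```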

The next step is to learn the meaning of words automatically from large amounts of text written by humans, following the famous dictum “You shall know a word by the company it keeps.” A closely related idea is Harris’s distributional hypothesis, which states that words occurring in similar contexts have similar meanings.
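A minimal sketch of the distributional hypothesis in practice: count which words co-occur within a small window and compare words by the similarity of their co-occurrence vectors. The toy corpus and the window size are arbitrary assumptions.

```python
import math
from collections import Counter, defaultdict

# Toy corpus; the distributional hypothesis needs much more text in practice.
sentences = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]

# Count co-occurrences within a +/-2 word window.
cooc = defaultdict(Counter)
window = 2
for sent in sentences:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                cooc[w][sent[j]] += 1

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse co-occurrence vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv)

# Words used in similar contexts ("cat" and "dog") get the higher similarity.
print(cosine(cooc["cat"], cooc["dog"]))
print(cosine(cooc["cat"], cooc["milk"]))
```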

Most people agree that natural language texts convey some information, namely meaning. However, when asked “What is meaning?” or “How do you define meaning on a computer (or in mathematical terms)?”, few people, even experts in natural language, can give a clear answer. The meaning of natural language is something like dark matter: almost certain to exist, but something no one has ever seen.

The existence of dark matter was inferred from discrepancies between observed facts and existing theories. In other words, we cannot directly see what dark matter is or what it looks like, but we can observe its effects, and from those observations we can work backwards to infer what the matter must be like.

The study of meaning in natural language processing is similar to this. Meaning, whose reality cannot be seen directly, is approached from observable events. In the following, I will describe the approach to meaning in natural language processing, and the advantages and disadvantages of two representative semantic representations: symbolic representation and distributed representation.

It is difficult to increase the vocabulary of continuous word recognition; to recognize more than 10,000 words, we need to exploit knowledge about the language that has been built separately. First, let us discuss measures of language complexity. Suppose that a language is represented in the form of a grammar network. A grammar network consists of nodes and the transitions that connect them, and each transition outputs a word. There are two measures of the complexity of this network. One is the static branching factor.
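As a rough illustration of this measure, the sketch below computes the static branching factor of a toy grammar network, taken here (as an assumption) to be the average number of word transitions leaving each non-terminal node; the network and its words are invented.

```python
# A toy grammar network: node -> list of (word, next_node) transitions.
# The network and its vocabulary are purely illustrative.
network = {
    "S":   [("please", "REQ"), ("what", "Q"), ("play", "CMD")],
    "REQ": [("play", "CMD"), ("stop", "CMD")],
    "Q":   [("time", "END"), ("weather", "END")],
    "CMD": [("music", "END"), ("news", "END"), ("radio", "END")],
    "END": [],
}

def static_branching_factor(net) -> float:
    """Average number of outgoing word transitions per node (terminal nodes excluded)."""
    nodes = [n for n, edges in net.items() if edges]
    return sum(len(net[n]) for n in nodes) / len(nodes)

print(static_branching_factor(network))  # (3 + 2 + 2 + 3) / 4 = 2.5
```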

There are countless tables of information on the Web and in documents, which are very useful as knowledge information compiled manually. In general, tasks for extracting and structuring such information are called information extraction tasks, and among them, tasks specialized for tabular information have been attracting attention in recent years. Here, we discuss various approaches to extracting this tabular data.

Machine translation is a technology that automatically translates between natural languages such as Japanese and English. Such automatic translation has appeared in everything from Doraemon’s “Honyaku Konnyaku” to various science fiction movies, and has been a dream of many years. And with significant developments in recent years, it is finally coming to fruition.

However, human language is very complex, and two major problems must be overcome to realize an accurate translation system. The first is the lexical choice problem: selecting the correct target-language words for the words in the input. The second is the reordering problem: reproducing the content of the input sentence in the correct word order of the output language.

In this article, we will discuss the problem of maximizing a submodular function, mainly for the case where f: 2^V → ℝ is a monotone submodular function, under the constraint that the selected subset contains at most k (> 0) elements. As mentioned above, submodular functions often represent some kind of gain in various fields, so formulations that maximize them arise in many situations. The problem of maximizing a submodular function is therefore very important in applications.

The problem of maximizing a submodular function has been applied to various problems in machine learning and other fields. Here we discuss document summarization, the sensor placement problem, and active learning as examples.

As an algorithm, we also describe the greedy method, a simple approximation algorithm.
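A hedged sketch of the greedy method follows: repeatedly add the element with the largest marginal gain until k elements have been chosen. For monotone submodular functions under a cardinality constraint, this simple procedure is known to achieve a (1 − 1/e) approximation. The coverage function and ground set below are invented for illustration.

```python
# Greedy maximization of a monotone submodular function under |S| <= k.
# Here f is a simple coverage function over invented sets (illustrative only).
coverage = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 6},
}

def f(selected) -> int:
    """Coverage function: number of distinct items covered (monotone submodular)."""
    covered = set()
    for s in selected:
        covered |= coverage[s]
    return len(covered)

def greedy(ground_set, k: int):
    selected = []
    for _ in range(k):
        candidates = [e for e in ground_set if e not in selected]
        if not candidates:
            break
        # Pick the element with the largest marginal gain.
        best = max(candidates, key=lambda e: f(selected + [e]) - f(selected))
        if f(selected + [best]) - f(selected) <= 0:
            break  # no remaining element adds value
        selected.append(best)
    return selected

print(greedy(list(coverage), k=2))  # ['a', 'c'], covering all six items
```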

Previously, we downloaded Motojiro Kajii’s works from Aozora Bunko and installed MeCab to perform morphological analysis. This time, we will finally try to generate sentences using this data. However, we will not go into deep learning just yet; here we will simply generate sentences using a method called a “Markov chain”.

In this article, we will use Markov chains to automatically generate sentences. The word “chain” should give you an idea of the approach: words are linked together one after another. You can do a lot of fun things with Markov chains. Very interesting.
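The article builds its chain from MeCab-tokenized Aozora Bunko text; as a language-independent stand-in, here is a minimal sketch of building a word-level, first-order Markov chain and sampling from it. The sample text and the “.” sentence marker are placeholders, not the article’s data.

```python
import random
from collections import defaultdict

# Placeholder text; the article uses MeCab-tokenized text from Aozora Bunko instead.
text = (
    "the lemon lay on the books . "
    "the books were piled on the desk . "
    "the lemon smelled fresh ."
)
words = text.split()

# Build first-order Markov transitions: word -> list of observed next words.
transitions = defaultdict(list)
for prev, curr in zip(words, words[1:]):
    transitions[prev].append(curr)

def generate(start: str = "the", max_len: int = 12) -> str:
    """Random walk over the transition table until '.' or max_len words."""
    out = [start]
    while len(out) < max_len:
        nxt = random.choice(transitions[out[-1]])
        out.append(nxt)
        if nxt == ".":
            break
    return " ".join(out)

random.seed(0)
print(generate())  # a short sentence stitched together from the sample text
```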

A topic model is a model for extracting, from a large collection of documents, which topics each document contains and what is being discussed. Using this technology, it is possible to find documents with similar topics and to organize documents by topic, which can be applied to search and other solutions.

This topic model has been applied not only to the analysis of document data, but also to image processing, recommendation systems, social network analysis, bioinformatics, music information processing, and many other fields. This is due to the fact that information such as images, purchase histories, and social networks have a hidden structure similar to that of documents.

For example, in a political article, the words “parliament”, “bill”, and “prime minister” tend to appear in the same sentence, while in a sports article, the words “stadium”, “player”, and “goal” appear. In the case of images, if there is a picture of a kitchen knife, there is a high probability that there is also a picture of a cutting board, and in the case of purchase history, people with similar interests buy similar products, and in social networks, people with similar interests tend to become friends.

In the topic model, such tendencies are expressed using a model of probability. By using a probability model, uncertainty can be handled, and essential information can be extracted from data that contains noise. In addition, since various types of information can be handled within the framework of probability, many extensions of topic models that integrate various types of information have been proposed, and their usefulness has been confirmed.

In the following pages of this blog, I discuss the basic theory and various applications of this topic model.

  • Word2Vec Dimensionality reduction and distributed representation

Word2Vec is an open-source deep learning technology proposed by Tomas Mikolov et al. In principle, Word2Vec vectorizes words (200 dimensions with the default parameters), positioning them in a 200-dimensional space so that users can measure similarities between words (evaluated, for example, with cosine similarity) and perform clustering.

Negative sampling is a learning technique in natural language processing and machine learning, used especially in word embedding models such as Word2Vec, as described in ‘Word2Vec’. Instead of computing a full softmax over the vocabulary, it samples a small number of negative examples for each training instance, which makes learning on large datasets efficient.
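A minimal, hedged sketch of training a skip-gram Word2Vec model with negative sampling using the gensim library; the toy sentences and every hyperparameter below are illustrative choices, not recommended settings.

```python
from gensim.models import Word2Vec

# Toy corpus of tokenized sentences; real training needs millions of tokens.
sentences = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["the", "cat", "chases", "the", "dog"],
    ["the", "dog", "chases", "the", "cat"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the word vectors
    window=2,         # context window size
    min_count=1,      # keep even rare words in this tiny corpus
    sg=1,             # 1 = skip-gram
    negative=5,       # number of negative samples per positive example
    epochs=200,
    seed=0,
)

print(model.wv["cat"].shape)              # (50,)
print(model.wv.similarity("cat", "dog"))  # cosine similarity of the two vectors
```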

One feature of natural language that differs greatly from image recognition and speech recognition is that the processing targets are discrete “symbols.” Since the internals of a neural network are continuous values expressed as vectors and matrices (and optimization is likewise carried out as computation over continuous functions), the “discrete” symbols such as words and sentences that are the processing units of natural language processing must be converted into “real-valued, continuous” data such as vectors and matrices before they can be handled by deep learning / neural networks.
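A minimal sketch of this conversion step using a PyTorch embedding layer; the vocabulary, the sentence, and the dimensions are arbitrary assumptions. Discrete word IDs go in, real-valued vectors that a network can process come out.

```python
import torch
import torch.nn as nn

# Tiny illustrative vocabulary mapping symbols to integer IDs.
vocab = {"<pad>": 0, "I": 1, "like": 2, "natural": 3, "language": 4, "processing": 5}

# Embedding layer: a |V| x d lookup table of real-valued vectors, learned during training.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

sentence = ["I", "like", "natural", "language", "processing"]
ids = torch.tensor([vocab[w] for w in sentence])  # discrete symbols -> integer IDs
vectors = embedding(ids)                          # integer IDs -> continuous vectors

print(ids.shape, vectors.shape)  # torch.Size([5]) torch.Size([5, 8])
```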

First, let’s talk about matrix decomposition. For example, consider the task of recommending movies, where customers are rows, movies are columns, and the element values are ratings: with a huge number of users, as on Netflix, it would be difficult to store and process a matrix of that scale directly. One solution to this problem is data compression. Let I ∈ ℕ be the number of customers and J ∈ ℕ the number of movies, and let x_ij denote the rating of movie j ∈ [J] by customer i ∈ [I]; the collection of these ratings can be expressed as an I × J matrix X. The higher the value of x_ij, the higher the rating.

Although there are countless ways to factorize a matrix, the singular value decomposition gives an expansion of the row vectors of the original matrix in an orthonormal basis, and when that expansion is truncated partway, the result approximates the original data in the least-squares sense.

Low-rank approximation through such decompositions has been applied to a wide variety of data, not limited to customer x product data. For example, when X is the data of documents x words, the method of singular value decomposition of the matrix is called latent semantic analysis (LSA) or latent semantic indexing (LSI).
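A minimal numerical sketch of this low-rank approximation with NumPy; the rating matrix and the chosen rank are made up. Truncating the SVD after r singular values gives the best rank-r approximation of X in the least-squares sense.

```python
import numpy as np

# Invented 5x4 customer x movie rating matrix X (rows: customers, columns: movies).
X = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [5, 5, 2, 1],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

# Singular value decomposition X = U diag(s) Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the top-r singular values: the best rank-r least-squares approximation.
r = 2
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

print(np.round(X_r, 1))
print("approximation error:", round(np.linalg.norm(X - X_r), 3))
```

When X is a document × word matrix instead, the same truncated decomposition is what LSA/LSI computes.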

With the emergence of social media such as Twitter, which began to spread rapidly around 2010, the data available on the web is not just text, but now includes multiple modalities such as time (date) information, geographic information (GPS information), images, and voice information.

These technological advances have made it possible to collect large amounts of information (hereafter referred to as event information), such as who, when, where, and what, and have opened up vast possibilities for natural language processing, which previously had only a limited number of applications.

In this article, I will discuss topic models for probabilistic modeling of natural language. When we look at strings of words in a language, we immediately notice a large bias in word frequencies. For example, counting the word frequencies in Kenji Miyazawa’s “Night on the Galactic Railroad” and plotting rank against frequency on log-log axes makes this bias clear.
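A minimal sketch of how such a rank-frequency plot can be produced; the input file name is a placeholder, and the article’s Japanese text would additionally require a tokenizer such as MeCab rather than the crude regular expression used here.

```python
import re
from collections import Counter
import matplotlib.pyplot as plt

# Placeholder file; the article uses Kenji Miyazawa's "Night on the Galactic Railroad".
text = open("corpus.txt", encoding="utf-8").read()

# Crude tokenization into lowercase words; Japanese text would need MeCab instead.
words = re.findall(r"\w+", text.lower())
freqs = Counter(words)

# Rank words by frequency and plot rank vs. frequency on log-log axes.
counts = sorted(freqs.values(), reverse=True)
ranks = range(1, len(counts) + 1)

plt.loglog(ranks, counts, marker=".")
plt.xlabel("rank")
plt.ylabel("frequency")
plt.title("Word rank vs. frequency (log-log)")
plt.show()
```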

Ontology matching is a technique that aims to find correspondences between semantically related entities of different ontologies.

These correspondences can represent equivalences between ontology entities or other relations such as consequences, subsumption, disjointness, etc. Many different matching solutions have been proposed from various perspectives such as databases, information systems, artificial intelligence, etc.

