Deep Learning Technologies


Overview of Deep Learning Technology

Deep learning technology uses multi-layered neural networks to perform advanced data processing and recognition. A neural network is a network model that mimics the workings of neurons in the brain, also called an Artificial Neural Network (ANN), as described in “Pattern Recognition Algorithms”. A neural network learns from training data while automatically adjusting parameters such as weights and biases.

The simplest neural network is the perceptron, proposed by Frank Rosenblatt in 1957 and used as a linear classifier. It assigns weights to multiple input values, sums them, adds a bias, feeds the result into an activation function, and outputs the result. The perceptron is useful for simple linear classification problems, but it cannot handle nonlinear problems.
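As a minimal, hypothetical illustration of this computation (not taken from the referenced articles), the following NumPy sketch implements a single perceptron with the classic perceptron learning rule on the linearly separable AND function; the learning rate and epoch count are arbitrary choices for the example.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Weighted sum of the inputs plus a bias, passed through a step activation."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Train on the linearly separable AND function with the classic perceptron rule.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(20):                      # a few epochs are enough for AND
    for xi, ti in zip(X, y):
        err = ti - perceptron_output(xi, w, b)
        w += lr * err * xi               # move the weights toward the target
        b += lr * err                    # adjust the bias the same way

print([perceptron_output(xi, w, b) for xi in X])  # expected: [0, 0, 0, 1]
```

Because XOR is not linearly separable, the same loop would never converge on it, which is exactly the limitation noted above.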

In order to make the perceptron applicable to nonlinear problems, multiple layers of perceptrons must be connected, but the number of parameters in the model then increases dramatically, making the computation intractable. In contrast, the autoencoder proposed by Geoffrey Hinton in 2006 (see “Autoencoder“), as described in “Where Do Features Come From“, made it possible to train multilayer neural networks by using SGD, as described in “Stochastic Optimization”, and gradient descent, as described in “Continuous Optimization for Machine Learning”. The use of such mathematical optimization methods led to breakthroughs, enabling the computation of multilayer neural networks, which had previously been difficult, with theoretical support from approaches such as those described in “Statistical Learning Theory“.

These deep learning models started with the basic perceptron and evolved to include autoencoders, multilayer perceptrons, Boltzmann machines, convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM); more recent complex models include Transformer, GPT-3, EfficientNet, AlphaFold, and DALL-E.

To implement and use these methods, simple models can be coded with mathematical libraries, as described in “Implementing Neural Networks and Error Back Propagation Using Clojure“, but for complex models or models published as OSS, it is more common to use dedicated deep learning platforms such as TensorFlow, Keras, or PyTorch, as described in “Comparison of tensorflow, Keras, and pytorch”. For Keras, which is the easiest of these to use, please refer to “Hello World of Neural Networks” and related articles.

The advantage of deep learning technology over general machine learning technology is its simplicity: while general machine learning requires two separate processes, feature extraction and pattern learning, deep learning performs both at once by combining multiple models, so an answer can be obtained simply by feeding in data. On the other hand, deep learning requires a large amount of training data for models with several hundred million to several tens of billions of parameters, and it is difficult to explain why a particular result was obtained, making it a black-box type of learning.

Many approaches combining deep learning with other machine learning techniques are also being considered, such as deep reinforcement learning combined with reinforcement learning, as described in “Theory and Algorithms of Various Reinforcement Learning Techniques and Python Implementation“, and graph neural networks combined with graph data processing, as described in “Graph Data Processing Algorithms and Their Application to Machine Learning/Artificial Intelligence Tasks“. In addition, recent theoretical research has shown that Gaussian processes, which are stochastic models, and deep learning models can construct equivalent models, as described in “Equivalence between Neural Networks (Deep Learning) and Gaussian Processes“.

Deep learning is widely used in image recognition as described in “Image Processing Technology”, speech recognition as described in “Speech Recognition Technology”, natural language processing as described in “Natural Language Processing Technology”, music generation as described in “Mathematics, Music and Computers”, and automated driving.

Here, we describe various theories and applications of deep learning, as well as concrete implementations in Python and other languages.

From the book “Deep Learning” by the Japanese Society for Artificial Intelligence.

Deep learning is a technology that has received a great deal of attention in the field of machine learning (or artificial intelligence). The word “deep” here refers to the depth of the layers in the neural network (or equivalent) in which learning takes place, i.e., the layers are stacked on top of each other, and deep learning refers to learning using a mechanism with multiple layers. For a long time, building a deep neural network has been a kind of “dream” in the research field, because although the human brain has a multi-layered structure, simply imitating a structure similar to it has not been enough to achieve the essential learning ability.

This “deep” approach is extremely important for the “representation of the problem” when operating machine learning. This representation refers to what parts of the given data or the external world should be focused on and represented as features, and until now it has been up to human ability to determine this. If a machine can automatically extract features, in other words, if it can learn representations, this will be a breakthrough. Deep learning is a promising way to achieve representation learning, and it has great potential significance for the field of artificial intelligence as a whole.

In this blog, I will discuss the following aspects of deep learning.

Implementation

This section provides an overview of Python Keras and examples of its application to basic deep learning tasks (handwriting recognition using MNIST, autoencoders, CNN, RNN, LSTM).

PyTorch is a deep learning library developed by Facebook and provided as open source. It has features such as flexibility, dynamic computation graphs, and GPU acceleration, making it possible to implement a variety of machine learning tasks. Below we describe various examples of implementations using PyTorch.

  • Overview of mini-batch learning and examples of algorithms and implementations

Mini-batch learning is one of the most widely used and efficient learning methods in machine learning; compared to plain gradient descent over the whole dataset, it is computationally more efficient and applicable to large data sets. In mini-batch learning, multiple samples (a mini-batch) are processed together rather than the entire dataset at once: the gradient of the loss function is calculated for each mini-batch and the parameters are updated using that gradient.
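For illustration, here is a hedged sketch of mini-batch updates on a toy linear model with NumPy; the data, batch size of 32, and learning rate are assumptions made for the example rather than details from the linked article.

```python
import numpy as np

# Toy data: y = 3x + noise, fit w by mini-batch gradient descent on squared error.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

w, lr, batch_size = 0.0, 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))                    # reshuffle the samples each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]        # one mini-batch of indices
        pred = w * X[batch, 0]
        grad = np.mean(2 * (pred - y[batch]) * X[batch, 0])  # gradient of MSE on the batch
        w -= lr * grad                               # one parameter update per mini-batch

print(w)  # should end up close to 3.0
```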

Adversarial attacks are among the most widely studied attacks against machine learning models, especially those that take input data such as images, text, and audio. An adversarial attack aims to cause a machine learning model to misrecognize its input by applying slight perturbations (noise or manipulations). Such attacks can reveal security vulnerabilities and help assess model robustness.
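As one concrete and commonly used instance of such a perturbation, the sketch below outlines the Fast Gradient Sign Method (FGSM) with TensorFlow; `model` is an assumed, already-trained Keras classifier with softmax outputs, and the epsilon value is arbitrary.

```python
import tensorflow as tf

def fgsm_perturb(model, x, y_true, epsilon=0.01):
    """Fast Gradient Sign Method: nudge the input in the direction that increases
    the loss so that the (hypothetical) classifier `model` is more likely to err."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)                                        # treat the input as a variable
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)                            # gradient of the loss w.r.t. the input
    return x + epsilon * tf.sign(grad)                       # adversarial example (clip to valid range if needed)
```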

The Seq2Seq (Sequence-to-Sequence) model is a deep learning model that takes sequence data as input and outputs sequence data; in particular, it can handle input and output sequences of different lengths, and it is widely used in a variety of natural language processing tasks such as machine translation and dialogue systems.

RNN (Recurrent Neural Network) is a type of neural network for modeling time-series and sequence data; it can retain past information and combine it with new information, and it is a widely used approach for a variety of tasks such as speech recognition, natural language processing, video analysis, and time-series prediction.

LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) and a very effective deep learning model, mainly for time-series data and natural language processing (NLP) tasks. LSTM can retain historical information and model long-term dependencies, making it suitable for learning long-term as well as short-term information.

  • Overview of Bidirectional LSTM and Examples of Algorithms and Implementations

Bidirectional LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) widely used for modeling sequence data such as time series and natural language. Bidirectional LSTM is characterized by learning sequence data in both the forward (past to future) and backward (future to past) directions simultaneously, capturing the context of the sequence more richly.

  • About GRU (Gated Recurrent Unit)

GRU (Gated Recurrent Unit) is a type of recurrent neural network (RNN) widely used in deep learning models, especially for processing time-series and sequence data. The GRU is designed to model long-term dependencies in the same way as the LSTM described in “Overview of LSTM and Examples of Algorithms and Implementations,” but it is characterized by a lower computational cost than the LSTM.

  • About Bidirectional RNN (BRNN)

Bidirectional Recurrent Neural Network (BRNN) is a type of recurrent neural network (RNN) model that can consider past and future information simultaneously. BRNN is particularly useful for processing sequence data and is widely used in tasks such as natural language processing and speech recognition.

  • About Deep RNN

Deep RNN (Deep Recurrent Neural Network) is a type of recurrent neural network (RNN) that stacks multiple RNN layers. Deep RNNs help model complex relationships in sequence data and extract more sophisticated feature representations. Typically, a Deep RNN consists of RNN layers stacked in multiple layers along the temporal direction.

  • About Stacked RNN

Stacked RNN (Stacked Recurrent Neural Network) is a type of recurrent neural network (RNN) architecture that stacks multiple RNN layers on top of each other, enabling the modeling of more complex sequence data and the effective capture of long-term dependencies.

  • About Echo State Network (ESN)

Echo State Network (ESN) is a type of reservoir computing, a kind of recurrent neural network (RNN) used for prediction, analysis, and pattern recognition of time-series and sequence data, and it can perform well in a variety of such tasks.

  • Overview of Pointer-Generator Networks, Algorithms, and Examples of Implementations

The Pointer-Generator network is a type of deep learning model used in natural language processing (NLP) tasks, and is particularly suited for tasks such as abstract sentence generation, summarization, and information extraction from documents. The network is characterized by its ability to copy portions of text from the original document verbatim when generating sentences.

  • Overview of Variational Autoencoder (VAE) and Examples of Algorithms and Implementations

Variational Autoencoder (VAE) is a type of generative model and a neural network architecture for learning latent representations of data. The VAE learns latent representations by modeling the probability distribution of the data and sampling from it. An overview of VAE is given below.

  • Block K-FAC Overview, Algorithm, and Implementation Examples

Block K-FAC (Block Kronecker-factored Approximate Curvature) is a curvature-approximation method used in the optimization of deep learning models.

CNN (Convolutional Neural Network) is a deep learning model mainly used for computer vision tasks such as image recognition, pattern recognition, and image generation. This section provides an overview of CNNs and implementation examples.

DenseNet (Densely Connected Convolutional Network) was proposed in 2017 by Gao Huang, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. As described in “CNN Overview, Algorithms and Implementation Examples”, DenseNet improves the efficiency of deep network training by introducing “dense” connections between layers of a convolutional neural network and alleviates the vanishing gradient problem.

ResNet is a deep convolutional neural network (CNN) architecture proposed by Kaiming He et al. in 2015, as described in “CNN Overview, Algorithms and Implementation Examples”. ResNet introduces innovative ideas and approaches that have achieved phenomenal performance in computer vision tasks.

GoogLeNet is a convolutional neural network (CNN) architecture developed by Google in 2014, as described in “CNN Overview and Algorithms and Examples of Implementations”. The model achieved state-of-the-art performance in computer vision tasks such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), and GoogLeNet is known for its unique architecture and modular structure.

VGGNet (Visual Geometry Group Network) is a convolutional neural network (CNN) model developed in 2014 and described in “CNN Overview, Algorithms, and Examples of Implementations” that has achieved high performance in computer vision tasks. VGGNet was proposed by researchers in the Visual Geometry Group at the University of Oxford.

  • Overview of Transfer Learning, Algorithms, and Examples of Implementations

Transfer learning, a type of machine learning, is a technique for applying a model or knowledge learned on one task to a different task. Transfer learning is typically useful when little data is available for the new task or when high performance is required. This section provides an overview of transfer learning and various algorithms and implementation examples.
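A minimal Keras sketch of the idea, assuming an ImageNet-pretrained MobileNetV2 as the source model and a hypothetical 5-class target task with datasets `train_ds`/`val_ds`; the frozen base supplies the learned features and only the new head is trained.

```python
from tensorflow import keras

# Reuse ImageNet-pretrained MobileNetV2 features and train only a new classifier head.
base = keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                      input_shape=(160, 160, 3), pooling="avg")
base.trainable = False                     # freeze the pretrained feature extractor

model = keras.Sequential([
    base,
    keras.layers.Dropout(0.2),
    keras.layers.Dense(5, activation="softmax"),   # 5 target classes is an assumption
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)   # train_ds / val_ds are assumed datasets
```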

Multilingual NLP in machine learning is the field of developing natural language processing (NLP) models and applications for multiple languages. It is a key challenge in machine learning and natural language processing and a component of serving different cultural and linguistic communities.

GloVe (Global Vectors for Word Representation) is a type of algorithm for learning word embeddings. GloVe is specifically designed to capture the meaning of words and has an excellent ability to capture the semantic relevance of words. This section provides an overview, algorithm, and example implementation of GloVe.

FastText is an open source library for natural language processing (NLP) developed by Facebook that can be used to learn word embeddings and perform NLP tasks such as text classification. Here we describe the FastText algorithm and an example implementation.

  • Skipgram Overview, Algorithm and Example Implementation

Skip-gram is a method for learning distributed representations of words (word embedding), which is widely used in the field of natural language processing (NLP) to quantify similarity and relevance of meanings by capturing word meanings as vector representations. It is also used in GNNs such as DeepWalk, which is described in “Overview of DeepWalk, Algorithms, and Examples of Implementations”.

ELMo (Embeddings from Language Models) is one of the methods of word embeddings (Word Embeddings) used in the field of natural language processing (NLP), which was proposed in 2018 and has been very successful in subsequent NLP tasks. In this section, we provide an overview of this ELMo, its algorithm and examples of its implementation.

BERT (Bidirectional Encoder Representations from Transformers) was presented by Google researchers in 2018. It is a deep neural network model pre-trained on a large text corpus and one of the most successful pre-training models in the field of natural language processing (NLP). This section provides an overview of BERT, its algorithms, and examples of implementations.

  • Overview of GPT and Examples of Algorithms and Implementations

GPT (Generative Pre-trained Transformer) is a pre-trained model for natural language processing developed by OpenAI, based on the Transformer architecture and trained by unsupervised learning on large data sets.

ULMFiT (Universal Language Model Fine-tuning) is an approach proposed by Jeremy Howard and Sebastian Ruder in 2018 for effectively fine-tuning pre-trained language models on natural language processing (NLP) tasks. The approach aims to achieve high performance on a variety of NLP tasks by combining transfer learning with fine-tuning at each stage of training.

The Transformer was proposed by Vaswani et al. in 2017 and is one of the neural network architectures that has led to revolutionary advances in the field of machine learning and natural language processing (NLP). This section provides an overview of the Transformer model, its algorithm, and its implementation.

  • About Transformer XL

Transformer XL is an extended version of the Transformer, a deep learning model that has proven successful in tasks such as natural language processing (NLP). Transformer XL is designed to model long-term dependencies in context more effectively and can process longer text sequences than previous Transformer models.

  • Overview of the Transformer-based Causal Language Model with Algorithms and Example Implementations

The Transformer-based Causal Language Model is a type of model that has been very successful in natural language processing (NLP) tasks and is based on the Transformer architecture described in “Overview of the Transformer Model and Examples of Algorithms and Implementations”. The following is an overview of the Transformer-based Causal Language Model.

  • About Relative Positional Encoding

Relative Positional Encoding (RPE) is a method for neural network models that use the transformer architecture to incorporate relative positional information of words and tokens into the model. Although transformers have been very successful in many tasks such as natural language processing and image recognition, they are not good at directly modeling the relative positional relationships between tokens. Therefore, RPE is used to provide relative location information to the model.

  • Overview of GANs and their various applications and implementations

GAN (Generative Adversarial Network) is a machine learning architecture known as a generative adversarial network. The model was proposed by Ian Goodfellow in 2014 and has since been used with great success in many applications. This section provides an overview of GANs, their algorithms, and various application implementations.

Parallel distributed processing in machine learning distributes data and computation across multiple processing units (CPUs, GPUs, computer clusters, etc.) and processes them simultaneously to reduce processing time and improve scalability. It plays an important role when processing large data sets and complex models. This section describes concrete implementation examples of parallel distributed processing in machine learning in on-premise and cloud environments.

Object detection technology automatically detects specific objects in an image or video and locates them. Object detection is an important application of computer vision and image processing and is applied to many real-world problems. This section describes various algorithms and implementation examples for object detection.

R-CNN (Region-based Convolutional Neural Networks) is an approach that applies deep learning to object detection tasks. It uses convolutional neural networks (CNNs) over region proposals to predict object classes and bounding boxes, and R-CNNs have shown very good performance in object detection tasks. This section describes an overview of R-CNN, its algorithm, and implementation examples.

Faster R-CNN (Faster Region-based Convolutional Neural Networks) is one of a series of deep learning models that provide fast and accurate results in object detection tasks. It represents a major advance in the field of object detection, solving problems of the earlier R-CNN architectures. This section provides an overview of Faster R-CNN, its algorithms, and examples of implementations.

YOLO (You Only Look Once) is a deep learning-based algorithm for real-time object detection tasks and is one of the most popular models in the fields of computer vision and artificial intelligence.

SSD (Single Shot MultiBox Detector) is one of the deep learning based algorithms for object detection tasks.

Mask R-CNN (Mask Region-based Convolutional Neural Network) is a deep learning-based architecture for object detection and instance segmentation: it not only encloses each object's location in a bounding box but also segments the object at the pixel level, making it a powerful model that combines object detection and segmentation.

EfficientDet is a computer vision model with high performance on object detection tasks; EfficientDet is designed to balance model efficiency and accuracy, providing superior performance with fewer computational resources.

EfficientNet is a lightweight and efficient deep learning model and convolutional neural network (CNN) architecture. EfficientNet was proposed by Tan and Le in 2019 and is designed to achieve high accuracy while optimizing model size and computational resources.

LeNet-5 is one of the most important historical neural network models in the field of deep learning and was proposed in 1998 by Yann LeCun, a pioneer of convolutional neural networks (CNN), as described in “CNN Overview and Algorithm and Implementation Examples”. LeNet-5 was very successful in the handwritten digit recognition task and contributed to the subsequent development of CNNs.

MobileNet is one of the most widely used deep learning models in the field of computer vision: a lightweight and efficient convolutional neural network (CNN) optimized for mobile devices, developed by Google, as described in “CNN Overview, Algorithms and Implementation Examples”. MobileNet can be used for tasks such as image classification, object detection, and semantic segmentation, and it offers superior performance, especially on resource-constrained devices and applications.

SqueezeNet is a lightweight, compact deep learning model and architecture for convolutional neural networks (CNNs), as described in “CNN Overview, Algorithms, and Implementation Examples”. It provides neural networks with small file sizes and low computational complexity and is primarily suited to resource-constrained environments and devices.

A segmentation network is a type of neural network that can be used to identify different objects or regions in an image on a pixel-by-pixel basis and divide them into segments (regions). It is mainly used in computer vision tasks and plays an important role in many applications because it can associate each pixel in an image to a different class or category. This section provides an overview of this segmentation network and its implementation in various algorithms.

Rainbow (“Rainbow: Combining Improvements in Deep Reinforcement Learning”) is a seminal work in the field of deep reinforcement learning that combines several reinforcement learning improvement techniques into a single algorithm that improves the performance of DQN (Deep Q-Network). Rainbow outperformed other algorithms on many reinforcement learning tasks and has become one of the benchmark algorithms in subsequent research.

Prioritized Experience Replay (PER) is a technique for improving Deep Q-Networks (DQN), a type of reinforcement learning. Whereas it is common practice to sample uniformly at random from the experience replay buffer, PER improves on this by preferentially replaying important experiences.

Dueling DQN (Dueling Deep Q-Network) is an algorithm based on Q-learning in reinforcement learning and is a kind of value-based reinforcement learning algorithm. Dueling DQN is an architecture for efficiently estimating Q-values by learning state value functions and advantage functions separately, and this architecture was proposed as an advanced version of Deep Q-Network (DQN).

Deep Q-Network (DQN) combines deep learning and Q-learning: it is a reinforcement learning algorithm for problems with high-dimensional state spaces that approximates the Q-function with a neural network, and it uses techniques such as replay buffers and fixed target networks to improve learning stability.

Vanilla Q-Learning is a type of reinforcement learning, which is one of the algorithms used by agents to learn optimal behavior while interacting with their environment. Q-Learning is based on a mathematical model called the Markov Decision Process (MDP), in which the agent learns the value (Q-value) associated with a combination of State and Action, and selects the optimal action based on that Q-value.
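The core of the method is the Q-value update rule; the following tabular sketch (toy state/action sizes and reward values chosen purely for illustration) shows a single update toward the target r + γ·max Q(s', ·).

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])      # bootstrap from the best next action
    Q[s, a] += alpha * (td_target - Q[s, a])       # learning-rate-weighted correction

# Example with a toy table of 5 states and 2 actions (illustrative values only).
Q = np.zeros((5, 2))
q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])   # 0.1 after a single update from zero initialization
```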

Soft Actor-Critic (SAC) is a reinforcement learning algorithm known primarily as an effective approach for problems with continuous action spaces, and it has several advantages over other algorithms such as Q-learning and policy gradients.

Proximal Policy Optimization (PPO) is a type of reinforcement learning algorithm and one of the policy optimization methods, which is based on the policy gradient method described in “Overview of Policy Gradient Methods, Algorithms, and Examples of Implementations” and designed for improved stability and high performance.

A3C (Asynchronous Advantage Actor-Critic) is a type of deep reinforcement learning algorithm that uses asynchronous learning to train reinforcement learning agents. A3C is particularly suited to tasks in continuous action spaces and has attracted attention for its ability to make effective use of large-scale computational resources.

Deep Deterministic Policy Gradient (DDPG) is an algorithm that extends the policy gradient method to reinforcement learning tasks with continuous state and action spaces, using deep neural networks to solve reinforcement learning problems in continuous action spaces.

  • Overview of REINFORCE (Monte Carlo Policy Gradient) and Examples of Algorithms and Implementations

REINFORCE (or Monte Carlo Policy Gradient) is a type of reinforcement learning and a policy gradient method. REINFORCE is a method for directly learning policies and finding optimal action selection strategies.

  • Actor-Critic Overview, Algorithm, and Implementation Examples

Actor-Critic is an approach to reinforcement learning that combines policy and value functions (value estimators).

Variational methods are used to find optimal solutions for functions or probability distributions and are among the optimization methods widely used in machine learning and statistics. In particular, they play an important role in machine learning models such as probabilistic generative models and variational autoencoders (VAE).

Variational Bayesian Inference is one of the probabilistic modeling methods in Bayesian statistics, and is used when the posterior distribution is difficult to obtain analytically or computationally expensive.

This section provides an overview of the various algorithms for this variational Bayesian learning and their Python implementations in topic models, Bayesian regression, mixture models, and Bayesian neural networks.

  • Overview of Bayesian Neural Networks and Examples of Algorithms and Implementations

Bayesian neural networks (BNNs) are architectures that integrate probabilistic elements into neural networks: whereas regular neural networks are deterministic, BNNs build probabilistic models based on Bayesian statistics. This allows the model to account for uncertainty, and BNNs have been applied in a variety of machine learning tasks.

A graph neural network (GNN) is a type of neural network for data with a graph structure, in which nodes and edges express relationships between elements. Examples of graph-structured data include social networks, road networks, chemical molecular structures, and knowledge graphs.

This section provides an overview of GNNs and various examples and Python implementations.

Graph Convolutional Neural Networks (GCN) is a type of neural network that enables convolutional operations on data with a graph structure. While regular convolutional neural networks (CNNs) are effective for lattice-like data such as image data, GCNs were developed as a deep learning method for non-lattice-like data with very complex structures, such as graph data and network data.

    ChebNet (Chebyshev network) is a type of Graph Neural Network (GNN), which is one of the main methods for performing convolution operations on graph-structured data. ChebNet is an approximate implementation of convolution operations on graphs using Chebyshev polynomials, which are used in signal processing.

    Graph Attention Network (GAT) is a deep learning model that uses an attention mechanism to learn the representation of nodes in a graph structure.

    • Graph Isomorphism Network (GIN) Overview, Algorithm and Example Implementation

    Graph Isomorphism Network (GIN) is a neural network model for learning isomorphism of graph structures. The graph isomorphism problem is the problem of determining whether two graphs have the same structure, and is an important approach in many fields.

    GraphSAGE (Graph Sample and Aggregated Embeddings) is one of the graph embedding algorithms for learning node embeddings (vector representation) from graph data. By sampling and aggregating the local neighborhood information of nodes, it effectively learns the embedding of each node. This approach makes it possible to obtain high-performance embeddings for large graphs.

    Bayesian deep learning refers to attempts to incorporate the principles of Bayesian statistics into deep learning. In ordinary deep learning, model parameters are treated as non-probabilistic values and optimization algorithms are used to find the optimal parameters; Bayesian deep learning instead treats the parameters probabilistically so that the model can express uncertainty. For more information on the application of uncertainty to machine learning, please refer to “Uncertainty and Machine Learning Techniques” and “Overview of Statistical Learning Theory (Non-Equation Explanation)”.

    Dynamic Graph Neural Networks (D-GNN) are a type of Graph Neural Network (GNN) designed to deal with dynamic graph data, in which nodes and edges change over time. (For more information on GNNs, see “Graph Neural Networks: Overview, Applications, and Example Python Implementations”.) The approach has been used in a variety of domains including time-series data, social network data, traffic network data, and biological network data.

    Labeling of image information can be achieved by various machine learning approaches, as described below. This time, we would like to consider the fusion of these machine learning approaches and the constraint satisfaction approach, which is a rule-based approach. These approaches can be extended to labeling text data using natural language processing, etc.

    Meta-Learners are one of the key concepts in the domain of machine learning and can be understood as “algorithms that learn learning algorithms”. In other words, Meta-Learners are an approach for automatically acquiring learning algorithms that can adapt to different tasks and domains. This section describes the Meta-Learner concept, various algorithms, and concrete implementations.

    Federated Learning is a new approach to training machine learning models that addresses the challenges of privacy protection and efficient model training in distributed data environments. Unlike traditional centralized model training, Federated Learning trains models on the device or client itself and performs distributed learning without sending models to a central server. This section provides an overview of Federated Learning, its various algorithms, and examples of implementations.

    In the area of machine learning, environments with rich libraries such as Python and R have become the de facto standard. However, other language environments could not freely use those libraries, and there were hurdles to making full use of the latest algorithms.

    In contrast, in recent years (since 2018), frameworks that can interoperate with the Python environment, such as libPython-clj, have appeared, along with mathematical frameworks that leverage Java and C libraries, such as fastmath, and deep learning frameworks such as Cortex and Deep Diamond. These developments have led to active discussion of approaches to machine learning, for example in scicloj.ml, a well-known machine learning community for Clojure.

    With a view to introducing small-scale deep learning into algorithms for reinforcement learning, online learning, and so on, I describe the implementation of neural networks in Clojure (including an understanding of the principles of neural network algorithms). The base implementation follows the Qiita article “Building Neural Networks from Zero and Observing Hidden Layers in Clojure”, with some additions.

    Hierarchical Temporal Memory (HTM) is a machine learning technology that aims to capture the structural and algorithmic properties of the neocortex. HTM is a neural-network-like pattern recognition algorithm based on the theory of “auto-associative memory” advocated in “On Intelligence” (published in Japanese as “Thinking Brain, Thinking Computer”) by Jeff Hawkins, the inventor of the handheld computers (Palm, Treo) that were the prototypes of today’s smartphones.

    A comparison is made between TensorFlow, Keras, and PyTorch, which are open-source frameworks for deep learning.

    Artificial intelligence is defined as “efforts to automate intellectual tasks that are normally performed by humans”. This concept encompasses a number of approaches that have nothing to do with learning. Early chess programs, for example, simply incorporated rules hard-coded by programmers and cannot be called machine learning.

    For quite some time, many experts believed that in order to achieve a level of AI comparable to that of humans, a large enough number of rules to manipulate knowledge would have to be explicitly defined and manually incorporated by programmers. However, it was impossible to track down explicit rules for solving more complex and fuzzy problems like image classification, speech recognition, and language translation, and machine learning was born as a new approach to replace them.

    With a machine learning algorithm, you give the system samples of what you expect and it extracts rules for performing a data processing task. In machine learning and deep learning, the main task is “to transform data in a meaningful way”: machine learning learns useful representations from the given input data, and these representations are then used to approach the expected output.

    As a hello world of deep learning technology, a concrete implementation and evaluation of handwriting recognition on MNIST data with Python/Keras.

    In this article, we discuss the manipulation of tensors, the mathematical elements of neural networks, using NumPy. In general, all current machine learning systems use tensors as the basic data structure. A tensor is essentially a container for data, in most cases numerical data.

    A tensor is defined by the following three main attributes. (1) Number of axes (rank): for example, a 3D tensor has three axes and a matrix has two; in Python libraries such as NumPy, the number of axes is given by the tensor’s ndim attribute. (2) Shape: an integer tuple giving the number of dimensions along each axis of the tensor; for example, a matrix might have shape (3, 5) and a 3D tensor shape (3, 3, 5). The shape of a vector has a single element, such as (5,), while the shape of a scalar is empty (()). (3) Data type: the type of the data contained in the tensor, usually called dtype in Python libraries; for example, a tensor can be of type float32, uint8, or float64. Note that most libraries, including NumPy, do not have string tensors, since strings are variable-length and such an implementation is not possible.
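These three attributes can be inspected directly in NumPy; the rank-3 tensor below is constructed only to reproduce the (3, 3, 5) shape used in the example.

```python
import numpy as np

x = np.array([[[1, 2, 3, 4, 5],        # a rank-3 tensor of shape (3, 3, 5)
               [6, 7, 8, 9, 10],
               [11, 12, 13, 14, 15]]] * 3, dtype="float32")

print(x.ndim)    # 3        -> number of axes (rank)
print(x.shape)   # (3, 3, 5) -> dimensions along each axis
print(x.dtype)   # float32   -> data type of the elements
```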

    The stochastic gradient descent and error back propagation methods using tensors are described.

    The specific Keras workflow is described and concrete problems are solved: (1) define the training data (input tensors and target tensors); (2) define a network (model) of layers that maps the inputs to the targets; (3) set up the learning process by choosing a loss function, an optimizer, and metrics to monitor; (4) iterate over the training data by calling the model’s fit method.
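A minimal end-to-end sketch of those four steps, using randomly generated dummy data and an arbitrary two-layer network purely for illustration:

```python
import numpy as np
from tensorflow import keras

# (1) training data: 1000 random 20-dimensional inputs with binary targets (dummy data)
x_train = np.random.random((1000, 20))
y_train = np.random.randint(0, 2, size=(1000,))

# (2) a network mapping the inputs to a single probability
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# (3) learning process: loss, optimizer, and monitored metric
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

# (4) iterate on the training data with fit()
model.fit(x_train, y_train, epochs=4, batch_size=32)
```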

    As an example of binary classification (two-class classification), the task of dividing a movie review into positive and negative reviews based on the content of the movie review text is described.

    The data comes from the IMDb (Internet Movie Database) dataset (preprocessed and included in Keras): 50,000 reviews labeled “positive” or “negative”, with a 50/50 split of negative and positive, of which 25,000 (50% of the reviews) are used as training data.

    The actual computation using Dense layers and the sigmoid function in Keras is described.

    We will build a network that classifies the Reuters newswire data (packaged as part of Keras) into mutually exclusive topics (classes). Because of the large number of classes, this problem is an example of multiclass classification. Each data point can be classified into only one category (topic), so more precisely it is a single-label multiclass classification problem. If each data point could be classified into multiple categories (topics), we would be dealing with a multilabel multiclass classification problem.

    We have implemented and evaluated this problem using Keras, mainly with Dense layers and the ReLU function.

    We will discuss the application of regression to problems that predict continuous values rather than discrete labels (such as predicting tomorrow’s temperature based on weather data, or the time it will take to complete a project based on a software project specification).

    The task is to predict the price of housing in the suburbs of Boston in the mid-1970s, using data points about the Boston suburbs at that time, such as crime rates and local property tax rates. The dataset contains a relatively small number of data points (506), divided into 404 training samples and 102 test samples, and the input features use different scales: some are rates between 0 and 1, some take values from 1 to 12, and some take values from 0 to 100.

    The approach is characterized by data normalization, the use of mean squared error (MSE) as the loss function and mean absolute error (MAE) as a metric, and k-fold cross-validation to compensate for the small number of data points.
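A hedged sketch of the k-fold procedure on such a small dataset; `build_model` is an assumed helper that returns a freshly compiled Keras regression model (MSE loss, MAE metric), and k = 4 with 100 epochs are illustrative choices.

```python
import numpy as np

def k_fold_validation(x, y, build_model, k=4, epochs=100):
    """Split the (small) training set into k folds, train on k-1 folds and
    validate on the remaining one, then average the validation MAE scores.
    `build_model` is an assumed helper returning a compiled Keras model."""
    fold_size = len(x) // k
    scores = []
    for i in range(k):
        val_idx = slice(i * fold_size, (i + 1) * fold_size)
        x_val, y_val = x[val_idx], y[val_idx]
        x_tr = np.concatenate([x[:i * fold_size], x[(i + 1) * fold_size:]])
        y_tr = np.concatenate([y[:i * fold_size], y[(i + 1) * fold_size:]])
        model = build_model()
        model.fit(x_tr, y_tr, epochs=epochs, batch_size=16, verbose=0)
        _, mae = model.evaluate(x_val, y_val, verbose=0)   # [loss, mae]
        scores.append(mae)
    return np.mean(scores)
```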

    We will discuss unsupervised learning. This category of machine learning finds important transformations of the input data without the help of target values. Unsupervised learning may be aimed at data visualization, data compression, or data denoising, or it may be aimed at gaining a better understanding of the correlations represented by the data. Unsupervised learning is an integral part of data analysis and is often needed to gain a better understanding of a data set before solving supervised learning problems.

    Two categories of unsupervised learning are well known: dimensionality reduction and clustering. There are also self-supervised methods such as autoencoders.

    The article also discusses overfitting and underfitting, as well as computational efficiency and optimization through regularization and dropout.

    In this article, we will discuss convolutional neural networks (CNNs), also known as convnets, a deep learning model that is used almost without exception in computer vision applications. We describe how to apply CNNs to the MNIST image classification problem of handwritten character recognition.

    We apply two further basic methods for applying deep learning to small data sets. One is feature extraction with a pre-trained model, which improves the accuracy from 90% to 96%. The second is fine-tuning of the pre-trained model, which gives a final accuracy of 97%. These three strategies (training a small model from scratch, feature extraction using a pre-trained model, and fine-tuning a pre-trained model) are some of the tools available when classifying images with a small dataset.

    The dataset we will use is the Dogs vs. Cats dataset, which is not packaged with Keras. This dataset was provided by a Kaggle computer vision competition in late 2013, and the original data can be downloaded from the Kaggle web page.

    In this article, we will discuss how to improve CNNs by using pre-trained models. VGG16 is a simple CNN architecture widely used with ImageNet, trained on classes representing animals and everyday objects. VGG16 is an older model, not quite state of the art, and somewhat heavier than many of the latest models.

    There are two ways to use a trained network: feature extraction and fine-tuning.

    Since 2013, a wide range of methods have been developed to visualize and interpret these representations. In this article, we will focus on three of the most useful and easy-to-use methods.

    (1) Visualization of a CNN’s intermediate outputs (activations of intermediate layers): this provides an understanding of how the input is transformed by the layers of the CNN and gives insight into the meaning of individual CNN filters. (2) Visualization of a CNN’s filters: to understand what kind of visual patterns and visual concepts each filter responds to. (3) Visualization of a heatmap of class activations in an image: this allows us to understand which parts of an image contributed to a particular class, and thus to localize objects in the image.

    Deep learning for natural language (text): the two basic deep learning algorithms for processing sequences are recurrent neural networks (RNNs) and one-dimensional convolutional neural networks (CNNs).

    These models can map the statistical structure of written language at a level sufficient to solve many simple text-processing tasks. Deep learning for natural language processing (NLP) is pattern recognition applied to words, sentences, and paragraphs, in much the same way that computer vision is pattern recognition applied to pixels.

    Text vectorization can be done in multiple ways. (1) divide the text into words and convert each word into a vector, (2) divide the text into characters and convert each character into a vector, (3) extract the words or characters of an n-gram and convert the n-gram into a vector.

    The vectors can take the form of one-hot encodings or word embeddings. Various pre-trained word embedding databases are available (Word2Vec, Global Vectors for Word Representation (GloVe)), along with datasets such as the IMDb dataset.
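As a toy illustration of the word-level one-hot option (no real tokenizer, illustrative vocabulary only):

```python
import numpy as np

samples = ["The cat sat on the mat", "The dog ate my homework"]

# Word-level one-hot encoding: each word gets an index, and each sample becomes
# a (max_length, vocabulary_size) binary tensor.
token_index = {}
for sample in samples:
    for word in sample.split():
        token_index.setdefault(word, len(token_index) + 1)   # index 0 is left unused

max_length = 10
results = np.zeros((len(samples), max_length, len(token_index) + 1))
for i, sample in enumerate(samples):
    for j, word in enumerate(sample.split()[:max_length]):
        results[i, j, token_index[word]] = 1.0

print(results.shape)   # (2, 10, 11)
```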

    A common feature of densely connected networks and convolutional neural networks is that they have no memory. Each input passed to these networks is processed separately, and no state is maintained across inputs. To process a sequence or time series with such networks, the entire sequence must be provided to the network at once so that it can be treated as a single data point. Such networks are called feedforward networks.

    In contrast, when people read text, they follow the words with their eyes and memorize what they see, which allows the meaning of the sentence to be represented fluidly. Biological intelligence processes information incrementally while maintaining an internal model of what it is processing; this model is built from past information and is updated whenever new information arrives.

    Recurrent Neural Networks (RNNs) work on the same principle, though in a much simpler way. In this case, the processing of a sequence is done by iteratively processing the elements of the sequence. The information related to what is detected in the process is then maintained as state. In effect, an RNN is a kind of neural network with an inner loop.

    In this article, I describe the implementation of SimpleRNN, a basic RNN in Keras, as well as LSTM and GRU, which are more advanced recurrent layers.
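The three layer types can be swapped into the same model skeleton; the vocabulary size, embedding dimension, and unit counts below are assumptions for the sketch, not values from the article.

```python
from tensorflow import keras

def build_recurrent_model(layer_cls):
    """Same skeleton, different recurrent layer on top of an Embedding layer."""
    return keras.Sequential([
        keras.layers.Embedding(10000, 32),       # 10,000-word vocabulary, 32-dim embeddings
        layer_cls(32),                           # SimpleRNN, LSTM, or GRU with 32 units
        keras.layers.Dense(1, activation="sigmoid"),
    ])

simple_rnn = build_recurrent_model(keras.layers.SimpleRNN)
lstm       = build_recurrent_model(keras.layers.LSTM)
gru        = build_recurrent_model(keras.layers.GRU)
```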

    We describe an advanced method to improve the performance and generalization power of RNNs. In this paper, we take the problem of predicting temperature as an example, and access time-series data such as temperature, pressure, and humidity sent from sensors installed on the roof of a building. Using these data, we solve the difficult problem of predicting the temperature 24 hours after the last data point, and discuss the challenges we face when dealing with time series data.

    Specifically, I describe an approach that uses recurrent dropout, recurrent layer stacking, and other techniques for optimization, and uses GRU (Gated Recurrent Unit) layers.

    The last method we will discuss is the bidirectional RNN. Bidirectional RNNs are among the most common RNNs and can perform better than regular RNNs on certain tasks. They are often used in natural language processing (NLP) and can be regarded as a versatile tool of deep learning, a kind of Swiss Army knife for NLP.

    The defining feature of an RNN is that it depends on order (time). Shuffling the time steps or reversing the order can therefore completely change the representation that the RNN extracts from the sequence. Bidirectional RNNs exploit this order sensitivity by processing sequences in both the forward and reverse directions, capturing patterns that would be overlooked in one direction alone.

    In this article, we will discuss building a complex network model using the Keras Functional API as a best practice for more advanced deep learning.

    When considering a deep learning model that predicts the market price of used clothing, the inputs to this model include user-provided metadata (such as the brand of the item and how old it is), user-provided text descriptions, and pictures of the item. The model is multimodal using these.

    Some tasks require the prediction of multiple target attributes from the input data. Consider a multi-output model that takes the text of a full-length novel or short story and classifies the novel by genre while also predicting when it was written.

    Or, for a combination of the above, you can use the Functional API in Keras to build a flexible model.
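A hedged Functional API sketch of such a multi-input model (a text description plus an 8-dimensional metadata vector jointly predicting a single price); all sizes and layer choices are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

text_in = keras.Input(shape=(None,), dtype="int32", name="description")  # token ids
meta_in = keras.Input(shape=(8,), name="metadata")                       # numeric features

x_text = layers.Embedding(10000, 32)(text_in)
x_text = layers.LSTM(32)(x_text)                   # summarize the text branch
x_meta = layers.Dense(16, activation="relu")(meta_in)

merged = layers.concatenate([x_text, x_meta])      # join the two branches
price = layers.Dense(1, name="price")(merged)      # single regression output

model = keras.Model(inputs=[text_in, meta_in], outputs=price)
model.compile(optimizer="rmsprop", loss="mse")
```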

    In this article, I will discuss how to monitor what is happening in the model during training and optimization of DNN. When training a model, it is often difficult to predict from the beginning how many epochs are needed to optimize the loss value in the validation data.

    For these epochs, if the training can be stopped when the improvement of the loss value in the validation data is no longer observed, the task can be performed more effectively. This is made possible by callbacks in Keras.
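A minimal example of such callbacks, assuming a `model` and training arrays from the earlier workflow; `EarlyStopping` halts training when the validation loss stops improving, and `ModelCheckpoint` keeps the best weights seen so far.

```python
from tensorflow import keras

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=3),   # stop after 3 stagnant epochs
    keras.callbacks.ModelCheckpoint("best_model.keras", monitor="val_loss",
                                    save_best_only=True),            # keep only the best weights
]
# model.fit(x_train, y_train, epochs=100, validation_split=0.2, callbacks=callbacks)
# (`model`, `x_train`, `y_train` are assumed to come from the earlier Keras workflow.)
```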

    TensorBoard is a browser-based visualization tool that is included with TensorFlow. Note that TensorBoard can be used only when TensorFlow is used as the backend of Keras.

    The main purpose of TensorBoard is to let you visually monitor everything that happens inside the model during training. If you also monitor information other than the model’s final loss, you can see more clearly what the model is and is not doing and quickly get an overall picture. The capabilities of TensorBoard include (1) visual monitoring of metrics during training, (2) visualization of the model architecture, (3) visualization of histograms of activations and gradients, and (4) 3D exploration of embeddings.

    In this article, I will discuss the optimization of models.

    If all you need is something that works for the time being, you can experiment blindly with the architecture and it will work reasonably well. In this section, instead of settling for something that merely works, we discuss how to make models work well enough to win machine learning competitions.

    First, I will discuss normalization (batch normalization) and depthwise separable convolution as important design patterns besides the residual connections mentioned above. These patterns become important when building high-performance deep convolutional neural networks (DCNNs).

    When building a deep learning model, you need to make a variety of decisions that seem to be at your personal discretion. Specifically, how many layers should there be in the stack? How many units or filters should be in each layer? What function should be used as the activation function? How many dropouts should be used? and so on. These architecture-level parameters are called hyperparameters to distinguish them from model parameters that are trained through back-propagation.

    Another powerful method for obtaining the best results is model ensembling. An ensemble is a pooling of the predictions of different models to produce better predictions.

      There are two methods for training multiple time series with a single deep learning model in Keras. The advantage of the first method is that the model is simpler and therefore faster to train and predict with than the second; the advantage of the second method is that it can be customized for each time series, making it easier to improve accuracy than the first.

      Generative Model

      Conditional Generative Models are a type of generative model that has the ability to generate data given certain conditions. Conditional Generative Models play an important role in many application fields because they can generate data based on given conditions. This section describes various algorithms and concrete implementations of this conditional generative model.

      There are open source tools such as text-generation-webui and AUTOMATIC1111 that allow codeless use of generation modules such as ChatGPT and Stable Diffusion. In this article, we describe how to use these modules for text generation and image generation.

      Huggingface is an open source platform and library for machine learning and natural language processing (NLP). The tools and resources provided by Huggingface are supported by an open source community, where there is an active effort to share code and models. This section describes the Huggingface Transformers, documentation generation, and implementation in Python.

      Attention in deep learning is an important concept used as part of neural networks. The Attention mechanism refers to the ability of a model to assign different levels of importance to different parts of the input, and the application of this mechanism has recently been recognized as being particularly useful in tasks such as natural language processing and image recognition.

      This article provides an overview of the Attention mechanism without using mathematical formulas, together with an example of its implementation in Python.
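For readers who do want the formula, the standard scaled dot-product formulation softmax(Q K^T / sqrt(d_k)) V can be sketched in a few lines of NumPy; this is a generic illustration of the mechanism, not necessarily the implementation used in the linked article.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each value by how well its key matches the query:
    softmax(Q K^T / sqrt(d_k)) V, the core of the attention mechanism."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                         # similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over the keys
    return weights @ V

Q = np.random.randn(4, 8)   # 4 query positions, 8-dimensional
K = np.random.randn(6, 8)   # 6 key/value positions
V = np.random.randn(6, 8)
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```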

      In this article, we will discuss text generation using LSTM as generative deep learning with Python and Keras.

      As far as data generation using deep learning is concerned, in 2015 Google’s DeepDream algorithm was proposed, transforming images into psychedelic artworks full of dog eyes; 2016 saw a short film called “Sunspring”, based on a script (with complete dialogue) generated by an LSTM algorithm, as well as the generation of various types of music.

      These are achieved by using a deep learning model to extract samples from the statistical latent space of the learned images, music, and stories.

      In this article, I first describe a method for generating sequence data using a recurrent neural network (RNN). I use text data as an example, but exactly the same method can be applied to all kinds of sequence data (e.g., music, or handwriting and brush-stroke data). It can also be used for speech synthesis and for dialogue generation in chatbots such as Google’s Smart Reply.
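A core ingredient of such character-level generation is sampling the next token from the model's predicted distribution with a temperature; the sketch below assumes `preds` comes from an already trained language model and uses a toy four-character distribution.

```python
import numpy as np

def sample_next_char(preds, temperature=1.0):
    """Reweight the predicted character distribution by a temperature and sample:
    low temperature -> conservative choices, high temperature -> more surprising ones."""
    preds = np.log(np.asarray(preds, dtype="float64") + 1e-10) / temperature
    probs = np.exp(preds) / np.sum(np.exp(preds))          # renormalized distribution
    return np.argmax(np.random.multinomial(1, probs, size=1))

# e.g. probabilities over a 4-character alphabet from an (assumed) trained LSTM
print(sample_next_char([0.1, 0.6, 0.2, 0.1], temperature=0.5))
```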

      Specific implementations and applications of evolving deep learning techniques (OpenPose, SSD, AnoGAN, Efficient GAN, DCGAN, Self-Attention GAN, BERT, Transformer, GAN, PSPNet, 3DCNN, ECO) using PyTorch.

      Theory and Application

      From the book “Deep Learning” published by the Japanese Society for Artificial Intelligence, I will describe the overall picture of deep learning.

      Broadly speaking, there are two types of models: deterministic models, in which the output is determined uniquely by the input, and probabilistic models, which are based on stochastic formulations. Most deterministic neural networks are classified into hierarchical neural networks and autoencoders.

      Hierarchical neural networks have a feed-forward structure in which signals propagate from input to output through the connections, and they are mainly used for supervised learning. Hierarchical neural networks include perceptrons, multilayer perceptrons, deep (hierarchical) neural networks, recursive neural networks, convolutional neural networks, and recurrent neural networks.

      In “Bayesian Learning for Neural Networks,” Neal showed that a one-hidden-layer neural network is equivalent to a Gaussian process in the limit as the number of hidden units goes to infinity. Therefore, by considering a Gaussian process instead of a neural network, the optimization of the many weights in a neural network becomes unnecessary, and the predictive distribution can be obtained analytically. In addition, Gaussian processes have a natural structure as a probabilistic model: unlike neural networks, for which it is hard to predict what will be learned, Gaussian processes can express prior knowledge about a problem through kernel functions and can naturally handle objects that cannot be trivially vectorized, such as time series and graphs.

      • Overview of Multi-Task Learning and Examples of Applications and Implementations

      Multi-Task Learning is a machine learning method that simultaneously learns multiple related tasks. Usually, each task has a different data set and objective function, but Multi-Task Learning aims to incorporate these tasks into a model at the same time so that they can complement each other by utilizing their mutual relevance and shared information.

      Here, we provide an overview of methods for multi-task learning, such as shared-parameter models, model distillation, transfer learning, and multi-objective optimization, and discuss examples of applications in natural language processing, image recognition, speech recognition, and medical diagnosis, as well as a simple implementation in Python.

We will discuss the optimization of the algorithm and the discriminant function of the classifier for image information. First, we discuss the gradient descent method, in which the gradient of the objective function J is computed and the parameters are updated at each step m in the direction opposite to the gradient.
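
Expressed as code, the update rule described above looks roughly like the following NumPy sketch; the quadratic toy objective and the learning rate are assumptions chosen only to show the mechanics of the update.

```python
# Plain gradient descent: theta_{m+1} = theta_m - eps * grad J(theta_m).
import numpy as np

def grad_J(theta):
    # Gradient of an assumed toy objective J(theta) = ||theta - 3||^2 / 2
    return theta - 3.0

theta = np.zeros(2)
eps = 0.1                                # learning rate
for m in range(100):
    theta = theta - eps * grad_J(theta)  # step opposite to the gradient
# theta now approaches the minimizer [3, 3]
```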

In image processing, local feature extraction, statistical feature extraction, coding, and pooling can each be regarded as a single module, and a structure that stacks these modules in multiple layers is called a deep structure. The method of learning this deep structure from input to output in an end-to-end manner is called deep learning. In deep learning it is common to build the constituent modules from neural networks; the resulting multilayer network is called a deep neural network. By using deep learning, it is possible to build a system that predicts the desired output for input data even without detailed knowledge of the local feature extraction and coding methods mentioned above. On the other hand, designing an appropriate network structure and learning its parameters are difficult, and the amount of training data required increases, which are issues when using deep learning.

From the book "Artificial Intelligence: From the Mysteries of the Mind to the Science of the Mind" published by Iwanami Shoten, I introduce Geoffrey Hinton's "Where Do Features Come From?".

      There are two major ways in which the human brain can represent various objects and events in the external world (shapes of objects, arrangement of scenes, meanings of words and sentences, etc.) as spatio-temporal patterns of neural activity.

      One is to prepare a large pool of neurons in which there is one neuron for each object, and try to represent each object by activating one neuron in the pool. The other is to represent each entity by the activity of many neurons, with each neuron being involved in the representation of many entities (distributed representation).

      If we assume that the information in the human mind is made up of distributed representations, where do these “features” come from? Hinton goes on to discuss this further.

I would like to start with Geoffrey Hinton's paper on autoencoders, "Reducing the Dimensionality of Data with Neural Networks".

      The autoencoder is trained by feeding the same vector to the input and output layers. The idea is to make the number of neurons in the middle layer smaller than in the input and output layers, and to compress the data as features by extracting the output of the middle layer.
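
A minimal Keras sketch of this idea follows; the 784→32→784 layer sizes and the MNIST-like input shape are assumptions, and the point is only that the middle layer is narrower than the input/output layers and that the same vector is used as both input and target.

```python
# Undercomplete autoencoder: reconstruct the input through a narrower middle layer.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
code = layers.Dense(32, activation="relu")(inputs)       # narrow middle (bottleneck) layer
outputs = layers.Dense(784, activation="sigmoid")(code)  # reconstruction of the input

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, code)   # used to extract the compressed features
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x, x, epochs=10)    # the same data appears as input and target
```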

      In this article, I will discuss convolutional neural networks as an application to image recognition technology.

      In order to classify images, we can consider a network in which the units of the input layer correspond to each pixel value of the image, and a unit in one layer is connected to all units in the neighboring layers. This layer is called the fully-connected layer. However, when applied to large images, the number of parameters in the network becomes huge, making it difficult to train.

Here, we consider how to reduce the number of parameters and make training easier by constraining the structure of the network using properties peculiar to images. We introduce the idea of local features: pixels in a neighborhood have a strong relationship, and the relationship becomes weaker as the distance between pixels increases.

In this way, a unit in one layer is connected not to all units in the layer below, but only to the group of units (local units) that lie in the neighborhood of that unit. The local region to which an upper-layer unit is connected is called the local receptive field. By restricting the connections to a local region, the number of parameters can be reduced compared with a fully-connected network.

Local features are valid in any region of the image; if a feature is useful in one part of the image, it can be assumed to be useful in other parts as well, so the weights can be shared. With this weight sharing, the number of parameters can be reduced further.

In order to take advantage of the spatial characteristics of the image, the units are arranged on a plane, and this group of units is called a feature map. Producing a feature map by applying a kernel to the input image corresponds to convolving the kernel with the input image. A layer with the properties of local receptive fields and weight sharing is called a convolution layer.
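
The following Keras sketch contrasts the parameter counts of a fully-connected layer and a convolution layer on an assumed 28×28 grayscale input, which is the effect of the local receptive fields and weight sharing described above; the layer widths and kernel size are illustrative assumptions.

```python
# Parameter count: fully-connected vs. convolution on an assumed 28x28x1 input.
from tensorflow import keras
from tensorflow.keras import layers

dense_model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(64),                 # 784*64 + 64 = 50,240 parameters
])

conv_model = keras.Sequential([
    layers.Conv2D(64, kernel_size=3, input_shape=(28, 28, 1)),  # 3*3*1*64 + 64 = 640 parameters
])

dense_model.summary()
conv_model.summary()
```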

      • Word2Vec Dimensionality reduction and distributed representation

Word2Vec is an open-source deep learning technique proposed by Tomas Mikolov et al. In essence, Word2Vec vectorizes words (200 dimensions with the default parameters), which makes it possible to position words in a 200-dimensional space, to measure the similarity between words (e.g., by cosine similarity), and to perform clustering.

First, word2vec takes as input the Bag-of-Words representation of the surrounding words wt-5, wt-4, …, wt-1, wt+1, …, wt+4, wt+5, and trains a CBOW (Continuous Bag-of-Words) neural network to output the focused word wt.
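
For reference, here is a minimal usage sketch with the gensim library (assuming gensim 4.x); the toy corpus and parameter values are assumptions, with sg=0 selecting the CBOW architecture described above.

```python
# CBOW word2vec with gensim (assumed toy corpus; real use needs a large corpus).
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

model = Word2Vec(
    sentences,
    vector_size=200,  # dimensionality of the word vectors
    window=5,         # context words on each side of the focused word
    sg=0,             # 0 = CBOW, 1 = skip-gram
    min_count=1,
)

vec = model.wv["cat"]                    # 200-dimensional vector for "cat"
sim = model.wv.similarity("cat", "dog")  # cosine similarity between two words
```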

One feature of natural language that differs greatly from image recognition and speech recognition is that the processing target is a discrete "symbol". On the other hand, the contents of a neural network are continuous values represented by vectors and matrices (optimization is also performed as continuous-function computation), so it is necessary to convert the "discrete" symbols such as words and sentences, which are the processing units of natural language processing, into "continuous" real-valued data such as vectors and matrices.

      There are various models to represent such natural language.

One means of bridging this gap is the "one-hot vector representation": a vector of a certain dimension in which only one element is 1 and the rest are 0. Given a predetermined vocabulary V, the number of words is denoted |V|, and each word in V is assigned a word number from 1 to |V|. If the word number of the i-th word in a given sentence is n, then the vector xi representing that word has its n-th element set to 1 and all other elements set to 0.
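
As a small NumPy sketch of this one-hot representation (the vocabulary here is an assumption for illustration):

```python
# One-hot vector: |V|-dimensional, a single 1 at the word's index, 0 elsewhere.
import numpy as np

vocab = ["cat", "dog", "sat", "on", "mat"]      # assumed vocabulary V, |V| = 5
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    x = np.zeros(len(vocab))
    x[word_to_id[word]] = 1.0
    return x

print(one_hot("dog"))   # [0. 1. 0. 0. 0.]
```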

In this way, the representation of an event by one or a few characteristic elements is called a "local representation". On the other hand, when an event is represented as a collection of various features that are shared with other events, it is called a "distributed representation".

An overview of graph neural network technology, a framework for performing deep learning on graph data. It is used for estimating physical properties of compounds, natural language processing using Attention, co-occurrence networks, and image information processing.

      • Graph Neural Networks (2) Utilizing Tools
      • Neural Networks as Applied Models of Bayesian Inference

      Neural networks, like linear and logistic regression, are probabilistic models that directly estimate the predicted value y from the input x. In this section, we describe a continuous-value regression algorithm using a neural network. Unlike linear regression models, the main feature of neural networks is that they can learn from data a nonlinear function for predicting y from x.

As with many of the models described so far, we treat the neural network in a fully Bayesian manner and solve all learning and prediction by probabilistic (approximate) inference. Compared with ordinary neural networks obtained by maximum likelihood or MAP estimation, this has the advantages that overfitting is naturally suppressed and that the uncertainty and confidence of predictions can be treated quantitatively.

From "An Introduction to Deep Learning" in the Machine Learning Startup Series. Reading notes are provided.

      In this article, we will discuss learning for alignment sorting.

In this section, we describe an algorithm that learns how to sort alignments by being presented with a number of correct alignments (positive examples) and incorrect alignments (negative examples). What distinguishes the techniques in this section is that they require some sample data for learning. This data can be provided by the algorithm itself (for example, only a subset of the correspondences to be judged), specified by the user, or taken from external resources.

      In this section, we will discuss some of the well-known machine learning methods that have been used for text classification, such as Bayesian learning, WHIRL learning, neural networks, support vector machines, and decision trees.

There were many studies on speech recognition using neural networks even before the advent of deep learning. In particular, during the second neural network boom triggered by Rumelhart's rediscovery of the error backpropagation method at the end of the 1980s, several applications to speech recognition were announced. In this section, we discuss three representative ones: time-delay neural networks, recurrent neural networks, and HMM-MLP hybrid recognition. Although these methods did not become mainstream at the time, they form the basis of speech recognition using deep learning. I will also describe convolutional neural networks, which are often used in speech recognition with deep learning.

Local feature extraction, statistical feature extraction, coding, and pooling can each be regarded as a single module, and the structure in which these modules are stacked in multiple layers is called a deep structure. The method of learning this deep structure from input to output in an end-to-end manner is called deep learning. In deep learning it is common to build the constituent modules from neural networks; the resulting multilayer network is called a deep neural network. By using deep learning, it is possible to build a system that predicts the desired output for input data even without detailed knowledge of the local feature extraction and coding methods mentioned above.

      In this article, we will discuss forward and back propagation algorithms and mini-batch as an overview of deep learning techniques.
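
As a rough NumPy sketch of forward propagation, backpropagation, and a mini-batch update for a single hidden layer; the layer sizes, tanh activation, and squared-error loss are assumptions chosen only for illustration.

```python
# One training step of a 1-hidden-layer network on an assumed mini-batch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))   # mini-batch of 32 examples, 10 features
y = rng.normal(size=(32, 1))    # regression targets

W1, b1 = rng.normal(size=(10, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)) * 0.1, np.zeros(1)

# Forward propagation
h = np.tanh(X @ W1 + b1)
y_hat = h @ W2 + b2
loss = np.mean((y_hat - y) ** 2)

# Back propagation (gradients of the mean squared error)
d_yhat = 2 * (y_hat - y) / len(X)
dW2, db2 = h.T @ d_yhat, d_yhat.sum(axis=0)
d_h = (d_yhat @ W2.T) * (1 - h ** 2)   # derivative of tanh
dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

# Mini-batch gradient descent update
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```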

      Continuing from the previous article, we will discuss the theoretical overview and implementation of convolutional neural networks (CNNs), which are frequently used for image recognition in deep learning.

Deep learning consists of deep and wide multilayer neural networks, and high performance is obtained by exploiting the information in a large amount of training data. In this section, we discuss online learning methods for deep learning, such as mini-batch stochastic gradient descent, the momentum method, and the accelerated gradient method.
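
As a small sketch of the momentum method on top of mini-batch SGD; the toy objective, data, and hyperparameters are assumptions for demonstration only.

```python
# Mini-batch SGD with momentum: v <- mu*v - eps*grad, theta <- theta + v.
import numpy as np

def grad_J(theta, batch):
    # Assumed toy gradient: least squares toward the batch mean
    return theta - batch.mean(axis=0)

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, size=(1000, 2))

theta = np.zeros(2)
v = np.zeros(2)
eps, mu = 0.05, 0.9   # learning rate and momentum coefficient
for step in range(200):
    batch = data[rng.integers(0, len(data), size=32)]  # random mini-batch
    v = mu * v - eps * grad_J(theta, batch)
    theta = theta + v   # theta approaches the data mean [3, 3]
```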

      Continuing from the previous article, we will discuss AdaGrad, RMSprop, ADADELTA, and vSGD as online learning methods for deep learning.
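
For comparison, a minimal AdaGrad update sketch follows (the toy gradient function and hyperparameters are again assumptions); RMSprop and ADADELTA modify the accumulation of squared gradients in a similar spirit.

```python
# AdaGrad: per-parameter learning rates scaled by accumulated squared gradients.
import numpy as np

def grad_J(theta):
    return theta - 3.0   # assumed toy gradient

theta = np.zeros(2)
r = np.zeros(2)          # accumulated squared gradients
eps, delta = 0.5, 1e-8
for step in range(200):
    g = grad_J(theta)
    r += g ** 2
    theta -= eps * g / (np.sqrt(r) + delta)
```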

Deep learning, which excels at analyzing large-scale, complex data, and probabilistic generative models, which actively incorporate knowledge and structural assumptions about the data through the modeling process and show their strength when not all the necessary data are available (e.g., missing data and unobserved values), have each developed independently.

In the process, deep learning has focused mainly on developing scalable models that can be trained on large amounts of data and on improving prediction accuracy, while evaluation of the interpretability and reliability of the grounds for its predictions has been put on the back burner.

To address these issues, there are three possible future directions: (1) Bayesianization of deep learning models, (2) Bayesian analysis of existing methods, and (3) application of deep learning techniques to Bayesian inference. The first two apply Bayesian inference to deep learning methods: overfitting can be naturally suppressed, hyperparameters can be adjusted, and models can be handled in a more principled manner, and, especially for deep generative models, the Bayesian treatment can serve as a guideline for quantitatively evaluating a model's data-generating ability. In addition, combination with other probabilistic models and the completion of missing values can be handled naturally through probabilistic computation. (3) is the approach of applying deep learning to Bayesian inference itself, and includes methods for making Bayesian inference efficient for complex models, such as "amortized inference," which uses neural networks to compute the posterior distribution over a huge number of random variables while predicting the tendencies of those variables.
