Artificial Intelligence Technologies Drawing Attention at Recent International Conferences
In this issue, we discuss notable technologies and their representative papers drawn from the following prominent international conferences.
ICML is recognized as one of the most prestigious conferences in the field of machine learning. It covers a wide range of topics in machine learning theory, algorithms, and applications, including deep learning, reinforcement learning, supervised and unsupervised learning, kernel methods, graphical models, and distribution estimation.
NeurIPS is one of the world’s largest scientific conferences on neural information processing systems and is recognized as one of the most important conferences in computer science. At NeurIPS, presentations span a wide range of fields, including machine learning, deep learning, neural networks, reinforcement learning, and natural language processing.
ICLR is the world’s leading conference on representation learning in machine learning. Representation learning is a branch of machine learning concerned with methods for extracting useful features from data. ICLR publishes papers in various areas of representation learning, such as deep learning, recurrent neural networks, and convolutional neural networks.
- Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD)
ACM SIGKDD is a special interest group of the Association for Computing Machinery (ACM), established to promote research in knowledge discovery and data mining. Its annual conference has become an important venue for researchers and engineers from around the world to present their research results in machine learning, statistics, data mining, data analytics, big data, and artificial intelligence, and to share information on the latest technologies and trends in data science.
CVPR is an international conference on computer vision and pattern recognition organized by the Institute of Electrical and Electronics Engineers (IEEE) and is considered one of the most important scientific conferences in the field of computer vision. CVPR covers a wide range of areas in computer vision and pattern recognition, including image recognition, image processing, machine learning, deep learning, 3D vision, and robot vision.
TNNLS (IEEE Transactions on Neural Networks and Learning Systems) is a journal that publishes papers on neural networks and machine learning, covering a wide range of topics such as neural networks, deep learning, reinforcement learning, kernel methods, statistical learning theory, optimization theory, and evolutionary computation.
AAAI is an important international conference for many researchers in artificial intelligence, run by an association that publishes papers and reports on basic and applied research, education, and public policy across the various fields of artificial intelligence.
IJCAI is one of the world’s most important academic conferences in the field of artificial intelligence, covering a wide range of topics including machine learning, knowledge representation, natural language processing, planning, search, statistical inference, knowledge-based systems, multi-agent systems, and robotics. IJCAI was first held in 1969 and was long held every two years; in recent years it has been held annually.
The criteria for selecting technologies are as follows: deep learning, reinforcement learning, probabilistic generative modeling, natural language processing, explainable machine learning, and knowledge information processing are already covered in the linked articles on this blog, so here we deliberately pick up technologies other than those.
Self-Supervised Learning
Self-supervised learning, as described in “Overview of Self-Supervised Learning, Various Algorithms, and Examples of Implementations”, is a method that generates supervision labels from the unlabeled data itself. A model is typically pre-trained on a large unlabeled dataset and then fine-tuned with task-specific labeled data to build a high-performance model. Compared to traditional supervised learning, self-supervised learning is more effective when labeled data is limited, and it has also been applied to tasks that are difficult to label. It has proven effective in natural language processing and has also been applied to high-performance object detection and semantic segmentation methods in image recognition.
The main approaches to self-supervised learning are described below.
- Contrastive Learning: Contrastive learning learns representations by comparing pairs of data and distinguishing between them. For example, in the case of images, different parts (augmented views) of the same image are taken as a positive pair and their features are compared against those of other images (a minimal sketch appears after this list).
- Predictive Task: A task in which the model is trained to predict a hidden or future part of unlabeled data. For example, in the case of a sentence, part of the sentence is masked and the task is to predict it.
- Self-generating model: A generative (autoencoder-style) model that reconstructs its input data. The input is encoded into a low-dimensional representation and then decoded to reconstruct it; in this process a representation of the data is learned.
- Evolutionary Algorithms: Evolutionary algorithms, described in “Overview of evolutionary algorithms and examples of algorithms and implementations”, apply evolutionary principles to evolve a model, for example by evolving the weights and architecture of a network as part of self-supervised learning.
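As a concrete illustration of contrastive learning, the following is a minimal sketch of a SimCLR-style contrastive (NT-Xent) loss, assuming PyTorch; the function name and random embeddings are illustrative only, and in practice z1 and z2 would be the encoder outputs for two augmentations of the same images.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive (NT-Xent) loss over two augmented views of the same batch.
    z1, z2: (N, D) embeddings; row i of z1 and row i of z2 form a positive pair."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-normalized
    sim = torch.mm(z, z.t()) / temperature                # (2N, 2N) similarity matrix
    sim.fill_diagonal_(float('-inf'))                      # exclude self-similarity
    # For each row, the positive example is the other view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Usage: embeddings of two random augmentations of an 8-image batch.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(z1, z2)
```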
Representative papers on self-supervised learning include the following
- “Unsupervised Learning of Video Representations using LSTMs” (2015) – Srivastava et al. This paper proposes a self-supervised learning method for video data using long short-term memory (LSTM). It takes video data as input and learns a representation of the video through the task of predicting future frames with an LSTM.
- “Context Encoders: Feature Learning by Inpainting” (2016) – Pathak et al. This paper proposes a self-supervised learning method that learns a representation of an image through the task of hiding parts of the image and predicting them. By inpainting the missing regions, the model learns a representation of the entire image.
- “A Simple Framework for Contrastive Learning of Visual Representations” (2020) – Chen et al. This paper proposes SimCLR, a contrastive learning framework that treats two augmented views of the same image as a positive pair and learns visual representations by maximizing their agreement while separating them from the other images in the batch.
- “Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning” (2020) – Grill et al. This paper proposes BYOL, a self-supervised learning method for image data that uses an online network and a slowly updated target network: the online network is trained to predict the target network’s representation of another augmented view of the same image, without using negative pairs.
- “Momentum Contrast for Unsupervised Visual Representation Learning” (2020) – He et al. This paper proposes MoCo, a contrastive method that maintains a momentum-updated encoder and a queue of negative examples, learning image representations by matching a query view to the key from the same image while contrasting it against the queued negatives.
Graph Neural Networks (GNN)
A GNN, as described in “Overview of Graph Neural Networks, Applications, and Examples of Python Implementations”, is a type of neural network that learns on graph-structured data while taking into account information on nodes and edges. Recently, GNNs have been increasingly applied to graph classification, graph clustering, graph recommendation, and graph generation. Since graph-structured data appears in various domains such as social networks, chemical structures, and 3D models, GNNs are expected to be applied in a wide range of fields.
A GNN learns on graph data by representing node and edge information as feature vectors and updating the information of the entire graph while considering the relationships with neighboring nodes and edges. Since a GNN combines node and edge features to generate a representation of the entire graph, both topological information and local information can be used effectively.
The basic structure of a GNN is as follows
- Node feature vectors: Each node in a graph has its own feature vector. For example, in the case of a social network, a node represents a user, and the user’s profile information, friendships, etc. are represented as feature vectors of the node.
- Edge feature vectors: Edges of a graph also have feature vectors. For example, in the case of a social network, edges represent relationships among users, and the strength of friendships and types of relationships are represented as edge feature vectors.
- Message propagation: GNNs update the feature vectors of nodes and edges by considering their relationships with neighboring nodes and edges; the feature vectors of a node and its neighbors are combined to generate a new feature vector. This is called message propagation (see the sketch after this list).
- Aggregation: After repeating message propagation and updating the feature vectors of nodes and edges, a final representation of the graph is generated. This is called aggregation.
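To make message propagation and aggregation concrete, the following is a minimal sketch of a single GCN-style layer in PyTorch; the normalization is simplified (a mean over neighbors including a self-loop) and the class name is illustrative, not an established library API.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One round of message propagation: each node averages its neighbors'
    (and its own) feature vectors, then applies a shared linear transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (N, in_dim) node feature vectors
        # adj: (N, N)      adjacency matrix (1.0 where an edge exists)
        adj_hat = adj + torch.eye(adj.size(0))    # add self-loops
        deg = adj_hat.sum(dim=1, keepdim=True)    # node degrees
        messages = adj_hat @ x / deg              # aggregate: mean over neighbors
        return torch.relu(self.linear(messages))  # update node representations

# Usage: a 4-node path graph with 3-dimensional node features.
x = torch.randn(4, 3)
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
h = SimpleGCNLayer(3, 8)(x, adj)  # (4, 8) updated node features
```

Stacking several such layers and then pooling the node features (for example, averaging them) yields the graph-level representation described above.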
The following is a list of representative papers on graph neural networks.
- “Semi-Supervised Classification with Graph Convolutional Networks” (Kipf, T.N. & Welling, M., 2017): This paper proposes a Graph Convolutional Network (GCN), which is one of the most popular models of GNN. It proposes a method for semi-supervised learning on graphs using a small number of labeled data and a large number of unlabeled data in a node classification task.
- “Graph Attention Networks” (Velickovic, P. et al., 2018): This paper proposes a Graph Attention Network (GAT), which is one derivative model of GNN. The features of nodes and edges are weighted using an attention mechanism, and a method is proposed for learning while paying attention to important information. For details, please refer to “Overview of GAT (Graph Attention Network), Algorithm and Examples of Implementation”.
- “Inductive Representation Learning on Large Graphs” (Hamilton, W.L. et al., 2017): This paper proposes GraphSAGE (Graph Sample and Aggregate, described in “GraphSAGE Overview, Algorithm, and Example Implementation”), a method for learning node representations on large graphs. By sampling and aggregating local neighborhood information of each node, the method enables efficient, inductive learning on large graphs.
- “Gated Graph Neural Networks” (Li, Y. et al., 2016): This paper proposes a Gated Graph Neural Network (GGNN), which is one derivative of GNN. It proposes a method to update information on a graph using gates between neighboring nodes.
- “Graph Convolutional Networks for Text Classification” (Yao, L. et al., 2019): This paper proposes a method for applying graph convolutional networks in text classification tasks, and is one of the pioneering studies applying GNNs to natural language processing tasks.
Meta-Learning
Meta-learning, which can also be used for Few-Shot and Zero-Shot Learning and is described in “Overview and Implementation Examples of Meta-Learners” and “Overview of causal inference using Meta-Learners and examples of algorithms and implementations”, is a method that learns commonalities and similarities among multiple tasks and adapts to new tasks. In other words, it is a method for learning new tasks quickly and efficiently by reusing already learned models. Meta-learning is particularly useful in situations where data is limited, such as Few-Shot Learning, where machine learning models must make effective use of little data.
Meta-learning is divided into two phases: meta-training and meta-testing.
- Meta-Training: In meta-training, a base model, called a meta-model, is learned using training data from multiple tasks. This base model learns common features to adapt to new tasks. The meta-model extracts commonalities among tasks and learns parameters for high performance on new tasks.
- Meta-Testing: In meta-testing, the meta-model learned in meta-training is used to evaluate adaptability to new tasks. Even when training data for a new task is very limited, the meta-model makes appropriate predictions for the new task using common features that have already been learned.
There are several approaches to meta-learning, but some of the most common are as follows
- Model-Agnostic Meta-Learning (MAML): MAML learns an initialization of the base model’s parameters such that a few gradient-descent updates are enough to adapt to a new task; the update procedure is shared across tasks, so appropriate parameter updates can be performed for new tasks (a schematic sketch follows this list).
- Memory-Augmented Neural Networks (MANNs): MANNs are a meta-learning method using neural networks augmented with external memory. The external memory lets the network rapidly store and retrieve information from new tasks, making adaptation more efficient.
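The following is a schematic sketch of the MAML-style inner/outer loop described above, written in plain PyTorch as a first-order approximation; the model, the loss function, and the task iterator are placeholders to be supplied by the caller.

```python
import copy
import torch

def maml_outer_step(model, tasks, loss_fn, inner_lr=0.01, meta_lr=0.001, inner_steps=1):
    """One (first-order) MAML meta-update: adapt a copy of the model to each task
    on its support set, then update the shared initialization using the
    query-set gradients of the adapted copies."""
    meta_opt = torch.optim.SGD(model.parameters(), lr=meta_lr)
    meta_opt.zero_grad()
    for support, query in tasks:                      # each task: (support set, query set)
        fast = copy.deepcopy(model)                   # task-specific copy of the initialization
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                  # inner loop: adapt on the support set
            x_s, y_s = support
            inner_opt.zero_grad()
            loss_fn(fast(x_s), y_s).backward()
            inner_opt.step()
        x_q, y_q = query                              # outer loop: evaluate on the query set
        query_loss = loss_fn(fast(x_q), y_q)
        grads = torch.autograd.grad(query_loss, list(fast.parameters()))
        for p, g in zip(model.parameters(), grads):   # accumulate first-order meta-gradients
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```

The full MAML algorithm differentiates through the inner-loop updates (a second-order term); the first-order variant above drops that term, which is also how FOMAML and Reptile keep the computation cheap.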
Representative papers on meta-learning include the following
- “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” (2017) – Finn et al. This paper proposes a model-agnostic meta-learning method. The method uses data from multiple tasks to learn an initialization of the model that allows fast adaptation to new tasks.
- “Prototypical Networks for Few-shot Learning” (2017) – Snell et al. This paper proposes a prototype-based meta-learning method. The method uses a small amount of labeled data to build a high-performance classifier for a new task.
- “Reptile: A Scalable Metalearning Algorithm” (2018) – Nichol et al. This paper proposes a simple first-order meta-learning method. The method repeatedly performs ordinary gradient-based optimization on individual tasks and moves the model’s initialization toward the parameters obtained for each task.
- “Meta-Learning with Differentiable Convex Optimization” (2019) – Lee et al. This paper proposes a meta-learning method in which the task-specific learner is a convex optimization problem (such as a linear SVM) solved inside the network. The feature embedding is meta-learned by differentiating through the solution of the convex problem, allowing adaptation to new tasks.
- “Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace” (2018) – Lee & Choi. This paper proposes a meta-learning method with a learnable layerwise metric and subspace. The method uses data from multiple tasks to learn the metric, which is then used to adapt to new tasks.
Few-Shot Learning
Few-Shot Learning is a machine learning technique that learns a new task from a limited number of training examples. While conventional machine learning requires a large amount of training data, Few-Shot Learning overcomes this limitation and can learn from a very small number of data points.
Few-Shot Learning is generally useful in the following scenarios
- Small Data Sets: Few-Shot Learning can be useful when there is very limited training data available for a given task. For example, when building a translation model for a new language, it is difficult to apply existing approaches because only a limited amount of language pair data is available. For more information, see “Overview of Translation Models, Algorithms, and Example Implementations”.
- Frequent new tasks: Few-Shot Learning is useful in environments where tasks change frequently and it is difficult to collect large amounts of training data each time. For example, when building image classification models for new products, it is difficult to apply existing approaches because new products are added every day.
There are several approaches to Few-Shot Learning. Typical ones are listed below.
- Meta-Learning: Meta-learning is a method of learning commonalities and similarities among tasks and adapting to new tasks when learning multiple tasks. For example, when a new task is given, the learning results of previous tasks can be used for rapid learning.
- Transfer Learning: Transfer Learning, described in “Overview of Transfer Learning and Examples of Algorithms and Implementations”, is a method of applying a model learned for one task to a different task. For example, when training a new task using an image classification model, previously trained feature extractors can be reused to enable training with less data (a minimal sketch follows this list).
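As an illustration of the transfer-learning approach, the following is a minimal sketch assuming PyTorch and a recent torchvision: an ImageNet-pretrained backbone is frozen and only a small classification head is retrained on the new task’s limited data. The 5-class head is an arbitrary example.

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and freeze its feature extractor.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a head for the new task (here, 5 classes),
# so only this small layer is trained on the limited data.
model.fc = nn.Linear(model.fc.in_features, 5)
trainable = [p for p in model.parameters() if p.requires_grad]  # only the new head
```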
The following is a list of representative Few-Shot Learning papers.
- “Matching Networks for One Shot Learning” (Vinyals et al., 2016): This paper proposes Matching Networks, a neural-network approach to one-shot learning (learning when only one sample is available for each new class). It achieves strong few-shot performance by computing the similarity between the support set of a new class and the query set and using it for prediction.
- “Prototypical Networks for Few-Shot Learning” (Snell et al., 2017): This paper proposes a prototype-based approach to Few-Shot Learning. A prototype is the mean feature vector of each class computed from the support set of new classes, and queries are classified by their distance to these prototypes (a small sketch of this computation appears after this list). The approach is easy to train and offers high generalization performance.
- “Meta-Learning with Memory-Augmented Neural Networks” (Santoro et al., 2016): This paper proposes a meta-learning approach in Few-Shot Learning that uses neural networks with memory. The use of memory allows for faster learning for new tasks by leveraging past experience.
- “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” (Finn et al., 2017): This paper proposes a model-agnostic meta-learning method for Few-Shot Learning, which learns common structure across different tasks and enables fast adaptation to new tasks.
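The following is a minimal sketch of the prototype computation used by Prototypical Networks, assuming PyTorch; the embeddings are random placeholders, whereas in practice they would come from a learned embedding network.

```python
import torch

def prototypical_predict(support_emb, support_labels, query_emb, n_classes):
    """Prototypical-network style prediction: each class prototype is the mean
    embedding of its support examples, and queries are scored by their
    (squared Euclidean) distance to each prototype."""
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                               # (n_classes, D)
    dists = torch.cdist(query_emb, prototypes) ** 2  # (n_query, n_classes)
    return (-dists).softmax(dim=1)                   # class probabilities

# Usage: a 2-way 3-shot episode with 16-dimensional embeddings.
support = torch.randn(6, 16)
labels = torch.tensor([0, 0, 0, 1, 1, 1])
queries = torch.randn(4, 16)
probs = prototypical_predict(support, labels, queries, n_classes=2)
```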
AutoML
Automatic Machine Learning (AutoML), also described in “Overview of AutoML, Algorithms and Various Implementations”, is a set of methods that automate the selection of machine learning models, tuning of hyperparameters, feature selection, and data preprocessing. It can be used to streamline the machine learning workflow and enable users with limited machine learning expertise to build and optimize machine learning models.
The main features of automated machine learning are as follows
- Model exploration: automatic machine learning explores different machine learning algorithms and model architectures to find the best model. This allows for automatic selection of the best model for the data.
- Hyperparameter Tuning: Automatic machine learning can automatically optimize the hyperparameters of a model (e.g., learning rate, regularization strength). This allows a search for the hyperparameter values that maximize model performance (a minimal sketch follows this list).
- Feature Selection and Data Preprocessing: Automatic machine learning can automatically perform data preprocessing, such as feature selection and scaling of input data and handling of missing values. This can improve data quality and model performance.
- Model evaluation and selection: Automatic machine learning can evaluate different models and combinations of hyperparameters and select the best model. This can automate the process of comparing multiple models and selecting the best model.
- Model Deployment: After selecting the best model, automatic machine learning can deploy it automatically. This allows the user to put machine learning into use without extensive manual work.
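As a small, concrete example of the hyperparameter-tuning component of AutoML, the following sketch uses scikit-learn’s RandomizedSearchCV to search a random forest’s hyperparameters by cross-validation; full AutoML systems additionally automate model and preprocessing choices, but the idea is the same. The dataset and search ranges are illustrative.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Search space over hyperparameters; random combinations are tried and the
# best model is kept according to cross-validated accuracy.
param_dist = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(2, 12),
    "min_samples_split": randint(2, 10),
}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=20, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```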
Some papers on automatic machine learning are listed below.
- “AutoML: A Survey of the State-of-the-Art” – This paper provides an overview of AutoML and a comprehensive survey of current research, detailing basic AutoML concepts and methods such as automated model selection, hyperparameter optimization, and data preprocessing.
- “Efficient Neural Architecture Search via Parameter Sharing” – This paper proposes Efficient Neural Architecture Search (ENAS), a method that streamlines network architecture search by sharing parameters among candidate architectures. ENAS reduces architecture search time and computational resources and can automatically find high-performance models.
- “AutoAugment: Learning Augmentation Policies from Data” – This paper proposes AutoAugment, a method for automatically learning data augmentation policies; AutoAugment can improve model performance by optimizing image data augmentation.
- “Practical Automated Machine Learning for the Kaggle Competition” – This paper proposes a practical AutoML method for Kaggle machine learning competitions; Kaggle is a platform where machine learning models are competitively evaluated, and the paper demonstrates the utility of AutoML under realistic constraints.
Question-Answering Learning
Question-answering learning is a branch of machine learning used to automatically generate answers to questions from users. In general, question-answering learning has important applications in the domains of natural language processing (NLP) and artificial intelligence (AI). The main concepts and techniques of question-answering learning are described below.
- Data Collection: Question-answering learning requires a large number of questions paired with their correct answers. To collect this data, it is common to annotate manually or to use existing question-answering datasets.
- Feature Extraction: Feature extraction from textual data is required to convert questions and answers into a format that machine learning algorithms can handle. Typical methods include word and sentence embeddings and TF-IDF (Term Frequency-Inverse Document Frequency); a minimal TF-IDF-based sketch follows this list.
- Model Learning: In question-answering learning, machine learning algorithms are used to learn models that predict answers to questions. Typical algorithms include Naive Bayes, SVM (Support Vector Machine), RNN (Recurrent Neural Network) as described in “Overview of RNN and examples of algorithms and implementations”, and BERT (Bidirectional Encoder Representations from Transformers).
- Evaluation: To evaluate the performance of the trained model, a test dataset is used to assess the model’s prediction accuracy and the adequacy of its answers. Commonly used evaluation metrics include accuracy, precision, recall, and F1 score.
- Model tuning: Hyperparameters are tuned and the model refined to improve performance. This includes, for example, improving the model architecture and expanding the training data.
- Deployment: To put the learned question-answering model into practical use, it is incorporated into an actual question-answering system or similar application.
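The following is a minimal sketch of the feature-extraction step above, assuming scikit-learn: a retrieval-style question answerer that represents candidate answers with TF-IDF vectors and returns the one most similar to the question. The toy candidate answers are illustrative; a real system would use a large annotated collection and a learned model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy candidate answers (in practice, a large question-answer collection).
answers = [
    "BERT is a pre-trained bidirectional Transformer language model.",
    "Federated learning trains models across devices without centralizing data.",
    "A graph neural network learns over the nodes and edges of a graph.",
]

vectorizer = TfidfVectorizer()
answer_vecs = vectorizer.fit_transform(answers)       # TF-IDF features of candidates

def answer(question):
    q_vec = vectorizer.transform([question])          # same feature space as candidates
    scores = cosine_similarity(q_vec, answer_vecs)[0]
    return answers[scores.argmax()]                   # most similar candidate answer

print(answer("What is federated learning?"))
```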
The following is a list of representative papers in the field of question-answering learning.
- “Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks” – Severyn & Moschitti: This paper proposes a method for ranking short text pairs using convolutional deep neural networks for question answering. Specifically, question-answer pairs are converted into vectors with a convolutional neural network, and their similarity is learned in order to rank them.
- “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” – Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: This paper proposes the BERT model, which is known for significantly improving the performance of natural language processing tasks; using a pre-trained bidirectional Transformer network, BERT has been shown to perform well on a variety of tasks, including question answering.
- “Dynamic Memory Networks for Visual and Textual Question Answering” – Caiming Xiong, Stephen Merity, Richard Socher: This paper proposes a method for question answering using dynamic memory networks. Memory networks use external memory to hold information and can model associations between questions and answers. In this paper, a memory network is used to learn the relevance between questions and answers and achieves high performance.
Federated Learning
Federated Learning, described in “Overview of Federated Learning and Examples of Various Algorithms and Implementations”, is a form of distributed machine learning in which model training is shared among multiple devices and systems: each device trains a model locally on its own data, and only the locally learned models, rather than the raw data, are aggregated to create a global model.
Federated Learning is thus a method for learning models that leverages data held on multiple devices and systems while emphasizing privacy protection and data security. It also reduces network bandwidth usage and minimizes data movement, which is useful when there are real-time requirements or when the communication environment is limited.
The basic structure of Federated Learning is as follows.
- Central Server: The central server is responsible for model initialization and parameter aggregation.
- Device/Client: Each device or client trains a model using local data and sends the training results to the central server after training.
- Parameter aggregation by the central server: The central server receives the training results sent by devices and clients and integrates them to update the global model (a minimal sketch of this aggregation step follows this list).
- Model update: The model parameters updated by the central server are sent to each device or client, which then initiates the next round of learning.
- Convergence iterations: The above process is repeated multiple times to continue learning until the model converges.
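The following is a minimal sketch of the central server’s aggregation step in the FedAvg style, assuming each client returns its locally trained weights as a list of NumPy arrays; the function and the toy client data are illustrative.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: the global parameters are the average of the
    clients' parameters, weighted by the amount of local data each client has."""
    coeffs = np.array(client_sizes, dtype=float) / sum(client_sizes)
    global_weights = []
    for layer in zip(*client_weights):            # iterate layer by layer across clients
        stacked = np.stack(layer)                 # (n_clients, ...) for this layer
        global_weights.append(np.tensordot(coeffs, stacked, axes=1))
    return global_weights

# Usage: three clients, each holding one weight matrix and one bias vector.
clients = [[np.random.randn(4, 2), np.random.randn(2)] for _ in range(3)]
sizes = [100, 50, 200]                            # local dataset sizes
new_global = federated_average(clients, sizes)    # weights sent back to all clients
```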
The following are representative papers from Federated Learning.
- “Communication-Efficient Learning of Deep Networks from Decentralized Data” (AISTATS 2017) – This early Federated Learning paper, published by researchers at Google, proposes a method for improving communication efficiency when learning models across multiple devices that hold distributed data.
- “Federated Learning: Strategies for Improving Communication Efficiency” (NIPS 2016) – This paper lays out the basic concepts of Federated Learning and suggests ways to improve communication efficiency when learning from distributed data.
- “Practical Secure Aggregation for Federated Learning on User-Generated Data“(USENIX Security 2017)- This paper emphasizes the importance of privacy protection in Federated Learning and proposes a secure aggregation protocol.
- “Scalable and Privacy-Preserving Distributed Deep Learning“- This paper proposes a privacy-preserving, scalable, distributed deep learning approach using Federated Learning.
Multimodal Technology
Multimodal technology, also described in “Application and Implementation of ElasticSearch and Machine Learning for Multimodal Search”, is the general term for technologies that combine several different modalities (forms and media of information) for processing and analysis. By integrating information from different modalities, richer and more complex information can be understood and deeper insights can be gained. Below are some typical areas and applications of multimodal technologies.
- Vision and language integration: Visual information (images and videos) and linguistic information (text) are combined to realize image and video captioning, image/video retrieval, and text generation. Examples include image captioning technology that automatically generates text describing objects or scenes in images, and speech recognition technology that converts audio in videos into text (a minimal fusion sketch follows this list).
- Speech and language integration: Combining speech and language information to interpret voice commands and understand and generate spoken dialogue. Examples include voice assistants for smart speakers and voice search technology that enables voice search and question answering.
- Text and language integration: Combines textual and linguistic information to enable automatic text summarization, sentiment analysis, machine translation, and text generation. Examples include emotion analysis technology that analyzes emotions and sentiments in text, and machine translation technology that enables automatic translation between different languages.
- Integration of gesture and language: Combining gesture information (data from motion sensors and other means of human movement) and language information to understand communication through gestures and generate responses that match gestures. For example, in virtual reality (VR) and augmented reality (AR) interactions, gesture and voice are combined to achieve natural communication.
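To make vision-language integration concrete, the following is a minimal sketch of late fusion in PyTorch: image and text feature vectors from separate encoders are concatenated and passed to a joint classifier (e.g., for visual question answering). The class name, dimensions, and random features are placeholders; real systems would use trained image and text encoders.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenate an image feature vector and a text feature vector (late fusion)
    and classify the fused representation."""
    def __init__(self, img_dim, txt_dim, hidden, n_classes):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feat, txt_feat):
        return self.fuse(torch.cat([img_feat, txt_feat], dim=1))

# Usage with placeholder features (in practice from a CNN and a text encoder).
img_feat = torch.randn(2, 512)
txt_feat = torch.randn(2, 256)
logits = LateFusionClassifier(512, 256, 128, 10)(img_feat, txt_feat)
```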
The following is a list of representative papers on multimodal technology.
- “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention” – Xu et al., 2015 This paper proposes a model for image captioning that introduces a visual attention mechanism that can direct attention to specific regions of an image. It is widely recognized as a pioneering study in image captioning that integrates vision and language.
- “Listen, Attend and Spell” – Chan et al., 2016 This paper proposes a model for speech recognition that introduces an audio attention mechanism that can focus on temporal information in speech. It is known as an important work in speech recognition that integrates speech and language.
- “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” – Devlin et al., 2018 This paper proposes BERT (Bidirectional Encoder Representations from Transformers), a language model that learns deep bidirectional representations of text through pre-training, and is known as an important work in the field of language understanding.
- “Language Models are Unsupervised Multitask Learners” – Radford et al., 2019 This paper proposes GPT-2 (Generative Pre-trained Transformer 2), a large-scale language model that achieves high-performance language generation through pre-training on large amounts of text data, and is known as an important work in language generation.