Overview of Python Keras and examples of its application to basic deep learning tasks


Summary

This section provides an overview of Python Keras and its application to basic deep learning tasks: handwriting recognition using MNIST, autoencoders, CNNs (described in “Overview of CNN and examples of algorithms and implementations”), RNNs (described in “Overview of RNN and examples of algorithms and implementations”), and LSTMs (described in “Overview of LSTM and Examples of Algorithms and Implementations”).

Python Keras Overview

Python’s Keras is a high-level, developer-friendly neural network library that provides a concise and efficient way to implement machine learning and deep learning tasks. It runs on top of other deep learning frameworks that serve as its backend (historically TensorFlow, Theano, CNTK, etc.). (See also “Comparison of tensorflow, Keras, and pytorch” for a comparison of leading libraries such as TensorFlow, Keras, and PyTorch.)

The main features and advantages of Python Keras are as follows:

Simple syntax: Keras provides a simple and intuitive API that allows for the concise description of neural network models. Developers can experiment and prototype more quickly because less code is required to build models.
Modularity and extensibility: Keras models are built from layers that can be combined into complex networks, and existing layers and models are easy to reuse. Keras is highly modular and provides many types of layers, activation functions, loss functions, optimization algorithms, etc.
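Multiple backends: Keras has supported several deep learning frameworks as backends, historically TensorFlow, Theano, and CNTK. Theano and CNTK are no longer actively developed, and Keras 3 supports TensorFlow, JAX, and PyTorch. This lets users switch backends and take advantage of different frameworks.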
Community and Documentation: Keras has a very active community where users can exchange information and support. It also has extensive official documentation that provides explanations of features and tutorials.
GPU Support: Keras can run deep learning computations on GPUs, which makes training much faster. With the TensorFlow backend in particular, GPU support is easy to set up.
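Which backend Keras is using can be checked from Python. The sketch below assumes Keras 3, which reads the KERAS_BACKEND environment variable (for example tensorflow, jax, or torch) when keras is first imported; older Keras versions may behave differently. The helper name active_keras_backend is hypothetical, and it returns None if Keras or the selected backend library is not installed.

```python
import os


def active_keras_backend(preferred="tensorflow"):
    """Return the backend name Keras reports, or None if Keras is unavailable.

    Assumption: Keras 3, which reads KERAS_BACKEND once, when keras is first
    imported. setdefault() keeps any value the user has already exported.
    """
    os.environ.setdefault("KERAS_BACKEND", preferred)
    try:
        import keras  # the backend is fixed from this point on
    except ImportError:
        return None  # Keras (or the chosen backend library) is not installed
    return keras.backend.backend()  # e.g. "tensorflow", "jax" or "torch"


if __name__ == "__main__":
    print("Active Keras backend:", active_keras_backend())
```

Because the variable is only read at import time, it has to be set before the first import keras in a process.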

Next, we describe how to set up an environment for using Keras. For more information on Keras, “Deep Learning with Python” is a good reference book.

Environment setup

This section describes how to build an environment for using the Keras library in Python.

Installing Python: Since Keras runs on Python, you will need to install Python. This can be done by downloading and installing the latest version of Python from the official website.
Create a virtual environment (optional): Create a Python virtual environment to avoid dependency conflicts between libraries. A virtual environment lets you manage library versions on a per-project basis. To create one, use the venv module:

python3 -m venv myenv  # Create a virtual environment
source myenv/bin/activate  # Activate the virtual environment (Linux/Mac)
myenv\Scripts\activate  # Activate the virtual environment (Windows)

For details on building a Python development environment, please refer to “How to create code development environments in various languages” and “Setting up a Python development environment with SublimeText4 and VS code”.

Installing Keras: Keras can be installed with pip, the Python package manager. With the virtual environment activated, run the following command:

pip install keras

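Older guides install the separate tensorflow-gpu package to enable GPU support. That package is deprecated: since TensorFlow 2.1 the regular tensorflow package includes GPU support, and on Linux the required CUDA libraries can be installed together with it if needed:

pip install tensorflow[and-cuda]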

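Note: Keras is also distributed as part of TensorFlow (as tf.keras), so it is installed along with the tensorflow package. The standalone keras package, by contrast, does not install a backend by itself, so TensorFlow (or another supported backend) must be installed as well.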
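After installation, one way to confirm that TensorFlow actually sees a GPU is tf.config.list_physical_devices. The following is a minimal sketch; the helper name visible_gpus is illustrative, and it returns None instead of failing when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices("GPU") returns an empty list on a CPU-only setup
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    else:
        print(f"{len(gpus)} GPU(s) visible:", gpus)
```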

Installation of additional libraries: Keras projects typically use some additional libraries as well. The following command installs NumPy for numerical computation and Matplotlib for plotting graphs.

pip install numpy matplotlib

These steps complete the setup of a Keras environment in Python. Individual projects may require further packages and tools.
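To verify the setup from inside Python (for example, in the activated virtual environment), the installed versions can be listed with the standard-library importlib.metadata module (Python 3.8+). This is a sketch; the default package list is an assumption matching the packages mentioned above.

```python
from importlib import metadata


def installed_versions(packages=("keras", "tensorflow", "numpy", "matplotlib")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12s} {version or 'not installed'}")
```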

Handwriting recognition in MNIST

As a first example, we show a simple implementation, using the Python Keras library, of classifying the handwritten digits in the MNIST dataset. This task is often called the Hello World of Keras (see also “Hello World of Neural Networks, Implementation of Handwriting Recognition with MNIST Data”).

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.datasets import mnist
from keras.utils import to_categorical

# Loading MNIST Data Sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Input data preprocessing
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Model Building
model = Sequential()
model.add(Flatten(input_shape=(28, 28, 1)))
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Model Compilation
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Model Training
model.fit(x_train, y_train, batch_size=128, epochs=10, verbose=1)

# Model Evaluation
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

In this example, the Sequential model in Keras is used to build a fully connected neural network. The Flatten layer converts each 28×28 input image into a 1D vector of 784 values, and Dense layers define the hidden and output layers. The hidden layer uses the ReLU activation function, and the output layer uses the softmax function (described in “Overview of softmax functions and related algorithms and implementation examples”) to obtain a probability distribution over the 10 digit classes. ReLU (Rectified Linear Unit) is one of the most widely used activation functions in neural networks: it outputs 0 when the input is less than 0 and outputs the input value unchanged when the input is greater than 0. It is defined as follows.

\[f(x) = \max(0, x)\]
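As a small illustration (NumPy only, independent of Keras), the formula can be applied element-wise to an array:

```python
import numpy as np


def relu(x):
    """Element-wise ReLU: f(x) = max(0, x)."""
    return np.maximum(0, x)


print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # negative inputs become 0
```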

The softmax function converts the outputs of a neural network into a probability distribution and is generally used as the activation function of the final layer in multi-class classification. It is defined as follows.

\[\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)}\]
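A NumPy sketch of this formula is shown below. It uses the common trick of subtracting the maximum before exponentiating; this is an implementation detail added here for numerical stability, not part of the definition, and it does not change the result because the common factor cancels in the ratio.

```python
import numpy as np


def softmax(x):
    """Softmax over the last axis.

    Subtracting the maximum does not change the result (exp(-max) cancels
    between numerator and denominator) but prevents overflow in exp().
    """
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # non-negative values that sum to 1
```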

Understanding these models requires knowledge of deep learning, see “About Deep Learning” for details.
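To make the architecture concrete, here is a NumPy-only sketch of the forward pass of the same Flatten → Dense(128, ReLU) → Dense(10, softmax) network, using random, untrained weights. It is not how Keras computes internally; it only shows the shapes involved and where the parameter count that model.summary() reports for this model comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights with the same shapes as Flatten -> Dense(128) -> Dense(10)
W1, b1 = rng.normal(0, 0.05, (784, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, (128, 10)), np.zeros(10)


def forward(images):
    """images: (batch, 28, 28, 1) -> class probabilities (batch, 10)."""
    x = images.reshape(len(images), -1)  # Flatten: (batch, 784)
    h = np.maximum(0, x @ W1 + b1)  # Dense(128) with ReLU
    logits = h @ W2 + b2  # Dense(10) before its activation
    logits -= logits.max(axis=1, keepdims=True)
    exps = np.exp(logits)
    return exps / exps.sum(axis=1, keepdims=True)  # softmax


batch = rng.random((4, 28, 28, 1))  # stand-in for 4 normalized images
probs = forward(batch)
print(probs.shape)  # (4, 10)

# Trainable parameters: weights + biases of each Dense layer (Flatten has none)
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 784*128 + 128 + 128*10 + 10 = 101770
```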

Next, we compile the model by specifying an optimization algorithm and a loss function. In this example, we use Adam as the optimizer and categorical cross-entropy (described in “Overview of Cross-Entropy and Related Algorithms and Implementation Examples”) as the loss function; see “Stochastic Optimization” for theoretical details. Categorical cross-entropy is a loss function used in multi-class classification tasks in machine learning and deep learning and is expressed by the following equation.

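\[\mathrm{CategoricalCrossEntropy} = -\sum_{i} y_i \log(p_i)\]

where y_i is the one-hot target for class i (1 for the correct class, 0 otherwise) and p_i is the predicted probability for class i.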
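For one-hot targets only the predicted probability of the true class survives the sum, so the loss for one sample reduces to -log(p_true). The sketch below averages this over a small hand-made batch; the clipping constant eps is an assumption added to avoid log(0).

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch.

    y_true: one-hot targets, shape (batch, classes)
    y_pred: predicted probabilities, shape (batch, classes)
    eps: clipping value that avoids log(0) (an assumption of this sketch)
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return per_sample.mean()


y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]])
# Equals -(log(0.7) + log(0.6)) / 2: only the true-class probabilities matter
print(categorical_crossentropy(y_true, y_pred))
```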

To train the model, the fit method is given the input and target data along with the batch size and number of epochs. Finally, the model is evaluated on the test data, and the loss and accuracy are displayed.
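The batch size and number of epochs determine how many weight updates fit performs. The sketch below (plain Python and NumPy, with a hypothetical helper iterate_minibatches) reproduces that arithmetic for the MNIST example, assuming, as Keras does by default, that the data are shuffled each epoch, every sample is used once per epoch, and the last batch may be smaller.

```python
import math

import numpy as np


def iterate_minibatches(n_samples, batch_size, rng):
    """Yield shuffled index batches covering every sample once (one epoch)."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


n_train, batch_size, epochs = 60000, 128, 10  # values from the MNIST example
steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch is smaller
print(steps_per_epoch, steps_per_epoch * epochs)  # 469 updates per epoch, 4690 in total

rng = np.random.default_rng(0)
sizes = [len(b) for b in iterate_minibatches(n_train, batch_size, rng)]
print(len(sizes), sizes[-1])  # 469 batches; the final one holds 60000 - 468*128 = 96 samples
```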

Autoencoder

An autoencoder is a neural network model for unsupervised learning that learns an encoder, which maps the input data to a low-dimensional code (encoding), and a decoder, which reconstructs the original data from that code. The encoder and decoder usually have symmetric structures. Autoencoders are also one of the technologies that triggered the recent rise of deep learning, as described in “Where Do Features Come From?”.

Autoencoders are used for data compression and feature extraction. The encoder converts the input data into a low-dimensional representation, and the decoder reconstructs the original data from that representation as closely as possible. Because the whole network is trained so that its output (the decoder's reconstruction) matches its input, the encoder learns to capture the essential features of the data. See “Autoencoder” for more information.

An autoencoder is built in the following steps:

Encoder construction: Build a neural network that converts the input data into a low-dimensional representation. Fully connected (dense) layers and convolutional layers are typically used.
Decoder construction: Build a neural network that converts the low-dimensional representation back to the original dimension. As with the encoder, fully connected and convolutional layers are used.
Autoencoder construction: Construct an autoencoder model by combining the encoder and decoder.
Training the model: Feed the input data to the autoencoder and train it so that the decoder's output matches the original data as closely as possible.
Feature extraction and data compression: The encoder output is used to perform tasks such as feature extraction and dimensionality reduction of the data.
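The steps above can be sketched in NumPy with untrained, randomly initialized weights (an assumption for illustration; no training happens here). The point is the shape of the data at each stage: the encoder compresses 784 values into a 32-dimensional code and the decoder maps the code back to 784 values in (0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, encoding_dim = 784, 32  # same sizes as the Keras example in this section


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Untrained weights, for shape illustration only
W_enc, b_enc = rng.normal(0, 0.05, (input_dim, encoding_dim)), np.zeros(encoding_dim)
W_dec, b_dec = rng.normal(0, 0.05, (encoding_dim, input_dim)), np.zeros(input_dim)


def encode(x):
    """Encoder: (batch, 784) -> low-dimensional code (batch, 32), ReLU."""
    return np.maximum(0, x @ W_enc + b_enc)


def decode(code):
    """Decoder: (batch, 32) -> reconstruction (batch, 784) in (0, 1), sigmoid."""
    return sigmoid(code @ W_dec + b_dec)


x = rng.random((5, input_dim))  # stand-in for 5 flattened images scaled to [0, 1]
code = encode(x)
x_hat = decode(code)
print(code.shape, x_hat.shape)  # (5, 32) (5, 784)

# Training would adjust the weights to make x_hat close to x, for example by
# minimizing this mean squared reconstruction error (or binary cross-entropy)
print(np.mean((x - x_hat) ** 2))
```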

Autoencoders can be used for various applications such as anomaly detection, noise reduction, and feature extraction, and they have also been used as a pre-training method for deep networks.
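For anomaly detection, a common recipe is to train the autoencoder on normal data only and then flag inputs whose reconstruction error is unusually large. The sketch below shows only the thresholding step; the error values are synthetic stand-ins rather than outputs of a real model, and the 99th-percentile threshold is an arbitrary choice.

```python
import numpy as np


def reconstruction_errors(x, x_hat):
    """Per-sample mean squared error between inputs and reconstructions."""
    return np.mean((x - x_hat) ** 2, axis=1)


def flag_anomalies(errors, reference_errors, percentile=99.0):
    """Flag samples whose error exceeds a percentile of errors on normal data."""
    threshold = np.percentile(reference_errors, percentile)
    return errors > threshold, threshold


rng = np.random.default_rng(0)
# Hypothetical numbers: a trained autoencoder reconstructs normal data well
# (small errors) and unfamiliar data poorly (large errors).
normal_errors = rng.normal(0.01, 0.002, 1000)
new_x = rng.random((3, 784))
new_x_hat = new_x.copy()
new_x_hat[2] = 1.0 - new_x[2]  # pretend the third sample is badly reconstructed
new_errors = reconstruction_errors(new_x, new_x_hat)
is_anomaly, threshold = flag_anomalies(new_errors, normal_errors)
print(is_anomaly, threshold)
```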

Below is an example implementation of Autoencoder using Python’s Keras library.

from keras.layers import Input, Dense
from keras.models import Model
from keras.datasets import mnist
import numpy as np

# Loading MNIST Data Sets
(x_train, _), (x_test, _) = mnist.load_data()

# Input data preprocessing
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = np.reshape(x_train, (len(x_train), np.prod(x_train.shape[1:])))
x_test = np.reshape(x_test, (len(x_test), np.prod(x_test.shape[1:])))

# Auto Encoder Construction
input_dim = 784  # Number of input dimensions (28 x 28 pixels)
encoding_dim = 32  # Number of encoding dimensions

# Encoder Definition
input_img = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(input_img)

# Decoder Definition
decoded = Dense(input_dim, activation='sigmoid')(encoded)

# Definition of the autoencoder model
autoencoder = Model(input_img, decoded)

# Encoder Model Definition
encoder = Model(input_img, encoded)

# Compilation of autoencoder models
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Autoencoder model training
autoencoder.fit(x_train, x_train, epochs=50, batch_size=256, shuffle=True, validation_data=(x_test, x_test))

# Conversion and reconstruction of test data
encoded_imgs = encoder.predict(x_test)
decoded_imgs = autoencoder.predict(x_test)

In this example, the Autoencoder is implemented for images in the MNIST dataset.

In building the model, the encoder and decoder are defined using an Input layer and a Dense layer. The output of the encoder is constrained to the number of encoding dimensions, and the output of the decoder matches the number of dimensions of the input data. In compiling the model, the optimization algorithm and loss function are specified. In this example, Adam is used as the optimization algorithm and binary cross-entropy is used as the loss function. In training the model, the fit method is used to train the model by providing input data and target data (the input data itself) and setting parameters such as the number of epochs and batch size.
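Binary cross-entropy treats each of the 784 pixel intensities (scaled to [0, 1]) as a target probability for the corresponding sigmoid output. A NumPy sketch of the per-pixel average is below; the eps clipping is an assumption added to avoid log(0). For grey-level targets such as 0.5 the loss does not reach zero, but for each pixel it is still smallest when the reconstruction equals the input.

```python
import numpy as np


def binary_crossentropy(x, x_hat, eps=1e-7):
    """Mean binary cross-entropy between pixel intensities x and reconstructions x_hat.

    Both arrays hold values in [0, 1]; eps (an assumption of this sketch)
    keeps log() away from 0.
    """
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))


x = np.array([[0.0, 0.5, 1.0]])
good = np.array([[0.05, 0.5, 0.95]])
bad = np.array([[0.9, 0.1, 0.2]])
print(binary_crossentropy(x, good), binary_crossentropy(x, bad))  # the closer reconstruction scores lower
```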

CNN (Convolutional Neural Network)

A CNN (Convolutional Neural Network) is a type of neural network designed for, and particularly effective in, image processing tasks. A CNN consists of convolutional layers and pooling layers and is generally built by stacking combinations of these layers. CNNs have the following characteristics:

Convolutional Layer: The convolutional layer performs convolution operations on images and feature maps to extract features. The convolution slides a kernel (filter) over the input and, at each position, computes the sum of the element-wise products between the kernel and the input patch, producing a feature map. The convolutional layer is thus responsible for extracting local features.
Pooling Layer: The pooling layer is used to reduce the spatial dimension of the feature map obtained in the convolution layer. Typically, Max Pooling is used to generate a new feature map by extracting the maximum value from a region in the feature map. The pooling layer is responsible for obtaining a representation that is robust to changes in feature location.
Activation Function: A nonlinear activation function is applied to the output of each convolution, typically ReLU (Rectified Linear Unit). The activation function introduces nonlinearity into the network and enhances the expressive power of the model.
Fully Connected Layer: After the convolutional and pooling layers, a fully connected layer is usually added. The fully-connected layer takes the features extracted by the convolutional and pooling layers and produces the final output.
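The convolution and max-pooling operations described above can be written out directly in NumPy. The sketch below handles a single channel with one hand-made 3×3 filter, stride 1 and no padding; in a real CNN the filter values are learned, and there are many channels and filters.

```python
import numpy as np


def conv2d_single(image, kernel):
    """'Valid' 2D convolution of one channel, stride 1.

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of the kernel and the patch under it, summed
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


image = np.zeros((6, 6))
image[:, 3:] = 1.0  # dark left half, bright right half: one vertical edge
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # hand-made vertical-edge filter
fmap = np.maximum(0, conv2d_single(image, kernel))  # convolution followed by ReLU
pooled = max_pool2d(fmap)
print(fmap)    # nonzero only in the columns whose patches straddle the edge
print(pooled)  # 2x2 summary of the 4x4 feature map
```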

For more information on image information processing techniques, see “Image Information Processing Techniques”.

An example implementation of this CNN using the Python Keras library is shown below. This example targets the image classification task on the CIFAR-10 dataset.

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.datasets import cifar10
from keras.utils import to_categorical

# Loading the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Input data preprocessing
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Model Building
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Model Compilation
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Model Training
model.fit(x_train, y_train, batch_size=128, epochs=10, validation_data=(x_test, y_test))

# Model Evaluation
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

In this example, a convolutional neural network (CNN) is constructed using the CIFAR-10 dataset, which consists of 10 different classes of 32×32 pixel color images.

In building the model, Conv2D and MaxPooling2D layers are used for convolution and pooling. The Flatten layer then transforms the feature maps into a 1D vector, and fully connected Dense layers predict the class. As before, Adam is used as the optimization algorithm and categorical cross-entropy as the loss function.
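To see how the feature maps shrink and where the parameters are, the following plain-Python sketch traces the CIFAR-10 model above layer by layer (3×3 kernels, stride 1, 2×2 pooling). It mirrors the arithmetic behind model.summary() but is an independent calculation, not a Keras API. Most of the roughly 1.25 million trainable parameters sit in the first Dense layer after Flatten.

```python
def conv_out(size, kernel, padding):
    """Spatial output size of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1


def summarize_cifar_cnn():
    """Trace output shapes and trainable parameter counts of the CIFAR-10 CNN."""
    h = w = 32
    channels = 3
    rows, total = [], 0
    layers = [
        ("conv", 32, "same"), ("conv", 32, "valid"), ("pool",),
        ("conv", 64, "same"), ("conv", 64, "valid"), ("pool",),
    ]
    for layer in layers:
        if layer[0] == "conv":
            _, filters, padding = layer
            params = 3 * 3 * channels * filters + filters  # kernel weights + biases
            h, w, channels = conv_out(h, 3, padding), conv_out(w, 3, padding), filters
        else:
            params = 0
            h, w = h // 2, w // 2  # 2x2 max pooling, remainder dropped
        rows.append((layer[0], (h, w, channels), params))
        total += params
    flat = h * w * channels  # size of the Flatten output
    for units in (512, 10):
        params = flat * units + units
        rows.append(("dense", (units,), params))
        total += params
        flat = units
    return rows, total


rows, total = summarize_cifar_cnn()
for name, shape, params in rows:
    print(f"{name:6s} {str(shape):14s} {params:>9,d}")
print("total trainable parameters:", total)
```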

For more details on CNNs, see “Deep Learning for Computer Vision with Python and Keras (1) – Convolution and Pooling” and “Deep Learning for Computer Vision with Python and Keras (2) Data Extension of CNNs Using Small Amount of Data”.

RNN (Recurrent Neural Network)

An RNN (Recurrent Neural Network) is a type of neural network designed to process sequential data such as time series and natural language. An RNN processes the current input while retaining information from the past and passing it on to the next step. The features of RNNs are as follows:

Recurrent connections: RNNs have a self-looping recurrent connection that maintains a hidden state. The hidden state preserves information about the inputs at previous time steps, which allows past information to be reflected in the current processing.
Input and output sequences: RNNs can take sequence data as input and generate sequence data as output. This allows it to be applied to tasks such as time series data prediction and sequence transformation.
Modeling long-term dependencies: Because past information propagates to the next step through the hidden state, RNNs can in principle take information from the distant past into account. In practice, simple RNNs struggle to learn such long-term dependencies, which motivated the improved variants discussed below.
Sequence-oriented processing: RNNs typically process time-series data sequentially from the beginning to the end. This allows processing that takes into account the structure of sequence data.
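The recurrent connection described in this list amounts to one update rule applied step by step. The NumPy sketch below implements a plain (Elman-style) RNN with random, untrained weights, using tanh as the activation, which is also the default activation of Keras' SimpleRNN layer. The sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, timesteps = 3, 4, 5

# Shared weights, reused at every time step (untrained, for illustration)
W_x = rng.normal(0, 0.5, (input_dim, hidden_dim))   # input -> hidden
W_h = rng.normal(0, 0.5, (hidden_dim, hidden_dim))  # hidden -> hidden (the loop)
b = np.zeros(hidden_dim)


def simple_rnn(sequence):
    """Run h_t = tanh(x_t W_x + h_{t-1} W_h + b) over a (timesteps, input_dim) array."""
    h = np.zeros(hidden_dim)  # initial hidden state
    states = []
    for x_t in sequence:  # processed in order, from the first step to the last
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)  # (timesteps, hidden_dim)


seq = rng.normal(size=(timesteps, input_dim))
states = simple_rnn(seq)
print(states.shape)  # (5, 4): one hidden state per step; the last one summarizes the sequence
```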

Typical RNN architectures include LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit). These are improved variants developed to address the vanishing gradient problem and the difficulty of modeling long-term dependencies in simple RNNs.

RNNs are widely used in tasks such as natural language processing, speech recognition, and time series prediction, and there are also extensions such as stacked RNNs and bidirectional RNNs.

For details, see “Natural Language Processing Technology” for natural language processing, “Speech Recognition Technology” for speech recognition, and “Time Series Data Analysis” for time series prediction.

The following is an example implementation of an RNN, using an LSTM layer, in Python’s Keras library. This example targets sentiment analysis on the IMDB movie review dataset.

import numpy as np
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.preprocessing import sequence

# Loading IMDB datasets
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

# Input data preprocessing
max_len = 200  # Maximum document length
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)

# Model Building
model = Sequential()
model.add(Embedding(10000, 32, input_length=max_len))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))

# Model Compilation
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Model Training
model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))

# Model Evaluation
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

In this example, the RNN (LSTM) is built using the IMDB movie review dataset, which is a dataset consisting of movie review texts and their sentiment labels (positive or negative).

In building the model, the Embedding layer performs word embedding, converting each word index into a dense vector. Next, the LSTM layer processes the resulting sequence of word vectors, and finally the Dense layer with a sigmoid activation predicts the sentiment label. When compiling the model, Adam is used as the optimization algorithm and binary cross-entropy as the loss function.
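Two preprocessing ideas in this example may be easier to see in isolation: pad_sequences makes every review exactly max_len tokens long, and Embedding maps each word index to a learned vector. The NumPy sketch below re-implements both on toy data; pad_pre is a hypothetical helper that mimics the default front padding and front truncation of Keras' pad_sequences, and the embedding table is random rather than learned.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Pad/truncate integer sequences to maxlen, both at the front.

    Mirrors the defaults padding="pre", truncating="pre" of Keras'
    pad_sequences; this is a re-implementation for illustration only.
    """
    out = np.full((len(sequences), maxlen), value, dtype=int)
    for i, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]  # truncating="pre": keep the last maxlen tokens
        if kept:
            out[i, maxlen - len(kept):] = kept
    return out


# Two toy "reviews" already converted to word indices (hypothetical values)
reviews = [[12, 7, 256], [5, 9, 14, 3, 88, 41]]
padded = pad_pre(reviews, maxlen=4)
print(padded)  # rows [0, 12, 7, 256] and [14, 3, 88, 41]

# An Embedding layer is a trainable lookup table: row k is the vector for word k
vocab_size, embed_dim = 300, 8  # small stand-ins for 10000 and 32
table = np.random.default_rng(0).normal(size=(vocab_size, embed_dim))
vectors = table[padded]  # shape (batch, maxlen, embed_dim): the input to the LSTM
print(vectors.shape)  # (2, 4, 8)
```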

In training the model, the fit method is used to provide input and label data, the batch size and number of epochs are specified, and finally, in evaluating the model, test data is used to evaluate the model’s performance and display the loss and accuracy.

Sequence data analysis using RNNs is also a foundation of modern generative models, as described in “Overview of Automatic Sentence Generation with Huggingface”.

LSTM (Long Short-Term Memory)

LSTM (Long Short-Term Memory) is a type of RNN (Recurrent Neural Network), a neural network architecture designed to model long-term dependencies. LSTM has the following elements.

Cell State: LSTM introduces memory cells called cell states. The cell state is the part of memory that holds long-term information and allows information to be added or deleted.
Gates: LSTMs have control mechanisms called gates. Each gate produces values between 0 and 1 (via a sigmoid) that control, in a learnable way, how much information flows through. The main gates are the Input Gate, Forget Gate, and Output Gate.
- Input Gate: A gate for adding new information to the cell state.
- Forget Gate: A gate for removing unnecessary information from a cell state.
- Output Gate: A gate to convey information taken from a cell state to the next hidden state.
Hidden State: Like RNNs, LSTMs have a hidden state, which carries past information into the processing of the current input. The hidden state is updated under the control of the cell state and the gates.
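The gate equations can be written out for a single time step. The NumPy sketch below uses random, untrained weights; stacking the four gate blocks in one weight matrix and their order are assumptions of this sketch (a common layout), not a description of any particular library's internals.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    The four blocks are ordered input, forget, candidate, output
    (an assumption of this sketch).
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates: values in (0, 1)
    g = np.tanh(g)                                 # candidate new information
    c_t = f * c_prev + i * g   # forget part of the old cell state, add new information
    h_t = o * np.tanh(c_t)     # output gate decides what the hidden state exposes
    return h_t, c_t


rng = np.random.default_rng(0)
input_dim, units = 3, 5
W = rng.normal(0, 0.3, (input_dim, 4 * units))
U = rng.normal(0, 0.3, (units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(6, input_dim)):  # a 6-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (5,) (5,)
```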

LSTMs are particularly effective in processing time-series and sequential data, especially when long-term dependencies need to be captured, and are widely used for tasks such as natural language processing (language modeling, machine translation), speech recognition, and motion recognition.

Below is an example implementation of an LSTM using Python’s Keras library. In Keras, an LSTM layer is added with keras.layers.LSTM.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.datasets import mnist
from keras.utils import to_categorical

# Loading MNIST Data Sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Input data preprocessing
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Treat each 28x28 image as a sequence of 28 time steps (rows),
# each with 28 features (the pixels in that row); no reshape is needed
timesteps, features = x_train.shape[1], x_train.shape[2]

# Model Building
model = Sequential()
model.add(LSTM(128, input_shape=(timesteps, features)))
model.add(Dense(10, activation='softmax'))

# Model Compilation
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Model Training
model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))

# Model Evaluation
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

In this example, an LSTM network is used for handwritten digit recognition on the MNIST dataset. In building the model, an LSTM layer with 128 units processes the input sequence and a Dense layer predicts the output class. When compiling the model, Adam is used as the optimization algorithm and categorical cross-entropy as the loss function.
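A common way to feed MNIST to an LSTM is to read each 28×28 image as a sequence of 28 rows, each row being one time step with 28 pixel features. The plain NumPy sketch below shows that view and counts the trainable parameters of an LSTM(128) layer on such input, using the standard formula for an LSTM with four gate blocks and one bias vector per block (the structure of Keras' LSTM layer). The helper lstm_params is hypothetical.

```python
import numpy as np


def lstm_params(input_dim, units):
    """Trainable parameters of an LSTM layer with biases.

    Four gate blocks, each with input weights (input_dim x units),
    recurrent weights (units x units) and a bias vector (units).
    """
    return 4 * units * (input_dim + units + 1)


# A 28x28 MNIST-sized image read row by row: 28 time steps, 28 values per step
image = np.arange(28 * 28).reshape(28, 28)
timesteps, features = image.shape
first_step = image[0]  # what the LSTM sees at t = 0: the top row of pixels
print(timesteps, features, first_step.shape)  # 28 28 (28,)

lstm = lstm_params(input_dim=features, units=128)  # LSTM(128)
dense = 128 * 10 + 10                              # Dense(10) output layer
print(lstm, dense, lstm + dense)  # 80384 1290 81674
```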

In training the model, the fit method is used to provide input and label data, the batch size and number of epochs are specified to train the model, and finally, in evaluating the model, test data is used to evaluate the model performance and display the loss and accuracy.

LSTMs were a forerunner of more complex deep learning models, which have since evolved into the state-of-the-art generative models described in “Overview of Automatic Sentence Generation with Huggingface”.

Reference Information

For more information on image information processing in general, please refer to “Image Information Processing Techniques”; for natural language processing, refer to “Natural Language Processing Technology”; and for deep learning techniques, refer to “About Deep Learning”.
