About Residual Connections

Overview of Residual Connections

A residual connection is a technique for passing information directly across layers in a deep learning network. It was introduced to address the vanishing and exploding gradient problems that arise especially when training deep networks. Residual connections were proposed by Kaiming He et al. at Microsoft Research in 2015 and have since been very successful.

Typically, deep neural networks become prone to information loss as more layers are added, and when the vanishing gradient problem described in “The vanishing gradient problem and its counterpart” occurs, the network becomes hard to train and training struggles to converge. Residual connections address these problems by creating shortcut paths across layers that pass the input along directly, so that each layer only has to learn the residual (difference) between its input and the desired output.

Specifically, let \(H(x)\) denote the transformation computed by a layer. With a residual connection, the new output \(F(x)\) can be expressed as follows.

\[ F(x) = H(x) + x \]

where \(x\) is the original input carried by the shortcut path. Because the input is transmitted unchanged, the layer only needs to learn the residual \(H(x)\), i.e. the difference between the desired output and the input. When the gradient is back-propagated, it flows not only through \(H(x)\) but also directly through the shortcut, which keeps learning effective.
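This simple form also explains why gradients flow well: differentiating \(F(x) = H(x) + x\) with respect to \(x\) gives

\[ \frac{\partial F(x)}{\partial x} = \frac{\partial H(x)}{\partial x} + I \]

so during backpropagation part of the gradient always passes through the identity term \(I\), even when \(\frac{\partial H(x)}{\partial x}\) is very small. Stacking many residual blocks therefore does not shrink the gradient as quickly as stacking ordinary layers.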

Residual connections are used in various architectures, such as the convolutional neural networks (CNNs) described in “Overview of CNNs, Algorithms, and Examples of Implementations” and the residual networks described in “About ResNet (Residual Network)”.

Specific Procedure for Residual Connections

The specific procedure for a residual connection is as follows. In the description below, \(x\) represents the input, \(H(x)\) the output of the layer, and \(F(x)\) the new output produced by the residual connection.

1. Input shortcut: The input \(x\) is passed along as-is through a shortcut path.

2. Computing the layer output: Compute the layer output \(H(x)\) in the usual way. This differs by layer type, e.g. convolutional or fully connected layers.

3. Computing the residual sum: Add the layer output and the input.
\[ \text{Residual} = H(x) + x \]

4. Applying an activation function (optional): An activation function may be applied to the residual sum, which allows the model to learn nonlinear functions.

5. Final output: The final output \(F(x)\) of the residual connection is obtained.
\[ F(x) = \text{Activation}(\text{Residual}) \]

The result is a new output in which the original input, carried through the shortcut path, is added to the output of the layer. This addresses the vanishing and exploding gradient problems while making deeper networks easier to train. Residual connections are usually built into the structure of convolutional neural networks (CNNs) and deep neural networks (DNNs), which allows deeper models to be trained.
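As a minimal sketch of these steps in code (the Dense layer and tensor shapes below are illustrative assumptions; the next section gives a fuller convolutional example), the procedure maps directly onto elementary TensorFlow operations:

import tensorflow as tf

# Step 1: keep the input x as the shortcut path.
x = tf.random.normal([8, 64])        # an example batch of inputs
layer = tf.keras.layers.Dense(64)    # any layer whose output shape matches x

# Step 2: compute the layer output H(x).
h = layer(x)

# Step 3: compute the residual sum H(x) + x.
residual = h + x

# Steps 4-5: optionally apply an activation to obtain the final output F(x).
f = tf.nn.relu(residual)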

Example implementation of residual connections

Residual connections are relatively easy to implement in deep learning frameworks. Below is a simple example implementation using Python and TensorFlow; the same approach can be applied to other deep learning frameworks.

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, ReLU, Add

def residual_block(input_tensor, filters, kernel_size=(3, 3), strides=(1, 1)):
    # Main path (normal processing)
    x = Conv2D(filters, kernel_size=kernel_size, strides=strides, padding='same')(input_tensor)
    x = BatchNormalization()(x)
    x = ReLU()(x)

    # Shortcut path (1x1 convolution projection for the residual connection)
    shortcut = Conv2D(filters, kernel_size=(1, 1), strides=strides, padding='same')(input_tensor)
    shortcut = BatchNormalization()(shortcut)

    # Residual connection: add the main path and the shortcut
    x = Add()([x, shortcut])
    x = ReLU()(x)
    
    return x

# Example of model construction (ResNet-like small network)
input_tensor = Input(shape=(32, 32, 3))
x = residual_block(input_tensor, filters=64)
x = residual_block(x, filters=64)
output_tensor = residual_block(x, filters=128)

model = tf.keras.Model(inputs=input_tensor, outputs=output_tensor)

In this example, the residual_block function defines a basic block containing a residual connection: it runs the main path (normal processing) and a shortcut path in parallel and finally adds their outputs together.
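As a usage sketch, the model can be completed with a small classification head and compiled in the usual Keras way (the 10-class output, optimizer, and loss below are illustrative assumptions, not part of the original example):

from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

# Attach a simple classification head to the residual blocks defined above.
features = GlobalAveragePooling2D()(output_tensor)
predictions = Dense(10, activation='softmax')(features)  # assumed 10 classes

classifier = tf.keras.Model(inputs=input_tensor, outputs=predictions)
classifier.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
classifier.summary()  # the Add layers mark where the shortcuts are merged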

Challenges of Residual Connections and Their Solutions

Residual connections are a powerful method for effectively training deep networks, but there are some challenges and caveats. Below we discuss several of these challenges and how to address them.

1. Degradation Problem:

Challenge: When constructing a deep network, performance on the training data can degrade as the number of layers increases. This is called the “degradation problem”.

Solution: Residual connections were introduced as a means of addressing this problem; by transferring information across layers they reduce vanishing gradients and information loss, which allows performance to keep improving even as the network gets deeper.

2. Increase in computational load:

Challenge: Residual connections can increase the computational cost of the overall model because of the additional shortcut paths.

Solution: To keep the model computationally efficient, it is important to make appropriate use of deep learning frameworks and optimization methods; optimizing the structure of the model itself can also be considered.

3. Risk of overfitting:

Challenge: As models become sufficiently deep, they can adapt too closely to the training data, increasing the risk of overfitting.

Solution: It is common to combine residual connections with measures that suppress overfitting, such as dropout and regularization (see the sketch after this list). Data augmentation is also effective.

4. Structure of shortcut paths:

Challenge: If the structure of shortcut paths is not appropriate, gradient problems may occur during back propagation.

Solution: Shortcut paths need to be dimensionally consistent with the main path; when the shapes differ, a learned projection (for example a 1x1 convolution) can be applied to the shortcut, as in the sketch below.
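As a sketch addressing points 3 and 4 above (the regularized_residual_block name, the dropout rate, and the weight-decay coefficient are illustrative assumptions rather than fixed recommendations), the residual block from the implementation example can be extended with regularization and a shape-aware shortcut:

from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, Add, Dropout
from tensorflow.keras.regularizers import l2

def regularized_residual_block(input_tensor, filters, strides=(1, 1),
                               dropout_rate=0.2, weight_decay=1e-4):
    # Main path with L2 weight decay and dropout to curb overfitting
    x = Conv2D(filters, (3, 3), strides=strides, padding='same',
               kernel_regularizer=l2(weight_decay))(input_tensor)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    x = Dropout(dropout_rate)(x)

    # Shortcut path: identity when the shapes already match,
    # otherwise a 1x1 convolution projection so the addition is valid
    if strides != (1, 1) or input_tensor.shape[-1] != filters:
        shortcut = Conv2D(filters, (1, 1), strides=strides, padding='same',
                          kernel_regularizer=l2(weight_decay))(input_tensor)
        shortcut = BatchNormalization()(shortcut)
    else:
        shortcut = input_tensor

    x = Add()([x, shortcut])
    return ReLU()(x)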

Reference Information and Reference Books

For more information on optimization in machine learning, see also “Optimization for the First Time Reading Notes”, “Sequential Optimization for Machine Learning”, “Statistical Learning Theory”, “Stochastic Optimization”, etc.

1. “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

  • Publisher: MIT Press (2016)

  • Why: Chapters 6–7 explain optimization difficulties in deep networks, which motivate residual connections. Though ResNet is not explicitly detailed, the foundation is essential.

2. “Dive into Deep Learning” by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola

  • Publisher: Open-source / Available online

  • Why: Hands-on guide with MXNet, PyTorch, and TensorFlow implementations including ResNet architecture.

3. “Programming PyTorch for Deep Learning” by Ian Pointer

  • Publisher: O’Reilly Media

  • Why: Provides practical implementation of ResNet and explains residual blocks in a modern deep learning context using PyTorch.

4. “Deep Learning with PyTorch” by Eli Stevens, Luca Antiga, and Thomas Viehmann

  • Publisher: Manning Publications

  • Why: Contains practical exercises on building residual networks from scratch using PyTorch.

Papers

1. Deep Residual Learning for Image Recognition

  • Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (2015)

  • Summary: The original ResNet paper; introduces residual blocks and solves degradation problems in deep neural nets.

2. Identity Mappings in Deep Residual Networks

  • Authors: Kaiming He et al. (2016)

  • Summary: A refinement of ResNet with identity mapping for more stable training.

3. Densely Connected Convolutional Networks (DenseNet)

  • Authors: Gao Huang et al. (2016)

  • Summary: Builds upon the idea of residual connections by introducing dense connections.

4. Aggregated Residual Transformations for Deep Neural Networks (ResNeXt)

  • Authors: Saining Xie et al. (2017)

  • Summary: Extends ResNet with split-transform-merge strategies for better performance with similar complexity.
