Machine learning approaches with small data and various implementation examples

Machine Learning with Small Data

As described in “Challenges in Achieving 100% Reproducibility for Risky Tasks” and “How to Deal with Machine Learning with Inaccurate Supervisory Data”, having only a small amount of training data (small data) is a problem that appears in many tasks and reduces the accuracy of machine learning.

Machine learning with small data often takes the following approaches, taking into account data constraints and the risk of overfitting.

  • Data Augmentation: With small data, the first thing to consider is how to augment the data artificially. For image data, for example, data augmentation through operations such as rotation, flipping, and cropping can improve the generalization performance of the model.
  • Model simplification: With small data, complex models (e.g., deep neural networks) carry a high risk of overfitting. Limiting the complexity of the model makes it easier to fit the data reliably, so an approach using simple models (e.g., logistic regression, decision trees) or models with regularization can be effective.
  • Model parameter tuning: With small data, model parameter tuning is important. Cross-validation can be used to find appropriate hyperparameter values. Careful tuning is needed to find the right balance between hyperparameter selection and model performance.
  • Pre-training or transfer learning: For small data, it can be useful to use a model that has already been trained on a large data set (a pre-trained model). The weights of the pre-trained model can be fixed and adapted to the task on a new dataset, which is expected to alleviate the problem of insufficient data and improve the performance of the model.
  • Feature selection and dimensionality reduction: With small data, it is important to perform appropriate feature selection and dimensionality reduction. Selecting only features that are important to the model can improve model training efficiency, and dimensionality reduction methods (e.g., principal component analysis) can be used to reduce the dimensionality of the data.
  • Data partitioning and evaluation: With small data, care should be taken to avoid data bias when partitioning the data set into training and test sets. It is important to choose an appropriate partitioning method and to use cross-validation and bootstrapping methods to properly evaluate model performance.
  • Leveraging Domain Knowledge: With small data, it is important to leverage domain knowledge. Feature engineering and model design based on domain knowledge can make more effective use of limited data.

The following sections describe those details and specific implementations.

Data Augmentation

<Overview>

In machine learning with small data, data augmentation can be used to make effective use of limited data. Data augmentation is a technique for generating new data points from an existing data set, for example as follows.

  • Image data: See “Image Information Processing Techniques” for more information on image data processing.
    • Rotate: Rotate the image by a certain angle.
    • Flip: Flip an image horizontally or vertically.
    • Zoom: Crop and enlarge a portion of an image.
    • Translate: Move an image horizontally or vertically.
    • Change brightness or contrast: Change the brightness or contrast of an image.
  • Text data: See “Natural Language Processing Techniques” for more information on natural language processing.
    • Synonym replacement: Replace words in a sentence with their synonyms.
    • Delete or Insert Words: Randomly delete words from a sentence or insert new words.
    • Replace part of a sentence: Replace part of a sentence with a fragment taken from another sentence or with a randomly chosen sentence.
  • Voice data: See “Speech Recognition Techniques” for more information on speech recognition techniques.
    • Adding noise: Adding white noise or ambient sounds to the audio.
    • Change audio speed: Change the playback speed of the audio.
    • Time-shift audio: Shift the audio backward or forward in time.

The basic principle of data augmentation is to generate new data by applying random transformations to the original data. This increases data variation and can improve the generalization performance of the model, but care must be taken to select transformations appropriate for the task and data and to avoid over-transforming the data.
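
As a simple illustration of applying random transformations, the following is a minimal text-augmentation sketch using random word deletion and synonym replacement. The SYNONYMS dictionary here is a hypothetical placeholder; in practice a thesaurus such as WordNet or a domain-specific dictionary would be used.

import random

# Hypothetical synonym dictionary; replace with a real thesaurus or domain dictionary
SYNONYMS = {"small": ["limited", "tiny"], "data": ["samples", "records"]}

def random_deletion(sentence, p=0.1):
    # Randomly drop each word with probability p
    words = sentence.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else sentence

def synonym_replacement(sentence, n=1):
    # Replace up to n words that have entries in the synonym dictionary
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    random.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = random.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)

text = "machine learning with small data"
print(random_deletion(text))
print(synonym_replacement(text))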

Data augmentation is typically implemented using machine learning frameworks and libraries. For example, for image data, the Python library OpenCV and TensorFlow’s ImageDataGenerator are commonly used, and for text and audio data there are libraries and methods specialized for each data format.
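
As a minimal sketch of the ImageDataGenerator approach (assuming TensorFlow/Keras is installed; the dummy arrays and parameter values below are illustrative only):

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define random transformations; the parameter values are illustrative
datagen = ImageDataGenerator(
    rotation_range=30,       # rotate up to +/-30 degrees
    horizontal_flip=True,    # random horizontal flips
    zoom_range=0.2,          # random zoom
    width_shift_range=0.1,   # random horizontal translation
    height_shift_range=0.1,  # random vertical translation
)

# Dummy data standing in for (num_samples, height, width, channels) images and labels
X_train = np.random.rand(8, 64, 64, 3)
y_train = np.random.randint(0, 2, size=8)

# The generator yields augmented batches that can be fed directly to model.fit()
augmented_batches = datagen.flow(X_train, y_train, batch_size=4)
images, labels = next(augmented_batches)
print(images.shape, labels.shape)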

When implementing data augmentation, it is common to incorporate it into the preprocessing pipeline of the data set or to use a data generator, and the scope of augmentation and the transformation parameters should be adjusted to the task and the nature of the data.

Next, an example implementation of data augmentation is described.

<Example Implementation in Python>

As an example of a data augmentation implementation, we describe data augmentation for image data using Python, NumPy, and OpenCV. The following example implements augmentation methods such as rotation, flipping, zooming, and translation.

import numpy as np
import cv2

def rotate_image(image, angle):
    height, width = image.shape[:2]
    rotation_matrix = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1)
    rotated_image = cv2.warpAffine(image, rotation_matrix, (width, height))
    return rotated_image

def flip_image(image, flip_code):
    flipped_image = cv2.flip(image, flip_code)
    return flipped_image

def zoom_image(image, zoom_factor):
    height, width = image.shape[:2]
    zoomed_image = cv2.resize(image, (int(width * zoom_factor), int(height * zoom_factor)))
    return zoomed_image

def translate_image(image, shift_x, shift_y):
    translation_matrix = np.float32([[1, 0, shift_x], [0, 1, shift_y]])
    translated_image = cv2.warpAffine(image, translation_matrix, (image.shape[1], image.shape[0]))
    return translated_image

# Loading image data
image = cv2.imread('image.jpg')

# rotation
rotated_image = rotate_image(image, 30)

# flip
flipped_image = flip_image(image, 1)  # 1 means horizontal flip

# zoom
zoomed_image = zoom_image(image, 1.2)

# shift
translated_image = translate_image(image, 50, -30)

# View Extended Data
cv2.imshow('Original Image', image)
cv2.imshow('Rotated Image', rotated_image)
cv2.imshow('Flipped Image', flipped_image)
cv2.imshow('Zoomed Image', zoomed_image)
cv2.imshow('Translated Image', translated_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

The above code implements rotation, flipping, zooming, and translation of image data using the OpenCV library. Each function takes the original image as input and applies the specified augmentation to generate a new image. In practice, data augmentation is typically performed within the training loop: data is taken from the data set batch by batch, and augmentation is applied to each data point before it is fed into the model. In this way, the model sees slightly different data in every epoch.

On Model Simplification

<Overview>

Model simplification is important for machine learning with small data. Simplifying the model reduces the risk of overfitting the data and improves generalization performance. Model simplification is also related to the approaches described in “Explainable Machine Learning”; see there for details. Several approaches to model simplification are described below.

  • Simplifying the model architecture: Using simple models instead of complex models (e.g., deep neural networks) can be a useful approach. For example, linear models such as logistic regression or linear support vector machines may provide a simple yet effective baseline.
  • Model regularization: Regularization is a method of constraining the complexity of a model; methods such as L1 and L2 regularization constrain the model’s parameters so that only important features are retained, reducing overfitting due to noise. See “Machine Learning with Sparsity” for more details on model regularization.
  • Feature Selection and Dimensionality Reduction: For small data, models can be simplified by reducing the number of features. Feature selection methods can be used to remove features that do not contribute to the prediction, and dimensionality reduction methods (e.g., principal component analysis) can be an effective approach to reduce the dimensionality of the feature space. See also “About Principal Component Analysis (PCA)” for more information on PCA.
  • Selecting Hyperparameters to Control Model Complexity: There are various hyperparameters in a model. In order to control model complexity, appropriate values for hyperparameters need to be selected, and cross-validation can be used to evaluate the trade-off between model performance and complexity and find the optimal hyperparameter settings.
  • Ensemble Learning: Ensemble learning, as described in “Overview of Ensemble Learning and Examples of Algorithms and Implementations”, is a method that combines multiple models to make predictions. For small data, ensemble learning can improve predictive performance by combining multiple simple models. For more information, see Ensemble Learning in “General Machine Learning and Data Analysis”.

By combining the above techniques, machine learning models can be simplified for small data. However, it is important to strike the right balance in model simplification, and oversimplification may reduce the expressive power of the model, so care must be taken.

Next, examples of concrete implementations of these models are described.

<Implementation in Python>

As an example of model simplification, we describe how to construct a logistic regression model using Python and the scikit-learn library.

from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Preparing Data
X = ...  # feature vector
y = ...  # target label

# Data Scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Feature Selection and Model Building
model = LogisticRegression()
feature_selector = SelectFromModel(model, threshold='median')
pipeline = Pipeline([('feature_selector', feature_selector), ('model', model)])

# Model Learning
pipeline.fit(X_scaled, y)

# Model Evaluation
accuracy = pipeline.score(X_scaled, y)
print("Accuracy:", accuracy)

In the above example, a logistic regression model is used to classify the data. To simplify the model, the SelectFromModel class in scikit-learn is used for feature selection before fitting the logistic regression model. In addition, the StandardScaler class is used to scale the data so that each feature has mean 0 and standard deviation 1, and the Pipeline class combines feature selection and model training into a single sequence of steps. Finally, as a simple evaluation, the accuracy on the training data is calculated and displayed.

This example shows that by consistently performing data preprocessing, feature selection, and model training, models can be simplified and properly trained even on small data. However, depending on the specific task and data, the preprocessing and feature selection methods should be adjusted.
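
As a complement to the pipeline above, regularization on its own can also keep a model simple. The following is a minimal sketch of L1-regularized logistic regression, where the regularization strength is controlled by C; the make_classification data is a dummy stand-in for an actual small data set.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Dummy data for the sketch; replace with the actual small data set
X, y = make_classification(n_samples=100, n_features=20, n_informative=5, random_state=0)

# L1 regularization drives unimportant coefficients to zero; smaller C means stronger regularization
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
scores = cross_val_score(model, X, y, cv=5)
print("CV accuracy:", scores.mean())

# Fitting on the full data shows which features were effectively removed
model.fit(X, y)
print("Non-zero coefficients:", (model.coef_ != 0).sum())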

Parameter Tuning of the Model

In machine learning with small data, model parameter tuning is particularly important, and appropriate parameter settings can maximize model performance. Several approaches to model parameter tuning are described below.

  • Grid Search: Grid search is a way to try all combinations of hyperparameters that you specify. Candidate hyperparameter values can be specified and cross-validation can be used to evaluate the performance of each combination and select the combination with the best performance. However, if there are many candidate hyperparameters or the computational cost is high, the search space will explode, and execution may take a long time.
  • Random Search: Random search is a method of sampling random combinations of hyperparameters within a specified range and evaluating their performance. This is an effective approach when the search space is large or when the hyperparameters differ in importance.
  • Bayesian Optimization: Bayesian optimization is a method of hyperparameter search that leverages prior knowledge, sets a prior distribution of hyperparameters, and updates the posterior distribution when evaluating model performance. This makes it possible to efficiently search for the optimal hyperparameters. However, Bayesian optimization is somewhat complex to implement and may require sufficient computational resources. See also “Nonparametric Bayesian and Gaussian Processes” for more information on Bayesian optimization.
  • Model-based Optimization: Model-based optimization builds a surrogate model of the search space and uses it to select the hyperparameters to evaluate next; for example, a random forest or Gaussian process model can be used, which also makes it possible to assess the importance of each hyperparameter.

When using these methods, it is common to rely on the tools provided by machine learning frameworks and libraries. For example, the scikit-learn library provides classes such as GridSearchCV and RandomizedSearchCV that make it easy to implement grid search and random search, and libraries such as Optuna and BayesianOptimization are available for Bayesian optimization.

The following is an example of grid search using scikit-learn.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Preparing Data
X = ...  # feature vector
y = ...  # target label
# Model Definition
model = RandomForestClassifier()

# Set hyperparameter candidates
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10]
}

# Execute grid search
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
grid_search.fit(X, y)

# Display of optimal model and hyperparameters
best_model = grid_search.best_estimator_
best_params = grid_search.best_params_
print("Best Model:", best_model)
print("Best Parameters:", best_params)

In the above example, n_estimators and max_depth are set as candidate hyperparameters for the random forest model. GridSearchCV tries all combinations of the specified candidates and runs cross-validation to find the best model and hyperparameters.

However, parameter tuning is computationally expensive, so it is necessary to find the right balance for small data, and parameter tuning also carries the risk of overfitting, so it should be evaluated carefully.
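
Where grid search is too expensive, randomized search evaluates only a fixed number of sampled settings. The following is a minimal sketch using scikit-learn's RandomizedSearchCV; the make_classification data and parameter ranges are illustrative only.

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Dummy data for the sketch; replace with the actual small data set
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Parameter distributions are sampled rather than exhaustively enumerated
param_distributions = {
    'n_estimators': randint(50, 300),
    'max_depth': [None, 5, 10, 20]
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,   # number of sampled settings; controls the computational budget
    cv=5,
    random_state=42
)
random_search.fit(X, y)
print("Best Parameters:", random_search.best_params_)
print("Best CV Score:", random_search.best_score_)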

Pre-training and transfer learning

Pre-training and transfer learning methods are effective approaches for machine learning on small data. Using these methods, it is possible to leverage the knowledge of models trained on large data sets and adapt them to small data tasks. The approaches are described below. Transfer learning is also discussed in “About Deep Learning” and “Theory, Algorithms, and Python Implementations of Various Reinforcement Learning Techniques”. Please refer to those as well.

  • Pre-training: Pre-training is a technique that uses models previously trained on a large dataset (e.g., the ImageNet dataset); such models have learned to extract features that are common across images or text. To apply this to a small-data task, the weights of the pre-trained model are used as initialization, and additional learning (fine-tuning) is performed on the small data set.
  • Transfer Learning: Transfer learning is a type of pre-training that involves applying part or all of a model trained on a large data set to a new task. This may involve fixing a portion of the pre-trained model (e.g., convolutional layers) and performing additional training on a new data set. This allows for high performance on small data tasks.

Next, we show an example implementation of pre-training and transfer learning in Python. The following example uses TensorFlow and Keras.

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Loading the pre-training model
pretrained_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze some of the pre-training models
for layer in pretrained_model.layers:
    layer.trainable = False

# Building a New Model
x = Flatten()(pretrained_model.output)
x = Dense(256, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)  # num_classes: number of target classes (assumed to be defined)

model = Model(inputs=pretrained_model.input, outputs=output)

# Model Compilation
model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

# Model Learning
# X_train, y_train, X_val, y_val are assumed to have been prepared beforehand
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val))

# Model Evaluation
loss, accuracy = model.evaluate(X_test, y_test)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

In the above example, the VGG16 model is used as the pre-trained model. A portion of the pre-trained model is frozen, and a new network is added on top to build the final model. The weights of the frozen portion are fixed, and only the weights of the newly added layers are trained. The model is compiled with an appropriate optimization algorithm and loss function and then trained; finally, it is evaluated on a test data set and the loss and accuracy are displayed.

Pre-training and transfer learning are effective methods on small data; pre-trained models have knowledge learned on large data sets, which can be transferred to small data to achieve high performance on limited data.
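
As a follow-up, fine-tuning often goes one step beyond freezing the whole base model: the last convolutional block is unfrozen and training continues with a much smaller learning rate. The following is a minimal sketch of this idea; num_classes and the dummy arrays are illustrative only.

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Rebuild the same architecture as in the example above (num_classes is illustrative)
num_classes = 2
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base.input, outputs=output)

# Keep the early blocks frozen and unfreeze only the last convolutional block (block5_*)
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')

# Recompile with a much smaller learning rate so the pre-trained weights are only gently adjusted
model.compile(optimizer=Adam(learning_rate=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])

# Dummy data for the sketch; replace with the actual small data set
X_train = np.random.rand(4, 224, 224, 3)
y_train = tf.keras.utils.to_categorical(np.random.randint(0, num_classes, 4), num_classes)
model.fit(X_train, y_train, epochs=1, batch_size=2)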

Feature Selection and Dimensionality Reduction

Feature selection and dimensionality reduction play an important role in machine learning with small data. Using these methods, it is possible to reduce the dimensionality of the data and retain only important features. Examples of their implementation are described below.

  • Feature Selection: Feature selection is a method of selecting a subset of important features from the original feature space, which reduces the dimensionality of the data and the complexity of the model by retaining only the important features. Major feature selection approaches include filter methods, wrapper methods, and embedded methods. For more details, see “Various Feature Engineering Methods and Their Python Implementations”.

An example implementation of feature selection using the SelectKBest class of the scikit-learn library is described below.

from sklearn.feature_selection import SelectKBest, chi2

# Preparing Data
X = ...  # feature vector
y = ...  # target label

# Perform feature selection
k_best = SelectKBest(score_func=chi2, k=10)  # Specify score function and number of features to select
X_selected = k_best.fit_transform(X, y)

# Get the index of the selected feature
selected_indices = k_best.get_support(indices=True)

In the above example, feature selection is performed using the chi-square test: the SelectKBest class is used to specify the score function and the number of features to select (k), and the fit_transform method extracts a subset of the selected features (note that chi2 requires non-negative feature values). The get_support method is used to obtain the indices of the selected features.
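
SelectKBest is a filter method; a wrapper-style alternative is recursive feature elimination (RFE), which repeatedly fits a model and removes the weakest features. The following is a minimal sketch; the make_classification data is a dummy stand-in.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Dummy data for the sketch; replace with the actual feature matrix and labels
X, y = make_classification(n_samples=100, n_features=20, n_informative=5, random_state=0)

# Recursively eliminate features, using logistic regression coefficients as the ranking criterion
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
X_selected = rfe.fit_transform(X, y)

print("Selected feature mask:", rfe.support_)
print("Feature ranking:", rfe.ranking_)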

  • Dimensionality Reduction: Dimensionality reduction is a method of projecting the original feature space onto a lower-dimensional feature space. Major dimensionality reduction methods include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and t-SNE. For details on Principal Component Analysis, please refer to “About Principal Component Analysis (PCA)“.

Below is an example implementation of Principal Component Analysis using the PCA class in the scikit-learn library.

from sklearn.decomposition import PCA

# Preparing Data
X = ...  # feature vector

# Performing a Principal Component Analysis
pca = PCA(n_components=2)  # Specify the number of dimensions after reduction
X_reduced = pca.fit_transform(X)

In the above example, dimensionality reduction is performed to two dimensions; the PCA class is used to specify the number of dimensions after reduction, and the fit_transform method is used to obtain the dimension-reduced feature vector.
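
When the target dimensionality is not known in advance, the cumulative explained variance ratio is a common way to decide how many principal components to keep. The following is a minimal sketch, using the Iris data set as a stand-in for the actual data.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example data; standardizing the features first is usually recommended before PCA
X = StandardScaler().fit_transform(load_iris().data)

# Fit PCA without limiting the number of components, then inspect the variance explained
pca = PCA()
pca.fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative explained variance:", cumulative)

# Keep the smallest number of components that explains, say, 95% of the variance
n_components = int(np.searchsorted(cumulative, 0.95) + 1)
print("Components needed for 95% variance:", n_components)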

Division and Evaluation of Data

In machine learning with small data, the choice of data partitioning and evaluation methods is also important. By selecting appropriate partitioning and evaluation methods, it is possible to evaluate model performance properly and avoid overfitting. These methods are described below.

  • Training set, validation set, and test set: A common approach is to partition the data into three sets: the training set is used to train the model, the validation set is used to tune the model’s hyperparameters and select the architecture, and the test set is used for the final evaluation of the model.

Below we describe how to split the data into a training set and a test set using the train_test_split function in the scikit-learn library.

from sklearn.model_selection import train_test_split

# Preparing Data
X = ...  # feature vector
y = ...  # target label
# Data Division
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In the above example, the data is split 80:20 into a training set and a test set; the split ratio can be changed by adjusting the test_size parameter.
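
To obtain the training/validation/test split described earlier, train_test_split can simply be applied twice. The following is a minimal sketch; the make_classification data is a dummy stand-in.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Dummy data for the sketch; replace with the actual small data set
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# First split off the test set (20%), then carve a validation set out of the
# remainder (25% of the remaining 80% = 20% of the original data)
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 120 40 40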

  • Cross-Validation: Cross-validation is a method of training and evaluating multiple models by splitting the data into multiple blocks. Major cross-validation methods include k-fold cross-validation, stratified k-fold cross-validation, and Leave-One-Out cross-validation. This allows the entirety of the data to be used effectively to evaluate the model.

Below we describe an example implementation of k-fold cross-validation using the cross_val_score function in the scikit-learn library.

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Preparing Data
X = ...  # feature vector
y = ...  # target label

# Model Preparation
model = LogisticRegression()

# Run k-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)  # cv specifies the number of divisions

# Displays the score and average score for each split
print("Scores:", scores)
print("Mean Score:", scores.mean())

In the above example, a logistic regression model is used and the data is split into five folds for cross-validation; the number of splits can be changed by adjusting the cv parameter.

Data partitioning and the choice of evaluation method are important even for small data. When sufficient data is not available, methods such as cross-validation should be used to make the most of the data and evaluate the model’s performance properly. Reproducibility can also be ensured by specifying the random_state parameter when splitting the data randomly.
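
For small and possibly imbalanced data sets, stratified k-fold cross-validation keeps the class ratio in every fold, and fixing random_state makes the split reproducible. The following is a minimal sketch; the make_classification data is a dummy stand-in.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Dummy imbalanced data for the sketch; replace with the actual data set
X, y = make_classification(n_samples=100, n_features=10, weights=[0.8, 0.2], random_state=0)

# Stratified splits preserve the class ratio in each fold; shuffle + random_state makes them reproducible
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("Scores per fold:", scores)
print("Mean Score:", scores.mean())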

The use of domain knowledge

<Overview>

In machine learning with small data, the use of domain knowledge is very important. Domain knowledge refers to expertise and experience in a particular industry or domain; incorporating it into machine learning tasks helps in understanding the characteristics and assumptions behind the data and improves model construction and interpretation. A knowledge graph can be regarded as domain knowledge expressed as data; “Various Applications and Implementations of Knowledge Graphs” describes examples of its use. See also “Knowledge Information Processing Techniques”.

The following is a description of methods for utilizing domain knowledge.

  • Feature Engineering: Domain knowledge can be used to extract useful features from a data set, for example, important indicators or patterns in data from a specific industry can be extracted and added as features. Domain knowledge can also be applied in data preprocessing and transformation to improve data quality and representation.
  • Model Selection and Parameter Tuning: Domain knowledge can be leveraged to set optimal models and hyperparameters. Model performance can be improved by selecting effective models and parameter combinations for specific domains.
  • Interpreting Data and Results: Domain knowledge can also help interpret model results and predictions. By applying domain knowledge to data sets and model outputs, it is possible to understand whether the model is extracting important features and to connect model results to business decisions.

To leverage domain knowledge, the following steps are important.

  • Collaborate with domain experts: Leveraging collaboration with domain experts to learn domain knowledge is an important component. Gaining insights and feedback from domain experts can help clarify direction for model building and feature engineering.
  • Continuous learning of domain knowledge: Because domain knowledge is constantly evolving, continuous learning is important. Therefore, it is necessary to constantly learn about the latest trends and technologies in the industry and incorporate the latest domain knowledge into the model.
  • Practice and Experience: Domain knowledge is deepened through practice and experience. Working on real-world problems allows for more practical application of domain knowledge and a deeper understanding of the properties of data and models.

In the case of small data, the use of domain knowledge is a particularly important approach. Because the number of data is limited, understanding the context and characteristics behind the data is required to build effective models, and proper use of domain knowledge can improve the performance and applicability of the models.

Examples of concrete implementations of such models are described below.

<Implementation in Python>

This section describes specific Python implementations for utilizing domain knowledge in machine learning with small data.

  • Example of feature engineering: One way to extract useful features from data by leveraging domain knowledge is to transform the data and create derived features. The following is an example of working with dog and cat image data.
import numpy as np
import cv2

def extract_features(image):
    # Leverage domain knowledge of image preprocessing and feature extraction
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    resized_image = cv2.resize(gray_image, (100, 100))
    features = np.reshape(resized_image, -1)
    return features

# Preparing Data
images = ...  # Image data
labels = ...  # class label

# Perform feature extraction
extracted_features = []
for image in images:
    features = extract_features(image)
    extracted_features.append(features)

# Create a combination of feature vectors and labels for the data
X = np.array(extracted_features)
y = np.array(labels)

In the above example, the extract_features function is used to extract features from image data. The specific image processing and feature extraction methods can be customized according to domain knowledge, and the extracted features are used as feature vectors and treated as input data for the model.

  • Example of Model Selection and Parameter Tuning: An example of model selection and hyper-parameter tuning using domain knowledge is shown below. The following is an example of a classification task using a random forest model.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Preparing Data
X = ...  # feature vector
y = ...  # class label

# Model selection and parameter tuning
model = RandomForestClassifier()
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 5, 10]}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X, y)

# Display of optimal models and parameters
best_model = grid_search.best_estimator_
best_params = grid_search.best_params_
print("Best Model:", best_model)
print("Best Parameters:", best_params)

In the above example, a grid search is performed using a random forest model. Here, the grid search is used to try multiple combinations of hyperparameters and select the best parameters. The range of parameters and selection method here can be customized based on domain knowledge.

  • Example of Interpreting Data and Results: Here is an example of how domain knowledge can be leveraged to interpret model results and predictions. The following is an example of a binary classification task using a logistic regression model.
from sklearn.linear_model import LogisticRegression

# Preparing Data
X = ...  # feature vector
y = ...  # class label

# Model Learning
model = LogisticRegression()
model.fit(X, y)

# Model Interpretation
coefficients = model.coef_
feature_names = ...  # Feature Name

# Indication of Important Characteristics
for i, coef in enumerate(coefficients[0]):
    feature_name = feature_names[i]
    print("Feature:", feature_name)
    print("Coefficient:", coef)

In the above example, a logistic regression model is trained on the data. After training, the weights (coefficients) of the model are obtained to show the importance of each feature. By using this kind of interpretation technique, we can understand which features the model focuses on and use the results to support business decisions.

On the application of machine learning in small data

There are a wide variety of applications of machine learning in small data. Some specific applications are described below.

  • Medical diagnosis: In small data medical diagnosis, machine learning is used to predict and assist in diagnosis of diseases based on patient symptoms and test results. The model is constructed by taking patient information, medical history, and other characteristics as inputs, and utilizing past medical data and the knowledge of experts.
  • Recommendation of online advertisements: With small data, the most appropriate advertisements are recommended based on the user’s behavior history, interests, and other factors. Models that predict click probability and click-through rate are built using user characteristics and past click data as input.
  • Fraud detection in the financial field: In the financial field of small data, fraudulent activities are detected based on customers’ transaction histories and behavioral patterns. Using transaction data and specific patterns as inputs, we are building models to detect anomalies and predict fraudulent behavior.
  • Credit Scoring: In small data credit scoring, credit risk is evaluated based on customer attribute information and past repayment history. Using the applicant’s information and credit data as inputs, we are building models to predict repayment ability and default risk.
  • Object detection and image recognition: In small data object detection and image recognition, objects and features are detected and classified based on images and video frames from camera and sensor data. Models for object detection and classification are built using image data and feature extraction methods as input.

These are only a few examples; in practice, small data machine learning is applied in a variety of domains. In order to cope with the features and constraints of small data, appropriate preprocessing, feature extraction, model simplification, and utilization of domain knowledge play an important role, and improving data quality and collecting additional data are also factors that contribute to successful machine learning on small data.

Reference Information and Reference Books

Machine learning with small data is discussed in detail in “Small Data Learning, Combining Logic and Machine Learning, and Local/Group Learning“. Please refer to that as well.

For reference books, see “Small Data Analysis and Machine Learning” and “Data Analytics: A Small Data Approach”.
