Considerations for Causal Inference and Strong AI

Cause and effect cannot be derived from data alone

‘The Book of Why: The New Science of Cause and Effect’ by Judea Pearl and Dana Mackenzie states at the very beginning that “data basically tell us nothing”.

With the development of machine learning in recent years, given so-called ‘big data’ we can find regularities simply by observing the data. However, if the theory and the situation behind those regularities are not understood, it remains unclear whether they are really useful, and if they are not, we do not know what to change, so a solution to the problem may never be found.

This can be understood from examples such as the following.

1. the existence of confounding factors: e.g. if the data shows that ‘incidents of drowning increase as ice cream consumption increases’, there may be a correlation between the two events, but this does not necessarily mean that ice cream consumption is the cause of the drowning.

The real driver is the higher temperature in the hot season (the confounding factor): on hot days more ice cream is consumed and more people go swimming and do water sports, so drowning incidents increase. Looking at the data alone, without taking the important factor of temperature into account, it would therefore appear that there is a causal relationship between ice cream consumption and drowning.

2. reversal of results (Simpson’s paradox): suppose there are experimental data comparing the effects of Treatment A and Treatment B, and the overall result is that Treatment A is more effective. However, when the data are analysed separately for subgroups such as age and gender, Treatment B turns out to be more effective.

Such a reversal is known as ‘Simpson’s paradox’, and in this case it indicates that judging causality from the aggregated results of the data as a whole may lead to erroneous conclusions.

3. correlation without causality (spurious correlation): Suppose, for example, that in a given year there is data showing a very high correlation between the ‘number of films starring Nicolas Cage in the US’ and the ‘number of drownings in swimming pools’. But, of course, there is no causal relationship between these two events; they are coincidences and no useful information about cause and effect can be gleaned from such data.

Some data may thus appear to be correlated by sheer chance, and this is known as ‘spurious correlation’, which refers to the appearance of a correlation despite the absence of a causal relationship.

4. problems with the method of data collection: for example, when studying the treatment of a disease in a hospital, collecting data only from hospitals with a large number of seriously ill patients may lead to an underestimation of the effectiveness of the treatment.

Thus, biased data collection methods may also yield nothing about cause and effect, e.g. if the study is biased towards a particular group, the data cannot be used to infer an overall causal relationship.

This is connected to Searle’s criticism, discussed in ‘The Turing Test, Searle’s Refutation and Artificial Intelligence’, that computational systems that merely follow algorithms are not intelligent in the first place, because computation is by definition formal symbol manipulation in which there is no understanding of meaning.

‘The Book of Why’ argues that this kind of understanding of meaning cannot be obtained merely by ‘seeing’ (observing), which is what machine learning usually does; it also requires ‘doing’, i.e. acting on the world using what has been observed, and ‘imagining’ what lies at the root of those observations. Imagining ‘why it happens’ or ‘what would happen if it did not’ is what leads to causal inference.

The act of ‘imagining’ causal relationships cannot stand up to general use unless it is expressed in a common scientific language. The book also states that, by considering models described in such a language, we can bypass the long and unproductive debate about ‘what causal inference is’ and concentrate on the specific and answerable question of ‘what causal inference can do’.

How to express causal reasoning in mathematical formulae

The usual scientific language for modelling causal reasoning is that of ‘causal models’ or ‘causal graphs’; Pearl’s causal model in particular is well known. The related concepts are listed below.

1. Causal Effect: A central goal of causal inference is to quantify how an intervention affects an outcome, e.g. the effect of the variable \(X \) on the outcome variable \(Y \) can be expressed as follows

– Outcome without intervention: \( Y = f(X, U_Y) \)
– Causal effect of intervention: \( P(Y|do(X=x)) \)

Where \( do(X=x) \) indicates an operation in which \( X=x \) is not simply observed, but rather its value is set to \( x \) by deliberately intervening in \( X \). To measure causal effects, the distribution \( P(Y|do(X=x)) \) after this intervention needs to be calculated.

The ‘intervention’ described here sits one level above the concept of ‘association’, which is obtained by observation alone: an intervention requires going beyond observation and actually making changes to the subject.

This can be illustrated by the question ‘What happens if the price of toothpaste is doubled?’. It is not enough to collect a large amount of data from previous occasions on which the price doubled and read an ‘association’ off it: those past price rises may have been caused by, say, a temporary supply shortage, whereas this time the price is being raised deliberately with no such change in market conditions. To forecast what would happen in that case, we need to build a model that includes the concept of an ‘intervention’ on the environment.

The same book also defines causality as follows: for a cause X and an effect Y, “variable X is a cause of variable Y if Y listens to X and determines its own value according to what it hears”. This act of listening is exactly what leads to considering an intervention. A small simulation illustrating the difference between observation and intervention follows.
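
The following sketch is a hypothetical toy model (not taken from the book): it generates data in which a confounder Z drives both X and Y while X itself has no effect on Y. The observed difference E[Y|X=1] - E[Y|X=0] is then clearly positive, whereas the interventional contrast E[Y|do(X=1)] - E[Y|do(X=0)] is approximately zero.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate_y(x, z, rng):
    # Structural equation for Y: it depends on the confounder Z only (hypothetical toy model)
    return 0.0 * x + 2.0 * z + rng.normal(0.0, 1.0, len(z))

# Confounder Z (e.g. "hot day") drives both X and Y
Z = rng.binomial(1, 0.5, n)
X = rng.binomial(1, 0.2 + 0.6 * Z)      # X (e.g. ice cream purchase) is pushed up by Z
Y = simulate_y(X, Z, rng)               # Y (e.g. drowning risk) does not depend on X

# Association obtained by observation: biased upwards by the confounder
observed_diff = Y[X == 1].mean() - Y[X == 0].mean()

# Intervention do(X=x): set X by hand, leave Z alone, regenerate Y from the structural equation
y_do1 = simulate_y(np.ones(n), Z, rng)
y_do0 = simulate_y(np.zeros(n), Z, rng)
interventional_diff = y_do1.mean() - y_do0.mean()

print("E[Y|X=1] - E[Y|X=0]        :", round(observed_diff, 3))        # noticeably positive
print("E[Y|do(X=1)] - E[Y|do(X=0)]:", round(interventional_diff, 3))  # approximately zero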

2. causal graphs: graphical models are often used to represent causal relationships. This allows causal relationships between variables to be represented by arrows (→).

For example, the graph \( X \rightarrow Y \) shows that the variable \( X \) has a causal effect on the variable \( Y \).
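
As a minimal concrete illustration (a hypothetical example), such a causal graph can be written down as a directed acyclic graph, for instance with the networkx library; DoWhy, used later in this article, can also take an explicitly specified causal graph as input (the accepted format depends on the version).

import networkx as nx

# A small causal graph: the confounder Z points to both X and Y, and X points to Y
causal_graph = nx.DiGraph()
causal_graph.add_edges_from([("Z", "X"), ("Z", "Y"), ("X", "Y")])

# The parents of a node correspond to its direct causes in the model
print(list(causal_graph.predecessors("Y")))          # ['Z', 'X']
print(nx.is_directed_acyclic_graph(causal_graph))    # True: a valid causal DAG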

3. Backdoor Criterion: a criterion used to adjust for confounding factors. When the relationship between \( X \) and \( Y \) is biased by a confounding factor \( Z \), the causal effect is estimated by conditioning on \( Z \).

Confounding factor here refers to an external variable that affects the relationship between the independent variable under study (cause) and the dependent variable (outcome) in statistics and causal inference.

Take again the example of a study that observes ‘more ice cream consumption goes together with more drownings’. Temperature affects both variables: on hot days more people eat ice cream and more people swim, which also increases the number of drownings. Temperature is therefore acting as a confounding factor between ‘ice cream consumption’ and ‘number of drownings’, and there is no direct causal link by which ice cream consumption increases drownings.

In order to treat confounding factors correctly, it is important to devise statistical methods and experimental designs to eliminate the influence of confounding factors and reveal the true causal relationship.

Adjusting for such a confounding factor is expressed by the following equation (the backdoor adjustment formula).

\[
P(Y | do(X)) = \sum_{Z} P(Y | X, Z) P(Z)
\]
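
When all variables are discrete, this adjustment formula can be evaluated directly from data. The following sketch (a hypothetical toy example with binary X, Y and Z) estimates P(Y=1|do(X=1)) by summing the conditional probabilities over the confounder Z and compares it with the naive conditional probability P(Y=1|X=1).

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical binary toy model: Z confounds X and Y, and X also has a genuine effect on Y
Z = rng.binomial(1, 0.5, n)
X = rng.binomial(1, 0.2 + 0.6 * Z)
Y = rng.binomial(1, 0.1 + 0.2 * X + 0.5 * Z)

df = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# Naive conditional probability P(Y=1 | X=1): biased by the confounder Z
p_naive = df.loc[df.X == 1, "Y"].mean()

# Backdoor adjustment: P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, Z=z) P(Z=z)
p_adjusted = sum(
    df.loc[(df.X == 1) & (df.Z == z), "Y"].mean() * (df.Z == z).mean()
    for z in (0, 1)
)

print("P(Y=1 | X=1)    :", round(p_naive, 3))      # inflated by confounding
print("P(Y=1 | do(X=1)):", round(p_adjusted, 3))   # close to the true value 0.3 + 0.5 * 0.5 = 0.55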

4. average treatment effect (ATE: Average Treatment Effect): the measure used to quantify the average causal effect of a treatment is the average treatment effect (ATE), defined as follows.

\[
ATE = \mathbb{E}[Y | do(X=1)] - \mathbb{E}[Y | do(X=0)]
\]

5. basic equation of causal inference: the basic equation (adjustment formula) of causal inference is as follows

\[
P(Y | do(X=x)) = \sum_{Z} P(Y | X=x, Z) P(Z)
\]

Where \( Z \) denotes the confounding factors; the causal effect is estimated via conditional probabilities adjusted for \( Z \).

Causal reasoning and strong AI

Strong AI (Artificial General Intelligence, AGI) refers to AI with broad intelligence that is not limited to a specific problem domain. AGI aims at AI that can learn and understand knowledge like humans and respond flexibly to different environments and situations; beyond mere pattern recognition and optimisation, it needs to be capable of ‘intelligent reasoning’ and ‘deliberate action selection’.

The ability to perform causal reasoning is said to be essential for a strong AI to have the same intellectual capacity as humans. The reasons for this include the following.

1. rationale for decision-making: for a strong AI to make complex decisions, it needs to understand the causal relationships between events. Humans make decisions by predicting what future consequences an action will have, and this requires causal reasoning.

When considering a strong AI making similar decisions to humans, for example, when diagnosing a disease, it is not enough to simply look at correlations based on historical patient data; it must identify the root cause of the disease and understand how this affects treatment in order to recommend an effective treatment. Causal reasoning is essential for this.

2. hypothesis generation and testing: for a strong AI to solve problems as well as humans, it needs to be able to generate hypotheses and test hypotheses. This is also part of causal reasoning: the cycle of generating a hypothesis, intervening on it and observing the results is central to causal reasoning. Human scientific thinking is based on this process and a strong AI is expected to have similar capabilities.

3. learning in unknown environments: humans can adapt to unknown environments, reasoning and acting on new information. For a strong AI to have this adaptive capacity, it needs to be able to understand cause-and-effect relationships and learn how changes in the environment affect outcomes.

For example, when an AI controls a self-driving vehicle, it must not merely process and act on information from sensors, but must also causally understand how sudden environmental changes (e.g. obstacles or weather changes) affect the behaviour of the vehicle.

4. error correction and self-improvement: causal reasoning can also help correct errors and self-improve. If a strong AI fails, the ability to identify the causes of that failure and learn how to act successfully in the future is important; this is a process of causal reasoning and adjusting future behaviour based on past experience.

Causal reasoning is expected to play a major role in the future of strong AI. Like humans, AI needs to be equipped with the ability to understand causal relationships and use them for decision-making, learning and adaptation. Current AI technologies are data-driven, correlation-based and often function like a black box; by incorporating causal reasoning, it is hoped that AI will acquire a more human-like capacity for understanding and reasoning.

Integration with machine learning techniques for AGI realisation and specific implementation examples

The application of causal reasoning expressed in scientific language to machine learning, as described above, is expected to bring us closer to achieving strong AI. In this section, we discuss some specific examples.

1. treatment effect estimation using average treatment effect (ATE): one example of a situation where the concept of causal inference is used in machine learning is in treatment effect estimation (TEE). For example, in the field of medicine, it is important to estimate the effect of a treatment on a patient, and the application of causal inference enables more accurate effect estimation based on data.

The average treatment effect (ATE: Average Treatment Effect) is a measure of the difference in outcomes between treatment and non-treatment groups and is expressed as a formula as follows

\[
ATE = \mathbb{E}[Y | do(X=1)] - \mathbb{E}[Y | do(X=0)]
\]

Where,
– \( Y \) is the outcome variable (e.g. patient’s health status),
– \( X \) is the treatment variable (e.g. whether the patient received treatment).

The application of ATE to machine learning is achieved by assuming that there is a causal relationship between the treatment variable \( X \) and the outcome variable \( Y \), which is estimated from the data. As an example, consider a machine learning model (e.g. random forest or neural network) that learns the following two conditional expectations

– \( \mathbb{E}[Y | X=1] \): expected value of the treatment group
– \( \mathbb{E}[Y | X=0] \): expected value of the non-treatment group

By taking the difference between these expected values, the treatment effect can be estimated, which allows machine learning to assess the causal impact of the intervention.
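
As a minimal sketch of this two-expectation idea (independent of the DoWhy-based example that follows, and using hypothetical simulated data), one can fit one regression model per group and average the difference of their predictions over the whole sample.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: the true treatment effect on the outcome is +2
age = rng.normal(50, 10, n)
treatment = rng.binomial(1, 0.5, n)
outcome = 2 * treatment + 0.1 * age + rng.normal(0, 1, n)
df = pd.DataFrame({"Age": age, "Treatment": treatment, "Outcome": outcome})

# Fit one model per group: estimates of E[Y | X=1, Age] and E[Y | X=0, Age]
model_treated = RandomForestRegressor(random_state=0).fit(
    df.loc[df.Treatment == 1, ["Age"]], df.loc[df.Treatment == 1, "Outcome"])
model_control = RandomForestRegressor(random_state=0).fit(
    df.loc[df.Treatment == 0, ["Age"]], df.loc[df.Treatment == 0, "Outcome"])

# Estimated treatment effect: average difference of the two predictions over all units
ate_hat = (model_treated.predict(df[["Age"]]) - model_control.predict(df[["Age"]])).mean()
print("Estimated treatment effect (two-model approach):", round(ate_hat, 3))   # close to 2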

In this section, we present a concrete implementation example of treatment effect estimation using ATE, using the Python ‘DoWhy’ library.

1. install the required libraries: first, install the following libraries.

pip install dowhy pandas numpy scikit-learn

2. example implementation of treatment effect estimation using ATE: In this example, sample data are generated and the impact of the intervention on the outcome is estimated.

import numpy as np
import pandas as pd
from dowhy import CausalModel

# Creation of data
np.random.seed(42)

# sample size
n = 1000

# Covariate (age)
age = np.random.normal(50, 10, n)

# Treatment
treatment = np.random.binomial(1, 0.5, n)

# Outcome (health status)
outcome = 2 * treatment + 0.1 * age + np.random.normal(0, 1, n)

# Creating data frames.
data = pd.DataFrame({'Treatment': treatment, 'Age': age, 'Outcome': outcome})

# Step 1: Creating a causal model
model = CausalModel(
    data=data,
    treatment='Treatment',
    outcome='Outcome',
    common_causes=['Age']
)

# Causal graph visualisation
model.view_model()

# Step 2: Identification of causal effects
identified_estimand = model.identify_effect()

# Step 3: Estimation of ATE
ate_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_matching"
)

print("Estimated average treatment effect (ATE):", ate_estimate.value)

# Performing counterfactual inference
# (note: the do-sampler interface can vary between DoWhy versions)
counterfactual_outcome = model.do(x=0, data=data)['Outcome']
data['Counterfactual_Outcome'] = counterfactual_outcome

# Display counterfactual inference results.
print(data[['Treatment', 'Outcome', 'Counterfactual_Outcome']].head())

Implementation details:

1. data preparation:

  • Covariate (Age): sample patient age from a normal distribution.
  • Treatment: sampling from a binomial distribution of whether the patient receives treatment or not.
  • Outcome: Health status is generated from an equation combining treatment, age and noise.

2. creating a causal model:

  • Use the DoWhy library to create a causal model. This model considers the relationship between treatment (treatment) and outcome (health status) and includes common cause (age).

3. identify causal effects:

  • Use the identify_effect() method to identify treatment effects.

4. estimate ATE:

  • Use the estimate_effect() method to estimate the average treatment effect (ATE) using a method based on backdoor criteria (propensity score matching). This estimate represents the average impact of the treatment on the outcome.

5. counterfactual inference:

  • Use the do() method to infer counterfactual outcomes in the absence of the intervention (treatment).

Interpretation of results:

  • Estimated average treatment effect (ATE): a number is displayed showing the average impact of the intervention on the outcome.
  • Counterfactual inference results: the actual observed outcome is compared with the outcome if the intervention had not taken place.

2. counterfactual inference: counterfactual inference is a method of estimating the outcome of a hypothetical situation that did not actually occur, e.g. answering the question ‘What would have happened if the treatment had not been given?’.

Counterfactual inference can be expressed in mathematical terms as follows.

\[
Y_x(u)
\]

Where,
– \( Y_x(u) \) represents the outcome for a unit \( u \) if it were subjected to the intervention \( X=x \) (a small sketch of how this is computed follows).
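
Before the DoWhy-based example below, the three steps by which \( Y_x(u) \) is computed in a structural causal model (abduction of the exogenous noise, action on \( X \), prediction of \( Y \)) can be sketched directly. The linear structural equation used here is hypothetical and chosen only for illustration.

# Hypothetical structural causal model:
#   Z = U_Z,   X = f_X(Z, U_X),   Y = 2*X + 0.1*Z + U_Y
def f_y(x, z, u_y):
    return 2 * x + 0.1 * z + u_y

# One observed unit u: covariate, treatment actually received, outcome actually observed
z_obs, x_obs, y_obs = 55.0, 1, 8.2

# Step 1 (abduction): recover the exogenous noise U_Y consistent with the observation
u_y = y_obs - f_y(x_obs, z_obs, 0.0)

# Step 2 (action): replace the mechanism for X by the intervention do(X=0)
x_cf = 0

# Step 3 (prediction): recompute Y with the recovered noise and the intervened value of X
y_cf = f_y(x_cf, z_obs, u_y)

print("Observed outcome Y:", y_obs)
print("Counterfactual outcome Y_{X=0}(u):", y_cf)   # 8.2 - 2 = 6.2 under this model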

In this section, a basic example implementation of counterfactual reasoning is given using the Python ‘DoWhy’ library.

1. install the required libraries:

pip install dowhy pandas scikit-learn matplotlib

2. example of a counterfactual reasoning implementation: this example simulates how the outcome (health status) changes depending on whether a certain treatment is given, and infers the counterfactual ‘What would have happened if the treatment had not been received?’.

import numpy as np
import pandas as pd
from dowhy import CausalModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Creation of data
np.random.seed(42)

# sample size
n = 1000

# Covariate (age)
age = np.random.normal(50, 10, n)

# Treatment
treatment = np.random.binomial(1, 0.5, n)

# Outcome (health status)
outcome = 2 * treatment + 0.1 * age + np.random.normal(0, 1, n)

# Creating data frames.
data = pd.DataFrame({'Treatment': treatment, 'Age': age, 'Outcome': outcome})

# Split data into training and testing for counterfactual inference.
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

# Step 1: Creating a causal model
model = CausalModel(
    data=train_data,
    treatment='Treatment',
    outcome='Outcome',
    common_causes=['Age']
)

# Causal graph visualisation
model.view_model()

# Step 2: Identification of causal effects
identified_estimand = model.identify_effect()

# Step 3: Estimation of causal effects (estimation of treatment effects)
causal_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_matching"
)

print("Estimated causal effect (mean effect of treatment):.", causal_estimate.value)

# Step 4: Performing counterfactual reasoning
# Infer, for the test data, the counterfactual outcome if the treatment had not been given
# (note: the do-sampler interface can vary between DoWhy versions)
test_data['counterfactual_outcome'] = model.do(x=0, data=test_data)['Outcome']
print("Counterfactual inference results (if no treatment):")
print(test_data[['Treatment', 'Outcome', 'counterfactual_outcome']].head())

Implementation details:

1. data generation:

  • Simulation data are generated here based on the assumption that whether or not a person has received treatment (Treatment) influences the outcome (Outcome).
  • Age, as a covariate, is a factor influencing Treatment and Outcome.

2. creating a causal model:

  • A causal model is created using the DoWhy library to model the causal relationship between Treatment (Treatment) and Outcome (Outcome).
  • The causal model takes into account the common cause, Age, so that the impact of the treatment can be correctly assessed.

3. identification of causal effects:

  • The Identification of Causal Effects step determines the estimation method for assessing the effect of a treatment based on the causal graph. Here, estimation is based on the backdoor criterion.

4. estimation of causal effects:

  • Propensity score matching (propensity_score_matching) is used to estimate the effect of the treatment. This estimates the effect of the treatment by matching the groups that received the treatment and those that did not receive the treatment so that they are under similar conditions.

5. counterfactual inference:

  • Finally, counterfactual inference (the do() function) is used to estimate the counterfactual outcome ‘what would have happened if they had not received the treatment?’. This infers the outcome of a different intervention on the actually observed data.

Running results: the run yields the following results

  • Estimated causal effects: the average impact of the treatment (treatment) on the outcome (average effect of treatment, ATE) is displayed.
  • Counterfactual inference: for each individual, the actual observed outcome (if they received the treatment) can be compared with the counterfactual outcome ‘what would have happened if they had not received the treatment’.

3. Doubly Robust Estimator: an advanced estimation method that combines machine learning and causal inference is the Doubly Robust Estimator (DRE). It combines an outcome regression model with a propensity score model, and the resulting estimate of the causal effect remains valid in the presence of confounding as long as at least one of the two models is correctly specified.

The Doubly Robust Estimator is expressed in mathematical terms as follows.

\[
ATE_{DR} = \frac{1}{N} \sum_{i=1}^{N} \left[ \hat{m}_1(Z_i) + \frac{X_i \left( Y_i - \hat{m}_1(Z_i) \right)}{\hat{e}(Z_i)} - \hat{m}_0(Z_i) - \frac{(1 - X_i) \left( Y_i - \hat{m}_0(Z_i) \right)}{1 - \hat{e}(Z_i)} \right]
\]

Where \( \hat{e}(Z) \) is the propensity score \( P(X=1 \mid Z) \), \( \hat{m}_1(Z) \) and \( \hat{m}_0(Z) \) are outcome regression models for the treated and untreated groups, \( X_i \) is the treatment indicator and \( Y_i \) is the observed outcome.

Propensity scores use machine learning models (e.g. logistic regression or random forests) to predict the probability of receiving a treatment and adjust for confounding factors to reduce bias in causal effects.

An example implementation is given below.

Install the required libraries:

pip install pandas numpy statsmodels scikit-learn

Implementation example:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression

# Creation of sample data
np.random.seed(42)

# sample size
n = 1000

# Treatment variable
X = np.random.binomial(1, 0.5, n)

# Covariate
Z = np.random.normal(0, 1, n)

# Outcome variable (Outcome)
Y = 2*X + Z + np.random.normal(0, 1, n)

# Creating data frames.
data = pd.DataFrame({'Treatment': X, 'Covariate': Z, 'Outcome': Y})

# Step 1: Propensity Score Estimation
# Logistic regression of the Treatment on the covariate (Z)
propensity_model = LogisticRegression()
propensity_model.fit(data[['Covariate']], data['Treatment'])

# Propensity score e(Z) = P(Treatment = 1 | Z)
propensity_score = propensity_model.predict_proba(data[['Covariate']])[:, 1]

# Step 2: Outcome Regression Model
# Linear regression of the Outcome on the Treatment and the covariate (Z)
outcome_model = LinearRegression()
outcome_model.fit(data[['Treatment', 'Covariate']], data['Outcome'])

# Predicted outcome of every unit under Treatment = 1 and under Treatment = 0
mu1 = outcome_model.predict(data[['Treatment', 'Covariate']].assign(Treatment=1))
mu0 = outcome_model.predict(data[['Treatment', 'Covariate']].assign(Treatment=0))

# Step 3: Calculation of the Doubly Robust (AIPW) Estimator
T = data['Treatment'].values
Y_obs = data['Outcome'].values

# Doubly robust estimates of E[Y(1)] and E[Y(0)], combining the outcome model and the propensity score
dr1 = mu1 + T * (Y_obs - mu1) / propensity_score
dr0 = mu0 + (1 - T) * (Y_obs - mu0) / (1 - propensity_score)

# Average Treatment Effect (ATE)
ate_dr = np.mean(dr1 - dr0)

print("Average Treatment Effects (ATE) with Doubly Robust Estimator:", ate_dr)

Implementation overview:

Step 1: Estimation of propensity score

  • Using LogisticRegression, regress the treatment (Treatment) on the covariate Z and estimate the propensity score (the probability of being given the treatment).

Step 2: Outcome regression model

  • Using LinearRegression, create a regression model that predicts the outcome (Outcome) from the treatment (Treatment) and the covariate (Covariate), and use it to predict each unit’s outcome under treatment and under no treatment.

Step 3: Calculate the Doubly Robust Estimator

  • According to the Doubly Robust Estimator formula, combine both the outcome regression model and the propensity score to estimate the treatment effect.

Interpretation of results:

The code uses the Doubly Robust Estimator to estimate the Average Treatment Effect (ATE); ate_dr holds the resulting estimate of the average effect of the treatment.

4. causal reinforcement learning: some methods incorporate causal reasoning into reinforcement learning. Reinforcement learning is a framework in which agents interact with their environment and learn behaviours that maximise rewards, but the addition of causal reasoning allows for more adaptive learning.

Adding causal reasoning to reinforcement learning allows agents to reason as follows

\[
P(Y | do(A=a))
\]

Where,
– \( A \) is the agent’s action,
– \( Y \) is the reward or outcome.

This allows agents to infer how their behaviour causally influences future rewards and learn more optimal strategies.

A concrete example of causal reinforcement learning (CRL) is the introduction of causal inference concepts into traditional reinforcement learning algorithms (e.g. Q-learning, DQN, PPO) so that the causal relationships between actions and outcomes in the environment are learned. Here, as a simple implementation example, we describe a combination of Q-learning with causal reasoning using the ‘DoWhy’ library.

1. install the required libraries:

pip install gym numpy pandas scikit-learn dowhy

2. a simple causal Q-learning implementation example: in this example, a simple ‘Frozen Lake’ environment is built using the gym library, and reinforcement learning takes place as the agent learns the causal relationship between actions (interventions) and outcomes.

import gym
import numpy as np
import pandas as pd
from dowhy import CausalModel
from sklearn.linear_model import LogisticRegression

# Building the Frozen Lake environment
# (this example assumes the classic Gym API (gym<0.26); newer versions return
#  (observation, info) from reset() and five values from step())
env = gym.make('FrozenLake-v1')

# Parameters of Q-learning
alpha = 0.1  # learning rate
gamma = 0.99  # discount rate
epsilon = 0.1  # Search parameters for ε-greedy
num_episodes = 1000
num_actions = env.action_space.n
num_states = env.observation_space.n

# Q Table initialisation
Q = np.zeros((num_states, num_actions))

# Collect the data needed to create a causal model
data = []

def choose_action(state, epsilon):
    if np.random.rand() < epsilon:
        return env.action_space.sample()  # random behaviour
    else:
        return np.argmax(Q[state])  # Actions based on Q-values

# Performing learning
for episode in range(num_episodes):
    state = env.reset()
    done = False

    while not done:
        action = choose_action(state, epsilon)
        next_state, reward, done, _ = env.step(action)

        # Q-value updates (traditional Q-learning)
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])

        # Stores data for causal models (state, action, next state, reward)
        data.append([state, action, next_state, reward])

        state = next_state

# Converting data to DataFrame.
df = pd.DataFrame(data, columns=['State', 'Action', 'NextState', 'Reward'])

# Step 1: Creating a causal model
causal_model = CausalModel(
    data=df,
    treatment='Action',
    outcome='Reward',
    common_causes=['State', 'NextState']
)

# Drawing causal graphs
causal_model.view_model()

# Step 2: Identifying causal inferences
identified_estimand = causal_model.identify_effect()

# Step 3: Estimation of causal effects
# (note: propensity-score-based estimators assume a binary treatment; with the four
#  discrete FrozenLake actions, the action may need to be binarised or another
#  estimation method chosen)
causal_estimate = causal_model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_matching"
)

# Result of causal inference
print("Causal Estimate:", causal_estimate.value)

Implementation overview:

1. environment setup: set up a ‘Frozen Lake’ environment using the gym library. In this environment, the agent slides on the ice towards the goal, but the success of the action is stochastic.

2. implementation of Q-learning: Q-learning learns an optimal policy through interaction with the environment, updating the Q-value of each state-action pair. Q-values are updated when the agent moves to the next state and obtains a reward.

3. collecting causal inference data: at each step of reinforcement learning, state, action, next state and reward are recorded as data. This data is later used to build the causal model.

4. causal inference using DoWhy:

  • Creating a causal model: use the DoWhy library to create a causal model to estimate the impact of an action (intervention) on a reward. The model analyses the causal relationship between actions and outcomes, while taking into account common causes (states and next states).
  • Displaying the causal graph: visualise the causal graph to see the relationship between the intervention and the outcome.
  • Estimate causal effects: estimate the causal effects of the behaviour on rewards using methods such as propensity score matching.

Reference information and reference books

This section describes reference information and reference books on causal inference and machine learning.

1. reference books:

1.1 “The Book of Why: The New Science of Cause and Effect” by Judea Pearl and Dana Mackenzie

1.2 “Causal Inference in Statistics: A Primer” by Judea Pearl, Madelyn Glymour, Nicholas P. Jewell

1.3 “Causal Inference: What If” by Miguel Hernán and James Robins

1.4 “Elements of Causal Inference: Foundations and Learning Algorithms” by Jonas Peters, Dominik Janzing, Bernhard Schölkopf

1.5 “Counterfactuals and Causal Inference: Methods and Principles for Social Research” by Stephen L. Morgan, Christopher Winship

2. online reference:

2.1 Judea Pearl’s Homepage

2.2 Causal Inference with Python and R (GitHub)

2.4 DAGitty

2.5 Tutorial on Causal Inference and its Connections to Machine Learning (Using DoWhy+EconML)

3. research papers:

3.1 “Counterfactual Learning of Continuous Stochastic Policies” by Thomas et al.

3.2 “A Survey on Causal Inference”
