How does the brain see the world?


As discussed in ‘Hierarchical Temporal Memory (HTM) and Clojure’, Jeff Hawkins’s ‘Thousand Brains’ theory, described in “A Thousand Brains: A New Theory of Intelligence”, is based on the HTM concept and on the cortical column, a component unit of the neocortex: many cortical columns each learn their own model of the world, and perception emerges from those models reaching a consensus.

The question of how the brain sees the world has long been explored not only by Hawkins but also in neuroscience, psychology and philosophy, and it offers insight into how the world we perceive, interpret and are aware of is produced by the workings of the brain. Here we discuss some of these perspectives.

1. integration of sensory inputs:
– The brain receives vast amounts of information through sensory organs such as the eyes, ears and skin. However, information from each sensory organ is processed in different areas of the brain and is eventually integrated to create a single ‘perceptual experience’.
– Multisensory integration: the visual, auditory and somatosensory areas of the cortex each process different sensory information, and these streams are combined in association areas such as the frontal lobes, where they are perceived by us as a coherent sensory experience.

2. predictive coding model:
– The brain predicts the future: the brain does not just passively accept sensory input, but is thought to generate predictions from past experiences and compare them with the sensory information received. For example, the brain can immediately ‘interpret’ what it sees with its eyes because it predicts visual information based on past memories and experiences and compares it with the actual visual data.
– Error minimisation: when errors occur between predictions and actual sensory input, the brain works to correct the ‘error’. In this way, the brain adjusts its next prediction, constantly generating cognitive experiences that are adapted to reality.

3. the role of consciousness and attention:
– Information sorting by consciousness: the brain does not process all information in its surroundings equally, but consciously processes only those to which it has directed its attention. For example, if you are in a crowded place and you hear your friend’s voice, your brain focuses its ‘attention’ on that voice and filters out other sounds.
– Hierarchy of consciousness: there is a hierarchy of conscious perception, moving from unconscious processing to highly conscious thought, through which the brain interprets the world and guides our behaviour. The interaction between automatically processed visual information and selectively conscious information constitutes our ‘sense of reality’.

4. memory and cognitive biases:
– The influence of memory on reality perception: past experiences and memories have a significant impact on the world as seen by the brain. This can cause people to feel and see the same view differently depending on their experiences.
– Cognitive biases: the world we see is not entirely objective and is distorted by various cognitive biases. For example, ‘negativity bias’, where negative information is more likely to be remembered, and ‘confirmation bias’, where we focus on information that fits our experiences and beliefs, affect how our brain perceives the world.

5. influence of social and cultural factors:
– Social cognition: the brain also has the social ability to read the emotions and intentions of others, which enables humans to empathise and cooperate. Relationships with others and cultural context also have a significant influence on how visual information is interpreted.
– Cultural and linguistic frameworks: the language and culture we have shape how our brain ‘sees the world’. Different cultures have different values and ways of thinking ingrained in them, which also influence the brain’s interpretation and perception of the information it receives.

Examples of applications in AI

Let us consider whether these perspectives can be realised with AI. Concrete examples of mimicking ‘how the brain sees the world’ relate mainly to cognitive models and simulations of the visual system, and from this perspective we look at implementing the brain’s perceptual processes and perception of the world using AI technology.

1. implementation of predictive coding models: predictive coding is the idea that the brain predicts sensory information and minimises the gap with the actual sensory input, and some models in AI incorporate this approach.

Example implementation: the Variational Autoencoder (VAE), described in ‘Overview of Variational Autoencoder (VAE), its algorithms and implementation examples’, generates or reconstructs new data by learning the latent space of the input data. A predictive-coding-style model uses a VAE to learn to ‘predict the next data and minimise the difference (error) from the actual data’. Such a model is used to recognise visual data and mimics how the brain processes and predicts information.

Implementation tools: VAE implementations in TensorFlow or PyTorch; prediction of time-series data (e.g. video and sensor data) using recurrent neural networks (RNNs), described in ‘Overview, algorithms and implementation examples of RNNs’, and LSTMs, described in ‘Overview, algorithms and implementation examples of LSTMs’.

Applications: when AI receives visual information, it makes predictions based on past experience and recognises objects while correcting errors. For example, in face recognition and object detection, the model predicts the features expected to appear next and improves recognition accuracy on that basis.

2. self-improvement model using reinforcement learning: reinforcement learning, described in ‘Overview of reinforcement learning techniques and various implementations’, is an algorithm in which an agent interacts with its environment and learns behaviours that maximise reward; in this process, the AI improves itself by repeating the cycle of ‘prediction’ → ‘execution’ → ‘evaluation of results’.

Example implementation: DQN, described in ‘Overview of Deep Q-Network (DQN), algorithms and implementation examples’, is a type of reinforcement learning that uses deep learning to determine an agent’s behaviour: the AI learns which action to take next based on feedback from the environment and optimises that process. Accuracy is improved by evaluating the gap between prediction and execution and repeating the learning process.

Implementation tool: OpenAI Gym (simulation environment). Reinforcement learning models in TensorFlow, PyTorch (DQN, A3C, PPO).

Applications: when an automated vehicle decides on the best behaviour on the road, it learns by minimising the difference between predicted and actual movements, based on past driving data (experience). The AI agent also optimises the choice of action in the game and evolves through experience. In a game against a player, it predicts and executes strategies and modifies its behaviour based on rewards.

3. error-driven learning (error minimisation): just as the brain minimises the difference between its predictions and actual sensory input, AI methods can be implemented to minimise the difference (error) between predictions and actual inputs. Backpropagation and self-supervised learning are used to achieve this.

Example implementation: self-supervised learning allows AI to learn from data that is not pre-labelled. For example, parts of an image can be hidden, the model made to predict them, and the hidden parts filled in so as to minimise the error and learn more accurate recognition (a minimal sketch of this idea follows this item).

Implementation tools: implementation of self-supervised learning algorithms in TensorFlow, PyTorch (e.g. SimCLR, BYOL).

Applications: by minimising prediction errors in the same way the brain does when processing sensory information, AI improves its ability to detect anomalies. Examples include systems that predict machine failures in manufacturing or identify abnormal patterns in medical data.
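
As a rough sketch of the masked-prediction idea above (not an implementation of SimCLR or BYOL themselves), the following PyTorch snippet hides part of each input vector and trains a small network to fill in the hidden entries, minimising the error only on the masked positions; the network size, input dimension and masking ratio are illustrative assumptions.

import torch
import torch.nn as nn

# Minimal masked-prediction (self-supervised) sketch: hide part of the input,
# predict the hidden part, and minimise the error on the masked positions only.
torch.manual_seed(0)
dim = 32                                      # illustrative input dimension
model = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(16, dim)                  # stand-in for unlabelled data
    mask = torch.rand(16, dim) < 0.25         # randomly hide ~25% of entries
    x_masked = x.masked_fill(mask, 0.0)       # zero out the hidden entries
    pred = model(x_masked)
    loss = ((pred - x)[mask] ** 2).mean()     # error only on the hidden entries
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()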

4. implementation of multisensory integration models: the brain perceives the world by integrating information from multiple senses, such as vision, hearing and touch; in AI, such multisensory integration can be mimicked to build perceptual models closer to reality.

Example implementation: consider a model that integrates and processes different data sources such as visual (image) and auditory (sound) information. For example, by integrating speech and image recognition and building a system to analyse video data, a more accurate understanding can be achieved.

Implementation tools: integration of multimodal data in TensorFlow, PyTorch (e.g. integrated speech and video recognition).

Applications: a system in which a robot understands and reacts to its surroundings by combining voice instructions and visual cues. For example, a robot receives voice instructions and operates while checking visual objects.

Implementation examples

Concrete implementations of the above models are described below. These are examples of practical code and systems built to simulate the brain’s processes of perception and recognition using AI technology.

1. implementation of predictive coding models

Example implementation: variational autoencoder (VAE) and predictive coding

Based on the predictive coding process of the brain, VAE is used to predict the next data and minimise the difference between it and the actual data.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import torch.nn.functional as F

# Define the Variational Autoencoder
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, 20)  # mean
        self.fc22 = nn.Linear(400, 20)  # log variance
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5*logvar)
        eps = torch.randn_like(std)
        return mu + eps*std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 784))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

# Training loop for VAE
model = VAE()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Example data loading (MNIST dataset)
from torchvision import datasets, transforms
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

def train(epoch):
    model.train()
    train_loss = 0
    for batch_idx, (data, _) in enumerate(train_loader):
        data = Variable(data)
        optimizer.zero_grad()
        recon_batch, mu, logvar = model(data)
        loss = loss_function(recon_batch, data, mu, logvar)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()
    print('Train Epoch: {} \tAverage loss: {:.6f}'.format(
          epoch, train_loss / len(train_loader.dataset)))

# Loss function for VAE (Reconstruction + KL Divergence)
def loss_function(recon_x, x, mu, logvar):
    BCE = F.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
    # KL divergence between the learned posterior Q = N(mu, sigma^2) and the
    # standard Gaussian prior P = N(0, I):
    #   D_KL(Q || P) = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    # This term regularises the latent space to follow the prior, so the final
    # loss combines the reconstruction loss (BCE) with the KL loss.
    # More info: https://arxiv.org/abs/1312.6114
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD

# Example training
for epoch in range(1, 11):
    train(epoch)

This code uses a variational autoencoder (VAE) to reconstruct images: in the spirit of predictive coding, it learns to predict the data from the latent space and to minimise the error (loss) with respect to the actual data. It can be used for image recognition and reconstruction to simulate how the brain makes predictions.
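
As a follow-up sketch (assuming the model and train_loader defined above), the per-sample reconstruction error of the trained VAE can be read as a ‘prediction error’ signal, which is one simple way to flag inputs the model finds surprising:

# Hypothetical usage after training: treat the reconstruction error of each
# image as a prediction-error signal (higher = more "surprising" input).
model.eval()
with torch.no_grad():
    data, _ = next(iter(train_loader))
    recon, mu, logvar = model(data)
    per_sample_error = F.binary_cross_entropy(
        recon, data.view(-1, 784), reduction='none').sum(dim=1)
    print(per_sample_error[:5])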

2. self-improving systems using reinforcement learning

Example implementation: game learning with Deep Q-Network (DQN)

Use reinforcement learning to implement the process of an agent interacting with and learning from its environment. The process is similar to the way the brain improves based on actions and results.

import gym
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from collections import deque
import random

# Define the DQN Model
class DQN(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, output_dim)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

# Set up the environment (e.g., CartPole)
env = gym.make('CartPole-v1')

# Hyperparameters
state_dim = env.observation_space.shape[0]
action_dim = env.action_space.n
learning_rate = 0.001
gamma = 0.99
epsilon = 0.1
batch_size = 64
buffer_size = 10000
target_update_freq = 10

# Initialize the Q-network and target network
q_network = DQN(state_dim, action_dim)
target_network = DQN(state_dim, action_dim)
target_network.load_state_dict(q_network.state_dict())

optimizer = optim.Adam(q_network.parameters(), lr=learning_rate)
memory = deque(maxlen=buffer_size)

# Epsilon-Greedy policy for exploration
def epsilon_greedy_policy(state):
    if random.random() < epsilon:
        return env.action_space.sample()  # Random action
    else:
        with torch.no_grad():
            return torch.argmax(q_network(torch.tensor(state, dtype=torch.float32))).item()

# Training loop
for episode in range(500):
    state = env.reset()
    done = False
    total_reward = 0
    while not done:
        action = epsilon_greedy_policy(state)
        next_state, reward, done, _ = env.step(action)
        memory.append((state, action, reward, next_state, done))
        if len(memory) > batch_size:
            batch = random.sample(memory, batch_size)
            states, actions, rewards, next_states, dones = zip(*batch)

            # Convert data to tensor
            states = torch.tensor(states, dtype=torch.float32)
            actions = torch.tensor(actions, dtype=torch.int64)
            rewards = torch.tensor(rewards, dtype=torch.float32)
            next_states = torch.tensor(next_states, dtype=torch.float32)
            dones = torch.tensor(dones, dtype=torch.uint8)

            # Compute the Q values
            q_values = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            next_q_values = target_network(next_states).max(1)[0]
            target = rewards + gamma * next_q_values * (1 - dones)

            # Compute loss
            loss = torch.mean((q_values - target.detach()) ** 2)

            # Optimize the Q-network
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        state = next_state
        total_reward += reward

    # Update the target network
    if episode % target_update_freq == 0:
        target_network.load_state_dict(q_network.state_dict())

    print(f"Episode {episode}, Total Reward: {total_reward}")

The code uses Deep Q-Network (DQN) to optimise the agent’s behaviour in the environment through reinforcement learning, where the AI chooses its next action based on past experience in order to maximise the reward. This mimics the process by which the brain learns from experience and predicts the next best action.
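
As a follow-up sketch (assuming the q_network trained above, the same CartPole environment and the classic Gym API used in the code), the learned policy can be evaluated by acting greedily, without exploration:

# Hypothetical evaluation of the trained policy: always pick the highest-value
# action (no epsilon-greedy exploration).
state = env.reset()
done = False
eval_reward = 0
while not done:
    with torch.no_grad():
        action = torch.argmax(q_network(torch.tensor(state, dtype=torch.float32))).item()
    state, reward, done, _ = env.step(action)
    eval_reward += reward
print(f"Greedy evaluation reward: {eval_reward}")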

3. recognition through multimodal integration

Implementation example: recognition by integration of speech and visual information

AI systems that integrate speech and image recognition are used to build AI models that combine visual and auditory information to recognise the world.

import torch
import torch.nn as nn
import torchvision.models as models
import torchaudio
from torchaudio.transforms import MelSpectrogram

# Vision model (ResNet18)
vision_model = models.resnet18(pretrained=True)
vision_model.fc = nn.Linear(vision_model.fc.in_features, 100)  # Output layer for our case

# Audio model
class AudioModel(nn.Module):
    def __init__(self):
        super(AudioModel, self).__init__()
        self.mel_spec = MelSpectrogram(n_mels=64)  # 64 mel bins to match the LSTM input size
        self.lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
        self.fc = nn.Linear(128, 100)  # Output layer

    def forward(self, x):
        x = self.mel_spec(x)       # (batch, n_mels, time)
        x = x.transpose(1, 2)      # (batch, time, n_mels) for the batch_first LSTM
        _, (hn, _) = self.lstm(x)
        return self.fc(hn[-1])

# Example inputs (image and audio)
image = torch.randn(1, 3, 224, 224)  # Random image (batch_size, channels, height, width)
audio = torch.randn(1, 16000)  # Random audio signal (batch_size, samples)

# Forward pass for vision and audio
vision_output = vision_model(image)
audio_model = AudioModel()
audio_output = audio_model(audio)

# Combine both outputs (e.g., for joint classification task)
combined_output = vision_output + audio_output

This example shows a model that performs image and speech recognition separately and integrates the results to obtain the final recognition output. Just as the brain integrates visual and auditory information to recognise the world, this system mimics that mechanism.
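
Simply adding the two output vectors, as above, is the crudest form of fusion. A slightly more common pattern (sketched below, keeping the assumed 100-dimensional outputs and an illustrative 10-class head) is to concatenate the modality features and learn a joint classification layer on top:

# Hypothetical fusion head: concatenate vision and audio features and
# classify them jointly (dimensions follow the example above).
fusion_head = nn.Sequential(
    nn.Linear(100 + 100, 128),
    nn.ReLU(),
    nn.Linear(128, 10)  # e.g. 10 joint classes (illustrative)
)
combined_features = torch.cat([vision_output, audio_output], dim=1)
logits = fusion_head(combined_features)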

Reference books

Reference books are as follows.

1. How Whole Brain Thinking Can Save the Future: Why Left Hemisphere Dominance Has Brought Humanity to the Brink of Disaster and How We Can Think Our Way to Peace and Healing

2. Brain Responses to Auditory Mismatch and Novelty Detection: Predictive Coding from Cocktail Parties to Auditory-Related Disorders

3. Neuromorphic Engineering – A Modern Approach: Unveiling The Principles And Applications Of Brain-inspired Systems In The Technological Frontier

4. Deep Learning

5. The Brain’s Representational Power: On Consciousness and the Integration of Modalities

6. The Age of Em: Work, Love, and Life when Robots Rule the Earth

7. The Computer and the Brain

8. Fundamental Neuroscience
