Overview of database technology, CRUD and ACID, and CAP

About Database Technology

As the wiki explains, “A database is a collection of information organized for easy retrieval and storage, usually realized by a computer.” With a database, the data structures handled by a program and the data itself can be manipulated with less effort than with a proprietary implementation. It is the most important technology in modern information systems that handle huge amounts of data [source].

The advantage of using a database is that you can rely on a general-purpose data structure rather than implementing your own in a program, and that you get a system that can ensure data consistency (data backup, etc.), which I will explain later.

In this blog, I will discuss the following about databases.

Implementation

Overview of Database Technology and Examples of Implementation in Various Languages

Database technology refers to technology for efficiently managing, storing, retrieving, and processing data, and is intended to support data persistence and manipulation in information systems and applications, and to ensure data accuracy, consistency, availability, and security.

The following sections describe implementations in various languages for actually handling these databases.

Vector Database Overview

A vector database is a type of database that primarily stores vector data and allows queries, searches, and other operations to be performed in vector space. A number of vector database vendors have emerged, a trend particularly influenced by the rise of ChatGPT: vector databases can be used in a configuration called RAG (retrieval-augmented generation) to compensate for weaknesses of ChatGPT, such as handling the latest news and unpublished information. Vector databases are designed to search for data based on vector similarity and to retrieve relevant data efficiently. Some use algorithms such as k-NN (k-nearest neighbors) to search high-dimensional data, along with techniques such as quantization and partitioning to optimize retrieval performance.
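
As a rough illustration of similarity search, the following minimal sketch ranks stored vectors by cosine similarity to a query vector and returns the k nearest. It is a brute-force scan written for clarity, not how any particular vector database is implemented; the document ids and the 3-dimensional vectors are made-up assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_search(index, query, k=2):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(vec, query)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional "embeddings"; real ones usually have far more dimensions.
index = {
    "doc_sports": [0.9, 0.1, 0.0],
    "doc_finance": [0.1, 0.9, 0.1],
    "doc_football": [0.8, 0.2, 0.1],
}

top = knn_search(index, query=[1.0, 0.0, 0.0], k=2)
print([doc_id for doc_id, _ in top])
```

A real system would avoid scanning every vector, which is where the quantization and partitioning techniques mentioned above come in.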

Elasticsearch Plug-ins and Implementation Examples

Elasticsearch is an open source distributed search engine that provides many features to enable fast text search and data analysis. Various plug-ins are also available to extend the functionality of Elasticsearch. This section describes these plug-ins and their specific implementations.

RDF store and SPARQL About NoSQL DB(Graph DB) RDF store overview

In this article, we will discuss the RDF store, which is a database for handling RDF data, and SPARQL, which is a query system for extracting data from the RDF store.

An RDF store is also known as a triplestore. It is a graph-type database belonging to the non-RDBMS family called NoSQL databases. In addition to RDF stores, there are other types of graph databases such as Neo4j and Datomic, which are composed of “nodes”, “edges”, and “properties” without RDF, and each has its own query language (Cypher and an extended Datalog query language, respectively). An RDF store is one that supports SPARQL, a query language that conforms to the RDF data structure.
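
To make the idea of a triple concrete, here is a minimal sketch of an in-memory list of (subject, predicate, object) triples with wildcard pattern matching. It only imitates the spirit of a SPARQL triple pattern; the data and the `match` helper are hypothetical and unrelated to any real RDF store's API.

```python
# Each fact is a (subject, predicate, object) triple; the names are hypothetical.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
]

def match(pattern, store):
    """Return triples matching the pattern; None acts as a wildcard, like a query variable."""
    return [
        triple for triple in store
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]

# Roughly what the SPARQL query  SELECT ?o WHERE { :alice :knows ?o }  asks for.
known_by_alice = [obj for _, _, obj in match(("alice", "knows", None), triples)]
print(known_by_alice)
```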

Specific RDF stores are provided by existing DB vendors such as Oracle and IBM, by the Apache project as open source in the form of Jena (and its query module ARQ), and by individual vendors in the form of BlazeGraph and AllegroGraph. As for cloud service support, Neptune of AWS also supports SPARQL.

Redis NoSQL DB(K-V DB) Overview and basic use

Redis is an in-memory remote database characterized by high performance, replication, and a unique data model that supports five different data structures and can adapt to a wide variety of problems that naturally map to Redis features. With features such as replication, persistence, and client-side sharding, Redis is highly scalable and can be used in a variety of applications, from a convenient means of prototyping to a full-blown database system handling hundreds of gigabytes of data and millions of requests per second.

Memcached, a key-value cache server, provides similar functionality to Redis. Redis can store key-to-value mappings just as memcached does, but the major differences are its ability to automatically write data to disk in two different ways (snapshot (instant dump) and append-only write) and its ability to store four types of data structures other than plain strings.

Redis can be used as a primary database or as an auxiliary system for other storage systems. You can store data in Redis only when you need its performance and functions, and use other databases when you can tolerate slightly lower performance or when the data is too large to be economically stored in memory.

Web server and DB integration in Clojure Postgresql and the Server
Clojure and Redis Using Redis in Clojure

Technical Topics

Database Technology

Database technology is a technology for organizing, managing, manipulating, and storing data. Databases are intended to store relevant data in a consistent manner and provide rapid access to it as needed. This section describes the basic elements of this database technology and the algorithms used.

Database Algorithms(1) Database Consistency

One of the biggest differences between a database and other information storage methods is that the information in the database has a predefined structure. Another characteristic is “consistency”.

Consistency means that the pieces of information contained in the database do not contradict each other even as data is moved in and out of the database. This article describes three techniques for achieving these characteristics: the first is “write-ahead logging”, the second is “two-phase commit”, and the last is the “relational database”.
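
The first of these techniques can be sketched in a few lines. The toy store below records each intended change in a log before applying it, so that a restart can replay the log and recover changes that were logged but not yet applied. The `TinyStore` class is a hypothetical illustration, and the log is a plain Python list standing in for a durable file.

```python
import json

class TinyStore:
    """Toy key-value store with a write-ahead log kept in a Python list."""

    def __init__(self, log=None):
        self.log = log if log is not None else []  # stands in for a durable log file
        self.data = {}
        self.replay()

    def put(self, key, value, crash_before_apply=False):
        # 1. Record the intended change in the log first.
        self.log.append(json.dumps({"key": key, "value": value}))
        if crash_before_apply:
            return  # simulate a crash after logging but before applying
        # 2. Only then apply it to the actual data.
        self.data[key] = value

    def replay(self):
        # Recovery: re-applying every logged change is safe because setting
        # a key to the same value twice has no extra effect.
        for line in self.log:
            record = json.loads(line)
            self.data[record["key"]] = record["value"]

store = TinyStore()
store.put("balance", 100)
store.put("balance", 80, crash_before_apply=True)  # logged, but never applied in memory

recovered = TinyStore(log=store.log)  # restart from the surviving log
print(store.data, recovered.data)
```

After the simulated crash, the original store still holds the old value, while the store rebuilt from the log holds the logged one.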

Database Algorithms(2) About relational databases

A database is data with a structure, the simplest of which is a table structure. Sometimes it is more efficient to divide the table structure into multiple parts instead of just one.

This simple structure saves the memory of the computer and also has the advantage that when changing the data, all the repeated information (class information) can be changed at once instead of changing it one by one.

The data that is shared by the two split tables (for example, a class number) is called the key. The database uses this key to look up the related data, and the index structure commonly used to make that lookup fast is called a “B-tree” in computer science.
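
The split-table idea can be tried out with Python's built-in `sqlite3` module. In this sketch the table names, column names, and rows are assumptions made up for illustration: class information lives in one table, students in another, and the shared `class_no` column is the key that links them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classes  (class_no INTEGER PRIMARY KEY, room TEXT);
    CREATE TABLE students (name TEXT, class_no INTEGER REFERENCES classes(class_no));
    INSERT INTO classes  VALUES (1, 'Room A'), (2, 'Room B');
    INSERT INTO students VALUES ('Aki', 1), ('Ben', 1), ('Chie', 2);
""")

# The room is stored once per class, so one UPDATE changes it for every student.
conn.execute("UPDATE classes SET room = 'Room C' WHERE class_no = 1")

# The key (class_no) joins the two tables back together.
rows = conn.execute("""
    SELECT s.name, c.room
    FROM students AS s JOIN classes AS c ON s.class_no = c.class_no
    ORDER BY s.name
""").fetchall()
print(rows)
```

Both students in class 1 see the new room even though only a single row was changed.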

RDBMS and SQL About SQL

The chapter on PostgreSQL in “Seven Databases in Seven Weeks” is divided into three parts: the first covers basic schema definition, data insertion, row updates and deletion, and basic reads; the second covers SQL in detail; and the third covers multi-dimensional full-text search using SQL.

SQL consists of three main areas of language. The first is the Data Definition Language, which defines the data schema and creates tables, including CREATE (creating tables), ALTER (changing table settings), DROP (deleting tables), and TRUNCATE (deleting table data). The second is the Data Manipulation Language that manipulates the data in the created schema (table), such as SELECT (data retrieval), INSERT (data addition), UPDATE (data update), and DELETE (data deletion). The third is the Data Control Language, which is the overall control language, and includes GRANT and REVOKE.
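
The first two areas can be seen side by side in a short sketch using Python's built-in `sqlite3` module (the `books` table and its rows are made up for illustration). SQLite is used here only because it needs no server; it has no TRUNCATE, GRANT, or REVOKE statements, so those commands are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data Definition Language: define and change the schema.
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("ALTER TABLE books ADD COLUMN year INTEGER")

# Data Manipulation Language: work with the rows.
conn.execute("INSERT INTO books (title, year) VALUES (?, ?)", ("Book A", 2012))
conn.execute("INSERT INTO books (title, year) VALUES (?, ?)", ("Book B", 2018))
conn.execute("UPDATE books SET year = 2019 WHERE title = ?", ("Book B",))
conn.execute("DELETE FROM books WHERE title = ?", ("Book A",))
remaining = conn.execute("SELECT title, year FROM books").fetchall()
print(remaining)

# Back to DDL: remove the table itself.
conn.execute("DROP TABLE books")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```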

Redis Data Structures and Concepts (external link)

Redis is not a plain key-value store. In effect, it is a data structures server that supports different types of values. That is, whereas a traditional key-value store associates a key with a string value, Redis does not limit the value to a simple string and can store more complex data structures. The linked tutorial lists all the data structures supported in Redis and discusses each of them.
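
The difference from a plain string-only store can be illustrated with a toy in-memory class. This is a hypothetical stand-in written for this page, not the Redis client API and not how Redis is implemented; the method names merely echo Redis command names, and the five value types shown (string, list, set, hash, sorted set) are assumed to be the ones meant above.

```python
class TinyStructureStore:
    """Toy in-memory stand-in for a data structures server; NOT a Redis client."""

    def __init__(self):
        self.keys = {}

    def set(self, key, value):           # string value
        self.keys[key] = str(value)

    def rpush(self, key, item):          # list: append to the right
        self.keys.setdefault(key, []).append(item)

    def sadd(self, key, member):         # set: duplicates are ignored
        self.keys.setdefault(key, set()).add(member)

    def hset(self, key, field, value):   # hash: field -> value
        self.keys.setdefault(key, {})[field] = value

    def zadd(self, key, score, member):  # sorted set: members ordered by score
        self.keys.setdefault(key, {})[member] = score

    def zrange(self, key):
        members = self.keys.get(key, {})
        return sorted(members, key=lambda m: (members[m], m))

db = TinyStructureStore()
db.set("page:title", "Databases")
db.rpush("queue", "job1")
db.rpush("queue", "job2")
db.sadd("tags", "nosql")
db.sadd("tags", "nosql")
db.hset("user:1", "name", "Aki")
db.zadd("scores", 30, "ben")
db.zadd("scores", 10, "aki")
print(db.keys["queue"], db.keys["tags"], db.zrange("scores"))
```

Each key maps to a value of a different shape, which is the point of calling it a data structures server rather than a plain key-value store.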

Database Construction for Microservices Using Datomic, the Next-Generation DB

In this article, we will discuss Datomic, a next-generation database for microservices that serves as the foundation for reliably storing and retrieving data in data-oriented applications such as microservices (data-oriented applications are applications where the volume, complexity, and change of data are the issue). Datomic is written in Clojure and is available both as a library and as a cloud service that runs on AWS.

Ontology Matching Schema matching for data integration of different DBs

Ontology matching is a technique that aims to find correspondences between semantically related entities of different ontologies.

These correspondences can represent equivalences between ontology entities or other relations such as consequences, subsumption, disjointness, etc. Many different matching solutions have been proposed from various perspectives such as databases, information systems, artificial intelligence, etc.

Various methods have been proposed for ontology matching, starting from simple string matching, various machine learning approaches, data interlinking, ontology partitioning and pruning, context-based matching, matcher tuning, alignment debugging, and user participation in matching.

Schema Matching and Mapping Schema matching for data integration of different DBs

In application areas such as e-business, web-based mashups, and life sciences, it is becoming increasingly important for heterogeneous information systems to communicate cooperatively. Such cooperative systems must automatically and efficiently collate, exchange, transform, and integrate large data sets from different sources and different structures to enable seamless data exchange and transformation.

This book, edited by Bellahsene, Bonifati, and Rahm, provides an overview of how schema and ontology matching and mapping tools have met the above requirements and points out future technical challenges. The contributions by leading experts are organized into three parts: “Large-scale, knowledge-driven schema matching,” “Quality-driven schema mapping and evolution,” and “Evaluation and tuning of matching tasks.”

Fusion of Probability and Logic (1) Bayesian Networks, KBMC, PRM, and SRL

As a fusion of logic and probability, we describe SRL (statistical relational learning), developed in North America, which introduces logical expressions to improve the descriptive power of Bayesian nets and uses them as a kind of convenient (macro-like) function. Specifically, we discuss the probabilistic relational model (PRM), the Markov logic network (MLN), and probabilistic soft logic (PSL).

Extracting Tabular Data from the Web and Documents and Semantic Annotation (SemTab) Learning

There are countless tables of information on the Web and in documents, which are very useful as knowledge information compiled manually. In general, tasks for extracting and structuring such information are called information extraction tasks, and among them, tasks specialized for tabular information have been attracting attention in recent years. Here, we discuss various approaches to extracting this tabular data.
The Combined Approach to OBDA: Taming Role Hierarchies Using Filters
Towards a Semantic Web for Relational Databases: A Practical Semantic Toolkit and Chinese Medicine Use Cases
Personalized Best Answer Computation in Graph Databases
Ontology-Based Data Access: Ontop of Databases
NoSQL Databases for RDF: An Empirical Evaluation
OBDA: Query Rewriting or Materialization? In Practice, Both!
Answering SPARQL Queries over Databases under OWL2QL Entailment Regime
kyrie2: Query Rewriting under Extensional Constraints in ELHIO
Schema-Agnostic Query Rewriting in SPARQL 1.1
How Semantic Technologies Can Enhance Data Access at Siemens Energy

Overviews

The following is an overview of databases based on the book “Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement” by Eric Redmond and Jim R. Wilson.

The book starts with PostgreSQL, the relational DB I introduced in the previous article, and goes on to introduce Riak (a key-value DB), HBase (a column-oriented DB), MongoDB and CouchDB (document-oriented DBs), Neo4j (a graph DB), and Redis (a key-value DB with pub-sub functions that is also used for IoT). The authors originally set out to write about NoSQL. Their motivation was to find answers to various database questions such as: What is NoSQL? What kinds of systems does it include? How does it affect software development?

The first question they raise is: “Which database or combination of databases is best suited to solve your problem?”

Next, the book describes databases from the perspective that each is designed for specific tasks in order to solve real problems. Relational DBs emphasize query flexibility rather than schema flexibility; key-value DBs are designed for cases where complex queries are unnecessary and fast access is required; column-oriented DBs are designed for storing huge amounts of data across multiple machines; document-oriented DBs emphasize schema flexibility; and graph DBs emphasize the interconnection of data (i.e., nodes can be searched one after another while tracing relationships).

The next perspective is how you connect to the database: through a general-purpose programming language such as JavaScript, a dedicated language (DSL) such as PL/pgSQL or Gremlin, or a low-level protocol such as REST or Thrift. Other perspectives include scalability, performance, and cost.

The most important message in the book, I think, is that the question to consider is not whether a database can model the data that appears in the problem you are facing, but whether it is the best choice for the problem domain, usage patterns, and available resources (not “Can we use this database?” but “Should we use it?”).

In addition, the discussion of individual databases compares them from the perspective of CRUD (Create (add new data), Read (reference data), Update (update data), Delete (delete data)) and compares their transaction behavior using the terms ACID and CAP. (ACID stands for Atomic, Consistent, Isolated, and Durable, while CAP stands for Consistency, Availability, and Partition tolerance.)

While the meaning of each word in CRUD is easy to understand, the transaction-related terms may not be immediately clear. CRUD concerns data input and output, while a transaction is a mechanism that keeps data consistent even while processing is being performed on the database.
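
The “Atomic” part of ACID can be seen in a small sketch with Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are assumptions made up for illustration: a transfer consists of two updates, and if the second one violates a constraint the whole transaction is rolled back, so the first update does not survive on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move money in one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, receiver)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, sender)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer, the credit to bob is applied first and then undone when the debit from alice breaks the balance check, which leaves the data consistent.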

In my next article, I will discuss RDBMS and SQL.