Ontology Based Data Access (OBDA), generative AI and GNNs

About Ontology Based Data Access (OBDA)

Ontology Based Data Access (OBDA) is a method that allows data stored in different formats and locations to be queried through a unified, conceptual view provided by an ontology. The aim is to integrate the data semantically and give users access to it in a form that is easy to understand.

OBDA provides a semantic perspective on data by modelling it with ontologies, allowing users to work with data at a higher level of abstraction. The elements that make up OBDA are shown below.

Ontology: a formal representation of knowledge about the domain, defining entities, attributes, relations, constraints, etc. It can provide a shared vocabulary and conceptual schema for modelling the domain and allows semantic interpretation of data. Ontologies are described in formal languages such as OWL (Web Ontology Language) and RDF (Resource Description Framework) to describe the meaning of data.
Data sources: data sources in various formats, such as relational databases, XML data and spreadsheets, can be targeted, and the relevant data can be retrieved using query languages such as SPARQL or SQL.
Mapping: mapping data sources to an ontology, whereby tables and fields in the database map to concepts and attributes in the ontology. The concepts in the ontology and the schema of the data sources can be linked by tools and software to enable ontology-based data access while maintaining data integrity.

The advantages of using OBDA include the following

Data integration: different data sources are integrated to provide a semantically consistent view. This allows users to easily utilise information from different databases.
Semantic search: ontology-based queries are phrased in terms of the meaning of the data rather than its storage layout, which helps users make sense of the data and can give more accurate results.
Flexible queries: users can create queries based on ontologies, thus eliminating the need for specialist knowledge of database structures.
Reusability: ontologies make it easier to re-use data across different domains and systems.
Automatic inference: OBDA enables users to infer knowledge that is not directly described and gain new insights.
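
As a rough illustration of the automatic inference point, the sketch below derives rdf:type facts from domain and range axioms in plain Python. The identifiers (ex:emp1, ex:dept7, the DOMAIN and RANGE tables) are assumptions made up for this example, and OBDA systems often achieve the same effect by rewriting the query rather than by materialising new triples as done here.

```python
# Minimal sketch of RDFS-style domain/range inference over a set of triples.
# All identifiers (ex:emp1, ex:dept7, ...) are illustrative assumptions.

RDF_TYPE = "rdf:type"

# Ontology axioms: property -> class of its subject (domain) or object (range)
DOMAIN = {"ex:hasSalary": "ex:Employee", "ex:hasDepartment": "ex:Employee"}
RANGE = {"ex:hasDepartment": "ex:Department"}


def infer_types(triples):
    """Return rdf:type triples implied by the axioms but not already asserted."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, RDF_TYPE, DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, RDF_TYPE, RANGE[p]))
    return inferred - set(triples)


# Asserted facts: nothing says explicitly that emp1 is an Employee
facts = {
    ("ex:emp1", "ex:hasSalary", 62000),
    ("ex:emp1", "ex:hasDepartment", "ex:dept7"),
}

new_facts = infer_types(facts)
for triple in sorted(new_facts):
    print(triple)
```

Neither type statement appears in the asserted facts. Both follow only from the ontology, which is the kind of knowledge "not directly described" that the list above refers to.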

OBDA is applied in a variety of fields, including

Medical information systems: OBDA is used to integrate patient data, diagnostic information and research data managed by different healthcare organisations. Ontologies are used to unify terminology from different data sources and keep queries consistent, thereby making it easier to obtain comprehensive information on a patient’s medical history and treatment plan.
Bioinformatics: genetic information distributed across many different databases can be queried in a unified manner through OBDA, thereby absorbing differences in different data formats and terminology and enabling researchers to quickly obtain the genetic information they need.
Enterprise information management: OBDA can be used to integrate diverse data held by companies (e.g. customer data, product data, supply chain data, etc.), or use ontologies to query data across different systems in a consistent manner and provide decision support.
Business intelligence: integrating different enterprise databases and performing semantic analysis to support business decision-making.
Smart cities: smart city infrastructure data is collected from many different sensors and data sources, such as traffic, energy and environment; with OBDA, urban infrastructure can be monitored and optimised consistently across different data sources, making traffic management and energy consumption more efficient.
Management of digital archive data in museums and libraries: digital archive data held by museums and libraries can be accessed and queried in a uniform manner through OBDA, thereby enabling consistent retrieval of information across different cultural heritage databases.

OBDA also has the following challenges.

Ontology design and maintenance: designing and maintaining a high-quality ontology requires domain knowledge and ontology design expertise, especially when dealing with large domains and complex data.
Scalability: performance can be an issue for large data sources and complex ontologies.
Mapping complexity: mapping between data sources and ontologies can be complex and requires expertise to ensure accurate mapping.
Data integrity: it can be difficult to maintain data integrity between different data sources.
Usability: using an OBDA system requires users to have knowledge of ontologies and the SPARQL query language, which can lead to a steep learning curve. In addition, a non-intuitive interface can make it difficult for users to become familiar with the system.
Security and privacy: the OBDA system integrates and enables access to different data sources, which complicates security and privacy management. In particular, the risk of data leakage and unauthorised access increases when integrating data sources with different access rights.

Some tools and frameworks that support OBDA include

Ontop: an open source OBDA system that enables relational databases to be queried with SPARQL through R2RML mappings.
Mastro: an OBDA system that supports DL-Lite ontologies with advanced reasoning capabilities.
D2RQ: a tool for accessing relational databases as virtual RDF graphs and performing SPARQL queries.

Examples of OBDA implementations

As an example of an implementation of Ontology-Based Data Access (OBDA), the following steps can be followed. This section shows how OBDA can be implemented using specific technologies.

Example implementation: enterprise HR data integration

1. Ontology design: First, an ontology is designed to represent the HR information in the enterprise. This ontology contains the following classes and properties.

Classes: Employee, Department, Position
Properties: hasDepartment (the department to which the employee belongs), hasPosition (the employee’s position), hasSalary (the employee’s salary)

Describe the ontology in OWL (Web Ontology Language).

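<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:ex="http://example.com/hr#">

  <!-- Prefixes such as ex: are not expanded inside rdf:about / rdf:resource
       attribute values, so the terms are identified by their full IRIs. -->

  <!-- Classes -->
  <owl:Class rdf:about="http://example.com/hr#Employee"/>
  <owl:Class rdf:about="http://example.com/hr#Department"/>
  <owl:Class rdf:about="http://example.com/hr#Position"/>

  <!-- Employee -> Department -->
  <owl:ObjectProperty rdf:about="http://example.com/hr#hasDepartment">
    <rdfs:domain rdf:resource="http://example.com/hr#Employee"/>
    <rdfs:range rdf:resource="http://example.com/hr#Department"/>
  </owl:ObjectProperty>

  <!-- Employee -> Position -->
  <owl:ObjectProperty rdf:about="http://example.com/hr#hasPosition">
    <rdfs:domain rdf:resource="http://example.com/hr#Employee"/>
    <rdfs:range rdf:resource="http://example.com/hr#Position"/>
  </owl:ObjectProperty>

  <!-- Employee -> salary as a decimal literal -->
  <owl:DatatypeProperty rdf:about="http://example.com/hr#hasSalary">
    <rdfs:domain rdf:resource="http://example.com/hr#Employee"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#decimal"/>
  </owl:DatatypeProperty>

</rdf:RDF>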

2. Preparation of data sources:

Assume that personnel data within a company is stored in several different databases. For example, assume that the employees table contains information on employees and the departments table contains information on departments.
Data sources include SQL databases and spreadsheets.

3. Creating mappings: An OBDA platform (e.g. Ontop) is used to define mappings between the data sources and the ontology. In the following, R2RML (RDB to RDF Mapping Language) is used to define the mapping between an SQL database and the ontology.

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/hr#> .

<#EmployeeMapping>
    rr:logicalTable [
        rr:tableName "employees"
    ] ;
    rr:subjectMap [
        rr:template "http://example.com/hr/Employee/{id}" ;
        rr:class ex:Employee ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:hasDepartment ;
        rr:objectMap [
            rr:template "http://example.com/hr/Department/{department_id}" ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:hasPosition ;
        rr:objectMap [
            rr:template "http://example.com/hr/Position/{position_id}" ;
        ] ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:hasSalary ;
        rr:objectMap [
            rr:column "salary" ;
        ] ;
    ] .
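
To make the effect of this mapping concrete, here is a minimal Python sketch that applies the same templates by hand to rows of an in-memory SQLite employees table. The sample rows, the column types and the row_to_triples helper are assumptions for illustration only. A virtual OBDA system such as Ontop would not normally materialise triples like this; it uses the mapping to rewrite queries instead.

```python
import sqlite3

BASE = "http://example.com/hr/"
EX = "http://example.com/hr#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical contents of the employees table (sample data, not from a real system)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER, "
    "position_id INTEGER, salary NUMERIC)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 62000), (2, 20, None, 45000)],
)


def row_to_triples(row):
    """Apply the subject and object templates of the mapping to one table row."""
    emp_id, dept_id, pos_id, salary = row
    subject = f"{BASE}Employee/{emp_id}"
    triples = [(subject, RDF_TYPE, EX + "Employee")]
    # As in R2RML, a NULL column produces no triple for that predicate.
    if dept_id is not None:
        triples.append((subject, EX + "hasDepartment", f"{BASE}Department/{dept_id}"))
    if pos_id is not None:
        triples.append((subject, EX + "hasPosition", f"{BASE}Position/{pos_id}"))
    if salary is not None:
        triples.append((subject, EX + "hasSalary", salary))
    return triples


triples = []
query = "SELECT id, department_id, position_id, salary FROM employees ORDER BY id"
for row in conn.execute(query):
    triples.extend(row_to_triples(row))

for t in triples:
    print(t)
```

Each row becomes one Employee resource whose IRI is built from the id column, which is what the rr:template entries in the mapping describe.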

4. Execute the query: Use SPARQL to send queries to the OBDA system to interrogate the integrated data. For example, the query ‘Retrieve all employees with a salary of 50,000 or more’ would look like this:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.com/hr#>

SELECT ?employee ?salary WHERE {
  ?employee rdf:type ex:Employee .
  ?employee ex:hasSalary ?salary .
  FILTER (?salary >= 50000)
}

The OBDA system converts this SPARQL query into an SQL query and accesses the corresponding database to retrieve the results.
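
The following Python sketch shows roughly what such a conversion could produce for the salary query. The SQL string is hand-written for illustration under the employees mapping above and is not the actual output of Ontop or any other tool; the sample rows are also assumptions.

```python
import sqlite3

# Hypothetical employees table with sample rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary NUMERIC)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 62000), (2, 45000), (3, 50000)],
)

# Hand-written SQL equivalent of the SPARQL query:
#   ?employee   -> IRI built from the subject template and the id column
#   ?salary     -> the salary column
#   FILTER(...) -> a WHERE condition on that column
REWRITTEN_SQL = """
SELECT 'http://example.com/hr/Employee/' || id AS employee,
       salary
FROM employees
WHERE salary IS NOT NULL
  AND salary >= 50000
ORDER BY id
"""

rows = conn.execute(REWRITTEN_SQL).fetchall()
for employee, salary in rows:
    print(employee, salary)
```

The point of the sketch is that the user only writes the ontology-level query, while the table and column names appear solely in the generated SQL.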

5. Retrieving and using the results: The query results are returned based on the ontology perspective, making the company’s HR data available to the user in a consistent format. Results are often presented in list or graphical format for further analysis and reporting.

6. Techniques and tools used:

Ontology design: tools such as Protégé are used to design and manage ontologies
OBDA platforms: use OBDA tools such as Ontop to perform mapping and query transformation
Databases: RDBMS such as MySQL and PostgreSQL
Query languages: SPARQL (query), SQL (data access)

Combination of OBDA and generative AI

OBDA can be combined with generative AI to deepen the semantic understanding of data and enhance the ability to generate information.

Specifically, OBDA is a framework for understanding the meaning of data and integrating information, while generative AI produces new information and content. Combining the two makes it possible to generate more relevant information based on the meaning of the data and to strengthen semantic understanding.

Because OBDA keeps the underlying data consistent, it can improve the consistency and accuracy of the information produced by the generative AI, making the generated output more practical and reliable.

The main approaches to integrating OBDA and generative AI include the following.

Semantic query generation: using OBDA, the meaning of the information sought by the user can be ascertained and the appropriate query can be generated by the generative AI based on this information. This makes it easier to extract the required data.
Content generation: based on the data obtained by OBDA, the generative AI generates reports, summaries or new suggestions. This process is particularly useful in the fields of data analysis and research.
Automated decision-making: complex datasets are analysed using OBDA and the generative AI generates recommendations to support decision-making based on the results.
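
As a minimal sketch of the semantic query generation approach, the Python below builds a prompt from the ontology vocabulary and checks that a generated SPARQL query only uses terms the ontology defines. The call to the generative model itself is deliberately left out; the two query strings stand in for model output, and the function names build_prompt and unknown_terms are hypothetical.

```python
import re

# Vocabulary taken from the HR ontology used in the implementation example
ONTOLOGY_TERMS = {
    "ex:Employee", "ex:Department", "ex:Position",
    "ex:hasDepartment", "ex:hasPosition", "ex:hasSalary",
}


def build_prompt(question):
    """Compose a prompt that restricts the model to the ontology vocabulary."""
    vocab = "\n".join(sorted(ONTOLOGY_TERMS))
    return (
        "Write a SPARQL SELECT query that answers the question.\n"
        "Use only these ontology terms:\n"
        f"{vocab}\n"
        f"Question: {question}\n"
    )


def unknown_terms(sparql):
    """Return ex: terms used in the query that the ontology does not define."""
    used = set(re.findall(r"\bex:[A-Za-z_][A-Za-z0-9_]*", sparql))
    return used - ONTOLOGY_TERMS


prompt = build_prompt("Which employees earn 50,000 or more?")

# Stand-ins for text returned by a generative model (assumed, not real output)
good_query = "SELECT ?e WHERE { ?e a ex:Employee ; ex:hasSalary ?s . FILTER(?s >= 50000) }"
bad_query = "SELECT ?e WHERE { ?e ex:hasWage ?s }"

print(unknown_terms(good_query))  # empty set: safe to send to the OBDA system
print(unknown_terms(bad_query))   # contains the invented term ex:hasWage
```

A check of this kind is one simple way to let the ontology constrain the generative AI before a query ever reaches the data sources.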

Examples of specific applications include

Healthcare: a system can be built in which patient data, treatments and research results are integrated in an ontology and the generative AI automatically generates recommendations for diagnoses and treatments. This will enable healthcare professionals to make quick and accurate decisions.
Business intelligence: using OBDA, different company databases can be integrated and generative AI can generate reports and forecasts of market trends. This will make it easier for management to make data-driven decisions.
Education: learner progress data and teaching material data can be managed in an ontology, allowing the generative AI to suggest the best learning plans and teaching materials for individual learners. This will enable a personalised learning experience.

Challenges of this combination include the following.

Data quality: the effectiveness of OBDA depends on the quality of the original data. Ensuring data quality is important because if the data is inaccurate or incomplete, the generated information will also be affected.
Control of the generative model: sufficient training and feedback mechanisms are needed to control the quality of the information output by the generative AI. Monitoring of model outputs is required, especially where important data and opinions are involved.
Ethical issues: the use of information generated by generative AI requires ethical consideration. For example, care must be taken to avoid the risk of misdiagnosis in the medical field or incorrect decision-making in business.

The combination of OBDA and generative AI is a powerful approach to improve the semantic understanding of data and generate higher quality information, and this integration is expected to facilitate information processing and decision-making in a variety of fields and have practical applications.

OBDA and GNN

OBDA can be combined with graph neural networks (GNNs) to further enhance semantic data management and the analysis of complex data structures.

Possible approaches to integrating OBDA and GNNs include the following steps

Use OBDA to generate semantic graph structures from different data sources. This graph takes the form of nodes representing entities and attributes and edges showing their relationships.
The generated graph structure is then fed into the GNN to learn the features and relationships of the nodes. This enables the discovery and prediction of new knowledge.
Based on the results obtained by the GNN, OBDA is used to perform semantic queries and extract the required information.
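
The second step can be sketched in plain Python as a single round of message passing over a graph derived from triples. This is a simplified illustration: node names, edges and initial feature vectors are made up, predicates are ignored, and there are no learned weights or non-linearities, which a real GNN (usually built with a dedicated library) would add.

```python
# Edges derived from (subject, object) pairs of triples; predicates are ignored here.
edges = [("emp1", "dept7"), ("emp2", "dept7"), ("emp1", "posA")]

# Assumed initial features: [is_employee, is_other_entity]
features = {
    "emp1": [1.0, 0.0],
    "emp2": [1.0, 0.0],
    "dept7": [0.0, 1.0],
    "posA": [0.0, 1.0],
}


def build_adjacency(edges):
    """Treat the graph as undirected and collect each node's neighbours."""
    adj = {}
    for s, o in edges:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def message_passing_step(features, adj):
    """New vector = mean of the node's own vector and its neighbours' vectors."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in sorted(adj.get(node, ()))]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


adj = build_adjacency(edges)
h = message_passing_step(features, adj)
for node in sorted(h):
    print(node, h[node])
```

After one step each node's vector already mixes in information from its neighbours, for example dept7 now reflects that it is connected to two employees. Stacking several such steps with trainable parameters is what lets a GNN learn the relationships described above.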

Specific applications of the integrated OBDA and GNN approach include the following.

Enhancing knowledge graphs: using OBDA to construct knowledge graphs and analysing these graphs using GNNs, it is possible to extend knowledge and discover new associations. For example, in the medical field, the relationship between patient data and treatments can be analysed and more effective treatments can be proposed.
Recommendation systems: by using OBDA to semantically model product and user data and learning their relationships using GNN, systems can be built to make more accurate recommendations.
Business intelligence: integrating different corporate data sources through OBDA and using GNNs to analyse market trends and customer behaviour to support decision-making.

Reference information and reference books

This section lists reference books on Ontology Based Data Access systems. These books cover content related to databases, ontologies, knowledge management and information retrieval systems, and are useful for understanding the basics and applications of building ontology-based systems and access.

1. “An Introduction to Ontology Engineering”

2. “Semantic Web for the Working Ontologist”

3. “Ontology-Based Information Retrieval for Healthcare Systems”

4. “Ontology-Based Data Access Leveraging Subjective Reports”

Deux Ex Machina

Specialized in AI system design and decision-making architecture.
Focused on externalizing decision logic using Ontology, DSL, and Behavior Trees, and building multi-agent systems.