ISWC2022Papers

Artificial Intelligence Technology Semantic Web Technology Reasoning Technology Collecting AI Conference Papers Ontology Technology Machine Learning Digital Transformation Technology Knowledge Information Processing Technology

This article covers ISWC 2022, the international conference on Semantic Web technologies, one of the artificial intelligence technologies for handling knowledge information. In the previous article, we discussed ISWC 2021; this time, we turn to ISWC 2022, which was held as a fully virtual event in consideration of COVID-19 restrictions, limited travel funds, and visa issues.

The International Semantic Web Conference (ISWC) is the leading international forum for the Semantic Web and Knowledge Graph community to discuss and present the latest advances in fundamental research, innovative technologies, and applications related to semantics, data, and the Web.


The conference was held over five days and offered a rich program of tracks (Research, Resources, In-Use, Industry), workshops and tutorials, posters, demonstrations, lightning talks, a doctoral consortium, challenges, panels, and more, with 335 submissions by 1,363 authors.

As in previous years, a wide variety of contributions were submitted to this year’s research track, falling into four broad categories. The first comprises papers on classical reasoning and query answering for various forms of ontologies such as RDF(S)/OWL, SHACL, and SPARQL, along with their variants and extensions, as well as non-standard tasks such as repair, description, and database mapping. The second, also as in previous years, covers ontology/knowledge graph embedding, especially graph neural networks of various forms and their applications such as zero/few-shot learning, image/object classification, and various NLP tasks. A third category focuses on specific knowledge graph tasks such as link and type prediction and entity alignment. Finally, there were reports surveying the current state of the art, including LOD availability and ontology structure patterns.

The Resources Track will facilitate the sharing of resources, especially datasets, ontologies, software, and benchmarks to support, enable, and use Semantic Web research.

The In-Use Track provides a forum to explore the benefits and challenges of applying Semantic Web and Knowledge Graph technologies to concrete and practical use cases, from industry to government and society.

The proceedings also include abstracts of the talks by three outstanding keynote speakers at ISWC 2022: Markus Krötzsch, Francesca Rossi, and Ilaria Capua. Markus Krötzsch, a leading member of the Semantic Web and Knowledge Graph community, presented “Data, Ontologies, Rules, and the Return of the Blank Node,” on how the integration of data and ontologies provides opportunities for the Semantic Web, and how recent results from rule-based reasoning provide a foundation for overcoming the related challenges. Francesca Rossi gave a talk titled “AI Ethics in the Semantic Web” on the main issues surrounding AI ethics, proposed solutions, and the relevance of AI ethical issues to the Semantic Web. Ilaria Capua is an internationally renowned virologist and a pioneer in sharing genetic data to improve pandemic preparedness.

This year’s Industry Track covered all aspects of innovative commercial or industrial semantic technologies and knowledge graphs and their adoption.

The Workshops and Tutorials track was devoted to research on ontology engineering (ontology design patterns and ontology matching), data management (data evolution and storage, web-scale data storage, querying, and management), user interaction, and synergies with other technical areas (especially deep learning). A total of 11 workshops were held on these themes, and several focused on applications of Semantic Web technologies, such as Wikidata, knowledge graph summarization, linked open science, legal document management, and music heritage management with knowledge graphs. Six tutorials covered core technical topics such as reasoning, schema discovery, and knowledge-aware zero-shot learning, as well as interesting application areas such as autonomous driving and earth observation data management.

The Semantic Web Challenges Track proposed five challenges to help create and consolidate a community that advances research by developing solutions. Each challenge provided a common environment to compare and contrast systems in a variety of settings and tasks. Topics included federated query answering, neuro-symbolic reasoning, question answering, knowledge graph construction from language models, and matching tabular data to knowledge graphs. Three challenges were based on previously proposed events (the Semantic Answer Type, Entity, and Relation Linking Task; the Semantic Reasoning Evaluation Challenge; and the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching) and continue to promote and consolidate research trends in the Semantic Web. Two new challenges (the Bio2RDF and Kibio Federated Query in Life Science Challenge and Knowledge Base Construction from Pre-trained Language Models) were also introduced as part of the program. Following the success of the other challenges, they are likely to become a stable part of future ISWC events.

The poster, demonstration, and lightning talk tracks will complement the paper track of the conference by providing an opportunity to present the latest research results, ongoing projects, and innovative work-in-progress.

Another important tradition at ISWC will be the Doctoral Consortium (DC), which will provide an opportunity for doctoral students to present their research ideas and early results and receive constructive feedback from senior members of the community.

The conference program also included two panel discussions featuring panelists invited from industry and academia. The first panel, “Is Deep Learning Good or Bad for the Semantic Web?”, started from the view that deep learning methods are currently having a significant impact on Semantic Web research, but that important topics may be neglected because they are not easily addressed by deep learning approaches. The second panel posed the question “Knowledge Graphs for the Physical World – What’s Missing?” to academic and industrial researchers. Indeed, applications such as smart homes, autonomous driving, robotics, and digital twins could benefit from explicit knowledge about the physical world and need to integrate rich data sources for this purpose. However, academic progress is slow and existing standards fall well short of industry needs.

Invited Lectures

CIRCULAR HEALTH

A pandemic is a unique and transformative event that shakes life by exposing the vulnerability of Homo sapiens to previously unknown pathogens and spreading as it infects most humans on the planet. But COVID-19 did much more than this. It exposed another type of vulnerability: the vulnerability of the systems in which we operate. It opened our eyes to the harsh reality that we live in a closed system in which we are completely interconnected and interdependent with the other organisms on the planet. This realization led us to believe that, as a society, we should embrace a “one-health” (Note 1) approach that recognizes the connection between human, animal, and environmental health.

COVID-19 also showed that multiple drivers and influences, including social [2], economic [3], and digital [4] factors, shaped the unfolding of such a massive health crisis. Furthermore, COVID-19 was the most measured event in history, generating large amounts of big data.

In the 2000s, we experienced several challenges related to our closed system that affect our health, such as climate change and the food crisis. For example, we recognize that rising temperatures will have devastating effects on ocean health, biodiversity loss, and human and animal migration. We are also well aware that global demographics will require more food to feed a world population that is expected to reach 9.7 billion by 2050 [7]. At the same time, we are committed to reducing pollution and greenhouse gas emissions.

Following the conceptual blueprint of the circular economy [8] and circular agriculture [9], now may be an opportune time to expand our approach to health into a circular model that embraces the complex new connections between human health and the health of this closed system. This circular approach could be implemented by using the Sustainable Development Goals (SDGs) roadmap as a data-driven and convergent catalyst for health; since all 17 goals relate to human, animal, plant, and environmental health, it seems reasonable to prioritize specific activities and build on existing guidelines and commitments.

The novelty of the circular health approach is that it uses the new post-COVID-19 health priorities to promote the convergence of health-related issues that are achievable within the framework of the Sustainable Development Goals. In this way, it will be possible to advance urgent health priorities within the existing framework for sustainability and to promote health as an essential resource within a closed system whose complexity needs to be recognized and addressed.

Data, Ontologies, Rules, and the Return of the Blank Node

The Semantic Web has long been characterized by the parallel development of machine-readable data and ontology models. These two worlds, inspired by the very different backgrounds of Web data exchange and mathematical logic, have sometimes been perceived as complementary, even antagonistic. However, the general trend toward knowledge graphs has rendered such discussions irrelevant, and modern knowledge models such as Wikidata often combine instance and schema data. In this talk, I will describe how such data and ontology integration presents opportunities for the Semantic Web and discuss how recent results in rule-based reasoning may provide a basis for overcoming the related challenges. In addition to the theoretical benefits, specific practical uses of this expressive power can be demonstrated.

AI Ethics in the Semantic Web

AI will bring significant benefits in terms of scientific progress, human well-being, economic value, and the potential to find solutions to major social and environmental problems. Supported by AI, we can make better-grounded decisions and make the decision-making process less routine and repetitive, freeing us to focus on key values and goals. However, such a powerful technology also raises concerns related to, for example, the black-box nature of some AI approaches, discriminatory decisions that AI algorithms may recommend, and accountability and liability when AI systems are involved in undesirable outcomes. In addition, because many successful AI technologies rely on vast amounts of data, it is important to know how data is handled by AI systems and by those who create the data. These concerns are either barriers to AI or sources of worry for current AI users, adopters, and policy makers. Without answers to these questions, many will not trust AI and thus will not fully adopt it or reap its positive impacts. This talk introduces the main issues surrounding AI ethics, some of the proposed technical and non-technical solutions, as well as the practical behaviors and regulations that shape AI development, deployment, and use. The relevance of the Semantic Web to AI ethics will also be discussed.

Research Track

Knowledge graph (KG) completion has been studied for many years in link prediction tasks that infer missing relationships, but literals have received less attention due to their non-discrete nature and semantic challenges. Unlike other literals, numerical attributes such as height, age, and birthday have great potential as prediction targets because they can be computed and estimated, and thus play an important role in a range of tasks. However, despite the use of structural information and the development of embedding techniques, there have been only a few attempts to predict numerical attributes on KGs. In this paper, we revisit the task of predicting numerical attributes on KGs and introduce several new methods that explore and exploit the rich semantic knowledge of language models (LMs) for this task. We also propose effective combination strategies to maximize the use of both structural and semantic information. Extensive experiments show that both the semantic methods and the combination strategies are highly effective.
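As a rough illustration of this kind of combination strategy, the following sketch concatenates structural (KG embedding) and semantic (language model) features and fits a simple regressor for a numerical attribute. The placeholder arrays stand in for real embeddings; this is not the authors' code.

```python
# Illustrative sketch only: combine structural (KG embedding) and semantic (LM)
# features to regress a numerical attribute. The arrays below are placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

n_entities, kg_dim, lm_dim = 1000, 64, 384
rng = np.random.default_rng(0)
kg_emb = rng.normal(size=(n_entities, kg_dim))            # e.g. TransE/RDF2vec vectors
lm_emb = rng.normal(size=(n_entities, lm_dim))            # e.g. embeddings of entity descriptions
height = rng.normal(loc=170, scale=10, size=n_entities)   # target numerical attribute

X = np.hstack([kg_emb, lm_emb])                           # simple combination strategy: concatenation
X_tr, X_te, y_tr, y_te = train_test_split(X, height, random_state=0)

model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("R^2 on held-out entities:", model.score(X_te, y_te))
```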

In recent years, there have been many efforts to learn continuous representations of symbolic knowledge bases (KBs). However, these approaches either can only embed data-level knowledge (ABox) or suffer from inherent limitations when dealing with concept-level knowledge (TBox), i.e., they cannot faithfully model the logical structure present in the KB. In this paper, we introduce BoxEL, a geometric KB embedding approach that better captures the logical structure (ABox and TBox axioms) of the description logic EL++. BoxEL models concepts in the KB as axis-parallel boxes, which are suitable for modeling concept intersection, and entities as points inside boxes. Theoretical guarantees (soundness) for the preservation of logical structure are presented; that is, a learned BoxEL embedding with loss 0 is a (logical) model of the KB. Experimental results on (plausible) subsumption inference and a real-world application to protein-protein interaction prediction show that BoxEL outperforms both traditional knowledge graph embedding methods and state-of-the-art EL++ embedding approaches.
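The core geometric idea can be illustrated with a minimal sketch in a toy two-dimensional embedding space (this is not the BoxEL implementation): concepts are axis-parallel boxes, entities are points, and the violation scores below are zero exactly when a membership or subsumption axiom is satisfied by the geometry.

```python
# Minimal sketch (not BoxEL itself): axis-parallel boxes for concepts, points for
# entities, and simple violation scores for membership and subsumption axioms.
import numpy as np

def box(lower, upper):
    return np.asarray(lower, float), np.asarray(upper, float)

def membership_violation(point, concept_box):
    lo, hi = concept_box
    # zero iff the entity point lies inside the concept box
    return np.sum(np.maximum(lo - point, 0) + np.maximum(point - hi, 0))

def subsumption_violation(box_c, box_d):
    lo_c, hi_c = box_c
    lo_d, hi_d = box_d
    # zero iff box C is entirely contained in box D, i.e. C is subsumed by D
    return np.sum(np.maximum(lo_d - lo_c, 0) + np.maximum(hi_c - hi_d, 0))

Person = box([0.0, 0.0], [4.0, 4.0])
Parent = box([1.0, 1.0], [3.0, 3.0])
alice  = np.array([2.0, 2.0])

print(subsumption_violation(Parent, Person))  # 0.0 -> "Parent subsumed by Person" satisfied
print(membership_violation(alice, Parent))    # 0.0 -> "Parent(alice)" satisfied
```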

Document-level relation extraction (RE) aims to identify relationships between entities throughout a document. This requires complex reasoning skills that synthesize various kinds of knowledge, including coreference and common-sense knowledge. Large knowledge graphs (KGs) are rich in real-world facts and can provide valuable knowledge for document-level relation extraction. In this paper, we propose an entity knowledge injection framework to enhance current document-level RE models. Specifically, we introduce coreference distillation to inject coreference knowledge and give the RE model a more general capability for coreference reasoning. We also use representation reconciliation to inject factual knowledge and aggregate KG and document representations into a unified space. Experiments on two benchmark datasets validate the generalization of our entity knowledge injection framework and its consistent improvements over several document-level RE models.

Time-efficient solutions for querying RDF knowledge graphs rely on index structures with short response times to answer SPARQL queries quickly. Our recent development of hypertries, an index structure for tensor-based triple stores, has achieved significant improvements in execution time compared to mainstream storage solutions for RDF knowledge graphs. However, the memory footprint of this new data structure is still often larger than that of many mainstream solutions. This study details how to reduce the memory footprint of hypertries and further accelerate query processing in hypertrie-based RDF storage solutions by (1) removing duplicate nodes through hashing, (2) compressing non-branching paths, and (3) storing single-entry leaf nodes directly in their parent nodes. We evaluate these strategies against the baseline hypertrie and well-known triple stores such as Virtuoso, Fuseki, GraphDB, Blazegraph, and gStore. Four datasets/benchmark generators were used for the evaluation: SWDF, DBpedia, WatDiv, and WikiData. The results show that hypertries reduce their memory footprint by up to 70% and achieve relative improvements of up to 39% in average queries per second and 740% in queries per hour.
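A toy sketch of strategy (1), removing duplicate nodes through hashing (hash-consing), is shown below. It uses a simplified nested index over (s, p, o) triples rather than the actual hypertrie data structure.

```python
# Toy sketch of duplicate-node removal via hashing; not the hypertrie implementation.
_node_pool = {}

def intern(node):
    """Return a canonical shared instance of an index node (hash-consing)."""
    if isinstance(node, frozenset):                       # leaf: set of objects
        key = ("leaf", node)
    else:                                                 # inner node: key -> interned child
        key = ("inner", tuple(sorted((k, id(v)) for k, v in node.items())))
    return _node_pool.setdefault(key, node)

def build_index(triples):
    index = {}
    for s, p, o in triples:
        index.setdefault(s, {}).setdefault(p, set()).add(o)
    # freeze and intern bottom-up so that identical subtrees are stored only once
    return {
        s: intern({p: intern(frozenset(objs)) for p, objs in pmap.items()})
        for s, pmap in index.items()
    }

idx = build_index([("a", "type", "C"), ("b", "type", "C")])
print(idx["a"] is idx["b"])  # True: the two identical subtrees share one node
```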

Convolutional neural networks (CNNs) classify images by learning an intermediate representation of the input through many layers. Recent work has been done to align the latent representation of CNNs with semantic concepts. However, to generate such alignments, most existing methods rely primarily on large amounts of labeled data, which is difficult to acquire in practice. We address this limitation by presenting a framework for mapping CNN hidden units to class semantic attributes extracted from an external common sense knowledge repository. We empirically demonstrate the effectiveness of our framework on copy-paste adversarial image classification and a generalized zero-shot learning task.

The reproducibility crisis is an ongoing problem with major implications for data-driven science. Highly interconnected, distributed web ontologies are the backbone of semantic data and the Linked Open Data Cloud, providing contextual information for terms that is important for the use and interpretation of data, and hence for the reproducibility of research results based on that data. This paper identifies, analyzes, and quantifies reproducibility issues associated with obtaining such terminological context (e.g., due to unavailable ontologies) and their impact on the reproducibility crisis in the Linked Open Data Cloud. Our analysis is backed by frequent and continuous monitoring of the vocabularies and ontologies available online and is corroborated by the DBpedia Archivo dataset. We also show to what extent the reproducibility crisis can be addressed by ontology archiving in DBpedia Archivo and Linked Open Vocabularies.

Contrastive learning has emerged as a powerful tool for graph representation learning. However, most contrastive learning methods learn graph features in a coarse-grained, fixed manner, which may underestimate local or global information. We therefore propose a new hierarchical contrastive learning (HCL) framework that learns graph representations hierarchically in order to obtain a richer, multi-level representation. Specifically, HCL consists of two main components: a novel adaptive Learning to Pool (L2Pool) method that constructs a more reasonable multi-scale graph topology for more comprehensive contrastive learning, and a novel multi-channel pseudo-siamese network that learns mutual information within each scale more expressively. Experiments show that HCL achieves competitive performance on 12 datasets covering node classification, node clustering, and graph classification. Furthermore, visualization of the learned representations reveals that HCL successfully captures meaningful features of graphs.

The importance of considering potentially conflicting individual perspectives when dealing with knowledge is widely recognized. Many existing ontology management approaches fully merge the knowledge perspectives, which may require weakening them to maintain consistency. Standpoint logic is a simple and versatile multimodal logic “add-on” for existing KR languages that integrally represents domain knowledge associated with diverse and potentially conflicting viewpoints, and hierarchically organizes, combines, and interrelates them. Starting from the general framework of first-order standpoint logic (FOSL), we focus on the sentential formula fragments and provide polynomial-time translations of them into standpoint-free versions. This yields decidability and favorable complexities for various highly expressive decidable fragments of first-order logic. We then use elaborate encoding tricks to establish a similar translation for the highly expressive description logic SROIQb_s underlying the ontology language OWL 2 DL. The result provides practical reasoning support for an ontology language augmented by viewpoint modeling using existing highly optimized OWL reasoners.

Current deep learning methods for object recognition are purely data-driven and require a large number of training samples to obtain good results. Because they rely only on image data, these methods tend to fail when faced with new environments where even small deviations occur. However, human perception has proven to be very robust to such distributional changes. It is speculated that this is based on the ability to respond to unknown scenarios by incorporating extensive contextual knowledge. Context can be based on the co-occurrence of objects in a scene or on experiential memory. Following the human visual cortex’s use of context to form different object representations for the images seen, we propose an approach to enhance deep learning methods by using external contextual knowledge encoded in a knowledge graph. To this end, we extract different contextual views from a general knowledge graph, transform the views into a vector space, and infuse them into a DNN. We perform a series of experiments to investigate the impact of different contextual views on the learned object representations for the same image dataset. The experimental results provide evidence that contextual views affect the image representation in the DNN differently and thus lead to different predictions for the same image. We also show that context helps to strengthen the robustness of object recognition models to out-of-distribution images, which typically occur in transfer learning tasks and real-world scenarios.

When cooking, it is sometimes desirable to substitute ingredients to avoid allergens, to compensate for missing ingredients, to find new tastes, and so on. More generally, the problem of substituting entities used in procedural instructions is a difficult one because it requires an understanding of how the entities and actions in the instructions interact to produce the final result. In this paper, we address this problem by (1) using natural language processing tools and domain-specific ontologies to analyze instructions and generate flow graph representations, (2) learning new embedding models that capture the flow and interaction of entities at each stage of the instruction, and (3) using the embedding models to identify plausible substitutions. The proposed method aggregates nodes in the flow graph and computes intermediate results dynamically, so it can learn embeddings with fewer nodes than general graph embedding models. In addition, the embedding model outperforms the baseline on the task of predicting ingredient linkages in recipes.

Temporal Heterogeneous Information Network (Temporal HIN) embedding aims to represent various types of nodes with different timestamps in a low-dimensional space while preserving structural and semantic information, which is extremely important for various real-world tasks. Researchers have made a great deal of effort and achieved some significant results on temporal HIN embedding in Euclidean space. However, there is always a fundamental contradiction: many real-world networks exhibit hierarchical properties and power-law distributions that are not isometric to Euclidean space. Recently, representation learning in hyperbolic spaces has been proven to be effective for data with hierarchical structures and power laws. Inspired by this property, we propose a hyperbolic heterogeneous temporal network embedding (H2TNE) model for temporal HIN. Specifically, we exploit a temporal and heterogeneous doubly constrained random walk strategy to capture structural and semantic information, and compute embeddings using hyperbolic distances in proximity measurements. Experiments show that the proposed method outperforms the SOTA model in temporal link prediction and node classification.
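For reference, the hyperbolic proximity measurement mentioned above can be based on the standard Poincaré-ball distance; a minimal sketch of that formula (not the H2TNE code) follows.

```python
# Sketch of the Poincaré-ball distance used as a hyperbolic proximity measure.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    u, v = np.asarray(u, float), np.asarray(v, float)
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))

# Points pushed toward the boundary of the unit ball become hyperbolically far apart
# much faster than their Euclidean distance grows, which is what makes the space a
# good fit for hierarchies and power-law graphs.
print(poincare_distance([0.10, 0.0], [0.0, 0.10]))  # ~0.29
print(poincare_distance([0.95, 0.0], [0.0, 0.95]))  # ~6.6
```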

Entity alignment is a fundamental and important technique for knowledge graph (KG) integration. For many years, entity alignment research has been based on the assumption that KGs are static, which ignores the growing nature of real-world KGs: as KGs grow, previous alignment results need to be revisited while new entity alignments wait to be discovered. In this paper, we propose and dive into a realistic but unexplored setting: continual entity alignment. To avoid retraining the entire model over the entire KG each time new entities or triples appear, we present a continual alignment method for this task. It reconstructs the representation of an entity based on its adjacencies and uses existing adjacencies to generate embeddings for new entities in a fast and inductive manner. In addition, by selecting and replaying partially pre-aligned entity pairs, only a portion of the KG needs to be learned while reliable alignments are extracted for knowledge reinforcement. Since growing KGs inevitably contain unmatchable entities, the proposed method differs from previous methods by employing bidirectional nearest-neighbor matching to discover new entity alignments and update old ones. In addition, we construct new datasets that simulate the growth of multilingual DBpedia. Extensive experiments show that the proposed method is more effective than baselines based on retraining and inductive learning.

Today, much structured data is still stored in relational databases, so it is important to provide a transformation between relational and semantic data. Relational-to-RDF mappings such as R2RML [13] provide a declarative mapping over existing relational data and a way to view it in the RDF data model. These mappings transform relational instance data into RDF but do not specify any transformation of existing relational constraints such as primary-key or foreign-key constraints. Since the advent of R2RML, interest in RDF constraint languages has increased, and SHACL [15] has been standardized. This raises the question of which SHACL constraints are guaranteed to hold for the datasets generated by a relational-to-RDF mapping. For arbitrary SQL constraints and relational-to-RDF mappings this is a difficult problem, but we introduce a constraint rewriting for relational-to-RDF mappings that faithfully translates SQL integrity constraints into SHACL constraints. We define and prove two fundamental properties: maximal semantic preservation and monotonicity.
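As a hedged illustration of the general idea (not the paper's rewriting algorithm), the sketch below emits a SHACL property shape that must hold for RDF produced by an R2RML-style mapping of a SQL column: a column yields at most one value per row (sh:maxCount 1), and NOT NULL translates to sh:minCount 1. The class and property IRIs are hypothetical.

```python
# Simplified illustration: derive a SHACL property shape that is guaranteed to hold
# for RDF produced from a SQL column mapped by an R2RML-style mapping.
def column_to_shacl(cls_iri, prop_iri, datatype, not_null=False):
    lines = ["[] a sh:NodeShape ;",
             f"   sh:targetClass <{cls_iri}> ;",
             "   sh:property [",
             f"      sh:path <{prop_iri}> ;",
             f"      sh:datatype {datatype} ;",
             "      sh:maxCount 1 ;"]          # a column yields at most one value per row
    if not_null:
        lines.append("      sh:minCount 1 ;")  # NOT NULL -> the value must be present
    lines.append("   ] .")
    return "\n".join(lines)

print(column_to_shacl("http://example.org/Employee", "http://example.org/salary",
                      "xsd:decimal", not_null=True))
```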

Satellite-based positioning systems are primarily used in outdoor environments, but a variety of other positioning technologies exist for different domains and use cases, including indoor and underground positioning. The representation of spatial data with semantic linked data can be adequately addressed by existing spatial ontologies. However, their focus is primarily on location data in a specific geographic context, and solutions are lacking for describing the different types of data generated by positioning systems and the sampling techniques used to acquire them. This paper presents a new generic Positioning System Ontology (POSO) built on top of the Semantic Sensor Network (SSN) and Sensor, Observation, Sample, and Actuator (SOSA) ontologies. POSO takes known positioning algorithms and technologies into consideration and provides the missing concepts needed to describe positioning systems and their outputs. This enables the improvement of hybrid positioning systems that use multiple platforms and sensors described via the POSO ontology.

Temporal knowledge graphs (TKGs) organize and manage the dynamic relationships among entities over time. Inference of missing knowledge in TKGs, known as temporal knowledge graph completion (TKGC), has been an important research topic. Traditional models treat all facts with different timestamps in the same latent space, even though the semantic space of TKG changes over time. Therefore, they are not effective in reflecting the temporal nature of knowledge. In this paper, a new model called Spatial Adaptive Network (SANe) is constructed by adapting different latent spaces to snapshots with different timestamps in order to effectively learn temporal changes in latent knowledge. Specifically, the convolutional neural network (CNN) is extended to map facts with different timestamps to different latent spaces to effectively reflect dynamic changes in knowledge. On the other hand, to search for overlaps in the latent space, a time-aware parameter generator is designed to assign specific parameters to the CNN in terms of the context of the timestamps. Thus, knowledge of adjacent time intervals can be shared efficiently, enhancing the performance of TKGC, which can learn the validity of knowledge over a certain period of time. Extensive experiments have demonstrated that SANe achieves state-of-the-art performance on four established benchmark datasets for temporal knowledge graph completion.

Several types of dependencies have been proposed for the static analysis of existential rule ontologies, promising insights into, for example, computational properties of ontology-based query answering and possible practical uses of a given set of rules. Unfortunately, these dependencies are rarely implemented, and their potential is rarely realized in practice. We focus on two types of rule dependencies (positive dependencies and constraints) and design and implement optimized algorithms for their efficient computation. Experiments on real-world ontologies with over 100,000 rules demonstrate the scalability of our approach and provide practical case studies for several previously proposed applications. In particular, we can analyze to what extent rule-based bottom-up reasoning is guaranteed to produce “lean” knowledge graphs (so-called cores) without redundancy on practical ontologies.

Heterogeneous graph neural networks (HGNNs) have received a great deal of attention in recent years. Knowledge graphs contain hundreds of different relations and exhibit strong intrinsic heterogeneity. However, the majority of HGNNs characterize heterogeneity by learning separate parameters for each type of node and edge in the latent space; when HGNNs attempt to process knowledge graphs, the number of type-related parameters explodes, making them applicable only to graphs with few edge types. To overcome this limitation, we propose a novel heterogeneous graph neural network that incorporates a hypernetwork, which generates the necessary parameters by modeling the general semantics shared among relations. Specifically, the hypernetwork is used to generate relation-specific parameters for convolution-based message functions, improving model performance while maintaining parameter efficiency. Empirical studies on the most commonly used knowledge base embedding datasets confirm the effectiveness and efficiency of the proposed model. In addition, the model parameters are shown to be significantly reduced (from 415M to 3M for FB15k-237 and from 13M to 4M for WN18RR).
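A minimal PyTorch sketch of this idea (not the authors' model) is shown below: a small hypernetwork maps a relation embedding to the weight matrix of a relation-specific message function, so the parameter count grows with the relation-embedding size rather than with the number of relations.

```python
# Minimal sketch of a hypernetwork generating relation-specific message parameters.
import torch
import torch.nn as nn

class HyperMessageLayer(nn.Module):
    def __init__(self, n_relations, rel_dim, in_dim, out_dim):
        super().__init__()
        self.rel_emb = nn.Embedding(n_relations, rel_dim)
        self.hypernet = nn.Linear(rel_dim, in_dim * out_dim)  # generates W_r from the relation embedding
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, h_src, rel_ids):
        # h_src: (E, in_dim) source-node features; rel_ids: (E,) relation id per edge
        w = self.hypernet(self.rel_emb(rel_ids))               # (E, in_dim * out_dim)
        w = w.view(-1, self.in_dim, self.out_dim)              # (E, in_dim, out_dim)
        return torch.bmm(h_src.unsqueeze(1), w).squeeze(1)     # relation-specific messages

layer = HyperMessageLayer(n_relations=237, rel_dim=32, in_dim=64, out_dim=64)
msgs = layer(torch.randn(10, 64), torch.randint(0, 237, (10,)))
print(msgs.shape)  # torch.Size([10, 64])
```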

The goal of key phrase extraction is to identify a small set of phrases that best describe the content of a text. Automatic generation of key phrases has become essential for many natural language applications such as text classification, indexing, and summarization. In this paper, we propose MultPAX, a multitask framework that extracts present and absent key phrases using pre-trained language models and knowledge graphs. The framework consists of three components. First, MultPAX identifies key phrases present in the input documents. Next, MultPAX links to an external knowledge graph to obtain more relevant phrases. Finally, MultPAX ranks the extracted phrases based on their semantic relevance to the input documents and returns the top k phrases as the final output. We conducted experiments on four benchmark datasets to evaluate MultPAX against various state-of-the-art baselines. The results show that our approach significantly outperforms these baselines with a significance test p<0.041. Our source code and datasets are available at https://github.com/dice-group/MultPAX.
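The final ranking step can be illustrated with a small sketch using placeholder embeddings (not the MultPAX code): candidate phrases are ranked by cosine similarity to the document embedding and the top k are returned.

```python
# Illustrative sketch of a ranking step only: cosine similarity between placeholder
# phrase embeddings and a document embedding, returning the top-k phrases.
import numpy as np

def top_k_phrases(doc_emb, phrase_embs, phrases, k=3):
    doc = doc_emb / np.linalg.norm(doc_emb)
    mat = phrase_embs / np.linalg.norm(phrase_embs, axis=1, keepdims=True)
    scores = mat @ doc                       # cosine similarity to the document
    order = np.argsort(-scores)[:k]
    return [(phrases[i], float(scores[i])) for i in order]

rng = np.random.default_rng(1)
phrases = ["knowledge graph", "key phrase extraction", "weather", "language model"]
print(top_k_phrases(rng.normal(size=64), rng.normal(size=(4, 64)), phrases, k=2))
```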

In dialogue systems, the use of external knowledge is a promising way to improve response quality. Many existing studies employ knowledge graphs (KGs) as external resources for context understanding and response generation, focusing on the contribution of entities in the last utterance of a dialogue. However, the correlation between the knowledge implied by multi-turn contexts and the transition regularities between relations in the knowledge graph has not yet been fully investigated. We therefore propose a relation-transition-aware knowledge-grounded dialogue generation model (RT-KGD). Specifically, inspired by the latent logic of human conversation, our model integrates the regularity of relation transitions at the dialogue level with entity semantic information at the turn level. We believe that such interactions between pieces of knowledge provide rich cues for predicting appropriate knowledge and generating coherent responses. Experimental results on both automatic and manual evaluation show that our model outperforms state-of-the-art baselines.

This paper presents an end-to-end learning framework called LoGNet (Local and Global Triple Embedding Network) for triple-centric tasks in knowledge graphs (KGs). LoGNet is based on graph neural networks (GNNs) and combines local and global triple embedding information. Local triple embeddings are learned by treating triples as sequences. Global triple embeddings are learned by operating on the feature triple line graph of the knowledge graph: its nodes are triples, edges are inserted according to the subject/object shared by the triples, and node and edge features are derived from the triples of the KG. LoGNet provides a fresh triple-centric perspective and the flexibility to adapt to a variety of downstream tasks. We discuss specific use cases in triple classification and anomalous predicate detection. Through experimental evaluation, we show that LoGNet outperforms state-of-the-art approaches.
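The triple line graph construction described above can be sketched on a toy KG (hypothetical triples, using networkx for convenience): each triple becomes a node, and two triples are connected when they share a subject or object.

```python
# Sketch of a triple line graph: nodes are triples, edges link triples that share
# a subject or object entity. The toy KG below is illustrative only.
import itertools
import networkx as nx

triples = [("alice", "knows", "bob"),
           ("bob", "worksAt", "acme"),
           ("alice", "worksAt", "acme")]

line_graph = nx.Graph()
line_graph.add_nodes_from(triples)
for t1, t2 in itertools.combinations(triples, 2):
    shared = {t1[0], t1[2]} & {t2[0], t2[2]}   # shared subject/object entities
    if shared:
        line_graph.add_edge(t1, t2, shared=sorted(shared))

print(line_graph.number_of_nodes(), line_graph.number_of_edges())  # 3 nodes, 3 edges
```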

Content gaps in knowledge graphs affect downstream applications. Semantic Web researchers have studied such issues primarily in relation to data quality and ontology evaluation, proposing quality dimensions such as completeness, correctness, and consistency, as well as frameworks for evaluating them. However, these gaps have rarely been defined in the context of user needs, which limits the ability of knowledge engineers to design processes and tools that effectively address them. We propose a framework that (i) identifies the core types of content gaps based on a literature review of peer production systems, and (ii) quantitatively compares the imbalance of work on the knowledge graph with the imbalance of user information needs in areas where such gaps exist, in order to identify the causes of the gaps. We operationalize the framework for gender, recency, geographic, and socioeconomic gaps and apply it to Wikidata by comparing edit metrics and Wikipedia page views between 2018 and 2021. We found no gender or recency gaps inherent in Wikidata production. Only in exceptional cases (e.g., people in countries with a low Human Development Index) do Wikidata editors do less work on underrepresented entities than the amount of demand would warrant. We hope that this study will provide a basis for knowledge engineers to explore the causes of content gaps and address them when necessary.

SHACL (Shapes Constraint Language) is a recent W3C recommendation for validating RDF graphs against shape constraints that are checked at the target nodes of a data graph. The standard also describes the concept of a validation report for data graphs that violate a given constraint, with the goal of providing feedback on how the data graph can be modified to satisfy the constraint. Since the specification leaves the definition of such explanations to SHACL processors, recent work has suggested using explanations in the style of database repairs (a repair is a set of additions to or deletions from a data graph such that the resulting graph validates against the constraints). In this paper, we study such repairs for non-recursive SHACL, the largest fragment of SHACL fully defined in the specification. We propose an algorithm that computes repairs using Answer Set Programming (ASP) by encoding the explanation problem into a logic program whose answer sets correspond to (minimal) repairs. Next, we study a scenario in which it is impossible to repair all targets simultaneously, which can happen due to overall unsatisfiability or contradictory constraints. We introduce a relaxed notion of validation that allows a (maximal) subset of the targets to be validated, and adapt the ASP translation to take this relaxation into account. Our implementation in Clingo is, to the best of our knowledge, the first implementation of a repair generator for SHACL.
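To make the notion of a repair concrete, here is a toy sketch (not the paper's ASP encoding) for a single sh:minCount 1 constraint: a minimal repair adds one missing triple per violating target node, using a placeholder value.

```python
# Toy sketch of the repair notion for a single "sh:minCount 1" constraint on a property:
# every target node must have at least one value for the property, so the minimal
# repair adds one triple per violating node (a blank-node placeholder is used here).
def repair_min_count(data_graph, targets, prop):
    additions = set()
    for node in targets:
        has_value = any(s == node and p == prop for s, p, o in data_graph)
        if not has_value:
            additions.add((node, prop, f"_:new_{node}"))   # placeholder value
    return additions                                       # no deletions needed for minCount

graph = {("alice", "email", "a@example.org")}
print(repair_min_count(graph, targets={"alice", "bob"}, prop="email"))
# {('bob', 'email', '_:new_bob')}
```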

Entity type information in knowledge graphs (KGs) such as DBpedia and Freebase is often incomplete due to automatic generation and human curation. Entity typing is the task of assigning or inferring the semantic types of entities in a KG. In this paper, we introduce different graph walk strategies for RDF2vec and a new approach for entity typing with text, GRAND. RDF2vec first generates graph walks and then uses a language model to obtain an embedding of each node in the graph. We show that the walk-generation strategy and the embedding model have a significant impact on the performance of the entity typing task. The proposed method outperforms the baseline approaches for entity typing in KGs on the benchmark datasets DBpedia and FIGER for both fine- and coarse-grained classes. The results show that the best performance is obtained by combining the order-aware RDF2vec variant with contextual embeddings of textual entity descriptions.
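The walk-then-embed idea behind RDF2vec can be sketched on a toy graph; GRAND varies the walk strategy and embedding model, so the graph and parameters below are illustrative only.

```python
# Minimal sketch of the walk-then-embed idea: generate random walks over a toy KG and
# train a skip-gram model on the walk "sentences".
import random
from gensim.models import Word2Vec

graph = {  # adjacency list: entity -> list of (predicate, object)
    "Berlin": [("locatedIn", "Germany")],
    "Germany": [("type", "Country")],
    "Paris": [("locatedIn", "France")],
    "France": [("type", "Country")],
}

def random_walk(start, depth=4):
    walk, node = [start], start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:
            break
        pred, obj = random.choice(edges)
        walk += [pred, obj]
        node = obj
    return walk

walks = [random_walk(entity) for entity in graph for _ in range(10)]
model = Word2Vec(sentences=walks, vector_size=32, window=4, sg=1, min_count=1, seed=1)
print(model.wv.most_similar("Berlin", topn=2))
```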

We present Strabo 2, a distributed geospatial RDF store that handles GeoSPARQL queries over huge RDF datasets. Strabo 2 is based on robust big-data technology and can scale to terabytes of data distributed across hundreds of nodes. Specifically, it uses the Spark framework, enhanced with the geospatial library SEDONA, for distributed in-memory processing on a Hadoop cluster, and Hive for compact persistent storage of RDF data. Strabo 2 has a flexible design that allows thematic RDF data to be stored and partitioned in different relational schemas and spatial data in separate Hive tables, taking the GeoSPARQL vocabulary into account. Triple compression uses a partial encoding method in addition to the compression of the Parquet data file format, making the system cluster-friendly in terms of both memory and disk. GeoSPARQL queries are translated into the Spark SQL dialect and enhanced with the spatial functions and predicates provided by SEDONA; in this process, the system considers both the spatial selection and spatial join capabilities of SEDONA, and optimizations are applied to ensure efficient query processing. Strabo 2 was evaluated experimentally on large synthetic and real datasets in a Hadoop-based cluster environment, demonstrating excellent scalability and the ability to handle dataset sizes well beyond those manageable by centralized engines running on a single server.

Controlled Query Evaluation (CQE) has recently been studied in the context of Semantic Web ontologies: the goal of CQE is to hide the answers to certain queries in order to prevent external users from inferring sensitive information. In general, since there are multiple, mutually incomparable ways of hiding answers, traditional CQE approaches pre-select which answers are visible and which are not. In this paper, we instead propose a dynamic CQE approach, i.e., one that alters the answer to the current query based on the evaluation of previous queries. We aim for a system that, besides protecting sensitive data, is maximally cooperative (intuitively, answering as many queries as possible in the affirmative), and we achieve this goal by delaying answer modifications as long as possible. We also show that this behavior cannot be simulated by a static approach that does not depend on query history. Interestingly, for OWL 2 QL ontologies and policies expressed through negation, query evaluation under our semantics is first-order rewritable and thus has AC0 data complexity. This paves the way for the development of practical algorithms, which we also discuss in a preliminary way in this paper.

Despite the massive adoption of semantic technologies in the biomedical field, little is known about how published ontologies are modeled in general; OWL ontologies are often published only in the crude form of a set of axioms, leaving their underlying design opaque. However, a principled and systematic ontology development life cycle should be reflected in regularities of the ontology’s emergent syntactic structure. To better understand this emergent structure, we propose to reverse-engineer ontologies with a syntax-oriented approach that identifies and analyzes the regularities of axioms and sets of axioms. We surveyed BioPortal in terms of syntactic modeling trends and common practices for OWL axioms and class frames. The results suggest that biomedical ontologies share only simple syntactic structures, in which OWL constructs are not deeply nested or combined in complex ways. While such simple structures often account for a large proportion of the axioms in a given ontology, many ontologies also contain a notable number of more complex syntactic structures that are not common across ontologies.

In this paper, we consider fact-checking approaches that aim to predict the veracity of assertions in knowledge graphs. Five main categories of fact-checking approaches for knowledge graphs have been proposed in the recent literature, each with its own, partially overlapping limitations. In particular, text-based approaches are limited by manual feature extraction, and embedding-based approaches suffer from low accuracy on current fact-checking tasks. We propose a hybrid approach, dubbed HybridFC, that exploits the diversity of existing fact-checking approaches in an ensemble learning setting to achieve significantly better prediction performance. In particular, it outperforms state-of-the-art approaches by 0.14 to 0.27 in terms of area under the receiver operating characteristic curve on the FactBench dataset. Our code is open source and available at https://github.com/dice-group/HybridFC.
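The ensemble idea can be sketched as follows, with synthetic placeholder scores standing in for the outputs of text-, path-, and embedding-based checkers (this is not the HybridFC code): a meta-classifier is trained on the per-approach veracity scores of labeled claims.

```python
# Hedged sketch of ensemble-based fact checking: combine per-approach veracity scores
# with a meta-classifier. Scores and labels below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_claims = 500
# columns: score from a text-based, a path-based, and an embedding-based checker
component_scores = rng.uniform(size=(n_claims, 3))
labels = (component_scores @ np.array([0.5, 0.3, 0.2]) > 0.5).astype(int)  # synthetic truth labels

meta = LogisticRegression().fit(component_scores, labels)
new_claim_scores = np.array([[0.9, 0.7, 0.4]])
print("predicted veracity:", meta.predict_proba(new_claim_scores)[0, 1])
```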

Real-world knowledge graphs are usually incomplete and lack facts that represent valid information. Therefore, when querying such knowledge graphs, standard symbolic query engines are unable to produce answers that are expected but not logically implied. To overcome this problem, state-of-the-art ML-based approaches first embed the KG and the query in a low-dimensional vector space and then produce query answers based on the proximity of the embeddings of candidate entities and the query in the embedding space. This allows embedding-based approaches to obtain expected answers that are not logically entailed. However, embedding-based approaches are not applicable in inductive settings, where the KG entities (i.e., constants) seen at runtime may differ from those seen during training. In this paper, we propose a novel neuro-symbolic approach to query answering over incomplete KGs that is applicable in inductive settings. Our approach first symbolically augments the input KG with facts representing the parts of the KG that match the query fragments and then applies a generalization of the relational graph convolutional network (RGCN) to the augmented KG to produce the predicted query answers. We formally prove that, under reasonable assumptions, our approach can capture vanilla RGCN-based approaches (without KG augmentation), often with substantially fewer layers. Finally, we empirically validate our theoretical findings by evaluating an implementation of our approach against RGCN baselines on several dedicated benchmarks.

Relational tables are widely used to store information about entities and their attributes and have become a de facto standard for training AI algorithms. Many semantic table interpretation approaches have been proposed, especially for the so-called cell-entity annotation task, which aims at disambiguating the value of a table cell against a reference knowledge graph (KG). Among these methods, heuristic-based approaches have proven to achieve the best performance, often relying on inter-column relations aggregated by column types and voting strategies. However, these methods often ignore the semantic similarity of other columns and are very sensitive to error propagation (e.g., if a type annotation is incorrect, the system often propagates entity annotation errors in the target column). In this paper, we propose Radar Station, a hybrid system that adds a semantic disambiguation step after pre-identified cell-entity annotations. Radar Station considers the entire column as context and uses graph embeddings to capture potential relationships between entities and improve disambiguation. We evaluated Radar Station on web tables and synthetic datasets with several graph embedding models belonging to different families. We demonstrate that our approach yields a 3% accuracy improvement over heuristic-based systems. Furthermore, we empirically observe that, among the various graph embedding families, those relying on fine-tuned translational distance perform better than the other models.

Temporal Knowledge Graph (TKG) inference, which aims to estimate missing facts in TKGs, is essential for many important applications such as event prediction. Prior work has attempted to equip entities and relations with temporal information at past timestamps and has achieved promising performance. However, it predicts missing facts independently, ignoring the possibility that future events may occur simultaneously, even though there are complex linkages between concurrent future events that may correlate with and influence each other. Therefore, we propose a Concurrent Reasoning Network (CRNet) to exploit the concurrency of events at both past and future timestamps for TKG inference. Specifically, we select the top k candidate events for each missing event and construct a candidate graph from the candidate events of all missing events at future timestamps. The candidate graph connects missing facts that share the same entities. Furthermore, we employ a novel relational graph attention network to represent the interaction of candidate events. We evaluate our proposal on an entity prediction task over three well-known public event-based TKG datasets. The results show that CRNet completes future missing facts with a 15-20% improvement in MRR. (Source code is available at https://github.com/shichao-wang/CRNet-ISWC2022.)

Resources Track

Because there are so many datasets, it is difficult to find, download, and review the contents of related datasets (much less apply entity matching), so it is not easy for data owners to publish their data as Linked Data connected to existing datasets. However, connections to other datasets are important for discoverability, browsability, and queryability. To alleviate this problem, this paper introduces LODChain, a service that helps providers strengthen the connections between their dataset and other datasets. It analyzes the dataset’s schema elements and triples and, through equivalence reasoning, suggests various inferred connections and related datasets to the user. In addition, it provides various content-based dataset discovery services, detects incorrect mappings, and enriches the content of datasets. The main difference from existing approaches is that they are metadata-based, whereas our proposal is data-based. We present our implementation of LODChain and report the results of various experiments on real and synthetic data.

In the field of natural language processing, the verbalisation of data into language is a very important problem, because there are significant advantages in converting our vast amounts of structured and semi-structured data into a human-readable format. Knowledge graph (KG) verbalisation focuses on converting interconnected triple-based claims, consisting of subject, predicate, and object, into text. Verbalisation datasets exist for some KGs, but they still lack suitability for many scenarios. This is especially true for Wikidata, where available datasets either loosely couple claim sets with textual information or focus on predicates about biographies, cities, and countries. To address these gaps, we propose WDV, a large KG claim verbalisation dataset built from Wikidata, with a tight coupling between triples and text and coverage of a wide variety of entities and predicates. We also assess the quality of the verbalisations through a reusable workflow for measuring human-centered fluency and adequacy scores. Our data and code are publicly available to facilitate research in KG verbalisation.

Ontology Matching (OM) plays an important role in many fields such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. However, while the Ontology Alignment Evaluation Initiative (OAEI) is an excellent effort for the systematic evaluation of OM systems, it suffers from several limitations, including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for evaluating ML-based systems. To address these limitations, we introduce five new biomedical OM tasks that involve ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching. The quality of the reference mappings is ensured by human curation and ontology pruning. We also propose a comprehensive evaluation framework that measures OM performance from various perspectives for both ML-based and non-ML-based OM systems. These resources are publicly available as part of the new Bio-ML track of OAEI 2022.

Knowledge graph embedding is a representation learning technique that projects the entities and relations of a knowledge graph into a continuous vector space. Embeddings have gained much traction and are heavily used in link prediction and other downstream prediction tasks. Most approaches are evaluated on a single task or a single group of tasks to determine their overall performance, judged by how well the embedding approach performs on the task at hand. Yet what information the embedding approach is actually learning to represent is rarely evaluated (and often not even deeply understood). To fill this gap, we present the DLCC (Description Logic Class Constructors) benchmark, a resource for analyzing embedding approaches in terms of which types of classes they can represent. Two gold standards are presented: one based on the real-world knowledge graph DBpedia and one synthetic. We also provide an evaluation framework that implements an experimental protocol so that researchers can use the gold standards directly. To demonstrate the use of DLCC, we compare several embedding approaches using the gold standards. We find that many DL constructors on DBpedia are actually learned by recognizing correlation patterns different from those defined in the gold standard, and that certain DL constructors, such as cardinality constraints, are particularly difficult for most embedding approaches to learn.

This paper presents µKG, an open-source Python library for representation learning over knowledge graphs. µKG supports multi-source knowledge graphs (as well as single knowledge graphs), multiple deep learning libraries (PyTorch and TensorFlow 2), multiple embedding tasks (link prediction, entity alignment, entity typing, and multi-source link prediction), and multiple parallel computing modes (multi-process and multi-GPU computation). It currently implements 26 popular knowledge graph embedding models and supports 16 benchmark datasets, providing advanced implementations of embedding techniques with a simplified pipeline across different tasks. It also provides high-quality documentation for ease of use. µKG is more comprehensive than existing knowledge graph embedding libraries and is useful for thorough comparison and analysis of various embedding models and tasks. We show that jointly learned embeddings can be very useful for knowledge-powered downstream tasks, such as multi-hop knowledge graph question answering. We plan to keep abreast of the latest developments in related fields and incorporate them into µKG.

Knowledge graphs are emerging as one of the most popular means for data linkage, transformation, integration, and sharing, promising improved data visibility and reusability. Immunogenetics is the branch of the life sciences that studies the genetics of the immune system. Given the complexity and relevance of immunogenetics data, knowledge graphs are a promising option for representing and describing immunogenetic entities and relationships and thus enabling many applications, but little effort has so far been devoted to constructing and using them. In this paper, we present the IMGT Knowledge Graph (IMGT-KG), the first FAIR knowledge graph in immunogenetics, which creates links between databases by retrieving and integrating data from different immunogenetics databases. As a result, IMGT-KG provides access to 79,670,110 triples with 10,430,268 entities, 673 concepts, and 173 properties. IMGT-KG reuses many existing terms from domain ontologies and vocabularies, provides external links to other resources in the domain, and also provides a set of rules for applying Allen's interval algebra to reason about sequence positions, allowing, for example, inferences about genomic sequence positions. IMGT-KG bridges the gap between genomic and protein sequences, opening the way to effective querying and integrated immuno-omics analysis. IMGT-KG is openly and freely available, with detailed documentation and an easy-to-use web interface for access and exploration.

In recent years, several relation extraction (RE) models have been developed to extract knowledge from natural language texts, and several benchmark datasets have been proposed to evaluate them. These RE datasets consist of natural language sentences covering a certain number of relations from a particular domain. While useful as generic RE benchmarks, they do not allow users to generate customized microbenchmarks according to user-specified criteria for specific use cases. Microbenchmarks are key to testing individual features of a system and pinpointing component-based insights. In this paper, we propose REBench, a framework for microbenchmarking RE systems, which allows users to select customized relation samples from existing RE datasets in various domains. The framework is flexible enough to select relation samples of different sizes according to user-defined criteria regarding the essential features to be considered in the RE benchmark. We used different clustering algorithms to generate microbenchmarks, and we evaluated state-of-the-art RE systems on samples from different RE benchmarks. The evaluation results show that specialized microbenchmarks are important for revealing the limitations of various RE models and their components.
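
The clustering-based sampling idea can be pictured with a toy example: describe each relation of an RE dataset by a few features, cluster the relations, and pick representatives from each cluster to form a small, targeted benchmark. The features and relation names below are purely illustrative assumptions, not REBench's actual feature set.

```python
# Toy sketch of clustering relations to build a microbenchmark sample.
# Feature values and relation names are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each relation described by (number of examples, avg. sentence length,
# number of distinct subject types) -- illustrative features only.
relations = {
    "birthPlace":      [5000, 21, 1],
    "spouse":          [1200, 18, 1],
    "subsidiary":      [800,  26, 3],
    "riverMouth":      [150,  15, 2],
    "doctoralAdvisor": [90,   24, 1],
    "product":         [3000, 22, 4],
}
names = list(relations)
X = StandardScaler().fit_transform(np.array([relations[r] for r in names]))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for cluster_id in range(3):
    members = [n for n, label in zip(names, kmeans.labels_) if label == cluster_id]
    # Take one representative relation per cluster for the microbenchmark.
    print(f"cluster {cluster_id}: {members} -> sample: {members[0]}")
```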

Faced with an ever-increasing number of scientific papers, researchers struggle to find and make sense of papers relevant to their research. Scientific open archives play a central role in dealing with this vast amount of information, but keyword-based search services often fail to capture the richness of semantic relationships among articles. This paper presents the methods, tools, and services implemented in the ISSA project to address these issues. The objectives of the project are (1) to provide a generic, reusable, and extensible pipeline for analyzing and processing papers in an open science archive; (2) to transform the results into a semantic index stored and represented as an RDF knowledge graph; (3) to provide tools for researchers, decision makers, and scientific information professionals to explore thematic association rules, networks of co-publications, and co-occurring topics; and (4) to develop innovative search and visualization services that leverage the index. To demonstrate the effectiveness of this solution, we also report on its deployment and user-driven customization to meet the needs of an institutional public archive with over 110,000 resources. Fully in line with the dynamics of open science and FAIR principles, the published works are available under an open license with all the necessary accompanying documentation to facilitate their reuse. The knowledge graphs created in our use cases comply with common linked open data best practices.
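
As a rough picture of what a "semantic index represented as an RDF knowledge graph" looks like in practice, the sketch below stores an article annotated with thematic descriptors as RDF and retrieves it with SPARQL via rdflib. The namespace, identifiers, and choice of Dublin Core properties are illustrative assumptions, not ISSA's actual data model.

```python
# Sketch: an article indexed with thematic descriptors as RDF, queried with SPARQL.
# Illustrative model only (hypothetical namespace and identifiers).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

EX = Namespace("http://example.org/issa-demo/")
g = Graph()
article = URIRef(EX["article/12345"])

g.add((article, RDF.type, EX.Article))
g.add((article, DCTERMS.title, Literal("Drought resilience of sorghum varieties")))
g.add((article, DCTERMS.subject, EX["descriptor/drought"]))
g.add((article, DCTERMS.subject, EX["descriptor/sorghum"]))

# Find all articles indexed with the "drought" descriptor.
q = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?article ?title WHERE {
  ?article dcterms:subject <http://example.org/issa-demo/descriptor/drought> ;
           dcterms:title ?title .
}
"""
for row in g.query(q):
    print(row.article, "-", row.title)
```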

In recent years, several methods have emerged to create machine-readable, semantically rich, and interlinked descriptions of research publications, typically encoded as knowledge graphs. A common limitation of these solutions is that they either rely on human experts to summarize information from the literature or focus on a specific research area, resulting in a small number of papers being covered. In this paper, we introduce the Computer Science Knowledge Graph (CS-KG), a large-scale knowledge graph consisting of over 350M RDF triples describing 41M statements from 6.7M papers about 10M entities, linked by 179 semantic relations. CS-KG is automatically generated by applying an information extraction pipeline to a large repository of research papers and will be updated on a regular basis. It is much larger than similar solutions and offers a very comprehensive representation of tasks, methods, materials, and metrics in computer science. It can support a variety of intelligent services such as advanced literature search, document classification, article recommendation, trend prediction, and hypothesis generation. CS-KG has been evaluated against a benchmark of manually annotated sentences with excellent results.

Stream-reasoning query languages such as CQELS and C-SPARQL enable query answering over RDF streams. However, there is a lack of efficient RDF stream generators to feed RDF stream reasoners: state-of-the-art RDF stream generators are limited in the speed and volume of streaming data they can handle. To generate RDF streams in a scalable and efficient manner, we extended RMLStreamer to also generate RDF streams from dynamic heterogeneous data streams. In this paper, we present a scalable solution that relies on a dynamic window approach to generate low-latency, high-throughput RDF streams from multiple heterogeneous data streams. Our evaluation shows that it outperforms the state of the art, achieving millisecond latency (compared to the seconds required by state-of-the-art solutions), constant memory usage across all workloads, and sustained throughput of about 70,000 records/s (compared to about 10,000 records/s for the state of the art). This opens up access to a large number of data streams for integration with the Semantic Web.
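
The core idea of windowed RDF stream generation can be shown with a small plain-Python sketch: two heterogeneous streams are joined on a shared key within a time window and the matches are emitted as triples. This illustrates the concept only and is not RMLStreamer's implementation; the streams, window size, and vocabulary are made up.

```python
# Sketch of windowed RDF stream generation: join two heterogeneous streams
# on a shared key within a time window and emit N-Triples-style output.
from collections import defaultdict

sensor_stream = [  # (timestamp, sensor_id, temperature) -- toy data
    (1, "s1", 21.5), (2, "s2", 19.0), (6, "s1", 22.1),
]
location_stream = [  # (timestamp, sensor_id, room) -- toy data
    (1, "s1", "lab"), (3, "s2", "office"),
]

WINDOW = 5  # seconds; records outside this window are not joined

def window_join(readings, locations, window):
    loc_state = defaultdict(list)
    for ts, sid, room in locations:
        loc_state[sid].append((ts, room))
    for ts, sid, temp in readings:
        for loc_ts, room in loc_state[sid]:
            if abs(ts - loc_ts) <= window:
                subj = f"<http://example.org/sensor/{sid}>"
                yield f'{subj} <http://example.org/temperature> "{temp}" .'
                yield f'{subj} <http://example.org/locatedIn> "{room}" .'

for triple in window_join(sensor_stream, location_stream, WINDOW):
    print(triple)
```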

WDBench is a query benchmark for Wikidata-based knowledge graphs that uses real queries extracted from the public query logs of the Wikidata SPARQL endpoint. In recent years, many benchmarks for graph databases (including SPARQL engines) have been proposed, but few are based on real data, even fewer use real queries, and fewer still can compare a SPARQL engine with a (non-SPARQL) graph database. Wikidata's raw query logs contain millions of diverse queries, making it prohibitively expensive to run all of them and difficult to draw conclusions from the mix of features these queries use. We focus on four main query features common to SPARQL and graph databases: (i) basic graph patterns, (ii) optional graph patterns, (iii) path patterns, and (iv) navigational graph patterns. To test these patterns, we extract queries from the Wikidata logs, remove non-standard features, eliminate duplicates, classify them into subsets with different structures, and express them in two different syntaxes. Using this benchmark, we present and compare evaluation results for Blazegraph, Jena/Fuseki, Virtuoso, and Neo4j.
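
For readers less familiar with these pattern types, the sketch below collects one illustrative SPARQL query per feature. The Wikidata-style identifiers are chosen for illustration; these are not queries taken from the WDBench query sets themselves.

```python
# Illustrative examples of the four query features WDBench isolates,
# written as SPARQL strings (placeholder queries, not benchmark queries).
PREFIXES = """
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
"""

QUERIES = {
    "basic_graph_pattern": PREFIXES + """
        SELECT ?person ?city WHERE {
          ?person wdt:P31 wd:Q5 ;           # instance of human
                  wdt:P19 ?city .           # place of birth
        }""",
    "optional_graph_pattern": PREFIXES + """
        SELECT ?person ?spouse WHERE {
          ?person wdt:P31 wd:Q5 .
          OPTIONAL { ?person wdt:P26 ?spouse . }
        }""",
    "path_pattern": PREFIXES + """
        SELECT ?place WHERE {
          ?place wdt:P131+ wd:Q183 .        # located in Germany, transitively
        }""",
    "navigational_graph_pattern": PREFIXES + """
        SELECT ?work ?genre WHERE {
          ?work wdt:P50/wdt:P19 ?city .     # author's birthplace, via a path
          ?work wdt:P136 ?genre .
        }""",
}

for name, query in QUERIES.items():
    print(f"-- {name} --{query}\n")
```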

In-Use Track

Research publishing companies need to constantly monitor and compare scientific journals and conferences to make critical business and editorial decisions. Semantic Web and Knowledge Graph technologies are the obvious solution, as they enable these companies to integrate, represent, and analyze large amounts of information from disparate sources. In this paper, we present AIDA Dashboard 2.0, an innovative system for analyzing and comparing scientific fields, developed in collaboration with Springer Nature and now available to the public. The tool integrates information from 25 million research articles from Microsoft Academic Graph, Dimensions, DBpedia, GRID, CSO, and INDUSO, and is built on a knowledge graph containing over 1.5 billion RDF triples. It can produce sophisticated analyses and rankings not available in alternative systems. In this paper, we describe the benefits of this solution for Springer Nature’s editorial process and present a user study of five editors and five researchers, reporting excellent results in terms of analysis quality and usability.

This paper presents experiences gained in the context of a European pilot project funded by the ISA2 program, whose objective is to build a semantic knowledge graph that establishes a distributed data space for public procurement. We describe the results obtained from constructing the knowledge graph, the follow-up activities, and the main lessons learned. The data space must support different data governance scenarios: some partners use their own tools to control the construction of their part of the knowledge graph, while other partners participate in the pilot by providing only open CSV/XML/JSON datasets, whose transformations are performed on infrastructure provided by the European Big Data Test Infrastructure (BDTI). This paper presents the design and implementation of a knowledge graph construction process within such a BDTI infrastructure. By instantiating an OWL ontology created for this purpose, we can give a declarative description of the entire workflow required to transform the input data into the RDF output that forms the knowledge graph. This declarative description serves as instructions to the workflow engine (Apache Airflow) used to construct the knowledge graph.
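
To give a feel for what such an orchestrated construction workflow looks like once it reaches the workflow engine, here is a minimal Apache Airflow DAG in the spirit of the pilot: fetch open datasets, map them to RDF, and load the result into a triple store. The task names, steps, and logic are hypothetical placeholders, not the project's actual workflow or ontology-driven generation code.

```python
# Minimal sketch of a KG-construction workflow as an Apache Airflow DAG.
# Task names and logic are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_open_data():
    # e.g. download the partner's open CSV/XML/JSON datasets
    print("fetching source datasets ...")

def map_to_rdf():
    # e.g. run a tabular-to-RDF mapping to produce triples
    print("mapping source data to RDF ...")

def load_into_triplestore():
    # e.g. upload the generated RDF into the shared triple store
    print("loading triples ...")

with DAG(
    dag_id="procurement_kg_construction",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    fetch = PythonOperator(task_id="fetch_open_data", python_callable=fetch_open_data)
    map_step = PythonOperator(task_id="map_to_rdf", python_callable=map_to_rdf)
    load = PythonOperator(task_id="load_into_triplestore", python_callable=load_into_triplestore)

    fetch >> map_step >> load
```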

Automated construction of knowledge graphs (KGs) is widely used in industry for data integration and access, and several approaches enable (semi-)automated knowledge graph construction. One important approach is to map raw data to a given knowledge graph schema, often a domain ontology, and to construct entities and properties according to that ontology. However, existing knowledge graph construction methods are not always efficient, and the resulting knowledge graphs are not sufficiently application-oriented or user-friendly. The challenge arises from a trade-off: domain ontologies should be knowledge-oriented, reflecting general domain knowledge rather than data specifics, while knowledge graph schemas should be data-oriented, covering all data features. If the former is used as-is as a knowledge graph schema, problems may arise, such as classes that do not map to any data or deep knowledge graph structures that generate blank nodes. We therefore propose a system for ontology reshaping that generates a knowledge graph schema that fully covers the data while still sufficiently covering domain knowledge. Keywords: semantic data integration, knowledge graph, ontology reshaping, graph algorithms, automatic knowledge graph construction.
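
The intuition behind reshaping can be pictured with a toy graph algorithm: keep only the ontology classes that actually map to data, and reconnect each kept class to its nearest kept ancestor so that empty intermediate classes (which would otherwise yield blank nodes) are skipped. This is an illustration of the idea under assumed class names, not the paper's actual algorithm.

```python
# Toy sketch of the ontology-reshaping intuition using networkx.
# Class names and the "mapped to data" set are made up.
import networkx as nx

# Edges point from subclass to superclass.
ontology = nx.DiGraph([
    ("Welder", "Machine"), ("Machine", "Asset"), ("Asset", "Thing"),
    ("Sensor", "Asset"), ("TemperatureSensor", "Sensor"),
])
mapped_to_data = {"Welder", "TemperatureSensor", "Asset"}  # classes with data

def nearest_mapped_ancestor(cls):
    # Walk up the subclass edges and return the closest class that has data.
    distances = nx.shortest_path_length(ontology, source=cls)
    candidates = [(d, c) for c, d in distances.items() if c in mapped_to_data and d > 0]
    return min(candidates)[1] if candidates else None

reshaped = nx.DiGraph()
reshaped.add_nodes_from(mapped_to_data)
for cls in mapped_to_data:
    parent = nearest_mapped_ancestor(cls)
    if parent is not None:
        reshaped.add_edge(cls, parent)

print(sorted(reshaped.edges()))
# -> [('TemperatureSensor', 'Asset'), ('Welder', 'Asset')]
```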

Data analysis, including machine learning (ML), is essential for gaining insight from production data in modern industry. However, industrial ML is hampered by the low transparency of ML to non-ML experts, insufficient and inconsistent descriptions of ML practices for review and understanding, and the ad hoc fashion in which ML solutions are tailored to specific applications, which limits their reusability. To address these challenges, we propose an executable Knowledge Graph (KG) concept and system. The system relies on semantic technologies to formally describe knowledge about ML methods and solutions in a KG and to transform them into reusable, modularized, executable scripts. Furthermore, the executable KG serves as a common language between ML experts and non-ML experts, facilitating their communication. We evaluated the system extensively through user studies, workshops, and scalability evaluations on Bosch's industrial use cases. The results demonstrate that the system provides a user-friendly way for non-ML experts to discuss, customize, and reuse ML methods.

Digital twins are digital representations of systems in the Internet of Things (IoT), often based on AI models trained on data from those systems. Semantic models are increasingly used to link these datasets from different stages of the IoT system life cycle and to automate the AI modeling pipeline. This combination of semantic models and AI pipelines running on external datasets poses unique challenges, especially when deployed at scale. In this paper, we discuss the unique requirements of applying semantic graphs to automate digital twins in a variety of practical use cases. We present a benchmark dataset, DTBM, that reflects these characteristics, and we discuss the scaling challenges of different knowledge graph technologies. Based on these insights, we propose a reference architecture used in several IBM products and derive lessons learned for scaling knowledge graphs to compose AI models for digital twins.

Research departments play an important role in driving innovation in their organizations. However, as the volume of information grows, it is becoming increasingly difficult for both researchers and managers to gain insights, track trends, keep up with new research, and develop strategies. This paper presents a use case from IBM Research, a corporate research community, that leverages Semantic Web technologies to induce a unified knowledge graph from both structured and textual data by integrating data about research projects, papers, datasets, achievements, and recognition from the various applications used by the community. To make this knowledge graph more accessible to application developers, we identified common patterns for using the induced knowledge and exposed them as an API. These patterns emerged from user research that identified the most valuable use cases and the user pain points to be mitigated. We outline two different scenarios, recommendation and analytics for business use, describe them in detail, and provide an empirical evaluation of entity recommendation in particular. The methodology and lessons learned from this study can be applied to other organizations facing similar challenges.

The Internet of Things (IoT) is transforming industries by bridging the gap between information technology (IT) and operational technology (OT). Machines are being integrated with connected sensors and managed by intelligent analytics applications, accelerating digital transformation and business operations. Bringing machine learning (ML) to industrial devices is an advancement aimed at facilitating the convergence of IT and OT. However, developing ML applications in the Industrial IoT (IIoT) faces a number of challenges, including hardware heterogeneity, non-standard representations of ML models, device and ML model compatibility issues, and slow application development. Successful deployment in this area requires a deep understanding of hardware, algorithms, software tools, and applications. To support rapid development of ML applications in the IIoT, this paper therefore presents Semantic Low-Code Engineering for ML Applications (SeLoC-ML), a framework built on a low-code platform that leverages Semantic Web technologies to allow non-experts to easily model, discover, reuse, and matchmake ML models and devices at scale. Based on the matching results, project code can be automatically generated for deployment on the hardware. Semantic application templates, called recipes, allow developers to quickly prototype end-user applications. Evaluation in an industrial ML classification case study has shown the efficiency and usefulness of SeLoC-ML, with at least a threefold reduction in engineering effort compared to traditional approaches. We share the code and welcome any contributions.
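
The matchmaking step can be illustrated with a small sketch: compare an ML model's declared resource and sensor requirements against device capability descriptions and list the compatible devices. The model and device records below are hypothetical toy data, and the dictionary-based matching stands in for the semantic descriptions and queries SeLoC-ML actually uses.

```python
# Sketch of ML-model/device matchmaking: a model runs on a device if the
# device has enough RAM and flash and provides the required sensors.
# All records are hypothetical toy data.
models = {
    "keyword_spotting_cnn":        {"ram_kb": 256, "flash_kb": 1024, "needs": {"microphone"}},
    "vibration_anomaly_detector":  {"ram_kb": 64,  "flash_kb": 256,  "needs": {"accelerometer"}},
}
devices = {
    "plc_gateway": {"ram_kb": 512, "flash_kb": 4096, "sensors": {"accelerometer", "temperature"}},
    "mcu_node":    {"ram_kb": 128, "flash_kb": 512,  "sensors": {"accelerometer"}},
}

def matches(model, device):
    return (model["ram_kb"] <= device["ram_kb"]
            and model["flash_kb"] <= device["flash_kb"]
            and model["needs"] <= device["sensors"])

for m_name, m in models.items():
    compatible = [d_name for d_name, d in devices.items() if matches(m, d)]
    print(f"{m_name} can run on: {compatible or 'no matching device'}")
```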
