KI 2021
KI 2021 was the 44th German Conference on Artificial Intelligence, organized in cooperation with the Fachbereich Künstliche Intelligenz der Gesellschaft für Informatik (GI). The conference was held online from September 27 to October 1, 2021. The German AI conference series traces its origins to the first GI-Fachgruppe KI meeting on October 7, 1975. KI is one of the major European AI conferences and traditionally brings together academic and industrial researchers from all areas of AI, providing an ideal place for exchanging news and research results on theory and applications. KI 2021 was organized in combination with INFORMATIK 2021, and we would like to thank Daniel Krupka and Alexander Scheibe from GI for their collaboration.
The technical program of KI 2021 comprised paper and poster presentations as well as tutorials and workshops. Overall, KI 2021 received about 60 submissions, of which 21 were selected as papers and technical communications, together with 6 poster presentations. We were honored that very prominent researchers kindly agreed to give keynote talks (in alphabetical order; see also the abstracts below):
- Tristan Cazenave (Université Paris-Dauphine, France): Monte Carlo Search
- Giuseppe De Giacomo (Sapienza University of Rome, Italy): Autonomy in AI: Reactive Synthesis, Planning and Reinforcement Learning in Linear Temporal Logic on Finite Traces
- Birte Glimm (University of Ulm, Germany): Ontologies for Providing Map Knowledge to Autonomous Vehicles
- Kristian Kersting (TU Darmstadt, Germany): The Third Wave of AI [Joint Keynote with INFORMATIK 2021]
- Katja Mombaur (University of Waterloo, Canada): Motion Intelligence for Human-Centred Robots
- Stuart Russell (University of California, Berkeley, USA): Human-Compatible Artificial Intelligence
An extensive range of special meetings, a tutorial, and several workshops rounded off the program:

Special Events

- CLAIRE National Meeting
- Early Career Research Consortium
- Meeting of the FBKI task force “AI in Education” (Arbeitskreis KiS)
Tutorial
- Christoph Stockhammer and Mihaela Jarema: Deep Learning Workflows for Biomedical Signal Data – A Practical Example
Workshops
- Christoph Beierle, Marco Ragni, Frieder Stolzenburg, and Matthias Thimm: 7th Workshop on Formal and Cognitive Reasoning (FCR 2021)
- Barbara Hammer, Malte Schilling, and Laurenz Wiskott: Trustworthy AI in the Wild
- Ulrich John, Petra Hofstedt, and Mario Wenzel: 35th Workshop on (Constraint) Logic Programming (WLP 21)
- Sylvia Melzer, Stefan Thiemann, and Jost Gippert: Humanities-Centred AI (CHAI)
- Jürgen Sauer and Stefan Edelkamp: Planen und Konfigurieren (PuK)
- Andreas Hein, Mark Schweda, Silke Schicktanz, Stefan Teipel, and Thomas Kirste: Artificial Intelligence and Ethics

As Program Committee (PC) chairs, we would like to thank our speakers for their interesting and inspirational talks. Our thanks also go out to the organizers of INFORMATIK 2021, who provided support in terms of registration and setting up a virtual conference. We would like to thank the Program Committee members and additional reviewers for their efforts. Without their substantial voluntary work, this conference would not have been possible. We would also like to thank EasyChair for their support in handling submissions and Springer for their support in making these proceedings possible. Our institutions, the Czech Technical University in Prague (Czech Republic), the University of Lübeck (Germany), and the University of Leoben (Austria), also provided support for our participation, for which we are grateful. Many thanks go to Tanya Braun and Marcel Gehrke for helping with web pages and proceedings. We also thank the Fachbereich Künstliche Intelligenz der Gesellschaft für Informatik, in particular Matthias Klusch and Ingo Timm, for their ongoing support and dedication to KI 2021. Last but not least, we would like to thank our sponsors:
- Springer Verlag (https://www.springer.com)
- DFKI (https://www.dfki.de/)
- team neusta the digital family (https://www.team-neusta.de)
- singularIT (https://www.singular-it.de)
- PRC (https://www.pattern-recognition-company.com)
Technical Program
Contents
In this paper, we present a new approach to tackling complex routing problems with an improved state representation that utilizes the model complexity better than previous methods. We enable this by training from temporal differences; specifically, Q-learning is employed. We show that our approach achieves state-of-the-art performance for autoregressive policies that sequentially insert nodes to construct solutions on the CVRP. Additionally, we are the first to tackle the MDVRP with machine learning methods and demonstrate that this problem type greatly benefits from our approach over other ML methods.
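As a point of reference for the temporal-difference training mentioned above, the following is a minimal sketch of a generic Q-learning update as it might be applied to an autoregressive routing policy; the networks, batch layout, and hyperparameters are placeholder assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def q_learning_step(q_net, target_net, batch, optimizer, gamma=0.99):
    """One TD update: move Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    states, actions, rewards, next_states, done = batch
    # Q-values of the actions actually taken (e.g., which node to insert next).
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * q_next * (1 - done.float())
    loss = F.smooth_l1_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```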
Principal component analysis (PCA), a well-known technique in machine learning and statistics, is typically applied to time-independent data, as it is based on point-wise correlations. Dynamic PCA (DPCA) handles this issue by augmenting the data set with lagged versions of itself. In this paper, we show that both PCA and DPCA are special cases of κ-circulant maximum variance bases. We formulate the constrained linear optimization problem of finding such κ-circulant bases and present a closed-form solution that allows further interpretation and a significant speed-up for DPCA. Furthermore, the relation of the proposed bases to the discrete Fourier transform, finite impulse response filters, as well as spectral density estimation is pointed out.
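To make the DPCA construction referred to above concrete, here is a small illustrative sketch (our own, not from the paper) of the standard lag-augmentation step followed by ordinary PCA; variable names and the number of lags are arbitrary.

```python
import numpy as np

def dpca(X, n_lags, n_components):
    """X: (n_samples, n_features) time series; stack lagged copies, then run PCA."""
    n, _ = X.shape
    # Lag-augmented matrix [x_t, x_{t-1}, ..., x_{t-n_lags}], row-aligned in time.
    X_aug = np.hstack([X[n_lags - k : n - k] for k in range(n_lags + 1)])
    X_aug = X_aug - X_aug.mean(axis=0)
    # Maximum-variance bases = leading eigenvectors of the sample covariance.
    cov = X_aug.T @ X_aug / (X_aug.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]
```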
Recent developments in the propositional representation of achievement games have renewed interest in applying the latest advances in Quantified Boolean Formula technologies to solving these games. However, the number of quantifier alternations necessary to explore the solution space still impairs and limits the applicability of these methods. In this paper, we show that one can encode blocking strategies for the second player and express the last moves of the play with a single string of existential quantifiers, instead of the usual alternations of universal and existential quantifiers. We experimentally show that our method improves the performance of state-of-the-art Quantified Boolean Formula solvers on Harary’s Tic-Tac-Toe, a well-known achievement game.
Knowledge graphs offer a powerful framework to structure and represent financial information in a flexible way by describing real-world entities, such as financial securities, and their interrelations in the form of a graph. Semantic question answering systems allow users to retrieve information from a knowledge graph using natural language questions and thus eliminate the need to be proficient in a formal query language. In this work, we present a proof-of-concept design for a financial knowledge graph and, with it, a semantic question answering framework specifically targeted at the finance domain. Our implemented approach uses a span-based joint entity and relation extraction model with BERT embeddings to translate a single-fact natural language question into its corresponding formal query representation. By employing a joint extraction model, we alleviate the concern of error propagation present in standard pipelined approaches to classification-based question answering. The presented framework is tested on a synthetic dataset derived from the instances of the implemented financial knowledge graph. Our empirical findings indicate very promising results, with an F1-score of 84.60% for relation classification and 97.18% for entity detection.
Creating datasets for supervised learning is a challenging and expensive task, in which each input example has to be annotated with its expected output (e.g., object class). By combining unsupervised and semi-supervised learning, semi-unsupervised learning proposes a new paradigm for partially labeled datasets with additional unknown classes. In this paper, we focus on a better understanding of this new learning paradigm and analyze the impact of the amount of labeled data, the number of augmented classes, and the selection of hidden classes on the quality of prediction. The number of augmented classes in particular strongly influences classification accuracy and needs to be tuned for each dataset, since both too few and too many augmented classes are detrimental to classifier performance. We also show that we can improve results on a large variety of datasets when using convolutional networks as feature extractors while applying output-driven entropy regularization instead of a simple weight-based L2 norm.
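As one concrete reading of the output-driven entropy regularization mentioned above (our interpretation, not the paper's code): labeled examples receive the usual cross-entropy, while unlabeled examples are pushed toward confident, low-entropy predictions; the weighting `lam` is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def semi_unsupervised_loss(logits_labeled, targets, logits_unlabeled, lam=0.1):
    """Cross-entropy on labeled data plus an entropy penalty on unlabeled outputs."""
    ce = F.cross_entropy(logits_labeled, targets)
    p = F.softmax(logits_unlabeled, dim=1)
    entropy = -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1).mean()
    return ce + lam * entropy
```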
Transformer models have recently attracted much interest from computer vision researchers and have since been successfully employed for several problems traditionally addressed with convolutional neural networks. At the same time, image synthesis using generative adversarial networks (GANs) has drastically improved over the last few years. The recently proposed TransGAN is the first GAN using only transformer-based architectures and achieves competitive results when compared to convolutional GANs. However, since transformers are data-hungry architectures, TransGAN requires data augmentation, an auxiliary super-resolution task during training, and a masking prior to guide the self-attention mechanism. In this paper, we study the combination of a transformer-based generator and a convolutional discriminator and successfully remove the need for the aforementioned design choices. We evaluate our approach by conducting a benchmark of well-known CNN discriminators, ablate the size of the transformer-based generator, and show that combining both architectural elements into a hybrid model leads to better results. Furthermore, we investigate the frequency spectrum properties of generated images and observe that our model retains the benefits of an attention-based generator.
In recent years, XAI research has mainly been concerned with developing new technical approaches to explaining deep learning models. Only recently has research started to acknowledge the need to tailor explanations to the different contexts and requirements of stakeholders. Explanations must suit not only developers of models but also domain experts and end users. Thus, in order to satisfy different stakeholders, explanation methods need to be combined. While multi-modal explanations have been used to make model predictions more transparent, less research has focused on treating explanation as a process in which users can ask for information according to the level of understanding gained at a certain point in time. Consequently, an opportunity to explore explanations at different levels of abstraction should be provided besides multi-modal explanations. We present a process-based approach that combines multi-level and multi-modal explanations. The user can ask for textual explanations or visualizations through conversational interaction in a drill-down manner. We use Inductive Logic Programming, an interpretable machine learning approach, to learn a comprehensible model. Further, we present an algorithm that creates an explanatory tree for each example for which a classifier decision is to be explained. The explanatory tree can be navigated by the user to get answers at different levels of detail. We provide a proof-of-concept implementation for concepts induced from a semantic net about living beings.
As global trends shift towards data-driven industries, the demand for automated algorithms that can convert digital images of scanned documents into machine-readable information is rapidly growing. Besides the opportunity that data digitization opens up for data analytics tools, there is also massive potential for automating processes that previously required manual inspection of the documents. Although the introduction of optical character recognition technologies mostly solved the task of converting human-readable characters from images into machine-readable characters, the task of extracting table semantics has received less attention over the years. The recognition of tables consists of two main tasks, namely table detection and table structure recognition. Most prior work on this problem focuses on either task without offering an end-to-end solution, or pays no attention to real application conditions like rotated images or noise artefacts inside the document image. Recent work shows a clear trend towards deep learning approaches coupled with transfer learning for the task of table structure recognition, due to the lack of sufficiently large datasets. In this paper, we present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution to the problem of table recognition. It utilizes state-of-the-art deep learning models for table detection and differentiates between three types of tables based on the tables’ borders. For table structure recognition, we use a deterministic, non-data-driven algorithm that works on all table types. We additionally present two algorithms, one for unbordered and one for bordered tables, which form the basis of the table structure recognition algorithm. We evaluate Multi-Type-TD-TSR on the ICDAR 2019 table structure recognition dataset and achieve a new state of the art.
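For orientation, deterministic structure recognition for bordered tables is classically done with morphological operations; the sketch below illustrates that standard technique (with illustrative kernel sizes, not the paper's exact algorithm) by extracting horizontal and vertical ruling lines and intersecting them to recover the cell grid.

```python
import cv2

def bordered_table_grid(gray):
    """gray: 8-bit grayscale table image; returns line masks and cell corners."""
    binary = cv2.adaptiveThreshold(~gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 15, -2)
    horiz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    vert_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
    # Morphological opening keeps only long horizontal / vertical strokes.
    horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN, horiz_kernel)
    vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, vert_kernel)
    # Cell corners lie where horizontal and vertical ruling lines intersect.
    intersections = cv2.bitwise_and(horizontal, vertical)
    return horizontal, vertical, intersections
```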
Neural architecture search (NAS) is a promising method to ascertain a network architecture automatically and to build a suitable network for a particular application without any human intervention. However, NAS requires huge computational resources to find the optimal parameters of a network in the training phase of each search. Because a trade-off generally exists between model size and accuracy in deep learning models, the model size tends to increase in pursuit of higher accuracy. In applications with limited resources, such as edge AI, reducing the number of network weights might be more important than improving accuracy; alternatively, achieving high accuracy with maximum resources might be more important. The objective of this research is to find a model with sufficient accuracy under a limited number of weights and to reduce the search time. We improve the DARTS (Differentiable Architecture Search) algorithm, one of the fastest NAS methods, by adding another constraint to the loss function, which limits the number of network weights. We evaluate the proposed algorithm using three constraints. Compared to the conventional DARTS algorithm, the proposed algorithm reduces the search time by up to 40% when the model size range is set properly, while achieving accuracy comparable to that of DARTS.
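One plausible form of such a size constraint (a hedged reading, not the paper's exact formulation) is to penalize the expected weight count implied by the softmax-relaxed architecture parameters whenever it exceeds a budget:

```python
import torch

def constrained_loss(task_loss, alphas, op_weight_counts, size_budget, beta=1e-6):
    """alphas: (n_edges, n_ops) architecture logits; op_weight_counts: (n_ops,) tensor."""
    probs = torch.softmax(alphas, dim=-1)
    # Expected number of weights under the relaxed architecture distribution.
    expected_size = (probs * op_weight_counts).sum()
    # Penalize only when the expected size exceeds the budget.
    penalty = torch.relu(expected_size - size_budget)
    return task_loss + beta * penalty
```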
The semantic segmentation of aerial images enables many useful applications such as tracking city growth, tracking deforestation, or automatically creating and updating maps. However, gathering enough training data to train a proper model for the automated analysis of aerial images is usually too labor-intensive and thus too expensive in most cases. Therefore, domain adaptation techniques are often necessary to adapt existing models or to transfer knowledge from existing datasets to new, unlabeled aerial images. Modern adaptation approaches make use of complex architectures involving many model components, losses, and loss weights. These approaches are hard to apply in practice since their hyperparameters are difficult to optimize for a given adaptation problem. This complexity is the result of trying to separate domain-invariant elements, e.g., structures and shapes, from domain-specific elements, e.g., textures. In this paper, we present a novel model for semantic segmentation which not only achieves state-of-the-art performance on aerial images but also inherently learns separate feature representations for shapes and textures. Our goal is to provide a model which can serve as the basis for future domain adaptation approaches that are simpler but still effective. Through end-to-end training, our deep learning model learns to map aerial images to feature representations that can be decoded into binary space partitioning trees, a resolution-independent representation of the semantic segmentation, which can then be rendered into a pixelwise semantic segmentation in a differentiable way.
Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr
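The released implementation is linked above; the following is only a deliberately simplified sketch of the event-triggered refitting idea, with a crude mean-ratio detector and rescaling standing in for the paper's change point detection and data augmentation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def scale_change_detected(y, window=10, factor=1.5):
    """Crude proxy for online change point detection on the output scale."""
    if len(y) < 2 * window:
        return False
    recent, past = np.mean(y[-window:]), np.mean(y[:-window])
    return recent > factor * past or recent < past / factor

def refit_on_event(X, y, window=10):
    """Rescale pre-change targets toward the new level, then refit the GP."""
    scale = np.mean(y[-window:]) / max(np.mean(y[:-window]), 1e-8)
    y_aug = np.concatenate([y[:-window] * scale, y[-window:]])
    return GaussianProcessRegressor().fit(X, y_aug)
```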
Deep neural networks (DNNs) offer a means of addressing the challenging task of clustering high-dimensional data. DNNs can extract useful features and so produce a lower-dimensional representation which is more amenable to clustering techniques. As clustering is typically performed in a purely unsupervised setting, where no training labels are available, the question arises as to how the DNN feature extractor can be trained. The most accurate existing approaches combine the training of the DNN with the clustering objective, so that information from the clustering process can be used to update the DNN to produce better features for clustering. One problem with this approach is that the “pseudo-labels” produced by the clustering algorithm are noisy, and any errors they contain will hurt the training of the DNN. In this paper, we propose selective pseudo-label clustering, which uses only the most confident pseudo-labels for training the DNN. We formally prove the performance gains under certain conditions. Applied to the task of image clustering, the new approach achieves state-of-the-art performance on three popular image datasets.
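A minimal sketch of the selection step (our illustration using k-means; the paper's clustering method and confidence measure may differ): keep only the points closest to their assigned centroid as pseudo-labels for DNN training.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_confident_pseudo_labels(features, n_clusters, keep_fraction=0.5):
    """features: (n_samples, n_dims) DNN embeddings; returns a mask and labels."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    # Confidence proxy: distance of each point to its assigned cluster centre.
    dists = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
    mask = dists <= np.quantile(dists, keep_fraction)
    return mask, km.labels_  # train the DNN only on features[mask]
```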
To improve the accuracy of convolutional neural networks in discriminating between nevi and melanomas, we test nine different combinations of masking and cropping on three datasets of skin lesion images (ISIC2016, ISIC2018, and MedNode). Our experiments, confirmed by 10-fold cross-validation, show that cropping increases classification performance, but specificity decreases when cropping is applied together with masking out healthy skin regions. An analysis of Grad-CAM saliency maps shows that our CNN models in fact have a tendency to focus on the healthy skin at the border when a nevus is classified.
We present a virtual reality (VR) application that enables us to interactively explore and manipulate image clusters based on layer activations of convolutional neural networks (CNNs). We apply dimensionality reduction techniques to project images into 3D space, where the user can directly interact with the model. The user can change the position of an image using natural hand gestures. This manipulation triggers additional training steps of the network, based on the new spatial information and the new label of the image. After the training step is finished, the visualization is updated according to the new output of the CNN. The goal is to visualize and improve the cluster output of the model and, at the same time, to improve the understanding of the model. We discuss two different approaches for calculating the VR projection: a combined PCA/t-SNE dimensionality reduction approach and a variational auto-encoder (VAE) based approach.
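For the first of the two projection approaches, a small sketch (with typical parameter defaults, not necessarily those of the paper): PCA first compresses the layer activations, then t-SNE embeds them into three dimensions for placement in the VR scene.

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def project_activations(activations, pca_dims=50):
    """activations: (n_images, n_features) from a chosen CNN layer."""
    reduced = PCA(n_components=pca_dims).fit_transform(activations)
    coords_3d = TSNE(n_components=3, perplexity=30.0).fit_transform(reduced)
    return coords_3d  # one 3D position per image in the VR scene
```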
The increasing availability of audio data on the internet has led to a multitude of datasets for the development and training of neural-network-based text-to-speech (TTS) applications. Highly varying voice quality, low sampling rates, a lack of text normalization, and disadvantageous alignment of audio samples to corresponding transcript sentences still limit the performance of deep neural networks trained on this task. Additionally, data resources in languages like German are still very limited. We introduce the “HUI-Audio-Corpus-German”, a large, open-source dataset for TTS engines, created with a processing pipeline that produces high-quality audio-to-transcription alignments and decreases the manual effort needed for dataset creation.
Negation is an operation in both formal logic and natural language by which a proposition is replaced by one stating the opposite, as by the addition of “not” or another negation cue. Treating negation in an adequate way is required for cognitive reasoning, which aims at modeling the human ability to draw meaningful conclusions despite incomplete and inconsistent knowledge. One task of cognitive reasoning is answering questions given as sentences in natural language. There are tools based on discourse representation theory to convert sentences automatically into a formal logic representation, and additional knowledge can be added using the predicate names in the formula and knowledge databases. However, the knowledge in logic databases is, in practice, always incomplete. Hence, forward reasoning of automated reasoning systems alone does not suffice to derive answers to questions because, instead of complete proofs, often only partial positive knowledge can be derived, while negative knowledge is used only during the reasoning process. In consequence, we aim at eliminating syntactic negation, strictly speaking, the negated event or property. In this paper, we describe an effective procedure to determine the negated event or property in order to replace it by its inverse. This lays the basis for cognitive reasoning, employing both logic and machine learning for general question answering. We evaluate our procedure on several benchmarks and demonstrate its practical usefulness in our cognitive reasoning system.
Given the increasing threat of adversarial attacks on deep neural networks (DNNs), research on efficient detection methods is more important than ever. In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model. We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples. Our method is able to detect adversarial examples generated by various attacks and can be easily applied to a plethora of deep classification models. We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement. We perform an extensive empirical analysis on different deep classification models, investigating various state-of-the-art adversarial attacks. Moreover, we observe that our proposed method is better at detecting a combination of adversarial attacks. This work indicates the potential of detecting various adversarial attacks simply by using the class scores of an already trained classification model.
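The core of this detection idea can be sketched in a few lines (placeholder data variables; the kernel choice is an assumption): fit an SVM that separates class-score vectors of clean inputs from those of adversarial ones.

```python
import numpy as np
from sklearn.svm import SVC

def train_score_detector(scores_clean, scores_adv):
    """scores_*: (n_samples, n_classes) class-score vectors of a trained model."""
    X = np.vstack([scores_clean, scores_adv])
    y = np.concatenate([np.zeros(len(scores_clean)), np.ones(len(scores_adv))])
    detector = SVC(kernel="rbf").fit(X, y)
    return detector  # detector.predict(scores) flags adversarial inputs as 1
```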
The abductive theory of method (ATOM) was recently proposed to describe the process that scientists use for knowledge discovery. In this paper we propose an agent architecture for knowledge discovery and evolution (KDE) based on ATOM. The agent incorporates a combination of ontologies, rules and Bayesian networks for representing different aspects of its internal knowledge. The agent uses an external AI service to detect unexpected situations from incoming observations. It then uses rules to analyse the current situation and a Bayesian network for finding plausible explanations for unexpected situations. The architecture is evaluated and analysed on a use case application for monitoring daily household electricity consumption patterns.
Interactive and collaborative approaches have been successfully used in educational scenarios. For machine learning and AI, however, such approaches typically require a fair amount of technical expertise. In order to reach everyday users of AI technologies, we propose and evaluate a new interactive approach to help end users gain a better understanding of AI: a participatory machine learning show. During the show, participants were able to collectively gather corpus data for a neural network for keyword recognition and to interactively train and test its accuracy. Furthermore, the network’s decisions were explained using both an established XAI framework (LIME) and a virtual agent. In cooperation with a museum, we ran several prototype shows and interviewed participating and non-participating visitors to gain insights into their attitudes towards (X)AI. We could deduce that the virtual agent and the inclusion of XAI visualisations in our edutainment show were generally rated positively by participants, even though the frameworks we used were originally designed for experts. When comparing both groups, we found that participants felt significantly more competent and positive towards technology compared to non-participating visitors. Our findings suggest that the consideration of specific user needs, personal backgrounds, and mental models of (X)AI systems should be included in XAI design for end users.
Counting and sampling directed acyclic graphs (DAGs) from a Markov equivalence class are fundamental tasks in graphical causal analysis. In this paper, we discuss recently proposed polynomial-time algorithms for these tasks. The presented result solves a long-standing open problem in graphical modelling. Experiments show that the proposed algorithms are implementable and effective in practice. Our paper is an extended abstract of the work [24], honored as an AAAI-21 Distinguished Paper at the 35th AAAI Conference on Artificial Intelligence.
The PC algorithm is one of the most prominent constraint-based methods for learning causal structures from observational data. The algorithm relies on conditional independence (CI) tests to infer the structure and its time consumption heavily depends on the number of performed CI tests. We present a modification, called ED-PC, such that – in the oracle model – both ED-PC and the original PC algorithm infer the same structure. However, by using a new idea allowing the detection of a v-structure without explicit knowledge of a separating set, our method reduces the number of needed CI tests significantly. This is made possible by detecting nonadjacencies considerably earlier.
Explainable Artificial Intelligence (AI) has emerged as a key component for black-box Machine Learning (ML) approaches in domains with a high demand for transparency. Besides medical expert systems, which inherently need to be interpretable, transparent, and comprehensible as they deal with life-changing decision tasks, other application domains like financial auditing require trust in ML as well. The European General Data Protection Regulation (GDPR) also applies to such highly regulated areas, where an auditor evaluates the financial transactions and statements of a business. In this paper, we propose an ML architecture intended to help financial auditors by transparently detecting anomalous data points in the absence of ground truth. While Anomaly Detection (AD) is mostly performed in a supervised manner, where model-agnostic explainers can easily be applied, unsupervised AD is hardly comprehensible, especially across different algorithms. In this work we investigate how to resolve this: we describe an integrated architecture for unsupervised AD that identifies outliers at different levels of granularity using an ensemble of independent algorithms. Furthermore, we show how model-agnostic explanations can be generated for such an ensemble using supervised approximation and Local Interpretable Model-Agnostic Explanations (LIME). Additionally, we propose techniques for explanation post-processing that allow explanations to be selective, receiver-dependent, and easily understandable. In a nutshell, our architecture paves the way for model-agnostic explainability for the task of unsupervised AD. It can further be transferred smoothly to other unsupervised ML problems, such as clustering.
Lifted inference approaches reduce computational work, as inference is performed using representatives for sets of indistinguishable random variables, which allows for tractable inference w.r.t. domain sizes in dynamic probabilistic relational models. Unfortunately, maintaining a lifted representation is challenging in practically relevant application domains, as evidence often breaks symmetries, making lifted techniques fall back on their ground counterparts. In existing approaches, asymmetric evidence is counteracted by merging similar but distinguishable objects when moving forward in time. While undoing splits a posteriori is reasonable, we propose learning approximate model symmetries a priori to prevent unnecessary splits due to inaccuracy or one-time events. In particular, we propose a multivariate ordinal pattern symbolization approach followed by spectral clustering to determine sets of domain entities that behave approximately the same over time. By using object clusters, we avoid unnecessary splits by keeping together entities that tend to behave the same over time. Understanding symmetric and asymmetric entity behavior a priori allows for increasing accuracy in inference by means of inferred evidence for unobserved entities to better represent reality. Empirical results show that our approach reduces unnecessary splits, i.e., improves runtimes, while keeping accuracy in inference high.
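To illustrate the symbolization-plus-clustering step (a univariate simplification of the multivariate approach; the distance and affinity choices are our own assumptions), each entity's series is summarized by its ordinal pattern distribution, and entities with similar distributions are grouped by spectral clustering.

```python
import numpy as np
from itertools import permutations
from sklearn.cluster import SpectralClustering

def ordinal_pattern_histogram(series, order=3):
    """Relative frequencies of the order! ordinal patterns in a series."""
    patterns = list(permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(series) - order + 1):
        pat = tuple(int(j) for j in np.argsort(series[i:i + order]))
        counts[patterns.index(pat)] += 1
    return counts / counts.sum()

def cluster_entities(series_per_entity, n_clusters, order=3):
    hists = np.array([ordinal_pattern_histogram(s, order) for s in series_per_entity])
    # Similarity between entities based on their pattern distributions.
    affinity = np.exp(-np.linalg.norm(hists[:, None] - hists[None, :], axis=-1))
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(affinity)
```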
Modelling agent-environment interactions in an agent-based simulation requires careful design choices. Selecting an interaction partner is an often neglected but essential element.
In this paper we introduce affordance schemata as an element of agent-based simulation models. We describe how affordances can be generated based on them during a running simulation to capture action potential that an interaction partner offers. We illustrate the introduced concepts with a small proof-of-concept implementation.
This work discusses a learning approach to masking rewarding objects in images using sparse reward signals from an imitation learning dataset. For that, we train an Hourglass network using only feedback from a critic model. The Hourglass network learns to produce a mask to decrease the critic’s score of a high-score image and increase the critic’s score of a low-score image by swapping the masked areas between these two images. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where it learned to mask rewarding objects in a complex interactive 3D environment with a sparse reward signal. This approach was part of the 1st-place winning solution in this competition. Video demonstration and code: this https URL
This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstruction, inspired by medical domain knowledge. We then provide a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem. We show that our approach outperforms existing domain adaptation strategies. Furthermore, when utilizing the entire training data in the target domain, we are able to compete with several state-of-the-art approaches in final classification accuracy just by applying standard network architectures and using image-level labels.
Automated systems for assisting persons with their everyday tasks are gaining popularity, both in application domains supporting healthy persons and in those assisting people with impairments. The development of such assistive systems is a challenging task that requires considerable time and effort and often the involvement of domain experts. To address this problem, different works have investigated automated knowledge extraction and model generation for behaviour interpretation and assistance. Existing works, however, usually concentrate on a single source of data for the task of automated knowledge generation, which can result in simpler models that are unable to adequately support the person. To address this limitation, in this work we present the BehavE methodology, which proposes the extraction of knowledge from different types of sources and its consolidation into a unified semantic model that is used for behaviour interpretation and the generation of assistance strategies.