From Knowledge Graph and Semantic Computing 2018
This volume contains the papers presented at CCKS 2018: the China Conference on Knowledge Graph and Semantic Computing, held during August 14–17, 2018, in Tianjin. CCKS is organized by the Technical Committee on Language and Knowledge Computing of the Chinese Information Processing Society (CIPS). CCKS 2018 was the third edition of the conference series; the first edition was held in Beijing in 2016 and the second in Chengdu in 2017. CCKS 2016 resulted from the merger of two premier forums held previously: the Chinese Knowledge Graph Symposium (KGS) and the Chinese Semantic Web and Web Science Conference (CSWS). KGS was held in Beijing in 2013, in Nanjing in 2014, and in Yichang in 2015. CSWS was first held in Beijing in 2006 and had been the main forum for research on Semantic Web technologies in China for nearly ten years.

CCKS covers a wide range of research fields, including knowledge graphs, the Semantic Web, linked data, NLP, knowledge representation, and graph databases. It aims to be the top forum on knowledge graph and semantic technologies for Chinese researchers and practitioners from academia, industry, and government. The theme of this year was "Knowledge Computing and Language Understanding."

There were 101 submissions. Each submission was reviewed by at least two, and on average 3.1, Program Committee members. The committee decided to accept 29 full/short papers (15 written in English and 14 written in Chinese). The program also included four invited keynotes, six tutorials, four shared tasks, two industrial forums, and one panel. This volume contains revised versions of the 12 full/short papers written in English.

This year's invited talks were given by Prof. Bo Zhang from Tsinghua University, Prof. James A. Hendler from Rensselaer Polytechnic Institute, Dr. Hui Qiang from Alibaba, and Prof. Roberto Navigli from Sapienza University of Rome.

The hard work and close collaboration of a number of people contributed to the success of this conference. We would like to thank the members of the Organizing Committee and Program Committee for their support, as well as the authors and participants, who are the primary reason for the success of this conference.
Answering geography questions from university entrance exams (e.g., the Gaokao in China) is a new AI challenge. In this paper, we analyze its difficulties in problem understanding and solving, which suggest the need for novel methods. We present a pipeline approach that mixes information retrieval techniques with knowledge engineering and exhibits an interpretable problem-solving process. Our implementation integrates question parsing, semantic matching, and spreading activation over a knowledge graph to generate answers. We report its promising performance on a representative sample of 1,863 questions used in real exams. Our analysis of failures reveals a number of open problems to be addressed in the future.
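The abstract does not spell out the activation step, but spreading activation over a knowledge graph can be illustrated with a minimal sketch: starting from the concepts found by question parsing, activation propagates along graph edges with decay, and highly activated nodes become candidate answers. The toy graph, decay factor, threshold, and seed weights below are all illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of spreading activation over a knowledge graph.
from collections import defaultdict

def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_hops=3):
    """Propagate activation from seed concepts along graph edges.

    graph: dict mapping a node to a list of neighbour nodes.
    seeds: dict mapping seed nodes (from question parsing) to initial weights.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, weight in frontier.items():
            passed = weight * decay
            if passed < threshold:          # stop propagating weak signals
                continue
            for neighbour in graph.get(node, []):
                next_frontier[neighbour] += passed
        for node, weight in next_frontier.items():
            activation[node] += weight
        frontier = next_frontier
    return dict(activation)

# Toy example: the question mentions "monsoon" and "South Asia";
# highly activated nodes are candidate answer concepts.
kg = {
    "monsoon": ["precipitation", "South Asia"],
    "South Asia": ["India", "monsoon"],
    "precipitation": ["climate"],
}
scores = spread_activation(kg, {"monsoon": 1.0, "South Asia": 1.0})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```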
Temporal tagging plays an important role in many tasks such as event extraction and reasoning. Extracting Chinese temporal expressions is challenging because of the diversity of time phrases in Chinese. Researchers usually use rule-based or learning-based methods to extract temporal expressions. Rule-based methods can often achieve good results on certain types of text, such as news, but not on multi-type text with complex time phrases. Learning-based methods often require large amounts of annotated corpora, which are hard to obtain, and the training data is difficult to transfer to tasks with different text types. In this paper, we treat time expression extraction as a sequence labeling problem and solve it with the popular BiLSTM+CRF model. We propose a distant supervision method using CN-DBPedia (an open-domain Chinese knowledge graph) and BaiduBaike (one of the largest Chinese encyclopedias) to generate a dataset for model training. Results of our experiments on encyclopedia text and the TempEval2 dataset indicate that the method is feasible. While obtaining acceptable tagging performance, our approach requires neither the manual pattern design of rule-based methods nor manually annotated data, and it adapts well to different types of text.
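The distant-supervision step can be pictured as projecting known time phrases (e.g., harvested from CN-DBPedia or BaiduBaike infobox values) onto raw text as BIO tags, which then serve as training data for the BiLSTM+CRF tagger. A minimal sketch, with an illustrative phrase list and a pre-tokenized sentence; the granularity of the tags is driven entirely by the phrase list:

```python
# A minimal sketch of distant-supervision labelling: project known time
# phrases onto token sequences as B-TIME/I-TIME/O tags by longest match.

def bio_tag(tokens, time_phrases):
    """Tag a token sequence with BIO labels via longest-match lookup."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        for phrase in time_phrases:
            n = len(phrase)
            if tokens[i:i + n] == phrase and n > match_len:
                match_len = n
        if match_len:
            tags[i] = "B-TIME"
            for j in range(i + 1, i + match_len):
                tags[j] = "I-TIME"
            i += match_len
        else:
            i += 1
    return tags

# Toy example with pre-tokenised Chinese text.
phrases = [["1997", "年"], ["7", "月", "1", "日"]]
sent = ["香港", "于", "1997", "年", "7", "月", "1", "日", "回归"]
print(list(zip(sent, bio_tag(sent, phrases))))
```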
We propose a staged framework for question answering over a large-scale structured knowledge base. Following existing methods based on semantic parsing, our method relies on various components for solving different sub-tasks of the problem. In the first stage, we directly use the result of entity linking to obtain the topic entity in a question, reducing the task to a semantic matching problem. We train a neural network to match questions against predicate sequences to obtain a rough set of candidate answer entities from the knowledge base. Unlike traditional methods, in the second stage we also apply entity type as a constraint on candidate answers to remove wrong candidates from the rough set. By applying a convolutional neural network model to match questions and predicate sequences and a type constraint to filter candidate answers, our method achieves an average F1 measure of 74.8% on the WEBQUESTIONSSP dataset, which is competitive with state-of-the-art semantic parsing approaches.
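A minimal sketch of the two-stage pipeline described above: a stand-in scorer (token overlap in place of the paper's CNN matcher) ranks predicate sequences to produce a rough candidate set, and a type constraint then filters it. The tiny KB, type inventory, and question are illustrative.

```python
# Stage 1: rough candidates via predicate-sequence matching.
# Stage 2: a type constraint removes wrong candidates.

def score(question_tokens, predicate_sequence):
    # Placeholder for the CNN semantic matcher: simple token overlap.
    pred_tokens = {t for p in predicate_sequence for t in p.split("_")}
    return len(set(question_tokens) & pred_tokens)

def answer(question_tokens, topic_entity, kb, expected_type, entity_types):
    scored = []
    for pred_seq, objects in kb.get(topic_entity, {}).items():
        s = score(question_tokens, pred_seq)
        scored.extend((s, obj) for obj in objects)
    best = max(s for s, _ in scored)
    rough = [obj for s, obj in scored if s == best]
    return [obj for obj in rough
            if expected_type in entity_types.get(obj, set())]

kb = {"Obama": {("people_person_spouse",): ["Michelle_Obama"],
                ("people_person_place_of_birth",): ["Honolulu"]}}
types = {"Michelle_Obama": {"person"}, "Honolulu": {"location"}}
q = ["what", "is", "obama", "s", "place", "of", "birth"]
print(answer(q, "Obama", kb, "location", types))  # ['Honolulu']
```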
Complex relationships and restrictions on social networking sites are severe issues in social network data acquisition. Covering all users in a social network while ensuring the timeliness of data acquisition is of great significance, so it is critical to develop an efficient data acquisition strategy. In particular, the smart deployment of monitoring points on social networks has a great impact on data acquisition efficiency. In this paper, we formulate monitoring point deployment as a capacitated set cover problem (CSCP) and present a maximum monitoring contribution rate deployment algorithm (MMCRD). We compare the proposed algorithm with a random approximation deployment algorithm (RD) and a maximum out-degree approximation deployment algorithm (MOD), using synthetic BA scale-free networks and real-world social network datasets derived from Facebook, Twitter, and Weibo. The results show that our MMCRD algorithm is superior to the other two deployment algorithms: it can monitor all users in the network by deploying monitoring points on at most 12% of users while guaranteeing timeliness.
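The greedy flavor of a maximum-contribution deployment can be sketched as follows: repeatedly pick the node whose neighborhood covers the most still-unmonitored users, subject to a per-monitor capacity. The graph and capacity below are illustrative, and the paper's exact contribution-rate definition may differ from this simplification.

```python
# A minimal greedy sketch of capacitated set cover for monitor deployment.

def deploy_monitors(adj, capacity):
    """adj: node -> set of users it can monitor (including itself)."""
    uncovered = set().union(*adj.values())
    candidates = set(adj)
    monitors = []
    while uncovered and candidates:
        # Contribution = newly covered users, truncated by capacity.
        node = max(candidates,
                   key=lambda n: min(len(adj[n] & uncovered), capacity))
        if not adj[node] & uncovered:
            break  # remaining users are unreachable from any candidate
        newly = sorted(adj[node] & uncovered)[:capacity]
        uncovered.difference_update(newly)
        candidates.discard(node)
        monitors.append(node)
    return monitors

adj = {
    "a": {"a", "b", "c", "d"},
    "b": {"b", "e"},
    "c": {"c", "f", "g"},
}
print(deploy_monitors(adj, capacity=3))
```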
Time series prediction over data streams has been widely studied. Current deep learning methods, e.g., Long Short-Term Memory (LSTM), perform well in learning feature representations from raw data. However, most of these models capture little of the semantic information behind the data. In this paper, we revisit LSTM from the perspective of the Semantic Web, where streaming data are represented as ontology sequences. We propose a novel semantic-based neural network (STBNet) that (i) enriches the semantics of a data stream with external text, and (ii) exploits the underlying semantics with background knowledge for time series prediction. Previous models mainly rely on numerical representations of values in raw data, while the proposed STBNet model integrates semantic embeddings into a hybrid neural network. We develop a new attention mechanism based on similarity among the semantic embeddings of the ontology stream, and combine the ontology stream with numerical analysis in the deep learning model. Furthermore, we enrich the ontology stream in STBNet by incorporating Convolutional Neural Networks (CNNs) to learn lexical representations of words in the text. Experiments show that STBNet outperforms state-of-the-art methods on stock price prediction.
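The similarity-based attention idea can be sketched in isolation: weight past time steps by the cosine similarity between their semantic embeddings and the current step's embedding, then pool the numerical series with those weights. The embeddings and price series below are random stand-ins; STBNet itself learns these representations inside a hybrid network.

```python
# A minimal sketch of attention driven by semantic-embedding similarity.
import numpy as np

def semantic_attention(sem_embeds, values):
    """sem_embeds: (T, d) per-step embeddings of the ontology stream.
    values: (T,) numerical series (e.g., prices). Returns a pooled scalar."""
    query = sem_embeds[-1]
    norms = np.linalg.norm(sem_embeds, axis=1) * np.linalg.norm(query)
    sims = sem_embeds @ query / np.clip(norms, 1e-8, None)
    weights = np.exp(sims) / np.exp(sims).sum()   # softmax over time steps
    return float(weights @ values)

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))      # 5 steps of 8-dim semantic embeddings
prices = np.array([10.0, 10.2, 10.1, 10.4, 10.5])
print(semantic_attention(emb, prices))
```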
Knowledge graph (KG) completion aims to fill in the missing facts in a KG, where a fact is represented as a triple of the form (subject, relation, object). Current KG completion models require two-thirds of a triple to be provided (e.g., subject and relation) in order to predict the remaining element. In this paper, we propose a new model that uses a KG-specific multi-layer recurrent neural network (RNN) to model triples in a KG as sequences. It outperformed several state-of-the-art KG completion models on the conventional entity prediction task for many evaluation metrics, based on two benchmark datasets and a more difficult dataset. Furthermore, thanks to its sequential nature, our model is capable of predicting a whole triple given only one entity. Our experiments demonstrate that the model achieves promising performance on this new triple prediction task.
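Treating a triple as a short sequence makes the triple prediction task natural for a recurrent decoder: after seeing the subject, the network scores relations, and after seeing subject and relation, it scores objects. A minimal PyTorch sketch (PyTorch assumed available; sizes are toy values, and the paper's KG-specific multi-layer RNN has more structure than this):

```python
# A minimal sketch: a triple (subject, relation, object) as a sequence
# decoded by a two-layer GRU.
import torch
import torch.nn as nn

class TripleRNN(nn.Module):
    def __init__(self, n_entities, n_relations, dim=32):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.rnn = nn.GRU(dim, dim, num_layers=2, batch_first=True)
        self.rel_out = nn.Linear(dim, n_relations)
        self.ent_out = nn.Linear(dim, n_entities)

    def forward(self, subject, relation):
        # Input sequence [subject, relation]; the hidden state after step 1
        # scores the relation, after step 2 the object.
        seq = torch.stack([self.ent(subject), self.rel(relation)], dim=1)
        h, _ = self.rnn(seq)
        return self.rel_out(h[:, 0]), self.ent_out(h[:, 1])

model = TripleRNN(n_entities=100, n_relations=20)
s, r = torch.tensor([3]), torch.tensor([7])
rel_scores, obj_scores = model(s, r)
print(rel_scores.shape, obj_scores.shape)  # (1, 20) (1, 100)
```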
Open Information Extraction (OIE) systems such as ReVerb, OLLIE, ClausIE, OpenIE 4.2, Stanford OIE, and PredPatt have attracted much attention for English OIE. However, few studies have been reported on OIE for languages beyond English. This paper presents PLCOIE, a Chinese OIE system that extracts binary relation triples and N-ary relation tuples from Chinese documents. Our goal is to learn, from a large corpus, general patterns composed of both dependency parsing roles and parts of speech, and to use the learned patterns to extract relation tuples from documents. In addition, this paper alleviates the trans-classed word issue and the light verb construction issue. PLCOIE can extract binary relation triples as well as N-ary relation tuples, and experiments on four real-world datasets show that its results are more precise than those of state-of-the-art Chinese OIE systems, indicating that PLCOIE is feasible and effective.
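Pattern-based extraction over a dependency parse can be sketched with a single hard-coded subject-verb-object pattern that combines dependency roles with part-of-speech tags; PLCOIE instead learns such patterns from a large corpus. The hand-written parse below is illustrative.

```python
# A minimal sketch: one (nsubj, VERB, dobj) pattern applied to a parse.

def extract_triples(tokens):
    """tokens: list of dicts with word, pos, head (index or -1), deprel."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok["pos"] != "VERB":
            continue
        subjects = [t["word"] for t in tokens
                    if t["head"] == i and t["deprel"] == "nsubj"]
        objects = [t["word"] for t in tokens
                   if t["head"] == i and t["deprel"] == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((s, tok["word"], o))
    return triples

# "鲁迅 创作 《呐喊》" (Lu Xun wrote "Call to Arms"), parsed by hand.
parse = [
    {"word": "鲁迅", "pos": "NOUN", "head": 1, "deprel": "nsubj"},
    {"word": "创作", "pos": "VERB", "head": -1, "deprel": "root"},
    {"word": "《呐喊》", "pos": "NOUN", "head": 1, "deprel": "dobj"},
]
print(extract_triples(parse))  # [('鲁迅', '创作', '《呐喊》')]
```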
In recent years, deep neural networks have achieved significant success in relation classification and many other natural language processing tasks. However, existing neural networks for relation classification heavily rely on the quality of labelled data and tend to be overconfident about noise in the input signals, which limits their robustness and generalization. In this paper, we apply adversarial training to relation classification by adding perturbations to the input vectors of a bidirectional long short-term memory network rather than to the original input itself. In addition, we propose an attention-based gate module that can not only discern the important information when learning sentence representations but also adaptively concatenate sentence-level and lexical-level features. Experiments on the SemEval-2010 Task 8 benchmark dataset show that our model significantly outperforms other state-of-the-art models.
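Adversarial training on embeddings rather than raw input follows a well-known recipe (FGM-style): compute the gradient of the clean loss with respect to the embedded input, add a small normalized perturbation along that gradient, and also train on the perturbed example. A minimal PyTorch sketch, with mean pooling standing in for the paper's attention-based gate:

```python
# A minimal FGM-style adversarial training step on input embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMClassifier(nn.Module):
    def __init__(self, dim=16, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * dim, n_classes)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.out(h.mean(dim=1))  # mean-pool stands in for attention

embed = nn.Embedding(50, 16)
model = BiLSTMClassifier()
opt = torch.optim.Adam(list(model.parameters()) + list(embed.parameters()))

def adversarial_step(token_ids, labels, eps=0.02):
    opt.zero_grad()
    x = embed(token_ids)                    # (batch, seq, dim)
    x.retain_grad()
    clean_loss = F.cross_entropy(model(x), labels)
    clean_loss.backward(retain_graph=True)
    # Perturb along the normalised gradient of the clean loss.
    g = x.grad.detach()
    x_adv = x + eps * g / (g.norm(dim=-1, keepdim=True) + 1e-8)
    adv_loss = F.cross_entropy(model(x_adv), labels)
    adv_loss.backward()
    opt.step()
    return clean_loss.item(), adv_loss.item()

ids = torch.randint(0, 50, (4, 7))
y = torch.randint(0, 3, (4,))
print(adversarial_step(ids, y))
```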
Infoboxes in encyclopedia articles contain structured factoid knowledge and have been the most important source for open-domain knowledge base construction. However, if a hyperlink is missing from an infobox, the corresponding semantic link cannot be created. In this paper, we propose an effective model for the infobox entity linking problem and summarize the most promising features for it. Empirical studies confirm the superiority of our proposed model.
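A feature-based linker of the kind the abstract describes can be sketched as a weighted combination of simple candidate features; the features and weights below are illustrative guesses, not the paper's feature set.

```python
# A minimal sketch of feature-based ranking for infobox entity linking.
from difflib import SequenceMatcher

def link(mention, attribute, candidates, weights=(0.6, 0.3, 0.1)):
    """candidates: list of dicts with name, popularity (0-1), types."""
    def features(c):
        name_sim = SequenceMatcher(None, mention, c["name"]).ratio()
        type_ok = 1.0 if attribute in c["types"] else 0.0
        return (name_sim, type_ok, c["popularity"])
    scored = [(sum(w * f for w, f in zip(weights, features(c))), c["name"])
              for c in candidates]
    return max(scored)[1]

cands = [
    {"name": "Michael Jordan (basketball)",
     "popularity": 0.9, "types": {"player"}},
    {"name": "Michael I. Jordan (scientist)",
     "popularity": 0.6, "types": {"advisor"}},
]
# The "advisor" attribute steers the link toward the scientist.
print(link("Michael Jordan", "advisor", cands))
```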
Applying data mining techniques to help researchers discover, understand, and predict research trends is a highly beneficial but challenging task. Existing research mainly uses topics extracted from the literature as the objects of prediction models. To obtain more accurate results, we use concepts instead of topics, building a model that predicts their rise and fall while taking their rhetorical characteristics into account. Experimental results on the ACL 1965–2017 literature dataset show that clues to scientific trends can be found in the rhetorical distribution of concepts. After adding information about related concepts, the model's accuracy improves significantly compared with the prior topic-based algorithm.
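The core idea can be illustrated by representing each concept as its rhetorical distribution (its share of mentions in, e.g., title, method, and background sections) and classifying whether it will rise or fall. The data below is synthetic, and the paper additionally incorporates related-concept information.

```python
# A minimal sketch: rise/fall classification from rhetorical distributions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Rows: concepts; columns: share of mentions in (title, method, background).
X = rng.dirichlet(alpha=[1, 1, 1], size=200)
# Synthetic rule: concepts mentioned mostly in method sections tend to rise.
y = (X[:, 1] > 0.4).astype(int)

clf = LogisticRegression().fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```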
This paper proposes a Knowledge Augmented Inference Network (K-AIN) that can effectively incorporate external knowledge into existing neural network models for the Natural Language Inference (NLI) task. Unlike previous works that use one-hot representations to describe external knowledge, we employ the TransE model to encode various semantic relations extracted from an external Knowledge Base (KB) as distributed relation features. We utilize these distributed relation features to construct knowledge-augmented word embeddings and integrate them into current neural network models. Experimental results show that our model achieves better performance than a strong baseline on the SNLI dataset, and we also surpass the current state-of-the-art models on the SciTail dataset.
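The TransE side of the idea can be sketched briefly: relations are vectors with h + r ≈ t, so a word's embedding can be extended with the vectors of KB relations it participates in before feeding the NLI encoder. The embeddings below are random stand-ins for trained TransE parameters.

```python
# A minimal sketch of TransE scoring and knowledge-augmented embeddings.
import numpy as np

rng = np.random.default_rng(1)
dim = 8
ent = {w: rng.normal(size=dim) for w in ["cat", "animal", "dog"]}
rel = {"hypernym": rng.normal(size=dim)}

def transe_score(h, r, t):
    # Lower is better: ||h + r - t||_1, the standard TransE dissimilarity.
    return np.abs(ent[h] + rel[r] - ent[t]).sum()

def knowledge_augmented(word, relation):
    # Concatenate the word embedding with a relation feature of a KB
    # relation the word participates in, mirroring how K-AIN extends
    # word embeddings before the NLI encoder.
    return np.concatenate([ent[word], rel[relation]])

print(transe_score("cat", "hypernym", "animal"))
print(knowledge_augmented("cat", "hypernym").shape)  # (16,)
```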
With the rapid growth of knowledge graphs, schema induction, the task of extracting relations or constraints for classes and properties from a knowledge graph, is becoming more critical and urgent. Schema induction plays an important role in facilitating many applications, such as integrating, querying, and maintaining knowledge graphs. To provide a comprehensive survey of schema induction, in this paper we review existing schema induction approaches, mainly considering their learning methods, the types of axioms they learn, and the external resources they may use during the learning process. Based on this comparison, we point out challenges and directions for schema induction.