Chapter

Research on Chain of Evidence Based on Knowledge Graph


Abstract

Evidence plays an extremely important role in legal proceedings, historical research, and diplomatic disputes. In recent years, more and more evidence has been presented in the form of electronic data, so the extraction, organization, and validation of the knowledge contained in electronic evidence have become increasingly important. In order to organize evidence data effectively and construct a Chain of Evidence that meets practical needs, this paper uses a Knowledge Graph based method to study the correlation of evidence. First, a Knowledge Graph of the evidence data is constructed through Knowledge Extraction, Knowledge Fusion, and Knowledge Reasoning. The Chain of Evidence is then built through evidence correlation and evidence influence evaluation, and finally the knowledge is presented in different forms for different users. This paper introduces the workflow of Knowledge Graph based evidence chain association and outlines possible directions for future research.
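
The pipeline described in the abstract can be illustrated with a toy example. The following sketch (not the authors' implementation) uses networkx and hypothetical evidence items to show how extracted triples might be organized into a graph and a candidate chain of evidence read off as a path between two items of interest.

```python
# A minimal sketch: organize extracted evidence triples into a directed graph
# and read candidate chains of evidence off as paths. All names are invented.
import networkx as nx

# Hypothetical triples produced by knowledge extraction / fusion.
triples = [
    ("email_0412", "sent_by", "suspect_A"),
    ("suspect_A", "transferred_funds_to", "account_X"),
    ("account_X", "owned_by", "company_B"),
    ("company_B", "signed", "contract_17"),
]

G = nx.DiGraph()
for head, relation, tail in triples:
    G.add_edge(head, tail, relation=relation)

# A candidate Chain of Evidence linking the e-mail to the contract.
for path in nx.all_simple_paths(G, source="email_0412", target="contract_17"):
    edges = list(zip(path, path[1:]))
    print(" -> ".join(f"{h} [{G[h][t]['relation']}] {t}" for h, t in edges))
```
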

References

Article
Full-text available
Ontologies have gained popularity in the scientific community as representational mechanisms to support intelligent reasoning and execute inferences. In this paper we describe an ontology designed specifically to represent academic contexts at a public university. This model consists of a collection of ontologies designed to represent persons, physical spaces, sensor networks, events, and other entities that exist in the academic environment. In particular, we describe the design requirements that guided the construction of the ontologies. The resulting ontology model is evaluated considering the competency of the ontology and the concept domain coverage. Results are promising: the competency questions are translated into queries, showing that the ontology model adheres to the requirements.
Article
Full-text available
Microblog platforms such as Twitter and Sina Weibo provide a convenient and instant way to share and acquire information. However, Microblog text is short, noisy, and real-time, which makes Chinese Microblog entity linking a new challenge. In this paper, we investigate several linking methods and describe the implementation of our work on the Chinese Microblog entity linking task. By crawling Baidu Encyclopedia web pages, we generate polysemy, synonym, and index collections in MongoDB to manage the entities. We use the Chinese NLP toolkit HanLP to extract noun words, and then generate a candidate set with these collections and word similarity. For the disambiguation part, we use Word2vec, trained on the THUC news corpus, to determine textual relevance. Our approach performs well on the Sina Weibo data set.
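
The candidate-generation-plus-disambiguation pattern described here can be sketched in a few lines. In the sketch below, toy synonym tables and toy vectors stand in for the Baidu Encyclopedia collections and the THUC-news-trained Word2vec model; it is an illustration of the idea, not the authors' system.

```python
# A minimal sketch of dictionary-based candidate generation followed by
# embedding-based disambiguation; all data below are toy stand-ins.
import numpy as np

synonyms = {"苹果": ["苹果公司", "苹果(水果)"]}          # mention -> candidate entities
embeddings = {                                          # toy context/entity vectors
    "苹果公司": np.array([0.9, 0.1]),
    "苹果(水果)": np.array([0.1, 0.9]),
    "手机": np.array([0.8, 0.2]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def link(mention, context_word):
    # Rank candidate entities by similarity to a word from the microblog context.
    candidates = synonyms.get(mention, [])
    return max(candidates, key=lambda c: cosine(embeddings[c], embeddings[context_word]))

print(link("苹果", "手机"))   # -> 苹果公司 (the company sense)
```
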
Conference Paper
Full-text available
State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing. In this paper, we introduce a novel neural network architecture that benefits from both word- and character-level representations automatically, by using a combination of bidirectional LSTM, CNN and CRF. Our system is truly end-to-end, requiring no feature engineering or data pre-processing, thus making it applicable to a wide range of sequence labeling tasks on different languages. We evaluate our system on two data sets for two sequence labeling tasks --- the Penn Treebank WSJ corpus for part-of-speech (POS) tagging and the CoNLL 2003 corpus for named entity recognition (NER). We obtain state-of-the-art performance on both data sets --- 97.55% accuracy for POS tagging and 91.21% F1 for NER.
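
The word-plus-character encoder described above can be sketched in PyTorch. The sketch below is not the authors' code: it only produces per-token emission scores, and the CRF layer that the paper places on top for sequence decoding is omitted.

```python
# A minimal sketch of a BiLSTM-CNN encoder for sequence labeling (emission
# scores only; a CRF layer would normally decode the best tag sequence).
import torch
import torch.nn as nn

class BiLSTMCNNTagger(nn.Module):
    def __init__(self, n_words, n_chars, n_tags, wdim=100, cdim=30, hidden=200):
        super().__init__()
        self.wemb = nn.Embedding(n_words, wdim)
        self.cemb = nn.Embedding(n_chars, cdim)
        self.char_cnn = nn.Conv1d(cdim, cdim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(wdim + cdim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, words, chars):                 # words: (B,T)  chars: (B,T,L)
        B, T, L = chars.shape
        c = self.cemb(chars).view(B * T, L, -1).transpose(1, 2)   # (B*T, cdim, L)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values        # (B*T, cdim)
        x = torch.cat([self.wemb(words), c.view(B, T, -1)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                           # (B, T, n_tags) emission scores

emissions = BiLSTMCNNTagger(1000, 80, 9)(torch.zeros(2, 5, dtype=torch.long),
                                         torch.zeros(2, 5, 7, dtype=torch.long))
print(emissions.shape)   # torch.Size([2, 5, 9])
```
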
Conference Paper
Full-text available
State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures---one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers.
Article
Full-text available
The life cycle and chain of custody of digital evidence are very important parts of the digital investigation process. It is very difficult to maintain and prove the chain of custody: investigators and expert witnesses must know all the details of how the evidence was handled every step of the way. At each stage in the life cycle of digital evidence, there are influences (human, technical, and natural) that can compromise the evidence. This paper presents the basic concepts of the “chain of custody of digital evidence” and the “life cycle of digital evidence”, and addresses the digital archiving phase of the life cycle. The authors also warn of certain shortcomings in answering specific questions, and give some basic definitions.
Article
Full-text available
Named entity recognition (NER) is one of the fundamental tasks in natural language processing. In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been carried out on clinical notes written in Chinese. The goal of this study was to systematically investigate features and machine learning algorithms for NER in Chinese clinical text. We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital in China. For each note, four types of entities (clinical problems, procedures, laboratory tests, and medications) were annotated according to a predefined guideline. Two-thirds of the 400 notes were used to train the NER systems and one-third for testing. We investigated the effects of different types of features, including bag-of-characters, word segmentation, part-of-speech, and section information, and different machine learning algorithms, including conditional random fields (CRF), support vector machines (SVM), maximum entropy (ME), and structural SVM (SSVM), on the Chinese clinical NER task. All classifiers were trained on the training dataset and evaluated on the test set, and micro-averaged precision, recall, and F-measure were reported. Our evaluation on the independent test set showed that most types of features were beneficial to Chinese NER systems, although the improvements were limited. The system achieved the highest performance by combining word segmentation and section information, indicating that these two types of features complement each other. When the same optimized features were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM achieved the highest performance of the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries, respectively.
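
Character-level features of the kind compared in this study can be illustrated with a short sketch. The feature names, the sample sentence, and the section label below are illustrative stand-ins, not taken from the study; the resulting dictionaries are the sort of input a linear-chain CRF toolkit (e.g. sklearn-crfsuite) would consume together with gold BIO labels.

```python
# A minimal sketch of character-level feature extraction (bag-of-characters
# context, word-segmentation tag, section information) for Chinese clinical NER.
def char_features(sentence, i, seg_tags, section):
    ch = sentence[i]
    return {
        "char": ch,
        "prev_char": sentence[i - 1] if i > 0 else "<BOS>",
        "next_char": sentence[i + 1] if i < len(sentence) - 1 else "<EOS>",
        "seg_tag": seg_tags[i],        # B/I/E/S tag from a word segmenter
        "section": section,            # e.g. the note section the sentence is in
        "is_digit": ch.isdigit(),
    }

sent, segs = "患者血压升高", ["B", "E", "B", "E", "B", "E"]
X = [char_features(sent, i, segs, "现病史") for i in range(len(sent))]
print(X[2])
```
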
Article
Full-text available
This paper suggests that administrators form a new way of conceptualizing evidence collection across an intranet based on a model consisting of linked audit logs. This methodology enables the establishment of a chain of evidence that is especially useful across a corporate intranet environment. Administrators are encouraged to plan event configuration such that audit logs provide complementary information across the intranet. Critical factors that determine the quality of evidence are also discussed and some limitations of the model are highlighted.
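
The idea of complementary audit logs forming a chain of evidence can be illustrated with a toy correlation step. The log records, field names, and hosts below are hypothetical; the sketch only shows how events sharing a user and session identifier might be merged into a single evidential timeline.

```python
# A minimal sketch: merge audit logs from several intranet hosts and order the
# events that share a user and session ID into one timeline of evidence.
from itertools import chain

firewall_log = [{"ts": 100, "user": "alice", "session": "s1", "event": "vpn_connect"}]
file_server_log = [{"ts": 140, "user": "alice", "session": "s1", "event": "file_read:/hr/salaries.xls"}]
mail_log = [{"ts": 180, "user": "alice", "session": "s1", "event": "mail_sent:external"}]

def evidence_chain(user, session, *logs):
    events = [e for e in chain(*logs) if e["user"] == user and e["session"] == session]
    return sorted(events, key=lambda e: e["ts"])

for e in evidence_chain("alice", "s1", firewall_log, file_server_log, mail_log):
    print(e["ts"], e["event"])
```
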
Conference Paper
Full-text available
Standard pairwise coreference resolution systems are subject to errors resulting from their performing anaphora identification as an implicit part of coreference resolution. In this paper, we propose an integer linear programming (ILP) formulation for coreference resolution which models anaphoricity and coreference as a joint task, such that each local model informs the other for the final assignments. This joint ILP formulation provides F-score improvements of 3.7-5.3% over a base coreference classifier on the ACE datasets.
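
A joint anaphoricity/coreference ILP of this kind can be sketched with PuLP. The scores below are made up and the objective is a simplified surrogate for the paper's classifier-based formulation; only the coupling constraints between link variables and anaphoricity variables are shown.

```python
# A minimal sketch of a joint coreference/anaphoricity ILP with toy scores.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

mentions = [0, 1, 2]
coref_score = {(0, 1): 1.2, (0, 2): -0.4, (1, 2): 0.8}   # pairwise link scores (toy)
anaph_score = {1: 0.9, 2: 0.3}                           # anaphoricity scores (mention 0 starts a chain)

prob = LpProblem("joint_coref", LpMaximize)
x = {p: LpVariable(f"x_{p[0]}_{p[1]}", cat="Binary") for p in coref_score}   # coreference links
y = {m: LpVariable(f"y_{m}", cat="Binary") for m in anaph_score}             # anaphoricity

prob += lpSum(coref_score[p] * x[p] for p in x) + lpSum(anaph_score[m] * y[m] for m in y)
for (i, j), var in x.items():
    prob += var <= y[j]                                   # a linked mention must be anaphoric
for j in y:
    prob += y[j] <= lpSum(x[(i, j)] for i in mentions if (i, j) in x)   # anaphoric => has an antecedent

prob.solve()
print({p: int(x[p].value()) for p in x}, {m: int(y[m].value()) for m in y})
```
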
Conference Paper
Full-text available
Increasingly, licensing and safety regulatory bodies require the suppliers of software-intensive, safety-critical systems to provide an explicit software safety case - a structured set of arguments based on objective evidence to demonstrate that the software elements of a system are acceptably safe. Existing research on safety cases has mainly focused on how to build the arguments in a safety case based on available evidence; but little has been done to precisely characterize what this evidence should be. As a result, system suppliers are left with practically no guidance on what evidence to collect during software development. This has led to the suppliers having to recover the relevant evidence after the fact - an extremely costly and sometimes impractical task. Although standards such as the IEC 61508 - which is widely viewed as the best available generic standard for managing functional safety in software - provide some guidance for the collection of relevant safety and certification information, this guidance is mostly textual, not expressed in a precise and structured form, and is not easy to specialize to context-specific needs. To address these issues, we present a conceptual model to characterize the evidence for arguing about software safety. Our model captures both the information requirements for demonstrating compliance with IEC 61508 and the traceability links necessary to create a seamless chain of evidence. We further describe how our generic model can be specialized according to the needs of a particular context, and discuss some important ways in which our model can facilitate software certification.
Article
Full-text available
The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities, including medical problems, tests, and treatments, as well as their asserted status, from hospital discharge summaries written in natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside/Veterans Affairs (VA) natural-language-processing challenge. The authors implemented a machine-learning-based named entity recognition system for clinical text and systematically evaluated the contributions of different types of features and ML algorithms, using a training corpus of 349 annotated notes. Based on the results from the training data, the authors developed a novel hybrid clinical entity extraction system, which integrated heuristic rule-based modules with the ML-based named entity recognition module. The authors applied the hybrid system to the concept extraction and assertion classification tasks in the challenge and evaluated its performance using a test data set with 477 annotated notes. Standard measures including precision, recall, and F-measure were calculated using the evaluation script provided by the Center of Informatics for Integrating Biology and the Bedside/VA challenge organizers. The overall performance for all three types of clinical entities and all six types of assertions across the 477 annotated notes was considered the primary metric in the challenge. Systematic evaluation on the training set showed that Conditional Random Fields outperformed Support Vector Machines, and semantic information from existing natural-language-processing systems largely improved performance, although contributions from different types of features varied. The authors' hybrid entity extraction system achieved a maximum overall F-score of 0.8391 for concept extraction (ranked second) and 0.9313 for assertion classification (ranked fourth, but not statistically different from the first three systems) on the test data set in the challenge.
Article
We model knowledge graphs for their completion by encoding each entity and relation into a numerical space. All previous work, including Trans(E, H, R, and D), ignores the heterogeneity (some relations link many entity pairs and others do not) and the imbalance (the number of head entities and that of tail entities in a relation could be different) of knowledge graphs. In this paper, we propose a novel approach, TranSparse, to deal with these two issues. In TranSparse, transfer matrices are replaced by adaptive sparse matrices, whose sparse degrees are determined by the number of entities (or entity pairs) linked by relations. In experiments, we design structured and unstructured sparse patterns for transfer matrices and analyze their advantages and disadvantages. We evaluate our approach on triplet classification and link prediction tasks. Experimental results show that TranSparse outperforms Trans(E, H, R, and D) significantly, and achieves state-of-the-art performance.
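
The two ingredients described here can be sketched in a few lines of numpy. The linear sparse-degree rule and the random vectors below are illustrative stand-ins, not the trained model: the point is only that relations linking fewer entity pairs get sparser transfer matrices, and that head and tail use separate projections.

```python
# A minimal numpy sketch of an adaptive sparse degree and a translation score
# with separate sparse head/tail transfer matrices (toy, untrained values).
import numpy as np

def sparse_degree(n_links, n_links_max, theta_min=0.3):
    # Relations linking fewer entity pairs get sparser (more zeroed) matrices.
    return 1.0 - (1.0 - theta_min) * n_links / n_links_max

def score(h, t, r, M_h, M_t):
    # Lower is better: projected head plus relation should be near projected tail.
    return float(np.linalg.norm(M_h @ h + r - M_t @ t))

dim = 4
rng = np.random.default_rng(0)
h, t, r = rng.normal(size=(3, dim))
theta = sparse_degree(n_links=50, n_links_max=200)           # -> 0.825
mask = (rng.random((dim, dim)) > theta).astype(float)        # zero out ~82.5% of entries
M_h, M_t = mask * rng.normal(size=(dim, dim)), mask * rng.normal(size=(dim, dim))
print(round(theta, 3), round(score(h, t, r, M_h, M_t), 3))
```
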
Article
Representation learning (RL) of knowledge graphs aims to project both entities and relations into a continuous low-dimensional space. Most methods concentrate on learning representations with knowledge triples indicating relations between entities. In fact, most knowledge graphs also contain concise descriptions of entities, which cannot be well utilized by existing methods. In this paper, we propose a novel RL method for knowledge graphs that takes advantage of entity descriptions. More specifically, we explore two encoders, continuous bag-of-words and a deep convolutional neural model, to encode the semantics of entity descriptions. We further learn knowledge representations with both triples and descriptions. We evaluate our method on two tasks, knowledge graph completion and entity classification. Experimental results on real-world datasets show that our method outperforms other baselines on the two tasks, especially under the zero-shot setting, which indicates that our method is capable of building representations for novel entities according to their descriptions. The source code of this paper can be obtained from https://github.com/xrb92/DKRL.
Article
The goal of research on topics such as sentiment analysis and cognition is to analyze the opinions, emotions, evaluations and attitudes that people hold about entities and their attributes, based on text. Word-level affective cognition has become an important topic in sentiment analysis. This paper builds a binary relationship knowledge base by extracting (attribute, opinion word) pairs through word segmentation and dependency parsing, and labeling them with an existing emotional dictionary combined with webpage information and manual annotation. Using a knowledge embedding method, each element of the (attribute, opinion, opinion word) relation is embedded as a word vector into the Knowledge Graph with TransG, and an algorithm is defined to distinguish the opinion between the attribute word vector and the opinion word vector. Compared with traditional methods, this engine offers higher processing speed and lower resource occupancy, making up for the time cost and high computational complexity of earlier methods.
Article
We present Wiser, a new semantic search engine for expert finding in academia. Our system is unsupervised and it jointly combines classical language modeling techniques, based on text evidences, with the Wikipedia Knowledge Graph, via entity linking. Wiser indexes each academic author through a novel profiling technique which models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the author’s publications, whereas the weighted edges express the semantic relatedness among these entities computed via textual and graph-based relatedness functions. Every node is also labeled with a relevance score which models the pertinence of the corresponding entity to author’s expertise, and is computed by means of a proper random-walk calculation over that graph; and with a latent vector representation which is learned via entity and other kinds of structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the author’s documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the author’s expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established over a large-scale experimental test on a standard dataset for this task. We show that Wiser achieves better performance than all the other competitors, thus proving the effectiveness of modeling author’s profile via our “semantic” graph of entities. Finally, we comment on the use of Wiser for indexing and profiling the whole research community within the University of Pisa, and its application to technology transfer in our University.
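
The random-walk relevance scoring over an author's entity graph can be illustrated with personalized PageRank. The graph, edge weights, and restart distribution below are toy values, not Wiser's actual profiles or relatedness functions.

```python
# A minimal sketch: score the Wikipedia entities in an author's profile with a
# random walk (personalized PageRank) over a weighted relatedness graph.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("Knowledge graph", "Entity linking", 0.8),
    ("Knowledge graph", "Semantic search", 0.7),
    ("Entity linking", "Wikipedia", 0.6),
    ("Semantic search", "Information retrieval", 0.9),
])

# Restart at entities mentioned most often in the author's publications (toy weights).
restart = {n: 0.0 for n in G}
restart.update({"Knowledge graph": 0.6, "Semantic search": 0.4})

relevance = nx.pagerank(G, alpha=0.85, personalization=restart, weight="weight")
print(sorted(relevance.items(), key=lambda kv: -kv[1]))
```
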
Article
Dempster-Shafer (D-S) evidence theory is a key technology for integrating uncertain information from multiple sources. However, the combination rules can be paradoxical when pieces of evidence seriously conflict with each other. In this paper, we propose a novel combination algorithm based on unsupervised Density-Based Spatial Clustering of Applications with Noise (DBSCAN) density clustering. In the proposed mechanism, the original evidence sets are first preprocessed by DBSCAN density clustering, and a focal element similarity criterion is used to mine the potential information between the evidence and to correctly measure the conflicting evidence. Then, two different discount factors are adopted to revise the original evidence sets, based on the result of DBSCAN density clustering. Finally, we perform information fusion on the revised evidence sets using the D-S combination rules. Simulation results show that the proposed method can effectively solve the synthesis problem of highly conflicting evidence, with better accuracy, stability and convergence speed.
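
The underlying D-S combination rule can be sketched directly; the DBSCAN preprocessing and discounting steps described above are not reproduced here, and the mass functions are toy examples.

```python
# A minimal sketch of Dempster's rule of combination for two basic probability
# assignments over a small frame of discernment (toy masses).
from itertools import product

def dempster(m1, m2):
    # Mass functions map frozensets of hypotheses to belief mass.
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    k = 1.0 - conflict                       # normalization by the non-conflicting mass
    return {s: v / k for s, v in combined.items()}, conflict

A, B = frozenset({"A"}), frozenset({"B"})
m1 = {A: 0.9, B: 0.1}
m2 = {A: 0.6, frozenset({"A", "B"}): 0.4}
fused, conflict = dempster(m1, m2)
print(fused, round(conflict, 2))
```
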
Article
Objective and background: The exponential growth of the unstructured data available in biomedical literature and Electronic Health Records (EHR) requires powerful novel technologies and architectures to unlock the information hidden in that unstructured data. The success of smart healthcare applications such as clinical decision support systems, disease diagnosis systems, and healthcare management systems depends on knowledge that machines can understand, interpret, and use to infer new knowledge. In this regard, ontological data models are expected to play a vital role in organizing, integrating, and making informative inferences with the knowledge implicit in that unstructured data, and in representing the resultant knowledge in a form that machines can understand. However, constructing such models is challenging because they demand intensive labor, domain experts, and ontology engineers. Such requirements impose a limit on the scale or scope of ontological data models. We present a framework that mitigates the time needed to build ontologies and achieves machine interoperability. Methods: Empowered by linked biomedical ontologies, our proposed novel Automated Ontology Generation Framework consists of five major modules: a) Text Processing using a compute-on-demand approach; b) Medical Semantic Annotation using N-Gram, ontology linking and classification algorithms; c) Relation Extraction using graph methods and syntactic patterns; d) Semantic Enrichment using RDF mining; and e) a Domain Inference Engine to build the formal ontology. Results: Quantitative evaluations show 84.78% recall, 53.35% precision, and 67.70% F-measure for disease-drug concept identification; 85.51% recall, 69.61% precision, and 76.74% F-measure for taxonomic relation extraction; and 77.20% recall, 40.10% precision, and 52.78% F-measure for biomedical non-taxonomic relation extraction. Conclusion: We present an automated ontology generation framework that is empowered by Linked Biomedical Ontologies. This framework integrates various natural language processing, semantic enrichment, syntactic pattern, and graph algorithm based techniques. Moreover, it shows that using Linked Biomedical Ontologies offers a promising solution to the problem of automating the process of disease-drug ontology generation.
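
Syntactic-pattern relation extraction of the kind used by module (c) can be illustrated with a single regular expression. The pattern and the sentence below are invented examples, not part of the framework's actual pattern set.

```python
# A minimal sketch of pattern-based extraction of one taxonomic (is-a) and one
# non-taxonomic (treats) relation from a sentence.
import re

PATTERN = re.compile(r"(\w+) is (?:a|an) (\w+ ?\w*) (?:that|which) treats (\w+)")

def extract(sentence):
    m = PATTERN.search(sentence)
    if not m:
        return []
    drug, cls, disease = m.groups()
    return [(drug, "is_a", cls.strip()), (drug, "treats", disease)]

print(extract("Metformin is a biguanide that treats diabetes"))
# [('Metformin', 'is_a', 'biguanide'), ('Metformin', 'treats', 'diabetes')]
```
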
Article
Trace metals can have far-reaching ecosystem impacts. In this study, we develop consistent and evidence-based logic chains to demonstrate the wider effects of trace metal contamination on a suite of ecosystem services. They demonstrate knock-on effects from an initial receptor that is sensitive to metal toxicity, along a cascade of impact, to final ecosystem services via alterations to multiple ecosystem processes. We developed logic chains to highlight two aspects of metal toxicity: for impacts of copper pollution in soil ecosystems, and for impacts of mercury in freshwaters. Each link of the chains is supported by published evidence, with an indication of the strength of the supporting science. Copper pollution to soils (134 unique chains) showed a complex network of pathways originating from direct effects on a range of invertebrate and microbial taxa and plants. In contrast, mercury pollution on freshwaters (63 unique chains) shows pathways that broadly follow the food web of this habitat, reflecting the potential for mercury bioaccumulation. Despite different pathways, there is considerable overlap in the final ecosystem services impacted by both of these metals and in both ecosystems. These included reduced human-use impacts (food, fishing), reduced human non-use impacts (amenity value) and positive or negative alterations to climate regulation (impacts on carbon sequestration). Other final ecosystem goods impacted include reduced crop production, animal production, flood regulation, drinking water quality and soil purification. Taking an ecosystem services approach demonstrates that consideration of only the direct effects of metal contamination of soils and water will considerably underestimate the total impacts of these pollutants. Construction of logic chains, evidenced by published literature, allows a robust assessment of potential impacts indicating primary, secondary and tertiary effects.
Article
Geoscience literature published online is an important part of open data, and brings both challenges and opportunities for data analysis. Compared with studies of numerical geoscience data, there are limited works on information extraction and knowledge discovery from textual geoscience data. This paper presents a workflow and a few empirical case studies for that topic, with a focus on documents written in Chinese. First, we set up a hybrid corpus combining the generic and geology terms from geology dictionaries to train Chinese word segmentation rules of the Conditional Random Fields model. Second, we used the word segmentation rules to parse documents into individual words, and removed the stop-words from the segmentation results to get a corpus constituted of content-words. Third, we used a statistical method to analyze the semantic links between content-words, and we selected the chord and bigram graphs to visualize the content-words and their links as nodes and edges in a knowledge graph, respectively. The resulting graph presents a clear overview of key information in an unstructured document. This study proves the usefulness of the designed workflow, and shows the potential of leveraging natural language processing and knowledge graph technologies for geoscience.
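
The third step of this workflow, turning content-words and their statistical links into a graph, can be sketched with a bigram co-occurrence count. The content-word sequence below is a toy stand-in for the segmented, stop-word-filtered corpus described above.

```python
# A minimal sketch: build a weighted bigram co-occurrence graph of content
# words, the structure visualized as a chord/bigram knowledge graph.
from collections import Counter
import networkx as nx

content_words = ["granite", "intrusion", "granite", "pluton", "intrusion",
                 "pluton", "granite", "intrusion"]            # toy segmentation output

bigrams = Counter(zip(content_words, content_words[1:]))
G = nx.Graph()
for (w1, w2), freq in bigrams.items():
    if w1 != w2:
        prev = G.get_edge_data(w1, w2, {"weight": 0})["weight"]
        G.add_edge(w1, w2, weight=prev + freq)                # accumulate both directions

print(list(G.edges(data=True)))
```
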
Conference Paper
Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
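
The biased second-order random walk at the heart of node2vec can be sketched directly. The sketch below uses arbitrary p and q values and omits the embedding step that would normally follow (e.g. skip-gram training over the generated walks).

```python
# A minimal sketch of node2vec's biased random walk (return parameter p,
# in-out parameter q) on a small built-in graph.
import random
import networkx as nx

def biased_walk(G, start, length, p=1.0, q=0.5):
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(G.neighbors(cur))
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(random.choice(nbrs))
            continue
        prev = walk[-2]
        # Unnormalized weights: return to prev (1/p), stay near prev (1), move outward (1/q).
        weights = [1 / p if n == prev else (1.0 if G.has_edge(n, prev) else 1 / q)
                   for n in nbrs]
        walk.append(random.choices(nbrs, weights=weights, k=1)[0])
    return walk

G = nx.karate_club_graph()
print(biased_walk(G, start=0, length=10))
```
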
Article
Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH.
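
The TransR scoring idea, projecting entities into the relation space before the translation check, fits in a few lines of numpy. The vectors below are random stand-ins, not trained embeddings.

```python
# A minimal numpy sketch of TransR-style scoring with a relation-specific
# projection matrix (lower score = more plausible triple).
import numpy as np

def transr_score(h, t, r, M_r):
    h_r, t_r = M_r @ h, M_r @ t            # project entities into the relation space
    return float(np.linalg.norm(h_r + r - t_r))

rng = np.random.default_rng(42)
ent_dim, rel_dim = 50, 30
h, t = rng.normal(size=(2, ent_dim))
r = rng.normal(size=rel_dim)
M_r = rng.normal(size=(rel_dim, ent_dim))
print(round(transr_score(h, t, r, M_r), 3))
```
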
Article
Knowledge graphs have gained increasing popularity in the past couple of years, thanks to their adoption in everyday search engines. Typically, they consist of fairly static and encyclopedic facts about persons and organizations–e.g. a celebrity’s birth date, occupation and family members–obtained from large repositories such as Freebase or Wikipedia. In this paper, we present a method and tools to automatically build knowledge graphs from news articles. As news articles describe changes in the world through the events they report, we present an approach to create Event-Centric Knowledge Graphs (ECKGs) using state-of-the-art natural language processing and semantic web techniques. Such ECKGs capture long-term developments and histories on hundreds of thousands of entities and are complementary to the static encyclopedic information in traditional knowledge graphs. We describe our event-centric representation schema, the challenges in extracting event information from news, our open source pipeline, and the knowledge graphs we have extracted from four different news corpora: general news (Wikinews), the FIFA world cup, the Global Automotive Industry, and Airbus A380 airplanes. Furthermore, we present an assessment of the accuracy of the pipeline in extracting the triples of the knowledge graphs. Moreover, through an event-centered browser and visualization tool we show how approaching information from news in an event-centric manner can increase the user’s understanding of the domain, facilitate the reconstruction of news story lines, and enable exploratory investigation of facts hidden in the news.
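
An event-centric triple representation of a single news statement can be sketched with rdflib. The namespace, event URI, and properties below are hypothetical; the paper's actual SEM-based schema is richer than this.

```python
# A minimal sketch of an event-centric set of RDF triples for one news event
# (all URIs are invented for illustration).
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF

EX = Namespace("http://example.org/eckg/")
g = Graph()

event = EX["acquisition_2014_03"]
g.add((event, RDF.type, EX.AcquisitionEvent))
g.add((event, EX.hasActor, EX.Airbus))
g.add((event, EX.hasPatient, EX.A380_order))
g.add((event, EX.hasTime, Literal("2014-03")))
g.add((event, EX.derivedFrom, URIRef("http://example.org/news/article_123")))

print(g.serialize(format="turtle"))
```
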
Conference Paper
With over a billion active users monthly Facebook is one of the biggest social media sites in the world. Facebook encourages friends and people with similar interests to share information such as messages, pictures, videos, website links, and other digital media. With the large number of users active on Facebook, an upgrade to Facebook's searching capability was made through the launch of graph search. Graph search is a powerful search feature which allows users to search Facebook using queries phrased in simple English. When a query is executed, the results from the search can reveal personal information of friends as well as strangers. This availability of personal information to strangers is a cyber security threat to citizens. Cyber criminals can use the graph search feature for malicious and illegal intent. This paper presents an analysis of graph search on Facebook. The purpose of the study is to highlight the amount and type of personal information that is accessible through Facebook's graph search. This is done through the design and execution of graph queries on two separate Facebook profiles. An analysis of the results is presented, together with possible negative consequences, and guidance as to best practices to follow in order to minimise the cyber security threats imposed by Facebook's graph search.
Article
Entity-linking is a natural-language-processing task that consists in identifying strings of text that refer to a particular item in some reference knowledge base. When the knowledge base is Wikipedia, the problem is also referred to as wikification (in this case, items are Wikipedia articles). Entity-linking consists conceptually of many different phases: identifying the portions of text that may refer to an entity (sometimes called “entity detection”), determining a set of concepts (candidates) from the knowledge base that may match each such portion, and choosing one candidate for each set; the latter step, known as candidate selection, is the phase on which this paper focuses. One instance of candidate selection can be formalized as an optimization problem on the underlying concept graph, where the quantity to be optimized is the average distance between the selected items. Inspired by this application, we define a new graph problem which is a natural variant of the Maximum Capacity Representative Set. We prove that our problem is NP-hard for general graphs; we propose several heuristics trying to optimize similar easier objective functions; we show experimentally how these approaches perform with respect to some baselines on a real-world dataset. Finally, in the appendix, we show an exact linear time algorithm that works under some more restrictive assumptions.
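
One easy-to-optimize surrogate of the kind these heuristics target, keeping the chosen candidates close to each other in the concept graph, can be sketched greedily. The graph and candidate sets below are toy examples, not the paper's heuristics or dataset.

```python
# A minimal sketch: greedily pick one candidate per mention so the selected
# concepts stay close to each other in a toy concept graph.
import networkx as nx

G = nx.Graph([("Paris", "France"), ("France", "Europe"),
              ("Paris_Hilton", "Celebrity"), ("Celebrity", "Europe")])
candidate_sets = [["Paris", "Paris_Hilton"], ["France"]]

def greedy_select(G, candidate_sets):
    chosen = [candidate_sets[0][0]]                 # seed with an arbitrary candidate
    for cands in candidate_sets[1:]:
        chosen.append(min(cands, key=lambda c: sum(
            nx.shortest_path_length(G, c, s) for s in chosen)))
    # One refinement pass over the first set, now that the others are fixed.
    chosen[0] = min(candidate_sets[0], key=lambda c: sum(
        nx.shortest_path_length(G, c, s) for s in chosen[1:]))
    return chosen

print(greedy_select(G, candidate_sets))   # ['Paris', 'France']
```
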
Article
Recent years have witnessed a proliferation of large-scale knowledge graphs, such as Freebase, YAGO, Google's Knowledge Graph, and Microsoft's Satori. Whereas there is a large body of research on mining homogeneous graphs, this new generation of information networks is highly heterogeneous, with thousands of entity and relation types and billions of instances of vertices and edges. In this tutorial, we will present the state of the art in constructing, mining, and growing knowledge graphs. The purpose of the tutorial is to equip newcomers to this exciting field with an understanding of the basic concepts, tools and methodologies, available datasets, and open research challenges. A publicly available knowledge base (Freebase) will be used throughout the tutorial to exemplify the different techniques.
Article
Two approaches to the problem of resolving pronoun references are presented. The first is a naive algorithm that works by traversing the surface parse trees of the sentences of the text in a particular order, looking for noun phrases of the correct gender and number. The algorithm clearly does not work in all cases, but the results of an examination of several hundred examples from published texts show that it performs remarkably well. In the second approach, it is shown how pronoun resolution can be handled in a comprehensive system for semantic analysis of English texts. The system is described, and it is shown in a detailed treatment of several examples how semantic analysis locates the antecedents of most pronouns as a by-product. Included are the classic examples of Winograd and Charniak.
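
A heavily simplified sketch in the spirit of the naive algorithm is shown below: it searches backwards through previously seen noun phrases for one agreeing in gender and number with the pronoun. The full algorithm walks surface parse trees in a specific order; that traversal, and the toy noun-phrase annotations used here, are not taken from the paper.

```python
# A simplified antecedent search by gender/number agreement (toy data).
PRONOUNS = {"he": ("masc", "sg"), "she": ("fem", "sg"),
            "it": ("neut", "sg"), "they": (None, "pl")}

def resolve(pronoun, preceding_nps):
    gender, number = PRONOUNS[pronoun]
    for np_text, np_gender, np_number in reversed(preceding_nps):
        if np_number == number and (gender is None or np_gender == gender):
            return np_text
    return None

nps = [("Winograd", "masc", "sg"), ("the city council", "neut", "sg"),
       ("the demonstrators", None, "pl")]
print(resolve("they", nps))   # -> 'the demonstrators'
```
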
Microsoft’s Bing Seeks Enlightenment with Satori
  • D Farber