Tim Finin

  • Ph.D.
  • Professor at University of Maryland, Baltimore County

About

598 Publications
148,575 Reads
32,821 Citations
Introduction
AI, knowledge representation & reasoning, and language understanding with applications to knowledge graphs, cybersecurity and more
Current institution
University of Maryland, Baltimore County
Current position
  • Professor

Publications (598)
Article
Full-text available
Website privacy policies are often lengthy and intricate. Privacy assistants help simplify policies and make them more accessible and user-friendly. The emergence of generative AI (genAI) offers new opportunities to build privacy assistants that can answer users’ questions about privacy policies. However, genAI’s reliability is a concern...
Article
High-quality knowledge graphs (KGs) play a crucial role in many applications. However, KGs created by automated information extraction systems can suffer from erroneous extractions or be inconsistent with provenance/source text. It is important to identify and correct such problems. In this paper, we study leveraging the emergent reasoning capabili...
Conference Paper
Full-text available
High-quality knowledge graphs (KGs) play a crucial role in many applications. However, KGs created by automated information extraction systems can suffer from erroneous extractions or be inconsistent with provenance/source text. It is important to identify and correct such problems. In this paper, we study leveraging the emergent reasoning capabili...
Preprint
Full-text available
Privacy policies inform users about the data management practices of organizations. Yet, their complexity often renders them largely incomprehensible to the average user, necessitating the development of privacy assistants. With the advent of generative AI (genAI) technologies, there is an untapped potential to enhance privacy assistants in answeri...
Conference Paper
Full-text available
Synthesizing information from collections of tables embedded within scientific and technical documents is increasingly critical to emerging knowledge-driven applications. Given their structural heterogeneity, highly domain-specific content, and diffuse context, inferring a precise semantic understanding of such tables is traditionally better accomp...
Preprint
Full-text available
As the adoption of smart devices continues to permeate all aspects of our lives, concerns surrounding user privacy have become more pertinent than ever before. While privacy policies define the data management practices of their manufacturers, previous work has shown that they are rarely read and understood by users. Hence, automatic analysis of pr...
Preprint
Full-text available
Entity linking is an important step towards constructing knowledge graphs that facilitate advanced question answering over scientific documents, including the retrieval of relevant information included in tables within these documents. This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base. It de...
Conference Paper
Full-text available
AI models for cybersecurity have to detect and defend against constantly evolving cyber threats. Much effort is spent building defenses for zero days and unseen variants of known cyber-attacks. Current AI models for cybersecurity struggle with these yet unseen threats due to the constantly evolving nature of threat vectors, vulnerabilities, and exp...
Preprint
Full-text available
We show how to leverage quantum annealers (QAs) to better select candidates in greedy algorithms. Unlike conventional greedy algorithms that employ problem-specific heuristics for making locally optimal choices at each stage, we use QAs that sample from the ground state of problem-dependent Hamiltonians at cryogenic temperatures and use retrieved s...
Preprint
Full-text available
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently a...
Preprint
Full-text available
The Internet of Battlefield Things (IoBT) will advance the operational effectiveness of infantry units. However, this requires autonomous assets such as sensors, drones, combat equipment, and uncrewed vehicles to collaborate, securely share information, and be resilient to adversary attacks in contested multi-domain operations. CAPD addresses this...
Conference Paper
Full-text available
We show how to leverage quantum annealers (QAs) to better select candidates in greedy algorithms. Unlike conventional greedy algorithms that employ problem-specific heuristics for making locally optimal choices at each stage, we use QAs that sample from the ground state of problem-dependent Hamiltonians at cryogenic temperatures and use retrieved...
Conference Paper
Full-text available
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently a...
Article
Full-text available
Storytelling, and the delivery of societal narratives, enable human beings to communicate, connect, and understand one another and the world around them. Narratives can be defined as spoken, visual, or written accounts of interconnected events and actors, generally evolving through some notion of time. Today, information is typically conveyed over...
Article
Full-text available
Data confidentiality is an issue of increasing importance. Several authorities and regulatory bodies are creating new laws that control how web services data is handled and shared. With the rapid increase of such regulations, web service providers face challenges in complying with these evolving regulations across jurisdictions. Providers must upda...
Article
In many social media applications, a small fraction of the members are highly linked while most are sparsely connected to the network. Such a skewed distribution is sometimes referred to as the "long tail". Popular applications like meme trackers and content aggregators mine for information from only the popular blogs located at the head of this cu...
Article
Identifying topics and concepts associated with a set of documents is a task common to many applications. It can help in the annotation and categorization of documents and be used to model a person's current interests for improving search results, business intelligence or selecting appropriate advertisements. One approach is to associate a document...
Article
Analysing complex natural phenomena often requires synthesized data that matches observed characteristics. Graph models are widely used in analyzing the Web in general, but are less suitable for modeling the Blogosphere. While blog networks resemble many properties of Web graphs, the dynamic nature of the Blogosphere, its unique structure and the e...
Article
Full-text available
We present multi-qubit correction (MQC) as a novel postprocessing method for quantum annealers that views the evolution in an open-system as a Gibbs sampler and reduces a set of excited states to a new synthetic state with lower energy value. After sampling from the ground state of a given (Ising) Hamiltonian, MQC compares pairs of excited states t...
Article
Full-text available
Cybersecurity threats continue to increase and are impacting almost all aspects of modern life. Being aware of how vulnerabilities and their exploits are changing gives helpful insights into combating new threats. Applying dynamic topic modeling to a time-stamped cybersecurity document collection shows how the significance and details of concepts f...
Conference Paper
Full-text available
Cyber-defense systems are being developed to automatically ingest Cyber Threat Intelligence (CTI) that contains semi-structured data and/or text to populate knowledge graphs. A potential risk is that fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these sys...
Preprint
Full-text available
Cyber-defense systems are being developed to automatically ingest Cyber Threat Intelligence (CTI) that contains semi-structured data and/or text to populate knowledge graphs. A potential risk is that fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these sys...
Preprint
Full-text available
We present a general post-quantum error-correcting technique for quantum annealing, called multi-qubit correction (MQC), that views the evolution in an open-system as a Gibbs sampler and reduces a set of (first) excited states to a new synthetic state with lower energy value. After sampling from the ground state of a given (Ising) Hamiltonian, MQC c...
Preprint
Full-text available
We leverage the idea of a statistical ensemble to improve the quality of quantum annealing based binary compressive sensing. Since executing quantum machine instructions on a quantum annealer can result in an excited state, rather than the ground state of the given Hamiltonian, we use different penalty parameters to generate multiple distinct quadr...
Conference Paper
Full-text available
As cybersecurity-related threats continue to increase, understanding how the field is changing over time can give insight into combating new threats and understanding historical events. We show how to apply dynamic topic models to a set of cybersecurity documents to understand how the concepts found in them are changing over time. We correlate two...
Conference Paper
Full-text available
We present a family of four novel methods for embedding knowledge graphs into real-valued tensors that capture the ordered relations found in RDF. Unlike many previous models, these can easily use prior background knowledge from users or existing knowledge graphs. We demonstrate our models on the task of predicting new facts on eight different know...
Article
Full-text available
We introduce the notion of reinforcement quantum annealing (RQA) scheme in which an intelligent agent searches in the space of Hamiltonians and interacts with a quantum annealer that plays the stochastic environment role of learning automata. At each iteration of RQA, after analyzing results (samples) from the previous iteration, the agent adjusts...
Conference Paper
Full-text available
We present a way to generate gazetteers from the Wikidata knowledge graph and use the lists to improve a neural NER system by adding an input feature indicating that a word is part of a name in the gazetteer. We empirically show that the approach yields performance gains in two distinct languages: a high-resource, word-based language, English and a...
Article
This paper, based on data from the first nationwide survey of cybersecurity among local or grassroots governments in the United States, examines how these governments manage this important function. As we have shown elsewhere, cybersecurity among local governments is increasingly important because these governments are under constant or nearly cons...
Article
We present CASIE, a system that extracts information about cybersecurity events from text and populates a semantic model, with the ultimate goal of integration into a knowledge graph of cybersecurity data. It was trained on a new corpus of 1,000 English news articles from 2017–2019 that are labeled with rich, event-based annotations and that covers...
Preprint
Full-text available
The goal of this work is to improve the performance of a neural named entity recognition system by adding input features that indicate a word is part of a name included in a gazetteer. This article describes how to generate gazetteers from the Wikidata knowledge graph as well as how to integrate the information into a neural NER system. Experiments...
Conference Paper
Full-text available
We present CASIE, a system that extracts information about cybersecurity events from text and populates a semantic model, with the ultimate goal of integration into a knowledge graph of cybersecurity data. It was trained on a new corpus of 1,000 English news articles from 2017-2019 that are labeled with rich, event-based annotations and that covers...
Preprint
Full-text available
We introduce the reinforcement quantum annealing (RQA) scheme in which an intelligent agent interacts with a quantum annealer that plays the stochastic environment role of learning automata and tries to iteratively find better Ising Hamiltonians for the given problem of interest. As a proof-of-concept, we propose a novel approach for reducing the N...
Article
Full-text available
After Action Reports (AARs) provide incisive analysis of cyber-incidents. Extracting cyber-knowledge from these sources would provide security analysts with credible information, which they can use to detect or find patterns indicative of a cyber-attack. In this paper, we describe a system to extract information from AARs, aggregate the extracted i...
Preprint
Full-text available
We propose to leverage quantum annealers to optimally select candidates in greedy algorithms. Unlike conventional greedy algorithms that employ problem-specific heuristics for making locally optimal choices at each stage, we use quantum annealers that sample from the ground state of Ising Hamiltonians at cryogenic temperatures and use retrieved sam...
Preprint
Understanding and extracting of information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be multi-themed, complex, noisy and cover diverse topics. We describe a framework that can analyze large documents and...
Article
Full-text available
Medical organizations find it challenging to adopt cloud-based Electronic Health Records (EHR) services due to the risk of data breaches and the resulting compromise of patient data. Existing authorization models follow a patient-centric approach for EHR management, where the responsibility of authorizing data access is handled at the patients' end....
Preprint
Full-text available
Keeping up with threat intelligence is a must for a security analyst today. There is a large volume of information present in 'the wild' that affects an organization. We need to develop an artificial intelligence system that scours the intelligence sources to keep the analyst updated about various threats that pose a risk to her organization. A security...
Preprint
Full-text available
We propose to reduce the original well-posed problem of compressive sensing to weighted-MAX-SAT. Compressive sensing is a novel randomized data acquisition approach that linearly samples sparse or compressible signals at a rate much below the Nyquist-Shannon sampling rate. The original problem of compressive sensing in sparse recovery is NP-hard; t...
Article
This article examines data from the first‐ever nationwide survey of cybersecurity among American local governments. The data show that these governments are under constant or near‐constant cyberattack, yet, on average, they practice cybersecurity poorly. While nearly half reported experiencing cyberattacks at least daily, one‐third said that they d...
Preprint
Full-text available
We present a family of novel methods for embedding knowledge graphs into real-valued tensors. These tensor-based embeddings capture the ordered relations that are typical in the knowledge graphs represented by semantic web languages like RDF. Unlike many previous models, our methods can easily use prior background knowledge provided by users or ext...
Article
Full-text available
We present a family of novel methods for embedding knowledge graphs into real-valued tensors. These tensor-based embeddings capture the ordered relations that are typical in the knowledge graphs represented by semantic web languages like RDF. Unlike many previous models, our methods can easily use prior background knowledge provided by users or ext...
Chapter
Full-text available
Contemporary smartphones are capable of generating and transmitting large amounts of data about their users. Recent advances in collaborative context modeling combined with a lack of an adequate permission model for handling dynamic context sharing on mobile platforms have led to the emergence of a new class of mobile applications that can access and...
Preprint
Full-text available
Compressive sensing is a novel approach that linearly samples sparse or compressible signals at a rate much below the Nyquist-Shannon sampling rate and outperforms traditional signal processing techniques in acquiring and reconstructing such signals. Compressive sensing with matrix uncertainty is an extension of the standard compressive sensing pro...
Preprint
Full-text available
Judging the veracity of a sentence making one or more claims is an important and challenging problem with many dimensions. The recent FEVER task asked participants to classify input sentences as either SUPPORTED, REFUTED or NotEnoughInfo using Wikipedia as a source of true facts. SURFACE does this task and explains its decision through a selection...
Article
In this paper, we examine cybersecurity challenges faced by America’s local governments, including the extent of cyberattacks; problems faced in preventing attacks from being successful; barriers to providing high levels of cybersecurity management; and actions that local governments believe should be taken to improve cybersecurity practice. Our...
Preprint
Full-text available
KGCleaner is a framework to identify and correct errors in data produced and delivered by an information extraction system. These tasks have been understudied, and KGCleaner is the first to address both. We introduce a multi-task model that jointly learns to predict if an extracted relation is credible and repair it if not. We evaluate...
Preprint
Full-text available
The early detection of cybersecurity events such as attacks is challenging given the constantly evolving threat landscape. Even with advanced monitoring, sophisticated attackers can spend as many as 146 days in a system before being detected. This paper describes a novel, cognitive framework that assists a security analyst by exploiting the power o...
Preprint
Full-text available
In scientific disciplines where research findings have a strong impact on society, reducing the amount of time it takes to understand, synthesize and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents to find the main themes among them and to classify other documents that have a si...
Preprint
Understanding large, structured documents like scholarly articles, requests for proposals or business reports is a complex and difficult task. It involves discovering a document's overall purpose and subject(s), understanding the function and meaning of its sections and subsections, and extracting low level entities and facts about them. In this re...
Conference Paper
Full-text available
Medical organizations find it challenging to adopt cloud-based electronic medical records services due to the risk of data breaches and the resulting compromise of patient data. Existing authorization models follow a patient-centric approach for EHR management where the responsibility of authorizing data access is handled at the patients’ end. Thi...
Chapter
Full-text available
Contemporary smartphones are capable of generating and transmitting large amounts of data about their users. Recent advances in collaborative context modeling combined with a lack of an adequate permission model for handling dynamic context sharing on mobile platforms have led to the emergence of a new class of mobile applications that can access and...
Conference Paper
Full-text available
Current language understanding approaches focus on small documents, such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents like legal briefs, proposals, technical manuals and research articles is still a challenging task. We describe a framework that can...
Article
Current language understanding approaches focus on small documents, such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents like legal briefs, proposals, technical manuals and research articles is still a challenging task. We describe a framework that can ana...
Article
Full-text available
Knowledge graphs and vector space models are both robust knowledge representation techniques with their individual strengths and weaknesses. Vector space models excel at determining similarity between concepts, but they are severely constrained when evaluating complex dependency relations and other logic-based operations that are a forte of knowled...
Conference Paper
Securing their critical documents on the cloud from data threats is a major challenge faced by organizations today. Controlling and limiting access to such documents requires a robust and trustworthy access control mechanism. In this paper, we propose a semantically rich access control system that employs an access broker module to evaluate access...
Chapter
Contemporary smartphones are capable of generating and transmitting large amounts of data about their users. Recent advances in collaborative context modeling combined with a lack of an adequate permission model for handling dynamic context sharing on mobile platforms have led to the emergence of a new class of mobile applications that can access and...
Conference Paper
Multi-relational data, like knowledge graphs, are generated from multiple data sources by extracting entities and their relationships. We often want to include inferred, implicit or likely relationships that are not explicitly stated, which can be viewed as link-prediction in a graph. Tensor decomposition models have been shown to produce state-of-...
Conference Paper
Ensuring the privacy of Big Data managed on the cloud is critical to maintaining consumer confidence. Cloud providers publish privacy policy documents outlining the steps they take to ensure data and consumer privacy. These documents are available as large text documents that require manual effort and time to track and manage. We have developed a semantical...
Conference Paper
A common Big Data problem is the need to integrate large temporal data sets from various data sources into one comprehensive structure. Having the ability to correlate evolving facts between data sources can be especially useful in supporting a number of desired application functions such as inference and influence identification. As a real world a...
Conference Paper
As of 2016, there are more mobile devices than humans on earth. Today, mobile devices are a critical part of our lives and often hold sensitive corporate and personal data. As a result, they are a lucrative target for attackers, and managing data privacy and security on mobile devices has become a vital issue. Existing access control mechanisms in...
Conference Paper
Full-text available
In the past few years, the Internet of Things has started to become a reality; however, its growth has been hampered by privacy and security concerns. One promising approach is to use Semantic Web technologies to mitigate privacy concerns in an informed, flexible way. We present CARLTON, a framework for managing data privacy for entities in a Physi...
Conference Paper
Full-text available
In recent years, there has been exponential growth in the digitization of legal documents such as case records, contracts, terms of service, regulations, privacy documents and compliance guidelines. Courts have been digitizing their archived cases and also making them available for e-discovery. On the other hand, businesses are now maintaining large...
Conference Paper
Full-text available
In order to secure vital personal and organizational systems, we require timely intelligence on cybersecurity threats and vulnerabilities. Intelligence about these threats is generally available in both overt and covert sources like the National Vulnerability Database, CERT alerts, blog posts, social media, and dark web resources. Intelligence update...
Conference Paper
Data from computer log files record traces of events involving user activity, applications, system software and network traffic. Logs are usually intended for diagnostic and debugging purposes, but their data can be extremely useful in system audits and forensic investigations. Logs created by intrusion detection systems, web servers, anti-virus an...
Conference Paper
Full-text available
This paper describes a data driven approach to studying the science of cyber security (SoS). It argues that science is driven by data. It then describes issues and approaches towards the following three aspects: (i) data driven science for attack detection and mitigation, (ii) foundations for data trustworthiness and policy-based sharing, and (iii)...
Data
Full-text available
Conference Paper
Full-text available
According to recent media reports, there has been a surge in the number of devices that are being connected to the Internet. The Internet of Things (IoT), also referred to as Cyber-Physical Systems, is a collection of physical entities with computational and communication capabilities. The storage and computing power of these devices is often limit...
Article
Full-text available
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis a...
Conference Paper
Full-text available
In this paper we describe the Unified Cybersecurity Ontology (UCO) that is intended to support information integration and cyber situational awareness in cybersecurity systems. The ontology incorporates and integrates heterogeneous data and knowledge schemas from different cybersecurity systems and most commonly used cybersecurity standards for i...
Article
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis a...
Conference Paper
Full-text available
Topic models are widely used to thematically describe a collection of text documents and have become an important technique for systems that measure document similarity for classification, clustering, segmentation, entity linking and more. While they have been applied to some non-text domains, their use for semi-structured graph data, such as RDF,...
Conference Paper
Full-text available
The number of mobile applications (apps) in major app stores exceeded one million in 2013. While app stores provide a central point for storing app metadata, they often impose restrictions on the access to this information thus limiting the potential to develop tools to search, recommend, and analyze app information. A few projects have circumvente...
Conference Paper
GeoLink is one of the building block projects within EarthCube, a major effort of the National Science Foundation to establish a next-generation knowledge infrastructure for geosciences. As part of this effort, GeoLink aims to improve data retrieval, reuse, and integration of seven geoscience data repositories through the use of ontologies. In this...
Conference Paper
Full-text available
We present Mobipedia, an integrated knowledge base with information about 1 million mobile applications (apps) such as their category, metadata (author, reviews, rating, release date), permissions and libraries used, and similar apps. The goal of Mobipedia is to integrate unstructured and semi-structured data about mobile apps from publicly avail...
Conference Paper
Full-text available
Vehicular Ad-hoc Networks (VANETs) are known to be very susceptible to various malicious attacks. To detect and mitigate these malicious attacks, many security mechanisms have been studied for VANETs. In this paper, we propose a context aware security framework for VANETs that uses the Support Vector Machine (SVM) algorithm to automatically determi...
Article
Full-text available
The Platys project focuses on developing a high-level, semantic notion of location called place. A place, unlike a geospatial position, derives its meaning from a user's actions and interactions in addition to the physical location where they occur. Our aim is to enable the construction of a large variety of applications that take advantage of pla...
