
Francesca Spezzano, PhD
Boise State University (BSU) · Department of Computer Science
About
69 Publications · 7,341 Reads
857 Citations (since 2017)
Introduction
Additional affiliations
August 2015 - present
August 2013 - August 2015
April 2012 - July 2013
Education
November 2008 - March 2012
Publications (69)
Ribozymes are RNA molecules that catalyze biochemical reactions. Self-cleaving ribozymes are a common naturally occurring class of ribozymes that catalyze site-specific cleavage of their own phosphodiester backbone. In addition to their natural functions, self-cleaving ribozymes have been used to engineer control of gene expression because they can...
The ability to study "gain of function" mutations has important implications for identifying and mitigating risks to public health and national security associated with viral infections. Numerous respiratory viruses of concern have RNA genomes (e.g., SARS and flu). These RNA genomes fold into complex structures that perform several critical functio...
Due to its rapid spread over social media and the societal threat of shifting public opinion, fake news has gained massive attention. Users' role in disseminating fake news has become unavoidable as social media grows in popularity as a daily news source. People on social media actively participate in the creation and propagation of news, f...
As news is increasingly spread through social media platforms, the problem of identifying misleading or false information (colloquially called "fake news'') has come into sharp focus. There are many factors which may help users judge the accuracy of news articles, ranging from the text itself to meta-data like the headline, an image, or the bias of...
Fake news, news deliberately designed to mislead others, is becoming a big societal threat given its fast dissemination over the Web and social media and its power to shape public opinion. Many researchers have been working to understand the underlying features that help identify fake news on the Web. Recently, Horne and Adali found, on a sm...
Depression is the most common mental illness in the US, with 6.7% of all adults experiencing a major depressive episode. Unfortunately, depression extends to teens and young users as well and researchers have observed an increasing rate in recent years (from 8.7% in 2005 to 11.3% in 2014 in adolescents and from 8.8 to 9.6% in young adults), especia...
The internet is a valuable resource to openly share information or opinions. Unfortunately, such internet openness has also made it increasingly easy to abuse these platforms through the dissemination of misinformation. As people are generally awash in information, they can sometimes have difficulty discerning misinformation propagated on these web...
Nowadays, a huge part of the information present on the Web is delivered through Social Media and User-Generated Content (UGC) platforms, such as Quora, Wikipedia, YouTube, Yelp, Slashdot.org, Stack Overflow, Amazon product reviews, and many more. Here, many users create, manipulate, and consume content every day. Thanks to the mechanism by which a...
The ICIJ Offshore Leaks Database represents a large set of relationships between people, companies, and organizations involved in the creation of offshore companies in tax-haven territories, mainly for hiding their assets. These data are organized into four networks of entities and their interactions: Panama Papers, Paradise Papers, Offshore Leaks,...
Graph databases such as chemical databases, protein databases, and RNA motif databases, are simply a collection of graphs. Querying a graph database involves the computation of a subgraph isomorphism problem (which is NP-complete) for each graph in the database. Therefore, an index is required to filter out false positives and reduce the number of...
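The filter-and-verify pattern described above can be sketched in a few lines. This is a toy illustration, not the index from the paper: it uses a node-label multiset as the index feature to prune graphs that cannot contain the query, then runs a brute-force subgraph-isomorphism check on the survivors. All names and the graph encoding here are the sketch's own assumptions.

```python
from collections import Counter
from itertools import permutations

# Toy graph encoding (an assumption of this sketch):
# a dict of node labels plus a set of undirected edges.
def make_graph(labels, edges):
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}

# Index feature: the multiset of node labels. If the query needs more
# "C"-labeled nodes than a database graph has, that graph can be pruned
# without running the expensive isomorphism test.
def label_index(g):
    return Counter(g["labels"].values())

def may_contain(db_feat, q_feat):
    return all(db_feat[lbl] >= n for lbl, n in q_feat.items())

# Exact verification (exponential): try label-preserving injections
# of query nodes into graph nodes.
def subgraph_iso(g, q):
    gn, qn = list(g["labels"]), list(q["labels"])
    for perm in permutations(gn, len(qn)):
        m = dict(zip(qn, perm))
        if any(q["labels"][u] != g["labels"][m[u]] for u in qn):
            continue
        if all(frozenset((m[u], m[v])) in g["edges"]
               for e in q["edges"] for u, v in [tuple(e)]):
            return True
    return False

# Filter-and-verify: the index prunes false candidates, the
# isomorphism test runs only on the graphs that survive filtering.
def query(db, q):
    q_feat = label_index(q)
    candidates = [g for g in db if may_contain(label_index(g), q_feat)]
    return [g for g in candidates if subgraph_iso(g, q)]
```

The verification step is still NP-complete in general; the index only reduces how often it must run, which is exactly the trade-off the abstract describes.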
Link prediction is the problem of inferring new relationships among nodes in a network that are likely to occur in the near future. Classical approaches mainly consider neighborhood structure similarity when linking nodes. However, we may also want to take into account if the two nodes are already indirectly interacting and if they will benefit fro...
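As a point of reference for the neighborhood-structure baselines the abstract mentions, here is a minimal sketch of two classical link-prediction scores (common neighbors and Adamic-Adar). The graph encoding and function names are assumptions of the sketch, not the paper's method.

```python
import math
from collections import defaultdict

# Undirected graph as an adjacency map: node -> set of neighbors.
def neighbors_from_edges(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

# Classical baseline: the more neighbors two nodes share,
# the more likely a future link between them.
def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

# Adamic-Adar refines this by down-weighting shared neighbors
# that are hubs (high degree contributes less evidence).
def adamic_adar(adj, u, v):
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

# Rank all non-adjacent pairs by a similarity score, best first.
def rank_missing_links(adj, score=common_neighbors):
    nodes = sorted(adj)
    cands = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
             if v not in adj[u]]
    return sorted(cands, key=lambda p: score(adj, *p), reverse=True)
```

These purely structural scores are the kind of "neighborhood structure similarity" baseline the abstract contrasts with richer signals such as existing indirect interaction.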
Wikipedia is based on the idea that anyone can make edits to the website to create reliable and crowd-sourced content. Yet with the cover of internet anonymity, some users make changes to the website that do not align with Wikipedia’s intended uses. For this reason, Wikipedia allows for some pages of the website to become protected, where only cert...
These days, user-generated content platforms such as social media, question-answering Websites, and open collaboration systems are a source of information for many. These platforms survive thanks to the pool of active contributors who generate content. As a consequence, they continuously face the problem of acquiring new users and retaining them in t...
We present the results of an initial analysis conducted on a real-life setting to quantify the effect of shilling attacks on recommender systems. We focus on both algorithm performance as well as the types of users who are most affected by these attacks.
In this paper, we describe our on-going research on the problem of predicting needed hyperlinks between pairs of Wikipedia pages (u,v) that are not connected, yet show readers' search navigation from u to v. We propose a solution that first estimates how long these searches will last and then predicts new hyperlinks according to descending order of...
A graph database D is a collection of graphs. To speed up query answering on graph databases, indexes are commonly used. State-of-the-art graph database indexes do not adapt or scale well to dynamic graph database use; they are static, and their ability to prune possible search responses to meet user needs worsens over time as databases change and...
In this paper, we focus on English Wikipedia, one of the main user-contributed content systems, and study the problem of predicting which users will become inactive and stop contributing to the encyclopedia. We propose a predictive model leveraging frequent patterns appearing in user's editing behavior as features to predict active vs. inactive Wik...
Effective friend classification in Online Social Networks (OSNs) has many privacy benefits. Anything posted by a user on a social network like Facebook is distributed among all of their friends. Although the user can manually control the dissemination of each post, doing so is not feasible every time. Since not all friends are the same in social netw...
In this paper, we address the problem of identifying spam users on Wikipedia and present our preliminary results. We formulate the problem as a binary classification task and propose a set of features based on user editing behavior to separate spammers from benign users. We tested our system on a new dataset we built consisting of 4.2K (half spam a...
In this paper, we present our research on the problem of ensuring the integrity of Wikipedia, the world’s biggest free encyclopedia. As anyone can edit Wikipedia, many malicious users take advantage of this situation to make edits that compromise pages’ content quality. Specifically, we present DePP, the state-of-the-art tool that detects article p...
Bad actors seriously compromise social media every day by threatening the safety of users and the integrity of content. This keynote speech will give an overview of state-of-the-art social network analysis, data mining, and machine learning techniques to detect bad actors in social media. More specifically, we will describe both general...
Wikipedia is based on the idea that anyone can make edits to the website in order to create reliable and crowd-sourced content. Yet with the cover of internet anonymity, some users make changes to the website that do not align with Wikipedia's intended uses. For this reason, Wikipedia allows for some pages of the website to become protected, where...
We consider the problem of modeling competitive diffusion in real world social networks via the notion of ChoiceGAPs which combine choice logic programs and Generalized Annotated Programs. We assume that each vertex in a social network is a player in a multi-player game (with a huge number of players) — the choice part of the ChoiceGAPs describes u...
There are many classifiers that treat entities to be classified as points in a high-dimensional vector space and then compute a separator S between entities in class \(+1\) from those in class \(-1\). However, such classifiers are usually very hard to explain in plain English to domain experts. We propose Metric Logic Programs (MLPs) which are a fr...
Nowadays, detecting health-violating restaurants is a serious problem due to the limited number of health inspectors in a city as compared to the number of restaurants. Inspectors are rarely helped by formal complaints, but many complaints are reported as reviews on social media such as Yelp. In this paper we propose new predictors to detect health...
We propose Diffusion Centrality (DC) in which semantic aspects of a social network are used to characterize vertices that are influential in diffusing a property p. In contrast to classical centrality measures, diffusion centrality of vertices varies with the property p, and depends on the diffusion model describing how p spreads. We show that DC a...
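Since the abstract does not spell out the paper's diffusion models, the sketch below estimates a diffusion-centrality-style score by Monte-Carlo simulation under one common choice of diffusion model, the independent cascade: a vertex's score is the expected number of vertices that end up activated when diffusion starts from it. All names and the graph encoding are assumptions of the sketch, not the paper's definition of DC.

```python
import random

# Directed graph encoding (an assumption): node -> list of
# (out-neighbor, activation probability) pairs.
def simulate_cascade(graph, seed, rng):
    # Independent cascade: each newly activated node gets exactly one
    # chance to activate each of its out-neighbors.
    active = {seed}
    frontier = [seed]
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

# Monte-Carlo estimate: average cascade size over repeated runs.
# Note the score depends on the diffusion model and its probabilities,
# mirroring how DC varies with the property p being diffused.
def diffusion_centrality(graph, node, runs=2000, seed=0):
    rng = random.Random(seed)
    return sum(simulate_cascade(graph, node, rng) for _ in range(runs)) / runs
```

With deterministic edges (probability 1.0 or 0.0) the estimate is exact, which makes the behavior easy to check by hand.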
We study the problem of detecting vandals on Wikipedia before any human or known vandalism detection system flags a potential vandal, so that such users can be presented early to Wikipedia administrators. We leverage multiple classical ML approaches, but develop 3 novel sets of features. Our Wikipedia Vandal Behavior (WVB) approach uses...
We consider the problem of modeling competitive diffusion in real world social networks via the notion of ChoiceGAPs which combine choice logic programs due to Saccà and Zaniolo and Generalized Annotated Programs due to Kifer and Subrahmanian. We assume that each vertex in a social network is a player in a multi-player game (with a huge number of...
Online social networks like Slashdot bring valuable information to millions of users - but their accuracy is based on the integrity of their user base. Unfortunately, there are many “trolls” on Slashdot who post misinformation and compromise system integrity. In this paper, we develop a general algorithm called TIA (short for Troll Identification A...
The STONE algorithms identify a set of operatives whose removal would maximally reduce lethality, destabilizing terrorist organizations. STONE uses three novel algorithms: the Terrorist Successor Problem (TSP), the Multiple Terrorist Successor Problem (MTSP), and the Terrorist Network Reshaping Problem (TNRP). Measuring lethality of terrorist networks, STONE a...
In recent years, probabilistic data management has received a lot of attention due to several applications that deal with uncertain data: RFID systems, sensor networks, data cleaning, scientific and biomedical data management, and approximate schema mappings. Query evaluation is a challenging problem in probabilistic databases, proved to be #P-hard...
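A tiny worked example of why query evaluation over uncertain data is hard: over a tuple-independent probabilistic table, the exact probability of a query answer can be computed by enumerating all 2^n possible worlds, which explodes with table size (hence the #P-hardness the abstract cites in the general case). This brute-force sketch is illustrative only; all names are assumptions.

```python
from itertools import product

# Tuple-independent probabilistic relation (an assumed encoding):
# a list of (tuple, marginal probability) pairs, tuples independent.
def query_probability(relation, predicate):
    """Exact probability that at least one tuple satisfying `predicate`
    is present, by enumerating every possible world. Each world is a
    subset of tuples; its probability is the product of per-tuple
    presence/absence probabilities."""
    total = 0.0
    for world in product([False, True], repeat=len(relation)):
        w_prob = 1.0
        for present, (t, p) in zip(world, relation):
            w_prob *= p if present else (1.0 - p)
        if any(pres and predicate(t)
               for pres, (t, _) in zip(world, relation)):
            total += w_prob
    return total
```

For two independent tuples each present with probability 0.5 and both satisfying the query, the answer probability is 1 - 0.5 * 0.5 = 0.75; practical systems avoid this exponential enumeration whenever the query structure allows.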
Recently, there has been an increasing interest in the bottom-up evaluation of the semantics of logic programs with complex terms. The presence of function symbols in the program may render the ground instantiation infinite, and finiteness of models and termination of the evaluation procedure, in the general case, are not guaranteed anymore. Since...
The aim of this paper is to present more general criteria and techniques for chase termination. We first present extensions of the well-known stratification criterion and introduce a new criterion, called local stratification, which generalizes both super-weak acyclicity and stratification-based criteria (including the class of constraints which ar...
Defining “good” semantics for non-monotonic queries and for aggregate queries in the context of data exchange has turned out to be a challenging problem for a number of reasons, including the dependence of the semantics on the concrete syntactic representation of the schema mapping at hand. In this paper, we revisit the semantics of aggregate queri...
This paper focuses primarily on the Person Successor Problem (PSP): when a terrorist is removed from a terrorist network, who is most likely to take his place? We leverage the solution to PSP to predict a new terrorist network after removal of a set of terrorists and to answer the question: which set of k (k > 0) terrorists should be removed in ord...
The chase has long been used as a central tool to analyze dependencies and their effect on queries. It has been applied to different relevant problems in database theory such as query optimization, query containment and equivalence, dependency implication, and database schema design. Recent years have seen a renewed interest in the chase as an impo...
Recently there has been an increasing interest in the bottom-up evaluation of the semantics of logic programs with complex terms. The main problem due to the presence of functional symbols in the head of rules is that the corresponding ground program could be infinite and that finiteness of models and termination of the evaluation procedure is not...
Several database areas such as data exchange and integration share the problem of fixing database instance violations with respect to a set of constraints. The chase algorithm solves such violations by inserting tuples and setting the value of nulls. Unfortunately, the chase algorithm may not terminate and the problem of deciding whether the chase...
This chapter presents an overview of the well-known chase termination conditions. They guarantee for every database D the termination of all chase sequences.
The Chase is a fixpoint algorithm enforcing satisfaction of data dependencies in databases. It was proposed more than 30 years ago by Aho et al. [1979a], Maier et al. [1979] and has received increasing attention in recent years in both database theory and practical applications.
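The fixpoint behavior described above can be illustrated with a minimal chase step for one kind of data dependency, assuming a single inclusion dependency R[1] ⊆ S[1]: whenever R holds a value in its first column that S lacks, an S-tuple is inserted with that value and fresh labeled nulls in the remaining positions, until the dependency is satisfied. The encoding and names are this sketch's assumptions, not the book's formalization.

```python
import itertools

# Relations as lists of tuples (an assumed encoding).
def chase_inclusion(R, S, s_arity):
    """Chase a single inclusion dependency R[1] ⊆ S[1] to its fixpoint:
    insert S-tuples with fresh labeled nulls (N0, N1, ...) until every
    first-column value of R also appears in the first column of S."""
    counter = itertools.count()  # source of fresh labeled nulls
    S = list(S)
    while True:
        have = {t[0] for t in S}
        missing = [t[0] for t in R if t[0] not in have]
        if not missing:
            return S  # fixpoint reached: the dependency is satisfied
        for v in missing:
            nulls = tuple(f"N{next(counter)}" for _ in range(s_arity - 1))
            S.append((v,) + nulls)
```

With a single inclusion dependency the fixpoint is always reached; termination becomes the hard question once dependencies can feed each other and keep generating new nulls, which is the problem the surrounding chapters study.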
A database is a collection of data organized to model relevant aspects of reality and support processes requiring this information.
This chapter describes several database applications in which some typical database problems arise, and where the chase represents a fundamental tool for their solution. In every case, the termination of the chase algorithm guarantees the applicability of the proposed resolution methods.
In the presence of inconsistencies, the aim of the chase algorithm is to make the database consistent by adding tuples and setting null values. In such a context, it can be assumed that the input database is sound and that inconsistencies are due to missing tuples. However, in some cases, the assumption that the input database is sound is not feasi...
The chase is an important tool which was initially proposed for reasoning about data dependencies and queries in the presence of dependencies. In this chapter we study special classes of data dependencies and normal forms for relations. Normal forms are introduced to eliminate or minimize redundant information, that is single attribute values or tu...
The problem of incomplete information in relational databases has been investigated since the introduction of the relational data model. From a semantic standpoint, an incomplete database is a set of (complete) databases, also called possible worlds. Thus, instead of completely specifying one state of the world, an incomplete database provides a se...
Consistency problems arise in many fundamental database applications as data exchange, data integration, data warehouse and many others. The chase algorithm is a fundamental and useful tool fixing inconsistencies of database instances with respect to a set of data dependencies. It is well known that the chase algorithm may be nonterminating and sev...
The Chase is a fixpoint algorithm enforcing satisfaction of data dependencies in databases. Its execution involves the insertion of tuples with possible null values and the changing of null values which can be made equal to constants or other null values. Since the chase fixpoint evaluation could be non-terminating, in recent years the problem know...
Several database areas such as data exchange and integration share the problem of fixing database instance violations with respect to a set of constraints. The chase algorithm solves such violations by inserting tuples and setting the value of nulls. Unfortunately, the chase algorithm may not terminate and the problem of deciding whether the chase...
This paper addresses the problem of managing inconsistent databases, that is, databases violating integrity constraints. Different approaches for repairing and querying inconsistent databases in the presence of full integrity constraints are discussed. Particular settings where restricted classes of constraints (tuple and equality generating depend...
Several techniques for repairing and querying inconsistent databases have been proposed in recent years. In most of them, the repair strategy consists in performing minimal sets of tuple insertions and deletions. The consistent query answers are those query answers which can be derived from every consistent repaired database. A problem with such te...