
Anthony Tomasic - Carnegie Mellon University
127 Publications
17,336 Reads
4,151 Citations
Publications (127)
Current architectures for main-memory online transaction processing (OLTP) database management systems (DBMS) typically use random scheduling to assign transactions to threads. This approach achieves uniform load across threads but it ignores the likelihood of conflicts between transactions. If the DBMS could estimate the potential for transaction...
This paper explores the integration of two AI subdisciplines employed in the development of artificial agents that exhibit intelligent behavior: Large Language Models (LLMs) and Cognitive Architectures (CAs). We present three integration approaches, each grounded in theoretical models and supported by preliminary empirical evidence. The modular app...
This article explores the integration of two AI subdisciplines employed in the development of artificial agents that exhibit intelligent behavior: Large Language Models (LLMs) and Cognitive Architectures (CAs). We present three integration approaches, each grounded in theoretical models and supported by preliminary empirical evidence. The modular a...
This paper presents exploratory work on how the key components of transformer-based neural language models (self-attention mechanism and large-scale pre-trained models) can be leveraged to perform context retrieval, symbol manipulation, and propositional reasoning. We fine-tuned GPT-X language models to learn Prolog programs that simulate the propo...
Recently, transformer language models have been applied to build both task- and non-task-oriented dialogue systems. Although transformers perform well on most NLP tasks, they perform poorly on context retrieval and symbolic reasoning. Our work aims to address this limitation by embedding the model in an operational loop that blends both natur...
Personalization of user experience has a long history of success in the HCI community. More recently the community has focused on adaptive user interfaces, supported by machine learning, that reduce interaction effort and improve user experience by collapsing transactions and pre-filtering results. However, generally, these more recent results ha...
Experiences of contingent responsivity during shared book reading predict better learning outcomes. However, it is unclear whether contingent responsivity from a digital book could provide similar support for children. The effects on story recall and engagement interacting with a digital book that responded contingently on children’s vocalizations...
Web designers use visual cues such as layout and typography to make pages easier to navigate and understand. Yet, screen readers generally ignore these features and present page information in a linear audio stream. We investigate whether transcoding the visual semantics of grid-based layouts to tables supports better navigation. In a controlled ex...
Question answering (QA) provides answers to a wide range of questions but is still limited in the complexity of reasoning and the breadth of accessible data sources. In this paper, we describe a dataset and baseline results for a question answering system that utilizes web tables. The dataset is derived from commonly asked questions on the web, and...
Current main memory database system architectures are still challenged by high contention workloads and this challenge will continue to grow as the number of cores in processors continues to increase [23]. These systems schedule transactions randomly across cores to maximize concurrency and to produce a uniform load across cores. Scheduling never c...
Pre-trained word embeddings are the primary method for transfer learning in several Natural Language Processing (NLP) tasks. Recent works have focused on using unsupervised techniques such as language modeling to obtain these embeddings. In contrast, this work focuses on extracting representations from multiple pre-trained supervised models, which...
There is great promise in creating effective technology experiences during situationally-induced impairments and disabilities through the combination of universal design and adaptive interfaces. We believe this combination is a powerful approach for meeting the UX needs of people with disabilities, including those which are temporary in nature. Res...
In this paper, we describe a dataset and baseline results for a question answering system that utilizes web tables. It contains commonly asked questions on the web and their corresponding answers found in tables on websites. Our dataset is novel in that every question is paired with a table of a different signature. In particular, the dataset contains two...
Current main memory database system architectures are still challenged by high contention workloads and this challenge will continue to grow as the number of cores in processors continues to increase. These systems schedule transactions randomly across cores to maximize concurrency and to produce a uniform load across cores. Scheduling never consid...
In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it poss...
Participatory sensing systems use people and their smartphones as a sensing infrastructure, and getting people to make contributions remains a critical challenge. Little work details how system designers should combine different interactions to increase coverage of service location. Tiramisu, a participatory sensing system, invites transit riders t...
Machine learning improves mobile user experience. Interestingly, envisioning apps with adaptive interfaces that reduce navigation and selection effort is not standard UX practice. When implementing an adaptive UI for our mobile transit app, we encountered a number of problems. Our original design did not log necessary information nor did it induce...
Today, many data-driven web pages present information in a way that is difficult for blind and low vision users to navigate and to understand. EnTable addresses this challenge. It re-writes confusing and complicated template-based data sets as accessible tables. EnTable allows blind and low vision users to submit requests for pages they wish to acc...
The web contains many datasets presented visually, whose lack of semantic markup renders them difficult to understand and navigate using a screen reader. In this work, we explore the possibility of understanding the semantics of web datasets by asking sighted web users to manually scrape web pages into spreadsheets. Web users constitute a huge popu...
Crowdsourced mobile sensing systems provide a counterpoint to the idea of fully automated sensing systems by transferring some or all of the sensing duties to the end users. Humans can easily sense in some ways that are impossible for machines to sense, leading to hybrid crowdsourced-automated systems. However, this transfer of sensing to humans co...
Although advances in technology now enable people to communicate ‘anytime, anyplace’, it is not clear how citizens can be motivated to actually do so. This paper evaluates the impact of three principles of psychological empowerment, namely perceived self-efficacy, sense of community and causal importance, on public transport passengers’ motivation t...
Participatory sensing systems (PSS) require frequent injection of information that has a short shelf-life. The use of crowds to gather information for PSS is therefore particularly challenging. In this study, we explore the impact of two policies on user contributions. A quid-pro-quo policy exchanges contributions from users for access to critical...
In various embodiments, a method for processing a user request is provided. The method may include receiving input data from a user including at least natural language associated with a user request; analyzing the user input data with an intermediary agent; selecting at least one form based on analyzing the user input data; and, executing at least...
Many mobile applications rely on location information gained from location services on mobile devices. However, continuously tracking the device location with high accuracy drains the battery quickly. Furthermore, sensing the same location can be redundant when multiple devices are co-located. In this paper, we develop a crowdsourcing-based locatio...
Mixed-initiative message-augmenting agent systems and methods that provide users with tools that allow them to respond to messages, such as email messages, containing requests for information or otherwise requiring responses that require information that needs to be retrieved from one or more data sources. The systems and methods allow users to tra...
Extensive interviews with riders of the Pittsburgh, Pennsylvania, bus system revealed that, as the top priority, riders wanted to know the actual arrival time of buses. Following a universal design approach, a system called Tiramisu was created to foster a greater sense of community between riders and transit bus service providers. The design focus...
Office administrators are frequently asked to create ad hoc reports based on web accessible data. The web contains the desired data but does not allow efficient access in the way the administrator needs, prompting a tedious and labor-intensive task of retrieving and integrating the required data. Mixer is a programming-by-demonstration (PBD) tool e...
Crowd-sourcing social computing systems represent a new material for HCI designers. However, these systems are difficult to work with and to prototype, because they require a critical mass of participants to investigate social behavior. Service design is an emerging research area that focuses on how customers co-produce the services that they use,...
Crowd-sourcing social computing systems represent a new material for HCI designers. Crowd sourcing has been successfully applied to many areas, but these systems are difficult to reliably design and prototype. One source of uncertainty during design is the behavior of the crowd at different levels of scale. A second source of uncertainty during des...
One formidable problem in language technology is the word sense disambiguation (WSD) problem: disambiguating the true sense of a word as it occurs in a sentence (e.g., recognizing whether the word "bank" refers to a river bank or to a financial institution). This paper explores a strategy for harnessing the linguistic abilities of human beings to d...
The recent advances in web 2.0 technologies and the rapid adoption of smart phones raise many opportunities for public services to improve their services by engaging their users (who are also owners of the service) in co-design: a dialog where users help design the services they use. To investigate this opportunity, we began a service design proje...
A promising approach to scaling Web applications is to distribute the server infrastructure on which they run. This approach, unfortunately, can introduce latency between the application and database servers, which in turn increases the network latency of Web interactions for the clients (end users). In this paper we introduce the concept of source...
A key challenge for mixed-initiative systems is to create a shared understanding of the task between human and agent. To address this challenge, we created a mixed-initiative interface called Mixer to aid administrators with automating tedious information-retrieval tasks. Users initiate communication with the agent by constructing a form, creating...
The backend database system is often the performance bottleneck when running web applications. A common approach to scale the database component is query result caching, but it faces the challenge of maintaining a high cache hit rate while efficiently ensuring cache consistency as the database is updated. In this paper we introduce Ferdinand, th...
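The caching trade-off described in the preceding abstract can be illustrated with a minimal sketch. This is not Ferdinand's design, which the truncated abstract does not describe; the class `QueryResultCache`, its methods, and the coarse table-level invalidation rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[object, ...]


class QueryResultCache:
    """Hypothetical query-result cache with table-level invalidation."""

    def __init__(self, run_query: Callable[[str], List[Row]]) -> None:
        self._run_query = run_query  # executes a query against the backend
        self._results: Dict[str, List[Row]] = {}
        self._queries_by_table: Dict[str, Set[str]] = {}
        self.hits = 0
        self.misses = 0

    def read(self, query: str, tables: Set[str]) -> List[Row]:
        """Serve a read from the cache, falling back to the backend on a miss."""
        if query in self._results:
            self.hits += 1
            return self._results[query]
        self.misses += 1
        rows = self._run_query(query)
        self._results[query] = rows
        # Remember which tables the query reads so updates can invalidate it.
        for table in tables:
            self._queries_by_table.setdefault(table, set()).add(query)
        return rows

    def notify_update(self, table: str) -> None:
        """Drop every cached result that reads the updated table."""
        for query in self._queries_by_table.pop(table, set()):
            self._results.pop(query, None)
```

Invalidating by table keeps the cache consistent but lowers the hit rate, because an update evicts results it may not actually affect. A finer-grained invalidation rule would trade more bookkeeping for fewer needless evictions.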
Ratings-based recommender systems typically predict user preferences for items based on the user's preference history, information about items, and the preferences of similar users. In content-based recommending, the similarities between items the user has previously expressed interest in form the basis for recommending new items. There are a num...
Email client software is widely used for personal task management, a purpose for which it was not designed and is poorly suited. Past attempts to remedy the problem have focused on adding task management features to the client UI. RADAR uses an alternative approach modeled on a trusted human assistant who reads mail, identifies task-relevant messag...
Current database performance optimizations stop at the border between the database application and the database system, focusing either on improving the performance of just the database system or the application's execution in isolation of the other. We argue that typical database application design enables a more holistic analysis that maintai...
For their scalability needs, data-intensive Web applications can use a database scalability service (DBSS), which caches applications' query results and answers queries on their behalf. One way for applications to address their security/privacy concerns when using a DBSS is to encrypt all data that passes through the DBSS. Doing so, however, causes...
Workers in organizations frequently request help from assistants by sending request messages that express information intent: an intention to update data in an information system. Human assistants spend a significant amount of time and effort processing these requests. For example, human-resource assistants process requests to update personnel reco...
Each month, more attacks are launched with the aim of making web users believe that they are communicating with a trusted entity for the purpose of stealing account information, logon credentials, and identity information in general. This attack method, commonly known as "phishing," is most commonly initiated by sending out emails with links to...
Today many workers spend too much of their time translating their co-workers' requests into structures that information systems can understand. This paper presents the novel interaction design and evaluation of VIO, an agent that helps workers translate requests. VIO monitors requests and makes suggestions to speed up the translation. VIO allows us...
Administrators frequently perform data integration "by hand" on the desktop as part of the execution of administrative tasks. This position paper discusses the application of mixed-initiative design to this problem. This design style leverages the interaction between a user and an intelligent assistant, minimizing the effort required to execute a t...
For Web applications in which the database component is the bottleneck, scalability can be provided by a third-party Database Scalability Service Provider (DSSP) that caches application data and supplies query answers on behalf of the application. Cost-effective DSSPs will need to cache data from many applications, inevitably raising concerns about...
Large organizations with sophisticated infrastructures have large form-based systems that manage the interaction between the user community and the infrastructure. In many cases, when a user needs to complete a form to accomplish a task, the user e-mails a description of the task to the appropriate form expert. In many cases this description is inc...
Information integration is one of the oldest and most important computer science problems: Information from diverse sources must be combined, so that users can access and manipulate the information in a unified way. One of the central problems in information ...
We describe a method based on "tweaking" an existing learned sequential classifier to change the recall-precision tradeoff, guided by a user-provided performance criterion. This method is evaluated on the task of recognizing personal names in email and newswire text, and proves to be both simple and effective.
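The idea of tweaking a trained classifier toward a user-chosen recall-precision tradeoff can be sketched as follows. The sketch assumes a plain per-item scoring classifier rather than the sequential classifier of the paper, takes the F-beta measure as the user-provided criterion, and adjusts a single additive offset on held-out data; the names `f_beta` and `tune_offset` are hypothetical.

```python
from typing import Sequence


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score; beta > 1 favors recall, beta < 1 favors precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def tune_offset(scores: Sequence[float], labels: Sequence[bool],
                beta: float, candidates: Sequence[float]) -> float:
    """Pick the offset maximizing F-beta when predicting score + offset > 0."""
    best_offset, best_value = candidates[0], -1.0
    for offset in candidates:
        predicted = [s + offset > 0 for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        fn = sum((not p) and y for p, y in zip(predicted, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        value = f_beta(precision, recall, beta)
        if value > best_value:  # ties keep the earliest candidate
            best_offset, best_value = offset, value
    return best_offset
```

The underlying model is never retrained: only the decision boundary moves, which is what keeps this kind of adjustment simple.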
Current publish/subscribe systems offer a range of expressive subscription languages for constraints. However, classical systems restrict the publish operation to be a single published object that contains only constants and no constraints. We introduce symmetric publish/subscribe, a novel generalization of publish/subscribe where bo...
We experimentally evaluate components of a system that learns to analyze natural-language requests to update information on a database-backed website.
Given a particular update request to a WWW system, users are faced with the navigation problem of finding the correct form to accomplish the update request. In a large system, such as SAP with about 10,000 relations for the standard installation, users are faced with a sea of thousands of forms to navigate. For familiar tasks, users have various ai...
Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating each new data source into the system. Database implementors must deal with the transformation of queries...
Many collections of scientific data in particular disciplines are available today on the World Wide Web. Most of these data sources are compliant with some standard for interoperable access. In addition, sources may support a common semantics, i.e., a shared meaning for the data types and their domains. However, sharing data among a global communit...
This paper describes the e-XML component suite, a modular product for integrating heterogeneous data sources under an XML schema and querying in real-time the integrated information using XQuery, the emerging W3C standard for XML query. We describe the two main components of the suite, i.e., the repository for warehousing XML and the mediator for d...
Thesaurus: 11382655, 3695; Conference: 7246145, 11934; Organization: 9374199, 62051; Class: 4211136, 2962; Numbers (ISBN,...): 2445828, 12637; Report Numbers: 7833, 7508; Totals: 130,340,123, 1,089,614. Table VI. Storage Estimates for bGlOSS and a Full Text Index for the INSPEC Database. Size of Full Index versus bGlOSS/threshold#0: Index 248.60 MBytes, 2.60 MBytes; Total 251.73...
From the beginning of 1994 to the end of 1996, the IRO-DB ESPRIT project developed tools for accessing relational and object-oriented databases in an integrated way. The system is based on the ODMG standard as pivot model and language. It consists of three layers. The local layer provides for an ODMG interface to heterogeneous DBMSs, the communicat...
The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to...
Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators with schemas and views that must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database i...
The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result si...
Given an intensional database (IDB) and an extensional database (EDB), the view update problem translates updates on the IDB into updates on the EDB. One approach to the view update problem uses a translation language to specify the meaning of a view update. In this paper we prove properties of a translation language. This approach to the view update...
With the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must ef...
We discuss the problem of unavailable data sources in the context of two mediator-based applications. We discuss the limitations of existing systems with respect to this problem and describe a novel evaluation model that overcomes these shortcomings. Mediator systems are being deployed in various environments to provide query access t...
Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources may vary widely due to network congestion, link failure, and other problems. In this paper we examine a new class of methods, called query scrambli...
Accessing numerous widely-distributed data sources poses significant new challenges for query optimization and execution. Congestion or failure in the network introduces highly variable response times for wide-area data access. This paper is an initial exploration of solutions to this variability. We investigate a class of dynamic, run-time query pl...
In this article, we show that the GlOSS summaries can be employed as the representation for summary information in a large-scale system. In particular, we offer evidence that GlOSS can effectively locate databases of interest even in a system of hundreds of databases. Our metric for effectiveness is based on selecting databases that contain the larges...
The scientific community, public organizations and administrations have generated a large amount of data concerning the environment. There is a need to allow sharing and exchange of this type of information by various kinds of users including scientists, decision-makers and public authorities. Metadata arises as the solution to support these requir...
Distributed systems require declarative access to diverse information sources. One approach to solving this heterogeneous distributed database problem is based on mediator architectures. In these architectures, mediators accept queries from users, process them with respect to wrappers, and return answers. Wrappers provide access to underlying sourc...
Mediator systems are used today in a wide variety of unreliable environments. When processing a query, a mediator may try to access a data source which is unavailable. In this situation, existing systems either silently ignore unavailable data sources or generate an error. This behavior is inefficient in environments with a non-negligible probability...
From the beginning of 1994 to the end of 1996, the IRO-DB ESPRIT project developed tools for accessing relational and object-oriented databases in an integrated way. The system is based on the ODMG standard as pivot model and language. It consists of three layers. The local layer provides for an ODMG interface to heterogeneous DBMSs, the communication laye...
Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources can vary widely due to network congestion, link failure, and other problems. In such an unpredictable environment, the traditional iterator-based...
The THETIS system is viewed as a digital library of data repositories and visualization tools. In addition to its index/search capacity, the digital library also provides data querying, data combining, and data visualization capabilities. This paper presents an overview of the design of THETIS, a system that addresses the frequent requirement of sc...
Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating new sources into the model. Database implementors must deal with the translation of queries between...
The Distributed Information Search COmponent (DISCO) is a prototype heterogeneous distributed database that accesses underlying data sources. The DISCO prototype currently focuses on three central research problems in the context of these systems. First, since the capabilities of each data source are different, transforming queries into subqueries o...
The Distributed Information Search COmponent (DISCO) is a prototype heterogeneous distributed database that accesses underlying data sources. The DISCO prototype currently focuses on three central research problems in the context of these systems. First, since the capabilities of each data source are different, transforming queries into subqueries o...
Distributed systems require declarative access to diverse sources of information. One approach to solving this heterogeneous distributed database problem is based on mediator architectures. In these architectures, mediators accept queries from users, process them with respect to wrappers, and return answers. Wrappers provide access to underly...
Much of the world's information is stored electronically in data sources. The data sources can be full-fledged databases, simple files, HTML pages or specialized data sources that possess diverse query processing capabilities. The common architecture to integrate such sources consists of mediators that give a global view over the content of all s...
The Internet contains a large amount of structured data, accessible via ftp files, ODBC connections, or embedded in HTML documents. This data cannot be effectively used for several reasons. First, the structure of the data (type, format, etc.) is usually not described. Second, no tools exist for locating sources for structured data. Third, acces...
The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which...
Many heterogeneous database system products and prototypes exist today; they will soon be deployed in a wide variety of environments. All existing systems suffer from an Achilles' heel: if some sources are unavailable when accessed, these systems either silently ignore them or generate an error, i.e. they ungraciously fail. This behavior is impro...
A very large number of data sources on environment, energy, and natural resources are available worldwide. Unfortunately, users usually face several problems when they want to search and use environmental information. In this paper, we analyze these problems. We describe a conceptual analysis of the four major tasks in the production of environment...
Accessing data from numerous widely distributed sources poses significant new challenges for query optimization and execution. Congestion and failures in the network can introduce highly variable response times for wide area data access. The paper is an initial exploration of solutions to this variability. We introduce a class of dynamic, run time...
In a wide-area environment, the time required to obtain data from remote sources can vary unpredictably due to network congestion, link failure or other problems. Traditional techniques for query optimization and query execution do not cope well with such unpredictability. The static nature of those techniques prevents them from adapting to remote...
Many information-retrieval systems provide access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this article, this database is studied by using a trace-driven simulation. It focuses on a ph...
The availability of large numbers of network information sources has led to a new problem: finding which text databases (out of perhaps thousands of choices) are the most relevant to a query. We call this the text-database discovery problem. Our solution to this problem, GlOSS--Glossary-Of-Servers Server, keeps statistics on the available databases...
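A minimal sketch of how per-database statistics might be used to rank text databases for a query, in the spirit of the preceding abstract. It assumes, purely for illustration, that each summary holds a document count and per-term document frequencies, that the query is a conjunction of terms, and that terms occur independently within a database. The names `DatabaseSummary`, `estimate_matches`, and `rank_databases` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DatabaseSummary:
    name: str
    num_documents: int
    doc_frequency: Dict[str, int]  # term -> number of documents containing it


def estimate_matches(summary: DatabaseSummary, query_terms: Sequence[str]) -> float:
    """Estimate how many documents contain every query term.

    Assumes term independence: size * product(df(term) / size).
    """
    if summary.num_documents == 0:
        return 0.0
    estimate = float(summary.num_documents)
    for term in query_terms:
        estimate *= summary.doc_frequency.get(term, 0) / summary.num_documents
    return estimate


def rank_databases(summaries: Sequence[DatabaseSummary],
                   query_terms: Sequence[str]) -> List[str]:
    """Return database names by decreasing estimate, dropping zero estimates."""
    scored = [(estimate_matches(s, query_terms), s.name) for s in summaries]
    scored.sort(key=lambda pair: -pair[0])
    return [name for estimate, name in scored if estimate > 0]
```

The broker only needs the compact summaries, not the documents themselves, which is why such statistics can stay small relative to a full-text index.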
The popularity of information retrieval has led users to a new problem: finding which text databases (out of thousands of candidate choices) are the most relevant to a user. Answering a given query with a list of relevant databases is the text database discovery problem. The first part of this paper presents a practical method for attacking this pr...