Alexandros Labrinidis

University of Pittsburgh | Pitt · Department of Computer Science

Doctor of Philosophy

About

158
Publications
29,254
Reads
4,321
Citations
Additional affiliations
September 2002 - present
University of Pittsburgh
Position
  • Professor (Associate)

Publications (158)
Article
Full-text available
The pervasiveness of public displays is prompting an increased need for “fresh” content to be shown, that is highly engaging and useful to passersby. As such, live or time-sensitive content is often shown in conjunction with “traditional” static content, which creates scheduling challenges. In this work, we propose a utility-based framework that ca...
Article
Full-text available
Traditional recommender systems help users find the most relevant products or services to match their needs and preferences. However, they overlook the preferences of other sides of the market (aka stakeholders) involved in the system. In this paper, we propose to use contextual bandit algorithms in multi-stakeholder platforms where a multi-sided r...
Preprint
Full-text available
Subject selection plays a critical role in experimental studies, especially ones with human subjects. Anecdotal evidence suggests that many such studies, done at or near university campus settings suffer from selection bias, i.e., the too-many-college-kids-as-subjects problem. Unfortunately, traditional sampling techniques, when applied over biased...
Article
Public transit is one of the first things that come to mind when someone talks about “smart cities.” As a result, many technologies, applications, and infrastructure have already been deployed to bring the promise of the smart city to public transportation. Most of these have focused on answering the question, “When will my bus arrive?”; little has...
Preprint
Full-text available
In the absence of pharmaceutical interventions to curb the spread of COVID-19, countries relied on a number of nonpharmaceutical interventions to fight the first wave of the pandemic. The most prevalent one has been stay-at-home orders, whose goal is to limit the physical contact between people, which consequently will reduce the number of seco...
Conference Paper
Full-text available
Stream Processing Engines (SPEs) are used for real-time and continuous processing with stateful operations. This type of processing poses numerous challenges due to its associated complexity, unpredictable input, and need for timely results. As a result, users tend to over-provision resources, and online scaling is required in order to overcome over...
Chapter
Full-text available
There is an increasing demand for real-time analysis of large volumes of data streams that are produced at high velocity. The most recent data needs to be processed within a specified delay target in order for the analysis to lead to actionable results. To this end, in this paper, we present an effective solution for detecting the correlation of suc...
Chapter
Data Stream Management Systems (DSMSs) performing online analytics rely on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). In this paper, we study the problem of generating high quality execution plans of ACQs in DSMSs deployed on multi-node (multi-core and multi-processor) distributed environments. Towards this goa...
Conference Paper
The pervasiveness of public displays is prompting an increased need for "fresh" content to be shown, that is highly engaging and useful to passersby. As such, live or time-sensitive content is often shown in conjunction with "traditional" static content, which creates scheduling challenges. In this work, we propose a utility-based framework and a n...
Chapter
Distributed Data Stream Processing Systems (DDSPS) execute on transient data flowing through long-running, continuous, streaming queries, grouped together in query networks. Often, these continuous queries are outsourced by the querier to third-party computing platforms to help control the cost and maintenance associated with owning and operating s...
Conference Paper
Data Stream Processing Systems (DSPSs) execute long-running, continuous queries over transient streaming data, often making use of outsourced, third-party computational platforms. However, third-party outsourcing can lead to unwanted violations of data providers' access controls or privacy policies, as data potentially flows through untrusted infra...
Conference Paper
Full-text available
More and more organizations (commercial, health, government and security) currently base their decisions on real-time analysis of fast arriving, large volumes of data streams. For such analysis to lead to actionable information in real-time and at the right time, the most recent data needs to be processed within a specified delay target. Effective...
Conference Paper
Data stream processing is becoming essential in most current advanced scientific or business applications as data production rates are increasing. Different companies compete to efficiently ingest high velocity data and apply some form of computation in order to make better business decisions. In order to successfully compete in this environment, c...
Conference Paper
Full-text available
The rapid growth of monitoring applications has led to unprecedented amounts of generated time series data. Data analysts typically explore such large volumes of time series data looking for valuable insights. One such insight is finding pairs of time series, in which subsequences of values exhibit certain levels of correlation. However, since expl...
Conference Paper
With data becoming available in larger quantities and at higher rates, new data processing paradigms have been proposed to handle high-volume, fast-moving data. Data Stream Processing is one such paradigm wherein transient data streams flow through sets of continuous queries, only returning results when data is of interest to the querier. To avoid...
Article
Full-text available
Data stream management systems (DSMSs) offer the most effective solution for processing data streams by efficiently executing continuous queries (CQs) over the incoming data. CQs inherently have different levels of criticality and hence different levels of expected quality of service (QoS) and quality of data (QoD). Adhering to such expected QoS/Qo...
Conference Paper
Full-text available
With the explosion of large, dynamic graph datasets from various fields, graph partitioning and repartitioning are becoming more and more critical to the performance of many graph-based Big Data applications, such as social analysis, web search, and recommender systems. However, well-studied graph (re)partitioners usually assume a homogeneous and...
Conference Paper
Data Stream Management Systems performing on-line analytics rely on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). The state-of-the-art WeaveShare optimizer uses the Weavability concept in order to selectively combine ACQs for partial aggregation and produce high quality execution plans. However, WeaveShare does no...
Conference Paper
Stream query processing is becoming increasingly important as more time-oriented data is produced and analyzed online. Stream processing is typically memory-resident for the fastest processing of ephemeral data. With workload consolidation, processing separate data streams on the same processor may lead to harmful contention between query workloads...
Conference Paper
The ever increasing supply of data is bringing a renewed attention to query personalization. Query personalization is a technique that utilizes user preferences with the goal of providing relevant results to the users. Along with preferences, diversity is another important aspect of query personalization especially useful during data exploration. T...
Conference Paper
Query personalization can be an effective technique in dealing with the data scalability challenge, primarily from the human point of view, i.e., making big data easier to use. In order to customize their query results, users need to express their preferences in a simple and user-friendly manner. In this paper, we present a graph-based theoretical...
Conference Paper
Full-text available
Data Stream Management Systems (DSMS) are crucial for modern high-volume/high-velocity data-driven applications, necessitating a distributed approach to processing them. In addition, data providers often require certain levels of confidentiality for their data, especially in cases of user-generated data, such as those coming out of physical activit...
Article
Full-text available
A constantly growing amount of high-quality information resides in databases and is guarded behind forms that users fill out and submit. The Hidden Web comprises all these information sources that conventional web crawlers are incapable of discovering. In order to excavate and make available meaningful data from the Hidden Web, previous work has fo...
Article
With data becoming available in larger quantities and at higher rates, new data processing paradigms have been proposed to handle large and fast data. Data Stream Processing is one such paradigm wherein transient data flows as streams through sets of continuous queries, only returning results when data is of interest to the querier, allowing uninte...
Article
Graph partitioning and repartitioning have been widely used by scientists to parallelize compute- and data-intensive simulations. However, existing graph (re)partitioning algorithms usually assume homogeneous communication costs among partitions, which contradicts the increasing heterogeneity in inter-core communication in modern parallel architectu...
Article
Full-text available
We introduce a web-based computing infrastructure to assist the visual integration, mining and interactive navigation of large-scale astronomy observations. Following an analysis of the application domain, we design a client-server architecture to fetch distributed image data and to partition local data into a spatial index structure that allows pr...
Article
Exploring the inherent technical challenges in realizing the potential of Big Data.
Conference Paper
Traditional distributed Data Stream Management Systems assign query operators to sites by optimizing for some criterion such as query throughput, or network delay. The work presented in this paper begins to augment this traditional operator placement technique by allowing the user issuing a continuous query to specify a variety of constraints - inc...
Conference Paper
User data is growing at an ever greater pace that threatens to overwhelm our ability to effectively manage it. As the types of data increase, and the storage environments become ever more heterogeneous, even reasoning about basic data management decisions becomes increasingly difficult. This expansion in complexity requires new methodologies for ma...
Article
MOTIVATION: As industry and science are increasingly data-driven, the need for skilled data scientists is exceeding what our universities are producing. According to a McKinsey report: "By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills". Similarly, the ability to extract knowledge from s...
Conference Paper
Traditional workflow management or enactment systems (WfMS) and workflow design processes view the workflow as a one-time interaction with the various data sources, i.e., when a workflow is invoked, its steps are executed once and in-order. The fundamental underlying assumption has been that data sources are passive and all interactions are structu...
Conference Paper
Load shedding is an integral component in many Data Stream Management Systems, aiming at preventing the response time from exceeding a user-specified delay target under overload situations. The currently best performing load shedder determines the correct amount of load to shed by utilizing a feedback loop for correcting the statistics-based estima...
Conference Paper
We introduce a web-based, client-server computing infrastructure to assist the interactive navigation of large-scale astronomy observations. Large image datasets are partitioned into a spatial index structure that allows prefix-matching of spatial objects. In conjunction with pixel-based overlays, this approach allows fetching, displaying, panning...
Article
Full-text available
The promise of data-driven decision-making is now being recognized broadly, and there is growing enthusiasm for the notion of "Big Data," including the recent announcement from the White House about new funding initiatives across different agencies, that target research for Big Data. While the promise of Big Data is real -- for example, it is estim...
Article
This demo presents AstroShelf, our on-going effort to enable astrophysicists to collaboratively investigate celestial objects using data originating from multiple sky surveys, hosted at different sites. The AstroShelf platform combines database and data stream, workflow and visualization technologies to provide a means for querying and displaying t...
Article
Aggregate Continuous Queries (ACQs) are both a very popular class of Continuous Queries (CQs) and also have a potentially high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems (DSMSs) to reach their full potential in supporting (critical) monitoring applications. For multiple ACQs that vary...
Conference Paper
The emergence of Data Stream Management Systems (DSMS) facilitates implementing many types of monitoring applications via continuous queries (CQs). However, these applications usually have different quality-of-service requirements for different CQs. In this work, we are proposing the Adaptive Broadcast Disks (ABD) scheduler, a new scheduling policy...
Article
Full-text available
In this work we present a new model that combines two different types of preferences, qualitative and quantitative. We show how our model can support different types of preferences at different granularity levels and how an application can use these preferences to retrieve a list of tuples. The new model takes advantage of a graph representation of...
Conference Paper
Full-text available
Data Streams Management Systems are designed to support monitoring applications, which require the processing of hundreds of Aggregate Continuous Queries (ACQs). These ACQs typically have different time granularities, with possibly different selection predicates and group-by attributes. In order to achieve scalability in the presence of heavy workl...
Conference Paper
Full-text available
Energy consumption for computing devices in general and for data centers in particular is receiving increasingly high attention, both because of the increasing ubiquity of computing and also because of increasing energy prices. In this work, we propose QMD (Quasi Mirrored Disks) that exploit flash as a write buffer to complement RAID systems consis...
Conference Paper
Traditional workflow enactment systems view a workflow as a one-time interaction with various data sources, executing a series of steps once, whenever the workflow results are requested. The fundamental underlying assumption has been that data sources are passive and all interactions are structured along the request/reply (query) model. Hence, trad...
Conference Paper
Full-text available
Complex event detection over data streams has become ubiquitous through the widespread use of sensors, wireless connectivity and the wide variety of end-user mobile devices. Typically, event detection is carried out by a central server executing continuous queries. In this demonstration, we focus on the case where users with mobile devices submit c...
Conference Paper
Full-text available
Complex event detection over data streams has become ubiquitous through the widespread use of sensors, wireless connectivity and the wide variety of end-user mobile devices. Typically, such event detection is carried out by a data stream management system executing continuous queries (CQs), registered by the users. In this paper, we consider the si...
Conference Paper
Full-text available
In a Data Stream Management System (DSMS), continuous queries (CQs) registered by different applications inherently have different levels of importance (i.e., quality of service (QoS) and quality of data (QoD) requirements). Moreover, the shift to cloud services and the growing need for monitoring applications will inevitably lead to the establishm...
Conference Paper
Modern cloud data storage services have powerful capabilities for data-sets that can be indexed by a single key (key-value stores) and for data-sets that are characterized by multiple attributes (such as Google's BigTable). These data stores have non-ideal overheads, however, when graph data needs to be maintained; overheads are incurred because rela...
Conference Paper
Full-text available
Data streams have become pervasive and data production rates are increasing exponentially, driven by advances in technology, for example the proliferation of sensors, smart phones, and their applications. This fact effectuates an unprecedented opportunity to build real-time monitoring and analytics applications, which when used collaboratively and...
Conference Paper
Full-text available
Complex event detection over data streams has become ubiquitous through the widespread use of sensors, wireless connectivity and the wide variety of end-user mobile devices. Typically, such event detection is carried out by a data stream management server executing continuous queries, previously submitted by the users. In this paper, we consider th...
Conference Paper
Full-text available
Amazon, Google, and IBM now sell cloud computing services. We consider the setting of a for-profit business selling data stream monitoring/management services and we investigate auction-based mechanisms for admission control of continuous queries. When submitting a query, each user also submits a bid of how much she is willing to pay for that query...
Article
Full-text available
Quality of Service (QoS) and Quality of Data (QoD) are the two major dimensions for evaluating any query processing system. In the context of data stream management systems (DSMSs), multi-query scheduling has been exploited to improve QoS. In this paper, we are proposing to exploit query scheduling to improve QoD in DSMSs. Specifically, we are pres...
Conference Paper
Full-text available
We highlight the privacy issues that have arisen from the introduction of the Greek Social Security Number (AMKA), in connection with the availability of personally identifiable information on Greek web sites. In particular, we identify privacy problems with the current AMKA setup and present data from a web study we conducted in May 2009, exposing...
Article
Full-text available
Tularemia is caused by the category A biodefense agent Francisella tularensis. This bacterium is associated with diverse environments and a plethora of arthropod and mammalian hosts. How F. tularensis adapts to these different conditions, particularly the eukaryotic intracellular environment in which it replicates, is poorly understood. Here, we de...
Conference Paper
Full-text available
We study query scheduling in Wireless Sensor Networks (WSNs) with a focus on two important metrics: Quality of Service (QoS) and Quality of Data (QoD). The motivation comes from our observation that most WSN scheduling techniques ignore the quality requirements of queries. As a result, they are inefficient or inapplicable to quite a few application...
Conference Paper
Full-text available
Wireless sensor networks link the physical and digital worlds enabling both surveillance as well as scientific exploration. In both cases, on-line detection of interesting events can be accomplished with continuous queries (CQs) in a Data Stream Management System (DSMS). However, the quality-of-service requirements of detecting these events are di...
Conference Paper
Full-text available
Casualties in emergency situations are often caused by panic and in cases where building evacuation is required, they are often caused by a disorganized evacuation. This has motivated us to design a two-layer indoor evacuation system that takes advantage of two technologies all people carry on them, namely, cellular phones with cameras and RFID car...
Conference Paper
Full-text available
The performance provided by an interactive online database system is typically measured in terms of meeting certain pre-specified Service Level Agreements (SLAs), with expected transaction latency being the most commonly used type of SLA. This form of SLA acts as a soft deadline for each transaction, and user satisfaction can be measured in term...
Article
Full-text available
Unstructured peer-to-peer (P2P) networks suffer from the increased volume of traffic produced by flooding. Methods such as random walks or dynamic querying managed to limit the traffic at the cost of reduced network coverage. In this paper, we propose a partitioning method of the unstructured overlay network into a relatively small number of distinct...
Conference Paper
Full-text available
In highly interactive dynamic Web database systems, user satisfaction determines their success. In such systems, user requested web pages are dynamically created by executing a number of database queries or Web transactions. In this paper, we model the interrelated transactions generating a web page as workflows and quantify the user satisfaction b...
Conference Paper
Full-text available
Traditional workflow enactment systems and workflow design processes view the workflow as a one-time interaction with the various data sources, executing a series of steps once, whenever the workflow results are requested. The fundamental underlying assumption has been that data sources are passive and all interactions are structured along the...
Article
Full-text available
Recently, several policies have been proposed for scheduling multiple Continuous Queries (CQs) in a Data Stream Management System (DSMS). The decision on which policy to use plays an important role in shaping the perceived online performance provided by the DSMS. In this tutorial, we provide an overview of different policies employed by current CQ...
Conference Paper
Full-text available
Annotations play an increasingly crucial role in scientific exploration and discovery, as the amount of data and the level of collaboration among scientists increase. In this paper, we introduce ViP, a user-centric, view-based annotation framework that promotes annotations as first-class citizens. ViP introduces novel ways of propagating anno...
Conference Paper
Full-text available
Annotations play an increasingly crucial role in scientific exploration and discovery, as the amount of data and the level of collaboration among scientists increase. Although all such systems are implemented to take user input (i.e., the annotations themselves), very few systems are user-centric, taking into account user preferences on how annota...
Conference Paper
Full-text available
User satisfaction determines the success of Web-database applications. User satisfaction can be expressed in terms of expected response time or expected delay. Given the bursty and unpredictable behavior of web user populations, we model user requests as transactions with soft-deadlines. For such a model of user requests with soft-deadlines, the h...
Article
Full-text available
The emergence of monitoring applications has precipitated the need for Data Stream Management Systems (DSMSs), which constantly monitor incoming data feeds (through registered continuous queries), in order to detect events of interest. In this article, we examine the problem of how to schedule multiple Continuous Queries (CQs) in a DSMS to optimize...
Book
This book constitutes the thoroughly refereed proceedings of the Second GeoSensor Networks Conference, held in Boston, Massachusetts, USA, in October 2006. The conference addressed issues related to the collection, management, processing, analysis, and delivery of real-time geospatial data using distributed geosensor networks. This represents an ev...
Article
Full-text available
Sensor networks enable an unprecedented level of access to the physical world, and hold tremendous potential to revolutionize many application domains. Research on sensor networks spans many areas of computer science, and there are now major conferences, e.g., IPSN and SenSys, devoted to sensor networks. However, there is no focused forum for discu...
Chapter
Full-text available
Unstructured P2P systems exhibit a great deal of robustness and self-healing at the cost of reduced scalability. Resource location is performed using a broadcast-like process called flooding. The work presented in this paper comprises an effort to reduce the overwhelming volume of traffic generated by flooding, thus increasing the scalability of un...
Conference Paper
Full-text available
The proliferation of database-driven web sites (or web-databases) has brought upon a plethora of applications where both Quality of Service (QoS) and Quality of Data (QoD) are of paramount importance to the end users. In our previous work, we have proposed Quality Contracts, a comprehensive framework for specifying multiple dimensions of QoS/...
Conference Paper
Full-text available
Typical Web-database systems receive read-only queries, that generate dynamic Web pages as a response, and write-only updates, that keep information up-to-date. Users expect short response times and low staleness. However, it may be extremely hard to apply all updates on time, i.e., keep zero staleness, and also get fast response times, especially...
Conference Paper
Full-text available
Search engine quality is impacted by two factors: the quality of the ranking/matching algorithm used and the freshness of the search engine's index, which maintains a "snapshot" of the Web. Web crawlers capture web pages and refresh the index, but this is always a never-ending quest, as web pages get updated frequently (and thus have to be re...
Conference Paper
Full-text available
Advances in microsensor technology as well as the development of miniaturized computing platforms enable us to scatter numerous untethered sensing devices in hard to reach terrains, and continuously collect geospatial information in never before seen spatial and temporal scales. These geosensor network technologies are revolutionizing the way that...
Conference Paper
Full-text available
Real-time enterprises rely on user queries being answered in a timely fashion and using fresh data. This is relatively easy when systems are lightly loaded and both queries and updates can be finished quickly. However, this goal becomes fundamentally hard to achieve due to the high volume of queries and updates in real systems, especially in period...
Conference Paper
Full-text available
Web-database systems are nowadays an integral part of everybody’s life, with applications ranging from monitoring/trading stock portfolios, to personalized blog aggregation and news services, to personalized weather tracking services. For most of these services to be successful (and their users to be kept satisfied), two criteria need to be met: u...
Conference Paper
Full-text available
We assume a sensor network with data-centric storage, where sensor data is stored within the sensor network and ad hoc queries are disseminated and processed inside the network. In such an environment, there are often similarities among submitted queries. Using current solutions, similar queries may have to go through the same expensive query proce...
Conference Paper
Full-text available
In this work, we address the problem of replica selection in distributed query processing over the Web, in the presence of user preferences for Quality of Service and Quality of Data. In particular, we propose RAQP, which stands for Replication-Aware Query Processing. RAQP uses an initial statically-optimized logical plan, and then selects the...
Conference Paper
Full-text available
Sensors provide unprecedented access to a wealth of information from the physical environment in real-time. However, they suffer from a variety of resource limitations, most importantly power consumption and communication bandwidth. Additionally, environmental conditions can contribute to sensor failures, disrupting the flow of query results. I...
Conference Paper
Full-text available
Data Stream Management Systems (DSMS) typically host multiple Continuous Queries (CQ) that process streams of data. In this paper, we examine the problem of how to schedule CQs in a DSMS to optimize for average QoS. We show that unlike standard on-line systems, scheduling policies in DSMSs that optimize for average response time will be different...
Conference Paper
Full-text available
The Secure and robust Critical Information-Technology Infrastructure (S-CITI for short) project aims at providing support to Emergency Managers (EMs) that are faced with management of resources and with decisions before, during, and after emergencies or disasters. Our approach consists of using new and existing sensors to gather data from the field...
Article
Full-text available
Wireless sensor networks are expected to be an integral part of any pervasive computing environment. This implies an ever-increasing need for efficient energy and resource management of both the sensor nodes, as well as the overall sensor network, in order to meet the expected quality of data and service requirements. There have been numerous studi...
Conference Paper
Full-text available
Interconnected computing nodes in pervasive systems demand efficient management to ensure longevity and effectiveness. This is particularly true when we consider wireless sensor networks, for which we propose a new scheme for adaptive route management. There have been numerous studies that have looked at the routing of data in sensor networks with...
Conference Paper
Full-text available
In this paper we propose a query-driven approach for tuning the time/energy trade-off in sensor networks with mobile sensors. The tuning factors include re-positioning of mobile sensors and changing their transmission ranges. We propose an algebraic query optimization framework that explores these factors while utilizing collision-free concurrent d...
