Thesis

Query Morphing: An Interactive Technique for Data Exploration and Query

Abstract and Figures

We are living in a data-centric era; data is omnipresent in both structured and unstructured forms. This data is generated through social media, blogs, lab simulations, sensors, etc. for user-related operations. With data at this scale, acquiring relevant information becomes a challenging task. The users of such systems are often unaware of the data content and its semantics and uncertain of their real information needs. Evolving search needs are an additional challenge. A user often struggles to construct a query that retrieves relevant information and hence ends up with a short, closed, ill-phrased navigational query. In this setting, traditional search systems perform information search inefficiently, and searching becomes a daunting task. Hence, data exploration (DE) techniques have evolved. Data exploration, or interactive data exploration (IDE), guides the user throughout his search session by retrieving valuable data objects on the fly. Data exploration is an iterative, multi-disciplinary, and opportunistic form of information-seeking behavior. For effective data exploration, the key challenges include (i) vaguely articulated user information needs, (ii) utilization of the available data space for exploration, (iii) identification of shifts in the user's intent, and (iv) information overload, among others. Therefore, instead of traditional search, we need a data exploration mechanism in which a naïve user walks through the database and stops when satisfactory information is found. During this process, a user iteratively transforms his search request in order to gain relevant information; morphing is a long-established approach for generating various transformations of an input. The key objective of the dissertation is a proximity-based approach for data exploration and query reformulation to aid exploratory search.
In the proposed approach, data objects are retrieved along with additional information via small transformations of the user's initial query, called 'Query Morphs'. The dissertation's contributions are: (i) to design an exploratory system that supports data exploration and incorporates query reformulation; (ii) to assist the user in his exploratory query formulation task via additional results; (iii) to aid in the generation of query variants; (iv) to enable the user to steer towards the region of interest; and (v) to minimize user effort and improve search precision. The assessment of the proposed strategy indicates significant improvement in the data exploration and query reformulation effort of the user and the DE system.
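The idea of retrieving additional results through small transformations of an initial query can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the range-query representation, the `step` parameter, and the function names `morph_query`, `run_query`, and `additional_results` are all assumptions made here for illustration, and "proximity" is approximated simply by widening or shifting each range slightly.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: attribute name -> inclusive (low, high) range.
Query = Dict[str, Tuple[float, float]]
Row = Dict[str, float]


def morph_query(query: Query, step: float = 0.1) -> List[Query]:
    """Generate small variants ("morphs") of a range query.

    For each attribute, three variants are produced: the range widened on
    both sides, shifted down, and shifted up, each by `step` times the
    width of the original range.
    """
    morphs: List[Query] = []
    for attr, (low, high) in query.items():
        delta = (high - low) * step
        for new_range in (
            (low - delta, high + delta),  # widened
            (low - delta, high - delta),  # shifted down
            (low + delta, high + delta),  # shifted up
        ):
            variant = dict(query)
            variant[attr] = new_range
            morphs.append(variant)
    return morphs


def run_query(rows: List[Row], query: Query) -> List[Row]:
    """Return the rows that satisfy every range predicate in the query."""
    return [
        row for row in rows
        if all(low <= row[attr] <= high for attr, (low, high) in query.items())
    ]


def additional_results(rows: List[Row], query: Query, step: float = 0.1) -> List[Row]:
    """Rows returned by some morph but not by the original query."""
    seen = {id(row) for row in run_query(rows, query)}
    extra: List[Row] = []
    for morph in morph_query(query, step):
        for row in run_query(rows, morph):
            if id(row) not in seen:
                seen.add(id(row))
                extra.append(row)
    return extra
```

For instance, with rows priced 10, 21, and 50 and an initial query `{"price": (10, 20)}`, the widened and shifted-up morphs both reach the row priced 21, which would be offered to the user as an additional, nearby result.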
Chapter
We are living in an age where large volumes of information, in the form of structured and unstructured data, are generated daily through social media, blogs, lab simulations, sensors, etc. As a result, acquiring relevant information becomes a challenging task for humans. A fundamental understanding of complex schema and content is necessary for formulating a data retrieval request. Therefore, instead of search, we need exploration in which a naïve user walks through the database and stops when satisfactory information is found. During this process, a user iteratively transforms his search request in order to gain relevant information; morphing is an approach for generating various transformations of an input. We propose 'Query morphing', an approach for query reformulation based on data exploration. Design concerns and implementation constraints identified for the proposed approach are also discussed.
Article
We describe the Sloan Digital Sky Survey IV (SDSS-IV), a project encompassing three major spectroscopic programs. The Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2) is observing hundreds of thousands of Milky Way stars at high resolution and high signal-to-noise ratio in the near-infrared. The Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey is obtaining spatially-resolved spectroscopy for thousands of nearby galaxies (median redshift of z = 0.03). The extended Baryon Oscillation Spectroscopic Survey (eBOSS) is mapping the galaxy, quasar, and neutral gas distributions between redshifts z = 0.6 and 3.5 to constrain cosmology using baryon acoustic oscillations, redshift space distortions, and the shape of the power spectrum. Within eBOSS, we are conducting two major subprograms: the SPectroscopic IDentification of eROSITA Sources (SPIDERS), investigating X-ray AGN and galaxies in X-ray clusters, and the Time Domain Spectroscopic Survey (TDSS), obtaining spectra of variable sources. All programs use the 2.5-meter Sloan Foundation Telescope at Apache Point Observatory; observations there began in Summer 2014. APOGEE-2 also operates a second near-infrared spectrograph at the 2.5-meter du Pont Telescope at Las Campanas Observatory, with observations beginning in early 2017. Observations at both facilities are scheduled to continue through 2020. In keeping with previous SDSS policy, SDSS-IV provides regularly scheduled public data releases; the first one, Data Release 13, was made available in July 2016.
Book
Information seeking is a fundamental human activity. In the modern world, it is frequently conducted through interactions with search systems. The retrieval and comprehension of information returned by these systems is a key part of decision making and action in a broad range of settings. Advances in data availability coupled with new interaction paradigms, and mobile and cloud computing capabilities, have created a broad range of new opportunities for information access and use. In this comprehensive book for professionals, researchers, and students involved in search system design and evaluation, search expert Ryen White discusses how search systems can capitalize on new capabilities and how next-generation systems must support higher order search activities such as task completion, learning, and decision making. He outlines the implications of these changes for the evolution of search evaluation, as well as challenges that extend beyond search systems in areas such as privacy and societal benefit. Discusses many new technologies and their role in the search process. Covers important issues involving data availability and privacy. Talks about these issues in depth, educating searchers in the benefits and potential costs involved in using big (and small) data. Combines research from information retrieval, information science, and human-computer interaction.
Book
You're being asked to quantify your usability improvements with statistics. But even with a background in statistics, you may be hesitant to statistically analyze your data, unsure which statistical tests to use and how to defend the use of small test sample sizes. This book is a practical guide to solving common quantitative problems arising in usability testing with statistics. It addresses common questions you face every day, such as: Is the current product more usable than our competition? Can we be sure at least 70% of users can complete the task on the first attempt? How long will it take users to purchase products on the website? This book shows you which test to use and provides a foundation for both the statistical theory and best practices in applying them. The authors draw on decades of statistical literature from Human Factors, Industrial Engineering, and Psychology, as well as their own published research, to provide the best solutions. They provide both concrete solutions (Excel formulas, links to their own web calculators) and an engaging discussion of the statistical reasons why the tests work and how to effectively communicate the results. *Provides practical guidance on solving usability testing problems with statistics for any project, including those using Six Sigma practices *Shows practitioners which test to use, why they work, and best practices in application, along with easy-to-use Excel formulas and web calculators for analyzing data *Recommends ways for practitioners to communicate results to stakeholders in plain English. © 2012 Jeff Sauro and James R. Lewis. Published by Elsevier Inc. All rights reserved.
Chapter
A new trend in many scientific fields is to conduct data-intensive research by collecting and analyzing a large amount of high-density, high-quality, multi-modal data streams. In this chapter we present a research framework for analyzing and mining such data streams at large scale; in particular, we exploit parallel sequential pattern mining and iterative MapReduce to enable human-in-the-loop large-scale data exploration powered by High Performance Computing (HPC). One basic problem is that data scientists are now working with datasets so large and complex that they become difficult to process using traditional desktop statistics and visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers” (Jacobs, Queue 7(6):10:10–10:19, 2009). Meanwhile, discovering new knowledge requires the means to exploratively analyze datasets of this scale, allowing us to freely “wander” around the data and make discoveries by combining bottom-up pattern discovery and top-down human knowledge to leverage the power of the human perceptual system. In this work, we first exploit a novel interactive temporal data mining method that allows us to discover reliable sequential patterns and precise timing information in multivariate time series. For our principal test case of detecting and extracting human sequential behavioral patterns over multiple multi-modal data streams, this suggests a quantitative and interactive data-driven way to ground social interactions in a manner that has never been achieved before. After establishing the fundamental analytics algorithms, we proceed to a research framework that can fulfill the task of extracting reliable patterns from large-scale time series using iterative MapReduce tasks. Our work exploits visual-based information technologies to allow scientists to interactively explore, visualize, and make sense of their data.
For example, the parallel mining algorithm running on HPC is accessible to users through an asynchronous web service. In this way, scientists can compare the intermediate data to extract and propose new rounds of analysis for more scientifically meaningful and statistically reliable patterns, and therefore statistical computing and visualization can bootstrap each other. Finally, we show the results from our principal user application, which demonstrate our system's capability of handling massive temporal event sets within just a few minutes. All these combine to reveal an effective and efficient way to support large-scale data exploration with a human in the loop.
Article
Although personality traits may influence information-seeking behavior, little is known about this topic. This study explored the impact of the Big Five personality traits on human online information seeking. For this purpose, it examined changes in eye-movement behavior in a sample of 75 participants (36 male and 39 female; age: 22–39 years; experience conducting online searches: 5–12 years) across three types of information-seeking tasks: factual, exploratory, and interpretive. The International Personality Item Pool Representation of the NEO PI-R™ (IPIP-NEO) was used to assess the participants' personality profile. Hierarchical cluster analysis was used to categorize participants based on their personality traits. A three-cluster solution was found (cluster one consists of participants who scored high in conscientiousness; cluster two consists of participants who scored high in agreeableness; and cluster three consists of participants who scored high in extraversion). Results revealed that individuals high in conscientiousness performed fastest in most information-seeking tasks, followed by those high in agreeableness and extraversion. This study has important practical implications for intelligent human-computer interfaces, personalization, and related applications.
Conference Paper
Methods have been proposed to assist activities in exploratory search processes, but few allow the users to appropriately manage their own search processes. In this paper, we present the TimeTree workspace provided in our process-oriented search system CiteXplore that supports visualized management of search processes and enables reviewing and retrospecting of information during long-term exploratory search. Formal user experiments with 16 participants have been proposed to evaluate the proposed methods. We also discuss possible research directions to use TimeTree to reuse search experiences by enabling collaborative search and providing visualized recommendations.