ArticlePDF Available

Marchionini, G.: Exploratory search: from finding to understanding. Comm. ACM 49(4), 41-46

Authors:

Abstract

The benefits of research tools in exploratory search with user interface capability are discussed. Exploratoty search provide information along with pitfalls and costs and helps in gaining complete knowledge about the product. The tools to support exploratory searching make possible to use use web resources to conduct lookup, learning, and investigate tasks with fluid user interfaces. A good web design supports exploratory search together with client-side applications, that provides people with ways to trade off personal behavior data for added value services. It is recommended that more computational resources should be designed for exploratory search, that will provide easy to apply exploratory search tools and help information seekers in the use of information resources.
COMMUNICATIONSOF THE ACMApril 2006/Vol. 49, No. 4 41
EXPLORATORY SEARCH:
FROM FINDING TO
UNDERSTANDING
rom the earliest days of computers, search has been a
fundamental application that has driven research and
development. For example, a paper published in the
inaugural year of the IBM Journal 36 years ago out-
lined challenges of text retrieval that continue to the
present [4]. Todays data storage and retrieval
applications range from database systems that
manage the bulk of the world’s structured data
to Web search engines that provide access to
petabytes of text and multimedia data. As
computers have become consumer products and the
Internet has become a mass medium, searching the
Web has become a daily activity for everyone from
children to research scientists.
By Gary Marchionini
F
Research tools critical for exploratory search success
involve the creation of new interfaces that move the
process beyond predictable fact retrieval.
42 April 2006/Vol. 49, No. 4 COMMUNICATIONSOF THE ACM
As people demand more of Web services, short
queries typed into search boxes are not robust enough
to meet all of their demands. In studies of early hyper-
text systems, we distinguished analytical search strate-
gies that depend on a carefully planned series of
queries posed with precise syntax from browsing
strategies that depend on on-the-fly selections [7].
The Web has legitimized browsing strategies that
depend on selection, navigation, and trial-and-error
tactics, which in turn facilitate increasing expectations
to use the Web as a source for learning and
exploratory discovery. This overall trend toward more
active engagement in the search process leads the
research and develop-
ment community to
combine work in
human-computer inter-
action (HCI) and infor-
mation retrieval (IR).
This article distinguishes
exploratory search that
blends querying and
browsing strategies from
retrieval that is best
served by analytical
strategies, and illustrates
interactive IR practices
and trends with examples from two user interfaces
that support the full range of strategies.
Exploratory search. Search is a fundamental life
activity. All organisms seek sustenance and propaga-
tion and Maslow’s classic hierarchy of needs theory
predicts that once people fulfill basic physiological
needs, we seek to fulfill social and psychological needs
to belong and to know our world. These higher-level
needs are often informational and this in turn
explains why information resources and communica-
tion facilities are so sophisticated in developed soci-
eties.
Ahierarchy of information needs may
also be defined that ranges from basic facts that guide
short-term actions (for example, the predicted chance
for rain today to decide whether to bring an umbrella)
to networks of related concepts that help us under-
stand phenomena or execute complex activities (for
example, the relationships between bond prices and
stock prices to manage a retirement portfolio) to com-
plex networks of tacit and explicit knowledge that
accretes as expertise over a lifetime (for example, the
most promising paths of investigation for the sea-
soned scholar or designer). For these respective layers
of information needs, we can define kinds of infor-
mation-seeking activities, each with associated strate-
gies and tactics that might be supported with
computational tools.
Figure 1 depicts three kinds of search activities that
we label lookup, learn, and investigate; and highlights
exploratory search as especially pertinent to the learn
and investigate activities.1These activities are repre-
sented as overlapping clouds because people may
engage in multiple kinds of search in parallel, and
some activities may be embedded in others; for exam-
ple, lookup activities are
often embedded in learn
or investigate activities.
The searcher views these
activities as tasks, so we
use “task” in the following
discussion.
Lookup is the most
basic kind of search task
and has been the focus of
development for database
management systems and
much of what Web search
engines support. Lookup
tasks return discrete and
well-structured objects
such as numbers, names, short statements, or specific
files of text or other media. Database management
systems support fast and accurate data lookups in
business and industry; in journalism, lookups are
related to questions of who, when, and where as
opposed to what, how, and why questions. In
libraries, lookups have been called “known item
searches to distinguish them from subject or topical
searches.
Most people think of lookup searches as “fact
retrieval” or “question answering.” In general, lookup
tasks are suited to analytical search strategies that
begin with carefully specified queries and yield precise
results with minimal need for result set examination
and item comparison. Clearly, lookup tasks have been
among the most successful applications of computers
and remain an active area of research and develop-
ment. However, as the Web has become the informa-
tion resource of first choice for information seekers,
people expect it to serve other kinds of information
needs and search engines must strive to provide ser-
vices beyond lookup.
March fig 1 (4/06)- 26.5 picas
Figure 1. Search Activities.
March fig 1 (4/06) - 19.5 picas
Investigate
Learn
Lookup
Exploratory Search
Fact retrieval
Known item search
Navigation
Transaction
Verification
Question answering
Knowledge acquisition
Comprehension/Interpretation
Comparison
Aggregation/Integration
Socialize
Accretion
Analysis
Exclusion/Negation
Synthesis
Evaluation
Discovery
Planning/Forecasting
Transformation
Investigate
Learn
Lookup
Exploratory Search
Fact retrieval
Known item search
Navigation
Transaction
Verification
Question answering
Knowledge acquisition
Comprehension/Interpretation
Comparison
Aggregation/Integration
Socialize
Accretion
Analysis
Exclusion/Negation
Synthesis
Evaluation
Discovery
Planning/Forecasting
T
ransformation
Figure 1. Search activities.
1There are many important theoretical models of information search, for example,
Saracevic summarizes Belkin’s and Ingrewsen’s in his stratified model [9].
COMMUNICATIONSOF THE ACMApril 2006/Vol. 49, No. 4 43
Searching to learn is increasingly viable as more pri-
mary materials go online. Learning searches involve
multiple iterations and return sets of objects that
require cognitive processing and interpretation. These
objects may be instantiated in various media (graphs,
or maps, texts, videos) and often require the informa-
tion seeker to spend time scanning/viewing, compar-
ing, and making qualitative judgments. Note that
“learning” here is used in its general sense of develop-
ing new knowledge and thus includes self-directed
life-long learning and professional learning as well as
the usual directed learning in schools. Using termi-
nology from Blooms taxonomy of educational objec-
tives, searches that support learning aim to achieve:
knowledge acquisition, comprehension of concepts or
skills, interpretation of ideas, and comparisons or
aggregations of data and concepts.
Another important kind of search that falls under
the learn search activity is social searching where peo-
ple aim to find communities of interest or discover
new friends in social network systems (for example,
www.friendster.com). Although the motivations may
be distinct from other learning search examples, the
exploratory strategies for locating, comparing, and
assessing results are similar. Much of the search time
in learning search tasks is devoted to examining and
comparing results and reformulating queries to dis-
cover the boundaries of meaning for key concepts.
Learning search tasks are best suited to combinations
of browsing and analytical strategies, with lookup
searches embedded to get one into the correct neigh-
borhood for exploratory browsing.
Searches that support investigation involve
multiple iterations that take place over perhaps very
long periods of time and may return results that are
critically assessed before being integrated into per-
sonal and professional knowledge bases. Investigative
searches aim to achieve Bloom’s highest-level objec-
tives such as analysis, synthesis, and evaluation and
require substantial extant knowledge. Such searches
often include explicit evaluative annotation that also
becomes part of the search results. Investigative
searching may be done to support planning and fore-
casting, or to transform existing data into new data or
knowledge. In addition to finding new information,
investigative searches may seek to discover gaps in
knowledge (for example, “negative search” [1]) so that
new research can begin or dead-end alleys can be
avoided. Investigative searches also include alerting
service profiles that are periodically and automatically
executed.
Serendipitous browsing that is done to stimulate
analogical thinking is another kind of investigative
search. Investigative searching is more concerned
with recall (maximizing the number of possibly rele-
vant objects that are retrieved) than precision (mini-
mizing the number of possibly irrelevant objects that
are retrieved) and thus not well supported by today’s
Web search engines that are highly tuned toward pre-
cision in the first page of results. This explains why so
many specialized search services are emerging to aug-
ment general search engines. Because experts typically
know which information resources to use, they can
formulate precise analytical queries but require
sophisticated browsing services that also provide
annotation and result manipulation tools.
These distinctions among different types of search
activities suggest that lookup searches lend themselves
to formalized turn-taking where the information
seeker poses a query and the system does the retrieval
and returns results. Thus, the human and system take
turns in retrieving the best result. However, learning
and investigative searching require strong human par-
ticipation in a more continuous and exploratory
process.
To support the full range of search activities, the IR
community is turning increasingly to CHI develop-
ments to discover ways to bring humans more actively
into the search process. Rather than viewing the
search problem as matching queries and documents
for the purpose of ranking, interactive IR views the
search problem from the vantage of an active human
with information needs, information skills, powerful
digital library resources situated in global and locally
connected communities—all of which evolve over
time. The digital library resources are assumed to
include dynamic contents such as other humans, sen-
sors, and computational tools. In this view, the search
system designer aims to bring people more directly
into the search process through highly interactive user
interfaces that continuously engage human control
over the information seeking process. Although this is
an ambitious design goal, we are beginning to see
some progress in systems that are the forerunners to
the exploratory search engines that will evolve in the
years ahead.
TOWARD EXPLORATORY SEARCH SYSTEMS
Menus in restaurants serve the needs of both man-
agement and diners. From the system point of view,
menus scope the kinds of products and services
available and thus optimize performance; and from
the patrons point of view they simplify selection and
specification of gastronomical needs. In the com-
puter industry, menus were the first kind of alterna-
tive to command systems and remain an important
interaction style for selection and browsing.
Expandable hierarchical file structures are special-
ized menus that serve as the mainstay of personal
computing, cell phone, and PDA
interfaces.
Hypertext links in texts were
called “embedded menus” by Shnei-
derman [10] and current Web direc-
tory structures (for example, Open
Directory) represent sophisticated
menu structures for finding infor-
mation on Web pages. In the data-
base realm, query-by-example
(QBE) interfaces were early alterna-
tives to formal language interfaces
and QBE-like systems remain the
primary method for supporting
non-textual queries in multimedia
systems. These interface design
experiences demonstrate the efficacy
of selection as a form of query spec-
ification, and inspire link navigation as a primary user
interface interaction style in the Web environment.
There is also substantial evidence in the
IR literature that relevance feedback—asking informa-
tion seekers to make relevance judgments about
returned objects and then executing a revised query
based on those judgments—is a powerful way to
improve retrieval. However, practice shows that people
are often unwilling to take the added step to provide
feedback when the search paradigm is the classic turn-
taking model. T
o engage people more fully in the
search process and put them in continuous control,
researchers are devising highly interactive user inter-
faces. Shneiderman and his colleagues created
dynamic query” interfaces [10] that use mouse actions
such as slider adjustments and brushing techniques to
pose queries and client-side processing to immediately
update displays to engage information seekers in the
search process. A number of prototypes (for example,
Dynamic Home Finder, SpotFire, TreeMaps) have
come from these lines of research and development.
These techniques are especially
good for exploration where high-
level overviews of a collection and rapid previews of
objects help people to understand data structures and
infer relationships among concepts.
Other researchers have investigated these highly
interactive interaction styles. Hearst and her col-
leagues created a series of interfaces that tightly cou-
ple queries to results, ranging from TileBars for text
searching [2] to Flamenco (see the sidebar in this sec-
tion), a series of interfaces that provides hierarchical,
faceted metadata as entry points for exploration and
selection. Hearst and Pederson [3], and others (for
example, [11]) have used clustering of search results
to make search more interactive, as represented by
current Web search alternatives such as Clusty
(clusty.com) that aim to provide groups of results that
can be used to further search. Fox et al., schraefel et
al., and Cutrell and Dumais offer other examples in
44 April 2006/Vol. 49, No. 4 COMMUNICATIONSOF THE ACM
Figure 2. Open Video
preview display for a
specific video.
Exploratory search makes us all
pioneers and adventurers in a new world
of information riches awaiting discovery
along with new pitfalls and costs.
this section of blending HCI and IR to support
exploratory search. Our work at the University of
North Carolina parallels these efforts and two exam-
ple search systems that support exploratory search are
illustrated here.
OPEN VIDEO EXAMPLES
The Open Video Digital Library (www.open-
video.org) aims to give people agile views of digital
video files [6]. The Web-based interface provides a
number of alternative ways to slice and dice the video
corpus so that people can see what is in the collection
(overview) and determine greater details about a
video segment (preview) before downloading it.
There are different kinds of surrogates provided,
including textual and visual representations and sev-
eral layers of detail and alternative display options to
give people good control. The user interface was
designed to optimize agile exploration before down-
loading while allowing standard text-based search.
A number of user studies were conducted to deter-
mine which surrogates are effective and what parameters
to use as defaults. This interface has proven to be quite
effective over the past few years as thousands of users
access the corpus each month to find videos for educa-
tional and research purposes. The home page provides a
typical search form but also partitions the video collec-
tion in various ways so that people can select a specific
partition to explore. Result set pages provide alternatives
for what is displayed (formats and
level of text and visual detail) and how
the results are ordered (relevance, title,
duration, date, popularity).
Figure 2 shows a preview for a
video with textual metadata and up
to three kinds of visual surrogate
(storyboard, fast forward, excerpt).
The searcher may get more details
by selecting the visual surrogate or
download a video file in a format of
their choice. The Open Video search
system is meant to put people in
control and support exploration as
well as lookup. Our transaction logs
indicate that half of the searches
conducted begin with keyword
strategies (analytical strategies) and
the remainder begin with partition
selection (browsing strategies).
As part of our efforts to develop highly
interactive UIs that support exploratory search for
government statistical Web sites, we developed a gen-
eral-purpose interface called the Relation Browser
(RB) that can be applied to a variety of data sets [5].
The RB aims to facilitate exploration of the relation-
ships between (among) different data facets, display
alternative partitions of the database with mouse
actions, and serve as an alternative to existing search
and navigation tools. RB provides searchers with a
small number of facets such as topic, time, space, or
data format; each of which is limited to a small num-
ber of attributes that will fit on the screen, simple
mouse-brushing capabilities to explore relationships
among the facets and attributes; and immediate
results displays that dynamically change as brushing
continues. Figure 3 illustrates how the RB works for a
database such as the Open Video DL. Panel 3a depicts
a portion of the RB at startup with the mouse posi-
tioned over the Educational category in the genre
facet. The number of videos in the library in each of
the facet-categories is immediately shown along with
a set of bars that show the distribution visually. Thus,
simply moving the mouse partitions the full corpus
COMMUNICATIONSOF THE ACMApril 2006/Vol. 49, No. 4 45
Figure 3. (a) Relation browser interface for
Open Video Library with mouse over the
education facet; (b) Relation browser display
after educational and Spanish selected, mouse
over fourth title.
(a)
(b)
into a view of the educational items. Clicking the
mouse freezes this partition and allows continued
browsing or retrieval of the partition from the server.
Panel 3b shows a portion of the display after the
user has selected the Spanish language category
within the educational partition and then clicked on
the Search button. The display shows the number of
items in each facet-category for the 41 videos in the
result set in the upper panel and the titles, keywords,
and producing agent for the videos in the bottom
panel with additional metadata available on mouse-
over. These items are hot linked to the Open Video
DL. String search within the results fields is also sup-
ported and all results panel and query panel displays
are coordinated to update in parallel when any mouse
or keyboard action is executed.
The RB has been instantiated for dozens of data-
bases, including several U.S. federal statistical agency
Web sites. RB was designed to facilitate exploration
and is less direct for simple lookup tasks than for
exploratory tasks. Our user studies have demonstrated
its efficacy when compared to standard Web-based
retrieval. To support the dynamics, the metadata and
query results must be available on the client side, thus
limiting scalability to databases of roughly tens of
thousands of items. We see this specialized kind of
interface as an augmentation of todays powerful
lookup engines. The RB could be used as a tool for
exploring very large databases where the results are not
individual items but subcollections or portals. Alterna-
tively, the RB may be used after a standard Web search
has been executed to investigate the result set if on-the-
fly automatic classification is used.
CONCLUSION
It is clear that better tools to support exploratory
searching are needed. Oblinger and Oblinger [8]
argue the “Net generation (those who learned to
read after the Web) are qualitatively different in
their informational behaviors and expectations; they
multitask and expect their informational resources
to be electronic and dynamic. The Net generation
will expect to be able to use Web resources to con-
duct lookup, learning, and investigative tasks with
fluid user interfaces.
As people spend more time online, not only will
they increase their expectations about information
tools and content, but there are more opportunities
for mining their behavior patterns and applying
adversarial computing that tries to take advantage of
system and user behaviors. Exploratory search makes
us all pioneers and adventurers in a new world of
information riches awaiting discovery along with new
pitfalls and costs.
Today, executing a query in a Web search engine
not only returns results but targets the searcher for
many kinds of presumably related opportunities and
services. Exploratory search will exacerbate this trend
as more user interaction data will be available for min-
ing and analysis. One implication of considering
good Web design that supports exploratory search
together with client-side applications, like the RB, is
to provide people with ways to trade off personal
behavior data for added value services. Those who do
not want their information behaviors to be mined can
choose to use more client-side exploration tools, only
sending requests for database partitions to the server.
Regardless of where the exploration takes place, it
is clear that more computational resources will be
devoted to exploratory search and the next search
engine behemoths will be the ones that provide easy
to apply exploratory search tools that help informa-
tion seekers get beyond finding to understanding and
use of information resources.
References
1. Garfield, E. When is a negative search result positive? Essays of an Infor-
mation Scientist 1 (Aug. 12, 1970), 117–118.
2. Hearst, M. TileBars: Visualization of term distribution information in
full text information access. In Proceedings of the ACM SIGCHI Con-
ference on Human Factors in Computing Systems (Denver, CO, 1995).
3. Hearst, M. and Pedersen, P. Reexamining the cluster hypothesis: Scat-
ter/Gather on retrieval results. In Proceedings of 19th Annual Interna-
tional ACM/SIGIR Conference (Zurich, 1996).
4. Luhn, H.P. A statistical approach to mechanized encoding and search-
ing of literary information IBM J. of R&D 1, 4 (1957), 309–317.
5. Marchionini, G. and Brunk, B. Toward a general relation browser: A
GUI for information architects. Journal of Digital Information 4, 1
(2003); jodi.ecs.soton.ac.uk/Articles/v04/i01/Marchionini/.
6. Marchionini, G. and Geisler, G. The open video digital library. dLib
Mag. 8, 12 (2002); www.dlib.org/dlib/december02/marchionini/
12marchionini.html.
7. Marchionini, G. and Shneiderman, B. Finding facts vs. browsing
knowledge in hypertext systems. Computer 2, 11 (Nov. 1988), 70–80.
8. Oblinger, D. and Oblinger, J. Is it age or IT: First steps toward under-
standing the Net generation. Educating the net generation. Educause
(2005); www.educause.edu/educatingthenetgen.
9. Saracevic, T. The stratified model of information retrieval interaction:
Extension and applications. In Proceedings of the American Society for
Information Science 34 (1997), 313–327.
10. Shneiderman, B. and Plaisant, C. Designing the User Interface 4th Ed.
Person/Addison-Wesley, Boston, MA, 2005.
11. Zamir, O. and Etzioni, O. Grouper: A dynamic clustering interface to
Web search results. In Proceedings of WWW8, (Toronto, Canada,
1999).
Gary Marchionini (march@ils.unc.edu) is the Cary C. Boshamer
Professor in the School of Information and Library Science at the
University of North Carolina at Chapel Hill.
This work was supported by NSF grants IIS 009963 8 and EIA 0131824.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
© 2006 ACM 0001-0782/06/0400 $5.00
c
46 April 2006/Vol. 49, No. 4 COMMUNICATIONSOF THE ACM
... As a result, each user achieved the same tasks, with the aim of allowing fair comparative evaluation. We consider 5 types of cognitive complexity following three taxonomies [2,12,18]: fact-finding t ask [2], exploratory learning task [12], decision-making, problem solving task [18] and multicriteria-inferential task [2]. In addition to all the data related to search sessions, user expertise, task complexity and participant's final answers at the end of their task completion, we provide human assessments about the queries (exploration, exploitation, narrow exploitation, spelling correction) and expected subtasks for each task w.r.t. the domain-knowledge involved in the task. ...
... As a result, each user achieved the same tasks, with the aim of allowing fair comparative evaluation. We consider 5 types of cognitive complexity following three taxonomies [2,12,18]: fact-finding t ask [2], exploratory learning task [12], decision-making, problem solving task [18] and multicriteria-inferential task [2]. In addition to all the data related to search sessions, user expertise, task complexity and participant's final answers at the end of their task completion, we provide human assessments about the queries (exploration, exploitation, narrow exploitation, spelling correction) and expected subtasks for each task w.r.t. the domain-knowledge involved in the task. ...
... • Search tasks adapted from the taxonomy of Marchionini [12]. The third type of task complexity we use is the exploratory learning task [12]. ...
... The user must frst articulate a search query to fulfll their information goals. This can be especially challenging in new areas where people often lack domain knowledge to know what to ask, let alone how to ask it [71,86,107]. Then, once the user fnds useful information, they must switch their attention back and forth between the resource and sensemaking applications -like notetaking tools -where they collect, annotate, and synthesize information from multiple queries, sources, and sessions. ...
... For example, academics reviewing literature, designers exploring which tool to use, startup founders performing market analysis, or individuals exploring, learning and making decisions like where to vacation. Exploratory searches involve multiple iterations and return sets of information that require cognitive processing and interpretation and often require the information seeker to spend time scanning/viewing, comparing, critically assessing and making qualitative judgments before being integrated into personal and professional knowledge bases [71,107]. The search task does not exist in isolation from the surrounding task context. ...
... Another limitation of the controlled lab study was that we controlled the time of exploratory search and sensemaking to only 45 minutes. However, complex, exploratory information work often span multiple sessions over multiple days [71,107]. This controlled timed experiment might have afected the searcher's normal searching, sensemaking and learning behavior [61]. ...
... becomes ubiquitous, users' information needs are becoming exploratory. In addition to seeking an answer to a precise query, web users demand intuitive methods to explore multimedia information [2]. Of approximately 3.5 billion issued search queries, 38% express information exploration and discovery intent [3]. ...
... AMED software is publicly available to get hands-on interaction via GitHub codespaces. 2 Through this initiative, we aim to encourage future researchers to produce comparable solutions with state-of-the-art exploratory and discovery systems. ...
... Given the user needs, we derived nine tasks by means of semi-structured interviews with eight PhD-level bioinformaticians (Section S2.3). The first five tasks are related to learning (Marchionini, 2006) about the content of a data repository. First, the user needs to be able to (T1) find the annotation terms of a data set, ...
... To aid in determining the relevance of data sets, users must be able to (T5) summarize and view the metadata of data sets. The next four tasks are related to investigating (Marchionini, 2006) data repositories. First, users must be able to (T6) search for data sets. ...
Preprint
Full-text available
The ever-increasing number of biomedical data sets provides tremendous opportunities for re-use but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating data sets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find data sets of interest. We developed SATORI—an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection.SATORI enables researchers to seamlessly search, browse, and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application,which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform.
... For example, a user looking for "the Discovery Channel's dinosaur site with pictures and games of dinosaurs" and a user looking for "different kinds of dinosaurs" can both search with the query "dinosaur". Ambiguous queries can also indicate the user is conducting an exploratory search, such as learning or investigating searches [30]. A popular system feature for query ambiguity is search result page diversification [21,41]. ...
Preprint
A long-standing challenge for search and conversational assistants is query intention detection in ambiguous queries. Asking clarifying questions in conversational search has been widely studied and considered an effective solution to resolve query ambiguity. Existing work have explored various approaches for clarifying question ranking and generation. However, due to the lack of real conversational search data, they have to use artificial datasets for training, which limits their generalizability to real-world search scenarios. As a result, the industry has shown reluctance to implement them in reality, further suspending the availability of real conversational search interaction data. The above dilemma can be formulated as a cold start problem of clarifying question generation and conversational search in general. Furthermore, even if we do have large-scale conversational logs, it is not realistic to gather training data that can comprehensively cover all possible queries and topics in open-domain search scenarios. The risk of fitting bias when training a clarifying question retrieval/generation model on incomprehensive dataset is thus another important challenge. In this work, we innovatively explore generating clarifying questions in a zero-shot setting to overcome the cold start problem and we propose a constrained clarifying question generation system which uses both question templates and query facets to guide the effective and precise question generation. The experiment results show that our method outperforms existing state-of-the-art zero-shot baselines by a large margin. Human annotations to our model outputs also indicate our method generates 25.2\% more natural questions, 18.1\% more useful questions, 6.1\% less unnatural and 4\% less useless questions.
... Web search engines usually explore images via ranked lists or grid layouts. Gary Marchionini introduced exploratory search by interrelating faceted browsing [21]. Capra and Marchionini recommended facets of semantically related search results to provide topic-specific facets access linearly [22]. ...
Article
Full-text available
Nowadays, exponential growth in online production and extensive perceptual power of visual contents (i.e., images) complicate the users' information needs. The research has shown that users are interested in satisfying their visual information needs by accessing the image objects. However, the exploration of images via existing search engines is challenging. Mainly, existing search engines employ linear lists or grid layouts, sorted in descending order of relevancy to the user's query to present the image results, which hinders image exploration via multiple information modalities associated with them. Furthermore, results at lower-ranking positions are cumbersome to reach. This research proposed a Search User Interface (SUI) approach to instantiate the non-linear reachability of the image results by enabling interactive exploration and visualization options. We represent the results in a cluster-graph data model, where the nodes represent images and the edges are multimodal similarity relationships. The results in clusters are reachable via multimodal similarity relationships. We instantiated the proposed approach over a real dataset of images and evaluated it via multiple types of usability tests and behavioral analysis techniques. The usability testing reveals good satisfaction (76.83%) and usability (83.73%) scores.
... Knowledge acquisition concerns the conscious intellectual activities applied during a learning process, such as comprehension of ideas or skills, conceptual interpretation, and compiling of information, that is, data aggregation. (Marchionini, 2006). None of it is captured by the user behavioral metrics (search behavior) currently used because knowledge acquisition involves transfer of learning, which demands assessing prior relevant knowledge and new situations. ...
... Human interaction with user-facing intelligent systems has been explored in many disparate communities (see Bommasani et al., 2021, §2.5). Arranged by community, examples include work in information retrieval and search engines (Salton, 1970;Belkin et al., 1982;Kuhlthau, 1991;Ingwersen, 1992;Marchionini, 2006;Micarelli et al., 2007;White and Roth, 2009;Kelly, 2009;Croft, 2019), recommender systems (Goldberg et al., 1992;Konstan, 2004), voice assistants (Kamm, 1994;Cohen et al., 2004;Harris, 2004), human-robot interaction and assistive technologies (Hadfield-Menell et al., 2016;Mataric, 2017;Xie et al., 2020;Jeon et al., 2020), and broadly as a means for user creativity and expression (Fede et al., 2022;Lee et al., 2022). Relative to these works, we study how the unique aspects of LMs mediate user interaction with language generation systems. ...
Preprint
Full-text available
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
... It is necessary to have exploratory search mechanisms to use data repositories helpfully. As highlighted by Marchionini et al. [27], short queries typed into search boxes do not fulfil current users' needs. Compared to analytical search strategies that depend on a carefully planned series of questions, browsing depends on on-the-fly choices, encompassing selection, navigation, and most importantly, trial-and-error tactics. ...
Article
Full-text available
Biomedical databases often have restricted access policies and governance rules. Thus, an adequate description of their content is essential for researchers who wish to use them for medical research. A strategy for publishing information without disclosing patient-level data is through database fingerprinting and aggregate characterisations. However, this information is still presented in a format that makes it challenging to search, analyse, and decide on the best databases for a domain of study. Several strategies allow one to visualise and compare the characteristics of multiple biomedical databases. Our study focused on a European platform for sharing and disseminating biomedical data. We use semantic data visualisation techniques to assist in comparing descriptive metadata from several databases. The great advantage lies in streamlining the database selection process, ensuring that sensitive details are not shared. To address this goal, we have considered two levels of data visualisation, one characterising a single database and the other involving multiple databases in network-level visualisations. This study revealed the impact of the proposed visualisations and some open challenges in representing semantically annotated biomedical datasets. Identifying future directions in this scope was one of the outcomes of this work.
Article
Full-text available
User-system interaction is a critical aspect for IR and digital libraries as well. Thus, a better understanding and modeling of these processes is of great importance to efforts aimed at making these systems more user responsive. The traditional IR model, with all its strengths, had a serious weakness: it did not depict the rich and varied interaction processes. Thus, several IR interaction models have been proposed. In 19961 proposed a stratified model that views the interaction as a dialogue between participants, user and 'computer' (system) through an interface at a surface level; furthermore, each of the participants are depicted as having different levels or strata. On the user side elements involve at least these levels: cognitive, affective, and situational. On the 'computer' side there are at least engineering, processing, and content levels. Interaction is the interplay between various levels. This general model is now extended to encompass specific processes or phenomena that play a crucial role in IR interaction: the notion of relevance, user modeling., selection of search terms, and feedback types. Examples from a large study of interaction are used to illustrate these extensions. Suggestions for further research are made.
Conference Paper
Article
Written communication of ideas is carried out on the basis of statistical probability in that a writer chooses that level of subject specificity and that combination of words which he feels will convey the most meaning. Since this process varies among individuals and since similar ideas are therefore relayed at different levels of specificity and by means of different words, the problem of literature searching by machines still presents major difficulties. A statistical approach to this problem will be outlined and the various steps of a system based on this approach will be described. Steps include the statistical analysis of a collection of documents in a field of interest, the establishment of a set of “notions” and the vocabulary by which they are expressed, the compilation of a thesaurus-type dictionary and index, the automatic encoding of documents by machine with the aid of such a dictionary, the encoding of topological notations (such as branched structures), the recording of the coded information, the establishment of a searching pattern for finding pertinent information, and the programming of appropriate machines to carry out a search.
Article
Users of Web search engines are often forced to sift through the long ordered list of document `snippets' returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on most major search engines. The NorthernLight search engine organizes its output into `custom folders' based on pre-computed document labels, but does not reveal how the folders are generated or how well they correspond to users' interests. In this paper, we introduce Grouper, an interface to the results of the HuskySearch meta-search engine, which dynamically groups the search results into clusters labeled by phrases extracted from the snippets. In addition, we report on the first empirical comparison of user Web search behavior on a standard ranked-list presentation versus a clustered presentation. By analyzing HuskySearch logs, we are able to demonstrate substantial differences in the number of documents followed, and in the amount of time and effort expended by users accessing search results through these two interfaces.
Article
In this article we describe the primary goals of the Open Video Digital Library (OVDL), its evolution and current status. We provide overviews of the OVDL user interface research and user studies we have conducted with it, and we outline our plans for future Open Video related activities.