Searching the Enterprise

Abstract and Figures

Search has become ubiquitous but that does not mean that search has been solved. Enterprise search, which is broadly speaking the use of information retrieval technology to find information within organisations, is a good example to illustrate this. It is an area that is of huge importance for businesses, yet has attracted relatively little academic interest. This monograph will explore the main issues involved in enterprise search both from a research as well as a practical point of view. We will first plot the landscape of enterprise search and its links to related areas. This will allow us to identify key features before we survey the field in more detail. Throughout the monograph we will discuss the topic as part of the wider information retrieval research field, and we use Web search as a common reference point as this is likely the search application area that the average reader is most familiar with.
... However, it has been reported that most organisations do not have the resources to look at their search logs [32]. Enterprise search queries are short [32,33], with some studies reporting that 79% of all queries are two terms or fewer [34]. It has been found that 80% of users tended to use the search system and immediately stop; these were termed 'casual unsophisticated users', who may be related to lookup/known-item 'fact seekers' and people using search for 'bookmarking' [35]. ...
... These were termed 'knowledgeable' and 'intensive' users who may prefer recall over precision. Combining and synthesising the literature [2,7,18,32,33,35,37,38,39], Table 1 shows the typical work tasks and related information tasks serviced by enterprise search as a 'one size fits all' information system. ...
Article
COVID-19 has created unprecedented organisational challenges, yet no study has examined the impact on information search. A case study in a knowledge-intensive organisation was undertaken on 2.5 million search queries during the pandemic. A surge of unique users and COVID-19 search queries in March 2020 may equate to ‘peak uncertainty and activity’, demonstrating the importance of corporate search engines in times of crisis. Search volumes dropped 24% after lockdowns; an ‘L-shaped’ recovery may be a surrogate for business activity. COVID-19 search queries transitioned from awareness, to impact, strategy, response and ways of working that may influence future search design. Low click through rates imply some information needs were not met and searches on mental health increased. In extreme situations (i.e. a pandemic), companies may need to move faster, monitoring and exploiting their enterprise search logs in real time as these reflect uncertainty and anxiety that may exist in the enterprise.
... Systematically captured and stored workflow data can support mechanisms to ensure the quality and validity of data that can be examined or reviewed in the context of its source and transformations over time (Khan et al. 2016;Kruschwitz and Hull 2017). Workflow data allows users to track provenance information such as evolution. ...
Article
Open-source intelligence is a rapidly expanding area of the security and intelligence industry, involving the collection of internet-located open data from various sources and turning that data into actionable intelligence, which is reused where possible and relevant. While creating or processing the raw input data, it is essential to capture and manage the corresponding provenance information (e.g., workflow, state, raw evidence, reports, and summaries) in a way that simplifies its retrieval and reuse. In comparison, scientific workflows and tools that support them are routinely used in the majority of academic research disciplines, managing diverse sets of data resources and their provenance. Based on the techniques established within the academic community, we have developed a system for managing this open-source intelligence data and associated provenance information. This will enhance the efficiency of retrieving stored data products and reusing them to support intelligence-led security decision-making. The open-source intelligence company partnered within this project has an operational envelope that includes collecting and analyzing personal subject information. Therefore, they must understand the scope of their data holdings appropriately, especially in light of obligations under the General Data Protection Regulation. The system developed allows for tracking requests for intelligence products, ownership of the collection, analysis and generation of intelligence briefs, and tracking the delivery of those final products to the customer for future billing. This adds further layers of efficiency to operations and hence reduces the costs of producing intelligence products.
... Generic search tools, such as desktop search, database search, and web search, can help engineers use specific keywords to find experts and relevant documents, such as material information, technical reports, and patents. Enterprise Search supports the search of documents within the organization that contains relevant information and people with the right expertise [30]. PLM/PDM systems, e.g., Teamcenter, enable engineers to find different kinds of product information, for instance, CAD drawings, BOM data, parts information, and manufacturing instructions, across more domains and departments. ...
Article
Product design is crucial for product success. Many approaches can improve product design quality, such as concurrent engineering and design for X. This study focuses on applying product usage information (PUI) during product development. As emerging technologies become widespread, an enormous amount of product-related information is available in the middle of a product’s life, such as customer reviews, condition monitoring, and maintenance data. In recent years, the literature describes the application of data analytics technologies such as machine learning to promote the integration of PUI during product development. However, as of today, PUI is not efficiently exploited in product development. One of the critical issues to achieve this is identifying and integrating task-relevant PUI fit for purposes of different product development tasks. Nevertheless, preparing task-relevant PUI that fits different product development tasks is often ignored. This study addresses this research gap in preparing task-relevant PUI and rectifies the related shortcomings and challenges. By considering the context in which PUI is utilized, this paper presents a systematic procedure to help identify and specify developers’ information needs and propose relevant PUI fitting the actual information needs of their current product development task. We capitalize on an application scenario to demonstrate the applicability of the proposed approach.
... Unlike search for leisure or personal interest there is a vast area of search contexts which are found in a work environment. Professional search falls into that scope, i.e. search over domain-specific document collections and often with search tasks that are recall-oriented rather than precision-focused (Kruschwitz and Hull, 2017; Verberne et al., 2019). Beyond applications where such search effort can directly be measured in financial terms (e.g. in patent search, e-discovery or the compilation of systematic reviews) there are many other fields where these costs are more implicit, e.g. in the area of genocide studies that rely on the analysis of vast quantities of different resources (Bachman, 2020; Hinton, 2012). ...
Preprint
Recent progress in natural language processing has been impressive in many different areas with transformer-based approaches setting new benchmarks for a wide range of applications. This development has also lowered the barriers for people outside the NLP community to tap into the tools and resources applied to a variety of domain-specific applications. The bottleneck however still remains the lack of annotated gold-standard collections as soon as one's research or professional interest falls outside the scope of what is readily available. One such area is genocide-related research (also including the work of experts who have a professional interest in accessing, exploring and searching large-scale document collections on the topic, such as lawyers). We present GTC (Genocide Transcript Corpus), the first annotated corpus of genocide-related court transcripts which serves three purposes: (1) to provide a first reference corpus for the community, (2) to establish benchmark performances (using state-of-the-art transformer-based approaches) for the new classification task of paragraph identification of violence-related witness statements, (3) to explore first steps towards transfer learning within the domain. We consider our contribution to be addressing in particular this year's hot topic on Language Technology for All.
... These distinct forms of expertise are generally found in different groups of individuals with complementary perspectives (see e.g. Kruschwitz and Hull (2017)). ...
Preprint
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.
... While this controlled separation from the external platforms offers the opportunity for novel educational experiences, the heterogeneous nature of signals and structures poses the question of how to join them up. This is all conceptually similar to some of the major challenges and opportunities of enterprise and intranet search compared to general web search (Kruschwitz and Hull, 2017;Hawking, 2010). ...
Preprint
Social media (SM) have become an integral part of our lives, expanding our inter-linking capabilities to new levels. There is plenty to be said about their positive effects. On the other hand however, some serious negative implications of SM have repeatedly been highlighted in recent years, pointing at various SM threats for society, and its teenagers in particular: from common issues (e.g. digital addiction and polarization) and manipulative influences of algorithms to teenager-specific issues (e.g. body stereotyping). The full impact of current SM platform design -- both at an individual and societal level -- asks for a comprehensive evaluation and conceptual improvement. We extend measures of Collective Well-Being (CWB) to SM communities. As users' relationships and interactions are a central component of CWB, education is crucial to improve CWB. We thus propose a framework based on an adaptive "social media virtual companion" for educating and supporting the entire students' community to interact with SM. The virtual companion will be powered by a Recommender System (CWB-RS) that will optimize a CWB metric instead of engagement or platform profit, which currently largely drives recommender systems thereby disregarding any societal collateral effect. CWB-RS will optimize CWB both in the short term, by balancing the level of SM threat the students are exposed to, as well as in the long term, by adopting an Intelligent Tutor System role and enabling adaptive and personalized sequencing of playful learning activities. This framework offers an initial step on understanding how to design SM systems and embedded educational interventions that favor a more healthy and positive society.
... Research conducted on NLP revolves around search, especially enterprise search. Enterprise search is the organized retrieval of structured and unstructured data within an organization (Kruschwitz and Hull, 2017). Further, it makes content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience. ...
Book
Physical and behavioral biometric technologies such as fingerprinting, facial recognition, voice identification, etc. have enhanced the level of security substantially in recent years. Governments and corporates have employed these technologies to achieve better customer satisfaction. However, biometrics faces major challenges in reducing criminal, terrorist activities and electronic frauds, especially in choosing appropriate decision-making algorithms. To face this challenge, new developments have been made, that amalgamate biometrics with artificial intelligence (AI) in decision-making modeling. Advanced software algorithms of AI, processing information offered by biometric technology, achieve better results. This has led to growth in the biometrics technology industry, and is set to increase the security and internal control operations manifold. This book provides an overview of the existing biometric technologies, decision-making algorithms and the growth opportunity in biometrics. The book proposes a throughput model, which draws on computer science, economics and psychology to model perceptual, informational sources, judgmental processes and decision choice algorithms. It reviews how biometrics might be applied to reduce risks to individuals and organizations, especially when dealing with digital-based media.
Article
Purpose: The purpose of the study is to examine enterprise searching practices across different work areas and work tasks in an enterprise search system in an international biotechnology company. Design/methodology/approach: A mixed-method approach studying employees' authentic search activities during a 4-month period by log data, questionnaire survey and interviews. The log data analysed the entire active searcher group, whereas the questionnaire and interviews focused on frequent searchers. Findings: The three studies provided insight into the searching activities and an understanding of the way searchers used the enterprise search system to search for information as part of their work tasks. The data identified three searcher groups, each with specific search characteristics. Four work task types were identified, and for all four types the searchers applied a tracing searching technique with use of contextual and historical relationships as paths. Practical implications: The findings point to the importance of knowledge on historical and contextual relations in enterprise search. Originality/value: The work sheds new light on enterprise searchers' information search practices. A significant contribution is the identification of a tracing search method used in relation to four essential work task types. Another contribution is the importance of historical and contextual knowledge to support the tracing search and decide what paths to follow.
Book
Information seeking is a fundamental human activity. In the modern world, it is frequently conducted through interactions with search systems. The retrieval and comprehension of information returned by these systems is a key part of decision making and action in a broad range of settings. Advances in data availability coupled with new interaction paradigms, and mobile and cloud computing capabilities, have created a broad range of new opportunities for information access and use. In this comprehensive book for professionals, researchers, and students involved in search system design and evaluation, search expert Ryen White discusses how search systems can capitalize on new capabilities and how next-generation systems must support higher order search activities such as task completion, learning, and decision making. He outlines the implications of these changes for the evolution of search evaluation, as well as challenges that extend beyond search systems in areas such as privacy and societal benefit. Discusses many new technologies and their role in the search process. Covers important issues involving data availability and privacy. Talks about these issues in depth, educating searchers in the benefits and potential costs involved in using big (and small) data. Combines research from information retrieval, information science, and human-computer interaction.
Conference Paper
Email is still among the most popular online activities. People spend a significant amount of time sending, reading and responding to email in order to communicate with others, manage tasks and archive personal information. Most previous research on email is based on either relatively small data samples from user surveys and interviews, or on consumer email accounts such as those from Yahoo! Mail or Gmail. Much less has been published on how people interact with enterprise email even though it contains less automatically generated commercial email and involves more organizational behavior than is evident in personal accounts. In this paper, we extend previous work on predicting email reply behavior by looking at enterprise settings and considering more than dyadic communications. We characterize the influence of various factors such as email content and metadata, historical interaction features and temporal features on email reply behavior. We also develop models to predict whether a recipient will reply to an email and how long it will take to do so. Experiments with the publicly-available Avocado email collection show that our methods outperform all baselines with large gains. We also analyze the importance of different features on reply behavior predictions. Our findings provide new insights about how people interact with enterprise email and have implications for the design of the next generation of email clients.
Article
Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media (such as blog articles, forum posts, product reviews, and tweets). This has led to an increasing demand for powerful software tools to help people manage and analyze vast amounts of text data effectively and efficiently. Unlike data generated by a computer system or sensors, text data are usually generated directly by humans, and capture semantically rich content. As such, text data are especially valuable for discovering knowledge about human opinions and preferences, in addition to many other kinds of knowledge that we encode in text. In contrast to structured data, which conform to well-defined schemas (thus are relatively easy for computers to handle), text has less explicit structure, requiring computer processing toward understanding of the content encoded in text. The current technology of natural language processing has not yet reached a point to enable a computer to precisely understand natural language text, but a wide range of statistical and heuristic approaches to management and analysis of text data have been developed over the past few decades. They are usually very robust and can be applied to analyze and manage text data in any natural language, and about any topic. This book provides a systematic introduction to many of these approaches, with an emphasis on covering the most useful knowledge and skills required to build a variety of practically useful text information systems. Because humans can understand natural languages far better than computers can, effective involvement of humans in a text information system is generally needed and text information systems often serve as intelligent assistants for humans. Depending on how a text information system collaborates with humans, we distinguish two kinds of text information systems. 
The first is information retrieval systems which include search engines and recommender systems; they assist users in finding from a large collection of text data the most relevant text data that are actually needed for solving a specific application problem, thus effectively turning big raw text data into much smaller relevant text data that can be more easily processed by humans. The second is text mining application systems; they can assist users in analyzing patterns in text data to extract and discover useful actionable knowledge directly useful for task completion or decision making, thus providing more direct task support for users. This book covers the major concepts, techniques, and ideas in information retrieval and text data mining from a practical viewpoint, and includes many hands-on exercises designed with a companion software toolkit (i.e., MeTA) to help readers learn how to apply techniques of information retrieval and text mining to real-world text data and how to experiment with and improve some of the algorithms for interesting application tasks. This book can be used as a textbook for computer science undergraduates and graduates, library and information scientists, or as a reference book for practitioners working on relevant problems in managing and analyzing text data.
Conference Paper
Online visitors often do not find the content they were expecting on specific pages of a large enterprise website, and subsequently search for it in the site's search box. In this paper, we propose methods to leverage website search logs to identify missing or expected content on webpages of the enterprise website, while showing how several scenarios make this a non-trivial problem. We further discuss how our methods can be easily extended to address concerns arising from the identified missing content.
Chapter
This chapter introduces patent search in a way that should be accessible and useful to both researchers in information retrieval and other areas of computer science and professionals seeking to broaden their knowledge of patent search. It gives an overview of the process of patent search, including the different forms of patent search. It goes on to describe the differences among different domains of patent search (engineering, chemicals, gene sequences and so on) and the tools currently used by searchers in each domain. It concludes with an overview of open issues.
Article
multiple heterogeneous search services in a unified interface: a single query box and a common presentation of results. In the web search domain, aggregated search systems are responsible for integrating results from specialized search services, or verticals, alongside the core web results. For example, search portals such as Google, Bing, and Yahoo! provide access to vertical search engines that focus on different types of media (images and video), different types of search tasks (search for local businesses and online products), and even applications that can help users complete certain tasks (language translation and math calculations). Aggregated search systems perform two main tasks. The first task (vertical selection) is to predict which verticals (if any) to present in response to a user's query. The second task (vertical presentation) is to predict where and how to present each selected vertical alongside the core web results. The goal of this work is to provide a comprehensive summary of previous research in aggregated search. We first describe why aggregated search requires unique solutions. Then, we discuss different sources of evidence that are likely to be available to an aggregated search system, as well as different techniques for integrating evidence in order to make vertical selection and presentation decisions. Next, we survey different evaluation methodologies for aggregated search and discuss prior user studies that have aimed to better understand how users behave with aggregated search interfaces. Finally, we review different advanced topics in aggregated search.