
Performance Evaluation of search engines via user efforts measures

Abstract and Figures

Many metrics exist for search engine evaluation; they rely either on expert judgments or on searchers' decisions about the relevancy of web documents. However, search logs can tell us how real users search. This paper explains our attempts to incorporate users' searching behavior into the formulation of a user-effort-centric evaluation metric. We also incorporate a two-dimensional user traversal approach into the ERR metric. After formulating the evaluation metric, the authors judge its goodness and find that the presented metric fulfills all the requirements needed for a metric to be mathematically accurate. The findings obtained from the experiments present a complete description of a search engine evaluation procedure.
... Some common metrics for assessing search engines' performance include precision, recall and classification accuracy (Ajayi and Elegbeleye, 2014). Therefore, automated methods have been developed using click-through data analysis and metrics about user behaviour (Liu et al., 2007; Goutam and Dwivedi, 2012). Search engine optimization (SEO) tools follow the same automated methods, evaluating the content quality, technical performance and off-page factors (Rachita, 2024) of a website and using the relevance of keywords to rank it, but they cannot apply human intellect to understand the preciseness of the queries and results. ...
Article
Purpose: The study aims to calculate the recall ratio of selected MSEs and provide a comprehensive ranking of MSEs using features, precision and recall analysis. Design/methodology/approach: The study was divided into three consecutive stages: first, keyword selection and checking of demographic searchability; second, recall calculation among the MSEs; and third, calculation of the Equal Weighted Score by allotting equal weight (0.25) to all MSEs in order to rank them using the re-ranking aggregation approach. Findings: Of the four MSEs considered (Dogpile, Metacrawler, DuckDuckGo and Startpage), Metacrawler (71%) ranked highest for recall, followed by DuckDuckGo (68%), Dogpile (63%) and Startpage (60%). The re-ranking aggregation approach ranks DuckDuckGo (2) first, followed by Startpage (2.5), Dogpile (2.75) and Metacrawler (2.75); lower scores indicate better performance. The findings indicate that DuckDuckGo is the best MSE regarding user experience (UX) and search quality. Research limitations/implications: The study used a re-ranking aggregation approach confined to past rankings and covered only four MSEs, which limits its generalizability. Practical implications: The findings help users and developers understand the strengths and weaknesses of the different MSEs, enabling more informed decision-making and enhancing UX. Originality/value: The study adopted a novel approach to assessing the MSEs; no similar past study used different performance metrics to rank the MSEs.
Article
Interactive information retrieval (IIR) interfaces are commonly evaluated using questionnaires that collect post-task subjective measures such as satisfaction, ease of use, usefulness, and user engagement. Although the importance of measuring emotional responses during the search process has been recognized, incorporating this aspect into IIR user studies has been challenging. We have developed a novel method to capture real-time emotional responses based on advances in facial emotion classification approaches. We utilize consumer-grade front-facing cameras to collect emotional responses, which synchronize with the user’s interactions with the search interface. In a controlled laboratory study, the relevance of search results was manipulated to validate the approach’s effectiveness and explore how search results’ relevance impacts users’ emotional responses, post-task evaluations of the search interface, and interactions with search interface features. This enabled us to examine whether we could detect emotional responses, whether recency effects were observed in post-task evaluations, and whether feature use correlated with emotional responses. The study was conducted in the context of exploratory search within an academic digital library. The results of this study demonstrate that both positive and negative emotional responses can be reliably detected during the search process. There is evidence of recency effects in post-task measures, and the study identifies specific interactive features used during the experience of positive and negative emotional responses. This serves as a foundation for the use of emotional responses to supplement post-task survey data when evaluating search interfaces.
Article
The year 2020 brought a big concern for the global community because of COVID-19, which affected every sector of society, and tourism is no exception. Researchers across the globe are publishing their studies related to different dimensions of tourism in the context of COVID-19, and images have formed an essential component of their research. In tourism, images related to COVID-19 can open new dimensions for scholars. The main aim of the research is to measure the retrieval effectiveness of three image search engines (ISEs), that is, Bing Images, Google Images and Yahoo Image Search, concerning images related to COVID-19 and tourism. The study attempts to identify the capability of the ISEs to retrieve the desired and actual images related to COVID-19 and tourism. The PubMed Central (PMC) Database was consulted to retrieve the desired images and develop a testbed. The advanced search feature of PMC Database was explored by typing the search terms 'COVID-19' and 'Tourism' using 'AND' operator to make the search more comprehensive. Both the terms were searched against the 'Figure/Table' caption to retrieve papers carrying images related to COVID-19 and tourism. on more than one occasion. In contrast, Bing Images retrieves the original image at the first rank in two instances. Yahoo Images performs poorly over this metric as it does not retrieve any original image at the first rank on any other instance. The study cannot be generalised as the scope is only limited to the images indexed by PMC. Furthermore, the retrieval effectiveness of only three ISEs is measured. The study is the first to measure the retrieval effectiveness of ISEs in retrieving images related to the COVID-19 pandemic and tourism. The study can be extended across other image-indexing databases pertinent to tourism studies, and the retrieval effectiveness of other ISEs can also be considered.
Article
Purpose: To evaluate search engines' performance in Persian information retrieval using fuzzy and classical methods, in order to determine the false drop rate and identify the engine that retrieves the fewest duplicate records. Methodology: In this applied research, semi-experimental, evaluative and comparative methods are adopted. Research samples are selected by purposeful sampling, based on the popularity of search engines. The data collection tool is a researcher-devised checklist with 20 questions. Findings: Google outperforms Yahoo and Bing in both the fuzzy and classical evaluations. The precision ratio obtained from the fuzzy evaluation of search engines is greater than that from the classical evaluation. In both evaluations, the false drop rate is lowest for Google, followed by Yahoo and Bing, in that order. In the fuzzy evaluation, the false drop ratio is lower than in the classical evaluation. Google has the lowest and Yahoo the highest count of duplicate records. Conclusion: The findings from the fuzzy and classical evaluations reveal that fuzzy evaluation increases the precision rate and reduces false drop in search engines. Moreover, fuzzy evaluation provides a more accurate and realistic precision and false drop rate by introducing a spectrum of relevance for retrieved records. It is recommended that researchers apply fuzzy evaluation when evaluating search engines' performance. In general, Google performs better than Bing and Yahoo on the three measured criteria. Consequently, users are advised to use this search engine when searching for Persian information on the web to save time and money.
Conference Paper
The performance evaluation of search engines continues to be a research problem. Numerous researchers have tried to measure the quality of information retrieval systems with different parameters and have achieved several milestones. For graded-relevance measures, the Discounted Cumulative Gain metric deals with the position of the retrieved results. In this paper, we present the task of learning rankings with the help of human assessment and click hits. We combined experts' judgments about the results with users' click hits and found this strategy effective for assigning rankings. The proposed ranking method can be seen as an extension of Expected Reciprocal Rank that correlates better with click metrics than editorial metrics.
Conference Paper
While numerous metrics for information retrieval are available in the case of binary relevance, there is only one commonly used metric for graded relevance, namely the Discounted Cumulative Gain (DCG). A drawback of DCG is its additive nature and the underlying independence assumption: a document in a given position has always the same gain and discount independently of the documents shown above it. Inspired by the "cascade" user model, we present a new editorial metric for graded relevance which overcomes this difficulty and implicitly discounts documents which are shown below very relevant documents. More precisely, this new metric is defined as the expected reciprocal length of time that the user will take to find a relevant document. This can be seen as an extension of the classical reciprocal rank to the graded relevance case and we call this metric Expected Reciprocal Rank (ERR). We conduct an extensive evaluation on the query logs of a commercial search engine and show that ERR correlates better with clicks metrics than other editorial metrics.
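The contrast between DCG's independence assumption and ERR's cascade-based discounting can be illustrated with a minimal Python sketch. It assumes the commonly cited formulations: DCG with gain 2^g - 1 and a log2(rank + 1) discount, and ERR with a per-document stopping probability of (2^g - 1) / 2^g_max. The function names, the 0 to 4 grade scale and the example rankings are illustrative assumptions and are not taken from the abstract.

```python
import math
from typing import Sequence


def dcg(grades: Sequence[int]) -> float:
    """Discounted Cumulative Gain: each rank contributes independently
    of the documents shown above it (assumed gain 2^g - 1, log2 discount)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))


def err(grades: Sequence[int], max_grade: int = 4) -> float:
    """Expected Reciprocal Rank under the cascade user model.

    The user scans from the top and stops at rank r with probability
    p_stop(r), so a document below a highly relevant one is rarely reached.
    """
    p_reach = 1.0  # probability that the user gets as far as this rank
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / (2 ** max_grade)  # assumed grade-to-probability map
        score += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return score


if __name__ == "__main__":
    best_first = [4, 2, 0]   # hypothetical graded judgments, best document on top
    best_last = [0, 2, 4]    # same documents in reverse order
    print("DCG:", dcg(best_first), dcg(best_last))
    print("ERR:", err(best_first), err(best_last))
```

In this sketch a grade-2 document at rank 2 adds the same amount to DCG whatever sits above it, whereas its ERR contribution shrinks sharply when rank 1 holds a grade-4 document, which is the implicit discount the abstract describes.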
Article
The volume of the World Wide Web (WWW) is increasing enormously due to a worldwide move to migrate information to online sources. To search for information on the WWW, search engines are used, which, when presented with queries, return a list of web pages ranked by estimated relevance. Because of the abundance of information available on the web, search engines generally return millions of pages. But user studies indicate that a typical user browses through only the top 10 or 20 documents, so it is all-important to get into those top 10 documents. To achieve this, web authors are increasingly relying on underhand techniques to ensure their sites get seen, in turn affecting the performance of search engines. The existing measures for evaluating these systems' performance are not adequate in the current world of highly interactive end-user systems. In this study, a metric, 'Ranked Precision', is proposed to evaluate the performance of search engines. http://dx.doi.org/10.14429/dbit.25.2.3649
Article
Three Web search engines, namely, Alta Vista, Excite, and Lycos, were compared and evaluated in terms of their search capabilities (e.g., Boolean logic, truncation, field search, word and phrase search) and retrieval performances (i.e., precision and response time) using sample queries drawn from real reference questions. Recall, the other evaluation criterion of information retrieval, is deliberately omitted from this study because it is impossible to assume how many relevant items there are for a particular query in the huge and ever changing Web system. The authors of this study found that Alta Vista outperformed Excite and Lycos in both search facilities and retrieval performance although Lycos had the largest coverage of Web resources among the three Web search engines examined. As a result of this research, we also proposed a methodology for evaluating other Web search engines not included in the current study.
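The reasoning for measuring precision but not recall can be made concrete with a small Python sketch. Precision needs only judgments on the results actually retrieved, whereas recall would also need the total number of relevant items on the Web, which is unknown. The function name, the cutoff parameter `k` and the sample judgments are hypothetical and serve only as illustration; this is not the procedure used in the study.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k retrieved results judged relevant.

    Only the retrieved list is needed; no estimate of the total number
    of relevant documents on the Web is required (unlike recall).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = relevance[:k]
    if not top:
        return 0.0  # nothing retrieved: treat precision as 0 by convention
    # Divide by the number actually retrieved, which may be fewer than k.
    return sum(top) / len(top)


if __name__ == "__main__":
    # Hypothetical relevance judgments for one query, in ranked order.
    judged = [True, False, True, True]
    print(precision_at_k(judged, 4))
```

Dividing by the number of results actually returned, rather than by `k`, is one possible design choice; dividing by `k` would instead penalize engines that return short result lists.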
Article
In this paper we present an analysis of a 280 GB AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents approximately 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. Furthermore we present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques might not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result i...
Sanjay K. Dwivedi is an Associate Professor in the Department of Computer Science at Babasaheb Bhimrao Ambedkar University, Lucknow 226025 (U.P.), India. His research interests are in artificial intelligence, web mining, NLP and sense disambiguation. He has 16 years of teaching and research experience and has handled or been involved in some government-funded research projects.