P. Raghavan’s scientific contributions


Publications (3)


Introduction to information retrieval, chapt
  • Article

January 2008 · 145 Reads · 7,474 Citations

C.D. Manning · P. Raghavan · H. Schütze


Citations (3)


... where $w_{ji}$ is the weight of the term $t_j$ on the document $d_i$, which quantifies the importance of the term on the document. To compute the weights $w_{ji}$, one of the most common approaches is the method known as Term Frequency-Inverse Document Frequency (TF-IDF) [26,17]. This method evaluates two aspects: the frequency of a term $t$ on the document $d$, $tf(t, d)$, and the inverse frequency of $t$ on the corpus of documents, $idf(t, D)$, quantifying its global importance. ...
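The excerpt above does not say which TF-IDF variant the citing article uses. As a minimal sketch only, the following assumes the raw term count for tf(t, d) and log(N / df(t)) for idf(t, D), where N is the number of documents and df(t) is the number of documents containing t. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    `corpus` is a list of documents, each a list of tokens.
    Assumed weighting: raw term count times log(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["topic", "detection", "twitter"],
    ["topic", "clustering", "twitter", "twitter"],
    ["vector", "space", "model"],
]
w = tf_idf(docs)
```

Under this weighting a term that occurs in every document gets weight zero, since log(N / N) = 0, while a term confined to one document gets the largest idf factor.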

Reference:

Offline topic detection and clustering for time limited events in Twitter
Scoring, term weighting, and the vector space model
  • Citing Article
  • January 2008

... In second place, and with the purpose of creating a cleaner corpus, the text of all the messages underwent pre-processing, with the help of Python libraries, which included the habitual procedures of tokenization, lemmatization, n-grams identification, and stopword removal (Heydt, 2018; Kirilenko et al., 2021; Laureate et al., 2023; Maier et al., 2018; Manning, Raghavan, & Schütze, 2008; Vázquez, Pereira-Delgado, Cid-Sueiro, & Arenas-García, 2022). In concrete, the following steps were followed: a) Tokenization with removal of short (fewer than 4 symbols) and long (more than 25 symbols) tokens. ...
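Step a) of the excerpt above can be sketched as follows. The excerpt does not name the libraries or the tokenizer used, so the regular-expression tokenizer, the lowercasing, and the names `tokenize`, `MIN_LEN`, and `MAX_LEN` are assumptions for illustration; only the two length thresholds come from the text.

```python
import re

MIN_LEN = 4   # tokens with fewer than 4 characters are removed
MAX_LEN = 25  # tokens with more than 25 characters are removed


def tokenize(text):
    """Split text into lowercase word tokens and apply the length filter.

    The regex split on word characters is an assumed stand-in for
    whatever tokenizer the citing article actually used.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if MIN_LEN <= len(t) <= MAX_LEN]


sample = tokenize("The cat chased information")
```

The later steps the excerpt mentions (lemmatization, n-gram identification, stopword removal) are not shown because the excerpt is truncated before describing them.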

Introduction to information retrieval, chapt
  • Citing Article
  • January 2008

... Given the availability of multiple LLMs and the lack of prior investigation into their applicability to NVD data, we develop three variants of ChatNVD with three different widely adopted models: GPT-4o mini by OpenAI [30], Gemini 1.5 Pro by Google [31], and Llama 3 by Meta [32]. The models are trained using the term frequency-inverse document frequency (TF-IDF) embedding technique, chosen for its computational efficiency, lower cost, and faster processing time [33]. High-quality embeddings for the entire NVD dataset (720.7 MB) were deemed too costly and time-intensive, making TF-IDF a suitable alternative for tasks focused on CVE IDs and term significance. ...

Introduction to Information Retrieval
  • Citing Book
  • January 2008