Sentiment Analysis and Sentence Classification
in Long Book-Search Queries
Amal Htait, Sébastien Fournier, and Patrice Bellot
Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France.
Abstract. Handling long queries can involve either reducing their size by eliminating unhelpful sentences, or decomposing the long query into several short queries based on content. Proper sentence classification improves both procedures. Can sentiment analysis play an effective role in sentence classification? This paper analyses the correlation between sentiment analysis and sentence classification in long book-search queries. It also studies the similarity in writing style between book reviews and sentences in book-search queries. To accomplish this study, a semi-supervised method for sentiment intensity prediction and a language model based on book reviews are presented and used, together with graphical illustrations of the results, followed by interpretations and conclusions.
Keywords: sentiment intensity · language model · search queries · books · word embedding · seed-words · book reviews · sentence classification.
1 Introduction
Social cataloging web applications store and share book catalogs and various types of book metadata, while allowing users to search for books or seek recommendations. Their recommendation and search queries are usually addressed to humans¹, which makes them often long, descriptive, and even narrative. Users may express their need for a book, state preferences for a type or genre of books, give opinions about certain books, describe content or events in a book, and sometimes even share personal information (e.g. I am a teacher).
Being able to differentiate the types of sentences in such long queries can improve the automation of book-search tasks in several ways. Detecting sentences in the query that are unhelpful to the search (e.g. Thanks for any and all help.) can help in query reduction, and classifying sentences by the type of information they contain can be used for adapted search. For example, sentences describing a good reading experience and containing a book title can be oriented toward a book-similarity search, whereas sentences expressing a preference for a certain topic should drive a topic search. Likewise, sentences containing personal information can be used for personalised search.
¹ An example of a query on LibraryThing: https://www.librarything.com/topic/4920
In this work, sentence classification is studied on two levels: the helpfulness of the sentence, i.e. whether it contains meaningful information for the search, and the type of information the sentence provides. Three types of information are highlighted: book titles and author names (e.g. I read "Peter the Great: His Life and World" by Robert K. Massie.), personal information (e.g. I live in a very conservative area), and narration of book content or story (e.g. The story opens on Elyse overseeing the wedding preparation of her female cousin.).
The default in text classification, and likewise in sentence classification, is to use terms as features. In this work, the possibility of introducing new features is tested. Since "different types of sentences express sentiment in very different ways" [4], the correlation between the sentiment of a sentence and its type is studied to test the possibility of introducing sentiment as a feature. For this task, sentiment intensity is calculated, for its capacity to distinguish between sentences of the same polarity, using a semi-supervised method explained in Section 4.
In addition, sentences in a query can share a similar writing style and similar subjects with book reviews. Below is part of a long book-search query:
I just got engaged about a week and a half ago and I'm looking for recommendations on books about marriage. I've already read a couple of books on marriage that were interesting. Marriage A History talks about how marriage went from being all about property and obedience to being about love and how the divorce rate reflects this. The Other Woman: Twenty-one Wives, Lovers, and Others Talk Openly About Sex, Deception, Love, and Betrayal not the most positive book to read but definitely interesting. Dupont Circle A Novel I came across at Kramerbooks in DC and picked it up. The book focuses on three different couples including one gay couple and the laws issues regarding gay marriage ...
In the query example, the parts in bold represent descriptions of specific books' content with book titles, e.g. "Marriage A History", and interpretations or personal points of view about the books, with expressions like "not the most positive book ... but definitely interesting". These sentences read like book-review sentences. Therefore, the similarity between the sentences of a query and book reviews can be a possible feature for sentence classification, as it can help classify sentences with book titles. To calculate that similarity in a general form, a statistical language model of reviews is used to find, for each sentence in the query, the probability of being generated from that model (and therefore its similarity to that model's training dataset of reviews).
This work covers an analysis of the correlation between a sentence's type, its sentiment intensity, and its similarity to reviews. It is organized as follows:
– Presenting the book-search queries used for this work.
– Extracting the sentiment intensity of each sentence in the queries.
– Creating a statistical language model based on reviews, and calculating the probability of each sentence being generated from the model.
– Presenting in graphs and analyzing the relation between language model scores, sentiment intensity scores, and the types of sentences.
2 Related Work
For the purpose of query classification, many machine learning techniques have been applied, including supervised [9], unsupervised [6] and semi-supervised learning [2]. In the book-search field, fewer studies have covered query classification. Ollagnier et al. [11] worked on a supervised machine learning method (Support Vector Machine) for classifying queries into the following classes: oriented (a search on a certain subject with orienting terms), non-oriented (a search on a theme in general), specific (a search for a specific book with an unknown title), and non-comparable (when the search does not belong to any of the previous classes). Their work was based on 300 annotated queries from INEX SBS 2014. But the mentioned work, like many others, addressed the classification of whole queries and not of the sentences within the query. The length of book-search queries creates new obstacles to overcome, and the most difficult one is the variety of information in their long content, which requires classification at the sentence level.
Sentences in general, depending on their type, reveal sentiment in different ways; therefore, Chen et al. [4] focused on using classified sentences to improve sentiment analysis with deep machine learning. In this work, the possibility of the reverse perspective is studied: the improvement of sentence classification using sentiment analysis.
In addition, this work studies the improvement of sentence classification using a language model technique. Language models (LM) have been successfully applied to text classification. In [1], models were created using annotated training datasets and then used to compute the likelihood of generating the test sentences. In this work, a new model is created based on book reviews and used to compute the likelihood of generating the query sentences, as a similarity measurement between the style of book reviews and the sentences of book-search queries.
3 Book-search queries
The dataset of book-search queries used in this work is provided by the CLEF Social Book Search Lab, Suggestion Track. The track provides realistic search queries, addressed to humans and collected from LibraryThing.
Out of the 680 user queries in the 2014 dataset of the Social Book Search Lab, 43 queries are randomly selected among the long ones, since this work focuses on long queries; each of these 43 queries has more than 55 words, stop-words excluded. Each query is then segmented into sentences, which results in a total of 528 sentences. These sentences are annotated, for this study, based on their helpfulness to the search and on the information they provide: book titles and author names, personal information, and narration of book content. An example is shown in the XML extract in Figure 1.
Fig. 1: An example of annotated sentences in book-search queries.
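Since each query must be segmented into sentences before annotation, that step can be reproduced with standard tooling. Below is a minimal sketch using NLTK's sentence tokenizer; the paper does not name the segmentation tool it used, so NLTK here is an assumption.

```python
# Minimal sketch of the query pre-processing step: segmenting each long
# book-search query into sentences. NLTK is an illustrative stand-in.
import nltk

nltk.download("punkt", quiet=True)  # sentence-tokenizer model

def segment_query(query_text):
    """Split one long book-search query into its sentences."""
    return nltk.sent_tokenize(query_text)

query = ("I just got engaged about a week and a half ago and I'm looking "
         "for recommendations on books about marriage. I've already read "
         "a couple of books on marriage that were interesting.")
for i, sentence in enumerate(segment_query(query), 1):
    print(i, sentence)
```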
4 Sentiment Intensity
As part of this work, sentiment intensity is calculated for each sentence of the query. Sentiment intensity is chosen for its capability to capture the sentiment of a text more accurately, and for its capacity to distinguish between sentences of the same polarity. The following method is inspired by a semi-supervised method for sentiment intensity prediction in tweets, built on the concepts of adapted seed-words and word embedding [8]. Seed-words are words with strong semantic orientation, chosen for their lack of sensitivity to the context; they are used as paradigms of positive and negative semantic orientation. Adapted seed-words are seed-words with the characteristic of being used in a certain context or subject. Word embedding is a method to represent words as high-quality learned vectors, trained on large amounts of unstructured and unlabeled text data to predict neighboring words.
In the work of Htait et al. [8], the extracted seed-words were adapted to micro-blogs. For example, the word cool is an adjective that refers to a moderately low temperature and has no strong sentiment orientation, but it is often used in micro-blogs as an expression of admiration or approval. Therefore, cool is considered a positive seed-word in micro-blogs. In this paper, book search is the targeted domain for sentiment intensity prediction; therefore, the extracted seed-words are adapted to the book domain and, more specifically, extracted from book reviews, since reviews have the richest vocabulary in the book domain.
Using book reviews annotated as positive and negative by Blitzer et al. [3]⁵, the list of the most common words in each annotation class is collected. Then, after removing the stop-words, the 43 words most relevant to the book domain, with strong sentiment, are manually selected from each list as positive and negative seed-words. Examples of the extracted seed-words, adapted to the book domain, are insightful, touching and masterpiece as positive, and endless, waste and unnecessary as negative.
⁵ Book reviews from the Multi-Domain Sentiment Dataset by Blitzer et al. [3].
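As an illustration of the seed-word extraction step, the following sketch collects the most frequent non-stop-words per annotation class; the final cut of 43 seed-words per class is manual, as described above. The variables positive_reviews and negative_reviews are assumed lists of review texts, and the top_n cut-off is an assumption.

```python
# Collect seed-word candidates: the most common non-stop-words in each
# annotation class (positive / negative book reviews).
from collections import Counter
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))

def candidate_seed_words(reviews, top_n=200):
    counts = Counter(
        token
        for review in reviews
        for token in review.lower().split()
        if token.isalpha() and token not in STOP
    )
    return [word for word, _ in counts.most_common(top_n)]

# positive_candidates = candidate_seed_words(positive_reviews)
# negative_candidates = candidate_seed_words(negative_reviews)
# The 43 domain-relevant, strongly polarized words per class are then
# picked manually from these candidate lists.
```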
Word embeddings, or distributed representations of words in a vector space, are capable of capturing lexical, semantic, syntactic, and contextual similarity between words. To determine the similarity between two words, the cosine distance between the vectors of these two words in the word embedding model is used. In this paper, a word embedding model is created based on more than 22 million Amazon book reviews [7] as training dataset, after applying a pre-processing step to the corpora (e.g. tokenization, replacing hyperlinks and emoticons, removing some characters and punctuation).
For the purpose of learning word embeddings from the previously prepared corpora (which is raw text), Word2Vec [10] is used with the Skip-Gram training strategy (in which the model is given a word and attempts to predict its neighboring words). To train the word embeddings and create the models, the Gensim framework for Python is used. As for the parameters, the models are trained with word representations of dimension 400, a context window of one, and negative sampling for five iterations (k = 5). As a result, a model is created with a vocabulary of more than 2.5 million words.
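The training configuration above can be reproduced, for instance, with Gensim's Word2Vec implementation. The sketch below assumes Gensim version 4 or later (where the size and iter parameters became vector_size and epochs) and an iterable review_sentences of pre-processed, tokenized reviews; min_count is an assumed default.

```python
from gensim.models import Word2Vec

model = Word2Vec(
    sentences=review_sentences,  # tokenized Amazon book reviews (assumed prepared)
    vector_size=400,             # word representations of dimension 400
    window=1,                    # context window of one
    sg=1,                        # Skip-Gram training strategy
    negative=5,                  # negative sampling (k = 5)
    epochs=5,                    # "five iterations"
    min_count=5,                 # assumption: drop very rare tokens
)
model.save("book_reviews_w2v.model")
```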
Then, for each word in a sentence, the difference between its average cosine similarity with the positive seed-words and with the negative seed-words, computed in the previously created model, represents its sentiment intensity score. For example, the word confusing has an average cosine similarity with the positive seed-words of 0.2073 and with the negative seed-words of 0.3082, which makes its sentiment intensity score −0.1008 (a negative score represents a negative feeling). For the word young, the sentiment intensity score is 0.0729, which is rather neutral, but closer to positive than to negative sentiment.
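The word-level score just described can be written in a few lines against the trained model. This is a sketch reusing the model from the previous snippet; treating out-of-vocabulary words as neutral is an assumption.

```python
import numpy as np

def word_intensity(model, word, pos_seeds, neg_seeds):
    """Average cosine similarity to positive seeds minus negative seeds."""
    if word not in model.wv:
        return 0.0  # assumption: out-of-vocabulary words are neutral
    pos = np.mean([model.wv.similarity(word, s) for s in pos_seeds if s in model.wv])
    neg = np.mean([model.wv.similarity(word, s) for s in neg_seeds if s in model.wv])
    return float(pos - neg)  # e.g. "confusing": 0.2073 - 0.3082 < 0 (negative feeling)
```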
To predict the sentiment intensity of an entire sentence, first the adjectives, nouns and verbs are selected from the sentence using the Stanford POS tagger [13]; then the scores of those with high sentiment intensity are added up to give a total score for the sentence. Note that the tool created to calculate the sentiment intensity of words, the Adapted Sentiment Intensity Detector (ASID), is shared by the authors as open source.
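Putting the pieces together, the sentence-level score can be sketched as below. The paper uses the Stanford POS tagger [13]; NLTK's tagger is used here as a stand-in, and the 0.1 threshold for "high sentiment intensity" is an assumption, since the paper does not state the exact cut-off.

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def sentence_intensity(model, sentence, pos_seeds, neg_seeds, threshold=0.1):
    """Sum the scores of strongly polarized adjectives, nouns and verbs."""
    tokens = nltk.word_tokenize(sentence)
    kept = [word for word, tag in nltk.pos_tag(tokens)
            if tag.startswith(("JJ", "NN", "VB"))]  # adjectives, nouns, verbs
    scores = [word_intensity(model, w.lower(), pos_seeds, neg_seeds) for w in kept]
    return sum(s for s in scores if abs(s) >= threshold)
```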
5 Reviews’ language model
Book reviews are considered a reference for detecting sentence characteristics, since a similarity in style is noticed between certain sentences of user queries and the reviews. To calculate this similarity in writing style, a statistical language modeling approach is used to compute the likelihood of generating a sentence of a query from a book-review language model. Such a method is unsupervised and does not require an annotated dataset.
Statistical language modeling was introduced by Collins [5], and it is the science of building models that estimate the prior probabilities of various linguistic units [14]. It makes it possible to take large linguistic units, such as bigrams and trigrams, into account. The model can be presented as $\theta_R = P(w_i|R)$ with $i \in [1, |V|]$, where $P(w_i|R)$ is the probability of word $w_i$ in the review corpus $R$, and $|V|$ is the size of the vocabulary. This model is used to denote the probability of a word according to the distribution, as $P(w_i|\theta_R)$.
The probability of a sentence $W$ being generated from the book-review language model $\theta_R$ is defined as the conditional probability $P(W|\theta_R)$, calculated as follows:

$$P(W|\theta_R) = \prod_{i=1}^{m} P(w_i|\theta_R) \qquad (1)$$

where $W$ is a sentence, $w_i$ is a word in the sentence $W$, $m$ is the number of words in $W$, and $\theta_R$ represents the book-review model.
The tool SRILM [12] is used to create the model from the book reviews dataset (as training data), and to compute the probability of the sentences in queries being generated from the model (as test data). The language model is created as a standard trigram language model with Good-Turing discounting (or Katz back-off) for smoothing, based on 22 million Amazon book reviews [7] as training data.
SRILM offers details in its diagnostic output, such as the number of words in the sentence, the likelihood of the sentence given the model (or its logarithm, $\log P(W|\theta_R)$), and the perplexity, which is the inverse probability of the sentence normalized by the number of words, as shown in Equation 2:

$$PP(W) = P(W|\theta_R)^{-\frac{1}{m}} \qquad (2)$$

with $m$ as the number of words. In this paper, the length of sentences varies from one word to almost 100 words; therefore, the perplexity score seems more reliable for a comparison between sentences. Note that minimizing perplexity is the same as maximizing the likelihood, and a low perplexity indicates the probability distribution is good at predicting the sample.
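To make Equations 1 and 2 concrete, the sketch below computes a sentence's probability and perplexity from per-word probabilities. For readability it uses unigram probabilities; the actual model is a trigram model and the scores are read from SRILM's diagnostic output. The probability table p is invented for illustration.

```python
import math

def sentence_logprob(words, p):
    # Equation 1 (in log space): log P(W | theta_R) = sum_i log P(w_i | theta_R)
    return sum(math.log(p[w]) for w in words)

def perplexity(words, p):
    # Equation 2: PP(W) = P(W | theta_R) ** (-1/m), with m the number of words
    m = len(words)
    return math.exp(-sentence_logprob(words, p) / m)

p = {"the": 0.05, "story": 0.002, "was": 0.01, "touching": 0.0005}  # toy values
print(perplexity(["the", "story", "was", "touching"], p))  # lower = more review-like
```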
6 Scores representation in graphs
As previously explained in Section 3, a corpus of 528 sentences from user queries is created and annotated as in the examples of Figure 1. Then, for each sentence, the sentiment intensity score and the perplexity score are calculated following the methods explained in Sections 4 and 5. To present the scores, violin plots are used for their ability to show the probability density of the data at different values. They also include a marker (white dot) for the median of the data and a box (black rectangle) indicating the interquartile range.
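A violin plot with exactly this appearance (white median dot and black interquartile box) is, for example, seaborn's default. The sketch below assumes a DataFrame df with one row per annotated sentence and columns helpful and sentiment_intensity; both names are hypothetical.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# One violin per category; the inner box shows the median (white dot)
# and the interquartile range (black rectangle), as described above.
ax = sns.violinplot(data=df, x="helpful", y="sentiment_intensity")
ax.set_xlabel("Sentence category")
ax.set_ylabel("Sentiment intensity score")
plt.show()
```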
6.1 Sentiment intensity, perplexity and helpfulness correlation
The graph in Figure 2 shows the distribution (or probability density) of sentiment intensity for two categories of sentences: on the right, the sentences which are helpful to the search, and on the left, the sentences which are unhelpful to the search (noise). The shape on the left is horizontally stretched compared to the right one, and mostly dilated over the area of neutral sentiment intensity (sentiment score = 0), where the median of the data also lies. On the other hand, the shape on the right is vertically stretched, showing the diversity of sentiment intensity among the helpful sentences, concentrated mostly in the positive area, at sentiment scores higher than zero but lower than 0.5.
Fig. 2: The distribution of sentiment intensity for two categories of sentences: (a) unhelpful to the search, (b) helpful to the search.
The graph in Figure 3 represents the distribution of perplexity for the same two categories of sentences: helpful to the search on the right, unhelpful (noise) on the left. Both shapes are vertically compressed and dilated over the area of low perplexity. But the graph on the right, of the helpful sentences, shows the median of the data at a lower perplexity score than the left graph, explained by the slight horizontal dilation of the left graph above the median level.
Fig. 3: The distribution of perplexity for two categories of sentences: (a) unhelpful to the search, (b) helpful to the search.
6.2 Sentiment intensity, perplexity and information type correlation
The graphs in Figure 4 show the distribution of sentiment over sentences based on their type of information. The graphs are described below consecutively, from top to bottom, by information type:
– Book titles and author names: on the right, the sentences with book titles or author names; on the left, the sentences without them. The graph on the right shows a wide distribution of positive sentiment, while the left graph shows a high concentration on neutral sentiment with a small distribution of positive and negative sentiment. The lack of negative sentiment in sentences with book titles or author names is also noticeable.
– Personal information: on the right, the sentences containing personal information about the user; on the left, the sentences without personal information. The graph on the right shows a high concentration on neutral sentiment, where the median of the data also lies, and then a smaller distribution over positive sentiment. On the left, the graph shows a lower concentration on neutral sentiment, but the existence of sentences with extremely high positivity is noticeable.
– Narration of book content: on the right, the sentences containing book content or events; on the left, the sentences without book content. Both graphs are vertically stretched but have different shapes. The graph on the right shows a wider distribution of negative sentiment for the sentences with book content, while the graph on the left shows higher positive values.
Fig. 4: The distribution of sentiment across the informational categories of sentences: book titles or author names (panels a, b), personal information (panels c, d), and narration of book content (panels e, f).

The graphs in Figure 5 show the distribution of perplexity across the informational sentence types, consecutively from top to bottom: book titles and author names, personal information, and narration of book content. Comparing the first pair of graphs, of book titles and author names, the left graph has its median at a lower perplexity level than the right graph, with a higher concentration of data in a tighter perplexity interval. For the second pair of graphs, of personal information, the right graph shows a lower interquartile range than the left graph. As for the third pair of graphs, of book content, only a slight difference can be detected between the two graphs, with the left graph stretched slightly toward higher perplexity values.
Fig. 5: The distribution of perplexity across the informational categories of sentences: book titles or author names (panels a, b), personal information (panels c, d), and narration of book content (panels e, f).
6.3 Graphs interpretation
Observing the distribution of the data in the graphs of the previous sections, several conclusions can be drawn:
– In Figure 2, it is clear that helpful sentences tend to show higher levels of emotion (positive or negative), while unhelpful sentences (noise) are more likely to be neutral.
– Figure 3 shows that sentences with high perplexity, meaning they are not similar to book-review sentences, have a higher probability of being unhelpful than helpful.
– Figure 4 gives an idea of the correlation between sentiment and the information in sentences: sentences with book titles or author names show a high level of positive emotion, whereas sentences with personal information tend to be neutral. Sentences with book content narration are distributed over the area of moderate emotional level, with a higher probability of positive than negative sentiment.
– Figure 5 gives an idea of the correlation between the information in sentences and their similarity to reviews: sentences without book titles are more similar to reviews than those with book titles. Also, sentences with personal information tend to be similar to reviews. And sentences with book content narration show slightly more similarity to the style of review sentences than sentences without book content narration.
7 Conclusion and Future work
This paper analyses the relation between sentiment intensity, review similarity, and sentence types in long book-search queries: first, by presenting the user queries; then, by extracting the sentiment intensity of each sentence of the queries (using the Adapted Sentiment Intensity Detector, ASID); then, by creating a statistical language model based on reviews and calculating the probability of each sentence being generated from that model; and finally, by presenting in graphs the relation between sentiment intensity scores, language model scores, and the types of sentences.
The graphs show that sentiment intensity can be an important feature for classifying sentences based on their helpfulness to the search, since unhelpful sentences (or noise sentences) are more likely to be neutral in sentiment than helpful sentences. The graphs also show that sentiment intensity can be an important feature for classifying sentences based on the information they contain: it is clear in the graphs that sentences containing book titles are richer in sentiment, and mostly positive, compared to sentences not containing book titles. In addition, the graphs show that sentences with personal information tend to be neutral with a higher probability than those without personal information.
On the other hand, the graphs show that the similarity of sentences to the style of reviews can also be a feature for classifying sentences by helpfulness and by their information content, although at a slightly lower level of importance than sentiment analysis. The similarity between sentences and the style of book reviews is higher for helpful sentences, for sentences with personal information, and for sentences with narration of book content, but not for sentences containing book titles.
The previous analysis and conclusions give a preview of the role that sentiment analysis and similarity to reviews can play in the sentence classification of long book-search queries. The next task would be to test these conclusions by using sentiment analysis and similarity to reviews as new features in a supervised machine learning classification of sentences in long book-search queries.
Acknowledgments This work has been supported by the French State, managed by the National Research Agency under the "Investissements d'avenir" program, through the EquipEx DILOH project (ANR-11-EQPX-0013).
References
1. Bai, J., Nie, J.Y., Paradis, F.: Using language models for text classification. In: Proceedings of the Asia Information Retrieval Symposium, Beijing, China (2004)
2. Beitzel, S.M., Jensen, E.C., Frieder, O., Lewis, D.D., Chowdhury, A., Kolcz, A.: Improving automatic query classification via semi-supervised learning. In: Data Mining, Fifth IEEE International Conference on. IEEE (2005)
3. Blitzer, J., Dredze, M., Pereira, F.: Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (2007)
4. Chen, T., Xu, R., He, Y., Wang, X.: Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Systems with Applications, pp. 221–230 (2017)
5. Collins, M.: Three generative, lexicalised models for statistical parsing. In: Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics. pp. 16–23. Association for Computational Linguistics (1997)
6. Diemert, E., Vandelle, G.: Unsupervised query categorization using automatically-built concept graphs. In: Proceedings of the 18th International Conference on World Wide Web. pp. 461–470. ACM (2009)
7. He, R., McAuley, J.: Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web. pp. 507–517. International World Wide Web Conferences Steering Committee (2016)
8. Htait, A., Fournier, S., Bellot, P.: LSIS at SemEval-2017 Task 4: Using adapted sentiment similarity seed words for English and Arabic tweet polarity classification. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pp. 718–722 (2017)
9. Kang, I.H., Kim, G.: Query type classification for web document retrieval. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 64–71. ACM (2003)
10. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
11. Ollagnier, A., Fournier, S., Bellot, P.: Analyse en dépendance et classification de requêtes en langue naturelle, application à la recommandation de livres. Traitement Automatique des Langues (2015)
12. Stolcke, A.: SRILM - an extensible language modeling toolkit. In: Seventh International Conference on Spoken Language Processing (2002)
13. Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1. pp. 173–180. Association for Computational Linguistics (2003)
14. Zhai, C.: Statistical language models for information retrieval. Synthesis Lectures on Human Language Technologies 1(1), 1–141 (2008)