The influence of search engine optimization on Google’s results:
A multi-dimensional approach for detecting SEO
Dirk Lewandowski
Department of Information,
Hochschule für Angewandte
Wissenschaften Hamburg, Hamburg,
Germany
dirk.lewandowski@haw-hamburg.de
Sebastian Sünkler
Department of Information,
Hochschule für Angewandte
Wissenschaften Hamburg, Hamburg,
Germany
sebastian.suenkler@haw-hamburg.de
Nurce Yagci
Department of Information,
Hochschule für Angewandte
Wissenschaften Hamburg, Hamburg,
Germany
nurce.yagci@haw-hamburg.de
ABSTRACT
Search engine optimization (SEO) can significantly influence what is
shown on the result pages of commercial search engines. However,
it is unclear what proportion of (top) results have actually been
optimized. We developed a tool that uses a semi-automatic approach
to detect, based on a given URL, whether SEO measures were taken.
In this multi-dimensional approach, we analyze the HTML code
from which we extract information on SEO and analytics tools.
Further, we extract SEO indicators on the page level and the website
level (e.g., page descriptions and loading time of a website). We
amend this approach by using lists of manually classified websites
and use machine learning methods to improve the classifier. An
analysis based on three datasets with a total of 1,914 queries and
256,853 results shows that a large fraction of pages found in Google
is at least probably optimized, which is in line with statements
from SEO experts saying that it is tough to gain visibility in search
engines without applying SEO techniques.
CCS CONCEPTS
Information systems
World Wide Web; Web searching and
information discovery; Content ranking; Information retrieval;
Evaluation of retrieval results; World Wide Web; Web mining.
KEYWORDS
Search engines, search engine optimization, screen scraping
ACM Reference Format:
Dirk Lewandowski, Sebastian Sünkler, and Nurce Yagci. 2021. The influence
of search engine optimization on Google’s results: A multi-dimensional
approach for detecting SEO. In 13th ACM Web Science Conference 2021
(WebSci ’21), June 21–25, 2021, Virtual Event, United Kingdom. ACM, New
York, NY, USA, 9 pages. https://doi.org/10.1145/3447535.3462479
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
WebSci ’21, June 21–25, 2021, Virtual Event, United Kingdom
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8330-1/21/06...$15.00
https://doi.org/10.1145/3447535.3462479
1 INTRODUCTION
Search engine optimization (SEO), which is defined as “the practice
of optimizing web pages in a way that improves their ranking in the
organic search results” [26], can be regarded as lying somewhere
between helping search engines find and index relevant content
and manipulating their results. The SEO industry’s revenue is
expected to reach $80 billion in the U.S. in 2020 [30]. Many websites
heavily depend on the traffic they gain from search engines,
especially from Google, the market leader in Web search. This search
engine has a market share of 87% in the United States and 93% in
Europe across all platforms as of January 2021 [49].
While there are no studies investigating the overall traffic websites
receive from search engines, search traffic to some popular sites
gives an indication: The New York Times website receives 33.5% of
its traffic from search engines, IBM.com 45.6%, pinterest.com 38.0%,
and acm.org even 67.6% [47].
A central question that has not been addressed in prior research
is to which degree result lists in search engines are externally
influenced by search engine optimization (SEO). While there is
a vast body of professional literature on optimizing websites to be
better found through search engines, the scholarly literature on
SEO mainly focuses on analyzing search results to help website
owners apply scientifically proven SEO techniques to improve their
sites’ rankings.
In our research, we take the user’s perspective on SEO: To what
degree are search engine result lists influenced by optimized pages?
This is an essential question as the influence of SEO could mean that
high-quality pages that have not been optimized are suppressed
from the top results and replaced by lower-quality pages. This, in
turn, would call into question the success of search engines in
providing users with the most relevant results.
This paper aims to describe our approach to identifying content
optimized for search engines through analyzing the HTML code and
further data on the website level, combined with lists of manually
classified websites. We further classify results by using a decision
tree with features of relevant factors for SEO measures. Our analysis
is based on three datasets containing a total of 256,853 URLs. The
main results are that a large fraction of top results in Google has
been optimized, that we can assume the effect of SEO to be stronger
for popular queries, and that there are no vast differences in
the ratio of optimized results across result positions.
The rest of this paper is structured as follows: First, we provide
a literature review showing the background of our research. Then,
we report on how we identified and classified SEO indicators. After
that, we give information on the URL classification and the datasets
used. We then present the results of our analysis, discuss them and
conclude with implications and suggestions for further research.
2 LITERATURE REVIEW
This section gives an overview of research relevant to understanding
the practices and the influence of search engine optimization,
user selection behavior on result pages, trust in search engines, and
approaches used to identify SEO in search result documents.
2.1 The practice and influence of search engine
optimization
In the context of search engine marketing (SEM), there are two ways
to gain visibility on the result pages of commercial search engines:
paid search marketing (PSM) and search engine optimization (SEO).
In PSM, content providers book ads with the search engine, which
are then usually shown at the top of the search engine result pages
(SERPs). Alongside the booking of keyword ads, SEO is the second
way to gain visibility in search engines [31]. Contrary to booking
ads, where the advertiser pays the search engine for every click on
the ad, the clicks on organic results are not associated with direct
costs for the website provider. This makes SEO an attractive set
of techniques to almost all content providers on the web. While
search engine optimization is most often associated with optimizing
for products and services, it is important to note that the same
techniques are used for optimizing informational contents, i.e.,
users aiming to get informed about a topic will encounter optimized
content (such as content produced by public-relations agencies)
without knowing it.
Search engine optimization considers techniques on the website
level as well as on the document level. Some measures are used to
improve the overall performance of a website (e.g., improving the
speed at which pages are delivered, improving the site structure).
Other measures apply to the individual documents on the site (e.g.,
keyword density).
Search engine optimization can have positive effects. It was
demonstrated that search engine optimization methods positively
affect usability [55, 59] and the accessibility of websites [32].
However, it is unclear if – or to what degree – SEO also positively
affects the relevance of the search results. Such a positive effect may
stem from the fact that content is prepared to easily allow the user to
find potentially relevant information objects [54]. Further, as current
ranking algorithms aim to optimize user satisfaction [6], content
optimized to satisfy the users will be preferred.
Overall, search engine optimization professionals paint a very
positive picture of their profession, making content discoverable
through search engines and bringing good content to their top
results [41]. However, there is also a “dark side” to SEO, and it is
unclear where the boundary between ethical and unethical work lies
[66]. [28] surveyed search engine optimizers, journalists, and
academics, revealing the increasing importance of SEO for news outlets
and, in turn, for the training of journalists. This study highlights
the importance of SEO for professional content creation. Further,
in an interview study with professionals at Greek media
organizations, it was found that SEO increasingly influences journalists in
their writing and that SEO policies are applied in newsrooms [13].
Interview studies found that journalists have reservations about
SEO in the context of journalistic work [7, 13, 37]. [13] found that
while SEO was considered indispensable as it ensures visibility of
the content, it also has a significant influence on topic setting. This,
in turn, can reduce the quality of the journalism being produced.
2.2 Selection behavior in search engine result
pages
The selection behavior on the search engine results pages can be
characterized as strongly oriented to the given order and
representation. The most important overarching explanatory models for
user behavior on the results pages are the principle of least effort
[67] and satisficing [48].
As search engines present results in the form of ranked lists, the
position effect plays a huge role in what users look at [23, 50] and
which results they select. They predominantly select the results
listed first, and an overwhelming number of clicks are made on these
results. This effect has been demonstrated in numerous studies (e.g.,
[2, 4, 18, 19, 35, 45, 62]). However, not only the ranking of the results
is important but also whether they are shown in the so-called visible
area of the SERP, i.e., the results that are visible without scrolling
[16]. Users preferably select results from this area [20]. In current
result presentations, results from multiple sources (e.g., news,
videos) are shown together on a single SERP. This incorporation of
results from several vertical search engines leads to a different
presentation of search results. Results that take up more space on the
search results page or are graphically more attractive are more likely
to be perceived and selected [27].
The effect of this user behavior is shown in a large-scale study
from Yahoo Research [15]. Analyzing 2.6 billion search queries, this
research found that in web search, about 80 percent of all clicked
results are accounted for by only 10,000 websites. This underlines
the massive effect of result position and the attractiveness of taking
measures to make one’s content be shown in these top positions.
2.3 Trust in search engines
Search engine users show a high level of trust in search engines and
Google in particular. In a representative survey in the United States,
three-quarters of respondents said they trusted the information they
found on search engines: 28% do so for all or almost all, 45% for most
information [38]. Results for the European Union are very similar:
78% of respondents in a representative survey said they trust that the
search results provided are the most relevant, ranging from 67% in
Romania to 87% in Austria [10]. Search engine trustworthiness is
comparable to traditional news media, as shown by a representative
study of Internet users from 28 markets, including the United States,
China, and Germany [9]. A study using a representative sample of
German Internet users found that users with little knowledge of
search engines are more likely to trust and use Google than other
users [43].
Non-representative studies add to this picture by showing that
a high ranking also increases sponsor credibility [61] and that male
college students seem to be more likely to trust search engines to
provide objective results than women [53]. Students place search
engines in terms of their results being reliable somewhere between
libraries and databases (both of which were considered by most
respondents to be mostly or always reliable) and internet
communities, forums, blogs, and podcasts (which were considered not
reliable or only somewhat reliable) [21].
In laboratory studies, it has been shown that users select results
shown at the top of result lists even if they are less relevant [35, 45]
or less credible [56, 58] than results shown at lower ranks.
2.4 Measuring search engine optimization
In this section, we review the literature dealing with identifying
factors relevant to SEO, using them as indicators to detect SEO
measures in websites or documents, and measuring the conformity
of websites or documents to SEO best practices. It is important to
note that the aim of the studies reported here is to measure SEO
success, i.e., whether documents are optimized in ways that reflect
best practices in SEO, leading to a higher ranking in commercial
search engines. The focus of our research, on the other hand, is
to investigate whether SEO measures have been applied to the
documents, irrespective of their success.
The aim of most of the research reported here is to compare
the rankings of different websites by analyzing indicators that
contribute to these sites’ ranking. The goal is to give recommendations
on how to improve websites and documents through SEO
techniques. For instance, [3] and [17] developed tools that recommend
SEO measures based on analysis of a given website.
Correlational studies are based on lists of indicators that at least
in some way reflect the (presumed) ranking factors of commercial
search engines. The lists used in empirical work range from
containing just some factors to more extensive lists [8, 17, 34, 39, 57].
Some use indicator lists even more fine-grained than the usual
factors reported in the practitioner literature [34, 39]. However, these
works lack a systematic collection and analysis of SEO indicators
based on the literature and expert opinion.
As the studies use different sets of indicators, it is hardly possible
to compare the results. Correlations between indicators and search
engine rankings are found [3, 8, 11, 14, 17, 51, 57, 65], but it remains
unclear which factors actually explain the rankings. Some promising
factors, predominantly ones that form the basics of the
professional SEO literature, are found. These include several on-page
factors such as optimized meta tags (i.e., tags to provide structured
metadata about a web page) and page speed optimization and several
off-page factors such as the number of backlinks (i.e., links from
external sites pointing to this particular page) [1, 33, 46, 64]. While
factors used by SEO professionals have been applied in research,
other factors indicating SEO efforts (such as the use of specialized
SEO tools) have not been included so far.
Regarding the datasets used, it should be noted that it is unclear
how representative the studies’ queries are for the general search
engine queries. Some studies also focus on specific business sectors
or institutions, e.g., media [14], publishing [57], or institutions of
higher education [5, 12], and, therefore, use specific sets of queries.
Further to the goal of improving rankings in search engines, [32]
found positive correlations between SEO and website accessibility.
[12] found correlations between quality metrics (including usability)
and SEO success. This line of research indicates positive effects of
SEO beyond the initial goal of achieving better ranking.
While the studies reported so far investigated correlations
between result positions and SEO indicators, some research also
focused on the degree to which websites or documents have been
optimized. In a comparative study, [34] use a Multi-Criteria Decision
Making (MCDM) algorithm to determine the degree of optimization
of academic sites. [29] used machine learning algorithms to classify
web pages into three predefined classes according to the degree
of search engine optimization. This involved both identifying
relevant features for SEO using correlation analyses and evaluating
the accuracy of classification algorithms. However, a shortcoming
of this study is that the accuracy is measured by comparing the
algorithm’s performance to expert judgments, which themselves
may not be accurate. Many SEO techniques cannot be seen from a
cursory inspection of a website’s contents but can only be found
through more in-depth technical analysis.
The literature review shows that search engine optimization
is a mature line of business, applying sophisticated techniques
to achieve visibility in search engines. On the other hand, search
engine users choose only from a limited set of top results shown
and usually do not question how these results were generated.
They place a high level of trust in search engines to provide them
with the most relevant results. Prior research aiming to identify
SEO in given documents focused on whether the measures taken
were successful in terms of better rankings. Research gaps lie in
identifying SEO in search results at a large scale, using a large
number of SEO indicators, and measuring whether SEO measures
have been taken (as opposed to measuring SEO success). In the
remainder of this paper, we address these gaps.
3 IDENTIFYING SEO FACTORS
At first glance, one could assume that, as search engine optimizers
try to reverse-engineer Google’s ranking factors to make their
content visible in that search engine, ranking factors and SEO
factors are the same. However, we have to consider that not all
SEO efforts are successful. For instance, someone trying to gain
visibility in Google might use keyword stuffing, i.e., repeating a
keyword on a webpage very often, suggesting to the search engine
that the page is relevant to that keyword. Obviously, this approach
will not work as search engines can detect such simple gaming
methods. However, as we aim to detect whether content providers
seek to optimize their pages, keyword stuffing may still be a factor
for detecting optimized pages. Whether content providers have
successfully optimized their pages is not a criterion relevant to our
classification.
Our model of SEO factors and its implementation is based on an
extensive review of the professional literature and interviews we
conducted with SEO experts [41, 44]. In total, our model consists
of 48 factors, which can be grouped along three dimensions: tools
and plugins, URL lists, and indicators for SEO. We prioritized these
factors for the implementation in our system. It should be noted that
the current implementation considers 21 factors only. These have
been considered the most fruitful by the experts and researchers,
and have also been validated through machine learning methods in
our initial studies. We are confident that already at this stage, we
can reliably identify optimized content.
Our approach combines the automatic identification of SEO
indicators from web pages and websites with manually generating lists
of optimized and not optimized websites. We follow this approach
as results in search engines are not equally distributed, i.e., there
is only a relatively small set of websites that account for a large
fraction of the URLs shown [36] and clicked in the top results [15].
This means that by manually classifying a limited set of websites,
we can already detect a relatively large fraction of optimized pages.
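The list-based part of the approach can be illustrated with a minimal Python sketch. It assumes that a result URL is matched against the lists by its host name or any parent domain; the paper does not specify the matching rule, and the list excerpts, SITE_LISTS, registered_host, and matching_lists are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical excerpts of the manually classified lists; the real lists
# are far longer and are not reproduced in the paper.
SITE_LISTS = {
    "news": {"nytimes.com"},
    "not_optimized": {"wikipedia.org"},
}


def registered_host(url: str) -> str:
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def matching_lists(url: str) -> list[str]:
    """Names of all lists containing the host or one of its parent domains."""
    labels = registered_host(url).split(".")
    # 'en.wikipedia.org' yields 'en.wikipedia.org', 'wikipedia.org', 'org'.
    suffixes = {".".join(labels[i:]) for i in range(len(labels))}
    return sorted(name for name, sites in SITE_LISTS.items() if sites & suffixes)


if __name__ == "__main__":
    print(matching_lists("https://en.wikipedia.org/wiki/Search_engine_optimization"))
```

Matching on parent domains is a design choice of this sketch: it lets one list entry cover all subdomains of a site, which fits the observation that a small set of websites accounts for a large share of results.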
We created six lists of manually classified websites: SEO
customers, news websites, online shops, business websites, websites
with ads, and not optimized websites (for details, see Table 1). News
websites are classified as being optimized as all the SEO experts we
interviewed agreed that all news companies use SEO techniques to
increase the visibility of their content. We manually categorized the
websites by evaluating the content of 13,000 URLs. In the automatic
analysis, we focused on identifying tools used by search engine
optimizers on the one hand and indicators for SEO on the other hand.
We differentiate between two types of tools: tools particularly used
for search engine optimization and analytics tools that are not
necessarily used for SEO purposes exclusively but are usually used in
the SEO context. We identified tools through analyzing the HTML
code of the results found. When a tool is used, a hint can usually be
found in the HTML comments or a script. The following examples
show code snippets used by the Yoast SEO plugin (an SEO tool) and
Google Analytics (an analytics tool):
<!-- This site is optimized with
the Yoast SEO plugin v12.4 -
https://yoast.com/wordpress/plugins/seo/ -->
<!-- Google Analytics -->
<script>(function(i,s,o,g,r,a,m)
{i['GoogleAnalyticsObject']=r;...
We manually extracted SEO tools and analytics tools from a set of
approx. 30,000 URLs, resulting in a list of 58 SEO tool names and 54
analytics tool names, respectively. In this approach, we used lists of
known SEO plugins and analytics tools¹ to check if we could find
references to these plugins in the HTML comments. In addition,
we searched the comments for signal words like SEO to find tools
and plugins that were not included in the lists.
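The comment-based tool detection described above might be sketched as follows. This is an illustration only: the two tool lists are hypothetical one-item excerpts (the full lists of 58 and 54 names are not printed in the paper), the function name detect_tools is an assumption, and only HTML comments are scanned, not script bodies.

```python
import re

# Hypothetical excerpts of the tool-name lists (assumption: the real lists
# hold 58 SEO tool names and 54 analytics tool names).
SEO_TOOLS = ["Yoast SEO"]
ANALYTICS_TOOLS = ["Google Analytics"]

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def detect_tools(html: str) -> dict:
    """Scan HTML comments for known tool names and the signal word 'SEO'."""
    found = {"seo_tools": set(), "analytics_tools": set(), "unlisted_seo_hints": []}
    for raw in COMMENT_RE.findall(html):
        comment = " ".join(raw.split())  # collapse line breaks and repeated spaces
        lowered = comment.lower()
        seo_hits = {tool for tool in SEO_TOOLS if tool.lower() in lowered}
        found["seo_tools"] |= seo_hits
        found["analytics_tools"] |= {
            tool for tool in ANALYTICS_TOOLS if tool.lower() in lowered
        }
        # A comment mentioning "SEO" without naming a listed tool is kept
        # for manual review, mirroring the signal-word search in the text.
        if not seo_hits and re.search(r"\bseo\b", lowered):
            found["unlisted_seo_hints"].append(comment)
    return found


if __name__ == "__main__":
    sample = (
        "<!-- This site is optimized with\nthe Yoast SEO plugin v12.4 -->\n"
        "<!-- Google Analytics -->\n"
        "<!-- Powered by Acme SEO booster -->"
    )
    print(detect_tools(sample))
```

The "Acme SEO booster" comment in the sample is invented; it stands for a plugin that is missing from the lists and would be surfaced as a candidate for manual inspection.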
In terms of SEO indicators, we extract information from the
page’s HTML code, further information on the website level, and
test for page speed. Data extracted from the HTML code includes
the use of a page description and nofollow links, among others.
Information on the website level includes the use of SEO-specific
information in the robots.txt file and the use of a sitemap file.
Finally, we measure page speed, as one of the fundamental technical
factors in search engine optimization is optimizing the pages to
load quickly. We used these indicators to build a rule-based
classifier to determine the probability of search engine optimization
in four classes: definitely optimized, probably optimized, probably
not optimized, and definitely not optimized. The rules are relatively
simple since they only check the presence or absence of an indicator.
A weighting was not performed. We decided to use this approach
because we wanted to create a large dataset for further evaluation,
whereas other approaches, such as collecting judgments by SEO
experts to build a training set [29], result in small datasets.
Further, it is unclear how reliably experts can detect SEO measures on
given webpages.
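To make the page-level extraction concrete, the following Python sketch collects presence flags for a few of the HTML-based indicators (title, description, nofollow links, canonical links, viewport, Open Graph tags) using only the standard library. It is not the authors' implementation; the class and function names and the exact matching conditions are assumptions, and website-level indicators such as robots.txt, sitemap, and page speed are left out.

```python
from html.parser import HTMLParser


class IndicatorParser(HTMLParser):
    """Collect presence flags for a few page-level SEO indicators."""

    def __init__(self):
        super().__init__()
        self.flags = {
            "title": False, "description": False, "nofollow": False,
            "canonical": False, "viewport": False, "open_graph": False,
        }
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        rel = attr.get("rel", "").lower().split()
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attr.get("name", "").lower()
            if name == "description" and attr.get("content", "").strip():
                self.flags["description"] = True
            if name == "viewport":
                self.flags["viewport"] = True
            if attr.get("property", "").lower().startswith("og:"):
                self.flags["open_graph"] = True
        elif tag == "a" and "nofollow" in rel:
            self.flags["nofollow"] = True
        elif tag == "link" and "canonical" in rel:
            self.flags["canonical"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # A title counts only if it contains non-whitespace text.
        if self._in_title and data.strip():
            self.flags["title"] = True


def extract_indicators(html: str) -> dict:
    """Return a dict of indicator name -> True/False for one HTML document."""
    parser = IndicatorParser()
    parser.feed(html)
    parser.close()
    return parser.flags
```

Because the rules only test presence or absence, Boolean flags are sufficient as output; no counts or weights are needed at this stage.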
We built the dataset based on this classifier. We used it to apply
machine learning methods for evaluating the rule-based classifier
and to evaluate the indicators that we do not track in the rule-based
approach for suitability for classifying web pages using our search
engine optimization probability classes. We also investigated how
well the classifier performs without features based on lists (e.g., the
list with SEO tools or the list with news websites). We wanted to
evaluate if we could produce useful results even without features that
require constant editing and maintenance.
¹ SEO plugins: https://wordpress.org/plugins/tags/seo/ and analytics tools:
https://wordpress.org/plugins/tags/analytics/
The dataset for our machine learning processes consisted of
281,848 documents and all of the 49 indicators. First, we performed
an ANOVA F-test for feature reduction and found that we can
perform our classification with 21 of the indicators. Next, we
determined the importance of these features using a decision tree
classifier because it gives feature importances and is a close equivalent
to the rule-based approach. We created a train/test split with a test size
of 0.33 (93,949 documents) using a prediction score and prediction
probability score to get both the prediction from the rule-based
classifier and the probability for each prediction. We then calculated
the importance of the features in the classification. As a result, we
used the new feature list to create a new model for a decision tree
classifier to classify a different dataset. We evaluated 12 classifier
algorithms (e.g., Naive Bayes, Gradient Boost, and Support Vector
Machines) and decided to use a decision tree classifier because of
the best ratio of accuracy and processing time.
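The feature-reduction step relies on the one-way ANOVA F statistic, which compares the variance of a feature between classes with its variance within classes. The paper does not name the implementation used, so the following pure-Python sketch is only an illustration of the statistic itself; anova_f and rank_features are hypothetical helper names.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(float(value))
    n = sum(len(g) for g in groups.values())
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(sum(g) for g in groups.values()) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(
        len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items()
    )
    ss_within = sum(
        (v - means[label]) ** 2 for label, g in groups.items() for v in g
    )
    if ss_within == 0:
        # Feature is constant within every class: perfectly separating or useless.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels):
    """Return (feature names sorted by descending F, dict of F scores)."""
    scores = {
        name: anova_f([row[name] for row in rows], labels) for name in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True), scores
```

Features with a high F value separate the classes well and would be kept; in the study, this kind of ranking reduced the 49 indicators to 21.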
The decision tree classifier performs with an accuracy of 99.7%,
a macro-precision of 99.6%, a macro-recall of 99.3%, and a macro-F1
of 99.5%, which is no surprise since we developed simple rules
beforehand. The goal of this approach was to determine which features
are relevant and which can be neglected. We also created a decision
model without any external features, which performed with an
accuracy of 79.8%. This is still good to pre-classify documents if
external features from our lists are missing. In the following, we
focus on the model with all features since our rule-based approach
is built on it.
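For readers unfamiliar with macro-averaged metrics, the sketch below shows one common way to compute them: precision, recall, and F1 are calculated per class and then averaged with equal weight per class. Whether the reported macro-F1 was computed exactly this way is an assumption; macro_scores is a hypothetical helper name.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }
```

Macro averaging treats the four SEO classes equally, so a very small class (such as the results classified as definitely not optimized) influences the score as much as a large one.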
We built a system that automatically queries Google, collects
result URLs and result positions from the SERPs, collects the result
documents, analyzes the HTML code, and checks the URLs against
our database of already known websites. We will not focus on the
technical implementation in this paper; details can be found in [52].
Table 1 provides an overview of all indicators used for the
classification. It shows the indicators from the rule-based classification with
the class members, short descriptions for the rules for the classes
and for the indicators, and the results from the feature reduction
from machine learning. We used these results for the classification
of the datasets, as detailed in the next section.
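The class rules summarized in Table 1 can be read as a short cascade of presence checks. The sketch below is one possible reading, not the authors' code. It assumes that the not-optimized list takes precedence over all other indicators, that a missing title or a missing description is enough for "probably not optimized", and that a page matching no indicator at all also falls into that class; the indicator keys are hypothetical names.

```python
# Indicator keys are hypothetical names for the Table 1 indicators.
DEFINITELY = ("seo_tool", "seo_customer", "ads_list", "news_site", "microdata")
PROBABLY = (
    "analytics_tool", "online_shop", "business_site", "https", "fast_page",
    "robots_seo", "nofollow", "canonical", "online_ads", "sitemap",
    "viewport", "open_graph",
)


def classify(indicators: dict) -> str:
    """Map presence/absence flags to one of the four SEO classes."""
    def has(key):
        return bool(indicators.get(key, False))

    # Assumption: the manually curated not-optimized list overrides everything.
    if has("not_optimized_list"):
        return "definitely not optimized"
    if any(has(key) for key in DEFINITELY):
        return "definitely optimized"
    # Assumption: lacking either basic tag outweighs any best-practice indicator.
    if not (has("title") and has("description")):
        return "probably not optimized"
    if any(has(key) for key in PROBABLY):
        return "probably optimized"
    # Assumption: pages without any indicator default to this class.
    return "probably not optimized"


if __name__ == "__main__":
    print(classify({"https": True, "title": True, "description": True}))
```

No weights are involved, in line with the statement that the rules only check whether an indicator is present.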
4 DATASETS
To test our system and the decision tree classifier, we used three
datasets based on query sets and results collected from Google
through screen scraping. Table 2 gives an overview of all datasets
with a description, the number of queries, the number of scraped
search results, and the source for the search queries. With various
topics and a total of 256,853 result URLs without duplicates, we
are confident that we have built a sufficiently large and diverse dataset
for testing purposes. In the following, we will present and discuss the
results for the datasets separately to show if and how the results
depend on particular datasets and what similarities between the
datasets can be found.
Table 1: Indicators used for classifying the documents found, ordered according to their respective rule-based class, with representation of their use in the decision tree classifier (x = used, - = not used)
Indicator | Description | Decision tree
Definitely optimized | A result is definitely optimized if at least one of the very obvious search engine optimization criteria from our list of SEO indicators is met. Very obvious here means that the intention of SEO is clearly visible. |
SEO Tools | Tools that dedicatedly support SEO measures, e.g., Yoast SEO Plugin | x
SEO customers | Customers of search engine optimization agencies (manually collected; 1,004 items) | x
Websites with ads | List of websites showing ads (manually collected; 325 items) | x
News websites | List of news websites (manually collected; 1,203 items) | x
Microdata and schema.org | Use of microdata or schema.org on a website to define the context of the data, e.g., JSON-LD | -
Probably optimized | A result is probably optimized if it is not classified as definitely optimized, the element is not classified as not optimized, and it meets one of the indicators that we define as best practices for SEO or if the document has a visible commercial intent. |
Analytics Tools | Tools that are used for website analytics, e.g., Google Analytics | -
Online shops | List of websites (manually collected; 178 items) | x
Business websites | List of business websites (manually collected; 72 items) | -
HTTPS | Usage of Hypertext Transfer Protocol Secure | x
Pagespeed | Loading time of a website < 3s | x
SEO in robots.txt | SEO indicators in robots.txt of a website, e.g., crawl-delay | x
Nofollow links | Use of tags on the website to instruct search engines to ignore the target of the link for ranking purposes | -
Canonical links | Use of canonical tags on the website to prevent duplicate content issues | x
Online advertisements | Use of contextual and affiliate marketing on a website, e.g., Google Ads | -
Sitemap | Use of a sitemap on a website | -
Viewport | Definition of a viewport for a responsive design, e.g., <meta name="viewport" content="width=device-width, initial-scale=1"> | x
Open Graph Tags | Usage of open graph tags for previews of content on social media, e.g., <meta property="og:title" content="website title" /> | x
Probably not optimized | A result is probably not optimized if it is not definitely optimized, is not classified as not optimized, and does not have a title or description tag. These criteria are the basics of search engine optimization, so we weighted this classification result as more important if we also found criteria for classifying a result as probably optimized. |
Description | Use of a site description | x
Title | Use of a site title | x
Definitely not optimized | A result is definitely not optimized when it is on the list of definitely not optimized websites. |
Not optimized | List of websites known not to be optimized (manually collected; 1 item) | x
Features not used in the rule-based classifier | |
H1 Tag | Use of H1 tags for headings on the first level | x
URL length | Length of the URL without the scheme and protocol | x
Keyword in description | Keywords of query in any description tag | x
Keyword in meta content | Keywords of query in any meta tag | x
Keyword in meta description | Keywords in the meta content tag | x
Keyword in meta open graph | Keywords in any meta open graph tags | x
Keyword in title open graph | Keywords in title open graph tag | x
Table 2: Datasets
Dataset | Description | Queries | Results | Source | Max. result output per query | Feature importance (Top-3)
Google Trends | Dataset with queries from Google Trends collected from March to June 2020 and from November and December 2020. | 1,563 | 207,522 | https://trends.google.de/trends/?geo=DE | 325 | News (0.53), Description (0.36), URL Length (0.15)
Radical right | Joint work with a regional media regulation authority to evaluate the use of SEO on probably radical right content. | 80 | 12,673 | Queries provided by the Medienanstalt Hamburg/Schleswig-Holstein (regional media regulation authority). | 258 | Description (0.56), News (0.46), Open Graph (0.18)
Coronavirus | 482 queries from Germany related to the covid pandemic, collected in March 2020. | 271 | 36,658 | https://github.com/microsoft/BingCoronavirusQuerySet | 277 | Description (0.54), News (0.49), Open Graph (0.18)
Figure 1: Results from the classification into SEO classes
5 RESULTS
The results of the automatic classification are shown in Figure 1. It
can be seen that the vast majority of the results are either definitely
optimized or probably optimized. Depending on the dataset, we can see
that between 41.5 and 63 percent of the results found are classified as
definitely being optimized. Differences between the datasets can be
attributed to the higher proportion of news content found in the
Google Trends and Coronavirus datasets as opposed to the radical right
dataset (Google Trends: 48.4%, radical right: 27%, Coronavirus: 31%). Only
a small fraction of the results are classified as not optimized (0.7
percent across all datasets). These are all results from Wikipedia,
as this is the only website on our list of definitely not optimized
sites. Our evaluation also shows that the fraction of probably not
optimized results depends on the topics of the datasets. Thus, the
percentage of probably not optimized documents is the lowest for popular
queries, at 11%, while it is around 18% for the other, thematically
specific datasets.
In summary, we found that a large fraction of results found in Google is either definitely optimized or probably optimized. Over 80 percent of results found belong to these categories. This does not come as a surprise, as we know that SEO is a multi-billion-dollar industry, and businesses and other actors often depend on the visibility their websites gain from search engine traffic. It should be noted that, due to the limited number of results that Google provides for a query (usually not more than 300), this analysis considers these "top" results only. Nevertheless, the results paint a realistic picture of what a user willing to consider all results will see, as manually querying Google will not lead to more results.
Figure 2: Results available per position (n=1,914 queries, 256,853 documents)
We also evaluated the probability of SEO on the top result positions across the search queries in our datasets. We decided to evaluate the score up to position 130 in Google because the number of available results decreases sharply beyond this position (see Figure 2). At these lower positions, fewer than half as many results are often available as within the top 10.
We translated the class affiliation to a score by defining limit values (not optimized = 0, probably not optimized = 33, probably optimized = 67, and definitely optimized = 100). Figure 3 shows the mean score per position in Google up to position 130. The mean score at position one is relatively low because many Wikipedia results are shown at the top position, and we always classify these as not optimized. We found 1,974 results from Wikipedia in our dataset. Of these, 27% were found at the top position and 79.3% within the top ten positions. Figure 4 shows the same distribution, excluding Wikipedia results. Our evaluations show that the probability of SEO is slightly higher at the top positions. This is especially visible in the Google Trends dataset, while SEO for thematically specific topics seems to be more differentiated. SEO for Coronavirus-related content is more visible at the lower rankings.
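The scoring described above can be sketched in a few lines. The class-to-score mapping and the cutoff at position 130 are taken from the text. The record layout `(position, domain, seo_class)` and the function name `mean_score_by_position` are assumptions made for illustration and do not reflect the project's actual code.

```python
from collections import defaultdict

# Limit values as stated in the text.
CLASS_SCORES = {
    "not optimized": 0,
    "probably not optimized": 33,
    "probably optimized": 67,
    "definitely optimized": 100,
}


def mean_score_by_position(results, max_position=130, exclude_domains=()):
    """Average SEO score per result position.

    `results` is an iterable of (position, domain, seo_class) tuples;
    this record layout is hypothetical.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for position, domain, seo_class in results:
        # Skip positions beyond the cutoff and any excluded domains.
        if position > max_position or domain in exclude_domains:
            continue
        totals[position] += CLASS_SCORES[seo_class]
        counts[position] += 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(counts)}
```

Passing `exclude_domains={"wikipedia.org"}` corresponds to the variant of the analysis without Wikipedia results, in which list-based "not optimized" classifications no longer lower the mean at the top positions.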
6 DISCUSSION AND CONCLUSION
This paper presented a method for identifying SEO measures in results shown by commercial search engines. Our model for identifying these measures is based on an extensive literature review, interviews with SEO professionals, and an evaluation with machine learning algorithms. The automatic classification is based on factors belonging to three dimensions: tools and plugins, URL lists, and indicators for SEO. It incorporates a total of 21 factors. An analysis based on three datasets with a total of 256,853 URLs shows that in Google, a large fraction of results available to users is optimized through SEO measures. The results indicate that the effect of SEO is stronger for popular queries.
Figure 3: Score up to position 130
Figure 4: Score without Wikipedia up to position 130
A surprising result is that we did not find large differences in the ratio of optimized results between result positions. We had assumed that the ratio of optimized pages would fall the further one goes down the result list. Instead, we found that users are confronted with a high number of SEO-optimized documents even when they are willing to consider a large number of results.
This study has some limitations, the most apparent being that only data from one search engine (Google) has been analyzed. Further, while we tried to diversify queries and search results by using three different datasets, the analysis is not based on a representative sample of the queries that search engine users actually issue. In terms of the factors used, our analysis is limited in that we did not consider off-site factors (such as the number of backlinks). In future research, we plan to add more search engines, increase datasets in terms of size and diversity, and add more factors to our model.
In the results we were able to fetch, we found a large share of optimized pages. This result held for all result positions. In that sense, it would be interesting to experiment with data from search engines like millionshort.com (see [40]), which allow for removing top sources from the results and accessing results shown on positions we could not scrape from Google. We assume that in these positions, the probability of finding optimized content will be much lower.
Our research contributes to a better understanding of what users get to see on search engine result pages. Apart from the organic results to which the SEO measures apply, external influence can be exerted through paid search marketing (PSM; "sponsored results"), where advertisers bid for positions on the search engine result pages (SERPs). Some research has also focused on the mixture of paid-for and organic results on SERPs and on how users who are able to distinguish between the two result types differ in their selection behavior from users who are not (e.g., [22, 24, 42]). Further, there is some research on search engine companies’ self-interests and how they may influence what is shown on the result pages [25]. From the perspective of search engine providers, SEO constitutes an external influence on the ranking functions.
On the one hand, SEO benefits search engines, and search engine companies even provide help for SEO (e.g., Google’s SEO Starter Guide, Webmaster Guidelines, and Search Central Help Community). On the other hand, optimized pages influence what users see in the result lists and may bias search results. This effect is further enhanced when search engines incorporate user behavior signals into ranking models by analyzing clicks and further interactions with the results (e.g., [60, 63]). This may lead to a rich-get-richer effect that favors content that is not necessarily the most relevant.
The results reported in this paper are promising, but further work is needed to refine and evaluate the approach. In future research, it might also be interesting to bring together the different influences on the search result pages (i.e., SEO, paid search marketing, and search engine providers’ self-interests) to measure how these influence user selection behavior and knowledge acquisition through search engines.
ACKNOWLEDGMENTS
This work is funded by the German Research Foundation (DFG -
Deutsche Forschungsgemeinschaft), grant number 417552432.
REFERENCES
[1] Muhammad Akram, Imran Sohail, Sikandar Hayat, M Imran Shafi, and Umer Saeed. 2010. Search Engine Optimization Techniques Practiced in Organizations: A Study of Four Organizations. J. Comput. 2, 6 (June 2010), 134–139.
[2] Judit Bar-Ilan, Kevin Keenoy, Mark Levene, and Eti Yaari. 2009. Presentation bias is significant in determining user preference for search results-A user study. J. Am. Soc. Inf. Sci. Technol. 60, 1 (January 2009), 135–149. https://doi.org/10.1002/asi.20941
[3] Aziz Barbar and Anis Ismail. 2019. Search Engine Optimization (SEO) for Websites. In Proceedings of the 2019 5th International Conference on Computer and Technology Applications, ACM, New York, NY, USA, 51–55. https://doi.org/10.1145/3323933.3324072
[4] Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An experimental comparison of click position-bias models. In Proceedings of the international conference on Web search and web data mining, Microsoft Research, Cambridge, United Kingdom Microsoft Research, Redmond, United States, 87–94.
[5] Ashwini Dalvi and Riya Saraf. 2019. Inspecting Engineering College Websites for Effective Search Engine Optimization. In 2019 International Conference on Nascent Technologies in Engineering (ICNTE), IEEE, 1–5. https://doi.org/10.1109/ICNTE44896.2019.8945823
[6] Fernando Diaz. 2016. Worst Practices for Designing Production Information Access Systems. ACM SIGIR Forum 50, 1 (June 2016), 2–11. https://doi.org/10.1145/2964797.2964799
[7] Murray Dick. 2011. Search Engine Optimisation in UK News Production. Journal. Pract. 5, 4 (August 2011), 462–477. https://doi.org/10.1080/17512786.2010.551020
[8] Ioannis C Drivas, Damianos P Sakas, Georgios A Giannakopoulos, and Daphne Kyriaki-Manessi. 2020. Big Data Analytics for Search Engine Optimization. Big Data Cogn. Comput. 4, 2 (April 2020), 1–22. https://doi.org/10.3390/bdcc4020005
[9] Edelman. 2020. Edelman Trust Barometer 2020.
[10] European Commission. 2016. Special Eurobarometer 447 – Online Platforms. European Commission, Brussels. https://doi.org/10.2759/937517
[11] Michael P. Evans. 2007. Analysing Google rankings through search engine optimization data. Internet Res. 17, 1 (2007), 21–37. https://doi.org/10.1108/10662240710730470
[12] Andreas Giannakoulopoulos, Nikos Konstantinou, Dimitris Koutsompolis, Minas Pergantis, and Iraklis Varlamis. 2019. Academic excellence, website quality, SEO performance: Is there a correlation? Futur. Internet 11, 11 (2019), 1–25. https://doi.org/10.3390/fi11110242
[13] Dimitrios Giomelakis, Christina Karypidou, and Andreas Veglis. 2019. SEO inside Newsrooms: Reports from the Field. Futur. Internet 11, 12 (December 2019), 261. https://doi.org/10.3390/fi11120261
[14] Dimitrios Giomelakis and Andreas Veglis. 2016. Investigating Search Engine Optimization Factors in Media Websites. Digit. Journal. 4, 3 (April 2016), 379–400. https://doi.org/10.1080/21670811.2015.1046992
[15] Sharad Goel, Andrei Broder, Evgeniy Gabrilovich, and Bo Pang. 2010. Anatomy of the long tail. In Proceedings of the third ACM international conference on Web search and data mining - WSDM ’10, ACM Press, New York, New York, USA, 201. https://doi.org/10.1145/1718487.1718513
[16] Nadine Höchstötter and Dirk Lewandowski. 2009. What users see – Structures in search engine results pages. Inf. Sci. (Ny). 179, 12 (May 2009), 1796–1812. https://doi.org/10.1016/j.ins.2009.01.028
[17] C D Hoyos, J C Duque, A F Barco, and É Vareilles. 2019. A search engine optimization recommender system. In CEUR Workshop Proceedings, 43–47.
[18] Thorsten Joachims, Laura Granka, Bing Pan, and Helene Hembrooke. Accurately Interpreting Clickthrough Data as Implicit Feedback.
[19] Mark T. Keane, Maeve O’Brien, and Barry Smyth. 2008. Are people biased in their use of search engines? Commun. ACM 51, 2 (February 2008), 49–52. https://doi.org/10.1145/1314215.1314224
[20] Diane Kelly and Leif Azzopardi. 2015. How many results per page? In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’15, ACM Press, New York, New York, USA, 183–192. https://doi.org/10.1145/2766462.2767732
[21] Raphael N. Klein, Lisa Beutelspacher, Katharina Hauk, Christina Terp, Denis Anuschewski, Christoph Zensen, Violeta Trkulja, and Katrin Weller. 2009. Informationskompetenz in Zeiten des Web 2.0: Chancen und Herausforderungen im Umgang mit Social Software. Inf. - Wiss. Prax. 60, 3 (2009), 129–142.
[22] Dirk Lewandowski. 2017. Users’ Understanding of Search Engine Advertisements. J. Inf. Sci. Theory Pract. 5, 4 (2017), 6–25. https://doi.org/10.1633/JISTaP.2017.5.4.1
[23] Dirk Lewandowski and Yvonne Kammerer. 2020. Factors influencing viewing behaviour on search engine results pages: a review of eye-tracking research. Behav. Inf. Technol. (May 2020), 1–31. https://doi.org/10.1080/0144929X.2020.1761450
[24] Dirk Lewandowski, Friederike Kerkmann, Sandra Rümmele, and Sebastian Sünkler. 2018. An empirical investigation on search engine ad disclosure. J. Assoc. Inf. Sci. Technol. 69, 3 (March 2018), 420–437. https://doi.org/10.1002/asi.23963
[25] Dirk Lewandowski and Sebastian Sünkler. 2013. Representative online study to evaluate the revised commitments proposed by Google on 21 October 2013 as part of EU competition investigation AT.39740-Google: Country comparison report. Hamburg.
[26] Kai Li, Mei Lin, Zhangxi Lin, and Bo Xing. 2014. Running and chasing - The competition between paid search marketing and search engine optimization. Proc. Annu. Hawaii Int. Conf. Syst. Sci. (2014), 3110–3119. https://doi.org/10.1109/HICSS.2014.640
[27] Zeyang Liu, Yiqun Liu, Ke Zhou, Min Zhang, and Shaoping Ma. 2015. Influence of Vertical Result in Web Search Examination. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’15, ACM Press, New York, New York, USA, 193–202. https://doi.org/10.1145/2766462.2767714
[28] Carlos Lopezosa, Lluís Codina, Javier Díaz-Noci, and José-Antonio Ontalba. 2020. SEO and the digital news media: From the workplace to the classroom. Comunicar 28, 63 (April 2020), 63–72. https://doi.org/10.3916/C63-2020-06
[29] Goran Matošević, Jasminka Dobša, and Dunja Mladenić. 2021. Using machine learning for web page classification in search engine optimization. Futur. Internet 13, 1 (2021), 1–20. https://doi.org/10.3390/fi13010009
[30] TJ McCue. 2018. SEO Industry Approaching $80 Billion But All You Want Is More Web Traffic. forbes.com.
[31] Mike Moran and Bill Hunt. 2015. Search Engine Marketing, Inc.: Driving Search Traffic to Your Company’s Website (3rd ed.). IBM Press, Upper Saddle River, NJ.
[32] Lourdes Moreno and Paloma Martinez. 2013. Overlapping factors in search engine optimization and web accessibility. Online Inf. Rev. 37, 4 (2013), 564–580. https://doi.org/10.1108/OIR-04-2012-0063
[33] Ushadi Niranjika and Dinesh Samarasighe. 2019. Exploring the Effectiveness of Search Engine Optimization Tactics for Dynamic Websites in Sri Lanka. In 2019 Moratuwa Engineering Research Conference (MERCon), IEEE, 267–272. https://doi.org/10.1109/MERCon.2019.8818903
[34] Barış Özkan, Eren Özceylan, Mehmet Kabak, and Metin Dağdeviren. 2020. Evaluating the websites of academic departments through SEO criteria: a hesitant fuzzy linguistic MCDM approach. Springer Netherlands. https://doi.org/10.1007/s10462-019-09681-z
[35] Bing Pan, Helene Hembrooke, Thorsten Joachims, Lori Lorigo, Geri Gay, and Laura Granka. 2007. In Google We Trust: Users’ Decisions on Rank, Position, and Relevance. J. Comput. Commun. 12, 3 (April 2007), 801–823. https://doi.org/10.1111/j.1083-6101.2007.00351.x
[36] Philip Petrescu. 2014. Google Organic Click-Through Rates in 2014 - Moz.
[37] Indra Prawira and Mariko Rizkiansyah. 2018. Search engine optimization in news production online marketing practice in Indonesia online news media. Pertanika J. Soc. Sci. Humanit. 26, T (2018), 263–270.
[38] Kristen Purcell, Joanna Brenner, and Lee Raine. 2012. Search Engine Use 2012. Washington, DC.
[39] Joni Salminen, Roope Marttila, Bernard J. Jansen, Juan Corporan, and Tommi Salenius. 2019. Using machine learning to predict ranking of webpages in the gift industry: Factors for search-engine optimization. ACM Int. Conf. Proceeding Ser. (2019). https://doi.org/10.1145/3361570.3361578
[40] Philipp Schaer, Philipp Mayr, Sebastian Sünkler, and Dirk Lewandowski. 2016. How Relevant is the Long Tail? In CLEF 2016, Norbert Fuhr, Paulo Quaresma, Teresa Gonçalves, Birger Larsen, Krisztian Balog, Craig Macdonald, Linda Cappellato and Nicola Ferro (eds.). Springer International Publishing, Cham, 227–233. https://doi.org/10.1007/978-3-319-44564-9_20
[41] Sebastian Schultheiß and Dirk Lewandowski. 2020. “Outside the industry, nobody knows what we do” SEO as seen by search engine optimizers and content providers. J. Doc. (2020). https://doi.org/10.1108/JD-07-2020-0127
[42] Sebastian Schultheiß and Dirk Lewandowski. 2020. How users’ knowledge of advertisements influences their viewing and selection behavior in search engines. J. Assoc. Inf. Sci. Technol. (September 2020), asi.24410. https://doi.org/10.1002/asi.24410
[43] Sebastian Schultheiß and Dirk Lewandowski. 2021. Misplaced trust? The relationship between trust, ability to identify commercially influenced results, and search engine preference. Journal of Information Science. May 2021. https://doi.org/10.1177/01655515211014157
[44] Sebastian Schultheiß and Dirk Lewandowski. 2021. Expert interviews with stakeholder groups in the context of commercial search engines within the SEO Effect project. Retrieved from https://osf.io/5aufr/
[45] Sebastian Schultheiß, Sebastian Sünkler, and Dirk Lewandowski. 2018. We still trust in google, but less than 10 years ago: An eye-tracking study. Inf. Res. 23, 3 (2018).
[46] Jenna Pack Sheffield. 2020. Search Engine Optimization and Business Communication Instruction: Interviews With Experts. Bus. Prof. Commun. Q. (January 2020), 232949061989033. https://doi.org/10.1177/2329490619890335
[47] Similarweb. 2021. SimilarWeb | Website Traffic Intelligence.
[48] Herbert Alexander Simon. 1955. A Behavioral Model of Rational Choice. Q. J. Econ. 69, 1 (1955), 99–118. https://doi.org/10.2307/1884852
[49] StatCounter. 2020. Search Engine Market Share Europe | StatCounter Global Stats.
[50] Artur Strzelecki. 2020. Eye-Tracking Studies of Web Search Engines: A Systematic Literature Review. Information 11, 6 (June 2020). https://doi.org/10.3390/info11060300
[51] Ao-Jan Su, Y Charlie Hu, Aleksandar Kuzmanovic, and Cheng-Kok Koh. 2014. How to Improve Your Search Engine Ranking: Myths and Reality. ACM Trans. Web 8, 2 (2014), 8. https://doi.org/10.1145/2579990
[52] Sebastian Sünkler and Nurce Yagci. 2021. Development and software implementation of a preliminary model to identify the probability of search engine optimization on webpages. Hamburg.
[53] Arthur Taylor and Heather A. Dalal. 2017. Gender and Information Literacy: Evaluation of Gender Differences in a Student Survey of Information Sources. Coll. Res. Libr. 78, 1 (2017), 90–113. https://doi.org/10.5860/crl.78.1.90
[54] Shari Thurow. 2015. To Optimize Search, Optimize the Searcher. Online Search. 39, 4 (2015), 44–48.
[55] Shari Thurow and Nick Musica. 2009. When Search Meets Web Usability. New Riders, Berkeley.
[56] Andreas Tremel. 2010. Suchen, finden - glauben? Die Rolle der Glaubwürdigkeit von Suchergebnissen bei der Nutzung von Suchmaschinen. Ludwig-Maximilians-Universität (LMU) München.
[57] Lance Umenhofer. 2019. Gaining Ground: Search Engine Optimization and Its Implementation on an Indie Book Press. Publ. Res. Q. 35, 2 (June 2019), 258–273. https://doi.org/10.1007/s12109-019-09651-x
[58] Julian Unkel and Alexander Haas. 2017. The effects of credibility cues on the selection of search engine results. J. Assoc. Inf. Sci. Technol. 68, 8 (August 2017), 1850–1862. https://doi.org/10.1002/asi.23820
[59] Eugene B. Visser and Melius Weideman. 2011. An empirical study on website usability elements and how they affect search engine optimisation. SA J. Inf. Manag. 13, 1 (March 2011), 1–9. https://doi.org/10.4102/sajim.v13i1.428
[60] Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to rank with selection bias in personal search. SIGIR 2016 - Proc. 39th Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. (2016), 115–124. https://doi.org/10.1145/2911451.2911537
[61] Axel Westerwick. 2013. Effects of Sponsorship, Web Site Design, and Google Ranking on the Credibility of Online Information. J. Comput. Commun. 18, 2 (January 2013), 80–97. https://doi.org/10.1111/jcc4.12006
[62] Yisong Yue, Rajan Patel, and Hein Roehrig. 2010. Beyond position bias. In Proceedings of the 19th international conference on World wide web - WWW ’10, ACM Press, New York, New York, USA, 1011. https://doi.org/10.1145/1772690.1772793
[63] Hamed Zamani, Michael Bendersky, Xuanhui Wang, and Mingyang Zhang. 2017. Situational context for ranking in personal search. 26th Int. World Wide Web Conf. WWW 2017 (2017), 1531–1540. https://doi.org/10.1145/3038912.3052648
[64] Lihong Zhang, Jianwei Zhang, and Yanbin Ju. 2011. The research on Search Engine Optimization based on Six Sigma Management. In 2011 International Conference on E-Business and E-Government (ICEE), IEEE, 1–4. https://doi.org/10.1109/ICEBEG.2011.5881880
[65] Christos Ziakis, Maro Vlachopoulou, Theodosios Kyrkoudis, and Makrina Karagkiozidou. 2019. Important factors for improving Google search rank. Futur. Internet 11, 2 (2019). https://doi.org/10.3390/fi11020032
[66] Malte Ziewitz. 2019. Rethinking gaming: The ethical work of optimization in web search engines. Soc. Stud. Sci. 49, 5 (2019), 707–731. https://doi.org/10.1177/0306312719865607
[67] George Kingsley Zipf. 1949. Human Behaviour and the Principle of Least Effort. https://doi.org/10.2307/2226729
APPENDICES
A RESEARCH DATA
Research data and software code are available at https://doi.org/10.17605/OSF.IO/JYV9R.
20
... This leads us to the question of whether the presence or absence of SEO measures allows conclusions to be drawn about content quality of those web pages. Since SEO has a measurable influence on result rankings [29] and thus on what users see, we must also ask how users are affected by the probable higher rankings of optimized content. ...
... For this purpose, we made use of an SEO classification tool. Based on N = 21 indicators, a rule-based classifier determines the probability of SEO on a web page by using its HTML code and further indicators [29]. In addition to asking whether there are differences between optimized and non-optimized web pages, we also address the question of how these differences are perceived by means of thinking aloud. ...
... In the first step, we performed an analysis of the results to N = 15 health-related queries using our SEO classification tool [29] in February 2021. These queries relate to broadly known diseases, e.g., acne, epilepsy, and cataract, and were taken from gesund.bund.de, a service of the German Federal Ministry of Health. ...
Preprint
Full-text available
Searching for medical information is both a common and important activity since it influences decisions people make about their healthcare. Using search engine optimization (SEO), content producers seek to increase the visibility of their content. SEO is more likely to be practiced by commercially motivated content producers such as pharmaceutical companies than by non-commercial providers such as governmental bodies. In this study, we ask whether content quality correlates with the presence or absence of SEO measures on a web page. We conducted a user study in which N = 61 participants comprising laypeople as well as experts in health information assessment evaluated health-related web pages classified as either optimized or non-optimized. The subjects rated the expertise of non-optimized web pages as higher than the expertise of optimized pages, justifying their appraisal by the more competent and reputable appearance of non-optimized pages. In addition, comments about the website operators of the non-optimized pages were exclusively positive, while optimized pages tended to receive positive as well as negative assessments. We found no differences between the ratings of laypeople and experts. Since non-optimized, but high-quality content may be outranked by optimized content of lower quality, trusted sources should be prioritized in rankings.
... The keyword list was used as queries on software that simulates a search on Google, extracts the top 20 website links, and outputs a probability score for optimization. Specifically, the tool analyzes the website HTML searching for SEO indicators, classifying websites into four classes according to these factors: definitely optimized, probably optimized, probably not optimized, and definitely not optimized (Lewandowski et al., 2021;Lewandowski and Schultheiß, 2022). For each query entry, the tool extracted the first 20 results returned by Google. ...
... The keywords were then used as queries on software developed by the Hamburg University of Applied Sciences (HAW Hamburg), which simulates a search on Google, extracts the top 20 website links, and outputs a probability score for optimization. Specifically, the tool analyzes the website HTML searching for 32 SEO indicators, classifying websites into four classes according to these factors: definitely optimized, probably optimized, probably not optimized, and definitely not optimized (Lewandowski et al., 2021;Lewandowski and Schultheiß, 2022). We performed the search straddling the election day: 23, 24, and 25 September 2022. ...
Article
Full-text available
This research investigates the use of search engine optimization (SEO) by political and non-political actors to amplify the visibility of search engine results. While there has been much theoretical speculation around the role of SEO techniques in boosting the ranking appearance of a website, few empirical studies have been undertaken to understand the extent to which SEO techniques are used to promote visibility online. This study takes Italy as a case study to map the information landscape around nine highly controversial issues during the Italian electoral campaign of 2022. Using a combination of digital methods and a tool detecting optimization in websites, our article aims at examining which actors employ SEO techniques to foster the circulation of their ideas and agendas around hot topics. Our analysis reveals that information channels, institutions, and companies are predominant, while political actors remain in the background. Contextually, data indicate that SEO techniques are employed by several recurrent editorial groups, company owners, and institutions. Ultimately, we discuss the impact of SEO techniques on the circulation and visibility of information around relevant policy issues, contributing to shaping and influencing public debate and opinion.
... Además, en el 55% de las consultas se hace clic en los tres primeros resultados, y a la segunda página solo llega un 2,5% de usuarios (Beus, 2020). Asimismo, la aparición de las webs en los primeros resultados de búsqueda depende en gran medida de la aplicación de técnicas SEO (Lewandowski et al., 2021). Por todo ello, queda patente la necesidad de profesionales especializados en esta disciplina, que consigan alcanzar los primeros puestos para obtener visibilidad y atracción de los públicos potenciales en internet. ...
... Se trata de un perfil indefectiblemente ligado al devenir de los buscadores que aplican continuamente cambios a su algoritmo con el objetivo de mejorar los resultados. Su importancia es clave teniendo en cuenta la dependencia de las técnicas SEO para tener un buen posicionamiento (Lewandowski et al., 2021) y el poco conocimiento que los usuarios tienen de la influencia de estas en los resultados que les aparecen (Schultheiß & Lewandowski, 2020). Entre sus retos, las tendencias apuntan a un mayor uso de la búsqueda vocal por la popularización de dispositivos móviles, así como la mejora del entendimiento del lenguaje natural. ...
Article
Full-text available
La mejora de la empleabilidad es uno de los objetivos clave de las Universidades, para lo cual es imprescindible identificar las competencias de los perfiles profesionales surgidos en la era digital, como el especialista en posicionamiento web o SEO (Search Engine Optimizer). Este artículo realiza una propuesta de perfil competencial y compara el grado de importancia otorgado por empresas y profesionales de un conjunto de competencias previamente identificado. Se ha seguido un diseño exploratorio secuencial (Dexplos), en su modalidad derivativa, que combina una metodología mixta por etapas. Se parte del análisis de demanda laboral sobre SEO y se realizan 23 entrevistas a expertos. De esta fase se extraen las competencias y se crea un instrumento cuantitativo que se lanza a modo de encuesta a dos poblaciones: los negocios y las personas que trabajan en SEO. La encuesta obtuvo 340 respuestas de empresas y 311 de profesionales. Se aplican descriptivos y se comparan los grados de importancia con la prueba T de diferencias de medias para muestras independientes. Los resultados muestran que, además de demandar una combinación de competencias específicas de comunicación, marketing y tecnología, se confiere mucha importancia a las competencias genéricas, especialmente a la capacidad de actualización. Los profesionales otorgan más peso a todas ellas hasta el punto de mostrar diferencias significativas con la empresa en la mayoría de las competencias; pero se observan similitudes en el ranking de relevancia, cuestión que conduce a realizar una propuesta válida de cara a crear una oferta formativa que responda a las necesidades en la sociedad digital.
... SEO adalah teknik yang digunakan untuk membuat situs web mudah ditemukan oleh mesin pencari seperti Google [17]. Dengan menggunakan SEO, situs web lebih mudah ditemukan oleh orang yang mencari informasi di internet, sehingga meningkatkan jumlah pengunjung web [26]. ...
Article
Full-text available
UMKM yang sudah mulai terjun ke digital marketing dan perlu meningkatkan branding dilakukan dengan dikenalnya oleh mesin pencari supaya bisa selalu muncul teratas di halaman pertama, diperlukan satu teknik optimasi website yang tepat untuk meningkatkan trafik. Optimasi website sebagai salah satu strategi khusus yang diterapkan untuk meningkatkan user experience, aksesibilitas, performa, traffic, dan tingkat konversi website. Strategi efektif untuk meningkatkan kecepatan pencarian berdasarkan tingkat dan ketertarikan pengunjung web berrdasarkan jumlah kunjungan yang dilakukan. Peningkatan trafik pengunjung dilakukan dalam tiga tahap yaitu pertama konten yang berkualitas,kedua penggunaan Search Engine Optimization supaya situs web mudah ditemukan oleh mesin pencari google dan ketiga dengan promosi secara aktif melalui mediasosial, forum online, dan iklan online. Pengukuran keberhasilan metode dilakukan dengan menganalisa dan melakukan visualisasi pada Access log, sebuah catatan yang menyimpan informasi tentang akses yang dilakukan oleh pengunjung ke sebuah situs web atau server. Access log berguna untuk memantau aktivitas di situs web atau server, menganalisis lalu lintas web dan bisa mengidentifikasi masalah yang mungkin terjadi. Rentang waktu pengukuran adalah 30 hari, dan didapatkan hasil peningkatan pengunjung web. Dari hasil pengujian menunjukkan bahwa dengan optimasi website bisa membantu meningkatkan ketertarikan orang untuk mengunjungi web yang akan berdampak pada lebih dikenalnya UMKM dalam mempromosikan produknya.
... Eine ausführliche Darstellung der technischen Funktionsweise der Software findet sich inLewandowski et al. 2021. ...
Chapter
Die Studie untersucht in einer Vollerhebung aller 6211 Kandidierenden zur Bundestagswahl 2021, inwieweit diese auf ihren persönlichen Websites im Wahlkampf Suchmaschinenoptimierung (SEO) angewandt haben. Dies ließ sich auf 93 Prozent der 1372 untersuchten Websites feststellen. Auf individueller Ebene der Kandidierenden zeigt sich eine Bedeutung der Professionalisierung für die Nutzung von SEO: Wer in den Bundestag eingezogen ist, nutzt eher SEO. Auf organisationsbezogener Ebene zeigt sich, dass eine Zugehörigkeit der Partei zum Parlament die SEO-Nutzung der Kandidierenden begünstigt.
Article
Full-text available
With advances in artificial intelligence and semantic technology, search engines are integrating semantics to address complex search queries to improve the results. This requires identification of well-known concepts or entities and their relationship from web page contents. But the increase in complex unstructured data on web pages has made the task of concept identification overly complex. Existing research focuses on entity recognition from the perspective of linguistic structures such as complete sentences and paragraphs, whereas a huge part of the data on web pages exists as unstructured text fragments enclosed in HTML tags. Ontologies provide schemas to structure the data on the web. However, including them in the web pages requires additional resources and expertise from organizations or webmasters and thus becoming a major hindrance in their large-scale adoption. We propose an approach for autonomous identification of entities from short text present in web pages to populate semantic models based on a specific ontology model. The proposed approach has been applied to a public dataset containing academic web pages. We employ a long short-term memory (LSTM) deep learning network and the random forest machine learning algorithm to predict entities. The proposed methodology gives an overall accuracy of 0.94 on the test dataset, indicating a potential for automated prediction even in the case of a limited number of training samples for various entities, thus, significantly reducing the required manual workload in practical applications.
Article
In early 2017, a journalist and search engine expert wrote about “Google’s biggest ever search quality crisis.” Months later, Google hired him as the first Google “Search Liaison” (GSL). By October 2021, when someone posted to Twitter a screenshot of misleading Google Search results for “had a seizure now what,” users tagged the Twitter account of the GSL in reply. The GSL frequently publicly interacts with people who complain about Google Search on Twitter. This article asks: what functions does the GSL serve for Google? We code and analyze 6 months of GSL responses to complaints on Twitter. We find that the three functions of the GSL are: (1) to naturalize the logic undergirding Google Search by defending how it works, (2) perform repair in responses to complaints, and (3) boundary drawing to control critique. This advances our understanding of how dominant technology companies respond to critiques and resist counter-imaginaries.
Chapter
Virtual Reality (VR) technology is mostly used in gaming, videos, engineering applications, and training simulators. All of these share the need to display text. The text reading experience is not always a focus of VR systems because of limited hardware capabilities, lack of standardization, user interface (UI) design flaws, and the physical design of Head-Mounted Displays (HMDs). This paper investigates key UI design variables that can improve the text reading experience in VR. Four factors were selected: 1) canvas type (flat/curved), 2) contrast of the virtual scene (light/dark), 3) number of columns in the layout (1/2/3 columns), and 4) text distance from the subject (1.5 m/6.5 m). For a user study, a VR app for Oculus Quest was developed that displays text while varying some of the features important for readability in VR. The experiment identified parameters that are important for the text reading experience in VR. Specifically, subjects performed very well when the text was at a distance of 6.5 meters with a 22pt font size, on a flat canvas with a one-column layout. The physiological measurements behaved similarly across conditions, as all of the selected parameters were in line with the design guidelines. Therefore, the selection of final settings should be oriented more towards user experience and preferences.
Article
This research focuses on what users know about search engine optimization (SEO) and how well they can identify results that have potentially been influenced by SEO. We conducted an online survey with a sample representative of the German online population (N = 2,012). We found that 43% of users assume a better ranking can be achieved without paying money to Google. This is in stark contrast to the possibility of influence through paid advertisements, which 79% of internet users are aware of. However, only 29.2% know how ads differ from organic results. The term ‘search engine optimization’ is known to 8.9% of users but 14.5% can correctly name at least one SEO tactic. Success in labelling results that can be influenced through SEO varies by search engine result page (SERP) complexity and devices: participants achieved higher success rates on SERPs with simple structures than on the more complex SERPs. SEO results were identified better on the small screen than on the large screen. 59.2% assumed that SEO has a (very) strong impact on rankings. SEO is more often perceived as positive (75.2%) than negative (68.4%). The insights from this study have implications for search engine providers, regulators, and information literacy.
Article
Developments on the Internet and the rapid increase in the number of websites and web pages have made search engines a very popular tool that helps people find useful information, products, and services in daily life. With the rise of e-commerce, the number of online shoppers worldwide is also increasing. At this point, search engines play an important role as a bridge connecting businesses and customers. SEO refers to efforts to improve a website's ranking in search engine results pages. SEO is the art and science of helping websites get found in online search. SEO provides many benefits to businesses, such as increasing brand awareness, website traffic, sales, and revenue, and converting users into active customers. Considering that today's users rarely look beyond the first page when reviewing search engine results pages, it becomes clearer how critical SEO is for businesses. This study focuses on creating a broad conceptual framework for SEO. In addition, the study aims to assist business managers and SEO experts in making these decisions by examining SEO advantages, challenges, and strategies along with up-to-date statistics.
Article
People have a high level of trust in search engines, especially Google, but only limited knowledge of them, as numerous studies have shown. This leads to the question: To what extent is this trust justified considering the lack of familiarity among users with how search engines work and the business models they are founded on? We assume that trust in Google, search engine preferences and knowledge of result types are interrelated. To examine this assumption, we conducted a representative online survey with n = 2012 German Internet users. We show that users with little search engine knowledge are more likely to trust and use Google than users with more knowledge. A contradiction revealed itself – users strongly trust Google, yet they are unable to adequately evaluate search results. For those users, this may be problematic since it can potentially affect knowledge acquisition. Consequently, there is a need to promote user information literacy to create a more solid foundation for user trust in search engines. The impact of our study lies in emphasising the need for creating appropriate training formats to promote information literacy.
Article
This paper presents a novel approach of using machine learning algorithms based on experts’ knowledge to classify web pages into three predefined classes according to the degree of content adjustment to the search engine optimization (SEO) recommendations. In this study, classifiers were built and trained to classify an unknown sample (web page) into one of the three predefined classes and to identify important factors that affect the degree of page adjustment. The data in the training set are manually labeled by domain experts. The experimental results show that machine learning can be used for predicting the degree of adjustment of web pages to the SEO recommendations—classifier accuracy ranges from 54.59% to 69.67%, which is higher than the baseline accuracy of classification of samples in the majority class (48.83%). Practical significance of the proposed approach is in providing the core for building software agents and expert systems to automatically detect web pages, or parts of web pages, that need improvement to comply with the SEO guidelines and, therefore, potentially gain higher rankings by search engines. Also, the results of this study contribute to the field of detecting optimal values of ranking factors that search engines use to rank web pages. Experiments in this paper suggest that important factors to be taken into consideration when preparing a web page are page title, meta description, H1 tag (heading), and body text—which is aligned with the findings of previous research. Another result of this research is a new data set of manually labeled web pages that can be used in further research.
Article
According to recent studies, search engine users have little knowledge of Google's business model. In addition, users cannot sufficiently distinguish organic results from advertisements, resulting in result selections under false assumptions. Following on from that, this study examines how users' understanding of search‐based advertising influences their viewing and selection behavior on desktop computer and smartphone. To investigate this, we used a mixed methods approach (n = 100) consisting of a pre‐study interview, an eye‐tracking experiment, and a post‐study questionnaire. We show that participants with a low level of knowledge on search advertising are more likely to click on ads than subjects with a high level of knowledge. Moreover, subjects with little knowledge show less willingness to scroll down to organic results. Regarding the device, there are significant differences in viewing behavior. These can be attributed to the influence of the direct visibility of search results on both devices tested: Ads that were ranked on top received significantly more visual attention on the small screen than the top ranked ads on the large screen. The results call for a clearer labeling of advertisements and for the promotion of users' information literacy. Future studies should investigate the motivations of searchers when clicking on ads.
Article
This paper analyzes peer-reviewed empirical eye-tracking studies of behavior in web search engines. A framework is created to examine the effectiveness of eye-tracking by drawing on the results of, and discussions concerning, previous experiments. Based on a review of 56 papers on eye-tracking for search engines from 2004 to 2019, a 12-element matrix for the coding procedure is proposed. Content analysis shows that this matrix contains 12 common parts: search engine; apparatus; participants; interface; results; measures; scenario; tasks; language; presentation; research questions; and findings. The literature review covers results, the contexts of web searches, a description of participants in eye-tracking studies, and the types of studies performed on the search engines. The paper examines the state of current research on the topic and points out gaps in the existing literature. The review indicates that behavior on search engines has changed over the years. Search engines’ interfaces have been improved by adding many new functions and users have moved from desktop searches to mobile searches. The findings of this review provide avenues for further studies as well as for the design of search engines.
Article
In the Big Data era, search engine optimization deals with the encapsulation of datasets that are related to website performance in terms of architecture, content curation, and user behavior, with the purpose to convert them into actionable insights and improve visibility and findability on the Web. In this respect, big data analytics expands the opportunities for developing new methodological frameworks that are composed of valid, reliable, and consistent analytics that are practically useful to develop well-informed strategies for organic traffic optimization. In this paper, a novel methodology is implemented in order to increase organic search engine visits based on the impact of multiple SEO factors. In order to achieve this purpose, the authors examined 171 cultural heritage websites and their retrieved data analytics about their performance and user experience inside them. Massive amounts of Web-based collections are included and presented by cultural heritage organizations through their websites. Subsequently, users interact with these collections, producing behavioral analytics in a variety of different data types that come from multiple devices, with high velocity, in large volumes. Nevertheless, prior research efforts indicate that these massive cultural collections are difficult to browse while expressing low visibility and findability in the semantic Web era. Against this backdrop, this paper proposes the computational development of a search engine optimization (SEO) strategy that utilizes the generated big cultural data analytics and improves the visibility of cultural heritage websites. One step further, the statistical results of the study are integrated into a predictive model that is composed of two stages. First, a fuzzy cognitive mapping process is generated as an aggregated macro-level descriptive model. Secondly, a micro-level data-driven agent-based model follows up. 
The purpose of the model is to predict the most effective combinations of factors that achieve enhanced visibility and organic traffic on cultural heritage organizations' websites. To this end, the study contributes to the knowledge expansion of researchers and practitioners in the big cultural analytics sector with the purpose to implement potential strategies for greater visibility and findability of cultural collections on the Web.
Article
The constant struggle to attract new readers has led the digital news media to adopt search engine positioning strategies within their newsrooms. Given that readers are increasingly opting to consume their news via search engines, such as Google or Bing, this study explores perceptions and applications of search engine optimization (SEO) in the online news media and identifies the future training needs of journalists in this sector. To do so, 33 semi-structured interviews were conducted with individuals representative of three professional profiles: professional journalists, SEO consultants, and academics. Based on the data collected, we created five semantic categories – with 25 subcategories – and we correlated the perceptions of the SEO experts employed by the news media with those of the academics. The results highlight varying degrees of convergence and divergence in perceptions across these three professional profiles. Similarly, the results confirm the sector’s pressing need to attract readers by implementing search engine positioning techniques and, hence, its need to ensure future journalists are well trained in technical SEO, on-page SEO, off-page SEO, in the use of SEO analytics and audit tools, and in the ability to identify search trends so that they have the necessary skills to win the struggle for more readers.
Article
Purpose: In commercial web search engine results rankings, four stakeholder groups are involved: search engine providers, users, content providers and search engine optimizers. Search engine optimization (SEO) is a multi-billion-dollar industry and responsible for making content visible through search engines. Despite this importance, little is known about its role in the interaction of the stakeholder groups. Design/methodology/approach: We conducted expert interviews with 15 German search engine optimizers and content providers, the latter represented by content managers and online journalists. The interviewees were asked about their perspectives on SEO and how they assess the views of users about SEO. Findings: SEO was considered necessary for content providers to ensure visibility, which is why dependencies between both stakeholder groups have evolved. Despite its importance, SEO was seen as largely unknown to users. Therefore, it is assumed that users cannot realistically assess the impact SEO has and that user opinions about SEO depend heavily on their knowledge of the topic. Originality/value: This study investigated search engine optimization from the perspective of those involved in the optimization business: content providers, online journalists and search engine optimization professionals. The study therefore contributes to a more nuanced view on and a deeper understanding of the SEO domain.
Article
Eye-tracking research is beneficial for better understanding user behaviour in search engines. The present paper presents a comprehensive narrative literature review of eye-tracking studies examining factors influencing users’ viewing behaviour on results pages of search engines. Discipline-specific databases from Psychology, Computer Science, and Library and Information Science, as well as one multidisciplinary database have been searched for relevant articles. Criteria for inclusion were that a paper reported empirical results from an eye-tracking study in which effects of a specific factor on users’ viewing behaviour on search engine results pages (SERPs) were examined, with inferential statistical results being reported. This led to a set of 41 papers that were further examined. The papers were grouped into three categories according to three types of factors that may affect individuals’ web search activities: contextual factors, resource factors, and individual factors. Papers were assigned to these categories and subsequently to sub-categories. Overall, while for some sub-categories robust findings can be reported, we found results in many sub-categories to be inconclusive. For future research, we recommend a shift from small-scale studies examining single factors to more comprehensive and theory-driven research using larger sample sizes.
Article
Search engine optimization (SEO), or the set of practices involved in attaining a high ranking in search engine results, is a web writing skill that requires more attention in business communication pedagogy, because SEO helps businesses attract customers. This article presents the results of interviews with seven SEO experts on SEO best practices and describes how to integrate SEO into business communication courses.