Understanding Search-Engine Optimization

Venkat N. Gudivada and Dhana Rao, East Carolina University
Jordan Paris, CBS Interactive

Because users rarely click on links beyond the first search results page, boosting search-engine ranking has become essential to business success. With a deeper knowledge of search-engine optimization best practices, organizations can avoid unethical practices and effectively monitor strategies approved by popular search engines.
As more organizations make the Web cen-
tral to their mission and product lines,
search-engine rankings are becoming
essential to strategic marketing and sales.
Each search-engine results page (SERP) presented in
response to a user’s request contains a series of snip-
pets—clickable links that often include preview text
to establish the webpage’s relevance to the search. The
snippet’s SERP ranking, which is based on a complex
algorithm that considers more than 200 factors, can
determine whether a user visits that page. A snippet is
organic if its webpage warrants placement high in the
SERP listing solely because of intrinsic merit; it is inor-
ganic, or sponsored, if the organization has paid a fee to
gain that placement.
Obviously, a high ranking in the first SERP can greatly boost an organization's visibility. According to HubSpot, for example, 61 percent of global Internet users research products online, and 44 percent of online shoppers begin their experience with a search engine. Of these online shoppers, 75 percent never scroll past the first SERP.1 The study also found that snippet type is important: 70 percent of the links that users click on are organic, with 60 percent going to the top three links.
These statistics imply that securing a high spot in the
first SERP not only increases sales but also helps build
the target user’s trust, which can lead to better brand
development. To achieve a high SERP ranking, webmas-
ters use an assortment of practices, collectively referred
to as search-engine optimization (SEO).
To better understand SEO’s relationship to business
success, we surveyed current SEO practices and exam-
ined possible future directions. To aid organizations
new to SEO, as well as those with practices in place, we
compiled a list of the best tools and resources that can
help organizations create and maintain SEO strategies
approved by popular search engines.
SEO AND SEARCH ENGINES
Search engines are clearly foundational
to SEO, but many organizations lack
knowledge about how they work. Web-
sites host a range of HTML documents,
each with a unique Uniform Resource
Locator (URL). A search engine enables
Web searching by creating an index,
a process transparent to the user, and
responding to queries, a process that
requires the user’s active participation.
Figure 1 shows a conceptual sche-
matic of webpage (or document) index-
ing. The circled numbers correspond to
the following steps:
1. A webcrawler, also known as a
robot or bot, scours the Web to
retrieve HTML pages.
2. The webcrawler stores these
pages in their original form in
its search engine’s document
database.
3. The pages go through transformations, such as HTML tag and stop-word removal and stemming. The transformation, conducted by the search engine, extracts significant textual content and information about links for indexing.
4. The search engine creates indexes by generating direct and surrogate page representations, such as single words or phrases and their positional information on the page. It also notes information about incoming and outgoing links and generates a snippet.
5. The search engine stores indexes in its index database (a simplified code sketch of this pipeline follows the list).
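A minimal Python sketch of the transformation and indexing steps above may make the flow concrete. This is not how any production search engine is implemented; the tiny stop-word list, the naive suffix-stripping stemmer, and the example document database are all simplifying assumptions:

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}  # tiny illustrative list

def strip_tags(html):
    """Step 3 (part): drop HTML tags, keeping the textual content."""
    return re.sub(r"<[^>]+>", " ", html)

def stem(token):
    """Naive suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def index_page(url, html, inverted_index):
    """Steps 3-5: transform the page and record term positions in the index."""
    tokens = re.findall(r"[a-z0-9]+", strip_tags(html).lower())
    for position, token in enumerate(tokens):
        if token not in STOP_WORDS:
            inverted_index[stem(token)].append((url, position))

# Step 2 stand-in: a hypothetical document database of fetched pages.
document_db = {"http://example.com/": "<html><body>Search engines index web pages.</body></html>"}
index_db = defaultdict(list)
for url, page in document_db.items():
    index_page(url, page, index_db)
print(dict(index_db))
```

A real indexer would also record incoming and outgoing links and generate snippets (step 4), which this sketch omits.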
Figure 2 shows a conceptual sche-
matic of the querying and document-
retrieval process. The circled numbers
correspond to the following steps:
1. The user employs a search engine's browser to enter a search query—typically a single keyword or short phrase. As in step 3 of the indexing process, the search engine transforms the user's query into a canonical representation.
2. The search engine's query-ranking algorithm generates a ranked list of URLs for documents it deems relevant on the basis of the index database and contextual information in the user query. The search engine then shows the snippets corresponding to the ranked URLs to the user in SERPs.
3. The user browses the snippets and clicks on certain ones to retrieve the corresponding full documents in their original form from the document database.
4. The search engine's retrieval-evaluation component helps the user further refine the search on the basis of feedback about the document's relevance: the user explicitly indicates relevance (direct feedback) or clicks on relevant links (indirect feedback).
5. Using the relevance feedback, the search engine might reformulate the user query and re-execute it. This process repeats until the user is satisfied with the search results or ends the query session.
6. The search engine stores meta information such as user queries, relevance feedback, and clicked snippets in the log database, which it uses to improve its search performance. (A toy sketch of query ranking follows this list.)
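Continuing the sketch above, a toy query-ranking function might score documents with a TF-IDF-style sum. Real ranking algorithms weigh more than 200 factors, so this stands in only for the general shape of step 2:

```python
import math
from collections import Counter

def rank(query_terms, inverted_index, total_docs):
    """Score documents with a TF-IDF-style sum; a toy stand-in for the
    200-plus-factor ranking algorithms the article describes."""
    scores = Counter()
    for term in query_terms:
        postings = inverted_index.get(term, [])
        docs_with_term = {url for url, _ in postings}
        if not docs_with_term:
            continue
        idf = math.log(total_docs / len(docs_with_term)) + 1.0  # rarer terms weigh more
        for url, _ in postings:
            scores[url] += idf  # each occurrence contributes the term's weight
    return scores.most_common()  # ranked (url, score) pairs, one snippet per URL in the SERP

# Hypothetical usage with the index built in the previous sketch
# (query terms must pass through the same stemming as the indexed text):
# print(rank(["search", "engin"], index_db, total_docs=len(document_db)))
```

In a complete pipeline the query would first undergo the same canonicalization used at indexing time (step 1), which is omitted here.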
SPAMDEXING
Not all companies follow approved SEO methods, and unethical practices, such as generating spam that subverts ranking veracity, have grown increasingly troublesome in the past five years. Indeed, these practices, combined with an escalating reliance on the Web as a marketing tool, have spawned companies and consultants whose specialty is to help organizations ensure that their websites reach the top of the first SERP.
Although most of these companies and consultants encourage approved, or white-hat, SEO practices, a fair number resort to deceptive and misleading, or black-hat, practices like spamdexing to get a high ranking. (See the "Two Cases, One Key Phrase" sidebar.) As a 2012 article on the consequences of Web spam noted, such search-engine spam is a serious problem, costing businesses with lowered rankings an estimated US$130 billion annually.2
FIGURE 1. Indexing. Indexing involves fetching HTML documents, storing them in their original form, transforming the documents by processes such as stop-word removal and stemming, and generating indexes and storing them in a database.

How it works

Spamdexing refers to an array of deceptive practices to secure top placement in the first SERP by building webpages that trick search-engine algorithms and thus artificially boost the page's ranking. With spamdexing, even a page irrelevant to the search word or phrase can achieve a high ranking. Such practices affect both search-engine efficiency and results integrity, and have become sufficiently problematic that search-engine companies have included efforts to counter spamdexing in their strategic initiatives.
In SEO’s early days, meta keywords
were the basis for indexing a webpage.
A webmaster could include several
keywords with the meta keywords tag
even though the page content might
have little or no relevance to those
keywords. To avoid the tag’s abuse,
all the major search engines stopped
using it by 2009. However, webmas-
ters still resort to keyword stung
the unnatural repeated use of a key-
word or phrase simply to increase its
frequency on the page.
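Keyword density is one simple signal reviewers use to spot stuffing; the case-study sidebar later in the article treats densities of roughly 2 to 3 percent as natural. A rough, illustrative check follows (no search engine publishes an exact threshold, and the example text is invented):

```python
import re

def keyword_density(page_text, phrase):
    """Fraction of the page's words accounted for by occurrences of a key phrase."""
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    if not words:
        return 0.0
    occurrences = len(re.findall(re.escape(phrase.lower()), page_text.lower()))
    return occurrences * len(phrase.split()) / len(words)

# Invented example: a stuffed sentence produces an implausibly high density.
text = "Our SEO services team offers SEO services and more SEO services daily."
print(f"{keyword_density(text, 'SEO services'):.1%}")  # prints 50.0%
```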
Prevalent schemes
Current spamdexing schemes encompass automatically generated page content, deceitful redirects, cloaking, link spam, hidden text and links, doorway pages, spam from affiliate programs, embedded malicious behavior, and user-generated spam.3,4 The sidebar, "Google's Fight against Spamdexing," gives an idea of spamdexing's pervasiveness and the serious consequences of using this black-hat practice.
Automatic page generation. Scripts
are used to generate webpages using
algorithms that intersperse random
text with desired key words. The page
might also include machine-translated
text that has not undergone human
review and new text that obfuscates
existing text through the use of syn-
onyms. Some webmasters even use
content taken verbatim from more rep-
utable sites with no regard to relevance
or copyright infringement. Webpages
are also automatically generated from
snippets of either search result s or web-
pages that contain desired keywords.
Such pages often contain just the generated snippets without any real content.
Redirecting. Some webpages send
the user to a URL other than the one
requested. URL redirecting has valid
uses such as facilitating a website’s
move to a new address. However, some
redirects are designed to show the user and a webcrawler different webpages for the same URL. Using this scheme,
it is easy to index a page for children’s
stories but take the user to a page with
pornographic images, for example.
Cloaking. Bots and users see the same page in different ways: bots see content that has the desired keywords and follows approved guidelines, while users see content that is often malicious or undesirable. Some hackers use cloaking to keep the webmaster from detecting their work, for example.
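A site owner or reviewer can often spot cloaking by requesting the same URL with a crawler-like User-Agent and a browser-like one and comparing the responses. The sketch below is illustrative only; the User-Agent strings and the similarity threshold are assumptions, and cloaking keyed to crawler IP addresses would evade it:

```python
import urllib.request
from difflib import SequenceMatcher

def fetch(url, user_agent):
    """Fetch a URL while presenting the given User-Agent string."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="ignore")

def looks_cloaked(url, threshold=0.6):
    """Flag a URL whose bot-facing and browser-facing responses differ sharply."""
    as_bot = fetch(url, "Googlebot/2.1 (+http://www.google.com/bot.html)")
    as_browser = fetch(url, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
    similarity = SequenceMatcher(None, as_bot, as_browser).ratio()
    return similarity < threshold  # very different responses suggest cloaking

# print(looks_cloaked("http://example.com/"))  # hypothetical usage
```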
Link spam. Any links in a webpage
that exist solely to increase page rank
are considered link spam, regardless
of whether the links are outgoing or
incoming. Link-spam schemes include
buying and selling links, excessively
exchanging links as in mutually linking
partner pages, acquiring keyword-rich
anchor text links through article mar-
keting, and inserting comment links in
blog postings.
Hidden text and links. Excessive key-
words are hidden from a user but visible
to search engines. Methods to hide text
include placing text behind images, set-
ting the text font size to zero, and using
cascading style sheets to position text offscreen.
FIGURE 2. Querying and document retrieval. From a user query, an algorithm generates a ranked list of relevant documents, from which the user browses retrieved documents by clicking on the corresponding links. Queries can be refined and re-executed on the basis of user feedback, and the search engine stores meta information about the current search to improve its future performance.
Doorway pages. Doorway pages feature poor-quality content but are optimized to rank high for specific keywords. Their sole purpose is to funnel users to a single page, usually one they did not select.
Spam from aliate programs. In
aliate marketing, a company rewards
aliates for applying various market-
ing schemes to bring customers to the
company’s website. Schemes include
automatically generating page content
and using specic key words to skew
SEO. Such sites are penalized with lower
search rankings.
Embedded malicious behavior. Here, the goal is to add a malicious action when the user clicks on a link, such as to install advertisements, viruses, malware, Trojans, and spyware on the user's computer. The user might also click on a certain button and cause another, unwanted, link on the same page to activate.
User-generated spam. Not every web-
site visitor has good intentions, and
malicious users often generate com-
ment spam—comments that include
advertisements and links to unre-
lated pages. Some webmasters build
poor-quality links to their compet-
itors’ websites so that the latter get
search-engine penalties, such as a
lower ranking in the SERPs or removal
from the index. This is enabled by back-
link blasting—a software-driven link
scheme to generate thousands of back-
links (external links pointing to the
webpage). Both comment spam and backlink blasting are prevalent enough that Google provides a tool, Disavow, to deal with them.
IMPLEMENTING WHITE-HAT PRACTICES
To combat search-engine spam and
help webmasters develop websites
that adhere to white-hat practices,
search-engine companies provide
starter guides and webmaster tools
such as those listed in the sidebar “SEO
Tools and Resources.” Starter guides
contain conceptual details about how
search engines index documents and
process queries without revealing any
strategic or proprietary information.
Webmaster tools help assess the web-
pages’ conformance to guidelines and
best practices. These resources are
excellent starting points toward estab-
lishing or maintaining SEO practices
that can help businesses achieve long-
term strategic goals.
Organizations need to view SEO as
a by-product of good website design
that is part of an evolutionary organi-
zational process, not an afterthought.
SEO practices are constantly changing to fit both demand for high rankings and changing technology, so an organization must be vigilant about its SEO strategy's fit with both business requirements and current SEO best practices. Above all, organizations must avoid targeting one search engine and keep in mind that SEO's focus is on the website user, not the search-engine company.

TWO CASES, ONE KEY PHRASE

Because competitive business advantage and confidentiality issues surround search-engine optimization (SEO) work, empirical data on SEO practices is rarely published. However, Seospoiler.com provided descriptions of two companies that ranked number one for the same key phrase but reached their ranking along different paths.

BLACK HAT

In late 2012, Greencowseo.com ranked number one for the coveted key phrase "SEO company." However, it managed to have a disturbing number of incoming links from sites that were highly trusted but unrelated to SEO. Google removed Greencowseo.com from its index with the same alacrity that the SEO company used to secure the top spot.1

WHITE HAT

In March 2014, Seoexplode.us (formerly Explodeseo.com) ranked number one for the same "SEO company" key phrase. However, although it used "SEO company" along with its other meta key phrases "SEO services" and "risk free SEO" as in the page title ("SEO Company Providing Risk Free SEO Services"), its use of those key phrases was entirely ethical. Its meta description tag used "SEO company" and "SEO services" in a natural way, and of the 624 words on its homepage, the key phrases made up less than 3 percent each (2.56 percent for "SEO services" and 1.92 percent for "SEO company"). The keyword "SEO" was at 3 percent and hovered just at the high watermark. In 2014, the homepage had over 500 incoming links, all either directly or tangentially related to SEO.2

References

1. "SEO Company GreenCowSEO Exposed! Black Hat SEO Secret Techniques—Case Study," SEO Case Studies—Exposing Top SEO Companies & Secret Methods, 2012; http://seospoiler.com/seo-company-greencowseo-exposed-black-hat-seo-secret-techniques-case-study.
2. "Explodeseo, SEO Company Reviews Case Study, How Top SEO Sites Rank," SEO Case Studies—Exposing Top SEO Companies & Secret Methods, 2012; http://seospoiler.com/explodeseo-seo-company-reviews-case-study-how-top-seo-sites-rank.
Google Search is an example of how search engines evolve guidance and best practices. In 2011, Google Search added Panda, a quality-content filter whose primary goal was to ensure that low-quality sites ranked low and relevant sites appeared at the top of the SERPs. Penguin, the 2012 update to Panda, aimed to penalize sites that employed link spam with lower rankings or even eliminate them from Penguin's index database. Hummingbird, released in 2013, aims to assign importance to pages according to the algorithm's semantic understanding of webpage content. Hummingbird incorporates intent and the keyword's contextual meaning in the user's search query. This algorithm has proven more effective in retrieving pages than an algorithm based on keyword frequency in the page. Pigeon, the latest update to Google Search's algorithm, incorporates user location and local listings information in ranking search results.
On-page optimization
Broadly viewed, white-hat SEO has two major classes, the first of which is on-page optimization, which deals with website structure and content. Considerations range from word choice to the provision of a mechanism for restricted indexing.
Word choice. Creating precise, relevant, useful, and compelling content is a fundamental SEO requirement. Content should naturally use keywords without concern for their frequency and contain natural and authentic word phrases, both short and long, that capture the page's topic. Content creators should bear in mind that users have different vocabularies and accommodate those differences through a mix of theme words and phrases. Content should look authoritative with relevant theme words and phrases spread throughout the page.

GOOGLE'S FIGHT AGAINST SPAMDEXING

Google has penalized several prominent companies for employing black-hat SEO practices, including BMW, Newsday, JCPenney, Forbes, and Overstock.com.1 Ironically, in late 2012, Google moved from litigator to litigant for similar issues after the European Commission maintained that Google was favoring its own products while pushing its competitors further down in the SERP rankings.2

BMW

In 2006, Google charged BMW with using doorway pages to funnel traffic to BMW's German website and briefly removed the site from Google Search results. BMW admitted to these charges and stated that it was not aware that doorway pages were search-engine spam. To reinstate its website in Google Search results, BMW had to remove the JavaScript that implemented doorway pages.

NEWSDAY

In 2007, Google charged Newsday with having outgoing links to unrelated websites, which is tantamount to selling links to unrelated sites. However, it is considered an acceptable SEO practice if the links to unrelated websites include a nofollow attribute. The latter is a hint to search engines not to factor in the reputation of Newsday.com in calculating ranks for the pages targeted by the outgoing links.

JCPENNEY

During the 2010 holiday season, JCPenney used backlink blasting to boost its SERP position. A subsequent New York Times investigation revealed that thousands of unrelated websites were linking to JCPenney.com. Google's corrective action resulted in a dramatic slide of JCPenney.com's SERP ranking.

FORBES

In 2011, Google charged Forbes with selling outgoing links on the Forbes website and penalized it with a lower search ranking. To lift the penalty, Forbes had to remove the offending links.

OVERSTOCK.COM

Google places enormous trust in links from education domain (.edu) webpages. In 2011, Overstock.com offered a 10 percent discount on their products to university students and faculty in exchange for the inclusion of its links on .edu sites. Because of the domain's high trust factor, Overstock.com's search ranking went up substantially. Google penalized the company by lowering its ranking in search results, which contributed to Overstock.com's 2011 fiscal year loss of US$19.4 million. The company had to remove these links to reinstate its search results ranking.

References

1. A. Krush, "SEO Epic Fails: 5 Big Names Penalized by Google for Going Wrong with Their SEO," 2011; www.link-assistant.com/blog/seo-epic-fails5-big-names-penalized-by-google-for-going-wrong-with-their-seo.
2. S. Shankland, "Google Faces European Charge It Abused Search Dominance," CNET, 15 April 2015; www.cnet.com/news/google-faces-european-charge-it-abused-search-dominance.
Anchor text. Relatively short but
meaningful URLs are preferred for
webpages, and natural keywords
form the basis for URL text. Similar
principles apply to anchor text. Ide-
ally, anchor text should capture the
topic of the page that the anchor link
points to.
Semantic indexing. Search engines
are starting to use more sophisti-
cated indexing algorithms such
as latent semantic indexing (LSI),
which calculates a page’s relevance
not only on the basis of keywords
but also on the page’s overall topic.
Consequently, strategies that focus
exclusively on a select keyword can
no longer guarantee the webpage’s
high ranking.
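LSI builds a low-rank approximation of the term-document matrix so that queries and pages can match on topic rather than on exact keywords. The toy NumPy sketch below illustrates the idea; the corpus, the rank-2 truncation, and the scoring are illustrative, not any search engine's actual implementation:

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "recipe", "flour"]
A = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # automobile
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 3],   # recipe
    [0, 0, 0, 2],   # flour
], dtype=float)

# A rank-2 SVD projects terms and documents into a shared "topic" space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T  # documents expressed in topic space

def query_scores(query_terms):
    """Fold a keyword query into topic space and score documents by cosine similarity."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    q_topics = q @ U[:, :k]  # project the query into the same topic space
    sims = doc_topics @ q_topics
    norms = np.linalg.norm(doc_topics, axis=1) * (np.linalg.norm(q_topics) or 1.0)
    return sims / norms

# Documents about automobiles can score well even without the exact keyword "car".
print(query_scores({"car"}))
```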
Non-HTML content. To ensure effec-
tive indexing, most content should be
in HTML format, but content other
than text requires supplemental
information to enable indexing. Con-
tent creators should use the alt attri-
bute to concisely describe image con-
tent and include a transcript for audio
and video content.
Title and meta description tags. The title tag should reflect the page's topic, each page should have a distinct and appropriate title tag, and page content should be displayed in the snippet's first line. HTML5 semantic elements and heading tags (<h1> through <h6>) should reflect the page content's hierarchical organization. Although search engines do not use the meta description tag's keywords and phrases in calculating the page's ranking, they can use them to generate snippets. Thus, it is wise to have a precise and concise meta description tag for each page.
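These on-page elements (descriptive alt text, a distinct title tag, a concise meta description) are easy to audit mechanically. The sketch below uses Python's standard html.parser; the checks encode only a simplified version of the guidance above, and the sample markup is invented:

```python
from html.parser import HTMLParser

class OnPageAudit(HTMLParser):
    """Collects the on-page elements discussed above: title, meta description, image alt text."""
    def __init__(self):
        super().__init__()
        self.title_parts, self.meta_description = [], None
        self.images_missing_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name", "").lower() == "description":
            self.meta_description = attrs.get("content", "")
        elif tag == "img" and not attrs.get("alt"):  # missing or empty alt text
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)

audit = OnPageAudit()
audit.feed('<html><head><title>SEO Basics</title>'
           '<meta name="description" content="A short page about SEO."></head>'
           '<body><img src="chart.png"></body></html>')
print("title:", "".join(audit.title_parts).strip())
print("meta description:", audit.meta_description)
print("images missing alt:", audit.images_missing_alt)
```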
Navigational search. Users conduct a navigational search to find a particular webpage. Including text in the webpage title, body, and first four heading levels as well as the anchor text of inbound links and their number can help achieve higher ranking in a navigational search.

SEO TOOLS AND RESOURCES

Organizations must continually monitor search-engine optimization (SEO) practices, which requires more than a surface knowledge of available methods and their benefits, as well as an awareness of the pitfalls that can result from unethical strategies.

SEO BACKGROUND READING

»Tips and information about Web searching, the search-engine industry, and SEO: http://searchenginewatch.com
»Ranking of Internet marketing agencies and tools: www.topseos.com
»Directories for listing websites: www.directorycritic.com
»Article directories: http://ezinearticles.com, http://goarticles.com, www.articledashboard.com, www.galoor.com
»Link exchange market: www.linkmarket.net
»R. Fishkin et al., The Beginner's Guide to SEO (comprehensive information for professional SEO), 8 Jan. 2015; http://moz.com/beginners-guide-to-seo
»A. Williams, SEO Checklist: A Step-by-Step Plan for Fixing SEO Problems with Your Website, v1.5, 6 Aug. 2014: ezSEONews.com
»Moz, "Search Engine Ranking Factors" (characteristics of web pages that tend to rank higher): http://moz.com/search-ranking-factors

GENERAL TOOLS AND GUIDELINES

»World Wide Web Consortium (W3C) markup validation service: http://validator.w3.org
»W3C Cascading Style Sheets (CSS) validation service: http://jigsaw.w3.org/css-validator
»SEO analyzer (to determine whether or not a webpage complies with 15 SEO best practices): www.bing.com/webmaster/help/seo-analyzer-97615e21
»Bing webmaster guidelines (suggested practices for page structuring and content development to enable effective indexing by the Bing search engine): www.bing.com/webmaster/help/webmaster-guidelines-30fba23a
»Bing webmaster tool suite for SEO: www.bing.com/toolbox/webmaster
»Bing link explorer (for exploring backlinks to any site and gaining insight into which sites link to sites like yours): www.bing.com/webmaster/help/how-to-uselink-explorer-dddffa0a
»Yahoo! webmaster resources: https://help.yahoo.com/kb/SLN2248.html
Privacy policy. All websites should
include a page that describes the site’s
privacy policy, such as what personal
information the site collects and its use
and distribution. The privacy policy can
help further convey the image of a pro-
fessionally managed website, and some search engines consider its inclusion as a measure of a website's trustworthiness.
Custom 404 page. Webservers return
a 404 page when the search engine cannot find the requested webpage.
Customizing the 404 page helps keep
users on the site and can even enhance
their search experience. Customization
might involve adding a pointer to the
site’s homepage or providing links to
other site content related to the search.
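How the custom 404 page is produced depends entirely on the site's stack. Purely as an illustration (Flask and the /sitemap link are assumptions, not anything the article prescribes), a handler might look like this:

```python
from flask import Flask

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(error):
    # Keep visitors on the site: point to the homepage and related content
    # rather than returning the server's bare default 404 page.
    body = (
        "<h1>Page not found</h1>"
        '<p>Try the <a href="/">homepage</a> or our '
        '<a href="/sitemap">site map</a>.</p>'
    )
    return body, 404

if __name__ == "__main__":
    app.run()
```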
Restricted page indexing. The web-
master might not want to index certain
website pages—for example, when a
page has too many links or privacy is a
concern. Also, certain content might be
intended for registered users. In such
cases, the webmaster can use the robots.txt file to inform webcrawlers which pages are accessible for indexing, but not all search engine indexers are guaranteed to respect the robots.txt protocol.
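From the crawler's side, the robots.txt protocol is straightforward to honor; Python's standard urllib.robotparser shows the check a compliant webcrawler performs before indexing (the site URL and bot name are illustrative):

```python
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("http://example.com/robots.txt")
robots.read()  # fetch and parse the site's robots.txt

# A compliant crawler checks permission before fetching and indexing a page.
for path in ("/", "/members-only/profile"):
    allowed = robots.can_fetch("ExampleBot", "http://example.com" + path)
    print(path, "allowed" if allowed else "disallowed")
```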
O-page optimization
The second white-hat SEO class is o-
page optimization, which addresses
best practices in incorporating both
inbound and outbound external
links. A carefully designed directory
structure for website content not only
helps with site maintenance, but also
enables bots to traverse a website and
index its content. Website naviga-
tion structures include breadcrumbs
and sitemaps—both of which users
and webcrawlers should nd natu-
ral and intuitive to traverse. Web-
crawlers might have diculty nav-
igating through drop-down menus
created using JavaScript, for example.
To enable comprehensive indexing, all
links should be in text, not images.
Navigation structures also include
robots meta tags, which control index-
ing at the page level; text links that
webcrawlers follow to retrieve the
corresponding documents; and back-
links, which determine the page’s
reputation and contribute to its score
assignment.
»Google webmaster guidelines (best
practices to help Googlebot find, crawl,
and index websites): https://support
.google.com/webmasters/answer
/35769?hl=en
»J. DeMers, “How to Use Google Webmaster
Tools to Maximize Your SEO Campaign,”
http://searchenginewatch.com/sew
/how-to/2273660/how-to-use
-google-webmaster-tools-to-maximize
-your-seo-campaign#
»Google analytics for identifying SEO
opportunities in websites: www.google
.com/analytics
»Generating a sitemap automatically:
www.xml-sitemaps.com
»Generating a robots.txt file: www.internet
marketingninjas.com/seo-tools/robots
-txt-generator
»Creating .htaccess and .htpasswd files for
protecting special directories on webservers:
www.yellowpipe.com/yis/tools/htaccess
_generator/index.php
»Analyzing websites for organic SEO: www.semrush.com, www.webseoanalytics.com
»Webpage load speed test: http://tools
.pingdom.com/fpt
»Checking domain history: www.domaintools
.com
»Internal link analysis: http://tools.seochat
.com/tools/page-link-analyzer-seo
»Building backlinks through press releases:
www.free-press-release.com
»Open Site Explorer: The Search Engine for
Links (performing competitive link research,
exploring backlinks, and evaluating anchor
text): www.opensiteexplorer.org
»Generating a website privacy policy:
www.easyriver.com/myprivacy.htm,
www.freeprivacypolicy.com
CONTENT AND COMPLIANCE
EVALUATION TOOLS
»Tools for checking website content for dupli-
cation and plagiarism: http://copyscape.com,
www.dustball.com/cs/plagiarism.checker,
http://plagiarismdetect.com, www.plagtracker
.com, www.comparemyfiles.com
»Analyzing keyword density:
www.internetmarketingninjas.com/seo-tools
/keyword-density
»Fetch as Bingbot (see page exactly the way it appears to Bingbot): www.bing.com/webmaster/help/fetch-as-bingbot-fe18fa0d
Breadcrumbs. Breadcrumbs show the
user’s navigation trail through the
website and are intended primarily
to improve site usability. If the bread-
crumb information is available as
HTML markup in the body of a web-
page, some search engines include it
in SERPs. To avoid penalties, usually
reduced ranking, webmasters should
ensure that navigation does not create
distinct URLs for the same content.
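The article does not prescribe a particular breadcrumb markup; one common way to expose the trail to search engines (an assumption here, not the authors' recommendation) is schema.org BreadcrumbList JSON-LD generated alongside the visible navigation:

```python
import json

def breadcrumb_jsonld(trail):
    """Build schema.org BreadcrumbList markup for a navigation trail of (name, url) pairs."""
    items = [
        {"@type": "ListItem", "position": i, "name": name, "item": url}
        for i, (name, url) in enumerate(trail, start=1)
    ]
    data = {"@context": "https://schema.org", "@type": "BreadcrumbList", "itemListElement": items}
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"

# Hypothetical trail: Home > Books > Web Development
print(breadcrumb_jsonld([("Home", "https://example.com/"),
                         ("Books", "https://example.com/books"),
                         ("Web Development", "https://example.com/books/web-dev")]))
```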
Sitemaps. Sitemaps are an import-
ant element of well-designed websites
because they depict the website’s struc-
ture, which facilitates site navigation.
A sitemap should be in both XML for-
mat for the search engine and plain-
text format for the user.
The XML sitemap version should feature information about every website page, including the page's URL, last modified date, page-update frequency, and the URL's priority value relative to that of the site's other webpage URLs. The most important URLs will have priority 1 (highest), with lower values indicating decreased importance. Search engines use these values to determine the webpages' indexing order. Because a search engine might index only some pages, the values are also a way for the webmaster to promote the most important pages. Webmasters typically submit the XML sitemap version to the search engine because it is more likely to result in a complete site indexing.
The plain-text sitemap version is
useful if the site visitor cannot get to
the desired content using navigational
structures. To increase ease of use, the
plain-text version should have each
URL on a separate line. If the site fea-
tures many images, the webmaster
should provide search engines with an
image sitemap with a structure sim-
ilar to the XML version. Information
should include a caption and title, geo-
graphic location (city, country), and
a URL to the image’s copyright and
licensing terms.
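The XML sitemap fields described above (URL, last-modified date, update frequency, and priority) follow the sitemaps.org format; the short sketch below emits them with Python's standard library, using invented URLs and values:

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """pages: iterable of dicts with loc, lastmod, changefreq, and priority values."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap([
    {"loc": "https://example.com/", "lastmod": "2015-10-01",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://example.com/products", "lastmod": "2015-09-20",
     "changefreq": "daily", "priority": 0.8},
]))
```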
Text links. If the site lacks a map, text links are critical in enabling webcrawlers to navigate among pages. Crawlers
also index text links—indexing infor-
mation that search engines use—
although not all pages are indexed. For
example, the webmaster might have
failed to submit the required forms to
the search engines, the pages could
have links and other plug-ins buried in
JavaScript code that the search engine
cannot parse, or the page might con-
tain an excessive number of links or a robots meta tag or robots.txt file that is blocking other pages.
Backlinks. Backlinks help secure a higher SERP ranking, but link quality is critical. Link spam and other black-hat methods, such as creating link farms (a website group in which each site links to all other sites in the group) and inserting links in a blog's comments section, can artificially boost backlinks and thus the webpage's importance. When artificial links are unavoidable, webmasters can use the rel=nofollow attribute in the HTML anchor element to inform search engines that they should discard the link when computing the rank of the page the link points to. If a page points to external links, those links should be pointing to trusted and authoritative sites. Otherwise, the external links risk incurring penalties for that page and, in some cases, even the entire site's removal from the search engine's index.
Robots meta tag. The robots meta tag specifies whether or not a webcrawler can index a page (index/noindex) or whether or not it can traverse the links on a page (follow/nofollow). The noarchive directive prohibits search engines from saving a cached copy of the page; the nosnippet directive keeps them from displaying the page snippet in the SERPs. The noodp directive instructs search engines not to use descriptive text of the page from the Open Directory Project (DMOZ). With these bans, webmasters can use a page-specific approach to control how search engines should index individual pages and show them to users. Although webmasters employ the robots meta tag link attribute rel="nofollow" to prevent link-injection spam, search engines also use it as a way to discount any link value that they would otherwise include in computing page importance. The X-Robots-Tag HTTP header can also be used to specify this indexing and traversal permission.
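The same directives can be expressed either in the page markup or in the HTTP response; the short example below shows both forms with an arbitrary, purely illustrative combination of directives:

```python
# Page-level control via the robots meta tag (placed in the document's <head>).
robots_meta = '<meta name="robots" content="noindex, follow, noarchive, nosnippet">'

# Equivalent control for non-HTML resources via the HTTP response header.
http_headers = {"X-Robots-Tag": "noindex, noarchive"}

print(robots_meta)
print(http_headers)
```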
MONITORING SEO PERFORMANCE
SEO performance monitoring should
be an ongoing activity. Webmasters
need to respond to changing guidelines
and suggested SEO strategies from
search-engine companies to ensure that the practices they are following are in compliance. Also, webmasters must
maintain constant vigilance to detect
negative or subversive SEO from com-
petitors and unsuspecting third par-
ties, monitoring closely for undermin-
ing activities such as the insertion of
undesirable backlinks.
Web analytics programs can help
in performance monitoring, providing
insights into how to ethically improve
SERP ranking. They can inform the
webmaster about how visitors reach the website, their navigation patterns, and how changes to the page title and meta description tags affect search-engine traffic. The programs enable simulated content and site organization experiments in a sandbox environment so that webmasters can build multiple page versions and evaluate them in what-if scenarios.
IMPLICATIONS OF EMERGING TRENDS
As organizations and individuals demand more ways to leverage Web content for commercial, political, and personal gain, efforts to promote ethical practices as well as thwart unethical attempts to raise search-engine rankings are likely to change the face of SEO.
Adversarial information retrieval
Adversarial information retrieval (AIR) is a new subfield of information retrieval whose focus is on subverting search engines to effect higher ranks for pages using dubious and often unethical techniques. Growing AIR sophistication is raising complex problems for search-engine companies as they attempt to develop effective AIR countermeasures.
Several AIRWeb workshops and other forums held at the annual WWW and ACM SIGIR conferences attest to the increasing significance of AIR research. Examples include employing user-behavior features for separating Web spam pages from ordinary pages5 and a method to combat new types of spam based on click-through data analysis.6
Social media content
Content from LinkedIn, ResearchGate,
Twitter, Google+, Pinterest, Instagram
and other social networking sites is
becoming more important for search
engines. Indeed, as more people turn to social media to find relevant information, the distinction between Web search and social search is blurring.
Facebook Search (http://search..com)
is an example of this company’s foray
into the social search space. Social
Searcher (www.social-searcher.com)
is a social media search engine that looks for publicly posted information in social networks.
Mobile device use
According to a 2014 Mobile Path to Purchase survey by Telemetrics and xAd (www.mobilepathtopurchase.com), 50 percent of respondents use mobile
devices to start the search process and
two out of three mobile shoppers ulti-
mately make a purchase. Despite these
statistics, some websites still do not
work properly with mobile devices.
Responsive Web design is emerging to
address this issue.
A concomitant effect of increased use of mobile devices for search is the
growing use of voice commands rather
than text queries. Under this new para-
digm, users employ multiword phrases
and nearly complete sentences as que-
ries rather than key words.
Locale information will assume
greater prominence as a factor in rank-
ing search results. This will help search
engines show information that is not
only relevant but also has the potential
for the user to take immediate action.
Search engines will introduce the practice of user registration to receive higher-quality search results.
Information fusing
Some complex information needs are extremely difficult to fulfill using search engines. Meeting such needs involves fusing information from multiple sources, which typically requires human expertise. Community-based question-answering systems are used for this purpose and will gain more visibility. For example, Stack Overflow (www.stackoverflow.com) and Slashdot (www.slashdot.org) are question-answering systems serving specialized vertical search spaces. A new search engine, Blekko (www.blekko.com), employs crowdsourcing for content selection and curation.
Related to community-based question answering is collaborative search, which involves a group of users with the same information need. Examples include students working together to search for information for a term paper, and team members combining efforts to assess the environmental impacts of a proposed river dam.
Web search engines will morph into metasearch engines. Behind a user-facing search engine, there will be several search engines behind the scenes, each targeting a different vertical search space.
Shrinking interfaces
Small visual interfaces will continue to pose challenges for specifying queries and presenting search results. For example, search results might be returned as info cards—condensed information that information-extraction algorithms generate from unstructured data—instead of as links. Also, personal and contextual information will play a greater role in tailoring search results to the limited number that a small interface can handle.
Long-term and sustainable high ranks for webpages will come from adhering to SEO guidelines and best practices coupled with an emphasis on high-quality authentic content authored naturally. The resulting Web domain trustworthiness and authority will reap dividends in an environment where search engines enforce zero tolerance of spam.
Webmasters should exercise due diligence when deciding to include third-party plug-ins and services in their webpages, particularly those recommended by affiliates. Hundreds of websites are penalized daily for employing black-hat SEO practices. Webmasters should remove any undesirable and damaging backlinks that past SEO consultants have inserted.
Rapidly maturing natural-language translation and machine-learning technologies have fundamental implications for Web search. It is not far-fetched for a search-engine user to issue a query in one language, and expect to retrieve relevant documents authored in multiple other languages that are automatically translated into the language in which the query is issued.
As these and other technologies
emerge and mature, SEO methods will
evolve accordingly, but SEO will con-
tinue to be an integral part of an orga-
nization’s long-term strategic busi-
ness plan.
REFERENCES
1. HubSpot, "All the Marketing Statistics You Need," 2014; www.hubspot.com/marketing-statistics#SEO.
2. N. Spirin and J. Han, "Survey on Web Spam Detection: Principles and Algorithms," ACM SIGKDD Explorations Newsletter, vol. 13, no. 2, 2012, pp. 50–64.
3. "Bing Webmaster Guidelines," www.bing.com/webmaster/help/webmaster-guidelines-30fba23a.
4. Google, "Webmaster Guidelines," https://support.google.com/webmasters/answer/35769?hl=en.
5. Y. Liu et al., "Identifying Web Spam with the Wisdom of the Crowds," ACM Trans. Web, vol. 6, no. 1, 2012, pp. 2:1–2:30.
6. C. Wei et al., "Fighting against Web Spam: A Novel Propagation Method Based on Click-through Data," Proc. 35th Int'l ACM Conf. Research and Development in Information Retrieval (SIGIR 12), 2012, pp. 395–404.
ABOUT THE AUTHORS
VENKAT N. GUDIVADA is a professor and chair of the Department of Computer
Science in the College of Engineering and Technology at East Carolina Univer-
sity. His research interests include database management, information retrieval,
high-performance computing, and personalized electronic learning. While con-
ducting the research for this article, Gudivada was a professor of computer sci-
ence at Marshall University. He received a PhD in computer science from the Uni-
versity of Louisiana at Lafayette. He is a member of the IEEE Computer Society.
Contact him at gudivadav15@ecu.edu.
DHANA RAO is an assistant professor in the Department of Biology in the
Thomas Harriot College of Arts and Sciences at East Carolina University. Her
research interests include microbial ecology and bioinformatics. While conduct-
ing the research for this article, Rao was an assistant professor of biological sci-
ences at Marshall University. Rao received a PhD from the University of New
South Wales. She is a member of the American Society of Microbiology. Contact
her at raodh15@ecu.edu.
JORDAN PARIS is an associate software engineer at CBS Interactive. His
research interests include information retrieval and high- performance computing.
While conducting the research for this article, Paris was an undergraduate in com-
puter science at Marshall University, where he received a BS in computer science.
Contact him at jordan.paris@cbsinteractive.com.