All content in this area was uploaded by José Ramón Saura on May 10, 2019
Does SEO matter for early-stage startups? Insights from
visual data mining and topic-modeling techniques
Jose Ramon Saura¹, Ana Reyes-Menendez¹* and Chris Van Nostrand²
¹Department of Business Economics, Rey Juan Carlos University, Madrid (Spain);
joseramon.saura@urjc.es, ana.reyes@urjc.es
²UC Berkeley Extension, University of California, Berkeley, USA; chrisvannostrand@berkeley.edu
*Corresponding author: ana.reyes@urjc.es
Abstract
In the present study, we analyzed User Generated Content (UGC) to measure the importance of
Search Engine Optimization (SEO) for early-stage startups. Several social media analytics techniques
were used to derive insights from Twitter-based UGC. Data visualization and topic-modeling
algorithms were applied to a dataset of tweets (n = 67,126). Specifically, we used hashtag analysis,
polarity and emotion analysis, word analysis, topic modeling, and other relevant approaches. In
addition, we performed a qualitative case study of an early-stage startup to validate our findings.
The results helped us identify the Twitter communities discussing SEO for early-stage startups and the
main optimization indicators according to the sentiment expressed in tweets (positive, negative, and
neutral). Our results also demonstrated that SEO is not the most relevant digital marketing positioning
strategy for early-stage startups and that, although early-stage startups use this strategy, it is
predominantly perceived negatively by Twitter users. Because our analysis relies on novel data
analysis approaches, our findings provide meaningful implications for both practitioners and academic
researchers.
Keywords: Search engine optimization; visual data mining; digital marketing; data mining; topic
modeling
1. Introduction
In recent years, the Internet has become a widely used and ever-growing data source (Yoo et al. 2012).
A large proportion of these data is generated daily by users who connect to social networks and who
use their mobile applications to express opinions on products or services, or on any specific topic
(Regmi et al. 2015; Rathore et al. 2016). In today’s increasingly connected world, users obtain
information on the Internet through search engines such as Google, Yahoo!, Bing, and Baidu
(Paik and Woo, 2014; Thiel et al., 2014).
A search engine is a website that allows one to make inquiries on any subject (Fishkin, 2005). In just
a few thousandths of a second, search engines return multiple results from web pages containing the
searched keywords. If a business's web pages are not indexed in the SERPs (Search Engine Result
Pages), they do not appear in user search results. The strategy for positioning a company's web pages
is known as SEO (Search Engine Optimization) (Saura et al. 2019a; Aswani et al., 2017a).
SEO is a technique that consists of optimizing indicators within the web page (on-page) and outside
of it (off-page) to improve the ranking of a page in the search engine results pages (SERPs).
Evidence is available showing that being in the top 3 search results is crucial to obtaining 98% of
the clicks from user searches (Fishkin, 2005). At the same time, the importance of the SEO
positioning strategy around the world is increasing due to the massive use of search engines, the
development of new technologies, and innovation in the business sector (Dellermann et al. 2017).
This combination of factors has led to the emergence of startups, companies based on technology and
innovation, in the global ecosystem (Giardino et al. 2014, 2015, 2016).
In their early stage, startups need digital marketing and SEO positioning to increase their
reputation on the Internet (Picken, 2017). At this stage, startups have to increase the impact of their
projects while looking for funding to invest in digital marketing (Kropp et al. 2008; Davila et al.
2009).
This newly emerged Web 2.0 ecosystem is rich in user generated content (UGC), i.e., the content
generated daily by Internet users through social networks and digital platforms (LeBrasseur et al. 2003).
Relevant research has analyzed UGC for digital marketing, SEO, SEM (Search Engine Marketing),
ASO (App Store Optimization), and even SMM (Social Media Marketing) itself (Davila and Foster,
2007; Garcia-Perez et al. 2011; Nada et al. 2013; Saura et al. 2019a).
In the present study, our aim is to identify key indicators that add value to early-stage startup
companies. To this end, we analyze UGC using methods such as hashtag analysis, polarity and
emotion analysis, word analysis, topic modeling, and other relevant approaches, extending the
methodological approach of Aswani et al. (2019).
1.1. Does SEO really matter for startups in an early stage of their development?
SEO is based on the optimization of indicators that measure the relevance of a web page on the
Internet (Evans, 2007). Depending on these indicators, search engines rank a page in the first
positions of their SERPs, on the second page, or beyond (Zhang and Cabage, 2017).
Such indicators are many and varied and include, among others, the Page Authority (PA), which
measures the quality of the content of a website page, and the Domain Authority (DA), which
measures the relevance of the domain based on the number of visits and links pointing to it from other
web pages. PA and DA are metrics developed by the company Moz; they are related to, but distinct
from, PageRank, the link-analysis algorithm created by Google's founders, Larry Page and Sergey
Brin (Krrabaj et al. 2017).
Other important indicators that optimize a website for SEO include the repetition of the main keyword
in the title, description, and URL (Lui et al. 2018; Pidpruzhnikov et al. 2017). These three
elements are shown in the SERPs when a user performs a search. Increasing the frequency of the
keyword in the content of a page increases the likelihood of that page being indexed in search engines
(Malaga, 2010). There are also other indicators, such as the sitemap, an XML file that describes the
structural design and the information architecture of a website so that search engines can crawl it
correctly, and the correct installation of the robots.txt file, which tells search engine crawlers which
content they should not crawl (Cahill, 2009; Giomelakis and Veglis, 2016). Other relevant indicators
include social meta tags, which offer search engines additional information about the content of the
website: Facebook's meta tags are called Open Graph Data, whereas Twitter's are called Twitter
Cards (Kataria and Sapara, 2016; Aswani et al. 2017).
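To make the on-page indicators above concrete, the following is a minimal sketch, using only Python's standard library, of how one could check whether a page's title, meta description, and URL contain a target keyword and whether Open Graph and Twitter Card tags are present. The names `SeoTagParser` and `audit_page`, the sample page, and the convention that a keyword appears in the URL as a hyphenated slug are illustrative assumptions, not part of the cited studies.

```python
from html.parser import HTMLParser


class SeoTagParser(HTMLParser):
    """Collects the <title> text and the meta tags of a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}  # meta name/property -> content
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use "property"; description and Twitter Card tags use "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_page(html, url, keyword):
    """Reports which of the on-page indicators discussed in the text are present."""
    parser = SeoTagParser()
    parser.feed(html)
    kw = keyword.lower()
    description = parser.meta.get("description", "")
    return {
        "keyword_in_title": kw in parser.title.lower(),
        "keyword_in_description": kw in description.lower(),
        "keyword_in_url": kw.replace(" ", "-") in url.lower(),  # assumed slug convention
        "has_open_graph": any(k.startswith("og:") for k in parser.meta),
        "has_twitter_card": "twitter:card" in parser.meta,
    }


page = """<html><head><title>SEO tips for startups</title>
<meta name="description" content="Practical SEO tips for early-stage startups.">
<meta property="og:image" content="https://example.com/cover.png">
<meta name="twitter:card" content="summary">
</head><body></body></html>"""
report = audit_page(page, "https://example.com/seo-tips", "seo tips")
```

A real audit would fetch live pages and apply stricter matching; simple substring tests are used here only to keep the mapping from indicator to check easy to read.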
As mentioned above, in order to obtain quality traffic and publicize their products and services at an
early stage of their development, startups need to generate impact and visibility for their businesses
in search engines (Baye et al. 2016; Lui and Au, 2018). For a startup, the early-stage phase
includes the period of seeking investment from accelerators, startup incubators, business angels,
or startup competition awards (Dwivedi et al. 2011). In general, investment in such startups comes
from their founders or CEOs, close friends, family members, or occasional accelerators with small
investments (Farooqi et al. 2017).
In this context, it is interesting to evaluate how effective the SEO strategy is for early-stage
startups and whether it is worth investing in to generate the necessary impact. To
explore this issue, Aswani et al. (2019) used data visualization techniques and content analysis of
what they defined as SEM (Search Engine Marketing), an acronym that they also used to cover the
SEO strategy. In particular, they performed a content analysis of tweets with the hashtags #SEO,
#SEM, and #DigitalMarketing. The authors also measured the polarity and sentiment of the collected
tweets. To compare the results, they contrasted their conclusions with a case study of
SEOClerks.
The method used in the present study is similar to the one previously used by Aswani et al. (2018).
However, we extended it with the approach presented by Saura and Bennett (2019). Our specific
focus is on early-stage startups. After collecting 67,126 tweets with the hashtags #SEO and #Startups,
we submitted this database to several types of analysis, including hashtag analysis, polarity and
emotion analysis, word analysis, textual analysis, topic modeling with LDA (Latent Dirichlet
Allocation), and other advanced algorithms, such as Hyperlink-Induced Topic Search (HITS),
PageRank, and eigenvector centrality. These methods are used to extract insights from Twitter UGC
data. Finally, to validate the results, we also performed a case study on one of the startups (Applied)
selected by Google to be part of its Google for Startups program.
The remainder of this paper is structured as follows. After formulating our research questions in
Section 2, we present the methodology (Section 3) and the results (Section 4). This is followed by a
case study (Section 5). The findings are summarized and discussed in Section 6. Conclusions and
the implications of our results for both practitioners and academic researchers are presented
in Section 7.
2. Research Questions
In order to increase their impact and potential, as well as to attract new investment, early-stage
startups should master and optimize SEO strategies. Startup founders usually hire
qualified specialists or develop SEO strategies on their own through social platforms or through other
digital channels, such as email marketing or social ads. These platforms are rich in UGC, that
is, content developed by users to express their opinions and to comment on different
industries. Therefore, social platforms can be used to obtain a holistic picture of customer
satisfaction in a specific domain (Aswani et al., 2018).
The aim of the present study is to explore these discussions on SEO to acquire an in-depth
understanding of the dynamics of this niche industry for early-stage startups. In essence, the present
study continues the research started by Aswani et al. (2018) and extends it using the method presented
by Saura and Bennett (2019). The insights generated by the proposed method of analysis can be used to
evaluate the SEO industry for early-stage startups and to assess whether SEO is really important and
worthwhile for them. The research questions addressed in the present study are as follows:
1. What are the dominant discussion themes on SEO in early-stage startups?
2. What are the dominant sentiments (positive, negative and neutral) in these discussions?
3. What is the structure of the network that participates in these discussions?
4. Are the startup CEOs satisfied with the goals achieved by their SEO strategies?
5. If not, what are the drivers of dissatisfaction?
3. Methodology
The methodological process developed in the present study is based on the analysis of SEO in relation
to the research questions posed. As mentioned previously, social media (SM) have become a new
source of both structured and unstructured data for research (Aswani et al.,
2018). In previous studies, SM analysis has been effectively used to study stock price fluctuations,
disease prevention, event monitoring, election result predictions, disaster management, brand
management, public relations, public opinion polling, improvement of tourism services, solutions for
global warming or studies of promotions such as #BlackFriday (Aswani et al., 2018; Saura et al.,
2018a, 2019b; Arias et al., 2014; Chae, 2015; Hughes and Palen, 2009; Inauen and Schoeneborn,
2014; Joseph et al., 2017; Kim, 2014; Williams et al., 2013; Wu and Shen, 2015; Reyes-Menendez et
al., 2018).
In the present study, we used Twitter-based UGC to investigate whether early-stage startups should
invest in SEO strategies to increase their visibility on the Internet (Aswani et al., 2018).
During data collection, a total of 67,126 tweets with hashtags #SEO and #Startups were downloaded
from the public Twitter API over a period of 6 days (April 12-19, 2019). Afterwards, the database
was filtered and cleaned to improve the robustness of the data.
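The filtering and cleaning rules are not specified in the text. As an illustration only, the sketch below shows one common way of normalizing a tweet collection; the tweet schema (a dict with a `"text"` field), the decision to collapse a retweet onto its original text, and the function name `clean_tweets` are assumptions.

```python
import re

RT_PREFIX_RE = re.compile(r"^RT\s+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def clean_tweets(tweets):
    """Normalizes tweet texts and drops empty or duplicate ones.

    Each tweet is assumed to be a dict with a "text" field (hypothetical schema).
    Returns a list of (clean_text, hashtags) tuples.
    """
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = RT_PREFIX_RE.sub("", tweet["text"])  # a retweet collapses onto its original
        hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
        body = URL_RE.sub(" ", text)
        body = MENTION_RE.sub(" ", body)
        body = HASHTAG_RE.sub(r"\1", body)  # keep the hashtag word, drop the '#'
        body = re.sub(r"[^\w\s]", " ", body.lower())  # strip punctuation
        body = " ".join(body.split())  # collapse whitespace
        if body and body not in seen:
            seen.add(body)
            cleaned.append((body, hashtags))
    return cleaned


sample = [
    {"text": "5 #SEO tips for #Startups https://t.co/abc"},
    {"text": "RT @founder: 5 #SEO tips for #Startups https://t.co/abc"},
    {"text": "Why link building hurts early-stage #startups"},
]
result = clean_tweets(sample)
```

Whether retweets should be deduplicated depends on the question being asked: for topic modeling they mostly add repetition, whereas for the engagement statistics in Section 4 they must be kept.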
In the next step, we applied the LDA model to identify the most discussed topics. Thereafter,
these topics were visualized as nodes to understand their weight. The data were also submitted to
several network analysis algorithms, namely Hyperlink-Induced Topic Search (HITS), the PageRank
algorithm, and the eigenvector centrality distribution algorithm.
In addition, we performed sentiment analysis of the tweets. To this end, we first used a total of 349
tweets to train a machine-learning sentiment analysis algorithm in Python, obtaining a
Krippendorff's alpha value (KAV) of 0.791.
This allowed us to categorize the tweets into three groups (positive, negative, and neutral) depending
on the sentiment expressed in them. This was followed by textual analysis of the data that helped us
identify indicators related to SEO in early-stage startups. Finally, to verify the results, we performed
a case study on Applied, a startup from the Google for Startups program (Chae, 2015; Joseph et al.,
2017; Aswani et al., 2018; Saura and Bennett, 2019).
An overview of the various types of analysis used in the present study is shown in Figure 1.
Figure 1. Main types of analysis in social media analytics
Source: Adapted from Aswani et al. (2018)
In the present study, within the descriptive analytics section (A), we used descriptive analytical
techniques, user statistics, sentiment reputation, and social impact, as well as word clouds and
hashtag analytics. In terms of content analytics (B), we used sentiment analysis, topic modeling,
hashtag analysis, diversity and lexical diversity, and weight percentage.
In terms of network analytics (C), we computed diameter, bridge and distance, centrality and
cohesion, cluster and clique detection and dimension, as well as community detection. Finally, with
respect to space-time analysis (D), we analyzed topic density and evolution, as well as considered
topic predictions.
4. Results
Descriptive statistics provide an overview of the nature of the tweets, the users who interact through
them, and the degree of engagement of relevant stakeholders (Bruns and Burgess, 2013; Aswani et
al., 2018; Saura and Bennett, 2019). In our dataset, of a total of 67,126 tweets, 64,145 were original
tweets, 29% were replies to these tweets, and 27,459 were retweets (RT). These figures highlight
very active interaction between the stakeholders related to the SEO community and startups.
Likewise, a total of 15,176 hashtag occurrences were detected in the sample, with a total of 15,494
unique users around this UGC.
Over 45% of the tweets contained more than one hashtag, which indicates that many of the tweets
were related to several similar topics in digital marketing. Considering that a total of 15,494 unique
users were identified, each user published on average 4.3 tweets, including 2.1 original tweets, 1.3
retweets, and 1 reply. Regarding the visibility of users, data analysis demonstrated that most users
were active and visible in this social network, so the analyzed content can be assumed to yield relevant
insights (Aswani et al., 2018). In the tweets, there were a total of 51,290 different URLs.
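The per-user figures above are simple ratios over the collection. The sketch below computes statistics of this kind from a list of tweet records; the record schema (`"user"`, `"kind"`, `"hashtags"`) and the toy data are hypothetical, and only the 67,126 / 15,494 ratio comes from the text.

```python
from collections import Counter


def describe(tweets):
    """Summary statistics like those reported in Section 4.

    Hypothetical schema: each tweet is a dict with "user", "kind"
    ("original", "reply" or "retweet") and "hashtags" (a list of strings).
    """
    users = {t["user"] for t in tweets}
    multi_tag = sum(1 for t in tweets if len(t["hashtags"]) > 1)
    return {
        "tweets": len(tweets),
        "unique_users": len(users),
        "tweets_per_user": len(tweets) / len(users),
        "share_multi_hashtag": multi_tag / len(tweets),
        "by_kind": dict(Counter(t["kind"] for t in tweets)),
    }


# The per-user average quoted in the text: 67,126 tweets spread over 15,494 users.
print(round(67126 / 15494, 1))  # 4.3

toy = [
    {"user": "a", "kind": "original", "hashtags": ["seo", "startups"]},
    {"user": "a", "kind": "retweet", "hashtags": ["seo"]},
    {"user": "b", "kind": "reply", "hashtags": ["seo", "marketing"]},
    {"user": "b", "kind": "original", "hashtags": []},
]
stats = describe(toy)
```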
A closer consideration of the data showed that the most popular words used in the discussions in the
tweets (excluding SEO, startups, and digital marketing) were marketing (3,624), business (3,532),
Google (3,073), content (2,679), search (2,649), website (2,253), digital (1,892), and tech (1,694).
Similarly, the hashtag analysis demonstrated that a total of 766 unique hashtags were found
in the tweets, and they appeared 15,176 times. The most repeated hashtags were #SEO (12,453),
#startups (7,391), #digitalmarketing (3,097), #marketing (3,050), #socialmedia (2,385), and #business
(1,762).
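Word and hashtag rankings of this kind are frequency counts. The sketch below shows the idea with `collections.Counter`; the tiny stopword list, the exclusion set passed by the caller, and the toy inputs are assumptions, since the text does not describe the exact tokenization. It also makes explicit the difference between unique hashtags (766 in the text) and total hashtag occurrences (15,176).

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "for", "to", "of", "and", "in", "is", "your"}  # tiny illustrative list


def top_terms(texts, exclude, n=10):
    """Ranks words across cleaned tweet texts, skipping stopwords and excluded seed terms."""
    counts = Counter(
        word
        for text in texts
        for word in text.split()
        if word not in STOPWORDS and word not in exclude
    )
    return counts.most_common(n)


def top_hashtags(hashtag_lists, n=10):
    """Returns (top-n hashtags, number of unique hashtags, total hashtag occurrences)."""
    counts = Counter(tag for tags in hashtag_lists for tag in tags)
    return counts.most_common(n), len(counts), sum(counts.values())


texts = ["seo tips for your website", "google search tips", "website content tips"]
words = top_terms(texts, exclude={"seo"}, n=2)
tags = top_hashtags([["seo", "startups"], ["seo"], ["marketing"]], n=1)
```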
A more in-depth analysis of the data demonstrated associations between hashtags, words, and users.
First, popular terms included words directly related to SEO best practices for startups (e.g.,
trick, tutorial, tips, check, now, why, great, sharing). Second, the names of active companies and
relevant tools in the sector were also frequently used (e.g., Google, Moz, Gmail, PPC, CPM, B2B, B2C,
eCommerce, Screamingfrog, Google Search Console, Google Analytics, Google Ads, etc.). These
findings highlight the strong interest of the UGC community in these tools and companies.
Furthermore, to identify relevant communities in the studied UGC, we used the data visualization and
classification algorithm previously proposed by Vincent et al. (2008) (see Figure 2a). Regarding the
resolution of the results, Lambiotte et al. (2009) proposed an algorithm that groups the results into
communities of nodes. To understand this process, modularity should be defined.
Modularity is a measure of the structure of networks designed to measure the strength of the division
of a network into clusters or communities (Vincent et al., 2008). Networks with a high modularity
have dense connections between the nodes within modules, but sparse connections between nodes in
different modules. Modularity is often used in optimization methods to detect the community
structure in networks.
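Modularity can be written as Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The sketch below only evaluates Q for a given partition of a small undirected graph; it does not implement the community-detection algorithm cited above, which searches for a partition with high Q. The toy graph is an assumption chosen so the result can be checked by hand.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where m is the
    number of edges, L_c the number of edges inside c, and d_c the total degree
    of the nodes in c. `community` maps each node to its community label.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(edges, split)  # 2 * (3/7 - 1/4) = 5/14
q_single = modularity(edges, {v: "all" for v in range(6)})  # 1 - 1 = 0
```

Splitting along the bridge gives Q ≈ 0.357, while placing every node in one community gives Q = 0: dense-inside, sparse-between partitions score higher, which is exactly the property described in the text.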
Figure 2(b) shows the PageRank (PR) distribution of the identified communities (Page et al., 1999).
PR is an iterative algorithm that measures the importance of each node within the identified
network. This metric assigns each node the probability that a random surfer, who follows links at
random, is located at that node. In addition, we also used Hyperlink-Induced Topic Search (HITS),
also known as the hubs and authorities algorithm, which rates the authority and hub distribution
(Kleinberg, 1999) (Figure 2(c)-(d)).
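The PageRank computation can be sketched with plain power iteration. The damping factor of 0.85 is the conventional choice from the original PageRank formulation; the stopping tolerance, the dict-of-successors graph representation, and the three-node toy graph are assumptions, since the text does not state the tool or parameters used to produce Figure 2(b).

```python
def pagerank(graph, damping=0.85, tol=1e-12, max_iter=200):
    """Power-iteration PageRank on a directed graph given as {node: [successors]}.

    Every successor must also be a key of `graph`. Nodes without out-links
    (dangling nodes) spread their score uniformly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        # Teleportation term plus redistributed dangling mass.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new[w] += damping * rank[v] / len(graph[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical hashtag graph: an entry u: [v, ...] means u points to each v.
graph = {"seo": ["startups"], "startups": ["seo", "business"], "business": ["seo"]}
ranks = pagerank(graph)
```

Here "seo" receives links from both other nodes and ends up with the highest score, while the scores still sum to 1, consistent with their interpretation as visit probabilities.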
The HITS metric determines two values for a node: (1) its authority, which estimates the value of the
content of the node, and (2) its hub value, which estimates the value of its links to other nodes. HITS
updates the authority value of each node to be the sum of the hub values of every node that links to
it, and the hub value of each node to be the sum of the authority values of every node it links to
(Kleinberg, 1999).
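In Kleinberg's formulation, a node's authority is the sum of the hub scores of the nodes that link to it, and its hub score is the sum of the authority scores of the nodes it links to, with both vectors normalized after each round. The sketch below implements this mutual update; the fixed iteration count, L2 normalization, and the star-shaped toy graph are assumptions for illustration.

```python
import math


def hits(graph, iterations=100):
    """Kleinberg's HITS on a directed graph given as {node: [successors]}.

    authority(v) = sum of hub scores of nodes linking to v
    hub(v)       = sum of authority scores of nodes v links to
    Every successor must also be a key of `graph`.
    """
    nodes = list(graph)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(iterations):
        auth = {v: 0.0 for v in nodes}
        for v in nodes:
            for w in graph[v]:
                auth[w] += hub[v]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        hub = {v: sum(auth[w] for w in graph[v]) for v in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return hub, auth


# Hypothetical graph: one "guide" node pointing to three SEO indicator nodes.
graph = {"guide": ["robots.txt", "sitemap", "title"], "robots.txt": [], "sitemap": [], "title": []}
hub, auth = hits(graph)
```

The guide node becomes the only hub, while the three indicators it points to share the authority equally, illustrating why hubs and authorities are reported as separate distributions.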
Figure 2. (a) Modularity distribution of communities; (b) PR and relevant nodes analysis; (c) Hubs
distribution; (d) HITS authority analysis (Source: the authors)
In our data, a total of 2,145 user communities interacting on topics related to SEO in startups and
digital marketing strategies were identified. In Figure 2a, the size (number of nodes) is shown on the
Y axis, while the modularity class is shown on the X axis. The detected communities included
women in technology, volunteers and freelancers developing SEO for startups, as well as SEM and
SQL experts debating strategies and best advice for this industry. Of all these communities, those
with the greatest weight were as follows: SEO (PR of 0.0326); business (0.0093); marketing (0.0086);
startups (0.0070); digital marketing (0.0062); entrepreneurship (0.0039); innovation (0.0039);
artificial intelligence (AI) (0.0037); social media marketing (SMM) (0.0036); and Fintech startups
(0.0035).
Figure 2(c) shows the hubs distribution, where the nodes that stand out correspond to Google (0.0594);
ecommerce (0.0686); small business (0.0630); IoT (0.0606); SEM (0.043); B2B (0.0536); Blog
(0.0575); PPC (0.0437); growth hacking (0.0607); analytics (0.0584); domains (0.0421); directory
(0.0202); link building (0.0285); venture capital (0.0347); data (0.0487); founders (0.0305); and
leadership (0.0487). These results demonstrate the interconnection between SEO tools and the digital
business models that startups develop.
Figure 2(d) shows the HITS authority results. According to this algorithm, the 10
nodes with the most authority within the sample are startups (0.5898); SEO (0.5739); business
(0.2037); marketing (0.1917); entrepreneurship (0.0877); social media (0.0850); technology (0.0973);
innovation (0.0834); design and web design (0.0605); and content marketing (0.0522). Two new
communities also appear, focused on web design and innovation and on content marketing strategy,
respectively. This adds value to the results, suggesting that SEO can be applied to these digital areas.
To better visualize the results of the processes depicted in Figure 2(a)-(d), Figure 3 shows the
UGC communities in terms of the weight of the corresponding nodes. As can be seen in Figure 3, of
the 2,145 identified communities, 18 had a greater weight.
Figure 3. Groups of communities associated by similar content
By order of weight, our results showed that the node/community corresponding to SEO, Business, and
Marketing had a weight of 0.05062, and that Startups, Digital Marketing, and Entrepreneurship had a
weight of 0.0454. These communities were followed, in descending order of weight, by
Innovation and AI (0.0077); Fintech startups and Blockchain startups (0.0059); and SMM and Content
Marketing (0.0058). Communities C1 to C14 contained SEO optimization indicators, tools,
and tips; these were small user communities focused on discussing best practices and offering
SEO advice for startups.
Following Saura and Bennett (2019), we created an LDA model, a state-of-the-art probabilistic model
that can divide a sample into topics. The LDA model unfolded in the following two steps. In the first
step, words and their connectors were identified and classified in different documents. In the second
step, the distribution of words and phrases across different topics was computed. Subsequently, the
main topics discovered in the dataset about SEO and startups were classified using Eq. (1).
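$$ p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D}) = \prod_{i=1}^{K} p(\beta_i) \prod_{d=1}^{D} p(\theta_d) \left( \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right) \tag{1} $$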
where $\beta_i$ is the distribution of words in topic $i$, with $K$ topics in total; $\theta_d$ is the proportion
of topics in document $d$, with $D$ documents in total; $z_d$ is the topic assignment in document $d$;
$z_{d,n}$ is the topic assignment for the $n$th word in document $d$, with $N$ words in total; $w_d$ is the
observed words for document $d$; and $w_{d,n}$ is the $n$th word for document $d$. The topics in our
dataset were identified using Eq. (2) for Gibbs sampling [6]. The calculation was performed using
Python LDA 1.0.5.
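For readers who want to see how Gibbs sampling fits the model in Eq. (1), the sketch below is a minimal, dependency-free collapsed Gibbs sampler. It is not the `lda` package used in the study. It repeatedly resamples each word's topic from the standard collapsed conditional p(z_{d,n} = i | all other assignments) ∝ (n_{d,i} + α)(n_{i,w} + η) / (n_i + Vη), where n_{d,i} counts words of document d assigned to topic i, n_{i,w} counts assignments of word w to topic i, n_i is the total count of topic i, and V is the vocabulary size; whether this matches the exact form of the authors' Eq. (2) cannot be confirmed from the text. The hyperparameters α and η, the iteration count, and the toy corpus are assumptions.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, eta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words).

    Returns (phi, theta): phi[i][word] estimates beta_i, the word distribution
    of topic i, and theta[d][i] estimates theta_d, the topic proportions of document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: j for j, w in enumerate(vocab)}
    v = len(vocab)
    n_dk = [[0] * k for _ in docs]  # topic counts per document
    n_kw = [[0] * v for _ in range(k)]  # word counts per topic
    n_k = [0] * k  # total words assigned to each topic
    z = []  # topic assignment z_{d,n} of every token
    for d, doc in enumerate(docs):
        z_d = []
        for word in doc:
            t = rng.randrange(k)
            z_d.append(t)
            n_dk[d][t] += 1
            n_kw[t][index[word]] += 1
            n_k[t] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, word in enumerate(doc):
                w, t = index[word], z[d][n]
                # Remove the current assignment from the counts.
                n_dk[d][t] -= 1
                n_kw[t][w] -= 1
                n_k[t] -= 1
                # Unnormalized full conditional for each topic.
                weights = [
                    (n_dk[d][i] + alpha) * (n_kw[i][w] + eta) / (n_k[i] + v * eta)
                    for i in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][n] = t
                n_dk[d][t] += 1
                n_kw[t][w] += 1
                n_k[t] += 1
    phi = [
        {word: (n_kw[i][index[word]] + eta) / (n_k[i] + v * eta) for word in vocab}
        for i in range(k)
    ]
    theta = [
        [(n_dk[d][i] + alpha) / (len(doc) + k * alpha) for i in range(k)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta


docs = [
    ["robots", "sitemap", "title"],
    ["sitemap", "title", "url"],
    ["funding", "investor", "accelerator"],
    ["investor", "accelerator", "funding"],
]
phi, theta = lda_gibbs(docs, k=2)
```

Collapsing θ and β out of the sampler means only integer counts are tracked; the smoothed estimates of both are recovered from those counts at the end.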
The main aim of the process of identifying topics with LDA was to find SEO optimization indicators
for startups. The identified topics are listed in Table 1.
Table 1. LDA topics identification of SEO indicators for early-stage startups
Topic | Brief description | Count
Robots.txt and tag | The robots.txt file should be implemented on the website and indicate the URL of the sitemap.xml; the robots meta tag should read "index, follow". | 161
URL | URLs should contain the keyword that positions the website; no noise or special characters are allowed. | 157
Title tag | The title tag should not exceed 69 characters and must contain the keyword. | 139
Social tags | Social tags should appear independently on every page of the website. | 123
Open Graph Data | Open Graph Data are used to mark up Facebook data for social tags. It is important to optimize "og:image". | 110
JavaScript (JS) | There are problems in implementing JS for SEO. | 104
Backlinks | Backlinks are necessary to increase the PA and DA of the website, but buying them is penalized. | 104
Twitter Card Data | Twitter Card Data serve to optimize social SEO for Twitter, which increases the impact of content in this social network. | 98
Sitemap.xml | Sitemap.xml structures the content of the website and offers search engines the information architecture of the site. | 92
Meta description | The meta description should contain the keyword to be positioned and should not exceed 156 characters for desktop SERPs and 333 for mobile. | 83
Traffic | The traffic of a website should be high for SEO. | 79
Accelerated Mobile Pages (AMP) | AMP is a technology that improves the loading speed of a website optimized for SEO. | 79
Long-tail keywords | Long-tail keywords are the most successful keywords in SEO. | 71
SEM | SEM is a digital marketing strategy that should accompany SEO tactics. | 67
Directory | The website structure should be divided into directories by subject, not into separate URLs or sub-domains. | 63
Link building | Link building is a strategy that can lead to SEO penalties. | 57
CTR | CTR is the indicator that measures the effectiveness of clicks on SERPs. | 35
Headers (H1-H6) | Headers show search engines the structure of the content. | 15
HTML | Meta tags in HTML and social tags are important for optimization. | 14
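Several rules in Table 1 are mechanical enough to check automatically. The sketch below applies a subset of them (title and description length, keyword presence, a clean keyword-bearing URL path, and a robots.txt that lists a sitemap and allows the page) using Python's standard `urllib.robotparser` (`RobotFileParser.site_maps()` requires Python 3.8 or later). The function name `check_table1`, the reading of "no noise or special characters" as lowercase letters, digits, hyphens, and slashes, the hyphenated keyword slug, and the sample page are assumptions; only the 69- and 156-character limits come from the table.

```python
import re
from urllib.robotparser import RobotFileParser

# Thresholds taken from Table 1 (desktop SERPs).
MAX_TITLE_CHARS = 69
MAX_DESCRIPTION_CHARS = 156
# Assumed reading of "no noise or special characters": lowercase letters, digits, hyphens, slashes.
CLEAN_PATH_RE = re.compile(r"^[a-z0-9/-]*$")


def check_table1(title, description, path, keyword, robots_txt):
    """Applies a subset of the Table 1 rules to one page and returns {rule: passed}."""
    kw = keyword.lower()
    slug = kw.replace(" ", "-")  # hypothetical slug convention for keywords in URLs
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return {
        "title_length": len(title) <= MAX_TITLE_CHARS,
        "title_keyword": kw in title.lower(),
        "description_length": len(description) <= MAX_DESCRIPTION_CHARS,
        "description_keyword": kw in description.lower(),
        "url_keyword": slug in path,
        "url_clean": bool(CLEAN_PATH_RE.match(path)),
        "robots_lists_sitemap": bool(robots.site_maps()),
        "path_crawlable": robots.can_fetch("*", path),
    }


robots_txt = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml"""

good = check_table1(
    title="SEO tips for early-stage startups",
    description="Practical SEO tips for early-stage startups: titles, URLs and sitemaps.",
    path="/seo-tips-for-startups",
    keyword="SEO tips",
    robots_txt=robots_txt,
)
bad = check_table1("x" * 70, "", "/Seo_Tips?", "seo tips", "")
```

Indicators such as traffic, backlink quality, or AMP performance depend on external data and are deliberately left out of this sketch.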
After identifying the topics, we submitted the data to sentiment analysis to identify the predominant
feelings expressed in the tweets (positive, negative, and neutral). To this end, we first trained a
machine learning algorithm developed in Python on three sentiment classes (positive, negative, and
neutral). The algorithm was trained on a total of 349 samples, and the average KAV values achieved
for positive, negative, and neutral sentiments were 0.721, 0.775, and 0.801, respectively. Considering
the conventional thresholds for KAV values (α ≥ 0.800: high reliability; α ≥ 0.667: tentative
conclusions; α < 0.667: low reliability; see Krippendorff, 2004), the conclusions for neutral sentiment
can be considered highly reliable, whereas those for positive and negative sentiment are tentative.
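Krippendorff's alpha compares observed disagreement between labelings with the disagreement expected by chance. For nominal labels it can be computed from a coincidence matrix as α = 1 − (n − 1) Σ_{c≠k} o_ck / Σ_{c≠k} n_c n_k. The sketch below implements this nominal version together with the thresholds quoted above. How the authors paired the classifier's output with reference labels is not stated, so the input format (one list of labels per tweet, `None` for a missing label) and the toy data are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels, computed from the coincidence matrix.

    `units` holds one list per coded item (e.g. one tweet) with the label each
    coder assigned; None marks a missing label.
    """
    coincidences = Counter()
    for labels in units:
        values = [v for v in labels if v is not None]
        m = len(values)
        if m < 2:
            continue  # an item coded only once cannot be paired
        for a, b in permutations(values, 2):  # ordered label pairs within the item
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _), count in coincidences.items():
        totals[a] += count
    n = sum(totals.values())
    disagreement = sum(c for (a, b), c in coincidences.items() if a != b)
    chance = n * n - sum(t * t for t in totals.values())
    if chance == 0:
        return float("nan")  # alpha is undefined when every label is identical
    return 1.0 - (n - 1) * disagreement / chance


def reliability(alpha):
    """Conventional reading of alpha quoted in the text (Krippendorff, 2004)."""
    if alpha >= 0.800:
        return "high"
    if alpha >= 0.667:
        return "tentative"
    return "low"


# Two coders labeling four tweets; they disagree only on the third tweet.
labels = [["pos", "pos"], ["neg", "neg"], ["pos", "neg"], ["neu", "neu"]]
alpha = krippendorff_alpha_nominal(labels)  # 1 - 7 * 2 / 42 = 2/3
```

Applied to the values reported above, `reliability` returns "tentative" for 0.721 and 0.775 and "high" for 0.801.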
To better visualize the findings, Figure 4(a)-(b) shows two word clouds with the main words arranged
according to their weight and sentiment (positive and negative; neutral was not relevant in this case).
Figure 4. (a) Negative SEO topics for early-stage startups; (b) Positive SEO topics for early-stage
startups
As can be observed in Figure 4(b), the positive issues that early-stage startups should optimize are
title, description, URL, AMP, sitemap.xml, long-tail keywords, traffic, and social tags. Furthermore,
as shown in Figure 4(a), the negative issues are robots.txt and tag, JavaScript, backlinks, and link
building. In addition, influential figures in the SEO sector, such as Rand Fishkin, Avinash Kaushik,
Matt Cutts, and Eduardo Garolera, were positively evaluated in our sample.
5. Case study
For the case study, we selected Applied, a startup from the Google for Startups program. In 2017,
resident startups within this program received $23,351,077 in funding and created 406 jobs worldwide.
Applied is an early-stage startup recruitment platform that hires “the best” regardless of their
background in terms of age, education, wealth, circumstances, gender, or ethnicity. Part of the success
of the Applied platform depended on publicizing its products and services in search engines;
accordingly, substantial traffic was attracted to its website so that users could use the Applied
platform. Therefore, one of the main objectives of the Applied platform was to increase its SEO impact.
In the present study, we compared the optimization of this early-stage startup with the insights derived
from the analysis of communities on Twitter and the identified topics and feelings related to SEO
optimization of startups.
We found that the Applied platform correctly optimized positive indicators such as title, description,
URL, the sitemap.xml file, long-tail keywords, and social tags. However, AMP and the traffic
indicator, which is directly related to the website's loading time, PA, and DA (Fishkin, 2005), were
tested with the PageSpeed Insights tool and yielded a negative result of 42% for loading time. These
results suggest that optimizing the indicators with positive sentiment identified in the present study is
not the most important task for this type of startup in terms of priority, implementation, and
development.
Regarding the negative indicators, robots.txt and tag were correctly implemented, the JS code on the
website worked correctly with SEO, and the number of backlinks was high, allowing the platform to
obtain a PA value of 39/100 and a DA value of 38/100. The link building strategy was also working
effectively. These results demonstrate that, at an early stage, startups cover negative indicators more
effectively than positive indicators. This pattern could be attributed to the fact that, since early-stage
startups aim to grow their impact on the SERPs rapidly, they are more willing to use black hat SEO
tactics that increase their visibility in the SERPs within a shorter period of time, even if this strategy
increases the risk of a penalty (Aswani et al. 2018).
With regard to the communities and topics, our analysis of the Applied platform demonstrated that
the important topics were content marketing, social media, and SMM strategies. We also found that
Applied shared content about innovation and new technologies in the startup sector, such as Fintech,
Bitcoin, Blockchain, or Artificial Intelligence. Other salient topics included digital marketing,
entrepreneurship, business, and SEO, in which users debated the main skills they should have to be
hired in this field.
6. Discussion
Previous studies have convincingly demonstrated that SEO is beneficial for startups that do not use
SEM in their strategies to increase their visibility at the top of the SERPs. Previous research has also
identified the key factors that affect the positioning of startups in SERPs and central concepts related
to user behavior or psychology. In addition, SEO has been demonstrated to be an effective strategy
to improve traffic to websites, which also brings profit. Several studies have also investigated the
impact of SEO strategies on global digital marketing campaigns (Xing and Lin, 2006) and pointed
out that search marketing strategies are not as profitable as other strategies carried out by advertisers
in the digital ecosystem (Berman and Katona, 2013).
However, no previous study has focused on the importance that early-stage startups should
assign to SEO positioning, particularly in relation to the optimization of their social media marketing,
SEO, and content marketing strategies. To fill this gap in the literature, in the present study, we aimed
to explore the early-stage startup industry through a review of Twitter discussions and through a case
study of the Applied platform.
Our results demonstrate that many small startup communities have conversations about digital
marketing and the main strategies that should be implemented. In addition, our findings also clearly
indicate that early-stage startups are closely interconnected through the most important user and
thematic communities where they find useful information about SEO optimization.
Therefore, based on our results on network dynamics, it can be concluded that industry
concentration appears to be high, although the industry is highly fragmented (see Aswani et al.,
2018, for a similar conclusion). One of the reasons behind this trend may be the close contact among
early-stage startups, as they are in the same phase of development and have the same needs.
However, for the SEO strategy of early-stage startups to succeed, the identified indicators should
be optimized. The presence of negative SEO indicators suggests that other startups and practitioners
in the industry have had negative experiences and bad results. Several previous studies have
analyzed negative engagement in digital marketing based on the most commonly practiced types of
black hat SEO (see Aswani et al., 2018).
In addition, our results show that SEO is not always the strategy that should be pursued to better
position startups in search engines (see also Malaga, 2008; Aswani et al., 2018, for similar
conclusions). Instead, alternative strategies, such as PPC, SEM, content marketing, SMM, or
influencer marketing, can be meaningfully used to develop positioning for early-stage startups. Our
results also demonstrate that early-stage startups frequently use black hat SEO techniques, which have
become very prominent in recent times, particularly in small-scale businesses. However, despite the
short-term gains in attracting traffic, in the long run the links can become toxic and the purpose
behind the whole exercise can be defeated, resulting in no lasting gain in online visibility. These
issues should become top concerns for the early-stage startup industry.
To validate the insights collected from the Twitter-based analysis, we performed a case study of the
Applied platform. The results of this analysis confirmed our previous observations. Specifically, we
found that startups base their strategies on techniques associated with negative sentiment, which can
lead to penalties from search engines and a loss of long-term visibility. We also observed that the
most basic SEO optimizations were not fully implemented, which means that the early-stage startups
did not focus exclusively on developing their SEO strategies, but also implemented other strategies
such as PPC, SEM, or SMM. These findings contradict Moreno and Martinez (2013), who argued that
only white hat SEO strategies should be performed.
7. Conclusions
In the present study, we investigated the SEO strategies that early-stage startups use as part of their
digital marketing. The analysis was two-fold: on the one hand, we investigated the issue using
Twitter-based UGC; on the other hand, we performed a case study of the Applied platform. Our results
on the discussions surrounding SEO on Twitter showed that approximately 27% of all discussions had
negative polarity, suggesting that SEO is neither a flawless nor a reliably profitable strategy for
early-stage startups. This finding also points to a high share of dissatisfied SEO experts and negative
engagement. The analysis indicated that most users are not satisfied with the performance of their
SEO strategies in early-stage startups.
This pattern was further validated by the case study of the Applied platform. A detailed analysis
revealed that the major reason behind the dissatisfaction was the outsourcing of negative SEO
indicators and topics, such as robots.txt and tag, JavaScript, backlinks, and link building. In the long
run, such techniques or incorrect optimizations, once detected by a search engine, lead to penalties.
In addition to contributing to the previous literature (Slegg, 2016; Evans, 2007; Saura et al., 2018a),
our results offer practical implications for startups, organizations, and individuals seeking quick and
efficient ways to enhance their web visibility.
Extant literature on SEO and startups focuses on the importance of visibility in search engines.
However, studies using UGC to identify insights that help early-stage startups develop their digital
marketing strategies remain scarce. The present study fills this gap in the literature.
7.1. Implications for practitioners
The results of the present study offer several insights for practitioners in the startup industry. First,
early-stage startups can use our results to improve the SEO topics and indicators identified in the
Twitter-based UGC. In addition, they can obtain information about the communities that surround the
digital marketing environment on Twitter.
Startups can also use our findings to measure their impact in terms of performance and to identify
influencers who can help them increase that impact through engagement on social networks.
Furthermore, our study provides industry practitioners with a global and dynamic view of the startup
ecosystem.
Finally, our findings suggest that digital agencies and freelancers can improve startup performance by
being active in the identified communities and trying to increase their impact through white hat SEO
strategies. In future research, the methodology used in the present study can be extended to other
communities in the early-stage startup sector.
7.2. Limitations and future research directions
This study relied on Twitter data to measure the importance of SEO strategies in early-stage startups.
To complement our findings, future studies should consider including other social platforms in their
analyses. Future research could also broaden the time horizon of the analysis. The methodological
approach used in the present study, based on content analysis, algorithmic analysis, social analytics,
topic modeling, and sentiment analysis, can also be extended to obtain additional findings.
References
1. Arias, M., Arratia, A., and Xuriguera, R. (2013). Forecasting with Twitter data. ACM Transactions
on Intelligent Systems and Technology (TIST), 5(1), 8.
2. Aswani, R., Chandra, S., Ghrera, S. P., and Kar, A. K. (2017a, December). Identifying Popular
Online News: An Approach Using Chaotic Cuckoo Search Algorithm. In 2017 2nd International
Conference on Computational Systems and Information Technology for Sustainable Solution
(CSITSS) (pp. 1-6). IEEE.
3. Aswani, R., Kar, A. K., and Ilavarasan, P. V. (2018). Detection of spammers in Twitter marketing:
a hybrid approach using social media analytics and bio inspired computing. Information Systems
Frontiers, 1-16.
4. Aswani, R., Kar, A. K., Aggarwal, S., and Ilavarasan, P. V. (2017, November). Exploring content
virality in Facebook: A semantic based approach. In Conference on e-Business, e-Services and
e-Society (pp. 209-220). Springer, Cham.
5. Aswani, R., Kar, A. K., Ilavarasan, P. V., and Dwivedi, Y. K. (2018). Search engine marketing
is not all gold: Insights from Twitter and SEOClerks. International Journal of Information
Management, 38(1), 107-116.
6. Baye, M. R., De los Santos, B., and Wildenbeest, M. R. (2016). Search engine optimization:
what drives organic traffic to retail sites? Journal of Economics and Management
Strategy, 25(1), 6-31.
7. Cahill, K., and Chalut, R. (2009). Optimal results: what libraries need to know about Google and
search engine optimization. The Reference Librarian, 50(3), 234-247.
8. Chae, B. K. (2015). Insights from hashtag #supplychain and Twitter analytics: Considering
Twitter and Twitter data for supply chain practice and research. International Journal of
Production Economics, 165, 247-259.
9. Davila, A., and Foster, G. (2005). Management accounting systems adoption decisions: evidence
and performance implications from early-stage/startup companies. The Accounting
Review, 80(4), 1039-1068.
10. Davila, A., and Foster, G. (2007). Management control systems in early-stage startup companies.
The Accounting Review, 82(4), 907-937.
11. Davila, A., Foster, G., and Li, M. (2009). Reasons for management control systems adoption:
Insights from product development systems choice by early-stage entrepreneurial
companies. Accounting, Organizations and Society, 34(3-4), 322-347.
12. Dellermann, D., Lipusch, N., Ebel, P., Popp, K. M., and Leimeister, J. M. (2017). Finding the
unicorn: Predicting early stage startup success through a hybrid intelligence method.
13. Diaz, A. (2008). Through the Google goggles: Sociopolitical bias in search engine design. In
Web search (pp. 11-34). Springer, Berlin, Heidelberg.
14. Dwivedi, Y. K., Wade, M. R., and Schneberger, S. L. (Eds.). (2011). Information systems theory:
Explaining and predicting our digital society (Vol. 1). Springer Science and Business Media.
15. Evans, M. P. (2007). Analysing Google rankings through search engine optimization
data. Internet Research, 17(1), 21-37.
16. Farooqi, S., Jourjon, G., Ikram, M., Kaafar, M. A., De Cristofaro, E., Shafiq, Z., ... and Zaffar,
F. (2017, April). Characterizing key stakeholders in an online black-hat marketplace. In 2017
APWG Symposium on Electronic Crime Research (eCrime) (pp. 17-27). IEEE.
17. Fishkin, R. (2015). SEO: The beginner's guide to search engine optimization from Moz. Moz.
Retrieved from https://bit.ly/2UKj7U5. Accessed 10 February 2019.
18. Garcia-Perez, A., Romero-Troncoso, R. J., Cabal-Yepez, E., Osornio-Rios, R. A., de Jesus
Rangel-Magdaleno, J., and Miranda, H. (2011, September). Startup current analysis of incipient
broken rotor bar in induction motors using high-resolution spectral analysis. In 8th IEEE
Symposium on Diagnostics for Electrical Machines, Power Electronics and Drives (pp. 657-
663). IEEE.
19. Giardino, C., Bajwa, S. S., Wang, X., and Abrahamsson, P. (2015, May). Key challenges in
early-stage software startups. In International Conference on Agile Software Development (pp.
52-63). Springer, Cham.
20. Giardino, C., Paternoster, N., Unterkalmsteiner, M., Gorschek, T., and Abrahamsson, P. (2016).
Software development in startup companies: the greenfield startup model. IEEE Transactions
on Software Engineering, 42(6), 585-604.
21. Giardino, C., Wang, X., and Abrahamsson, P. (2014, June). Why early-stage software startups
fail: a behavioral framework. In International Conference of Software Business (pp. 27-41).
Springer, Cham.
22. Giomelakis, D., and Veglis, A. (2016). Investigating search engine optimization factors in media
websites: the case of Greece. Digital Journalism, 4(3), 379-400.
23. Hughes, A. L., and Palen, L. (2009). Twitter adoption and use in mass convergence and emergency
events. International Journal of Emergency Management, 6(3-4), 248-260.
24. Inauen, S., and Schoeneborn, D. (2014). Twitter and its usage for dialogic stakeholder
communication by MNCs and NGOs. In Communicating Corporate Social Responsibility:
Perspectives and Practice (pp. 283-310). Emerald Group Publishing Limited.
25. Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM,
46(5), 604-632.
26. Joseph, N., Kar, A. K., Ilavarasan, P. V., and Ganesh, S. (2017). Review of discussions on Internet
of Things (IoT): Insights from Twitter analytics. Journal of Global Information Management
(JGIM), 25(2), 38-51.
27. Kataria, S., and Sapra, P. (2016, March). A novel approach for rank optimization using search
engine transaction logs. In 2016 3rd International Conference on Computing for Sustainable
Global Development (INDIACom) (pp. 3387-3393). IEEE.
29. Kim, T. (2014). Observation on copying and pasting behavior during the Tohoku earthquake:
Retweet pattern changes. International Journal of Information Management, 34(4), 546-555.
30. Kopera, S., Wszendybył-Skulska, E., Cebulak, J., and Grabowski, S. (2018). Interdisciplinarity
in Tech Startups Development–Case Study of ‘Unistartapp’ Project. Foundations of
Management, 10(1), 23-32.
31. Kropp, F., Lindsay, N. J., and Shoham, A. (2008). Entrepreneurial orientation and international
entrepreneurial business venture startup. International Journal of Entrepreneurial Behavior and
Research, 14(2), 102-117.
32. Krrabaj, S., Baxhaku, F., and Sadrijaj, D. (2017, June). Investigating search engine optimization
techniques for effective ranking: A case study of an educational site. In 2017 6th Mediterranean
Conference on Embedded Computing (MECO) (pp. 1-4). IEEE.
33. Lagerstedt, M., and Mademlis, A. (2017). Branding for startup companies in Sweden: A study
on startups brand building
34. LeBrasseur, R., Zanibbi, L., and Zinger, T. J. (2003). Growth momentum in the early stages of
small business start-ups. International Small Business Journal, 21(3), 315-330.
35. Lui, R. W., and Au, C. H. (2018). Establishing an Educational Game Development Model: From
the Experience of Teaching Search Engine Optimization. International Journal of Game-Based
Learning (IJGBL), 8(1), 52-73.
36. Lui, R., and Au, C. H. (2018). IS educational game: Adoption in teaching search engine
optimization (SEO). Journal of Computer Information Systems, 1-11.
37. Malaga, R. A. (2010). Search engine optimization—black and white hat approaches.
In Advances in Computers (Vol. 78, pp. 1-39). Elsevier.
38. Nanda, R., and Rhodes-Kropf, M. (2013). Investment cycles and startup innovation. Journal of
Financial Economics, 110(2), 403-418.
39. Neill, S., Metcalf, L., and York, J. L. (2015). Seeing what others miss: A study of women
entrepreneurs in high-growth startups. Entrepreneurship Research Journal, 5(4), 293-322.
40. Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank citation ranking:
Bringing order to the Web.
41. Paik, Y., and Woo, H. (2014). Economic downturn and financing innovative startup
companies. Managerial and Decision Economics, 35(2), 114-128.
42. Perez, L., Whitelock, J., and Florin, J. (2013). Learning about customers: Managing B2B
alliances between small technology startups and industry leaders. European Journal of
Marketing, 47(3/4), 431-462.
43. Picken, J. C. (2017). From startup to scalable enterprise: Laying the foundation. Business
Horizons, 60(5), 587-595.
44. Pidpruzhnikov, V., and Ilchenko, M. (2017). Search optimization and localization of the website
of Department of Applied Linguistics. In Computational Linguistics and Intelligent Systems
(COLINS 2017). National Technical University «KhPI».
45. Lambiotte, R., Delvenne, J.-C., and Barahona, M. (2009). Laplacian dynamics and multiscale
modular structure in networks.
46. Rathore, A. K., Ilavarasan, P. V., and Dwivedi, Y. K. (2016). Social media content and product
co-creation: an emerging paradigm. Journal of Enterprise Information Management, 29(1), 7-18.
47. Regmi, K., Ahmed, S. A., and Quinn, M. (2015). Data driven analysis of startup
accelerators. Universal Journal of Industrial and Business Management, 3(2), 54-57.
48. Saura, J. R., Palos-Sanchez, P. R., and Correia, M. B. (2019a). Digital Marketing Strategies
Based on the E-Business Model: Literature Review and Future Directions. In Organizational
Transformation and Managing Innovation in the Fourth Industrial Revolution (pp. 86-103). IGI
Global.
49. Saura, J.R. and Bennet, D. (2019). A Three-Stage Methodological Process of Data Text Mining:
A UGC Business Intelligence Analysis. Symmetry-Basel. doi: 10.13140/RG.2.2.11093.06880
50. Saura, J.R., Palos-Sanchez, P.R. and Grilo, A. (2019a). Detecting Indicators for Startup Business
Success: Sentiment Analysis using Text Data Mining. Sustainability. 15(3), 553;
doi:10.3390/ijerph15030553
51. Saura, J.R., Palos-Sanchez, P.R. and Rios Martin, M.A. (2018a). Attitudes to environmental
factors in the tourism sector expressed in online comments: An exploratory study. International
Journal of Environmental Research and Public Health, 15(3), 553. doi:10.3390/ijerph15030553
52. Saura, J.R., Reyes-Menendez, A. and Alvarez-Alonso, C. (2018b). Do online comments affect
environmental management? Identifying factors related to environmental management and
sustainability of hotels. Sustainability, in Special Issue e-Business, 10(9), 3016. doi:
10.3390/su10093016
53. Saura, J. R., Rodriguez Herráez, B., and Reyes-Menendez, A. (2019b). Comparing a traditional
approach for financial Brand Communication Analysis with a Big Data Analytics technique,
IEEE Access, 7(1). doi: 10.1109/ACCESS.2019.2905301
54. Shah, D. (2006). On Startups: patterns and practices of contemporary software entrepreneurs
(Doctoral dissertation, Massachusetts Institute of Technology).
55. Thiel, P. A., and Masters, B. (2014). Zero to one: Notes on startups, or how to build the future.
Broadway Business.
56. Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of
communities in large networks. Journal of Statistical Mechanics: Theory and Experiment,
2008(10), P10008.
57. Ward, A. A. (2017). The SEO Battlefield: Winning Strategies for Search Marketing Programs.
“O’Reilly Media, Inc.".
58. Williams, S. A., Terras, M. M., and Warwick, C. (2013). What do people study when they study
Twitter? Classifying Twitter related academic papers. Journal of Documentation, 69(3), 384-410.
59. Wu, B., and Shen, H. (2015). Analyzing and predicting news popularity on Twitter. International
Journal of Information Management, 35(6), 702-711.
60. Yoo, C., Yang, D., Kim, H., and Heo, E. (2012). Key value drivers of startup companies in the
new media industry—The case of online games in Korea. Journal of Media Economics, 25(4),
244-260.
61. Zhang, S., and Cabage, N. (2017). Search engine optimization: comparison of link building and
social sharing. Journal of Computer Information Systems, 57(2), 148-159.