
A conceptual architecture for content analysis about abortion using the twitter platform


Agosto 19 August 19
ISSN: 1646-9895
©AISTI 2019 Nº E22
Revista Ibérica de Sistemas e Tecnologias de Informação
Iberian Journal of Information Systems and Technologies
RISTI, N.º E22, 08/2019
Edição / Edition
Nº. 22, 08/2019
Indexação / Indexing
Academic Journals Database, CiteFactor, Dialnet, DOAJ, DOI, EBSCO, GALE, Index-
Copernicus, Index of Information Systems Journals, Latindex, ProQuest, QUALIS,
SCImago, SCOPUS, SIS, Ulrich’s.
Propriedade e Publicação / Ownership and Publication
AISTI – Associação Ibérica de Sistemas e Tecnologias de Informação
Rua Quinta do Roseiral 76, 4435-209 Rio Tinto, Portugal
Recebido/Submission: 11/03/2019
Aceitação/Acceptance: 20/06/2019
A conceptual architecture for content analysis about
abortion using the Twitter platform
Paolo R. Roldán-Robles1, Ana C. Umaquinga-Criollo1, Janneth A. García-Santillán2,
Israel D. Herrera-Granda1, Iván D. García-Santillán1
1 Faculty of Engineering in Applied Sciences, Universidad Técnica del Norte, 100105. Ibarra-Ecuador
2 Unidad Educativa Juan Pablo II. Ibarra-Ecuador.
Pages: 363–374
Abstract: This paper presents a conceptual architecture for content analysis of
the opinions expressed on Twitter about abortion. The architecture consists of
five stages: authentication, data collection, data cleaning & processing, modeling
& analysis, and presentation of results. For data collection, a sample of
tweets sent from Ecuador in 2018 was taken. All tweets unrelated to
the topic were eliminated. In the modeling stage, tweets were separated into two
categories, for and against abortion, using the Naive Bayes and decision tree
classifiers. Finally, the results were presented as statistical graphs, word
clouds and heat maps. During development, the Google Maps platform was also
used, and the scripts were written in Python using the Spyder Integrated Development
Environment (IDE) with Python 3.6, which is part of the Anaconda platform.
The results showed, on average, a majority position against abortion in
Ecuador.
Keywords: Data mining; content analysis; abortion; social networks; Twitter
1. Introduction
The advancement of technology and the exponential growth in the volume of structured,
unstructured, and semi-structured data are increasingly evident (Umaquinga C., Peluffo O.,
Alvarado P., & Cabrera V., 2016). This has led to far-reaching changes not only in
technology, but also in the way all of humanity communicates (González-Lizárraga,
Becerra-Traver, & Yanez-Díaz, 2016), (Baviera, 2016). Cyber communication (Arab
& Díaz, 2015), that is, the publication of information on social networks such as Twitter,
has become an input for study and analysis in various areas of science, such as
text mining, natural language processing, machine learning, polarity dictionaries based
on the semantic field, behavioral patterns and inflection points in opinion currents,
among others (Baviera, 2016). This has allowed the scientific, business, academic and
political communities to evaluate a current of opinion on a specific topic (Baviera, 2016)
(González-Lizárraga et al., 2016).
Data provided by social networks have been used to analyze electoral processes and
predict their results (Roldán-Robles, 2017), and to study reactions in the political sphere in
Venezuela (Niklander, 2017). Knowledge extraction from social networks
is also used in other areas, for example: the analysis of images associated with a tweet
(Baecchi, Claudio; Uricchio, Tiberio; Bertini, Marco; Bimbo, 2015); the analysis
of reactions to social issues, ranging from the positive end (Valentine’s
Day) to the negative end (the war in Syria), presented in emotional graphs
(Perikos, Isidoros; Hatzilygeroudis, 2018); sentiment analysis of people’s opinions on a
specific issue (Inbal Yahav; Shehory, Onn; Schwartz, 2015); the identification of opinion
leaders (Yang, Li; Tian, Yaping; Li, Jin; Ma, Jianfeng; Zhang, 2017); and steps
to classify hoaxes, that is, intentionally disseminated
false information, on medical issues in Indonesia (Purnomo, Mauridhi Hery; Sumpeno,
Surya; Setiawan, Esther Irawati; Diana Purwitasaria, 2017).
One of the issues of global health interest is abortion, or voluntary termination of
pregnancy (VTP). In Spain, the number of voluntary terminations of pregnancy stood
at 108,690 cases, a rate of 11.74 abortions per 1,000 women aged 15 to 44
(Montserrat Femenía, 2016), while in Ecuador a total of 431,614
abortions were reported between 2004 and 2014 (Ortíz, 2017).
This research aims to assess public opinion about abortion in Ecuador, based on the
analysis of the content of tweets sent from Ecuador on the Twitter platform. This
provides a more objective picture of the positions and beliefs of Ecuadorian
citizens and can support decision making on public health policy, particularly considering
that the Ecuadorian National Assembly is currently discussing the decriminalization of
abortion in cases of rape for all women in Ecuador.
The manuscript is organized as follows. Section 2 presents the phases applied in this
study: (i) authentication, (ii) data collection, (iii) data cleaning and processing, (iv)
modeling and analysis, and (v) presentation of results. Section 3 reports the results
obtained, including the frequency of hashtags for and against abortion, as well as the
comparative study between the decision tree and Naive Bayes classifiers. Section 4
discusses the results and compares them with existing works. Finally,
Section 5 presents the main conclusions and future work.
2. Materials and methods
Following the general criteria of the knowledge discovery in databases (KDD) process
(Fayyad, Piatetsky-Shapiro, & Smyth, 1996) (Timarán Pereira, Hernández Arteaga,
Caicedo Zambrano, Hidalgo Troya, & Alvarado Pérez, 2016), a conceptual
architecture with five phases was adapted for the present research, as shown
in Figure 1:
Phase 1 Authentication: A Twitter application with developer permissions
in was created using the Spyder Python 3.6 IDE
of the Anaconda 3- platform and the tweepy library was installed. Using
the OAuth authentication method, communication was made between tweepy
and Twitter, which required passing the four tokens provided by Twitter after
accepting its privacy policies.
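The authentication step can be sketched as follows. This is a minimal illustration, assuming the tweepy 3.x interface (tweepy.OAuthHandler, set_access_token and tweepy.API); the environment variable names and the helper functions load_credentials and build_api are hypothetical and do not come from the original scripts.

```python
import os

# Hypothetical environment variable names for the four tokens issued by Twitter.
TOKEN_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four tokens as a dict, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in TOKEN_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing Twitter tokens: " + ", ".join(missing))
    return {name: env[name] for name in TOKEN_NAMES}


def build_api(creds):
    """Create an authenticated tweepy client (assumes tweepy 3.x is installed)."""
    import tweepy  # imported here so load_credentials works without tweepy

    auth = tweepy.OAuthHandler(
        creds["TWITTER_CONSUMER_KEY"], creds["TWITTER_CONSUMER_SECRET"]
    )
    auth.set_access_token(
        creds["TWITTER_ACCESS_TOKEN"], creds["TWITTER_ACCESS_TOKEN_SECRET"]
    )
    return tweepy.API(auth)
```

Keeping the tokens outside the script, as assumed here, avoids storing them alongside the collected data.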
Phase 2 Data Collection:
Collection dates: August 16 to September 29, 2018.
Criterion: The sample is limited to
tweets sent from Ecuador (which comprises 24 provinces) with specific hashtags and
user accounts specialized in the topic of abortion.
Using Twitter's Streaming API, a massive download of tweets filtered by
keywords or usernames was carried out. To restrict the download to the country, the
location filter of the stream library was used with the bounding box obtained from
(KlokanTech., 2017), as indicated by (Sogo, 2016). A JSON file of 1,721,287 KB was obtained,
containing 344,149 records (tweets). Table 1 presents the algorithm used for
data collection.
Algorithm: Phase 2 Data collection
1. Authenticate the application on the Twitter platform.
2. Enter the access credentials.
3. Make the request to download tweets, including the filtering criteria of the sample.
4. Generate or open the output file.
5. Store the data in the specified file.
Table 1 – Phase 2, algorithm for data collection
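Steps 3 to 5 of Table 1 can be illustrated with a client-side sketch. It assumes each tweet is a dict following Twitter's tweet object, where the coordinates field holds a GeoJSON point in [longitude, latitude] order. The bounding box values are rough and purely illustrative (they are not the ones used in the study), and the function names are hypothetical. In the study itself, the filtering was performed by the Streaming API rather than in local code.

```python
import json

# Rough, illustrative bounding box (west, south, east, north) in degrees.
# Assumption: these values are NOT the ones used in the study.
ECUADOR_BOX = (-81.1, -5.1, -75.1, 1.5)


def in_box(lon, lat, box=ECUADOR_BOX):
    """True if the point lies inside the (west, south, east, north) box."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def keep_tweet(tweet, keywords, box=ECUADOR_BOX):
    """True if the tweet text has a keyword and its point lies in the box."""
    text = tweet.get("text", "").lower()
    if not any(k.lower() in text for k in keywords):
        return False
    point = tweet.get("coordinates")
    if not point:
        return False  # no exact location attached to the tweet
    lon, lat = point["coordinates"]
    return in_box(lon, lat, box)


def store_tweets(tweets, path, keywords):
    """Append matching tweets to the output file, one JSON object per line."""
    kept = 0
    with open(path, "a", encoding="utf-8") as out:
        for tweet in tweets:
            if keep_tweet(tweet, keywords):
                out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
                kept += 1
    return kept
```

Writing one JSON object per line keeps the output file readable line by line, which matches the per-line processing described for the later phases.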
Phase 3 Data Cleaning and Processing: The script
is executed, the operation of which is detailed in the algorithm represented in
Table 2:
Figure 1 – Conceptual architecture for content analysis on Twitter. Adapted from (Roldán-Robles, 2017)
Algorithm: Phase 3, Functioning of the script
1. Import the JSON file
2. For each line of the file do:
   Extract the hashtag element from the entities variable of the tweet object
   If the hashtag element is not in the hashtag dictionary, then:
      Save the element in the hashtag dictionary and initialize its frequency to zero.
   Increase frequency by one.
Table 2 – Script
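The counting procedure of Table 2 can be sketched in a few lines of Python. The sketch assumes one tweet per line in JSON format, with hashtags stored under entities.hashtags[].text as in Twitter's tweet object. The function name and the lowercasing of hashtags are assumptions made for illustration, not details confirmed by the original script.

```python
import json
from collections import Counter


def hashtag_frequencies(lines):
    """Count hashtags across tweets given as an iterable of JSON strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            # Counter starts missing keys at zero, then we add one.
            counts["#" + tag["text"].lower()] += 1
    return counts
```

Passing an open file object as lines and then calling most_common(5) on the result would give a top-five list of the kind used in the results section.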
From the le received, the hashtags that are not related to the topic of abortion such as:
greetings, proper names and mentions to sports clubs or social events are removed. The
resulting information was processed under two categories:
In favor of abortion (Abortion+)
Against abortion (Abortion-)
The processing was done manually, with proper investigation of the origin of each
hashtag and its use. Because of their complexity, since there are no specic rules for the
creation of hashtags, some of them do not only contain correct words within languages,
but also invented words, word mixtures, words united with dierent connectors, words
with numbers such as abbreviations of dates alluding to nearby events or important
reminders from the collectives for and against abortion.
In some phases of the tweet analysis, additional cleaning actions were carried out, as
described in Table 3:
Phase | Aspects to be discarded
Generation of heat maps | Tweets that do not contain location data
Extraction of the most influential users | Users who do not mention other users (a user who does not mention another account is considered not interested in exerting influence on others)
Extraction of hashtags | Tweets that do not contain hashtags
Table 3 – Tweet analysis phase
Phase 4 Modeling and analysis: The model consists of two categories,
opinions for and against abortion, and the following aspects are analyzed:
Hashtag frequency: The top five most used hashtags are obtained from
the file produced by the script and filtered in the
cleaning phase described in Table 2.
User mentions: The wordcloud library is used in a Python script applied
to the collected file to obtain the word cloud of influential users,
based on the screen_name attribute of the user object.
Percentages for and against: The decision tree and Naive Bayes classifiers
are used. The same information was used to train both algorithms:
ten (10) hashtags in favor and ten against, which
represent 30% of the most relevant hashtags according to frequency of appearance.
The training data were then completed by exploring the texts of the sample
file, located in the text attribute of the tweet object. This part of the
training requires the most common texts for and against abortion; in the
end, seven (7) texts for and seven against were used.
Statistical graphs: They are generated in Python using the pyplot module of the
matplotlib library, version 1.4.3.
Sentiment analysis:
Working with the decision tree makes it possible to handle not only the hashtag but
also the content of the tweet, and to combine them. Numerical values are assigned
to the hashtags as well as to the phrases for and against, establishing the
conditions that produce the results in the output matrix shown in Table 4:
Hashtag | Phrase | Trend mark
Against | Against | Against
Against | In favor | In favor
Against | Neutral | Against
Against | Without hashtag or neutral | Against
In favor | In favor | In favor
In favor | Against | Against
In favor | Neutral | In favor
In favor | Without hashtag or neutral | In favor
Neutral | Neutral | Ignored or not taken into account
* Against: against abortion * In favor: in favor of abortion
Table 4 – Hashtag analysis and trend marking
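The rows of Table 4 follow a simple pattern: a polar phrase decides the trend mark, otherwise the hashtag does, and a tweet with neither is ignored. The sketch below encodes that reading as a plain function rather than as a trained decision tree, so it is an illustration of the output matrix and not the model used in the study. The constant and function names are hypothetical. The case of a neutral hashtag combined with a polar phrase does not appear in Table 4; returning the phrase polarity for it is an assumption.

```python
AGAINST, IN_FAVOR, NEUTRAL = "against", "in_favor", "neutral"


def trend_mark(hashtag, phrase):
    """Combine hashtag and phrase polarity following the rows of Table 4.

    Returns None when the tweet is ignored. A missing hashtag or phrase
    is passed as NEUTRAL.
    """
    if phrase in (AGAINST, IN_FAVOR):
        return phrase  # a polar phrase decides the mark
    if hashtag in (AGAINST, IN_FAVOR):
        return hashtag  # otherwise fall back to the hashtag
    return None  # neutral hashtag and neutral phrase: not counted
```

For example, a tweet with an against hashtag whose text argues in favor is marked in favor, as in the second row of the table.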
In the case of Naive Bayes, the script from (García Serrano, 2012)
was taken as a reference. The TextBlob library was installed, and
NaiveBayesClassifier was imported from textblob.classifiers. The training data were not
numerical, so the classifier must be given the learning keys through a matrix that
holds the data. Each entry of the matrix has two parameters: the hashtag or
phrase, and its polarity, where (i) the position in favor is labeled pos
and (ii) the position against is labeled neg.
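The structure of the learning keys can be sketched as a list of (text, label) pairs, which is the training format accepted by TextBlob's NaiveBayesClassifier. The helper names are hypothetical, and the four hashtags are only a small example taken from the accounts and hashtags reported in this study, not the full training set of ten hashtags and seven texts per position.

```python
def build_training_set(in_favor, against):
    """Label hashtags or phrases as 'pos' (in favor) or 'neg' (against)."""
    return [(text, "pos") for text in in_favor] + [
        (text, "neg") for text in against
    ]


def train_naive_bayes(train):
    """Train TextBlob's classifier (assumes textblob and its corpora are installed)."""
    from textblob.classifiers import NaiveBayesClassifier

    return NaiveBayesClassifier(train)


# Example learning keys; the real training set is larger.
train = build_training_set(
    in_favor=["#abortolegalya", "#28s"],
    against=["#salvemoslas2vidas", "#sialavida"],
)
```

Calling classify on the object returned by train_naive_bayes(train) with the text of a tweet would then return either pos or neg.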
Location: Tweets whose user object has the geo_enabled attribute enabled
are taken as active, while tweets for which it is disabled are labeled as missing.
OpenStreetMap’s Nominatim service was used through the geopy library, version
1.11.0, which offers for free the same functionality as the Google Maps APIs. In
the get_user_location class of the sample analysis script, the call to Nominatim
is made to obtain the coordinates of the locations where the
tweets were generated. These locations are converted to
coordinates so they can be plotted on the map within the HTML file.
Phase 5 of Conceptual Architecture: Table 5 describes the algorithm
applied to define the polarity:
Algorithm: Phase 5: Conceptual Architecture
1. Import the JSON file
2. Extract the contents of the file
   For each line of the file do:
      Pass the tweet through the classifier
      Extract the polarity of the tweet
3. Place the tweet in the corresponding group.
4. For each group, calculate the percentage of tweets.
Table 5 – Algorithm to determine the polarity
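Steps 2 to 4 of Table 5 amount to grouping tweets by the label returned by a classifier and converting the group sizes to percentages. A minimal sketch follows; the function name is hypothetical, and classify stands for any callable (a decision tree, Naive Bayes or a rule) that returns a group label, or None for tweets that are ignored.

```python
from collections import Counter


def polarity_percentages(tweets, classify):
    """Classify each tweet and return the percentage of tweets per group.

    `classify` is any callable returning a group label, or None to skip.
    """
    groups = Counter()
    for tweet in tweets:
        label = classify(tweet)
        if label is not None:
            groups[label] += 1
    total = sum(groups.values())
    if not total:
        return {}  # nothing was classified
    return {label: 100.0 * n / total for label, n in groups.items()}
```

Ignored tweets are excluded from the total, so the percentages of the remaining groups add up to 100.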
The results of the research are presented below.
3. Results
Among the main results are the following:
The top ve of the most used hashtags with reference to abortion can be found
in Table 6:
Hashtags Number of mentions For (Abortion+) Against (Abortion-)
1. #salvemoslas2vidas 12480 X
2. #abortolegalya 9467 X
3. #sialavida 5270 X
4. #28s 4102 X
5. #noalaborto 3306 X
Table 6 – Frequency of hashtags and number of top ve mentions.
The hashtag #28s alludes to September 28, an emblematic day for the cause
that defends abortion. The day has been observed since the V Latin American and Caribbean
Feminist Encounter, held in Argentina in 1990 (Campaña, 2015), and its date recalls
September 28, 1871, when the law of freedom of the wombs, which declared free the
children born to slaves, was promulgated in Brazil (“,” 2010). Table 7 presents the results
of the positions for and against abortion using the decision tree and Naive
Bayes classifiers. The average of the two shows that they differ by only
a few percentage points; however, the same overall result is obtained. That is,
the position against abortion exceeds the position in favor by an average
of 14.7 percentage points.
Applied algorithm | For (Abortion+) | Against (Abortion-)
1. Decision tree | 40.7% | 59.3%
2. Naive Bayes | 44.6% | 55.4%
Total average | 42.65% | 57.35%
Table 7 – Results of the sentiment analysis for each algorithm applied
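The averages in Table 7 and the gap between the two positions can be checked with a few lines of arithmetic. The sketch below uses only the percentages reported in the table; the variable names are illustrative.

```python
# Percentages reported in Table 7 for each classifier.
results = {
    "Decision tree": {"for": 40.7, "against": 59.3},
    "Naive Bayes": {"for": 44.6, "against": 55.4},
}

avg_for = sum(r["for"] for r in results.values()) / len(results)
avg_against = sum(r["against"] for r in results.values()) / len(results)
gap = avg_against - avg_for  # difference in percentage points

print(round(avg_for, 2), round(avg_against, 2), round(gap, 1))
```

The rounded values agree with the table: 42.65% in favor, 57.35% against, and a gap of 14.7 percentage points.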
Table 8 presents the comparison of the classifiers, used to check the
influence of the learning keys in the training stage, which is implicitly included in
phase 4 (modeling and analysis).
The accuracy of the decision tree classifier is 97.9% and that of Naive Bayes
is 79.1%; the decision tree was therefore 18.8 percentage points more accurate than
Naive Bayes. Tables 7 and 8 complete the first analysis of the conceptual architecture
in phase 5, showing the results obtained in Python with the Naive Bayes and decision
tree algorithms for the positions in favor of and against abortion.
Classier TP Rate FP Rate Accuracy Recall F1 score ROC Area
Decision tree 1 0,021 0.979 0.989 0.989 0,989
Naive Bayes 1 0,4 0,791 0,8 0,791 0,81
*True Positives (TP) *False Positives (FP)
*Receiver Operating Characteristic (ROC)
Table 8 – Results of the specic evaluation metrics for the classiers (weighted average)
Figure 2 – Timeline of the frequency of pro-abortion and anti-abortion tweets
Against-abortion tweets mostly outnumber pro-abortion tweets, except from
September 17 onward. The against position shows notable peaks
on August 24 and September 3, with the highest peak on September 9. The pro
position begins to rise toward the end of the sample, that is, as September 28
approaches, and it also tends to grow around the highest peak of the against position
on September 9, 2018, which is believed to reflect a massive response on social networks.
Earlier, on August 8, 2018, the Senate of Argentina had rejected the legalization of
abortion, an issue that had repercussions throughout Latin America, including Ecuador.
The atypical value on September 28, when abortion was supported, is attributed to the
commemorative date alluded to by #28s, as shown in Figure 2.
The heat map of tweets about abortion in Ecuador represents in red the tweets classified
against abortion and in blue those in favor, as shown in Figure 3. Note that
red points (against abortion) appear in smaller numbers than blue ones (pro-abortion),
in contrast to Table 7, because many of these tweets have no defined location.
Figure 3 – Heat map of pro-abortion comments in blue, against-abortion in red.
Figure 4 presents the word cloud of the accounts of users who posted the most,
received the most retweets, and/or were mentioned the most by other
accounts. This result shows the influence of these users within the
sample.
The “Salvemoslas2vidas” account, with a tendency against abortion, ranks first, followed
by the “abortolegalya” account, with a tendency in favor of abortion, in second place;
“porlavida2014” ranks third and “sialavida” fourth, and both of these organizations
are against abortion. Finally, “28s”, a pro-abortion account, is in fifth place,
closing the top five most influential users or accounts.
Figure 4 – Word cloud of the most influential users or accounts
Figure 5 presents the most used hashtags as a word cloud:
Figure 5 – Word cloud of the most used hashtags
4. Discussion
The results of this study, in which the decision tree (97.9%) surpassed the Naive Bayes
classifier (79.1%) in accuracy, are consistent with those presented by (Vila, Dayana;
Cisneros, Saúl; Granda, Pedro; Ortega, Cosme; Posso-Yepez, Miguel; García-Santillan, 2019),
providing a reliable reference point.
Similarly, additional relevant and consistent information emerges about this social
issue, both in the word clouds of the most used hashtags
and most influential users, and in those of causes that can be considered related
trends. An example is the hashtag #niunamenos, which promotes the eradication of
femicide and of any abuse of women and appears considerably in the study sample;
because it promotes abortion as a right of each woman, this organization
is in favor of abortion. On the other hand, the hashtag #conmishijosnotemetas and
the account of the same name support a cause that largely rejects the teaching of
gender ideology and related currents as offensive to people’s morals. It also
appears prominently in the sample; this organization considers abortion to be murder and
therefore has a tendency against abortion.
If specific studies on femicide and gender ideology based on (Niklander,
2017) are required, the hashtags #niunamenos and #conmishijosnotemetas,
respectively, should be considered; for other studies on the subject of abortion,
the suggested hashtags are #salvemoslas2vidas and #abortolegalya, because
they were the most frequently used in this study.
The main limitation of this research is that, for most of the
tweets, it was not possible to establish a specific location, which limited the
geographical heat maps.
5. Conclusions
The main contribution of this research is the choice of abortion as the theme for
developing the conceptual architecture, as it is one of the most commented topics
at present, in society in general and among Twitter users in particular.
The opinions expressed in Ecuador, supported by the 97.9% accuracy of the
decision tree (Table 8), are 40.7% in favor of abortion and 59.3% against
abortion (Table 7). This classifier surpassed Naive Bayes, which yielded
79.1% accuracy.
Content analysis was obtained by evaluating the hashtags and their polarity,
and, more generally, sentiment analysis was obtained by using classifiers to
define the polarity of the tweet text. The results are very similar
because the text of a tweet is usually closely related to the hashtags used in
it, except in some cases where the hashtag is used to show opposition within
the text.
By obtaining messages and positions on abortion, summarizing 42.65% in
favor and 57.35% against on average (Table 7), it was possible to see how a
conceptual architecture allows an analysis of opinions about abortion using the
Twitter platform.
According to the geographic heat maps (Figure 3),
Twitter activity is greater in the mountain region, although it is important
to note that in most of the tweets of the sample the “location” field was not
active. Therefore, the locations presented in the heat maps do not reflect the
total number of tweets in the sample.
As future work, it is recommended to carry out research on the same subject with
a sample taken in 2019 or in subsequent years, in order to compare it
with this work and determine whether the percentages have changed or new
trends have emerged. In addition, other social networks such as Facebook and Instagram
should be considered.
References
Arab, L. E., & Díaz, G. A. (2015). Impacto de las redes sociales e internet en la adolescencia:
aspectos positivos y negativos. Revista Médica Clínica Las Condes, 26(1), 7–13.
Baecchi, Claudio; Uricchio, Tiberio; Bertini, Marco; Bimbo, A. Del. (2015). A multimodal
feature learning approach for sentiment analysis of social network multimedia.
Multimed Tools Appl (2016), 19.
Baviera, T. (2016). Técnicas para el análisis del sentimiento en Twitter : Aprendizaje
Automático Supervisado y SentiStrength. Revista Dígitos 1.3, 1(3), 33–50.
Campaña. (2015). Campaña Nacional por el Derecho al Aborto Legal Seguro y Gratuito.
Retrieved June 29, 2019, from
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From Data Mining to Knowledge
Discovery in Databases. AI Magazine, 17(3), 37–37.
García Serrano, A. (2012). INTELIGENCIA ARTIFICIAL Fundamentos,
práctica y aplicaciones (2da Edició). Retrieved from
González-Lizárraga, M. G., Becerra-Traver, M. T., & Yanez-Díaz, M. B. (2016).
Cyberactivism: A new form of participation for University Students. Comunicar,
24(46), 47–54.
Inbal Yahav; Shehory, Onn; Schwartz, and D. (2015). Comments Mining With
TF-IDF: The Inherent Bias and Its Removal. 14.
KlokanTech. (2017). BoundingBox. Retrieved August 12, 2018, from https://
Montserrat Femenía, A. I. (2016). El aborto provocado en relación a la temática de la
feminidad desde una perspectiva psicoanalítica. 341.
Niklander, S. (2017). Content Analysis on Social Networks: Exploring the #Maduro
Hashtag. 5.
Ortíz, E. (2017). Redacción Médica. Retrieved June 29, 2019, from https://www.
Perikos, Isidoros; Hatzilygeroudis, I. (2018). A Framework for Analyzing Big Social
Data and Modelling Emotions in Social Media. 5.
Purnomo, Mauridhi Hery; Sumpeno, Surya; Setiawan, Esther Irawati; Diana
Purwitasaria, C. (2017). Biomedical Engineering Research in the Social Network
Analysis Era: 7.
Sogo, J. G. (2016). Lingwars. Retrieved February 10, 2019, from http://lingwars.github.
Timarán Pereira, S. R., Hernández Arteaga, I., Caicedo Zambrano, S. J., Hidalgo Troya,
A., & Alvarado Pérez, J. C. (2016). Descubrimiento de patrones de desempeño
académico con árboles de decisión en las competencias genéricas de la formación
Umaquinga C., A. C., Peluo O, D. H., Alvarado P., J. C., & Cabrera V., M. A. (2016).
Estudio descriptivo de técnicas aplicadas en herramientas Open Source y
comerciales para visualización de información de Big Data. In UTN (Ed.), Libro
Generando Ciencia: Memorias de las I Jornadas Internacionales de Investigación
Cientíca (pp. 121–135). UTN. (2010). Retrieved June 29, 2019, from
Vila, Dayana; Cisneros, Saúl; Granda, Pedro; Ortega, Cosme; Posso-Yepez, Miguel;
García-Santillan, I. (2019). Detection of Desertion Patterns in University. Springer
Nature Switzerland AG 2019, 10.
Yang, Li; Tian, Yaping; Li, Jin; Ma, Jianfeng; Zhang, J. (2017). Identifying opinion
leaders in social networks with topic limitation. Cluster Comput, 11. https://
... So, an analysis based on information from Twitter with a variety of computational linguistic methods allows users to find out the feelings of people about to have a child, how they plan to have children [15]. The sentiment analysis on demographic issues includes abortion [16][17][18][19][20][21], in particular, the legalization of abortion [22], various aspects of parenthood [14,15], health issues [5,21,23,24], drivers of demographic processes (e.g., natural disasters [25]), the demographic structure and trend of telemedicine [26], and the COVID-19 pandemic and other infections [27][28][29][30][31][32]. Topics closely related to demographics were also studied by sentiment analysis: these include sexual harassment or violence [33][34][35], attitudes towards genetic testing [36], processes of racial segregation [37]. ...
Full-text available
We propose to consider our experience in data use of Russian-language texts of social networks, electronic media, and search engines in demographic analysis. Experiments on the automatic classification of opinions have been carried out. Conversational RuBERT has been used in most cases. The following main scientific results on text data will be described: (1) short-term forecasts of fertility dynamics according to Google trend data, (2) automatic measurement of the demographic temperature of various demographic groups (pronatalists and antinatalists) in social networks, (3) sentiment analysis of reproductive behavior, sentiment analysis of vital behavior in pandemic, sentiment analysis of attitudes toward demographic and epidemiological policy according to social network data, (4) analysis of the arguments of social network users, and (5) analysis of media publications on demographic policy. A description of the created open databases of all these studies will be provided. All of the studies described will contain reflections on the advantages and difficulties of using texts as data in demographic analysis.
... В настоящее время все большей популярностью пользуются исследования, в которых используются текстовые данные, в частности, комментарии социальных сетей. В частности, из демографических работ отметим исследования, направленные на изучение отношения людей к абортам (Hasan, Ng, 2013;Sharma et al., 2017;LaRoche et al., 2021;Roldán-Robles et al., 2019;Ntontis, Hopkins, 2018), к вакцинации в целом и во время COVID-19 (Vychegzhanin, Kotelnikov, 2019;Miao et al., 2020;Glandt et al., 2021;Liu, Liu, 2021;Thorpe Huerta et al., 2021;Abosedra et al., 2021). На основе анализа эмоционального фона сообщений были выявлены особенности отношения российских пользователей Twitter к своим и чужим детям (Журавлев, Китова, 2020; Китова, Китов, 2020); к теме выкидышей, абортов и преждевременных родов (Cesare et al., 2020;Graells-Garrido et al., 2019). ...
The main purpose of the study is to identify whether there is a connection between different demographic values, as well as socio-demographic characteristics of social network VKontakte users. Based on a large data set of user comments of two types - parental and childfree groups, - the paper identifies the links between different types of demographic values - positive or negative attitudes towards parenthood, family creation, having children, attitude towardshealthy lifestyle, as well as between values and socio-demographic characteristics such as gender, age, marital status. Drawing on a logit analysis, the authors construct socio-demographic profiles of so-called “pronatalists” (parental groups) and “anti-natalists” (childfree groups)in Russia and prove the correlation between different types of values. For example, positiveattitudes towards parenthood, childbearing, and family creation (reproductive and family values) are associated with negative attitudes towards smoking and alcohol (positive vital values). The marital status is also associated with these positive values (which indirectly indicates a connection with matrimonial values). A connection was found both between different types of demographic values of the social network users of selected demographic groups, and a connection between the socio-demographic characteristics of users and their values. For example, women and older people (in some model specifications) are more proneto family values. Additionally, the study confirms the quality of the choice of demographic groups in social network by names and declared values- a connection is traced between belonging to pronatalist or antinatalist groups and value attitudes about life priorities (familyor leisure and self-development).
... Среди извлечения авторских мнений по демографическим вопросам чаще всего встречается обсуждение проблемы абортов [17][18][19][20][21][22], разных аспектов родительства [23], проблем здравоохранения [24], влияния различных факторов на демографические процессы, например, природных катастроф [6] и пандемии COVID-19. Помимо авторской позиции в исследовании [17] размечаются доводы в пользу той или иной позиции. ...
Full-text available
В данной работе мы представляем специализированный датасет, с разметкой мнений пользователей о репродуктивном поведении. Мы анализируем особенности распределение оценок «за» и «против» по конкретным аспектам репродуктивного поведения. Созданный датасет используется для решения двух задач классификации: классификации сообщений по релевантности изучаемых тем и позиции автора по той или иной теме. Для классификации сообщений используются классические методы машинного обучения, а также нейросетевая модель BERT. Лучшие результаты классификации в обеих задачах достигаются на основе вариантов модели BERT с использованием в классификации пар предложений — варианты NLI (natural language inference — вывод по тексту) и QA (question-answering — вопросно/̄ответный подход). Кроме того, созданный датасет позволяет сделать содержательные выводы по вопросам отношения пользователей сети ВКонтакте к вопросам репродуктивного поведения. Выявлено, что феномен сознательной бездетности активно представлен в сети, а многодетность остается слабо распространенной моделью поведения. В рамках пронаталистской политики важно формировать позитивное общественное мнение о родительстве, смягчать дефицит времени у родителей.
... [6]. Such techniques automatically extract and analyze data characteristics from different application contexts [7][8][9][10], in contrast to Expert Systems [11], which rely on knowledge extracted from human experts and therefore involve a more difficult and expensive extraction process. ...
The integrated security service SIS ECU 911 oversees the monitoring of emergency situations, video surveillance, and alarms reported through 911 services throughout Ecuadorian territory. This research addresses the detection of spatial and temporal patterns at SIS ECU 911 (Imbabura, Ecuador) using data mining techniques to support decision-making processes and operating-cost management. In 2018–2019, 47.4% of placed calls were ill-intentioned, generating significant unnecessary operating costs. The study was conducted in four phases: (i) gathering caller location and call data, (ii) creating a geodatabase and hotspots, (iii) building data clocks, and (iv) fitting a prediction model using Geographically Weighted Regression (GWR). The hotspots showed that the largest number of ill-intentioned calls came from the cities of Ibarra and Otavalo. The data clocks showed a temporal pattern in which July and August are the most critical months. The GWR model identified that the rate of this type of call partially corresponds to a predominant spatial pattern originating in the rural areas of Ibarra and Pimampiro. Ill-intentioned calls therefore follow certain spatio-temporal patterns that help explain the problem and point to mitigation alternatives.
One trend in biomedical engineering research is healthcare models built on unobtrusive smart systems for monitoring vital signs and physical activity. Applied systems include detecting facial cry expressions in infants, who cannot communicate pain; recognizing facial emotion through micro-expressions to understand dysfunction mechanisms; and transforming human expressions captured with a motion device into three-dimensional objects. In collaboration with biomedical research, mining and analyzing social networks can also improve the public and private health care sectors, for example by studying health news shared on social media about pharmaceutical drugs, pandemics, or viral outbreaks. Because of the vast amount of shared news, there is an urgent need to select and filter information to prevent the spread of hoaxes and fake news. We explore in depth some steps for classifying hoaxes written as news articles. The discussion also considers how social network analysis technologies could bring new kinds of improvement to the health care sector. We close with a description of the limitless future possibilities of biomedical engineering research in social media.
Sentiment analysis of messages posted on Twitter offers very promising possibilities for evaluating the currents of opinion spread through this medium. The enormous volumes of text require tools capable of processing these messages automatically without losing reliability. This article describes two types of techniques for addressing the problem. The first strategy is based on supervised machine learning. Applying it requires integrating some natural language processing tools and starting from a classified corpus. The second approach is based on polarity dictionaries. The SentiStrength tool follows this line and is increasingly applied to studies of Twitter in English. The article evaluates the most advanced studies that use each of these approaches to analyze tweets in Spanish. Finally, it points out the advantages and limitations of each approach for political communication research. While supervised machine learning makes it possible to take context into account, the researcher needs data-analyst skills to fine-tune the process. SentiStrength, by contrast, is oriented more towards the semantic content of the terms in the message and instead requires linguistic competence on the researcher's part. The main conclusion is that neither automatic method of analysis can dispense with rigorous manual coding if it is to be used reliably in research.
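To make the polarity-dictionary approach concrete, here is a minimal sketch of a lexicon-based scorer. It is not SentiStrength and does not reproduce its lexicon or scoring rules; the four-word Spanish dictionary, the weights, and the function names are all hypothetical.

```python
import re

# Hypothetical polarity dictionary; real lexicons contain thousands of entries.
POLARITY = {"bueno": 2, "excelente": 3, "malo": -2, "terrible": -3}


def polarity_score(text, lexicon=POLARITY):
    """Sum the polarity of every word in the text that the lexicon knows."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def label(text):
    """Map the summed score to a coarse sentiment label."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the scorer looks only at individual terms, a negated phrase such as "no es bueno" still receives a positive score. This is the kind of context that, according to the abstract, a supervised classifier trained on a labeled corpus can take into account.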
Social networks have become an important platform for people to share and exchange information in their daily lives. Some of the most critical users, called opinion leaders, are used to maximize information transmission and to suppress the diffusion of rumours in a short time. Researchers have proposed many methods for identifying these users. To identify them more accurately and efficiently, we further analyze real information spread and find that information is commonly topic sensitive and that opinion leaders are usually limited to particular topics. Within a given topic area, users with higher authority in that area play a more crucial role in spreading information. Moreover, to quantify authority in a given topic area, we apply a series of rigorous definitions and a topic model. Finally, a comparison with other widely used methods shows that our method performs effectively.
The purpose of this article is to show the results derived from a sample of students enrolled in different bachelor's degree programs offered by the University of Sonora in Mexico. The study had two objectives. The first was to identify cyber activist students from their answers to an electronically administered questionnaire, using as inclusion criteria high and medium levels of participation and commitment in actions undertaken in four topic areas (environment, academic, social and citizen issues, and human rights). The second, after selecting three unique cases of cyber activist students, was to determine inflexion points in the activities these young people perform in digital social networks. Using personal narrative as a methodological strategy, the students described how they interact with others through different digital networks. Among the first categories identified in the in-depth interviews are interaction history (use, access and availability of technology at a young age) and active participation in topics of interest in social networks (organization and perceptions of achievements made). The main findings are the availability of these resources from a young age, personal motivation to participate in diverse topics, enjoyment of expressing one's opinion freely, electronic participation as a way to commit to a cause, and participating without joining an organization.
In this paper we investigate the use of a multimodal feature learning approach, using neural network based models such as Skip-gram and Denoising Autoencoders, to address sentiment analysis of micro-blogging content, such as Twitter short messages, that is composed of a short text and, possibly, an image. The approach is motivated by recent advances in: i) training language models based on neural networks, which have proved to be extremely efficient when dealing with web-scale text corpora and have shown very good performance on syntactic and semantic word similarities; ii) unsupervised learning, with neural networks, of robust visual features that are recoverable from partial observations caused by occlusions or by noisy and heavily modified images. We propose a novel architecture that incorporates these neural networks, test it on several standard Twitter datasets, and show that the approach is efficient and obtains good classification results.
This article presents a theoretical review of the impact, both positive and negative, of social networks on adolescents and of the direct relationship between this impact and the use or abuse of new technologies. It also establishes the link between the ways these technologies are used and individual psychological characteristics, prior personality development, and parental control. The starting point is adolescent development associated with the construction of youth identity in the context of a new communication paradigm (cybercommunication), where the boundary between public and private becomes increasingly blurred. The terminology specific to social networks is presented, with emphasis on strategies for adult supervision and control. The article details the positive aspects that new technologies offer (diverse opportunities for learning, entertainment, socialization, skill development, creativity, and improved motivation to learn, especially in adolescents, among others) and the associated negative aspects (emotional distancing, loss of boundaries in communication, and loss of the ability to listen, among others). It highlights the need to provide and encourage real models of social communication and education in the use of new technologies. The aim is to update and guide health professionals on the positive and/or negative aspects of social networks in adolescents.
This book condenses the fundamentals of Artificial Intelligence from a practical and accessible point of view, presenting the theory behind each technique and algorithm in a comprehensible, simplified way so that anyone interested in starting from scratch can enter this field. In addition to an introduction to its theoretical principles, the techniques described are accompanied by practical examples programmed in Python.
Text mining has gained great momentum in recent years, as user-generated content has become widely available. One key use is comment mining, with much attention given to sentiment analysis and opinion mining. An essential step in comment mining is text pre-processing, in which each linguistic term is assigned a weight that commonly increases with its appearance in the studied text, yet is offset by the frequency of the term in the domain of interest. A common practice is to use the well-known tf-idf formula to compute these weights.
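The weighting described here can be sketched in a few lines. The example below assumes one common variant of tf-idf (relative term frequency multiplied by the natural logarithm of the inverse document frequency); other variants add smoothing or normalization, and the abstract does not say which one is used. The function name `tf_idf` and the toy comments are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy comments, tokenized on whitespace.
comments = [
    "great phone great battery".split(),
    "poor battery".split(),
    "great screen".split(),
]
weights = tf_idf(comments)
```

In the first comment, "phone" occurs once but appears in no other comment, so it outweighs "great", which occurs twice but is common across the collection. A term present in every document receives a weight of zero.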