It is well known that the lack of quality data is a major problem for information retrieval engines. Web articles are flooded with non-relevant content such as advertising and related links. Moreover, some of these ads are loaded randomly every time a page is requested, so the HTML document changes between visits and its content cannot be reliably hashed. Therefore, the non-relevant text of documents needs to be filtered out. Automatically extracting the relevant text from online documents (news articles, etc.) is not a trivial task, and many algorithms for this purpose have been described in the literature. One of the most popular is Boilerpipe, which is also among the best performing. In this paper, we present a method that improves the precision of the Boilerpipe algorithm by using the HTML tree to select the relevant content. Our filter greatly increases precision (by at least 15%) at the cost of some recall, resulting in an overall F1-measure improvement of around 5%. We run the experiments on news articles, using our own corpus of 2,400 news articles in Spanish and 1,000 in English.
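The precision/recall trade-off described in the Boilerpipe abstract can be made concrete with a small sketch. This sketch assumes token-level scoring, in which the extracted text and a hand-labelled gold text are compared as bags of whitespace-separated tokens. The abstract does not state the matching unit. The function name `precision_recall_f1` and the toy sentences are hypothetical illustrations, not the authors' evaluation code or data.

```python
from collections import Counter


def precision_recall_f1(extracted: str, gold: str) -> tuple[float, float, float]:
    """Token-level precision, recall and F1 of an extracted text against a gold text.

    Assumption (not from the source): both texts are compared as bags
    (multisets) of whitespace-separated tokens; token order is ignored.
    """
    ext = Counter(extracted.split())
    ref = Counter(gold.split())
    # Multiset intersection: each shared token counts at most min(count_ext, count_ref) times.
    overlap = sum((ext & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(ext.values())  # share of extracted tokens that are relevant
    recall = overlap / sum(ref.values())     # share of relevant tokens that were extracted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy example, not data from the paper.
    gold = "the mayor opened the new bridge today"
    noisy = "the mayor opened the new bridge today buy cheap shoes"  # ad text leaked in
    filtered = "the mayor opened the new bridge"  # ads removed, one relevant token lost
    for name, text in [("noisy", noisy), ("filtered", filtered)]:
        p, r, f = precision_recall_f1(text, gold)
        print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

With these toy inputs, the unfiltered extraction scores P = 0.70, R = 1.00, F1 = 0.82. Dropping the ad tokens at the cost of one relevant token gives P = 1.00, R = 0.86, F1 = 0.92. This mirrors the shape, not the magnitude, of the reported result: precision rises, recall falls, and F1 improves overall.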
Evidence from past research and insights from an exploratory investigation are combined in a conceptual model that defines and relates price, perceived quality, and perceived value. Propositions about the concepts and their relationships are presented, then supported with evidence from the literature. Discussion centers on directions for research and implications for managing price, quality, and value.
Tested the view that the number of arguments in a message could affect agreement with a communication by serving as a simple acceptance cue when personal involvement was low but could affect agreement by enhancing issue-relevant thinking when personal involvement was high. In addition to manipulating the personal relevance of the communication topic, both the number and the quality of the arguments in the message were varied. In a pilot study with 46 undergraduates, when the issue was of low relevance, Ss showed more agreement in response to a message containing 6 arguments (3 strong and 3 weak) than to messages containing either 3 strong or 3 weak arguments. Under high involvement, however, the 6-argument message did not increase agreement over the message containing only 3 strong arguments. In the full experiment, 168 undergraduates received either 3 or 9 arguments that were either all cogent or all specious under conditions of either high or low involvement. The manipulation of argument number had a greater impact under low than under high involvement, but the manipulation of argument quality had a greater impact under high than low involvement. Results indicate that increasing the number of arguments in a message could affect persuasion whether or not the actual content of the arguments was scrutinized. (53 ref)
To understand the current state and dynamics of the Internet information space, fast tools are needed for extracting data from mass media sites with broad coverage. However, far from all sites syndicate their data in the RSS format, and developing a specialized extraction tool for each Web site is costly. In this paper, methods for automatically extracting news texts from arbitrary mass media sites are proposed. Classifying Web page types and then grouping their URLs improves the quality of the extracted news texts. A strategy is also proposed for traversing a site and detecting the pages that contain hyperlinks to news pages; this strategy decreases the number of requests and reduces the load on the site.
This study derives evaluation terms by analyzing the semantic relationships between design elements and sentiment terms in fashion design. A total of 38,225 texts posted on Daum and Naver blogs between November 2015 and October 2016 were collected, and the terms were analyzed for part of speech, frequency, centrality, and semantic networks. Design elements were derived as nouns, while fashion image and users' emotional responses were derived as adjectives. The study selected 15 nouns and 52 adjectives as evaluation terms for men's striped shirts. The semantic network analysis also showed that users' main topics regarding men's striped shirts fell into four characteristics: expression, daily wear, formation, and function. In addition, design elements such as pattern, color, coordination, style, and fit were paired with evaluation terms such as wide, bright, trendy, casual, and slim.
This study explores a new user experience (UX) research method for IoT products, which are becoming widely used but have received little practical user research. Whereas UX research has traditionally relied on surveys or observation, this paper applied big data analysis to online user reviews of an intelligent agent IoT product, Amazon's Echo. Topic modelling extracted UX elements such as features, conversational interaction, and updates. In addition, regression analysis showed that the updates topic was the most influential determinant of user satisfaction. The main implication of this study is the introduction of big data analysis into UX research on intelligent agent IoT products.
We investigate the potential use of textual information from user-generated microblogs to predict the stock market. Using the latent space model proposed by Wong et al. (2014), we correlate the movements of stock prices and social media content. This study differs from prior models in two significant ways: (1) it leverages market information contained in high-volume social media data rather than news articles, and (2) it does not evaluate sentiment. We test this model on data from 2011 to 2015 covering a majority of the stocks listed in the S&P 500 index and find that it outperforms a baseline regression. We conclude by providing a trading strategy that produces an attractive annual return and Sharpe ratio.
In this paper, we study how to design an effective parallel crawler. As the Web grows, it becomes imperative to parallelize the crawling process in order to finish downloading pages in a reasonable amount of time. We first propose multiple architectures for a parallel crawler and identify fundamental issues related to parallel crawling. Based on this understanding, we then propose metrics to evaluate a parallel crawler and compare the proposed architectures using 40 million pages collected from the Web. Our results clarify the relative merits of each architecture and provide guidelines on when to adopt which architecture.