Gaoyan Ou's research while affiliated with Peking University and other places

Publications (10)

Chapter
Microblogs have become a major platform for information about real-world events. Automatically discovering real-world events from microblogs has attracted the attention of many researchers. However, most existing work ignores the importance of emotion information for event detection. We argue that people’s emotional reactions immediately reflect the...
Conference Paper
With the popularity of various social media platforms, the number of people who tend to publish their opinions on the internet grows dramatically. Discovering the public sentiment towards new topics and events becomes an important and challenging task in sentiment analysis. Current methods have not considered the effects caused by user interactions...
Conference Paper
Sarcasm is a pervasive linguistic phenomenon in online documents that express subjective and deeply-felt opinions. Detection of sarcasm is of great importance and beneficial to many NLP applications, such as sentiment analysis, opinion mining and advertising. Current studies consider automatic sarcasm detection as a simple text classification probl...
Conference Paper
In modern cities, more and more vehicles, such as taxis, have been equipped with GPS devices for localization and navigation. The GPS-equipped taxis can be viewed as pervasive sensors and the large scale traces allow us to reveal many hidden “facts” about the city dynamics. In this paper, we aim to estimate the wait time and probability of taking a...
Conference Paper
Microblogs have become a popular platform for people to share their ideas, information and opinions. In addition to textual content, social relations and user behaviors in microblogs provide additional link information, which can be used to improve the performance of sentiment analysis. However, traditional sentiment analysis approaches either...
Conference Paper
Logistic regression is a classical classification method that has been widely used in applications with a binary dependent variable. However, when the data sets are imbalanced, traditional logistic regression underestimates the probability of rare events. With data explosion in recent years, some researchers propose large sc...
Conference Paper
With the popularity of various social media platforms, the number of online reviews towards different products and services grows dramatically. Discovering sentiments from online reviews becomes an important and challenging task in sentiment analysis. Current methods either extract aspects without separating aspects and sentiments, or extract aspec...
Conference Paper
Extracting sentiments from unstructured text has emerged as an important problem in many disciplines, for example, in mining online opinions from the Internet. Many algorithms have been applied to this problem, but most of them fail to handle large-scale web data. In this paper, we present a parallel algorithm for BN (Bayesian Network) structu...

Citations

... Community detection plays a vital role in analyzing the effect of some real-world happenings. Ou et al. (2017) examined the emotion of an event that occurred in the real world: the proposed algorithm finds a community, detects the community emotion, aggregates the community emotion, and detects any community emotion burst. In other cases, most people are comfortable with the brand they use for a given product. ...
... The following key methods are used: (i) topic tracking: an active learning method (7) and a topical phrase learning method (8) were employed to detect the posts related to TCM topics; (ii) topic mining: we use the LDA topic mining method (9,10) to explore the topic categories in posts; (iii) sentiment analysis: an opinion-aware knowledge graph (11) and a sentiment-oriented maximum entropy classification method (12) are used to calculate the posters' opinions on TCM; (iv) factor analysis: Lasso (Least Absolute Shrinkage and Selection Operator) regression (13) and a Bayesian network (14) are employed to calculate the dependencies between the growing visits of TCM and the major factors. The above big data platform and machine learning methods support objective quantitative analysis of questions such as: (i) How intense is the criticism of TCM, and what is its impact on the actual TCM visiting population? ...
... Hwang et al. divided the total waiting time at a fixed location by the number of taxi rides and used this value as the waiting time for passengers at this location [15]. Qiu et al. utilized the road, climate, and Non-homogeneous Poisson process (NPPCRW) to forecast waiting time [16]. Jing et al. combined the factors of speed, climate, weekdays, and weekends to construct the T_wait and T_delay formulas and regarded their sum as the final passenger waiting time [17]. ...
... In advanced logistic regression models, like the one implemented here, biased parameter estimates due to the rare events problem (small-sample bias) are corrected. For more on logistic modeling of rare events see Heinze and Schemper (2002), Heinze and Puhr (2010), Gao and Shen (2007), Lee et al. (2006), Maalouf and Trafalis (2011), Qiu et al. (2013). ...
... SA techniques based on unsupervised machine learning involve learning algorithms that develop without labeled training samples. In this method, the similarities of the texts are first measured according to keyword lists of categories, and then the texts are clustered into multiple groups according to the similarities of the texts, as exemplified in the work of [30], [66], [21], and [47]. SA techniques based on semi-supervised machine learning utilize learning algorithms that are partially trained on labeled data, as demonstrated by [38] and related works. ...
... Second, these features are processed using some typical machine learning approaches to determine whether or not the text has a sarcastic intent. Liu et al. [25] adopted lexical sequence features and semantic imbalance features to detect the sarcastic properties of sentences. Joshi et al. [11] classified the incongruity theory of sarcasm in linguistics into implicit and explicit incongruity features. ...
... However, it did not handle more types of features such as adjectives and verbs, and did not consider implicit features. Besides, in 2013, authors [25] proposed two novel APSM and ME-APSM models to extract aspects and aspect-specific polarity-aware sentiments from online reviews. However, the results showed that the model still needed improvements in terms of the aspect-level sentiment classification. ...
... It can be replaced by its alternatives like ELMo [30] or XLNet [42]. (Footnote 8) On average, each variable node contains only 2-5 words in different verticals. Figure 9: Heatmap denoting the performance lifts per F1 score from the out-of-domain knowledge. ...