Yen-Cheng Juan’s research while affiliated with National Taiwan University and other places


Publications (4)


Mining Browsing Behaviors for Objectionable Content Filtering
Article · May 2015 · 67 Reads · 11 Citations
Journal of the Association for Information Science and Technology
Yen‐Cheng Juan · Wei‐Lin Tseng · [...]

This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails, represented as category sequences in click-through data, are used to mine users' web browsing behaviors. Contextual relationships among URL categories are learned by a hidden Markov model. The relationships between the top-level domains (TLDs) extracted from the URLs themselves and the corresponding categories are captured by a TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. In addition to the current context, the predictions made for the same URL when it was previously accessed in different contexts by various users are combined by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced to integrate the model with the blacklist it generates for filtering objectionable web pages without analyzing their content. In practice, this approach, which works from the perspective of users' behavior, is complementary to existing content analysis.
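The aggregation idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it swaps the hidden Markov model for a plain first-order Markov chain over observed categories, combines the context and TLD distributions by linear interpolation, and uses a hypothetical `weight` parameter. All function names and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_transitions(trails):
    """Count category-to-category transitions over users' access trails."""
    counts = defaultdict(Counter)
    for trail in trails:
        for prev, nxt in zip(trail, trail[1:]):
            counts[prev][nxt] += 1
    return counts


def train_tld_model(labeled_urls):
    """Count how often each TLD co-occurs with each category."""
    counts = defaultdict(Counter)
    for tld, category in labeled_urls:
        counts[tld][category] += 1
    return counts


def normalize(counter):
    """Turn raw counts into a probability distribution (empty if no counts)."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


def aggregate(context_counts, tld_counts, prev_category, tld, weight=0.5):
    """Interpolate context and TLD distributions; `weight` is a hypothetical knob."""
    ctx = normalize(context_counts.get(prev_category, Counter()))
    tld_dist = normalize(tld_counts.get(tld, Counter()))
    scores = {
        c: weight * ctx.get(c, 0.0) + (1 - weight) * tld_dist.get(c, 0.0)
        for c in set(ctx) | set(tld_dist)
    }
    return max(scores, key=scores.get) if scores else None


def majority_vote(predictions):
    """Majority rule over predictions made for the same URL in different contexts."""
    return Counter(predictions).most_common(1)[0][0] if predictions else None


# Toy data (assumed, not from the article).
trails = [["news", "gambling", "gambling"], ["news", "sports"], ["news", "gambling"]]
tld_pairs = [("com", "news"), ("com", "sports"), ("bet", "gambling")]
transitions = train_transitions(trails)
tld_model = train_tld_model(tld_pairs)
print(aggregate(transitions, tld_model, "news", "bet"))
print(majority_vote(["gambling", "news", "gambling"]))
```

In this sketch a URL whose aggregated prediction falls into an objectionable category could be added to a blacklist without fetching its content, which mirrors the filtering strategy the abstract describes only at a very high level.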


Users' behavioral prediction for phishing detection

April 2014 · 85 Reads · 13 Citations

This study explores the web browsing behaviors of users who encounter phishing situations, with the aim of context-aware phishing detection. We extract discriminative features of each clicked URL, i.e., domain name, bag-of-words, generic top-level domain, IP address, and port number, to develop a linear-chain CRF model for predicting users' behavior. Large-scale experiments show that our method achieves promising performance in predicting the phishing threats of users' next accesses. Error analysis indicates that our model yields a favorably low false positive rate. In practice, our solution complements existing anti-phishing techniques by cost-effectively blocking phishing threats from the perspective of users' behavior.
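The per-URL features named in this abstract (domain name, bag-of-words, top-level domain, IP address, port number) might be extracted roughly as follows. This is a minimal sketch using only the Python standard library. The abstract does not say which text the bag-of-words is built from, so tokenizing the path and query string is an assumption, as are the function name `url_features` and the default-port fallback.

```python
import ipaddress
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed fallback ports when the URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_features(url):
    """Extract simple per-URL features for a sequence model (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Is the host a literal IP address rather than a domain name?
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    port = parts.port or DEFAULT_PORTS.get(parts.scheme)

    # Last dot-separated label of the host; empty for IP literals.
    tld = "" if is_ip or "." not in host else host.rsplit(".", 1)[1]

    # Bag-of-words over path and query (an assumption about the token source).
    text = (parts.path + " " + parts.query).lower()
    words = Counter(w for w in re.split(r"[^a-z0-9]+", text) if w)

    return {"domain": host, "tld": tld, "is_ip": is_ip, "port": port, "words": words}


features = url_features("http://192.168.0.1:8080/login.php?user=a")
print(features)
```

A sequence of such feature dictionaries, one per clicked URL in a user's trail, is the kind of input a linear-chain CRF toolkit would typically consume; the choice of toolkit and its training setup are not specified in the abstract.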


Objectionable Content Filtering by Click-Through Data

October 2013 · 37 Reads · 4 Citations

This paper explores users’ browsing intents to predict the category of a user’s next access during web surfing, and applies the results to objectionable content filtering. A user’s access trail, represented as a sequence of URLs, reveals the contextual information of web browsing behaviors. We extract behavioral features of each clicked URL, i.e., hostname, bag-of-words, gTLD, IP, and port, to develop a linear-chain CRF model for context-aware category prediction. Large-scale experiments show that our method achieves a promising accuracy of 0.9396 for identifying objectionable accesses without requesting the corresponding page content. Error analysis indicates that our proposed model yields a low false positive rate of 0.0571. In real-life filtering simulations, our proposed model achieves a macro-averaged blocking rate of 0.9271 while maintaining a favorably low macro-averaged over-blocking rate of 0.0575 when collaboratively filtering objectionable content as the dynamic web changes over time.
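The evaluation figures quoted in this abstract can be made concrete with a short sketch. The abstract does not define its metrics, so the definitions here are assumptions: the blocking rate is taken as the fraction of objectionable accesses that were blocked, the over-blocking rate as the fraction of benign accesses that were blocked, and the macro-average as the unweighted mean of per-period rates. The function names and toy labels are hypothetical.

```python
def rates(y_true, y_pred):
    """Accuracy, blocking rate, and over-blocking rate for one evaluation period.

    y_true[i] is True when access i is objectionable; y_pred[i] is True when
    the filter blocked it. The metric definitions are assumptions.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "blocking": tp / (tp + fn) if tp + fn else 0.0,
        "over_blocking": fp / (fp + tn) if fp + tn else 0.0,
    }


def macro_average(per_period, key):
    """Unweighted mean of one rate across evaluation periods."""
    return sum(r[key] for r in per_period) / len(per_period) if per_period else 0.0


# Two toy evaluation periods (assumed data).
period_1 = rates([True, True, False, False], [True, False, False, True])
period_2 = rates([True, False, False, False], [True, False, False, False])
print(macro_average([period_1, period_2], "blocking"))
print(macro_average([period_1, period_2], "over_blocking"))
```

Under these definitions the over-blocking rate coincides with the false positive rate, and macro-averaging gives each period equal weight regardless of how many accesses it contains.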


Context-Aware Web Security Threat Prevention

October 2012 · 40 Reads · 6 Citations

This paper studies the feasibility of an early warning system that keeps users out of dangerous situations they may encounter during web surfing. Our approach adopts behavioral hidden Markov models (HMMs) to exploit the collective intelligence embedded in users' browsing behaviors for context-aware category prediction, and applies the results to web security threat prevention. Large-scale experiments show that our proposed method achieves an accuracy of 0.463 in predicting the fine-grained categories of users' next accesses. In real-life filtering simulations, our method achieves a macro-averaged blocking rate of 0.4293 in finding web security threats that existing security protection solutions cannot detect at an early stage, while maintaining a low macro-averaged over-blocking rate of 0.0005 over time. In addition, the behavioral HMM is able to alert users to security threats 8.4 hours earlier than the current URL filtering engine does. Our simulations show that shortening this lag time is critical to avoiding severe diffusion of security threats.
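How a hidden Markov model can predict the category of the next access is sketched below with the standard normalized forward algorithm. This is a generic textbook formulation, not the paper's behavioral HMM: the two hidden states, the two observable categories, and all probabilities are made-up values chosen for illustration.

```python
def forward_filter(obs, start, trans, emit):
    """Return P(hidden state at time t | observations up to t)."""
    n = len(start)
    belief = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for o in obs[1:]:
        # Propagate the belief one step, then weight by the new observation.
        predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        belief = [predicted[j] * emit[j][o] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


def predict_next(obs, start, trans, emit):
    """Return the distribution over the next observed category."""
    n = len(start)
    belief = forward_filter(obs, start, trans, emit)
    next_state = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return [sum(next_state[j] * emit[j][k] for j in range(n)) for k in range(len(emit[0]))]


# Toy model (assumed): observation 0 = benign category, 1 = threat category.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.1, 0.9]]

dist = predict_next([1, 1], start, trans, emit)
print(dist.index(max(dist)))
```

With these toy parameters, a trail of two threat-category accesses makes the threat category the most likely next observation, which is the kind of signal an early warning system could act on before a URL filtering engine has classified the page.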

Citations (4)


... In direct relationship to this paper, earlier research efforts on online viewing behavior and TED Talks are not poor, as summarized in Table 1. Segev and Ahituv (2010) Dependence of online information viewing attitudes on culture and the organization of societies; differentiation of socio-political and entertainment concerns Fiksdal et al. (2014) Information saturation and fatigue as main reasons for stopping information retrieval Wook and Salim (2014) Specification requirements for visual aspects of information provision as the use of space, organization of information, and function and use of color Lee et al. (2015). ...

Reference:

Drivers of Popularity of Online Information: Content, Context and Psychological Processes
Mining Browsing Behaviors for Objectionable Content Filtering
  • Citing Article
  • May 2015

Journal of the Association for Information Science and Technology

... Preventing the display of inappropriate results while also avoiding over-filtering results that may appear as objectionable but are not, e.g., an article on breast cancer [5], requires a solution that goes beyond safe search. To account for the large variety of objectionable material present online, and inspired by prior strategies to detect objectionable resources [14,47], we treat as objectionable for children in the classroom resources that relate to any category in ObjCat: Abortion, Drugs, Hate Speech, Illegal Affairs, Gambling, Pornography, and Violence. Note that the Drugs category refers to resources over-arching drugs, but also alcohol, tobacco, and marijuana. ...

Objectionable Content Filtering by Click-Through Data
  • Citing Conference Paper
  • October 2013

... The U.S. National Counterterrorism Center counts 2750 terrorist incidents on energy infrastructure occurring between 2004 and 2011 [31]. According to [32], the sector is a prime target for cyber attacks due to its reliance on complex networks and digital technologies. Moreover, the interconnectivity of energy systems has a multiplier effect on the cybersecurity risk. ...

Context-Aware Web Security Threat Prevention
  • Citing Conference Paper
  • October 2012