ABSTRACT: The need to identify an approach that recommends items matching users' preferences within social networks has grown in tandem with the increasing number of items appearing within these networks. This research presents a novel technique for item recommendation within social networks that matches user and group interests over time. Users often tag items in social networks with words and phrases that reflect their preferred "vocabulary." As such, these tags provide succinct descriptions of the resource, implicitly reveal user preferences, and, because users' tag vocabularies tend to change over time, reflect the dynamics of those preferences. Based on an evaluation of user and group interests over time, we present a recommendation system employing a modified latent Dirichlet allocation (LDA) model in which the users and tags associated with an item are represented and clustered by topics, and the topic-based representation is combined with the item's timestamp to capture the time-based topic distribution. By representing users via topics, the model can cluster users to reveal group interests. Based on this model, we developed a recommendation system that reflects both user and group interests in a dynamic, time-aware manner, allowing it to outperform static recommendation systems in terms of precision. Index Terms—Web mining, Tagging, Recommender systems, Information analysis, Social network services.
Eighth International Conference on Information Technology: New Generations, ITNG 2011, Las Vegas, Nevada, USA, 11-13 April 2011; 01/2011
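The topic-and-time representation this abstract describes can be illustrated with a short sketch. This is purely illustrative and not the authors' implementation: the tagging history, the exponential decay rate, and the use of scikit-learn's LDA are all assumptions made for demonstration.

```python
# Illustrative sketch: topic-based, time-weighted recommendation from tags.
# Data, decay rate, and similarity choice are assumptions, not the paper's model.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical tagging history: (item_id, user_id, tags, timestamp in days).
history = [
    ("i1", "u1", ["python", "ml", "tutorial"], 10),
    ("i2", "u1", ["travel", "photo"], 2),
    ("i3", "u2", ["python", "data", "pandas"], 5),
]

# Treat each item's tags (plus the tagging user id) as one "document".
docs = [" ".join(tags + [user]) for _, user, tags, _ in history]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
item_topics = lda.fit_transform(X)          # item-by-topic distribution

# Time-aware user profile: exponentially decay older tagging events so the
# profile tracks the drift of the user's tag vocabulary.
now, decay = 12.0, 0.1
profile = np.zeros(item_topics.shape[1])
for (_, user, _, t), topics in zip(history, item_topics):
    if user == "u1":
        profile += np.exp(-decay * (now - t)) * topics
profile /= profile.sum()

# Score items by how closely their topic mix matches the current profile.
scores = item_topics @ profile
print("best match for u1:", history[int(scores.argmax())][0])
```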
ABSTRACT: The QuakeSim Project improves understanding of earthquake processes by integrating model applications and various heterogeneous data sources within a web services environment. The project focuses on the earthquake cycle and related crustal deformation. Spaceborne GPS and Interferometric Synthetic Aperture Radar (InSAR) data provide information on near-term crustal deformation, while paleoseismic geologic data provide longer-term information on earthquake fault processes. These data sources are integrated into QuakeSim's QuakeTables database and are accessible to users and to various model applications. An increasing amount of UAVSAR data is being added to the QuakeTables database through a map-browsable interface. Model applications can retrieve data from QuakeTables or from remotely served GPS velocity data services, or users can manually input parameters into the models. Pattern analysis of GPS and seismicity data has proved useful for mid-term forecasting of earthquakes and for detecting subtle changes in crustal deformation. GPS time series analysis has also proved useful for detecting changes in the processing of the data. Development of the QuakeSim computational infrastructure has benefited greatly from having the user in the development loop. Improved visualization tools enable more efficient data exploration and understanding. Tools must provide flexibility to science users for exploring data in new ways, but must also facilitate standard, intuitive, and routine uses for end users such as emergency responders.
ABSTRACT: The 13 papers in this special issue focus on knowledge and data engineering for e-learning. Some of these papers were recommended submissions from the best ranked papers presented at the Sixth International Conference on Web-Based Learning (ICWL '07), held in August 2007 in Edinburgh, United Kingdom.
IEEE Transactions on Knowledge and Data Engineering 07/2009; · 1.89 Impact Factor
ABSTRACT: E-mail is one of the most common communication methods among people on the Internet. However, the increase of e-mail misuse and abuse has resulted in a growing volume of spam e-mail over recent years. As spammers always try to find ways to evade existing spam filters, new filters need to be developed to catch spam. A statistical learning filter is at the core of many commercial anti-spam filters. It can be trained either globally for all users or personally for each user. Generally, globally trained filters outperform personally trained filters for both small and large collections of users in a real environment. However, globally trained filters sometimes ignore personal data: they cannot retain personal preferences and contexts as to whether a feature should be treated as an indicator of legitimate e-mail or spam. Gray e-mail is a message that could reasonably be considered either legitimate or spam. In this paper, a personalized ontology spam filter was implemented to make decisions for gray e-mail. By considering both global and personal ontology-based filters, we expect to show a significant improvement in overall performance in future work.
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), Honolulu, Hawaii, USA, March 9-12, 2009; 01/2009
ABSTRACT: The growth of image spam, a kind of spam in which the text message is embedded into an attached image to defeat spam filtering techniques, is becoming a major problem. For nearly a decade, content-based filtering using text classification or machine learning has been a major trend in anti-spam filtering systems. A key technique used by spammers is to embed text into images in spam e-mail. In (4), we proposed two levels of ontology spam filters: a first-level global ontology filter and a second-level user-customized ontology filter. However, that previous system handles only text e-mail, and the percentage of e-mail with attached images is increasing sharply. The contribution of this paper is to add an image e-mail handling capability to the previous anti-spam filtering system, enhancing the effectiveness of spam filtering.
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), Honolulu, Hawaii, USA, March 9-12, 2009; 01/2009
ABSTRACT: Many data representation structures, such as web site categories and domain ontologies, have been established for semantic-based information search and retrieval on the web. These structures consist of concepts and their interrelationships. Approaches to determining the semantic similarity between concepts in data representation structures have been developed to facilitate information retrieval and recommendation processes. Some approaches are suitable only for similarity computations in pure tree structures, while others, designed for directed acyclic graph (DAG) structures, incur high computational complexity for online similarity decisions. To provide efficient similarity computations for data representation structures, we propose a geometry-based solution in which similarity computations are based on geometric properties, and the online similarity computation is performed in constant time.
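The paper's specific geometric model is not reproduced here, but the pattern it describes can be sketched: assign every concept a coordinate offline, then answer any pairwise similarity query in constant time from those coordinates. The concepts, coordinates, and similarity function below are invented for illustration only.

```python
# Illustrative sketch only -- not the paper's geometric model.
# Precompute a coordinate per concept offline; each similarity query is O(1).
import math

# Hypothetical concept coordinates: (depth in the hierarchy, angular position).
coords = {
    "vehicle": (1, 0.00),
    "car":     (2, 0.10),
    "truck":   (2, 0.25),
    "flower":  (2, 2.80),
}

def similarity(a, b):
    """O(1) per query: similarity decays with the geometric distance
    between the two precomputed concept positions."""
    (da, ta), (db, tb) = coords[a], coords[b]
    dist = math.hypot(da - db, ta - tb)
    return 1.0 / (1.0 + dist)

print(similarity("car", "truck"))   # close concepts -> high similarity
print(similarity("car", "flower"))  # distant concepts -> low similarity
```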
ABSTRACT: Nowadays, computer interaction is mostly done using dedicated devices, but gestures are an easy means of expression between humans that could be used to communicate with computers in a more natural manner. Most current research on hand gesture recognition for Human-Computer Interaction relies on either Neural Networks or Hidden Markov Models (HMMs). In this paper, we compare different approaches for gesture recognition and highlight the major advantages of each. We show that gesture recognition based on the biomechanical characteristics of the hand offers an intuitive approach with greater accuracy and lower complexity.
Human-Computer Interaction. Novel Interaction Methods and Techniques, 13th International Conference, HCI International 2009, San Diego, CA, USA, July 19-24, 2009, Proceedings, Part II; 01/2009
ABSTRACT: Ontology learning integrates many complementary techniques, including machine learning, natural language processing, and data mining. Specifically, clustering techniques facilitate the building of interrelationships between terms by exploiting similarities of concepts. With the rapid growth of the Web, online information has become one of the major information sources. The ontology learning process involving traditional clustering algorithms tends to be slow and computationally expensive when the dataset is as large as the Web. To address this problem, we present an efficient concept clustering technique for ontology learning that reduces the number of required pairwise term similarity computations without a loss of quality. Our approach is to identify relevant terms using a computationally inexpensive similarity metric based on an event life cycle in online news articles, and then to perform more sophisticated similarity computations only on those terms. Hence, we can build clusters with high precision/recall and high speed. Without a loss of clustering quality, our framework reduces the number of required computations from O(N^2) to O(N + L^2) (L ≪ N), where N is the number of candidate concepts. Our experimental results show that clustering based on our similarity framework can construct concept clusters 1541.07% faster than clustering with all term-pair similarity computations.
Proceedings of the 2008 ACM Symposium on Applied Computing (SAC), Fortaleza, Ceara, Brazil, March 16-20, 2008; 01/2008
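The two-stage reduction described in the abstract above (a cheap filter followed by expensive pairwise similarity) can be sketched as follows. The terms, the cheap "event life cycle" feature, the cosine metric, and the thresholds are invented for illustration and are not the paper's actual metrics.

```python
# Minimal sketch of the two-stage pattern (assumed details): a cheap O(N)
# filter keeps only L candidate terms, and the expensive pairwise similarity
# runs only among those L terms, giving O(N + L^2) instead of O(N^2).
from itertools import combinations

# Hypothetical terms with a cheap feature (e.g. peak day of the term's
# life cycle in a news stream) and an expensive feature vector.
terms = {
    "earthquake": (3, [1.0, 0.9, 0.0]),
    "tsunami":    (3, [0.9, 1.0, 0.1]),
    "election":   (9, [0.0, 0.1, 1.0]),
    "recipe":     (7, [0.2, 0.0, 0.3]),
}

TARGET_DAY = 3            # cheap O(N) pass: keep terms peaking near this day
candidates = [t for t, (day, _) in terms.items() if abs(day - TARGET_DAY) <= 1]

def cosine(u, v):         # the "expensive" similarity, run only on L terms
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

# O(L^2) pairwise computations over the surviving candidates only.
for a, b in combinations(candidates, 2):
    print(a, b, round(cosine(terms[a][1], terms[b][1]), 3))
```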
ABSTRACT: This paper presents an analysis of the correlation between annotated information unit (textual) tags and geographical identification metadata (geotags). To make it possible to use geotagging in analysis together with tagging, we prove that there is a strong correlation between tagging and geotagging information. Our approach uses tag similarity and a newly employed geographical distribution similarity to determine inter-relationships among tags and geotags. Our initial experiments show that a power law holds between tag similarity and geographical distribution similarity; they are strongly correlated, and the correlation can be used to find more relevant tags in the tag space. The power law, which is any polynomial relationship that exhibits the property of scale invariance, confirms that the relationship between tagging and geotagging exists and scales with the number of tags and geotags.
Proceedings of the Second International Conference on Weblogs and Social Media, ICWSM 2008, Seattle, Washington, USA, March 30 - April 2, 2008; 01/2008
ABSTRACT: This paper presents an analysis of the correlation between annotated information unit (textual) tags and geographical identification metadata (geotags). Despite the increased usage of geotagging in collaborative tagging systems, most current research focuses on textual tagging alone in solving the tag search problem. This may result in difficulty in searching for precise and relevant information within the given tag space; for example, inconsistencies such as polysemy, synonyms, and word inflections with plural forms complicate the tag search problem. Therefore, more work needs to be done to include geotag information alongside existing tagging information for analysis. In this paper, to make it possible to use geotagging in analysis together with tagging, we prove that there is a strong correlation between tagging and geotagging information. Our approach uses tag similarity and geographical distribution similarity to determine inter-relationships among tags and geotags. Our initial experiments show that a power law holds between tag similarity and geographical distribution similarity: the two measures have a strong correlation, and the correlation can be used to find more relevant tags in the tag space. The power law confirms that the relationship between tagging and geotagging exists and scales with the number of tags and geotags. Also, using both geotagging and tagging information instead of tagging alone, we show that the uncertainty between derived and actual similarities among tags is reduced.
Proceeding of the 2008 ACM Workshop on Search in Social Media, SSM 2008, Napa Valley, California, USA, October 30, 2008; 01/2008
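The analysis pattern described in the two abstracts above can be sketched briefly: compute a textual tag similarity and a geographical distribution similarity for tag pairs, then test for a power-law relation by fitting a line in log-log space. The per-tag vectors and the use of cosine similarity below are assumptions for demonstration only.

```python
# Sketch of the tag/geotag correlation analysis (assumed data and metrics).
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical per-tag vectors: co-occurrence counts over items, and
# photo counts over geographic regions.
tag_vecs = {"beach": [5, 0, 3], "sea": [4, 1, 3], "museum": [0, 6, 1]}
geo_vecs = {"beach": [9, 1, 0], "sea": [8, 2, 0], "museum": [1, 9, 2]}

pairs = [("beach", "sea"), ("beach", "museum"), ("sea", "museum")]
tag_sim = np.array([cosine(tag_vecs[a], tag_vecs[b]) for a, b in pairs])
geo_sim = np.array([cosine(geo_vecs[a], geo_vecs[b]) for a, b in pairs])

# A power law geo_sim ~ c * tag_sim**k appears as a line in log-log space.
k, log_c = np.polyfit(np.log(tag_sim), np.log(geo_sim), 1)
print("exponent k =", round(k, 3), "  c =", round(np.exp(log_c), 3))
```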
ABSTRACT: Over the last decade, Web and multimedia data have grown at a staggering rate, and users of new media now have great expectations of what they can see on the Web. Most information retrieval systems, including Web search engines, use similarity ranking algorithms based on a vector space model to find relevant information in response to a user's request. However, the retrieved information is frequently irrelevant, because most current information systems employ index terms or other techniques that are variants of term frequency. This paper proposed a new approach, named "dynamic multimedia presentations with a Generality Model," to offer a customized multi-modal presentation for an intended audience. Moreover, we proposed a new criterion, "generality," that provides an additional basis on which to rank retrieved documents. To support multi-modal presentation, our proposed story model creates story structures that can be dynamically instantiated for different user requests from various multi-modal elements. Generality is a level of abstraction that allows results to be retrieved at the degree of generality appropriate for a user's knowledge and interests. We compared traditional web news search functions and our story model using a usability test; the results show that our multimedia presentation methodology is significantly better than the current search functions. We also compared our generality quantification algorithm with human judges' weighting of values and show that the two are significantly correlated.
SOFSEM 2008: Theory and Practice of Computer Science, 34th Conference on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, January 19-25, 2008, Proceedings; 01/2008
ABSTRACT: We are developing simulation and analysis tools in order to develop a solid Earth science framework for understanding and studying active tectonic and earthquake processes. The goal of QuakeSim and its extension, the Solid Earth Research Virtual Observatory (SERVO), is to study the physics of earthquakes using state-of-the-art modeling, data manipulation, and pattern recognition technologies. We are developing clearly defined, accessible data formats and code protocols as inputs to simulations, which are adapted to high-performance computers. The solid Earth system is extremely complex and nonlinear, resulting in computationally intensive problems with millions of unknowns. With these tools it will be possible to construct the more complex models and simulations necessary to develop hazard assessment systems critical for reducing future losses from major earthquakes. We are using Web (Grid) service technology to demonstrate the assimilation of multiple distributed data sources (a typical data grid problem) into a major parallel high-performance computing earthquake forecasting code. Such a linkage of Geoinformatics with Geocomplexity demonstrates the value of the SERVO Grid concept and advances Grid technology by building the first real-time large-scale data assimilation grid.
ABSTRACT: Email has become one of the fastest and most economical forms of communication. However, the increase of email users has resulted in a dramatic increase of spam emails during the past few years. In this paper, email data was classified using four different classifiers (a Neural Network, an SVM classifier, a Naïve Bayesian classifier, and a J48 classifier). The experiment was performed with different data sizes and different feature sizes. The final classification result is '1' if the message is ultimately spam and '0' otherwise. This paper shows that the simple J48 classifier, which builds a binary tree, can be efficient for datasets that can be classified as a binary tree.
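A hedged sketch of this kind of comparison is shown below, using scikit-learn stand-ins for the four classifiers (J48 is approximated by an entropy-based decision tree) and a synthetic dataset; the features, parameters, and data are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative classifier comparison on synthetic "email" feature vectors
# labelled 1 = spam, 0 = legitimate.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

classifiers = {
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "SVM":            SVC(),
    "Naive Bayes":    GaussianNB(),
    "J48 (approx.)":  DecisionTreeClassifier(criterion="entropy", random_state=0),
}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    print(f"{name:15s} accuracy = {clf.score(X_te, y_te):.3f}")
```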
ABSTRACT: Radio Frequency Identification (RFID) is an emerging technique that can significantly enhance supply chain processes and deliver customer service improvements. RFID provides users with efficient tracking of the flow of products throughout the wholesale process. However, the large amount of information generated by such a process makes it difficult to extract and analyze useful information. In this paper, we propose a method to mine these large data sets that yields a smaller and more relevant search space compared to the original data sets. Our work is constructed from the following approaches: ontology-driven rule generalization, which concentrates on controlling the level of items, and rule categorization using hierarchical association rule clustering, which groups the rules generated from the given problem space into a hierarchical search space. The detailed steps for rule generalization based on ontologies are presented, and the algorithm for rule categorization using hierarchical association rule clustering is developed. Our experiments demonstrate the feasibility of our work, showing a significant reduction of the search space by decreasing the number of rules to be examined and increasing the relevance among the rules.
Data Engineering Workshop, 2007 IEEE 23rd International Conference on; 05/2007
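The generalization step described in this abstract can be illustrated with a minimal sketch: raising transaction items to parent concepts through an ontology shrinks the rule search space before association rules are mined. The ontology, transactions, and support threshold below are invented for demonstration and do not reproduce the paper's algorithm.

```python
# Illustrative sketch of ontology-driven rule generalization (assumed data).
from itertools import combinations
from collections import Counter

# Hypothetical product ontology: specific item -> parent concept.
ontology = {
    "cola": "beverage", "juice": "beverage",
    "chips": "snack", "pretzels": "snack",
    "tag_A12": "rfid_tag", "tag_B07": "rfid_tag",
}

transactions = [
    {"cola", "chips", "tag_A12"},
    {"juice", "pretzels", "tag_B07"},
    {"cola", "pretzels", "tag_A12"},
]

# Generalize each transaction to the chosen ontology level.
generalized = [{ontology.get(i, i) for i in t} for t in transactions]

# Count frequent concept pairs; each frequent pair yields candidate rules.
pair_counts = Counter(p for t in generalized for p in combinations(sorted(t), 2))
min_support = 2
for (a, b), n in pair_counts.items():
    if n >= min_support:
        print(f"{a} <-> {b}  (support {n}/{len(transactions)})")
```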
ABSTRACT: Email has become one of the fastest and most economical forms of communication. However, the increase of email users has resulted in a dramatic increase of spam emails. As spammers always try to find ways to evade existing filters, new filters need to be developed to catch spam. Ontologies allow for machine-understandable semantics of data, and sharing such information is important for more effective spam filtering. Thus, it is necessary to build an ontology and a framework for efficient email filtering. Using an ontology specially designed to filter spam, a large amount of unsolicited bulk email can be filtered out by the system. This paper proposes an efficient spam email filtering method using an adaptive ontology.
Fourth International Conference on Information Technology: New Generations (ITNG 2007), 2-4 April 2007, Las Vegas, Nevada, USA; 01/2007
ABSTRACT: Buyers and sellers in online auctions are faced with the task of deciding whom to entrust their business to based on a very limited amount of information. Current trust ratings on eBay average over 99% positive (13) and are presented as a single number on a user's profile. This paper presents a system capable of extracting valuable negative information from the wealth of feedback comments on eBay, computing personalized and feature-based trust, and presenting this information graphically.
IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-12, 2007; 01/2007
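The core idea can be sketched in a few lines: mine negative cues from free-text feedback and aggregate them into per-feature trust scores rather than a single overall percentage. The keyword lexicons and comments below are invented for illustration and are not the paper's extraction method.

```python
# Minimal sketch of feature-based trust from feedback comments (assumed lexicons).
FEATURES = {
    "shipping":      ["shipping", "delivery", "arrived"],
    "item quality":  ["item", "quality", "condition"],
    "communication": ["communication", "response", "contact"],
}
NEGATIVE_CUES = ["slow", "late", "broken", "never", "poor", "damaged"]

comments = [
    "great item, fast shipping",
    "item arrived damaged, poor packaging",
    "slow delivery but good communication",
]

for feature, keywords in FEATURES.items():
    mentions = [c for c in comments if any(k in c for k in keywords)]
    negatives = [c for c in mentions if any(n in c for n in NEGATIVE_CUES)]
    trust = 1 - len(negatives) / len(mentions) if mentions else None
    print(f"{feature:15s} mentions={len(mentions)} trust={trust}")
```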
ABSTRACT: We describe the goals and initial implementation of the International Solid Earth Virtual Observatory (iSERVO). This system is built using a Web Services approach to Grid computing infrastructure and is accessed via a component-based Web portal user interface. We describe our implementations of services used by this system, including Geographical Information System (GIS)-based data grid services for accessing remote data repositories and job management services for controlling multiple execution steps. iSERVO is an example of a larger trend to build globally scalable scientific computing infrastructures using the Service Oriented Architecture approach. Adoption of this approach raises a number of research challenges in millisecond-latency message systems suitable for internet-enabled scientific applications. We review our research in these areas.
Pure and Applied Geophysics 11/2006; 163(11):2281-2296. · 1.62 Impact Factor