Byeong-Soo Jeong
Kyung Hee University · Biomedical Engineering
About
101 Publications
23,584 Reads
2,739 Citations
Publications (101)
Frequent Induced Subgraph Mining (FISM) is an active research direction in application domains such as biological, chemical, and social networks. A number of FISM approaches have been proposed over the years. However, existing methods have long execution times because they perform numerous subgraph isomorphism (SI) operations,...
Opinion or sentiment analysis has emerged to extract useful information from large amounts of unstructured text data, in the form of customer reviews on different products and their features or online SNS data. Customer reviews are not only helpful for potential customers, but they are also helpful for the manufacturers of the products to raise the...
The recent emergence of body sensor networks (BSNs) has made it easy to continuously collect and process various health-oriented data related to temporal, spatial and vital sign monitoring of a patient. As such, discovering or mining interesting knowledge from the BSN data stream is becoming an important issue to promote and assist important decisi...
The recent emergence of body sensor networks (BSNs) has made it easy to continuously collect and process various health-oriented data related to temporal, spatial and vital sign monitoring of a patient. As such, discovering or mining interesting knowledge from the BSN data stream is becoming an important issue to promote and assist important decision...
Due to their popularity and widespread use, blogs have become an important medium through which many people communicate and exchange information on the World Wide Web (WWW). The blogosphere has provided many opportunities for individuals and companies to establish new business models that investigate social relationships. In Korea, there are many b...
Data clustering is considered one of the most important techniques for unsupervised learning in diverse applications. Gene clustering finds groups of similarly expressed genes in large microarray datasets. Meanwhile, recent developments in microarray technology generate a very large amount of microarray data at low cost and...
Multilevel knowledge in transactional databases plays a significant role in our real-life market basket analysis. Many researchers have mined the hierarchical association rules and thus proposed various approaches. However, some of the existing approaches produce many multilevel and cross-level association rules that fail to convey quality informat...
The emergence of large real life networks such as social networks, web page links, and traffic networks exhibits complex graph structures with millions of vertices and edges. Among many operations for exploiting these graphs, the shortest path discovery is a major and expensive one. Besides the in-memory approaches, many efficient shortest path com...
Flash memory has unique characteristics: the write operation is much more costly than the read operation, and in-place updating is not allowed. In a flash memory environment, hot data clustering methods have been proposed to reduce the cost of copying valid pages during an erase operation. They try to store data with high write frequenc...
Microarray data analysis has been widely used for extracting relevant biological information from thousands of genes simultaneously expressed in a specific cell. Although many genes are expressed in a sample tissue, most of these are irrelevant or insignificant for clinical diagnosis or disease classification because of missing values and noises. T...
A function named Wake on WLAN (WOW) has recently captured researchers' attention as a remote computer administration function that can turn on a remote computer system through the network connection when it receives a specially coded packet. The phenomenon comes from the physiognomies of the coded packet su...
Splice site prediction in the pre-mRNA is a very important task for understanding gene structure and its function. To predict splice sites, SVM (support vector machine)-based classification technique is frequently used because of its classification accuracy. High performance of SVM largely depends on DNA encoding method. However, existing encodi...
In this paper, we propose a parameter-insensitive data partitioning approach for Chameleon, a hierarchical clustering algorithm. We first show that the quality of clusters produced by Chameleon is significantly affected by the sizes of initial sub-clusters and also that it is mainly because Chameleon recursively splits a dataset into two equal-size...
Effective representation of DNA sequences is one of the important tasks in the study of genome sequences. In this paper, we propose a graphical representation of DNA sequences based on nucleotide ring structure. In the proposed representation, we convert DNA sequences into 16 dinucleotides on the surface of the hexagon so that it can preserve nucle...
The goal of analyzing a time series database is to find whether and how frequently a periodic pattern is repeated within the series. Periodic pattern mining is the problem that regards temporal regularity. However, most of the existing algorithms have a major limitation in mining interesting patterns of users' interest, that is, they can mine patterns...
Splice site prediction in the pre-mRNA is a very important task for understanding gene structure and its function. To predict splice sites, SVM (support vector machine) based classification technique is frequently used because of its classification accuracy. High classification accuracy of SVM largely depends on DNA encoding method for feature extr...
Mining combined association rules with correlation and market basket analysis can discover customers' purchase rules along with frequently correlated, associated-correlated, and independent patterns synchronously, which are extraordinarily useful for making everyday business decisions. However, due to the main memory bottleneck in single comp...
High utility pattern (HUP) mining over data streams has become a challenging research issue in data mining. When a data stream flows through, the old information may not be interesting in the current time period. Therefore, incremental HUP mining is necessary over data streams. Even though some methods have been proposed to discover recent HUPs by...
Splice site prediction in DNA sequence is a basic search problem for finding exon/intron and intron/exon boundaries. Removing introns and then joining the exons together forms the mRNA sequence. These sequences are the input of the translation process. It is a necessary step in the central dogma of molecular biology. The main task of splice site pr...
A blogosphere is a representative online social network established through blog users and their relationships. Understanding information diffusion is very important in developing successful business strategies for a blogosphere. In this paper, we discuss how to predict information diffusion in a blogosphere. Documents diffused over a blogosphere d...
Weighted frequent pattern (WFP) mining is more practical than frequent pattern mining because it can consider different semantic significance (weight) of the items. For this reason, WFP mining becomes an important research issue in data mining and knowledge discovery. However, existing algorithms cannot be applied for incremental and interactive WF...
Microarray gene expression techniques and tools have become substantially important and are widely used in protein-protein interaction (PPI) and gene regulation network (GRN) research in recent years, since they can capture the expressions of thousands of genes in a single experiment. Such datasets pose a great challenge for finding associ...
Market basket analysis techniques are substantially important to everyday business decisions because of their capability to extract customers' purchase rules by discovering what items they are buying frequently and together. However, traditional single-processor, main-memory-based computing is not capable of handling ever increasing large tra...
Contemporary web browsers do not provide customized recommendations for users; rather, they offer some suggestions based on cookies or browsing history after content filtering. Usually, most users provide some keywords to search the contents inside their preferred websites, and based on these keywords web servers provide the contents. So, it w...
Market basket analysis is very important to everyday business decisions because it seeks to find relationships between purchased items. Undoubtedly, these techniques can extract customers' purchase rules by discovering what items they are buying frequently and together. Therefore, to raise the probability of purchasing the corporate manager of a...
Contemporary web browsers do not provide customized recommendations for users; rather, they offer some suggestions based on cookies or browsing history after content filtering. Usually, most users provide some keywords to search the contents inside their preferred websites, and based on these keywords web servers provide the contents. So, it w...
Finding interesting patterns plays an important role in several data mining applications, such as market basket analysis, medical data analysis, and others. The occurrence frequency of patterns has been regarded as an important criterion for measuring interestingness of a pattern in several applications. However, temporal regularity of patterns can...
The problem of finding frequent patterns has long been studied because it is essential to data mining tasks such as association rule analysis, clustering, and classification analysis. Privacy-preserving data mining is another important issue for this domain, since most users do not want their private information to leak out. In this paper, we propo...
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still...
Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in fi...
Current DNA sequence datasets have become extremely large, making it a great challenge for single-processor and main-memory-based computing systems to mine interesting patterns. Such limited hardware resources make the performance of most Apriori-like algorithms inefficient. However, recent implementation of a MapReduce framework has overcome these...
Data mining is a relatively new and promising field of computer science. It is used for extracting valuable information or knowledge from large databases. Data mining requires searching for frequent patterns in large databases. Frequent substructure mining is also known as graph mining. Some of the graph mining algorithms were Apriori based and p...
Market basket analysis techniques are useful for extracting customers' purchase behaviors or rules by discovering what items they buy together using association rules and correlation. Associated and correlated items are placed on neighboring shelves to raise their purchasing probability in a super shop. Therefore, the mining combined associat...
Market basket analysis techniques are useful for extracting customers' purchase behaviors or rules by discovering what items they buy together using association rules and correlation. Associated and correlated items are placed on neighboring shelves to raise their purchasing probability in a super shop. Therefore, the mining combined associat...
In this paper, a position-based method is proposed for frequent contiguous DNA pattern mining by fast joining the same length patterns to generate next length ones as well as scanning the database only once. At first, we combine the position information of fixed-length patterns to generate a fixed-length spanning tree to mine frequent fixed-length...
High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be...
Traditional frequent pattern mining methods consider an equal profit/weight for all items and only binary occurrences (0/1) of the items in transactions. High utility pattern mining becomes a very important research issue in data mining by considering the non-binary frequency values of items in transactions and different profit values for each item...
Due to their popularity and widespread use, blogs have become an important medium through which to communicate and exchange information on the World Wide Web. The advent of the blogosphere may provide opportunities for establishing a new business model that investigates social relationships. In Korea, there are many blogospheres that appear to main...
Mining web access sequences (WASs) can discover very useful knowledge from web logs with broad applications. By considering non-binary occurrences of web pages as internal utilities in WASs, e.g., time spent by each user in a web page, more realistic information can be extracted. However, the existing utility-based approach has many limitations suc...
Mining sequential patterns is an important research issue in data mining and knowledge discovery with broad applications. However, the existing sequential pattern mining approaches consider only binary frequency values of items in sequences and equal importance/significance values of distinct items. Therefore, they are not applicable to actually re...
Traditional frequent pattern mining considers equal profit/weight value of every item. Weighted Frequent Pattern (WFP) mining becomes an important research issue in data mining and knowledge discovery by considering different weights for different items. Existing algorithms in this area are based on fixed weight. But in our real world scenarios the...
Mining web access sequences can discover very useful knowledge from web logs with broad applications. By considering non-binary occurrences of web pages as internal utilities in web access sequences, e.g., time spent by each user in a web page, more realistic information can be extracted. However, the existing utility-based approach has many limita...
High utility pattern (HUP) mining over data streams has become a challenging research issue in data mining. The existing sliding window-based HUP mining algorithms over stream data suffer from the level-wise candidate generation-and-test problem. Therefore, they need a large amount of execution time and memory. Moreover, their data structures are n...
Discovering interesting patterns from high-speed data streams is a challenging problem in data mining. Recently, support metric-based frequent pattern mining from data streams has received great attention. However, the occurrence frequency of a pattern may not be an appropriate criterion for discovering meaningful patterns. Temporal regularity...
Recently proposed regular pattern mining provides an effective technique to find patterns occurring at regular interval in a static database. However, the occurrence characteristic of patterns may change significantly with the update of database. Therefore, this paper proposes the Incremental Regular Pattern Tree (IncRT) and a pattern growth mining...
The share measure of item sets has been proposed to discover useful knowledge about numerical values associated with items in a transaction database. Therefore, share-frequent pattern mining problem becomes a very important research issue in data mining. However, the existing algorithms of share-frequent pattern mining are based on static databases...
Finding frequent patterns in a continuous stream of transactions is critical for many applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. Even though numerous frequent pattern mining algorithms have been developed over the past decade, new solutions for handling stream data are still...
Traditional frequent pattern mining algorithms do not consider different semantic significances (weights) of the items. By considering different weights of the items, weighted frequent pattern (WFP) mining becomes an important research issue in data mining and knowledge discovery area. However, the existing state-of-the-art WFP mining algorithms co...
With advances in technology, use of wireless sensor networks (WSNs) has widely increased in recent decades. In general, WSNs produce a large amount of data in the form of streams. Recently, data-mining techniques have received a great deal of attention for their utility in extracting knowledge from WSN data. Mining association rules on the sensor...
Since mining frequent patterns from transactional databases involves an exponential mining space and generates a huge number of patterns, efficient discovery of user-interest-based frequent pattern set becomes the first priority for a mining algorithm. In many real-world scenarios it is often sufficient to mine a small interesting representative su...
Mining useful Web path traversal patterns is a very important research issue in Web technologies. Knowledge about the frequent Web path traversal patterns enables us to discover the most interesting Websites traversed by the users. However, considering only the binary (presence/absence) occurrences of the Websites in the Web traversal paths, real w...
Wireless sensor networks (WSNs) produce large volumes of data in the form of streams. Recently, data mining techniques have received a great deal of attention for extracting knowledge from WSN data. Mining association rules on the sensor data provides useful information for different applications. Even though there have been some efforts to address t...
The FP-growth algorithm using the FP-tree has been widely studied for frequent pattern mining because it can dramatically improve performance compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which are not consistent with efficient data stream processing. In this paper, we present a no...
Frequent pattern mining techniques treat all items in the database equally by taking into consideration only the presence of an item within a transaction. However, the customer may purchase more than one of the same item, and the unit price may vary among items. High utility pattern mining approaches have been proposed to overcome this problem. As...
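As a rough illustration of the idea in the abstract above, the sketch below computes the utility of an itemset as purchased quantity times unit profit, summed over the transactions that contain every item of the itemset. This is the commonly used textbook definition, not necessarily the method of the paper; the toy data, the names `itemset_utility` and `is_high_utility`, and the threshold values are all hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions: List[Dict[str, int]] = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "butter": 3},
    {"bread": 4, "milk": 2, "butter": 1},
]

# Hypothetical unit profit per item.
unit_profit: Dict[str, float] = {"bread": 1.0, "milk": 2.0, "butter": 5.0}


def itemset_utility(itemset: Iterable[str],
                    transactions: List[Dict[str, int]],
                    unit_profit: Dict[str, float]) -> float:
    """Sum quantity * unit profit over transactions containing the whole itemset."""
    items = list(itemset)
    total = 0.0
    for t in transactions:
        # Only transactions that contain every item of the itemset contribute.
        if all(item in t for item in items):
            total += sum(t[item] * unit_profit[item] for item in items)
    return total


def is_high_utility(itemset: Iterable[str],
                    transactions: List[Dict[str, int]],
                    unit_profit: Dict[str, float],
                    min_utility: float) -> bool:
    """An itemset is 'high utility' if its utility reaches the user threshold."""
    return itemset_utility(itemset, transactions, unit_profit) >= min_utility
```

Note that "bread" occurs in every transaction yet has lower utility than "butter", which occurs in only two; this is the gap between frequency and utility that the abstract points to.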
Recently, a significant number of parallel and distributed algorithms have been proposed to mine frequent patterns (FP) from large and/or distributed databases. Among them parallelization of the FP-growth algorithms using the FP-tree has been proved to be highly efficient. However, the FP-tree-based techniques suffer from two major limitations such...
By considering different weights of the items, weighted frequent pattern (WFP) mining can discover more important knowledge compared to traditional frequent pattern mining. Therefore, WFP mining becomes an important research issue in data mining and knowledge discovery area. However, the existing algorithms cannot be applied for stream data mining b...
Mining frequent patterns (FP) from large-scale databases has emerged as an important problem in the data mining and knowledge discovery research community. A significant number of parallel and distributed FP mining algorithms have been proposed, when the database is large and/or distributed. Among them, parallelization of the FP-growth algorithm us...
By considering different weights of the items, weighted frequent pattern (WFP) mining becomes an important research issue in data mining and knowledge discovery. However, existing algorithms cannot be applied for incremental and interactive WFP mining because they are based on a static database and require multiple database scans. In this paper, we...
Mining weighted interesting patterns (WIP) [5] is an important research issue in data mining and knowledge discovery with broad applications. WIP can detect correlated patterns with a strong weight and/or support affinity. However, it still requires two database scans which are not applicable for efficient processing of the real-time data like data...
Existing weighted frequent pattern (WFP) mining algorithms assume that each item has fixed weight. But in our real world scenarios the weight (price or significance) of an item can vary with time. Reflecting such change of weight of an item is very necessary in several mining applications such as retail market data analysis and web click stream ana...
Temporal regularity of pattern appearance can be regarded as an important criterion for measuring the interestingness in several applications like market basket analysis, web administration, gene data analysis, network monitoring, and stock market. Even though there have been some efforts to discover periodic patterns in time-series and sequential...
Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. But in real world scenarios, the weight (price or significance) of an item can vary with time. Re...
The frequency of a pattern may not be a sufficient criterion for identifying meaningful patterns in a database. The temporal regularity of a pattern can be another key criterion for assessing the importance of a pattern in several applications. A pattern is said to be regular if it appears at a regular user-defined interval in the database. Even thou...
This paper proposes a prefix-tree structure, called CPS-tree (Compact Pattern Stream tree), that efficiently discovers the exact set of recent frequent patterns from a high-speed data stream. The CPS-tree introduces the concept of dynamic tree restructuring in handling stream data that allows it to achieve highly compact frequency-descending...
Interpreting legacy XML documents is a great challenge for realizing the vision of the semantic Web (SW). This paper presents an algorithm to automatically transform XML data into RDF, the foundation language of the SW. Our approach maps element definitions stored in XML schema to RDF schema ontology, where the ontology is used to describe the meanin...
The FP-growth algorithm using the FP-tree has been widely studied for frequent pattern mining because it can give a great performance improvement compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which are not applicable to processing data streams. In this paper, we present a novel tree structu...
Content adaptation is necessary for effective and efficient sharing of files. Nowadays, people use devices that vary in their configuration. Moreover, each user may have their own preference whenever they want to share files. To address device heterogeneity as well as user preference, we need to adapt the file at the time of shar...
Sharing files is very common in a collaborative environment. Users may want to share each other's files for more effective and meaningful collaboration. Sometimes it would be preferable to adapt the file so that it can provide the required information to users with minimal overhead. Moreover, users may not want to share files in their original format....
Radio Frequency Identification (RFID) technology is an excellent substitute for barcodes in industry. However, the management of a large amount of RFID data, together with complicated relationships between data, in the context of responding to different kinds of queries is not well supported by traditional databases. Therefore, 1) an event-based mod...
Share-frequent pattern mining discovers more useful and realistic knowledge from databases compared to traditional frequent pattern mining by considering the non-binary frequency values of items in transactions. Therefore, share-frequent pattern mining has recently become a very important research issue in data mining and knowledge disc...
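A minimal sketch of the regularity criterion described above, assuming the usual formulation in which a pattern's regularity is the largest gap between consecutive transactions that contain it. The handling of the database boundaries (0 and the database size) and the names `max_gap` and `is_regular` are assumptions made for illustration, since the abstract is truncated before any formal definition.

```python
from typing import List


def max_gap(tids: List[int], db_size: int) -> int:
    """Largest interval between consecutive occurrences of a pattern.

    tids are the 1-based IDs of transactions containing the pattern.
    Assumption: the gaps before the first occurrence and after the last
    one are measured against the boundaries 0 and db_size.
    """
    points = [0] + sorted(tids) + [db_size]
    return max(b - a for a, b in zip(points, points[1:]))


def is_regular(tids: List[int], db_size: int, max_interval: int) -> bool:
    """A pattern is regular if it occurs and never stays absent for
    longer than the user-defined interval."""
    return bool(tids) and max_gap(tids, db_size) <= max_interval
```

For example, a pattern occurring in transactions 1, 3, 4, 6 and 8 of a 9-transaction database has a largest gap of 2, so it counts as regular for any user-defined interval of 2 or more. A pattern occurring only in transactions 1 and 8 has a gap of 7 and would be rejected for small intervals.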
In this paper, we investigate the critical low coverage problem of position aware localized efficient broadcast in mobile ad hoc ubiquitous sensor networks and propose a generic framework for it. The framework is to determine a small subset of nodes and minimum transmission radiuses based on snapshots of network state (local views) along the broadc...
Wireless sensor networks have been used more and more widely with the development of related techniques in telecommunication and computer science. However, sensor nodes in wireless sensor networks have very limited memory space and power. In this paper, we propose a new query aggregation method to preprocess the query predicates. The size of the re...
Transforming the XML data on the current Web into sources that can be used by the Semantic Web has received great attention in recent years. In this paper, we propose a procedure for transforming valid XML documents into RDF by using vocabularies of RDF schema. The first objective here is to obtain classes and properties from labels in XML document e...
Subsequence matching is an operation that finds subsequences whose changing patterns are similar to a given query sequence from time-series databases. This paper identifies a performance bottleneck in subsequence matching, and then proposes an effective method that substantially improves the performance of entire subsequence matching by resolving t...
Wireless sensor networks have been widely used in many fields with the development of related techniques. But there are many problems in traditional single-sink sensor networks. The energy of the sensors near the sink or on the critical paths is consumed too quickly, causing unbalanced energy consumption. The routing algorithms mainly focus on the ne...
With the development of related techniques in telecommunication and computer science, wireless sensor networks have been used more and more widely. However, sensor nodes in wireless sensor networks have very limited memory space and power. In this paper, we propose a new method to preprocess the query predicates. The size of the relational table ca...
In ad hoc networks, disconnections occur frequently. In this paper, we allocate the data replicas according to time and space. In temporal method, we store the original data, the median data in a specific time period and the replica with the second highest frequency among all the other data on the mobile hosts to improve data accessibility. In spat...
Wireless sensor networks have been widely used in many fields with the development of related techniques. But there are many problems in traditional single-sink sensor networks. The energy of the sensors near the sink or on the critical paths is consumed too quickly, causing unbalanced energy consumption. The routing algorithms mainly focus on the ne...
In recent years, computing has become more mobile and pervasive; these changes imply that applications and services must be aware of and adapt to their changing contexts in highly dynamic environments. To allow interoperability in a context-aware computing environment (e.g. smart meeting space), it is necessary that the context terminology will be com...