## About

121 Publications

10,384 Reads

3,637 Citations

## Publications

Publications (121)

How can we classify graph-structured data using only positive labels? Graph-based positive-unlabeled (PU) learning trains a binary classifier given only positive labels when the relationships between examples are given as a graph. The problem is of great importance for various tasks such as detecting malicious accounts in a social network, wh...

Given a graph with partial observations of node features, how can we estimate the missing features accurately? Feature estimation is a crucial problem for analyzing real-world graphs whose features are commonly missing during the data collection process. Accurate estimation not only provides diverse information about nodes but also supports the infere...

Given a pre-trained BERT, how can we compress it into a fast and lightweight model while maintaining its accuracy? Pre-trained language models such as BERT are effective for improving the performance of natural language processing (NLP) tasks. However, heavy models like BERT suffer from large memory costs and long inference times. In this paper, we...

How can we accurately and efficiently decompose a tensor stream? Tensor decomposition is a crucial task in a wide range of applications and plays a significant role in latent feature extraction and estimation of unobserved entries of data. The problem of efficiently decomposing tensor streams has been of great interest because many real-world data...

Given a sparse time-evolving tensor, how can we effectively factorize it to accurately discover latent patterns? Tensor decomposition has been extensively utilized for analyzing various multi-dimensional real-world data. However, existing tensor decomposition models have disregarded the temporal property for tensor decomposition while most real-wor...

Given an irregular dense tensor, how can we efficiently analyze it? An irregular tensor is a collection of matrices that have the same number of columns but different numbers of rows. PARAFAC2 decomposition is a fundamental tool for handling irregular tensors in applications including phenotype discovery and trend analysis. Although s...

What are the key structures existing in a large real-world MMORPG (Massively Multiplayer Online Role-Playing Game) graph? How can we compactly summarize an MMORPG graph with hierarchical node labels, considering substructures at different levels of hierarchy? Recent MMORPGs generate complex interactions between entities inducing a heterogeneous gra...

How can we model node representations to accurately infer the signs of missing edges in a signed social graph? Signed social graphs have attracted considerable attention as a way to model trust relationships between people. Various representation learning methods such as network embedding and graph convolutional networks (GCNs) have been proposed to analyze s...

Given a graph dataset, how can we augment it for accurate graph classification? Graph augmentation is an essential strategy to improve the performance of graph-based tasks, and has been widely utilized for analyzing web and social graphs. However, previous works for graph augmentation either a) involve the target model in the process of augmentatio...

Knowledge Distillation (KD) is one of the most widely known methods for model compression. In essence, KD trains a smaller student model based on a larger teacher model and tries to retain the teacher model’s level of performance as much as possible. However, existing KD methods suffer from the following limitations. First, since the student model is sm...
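The abstract describes KD only at a high level. As a generic illustration, here is a minimal sketch of the classic soft-target distillation loss (Hinton et al.) for a single example, in plain Python. This is not the method proposed in the publication; `temperature` and `alpha` are assumed hyperparameters, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Soft-target distillation loss for one example (generic sketch)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions, scaled by T^2
    # so its gradient magnitude stays comparable across temperatures.
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    ) * temperature ** 2
    # Ordinary cross-entropy against the hard label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard
```

When the student's logits match the teacher's, the soft term vanishes and only the hard-label cross-entropy (weighted by `1 - alpha`) remains.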

Graphs are widely used for representing pairwise interactions in complex systems. Since such real-world graphs are large and often evergrowing, sampling a small representative subgraph is indispensable for various purposes: simulation, visualization, stream processing, representation learning, crawling, to name a few. However, many complex systems...

How can we transfer knowledge from a source domain to a target domain when neither side can observe the other side's data? Recent transfer learning methods achieve strong performance in classification tasks by leveraging both source and target data simultaneously at training time. However, leveraging both source and target data simultane...

Given a trained deep graph convolution network (GCN), how can we effectively compress it into a compact network without significant loss of accuracy? Compressing a trained deep GCN into a compact GCN is of great importance for deploying the model in environments such as mobile or embedded systems, which have limited computing resources. However,...

Given multiple source datasets with labels, how can we train a target model with no labeled data? Multi-source domain adaptation (MSDA) aims to train a model using multiple source datasets different from a target dataset in the absence of target data labels. MSDA is a crucial problem applicable to many practical cases where labels for the target da...

Given trained models from multiple source domains, how can we predict the labels of unlabeled data in a target domain? Unsupervised multi-source domain adaptation (UMDA) aims to predict the labels of unlabeled target data by transferring the knowledge of multiple source domains. UMDA is a crucial problem in many real-world scenarios where no la...

Given an edge-labeled graph and two nodes, how can we accurately infer the relation between the nodes? Reasoning about how nodes are related is a fundamental task in analyzing network data, and various relevance measures have been suggested to effectively identify relevance between nodes in graphs. Although many random-walk-based models have been ext...

How can we effectively regularize BERT? Although BERT has proven effective in various NLP tasks, it often overfits when only a small number of training instances are available. A promising direction to regularize BERT is based on pruning its attention heads with a proxy score for head importance. However, these methods are usually suboptimal since...

Purpose:
Acute kidney injury (AKI) in cancer patients is associated with increased morbidity and mortality. The incidence of AKI in lung cancer appears to be higher than in other solid organ malignancies, although its impact on patient outcomes remains unclear.
Materials and methods:
The patients newly diagnosed with lung cancer f...

Given a signed social graph, how can we learn appropriate node representations to infer the signs of missing edges? Signed social graphs have received considerable attention as a way to model trust relationships. Learning node representations is crucial to effectively analyze graph data, and various techniques such as network embedding and graph convolution...

What are the key structures existing in a large real-world MMORPG (Massively Multiplayer Online Role-Playing Game) graph? How can we compactly summarize an MMORPG graph with hierarchical node labels, considering consistent substructures at different levels of hierarchy? Recent MMORPGs generate complex interactions between entities inducing a hetero...

Temporal knowledge graphs (TKGs) inherently reflect the transient nature of real-world knowledge, as opposed to static knowledge graphs. Naturally, automatic TKG completion has drawn much research interest for more realistic modeling of relational reasoning. However, most of the existing models for TKG completion extend static KG embeddings tha...

Given a time-evolving tensor with missing entries, how can we effectively factorize it for precisely predicting the missing entries? Tensor factorization has been extensively utilized for analyzing various multi-dimensional real-world data. However, existing models for tensor factorization have disregarded the temporal property for tensor factoriza...

How can we efficiently compress a model while maintaining its performance? Knowledge Distillation (KD) is one of the most widely known methods for model compression. In essence, KD trains a smaller student model based on a larger teacher model and tries to retain the teacher model's level of performance as much as possible. However, the existing KD meth...

How can we effectively regularize BERT? Although BERT has proven effective in various downstream natural language processing tasks, it often overfits when only a small number of training instances are available. A promising direction to regularize BERT is based on pruning its attention heads according to a proxy score for head importance. However, heur...

Given a time series vector, how can we efficiently compute a specified part of its Fourier coefficients? The fast Fourier transform (FFT) is a widely used algorithm for computing the discrete Fourier transform in many machine learning applications. Despite its pervasive use, no known FFT algorithm provides a fine-tuning option for the user to speci...
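The abstract does not show the proposed algorithm, so the sketch below is only the naive baseline such a method aims to beat: evaluating the DFT definition `X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)` directly for a contiguous index range `start <= k < end`. The function name and the range parameters are assumptions made for illustration.

```python
import cmath


def partial_dft(x, start, end):
    """Compute DFT coefficients X[k] only for start <= k < end.

    Direct evaluation costs O(N * (end - start)) operations, versus
    O(N log N) for a full FFT that returns every coefficient.
    """
    n_total = len(x)
    coeffs = []
    for k in range(start, end):
        acc = 0j
        for n, value in enumerate(x):
            acc += value * cmath.exp(-2j * cmath.pi * k * n / n_total)
        coeffs.append(acc)
    return coeffs
```

For example, `partial_dft([1, 2, 3, 4], 1, 2)` returns approximately `[-2+2j]`, which is the second coefficient of the full 4-point DFT. Direct evaluation is only competitive when the requested range is small relative to `log N`; beyond that, a full FFT followed by slicing is cheaper.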

Given session-based news watch history of users, how can we precisely recommend news articles? Unlike other items for recommendation, the value of news articles decays quickly, and various news sources publish fresh ones every second. Moreover, people frequently select news articles regardless of their personal preferences to understand popular topi...

A connected component in a graph is a set of nodes linked to each other by paths. The problem of finding connected components has been applied to diverse graph analysis tasks such as graph partitioning, graph compression, and pattern recognition. Several distributed algorithms have been proposed to find connected components in enormous graphs. Iron...
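The definition above can be illustrated with a single-machine union-find (disjoint-set) sketch. This is not the distributed algorithm studied in the publication. Following a common convention, each component is labeled by its smallest node id. Nodes are assumed to be the integers `0..num_nodes-1`.

```python
def connected_components(num_nodes, edges):
    """Label each node of an undirected graph with its component's minimum node id."""
    parent = list(range(num_nodes))

    def find(u):
        # Path halving: point each visited node at its grandparent.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller root, so every root stays its set's minimum id.
            if ru < rv:
                parent[rv] = ru
            else:
                parent[ru] = rv

    return [find(u) for u in range(num_nodes)]
```

For example, `connected_components(5, [(0, 1), (1, 2), (3, 4)])` yields `[0, 0, 0, 3, 3]`.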

How can we rank nodes in signed social networks? Relationships between nodes in a signed network are represented as positive (trust) or negative (distrust) edges. Many social networks have adopted signed networks to express trust between users. Consequently, ranking friends or enemies in signed networks has received much attention from the data min...

How can we analyze large graphs, such as the Web and social networks, with hundreds of billions of vertices and edges? Although many graph mining systems have been proposed to run various graph mining algorithms on such large graphs, they have difficulty processing Web-scale graphs due to massive communication and I/O costs caused by commun...

How can we efficiently compress Convolutional Neural Networks (CNN) while retaining their accuracy on classification tasks? A promising direction is based on depthwise separable convolution which replaces a standard convolution with a depthwise convolution and a pointwise convolution. However, previous works based on depthwise separable convolution...
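The parameter savings that motivate depthwise separable convolution follow from a simple count. The sketch below compares a standard convolution with its depthwise-plus-pointwise replacement. It assumes square `k x k` kernels and ignores bias terms. It illustrates the general idea only and is not the compression method proposed in the publication.

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k x c_in filter per output channel.
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise
```

For a 3x3 layer with 64 input and 128 output channels, this gives 73,728 versus 8,768 weights, roughly 8.4 times fewer.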

How can we accurately estimate local triangles for all nodes in simple and multigraph streams? Local triangle counting in a graph stream is one of the most fundamental tasks in graph mining with important applications including anomaly detection and social network analysis. Although there have been several local triangle counting methods in a graph...

Given a sparse rating matrix and an auxiliary matrix of users or items, how can we accurately predict missing ratings considering different data contexts of entities? Many previous studies have shown that using additional information alongside rating data helps improve performance. However, existing methods are limited in that 1) they ig...

How can we analyze tensors that are composed of 0’s and 1’s? How can we efficiently analyze such Boolean tensors with millions or even billions of entries? Boolean tensors often represent relationships, memberships, or occurrences of events such as subject–relation–object tuples in knowledge base data (e.g., ‘Seoul’-‘is the capital of’-‘South Korea’)...
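One standard way to model such tensors is Boolean CP decomposition. In that model, an entry is reconstructed with logical OR and AND instead of sums and products: `X[i][j][k] = OR_r (A[i][r] AND B[j][r] AND C[k][r])`. The abstract is cut off, so it is an assumption that this exact model is the one used. The sketch below shows only the reconstruction rule, with hypothetical 0/1 factor matrices `A`, `B`, `C`. It does not perform the factorization itself.

```python
from itertools import product


def boolean_cp_reconstruct(A, B, C):
    """Rebuild an I x J x K Boolean tensor from Boolean factor matrices."""
    rank = len(A[0])
    I, J, K = len(A), len(B), len(C)
    X = [[[0] * K for _ in range(J)] for _ in range(I)]
    for i, j, k in product(range(I), range(J), range(K)):
        # OR over rank-one components of the AND of the three factor entries.
        X[i][j][k] = int(any(A[i][r] and B[j][r] and C[k][r] for r in range(rank)))
    return X
```

Because OR saturates at 1, overlapping components do not add up. Two rank-one components that cover the same cell still produce a 1, not a 2.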

Given graph-structured data, how can we train a robust classifier in a semi-supervised setting that performs well without neighborhood information? In this work, we propose belief propagation networks (BPN), a novel approach to train a deep neural network in a hard inductive setting, where the test data are given without neighborhood information. B...

Background
How can we obtain fast and high-quality clusters in genome-scale bio-networks? Graph clustering is a powerful tool applied to bio-networks to solve various biological problems such as protein complex detection, disease module detection, and gene function prediction. In particular, MCL (Markov Clustering) has been spotlighted due to its su...

How can we extract hidden relations from tensor and matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is an important tool for this purpose. Designing an accurate and efficient CMTF method has become more crucial as the size and dimension of real-world data are growing explosively. Howe...

Given large-scale multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we extract latent concepts/relations of such data? Tensor factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most tensor factorization algorithms exhibit limited s...

Background
Acute kidney injury (AKI) is a critical issue in cancer patients because it is not only a morbid complication but can also interrupt timely diagnostic evaluation or planned optimal treatment. However, the impact of AKI on overall mortality in cancer patients remains unclear.
Methods
We conducted a retrospective cohort study of 67 98...

Given multiple time series data, how can we efficiently find latent patterns in an arbitrary time range? Singular value decomposition (SVD) is a crucial tool to discover hidden factors in multiple time series data, and has been used in many data mining applications including dimensionality reduction, principal component analysis, recommender system...

How can we find patterns in an enormous graph with billions of vertices and edges? Subgraph enumeration, the task of finding patterns in a graph, is an important problem in graph data analysis with many applications, including analyzing social network evolution, measuring the significance of motifs in biological networks, observing the dynam...

How can we predict the occurrence of acute kidney injury (AKI) in cancer patients based on machine learning with serum creatinine data? Given irregular and heterogeneous clinical data, how can we make the most of it for accurate AKI prediction? AKI is a common and significant complication in cancer patients, and correlates with substantial morbidit...

How can we find patterns and anomalies in peta-scale graphs? Even recently proposed graph mining systems fail in processing peta-scale graphs. In this work, we propose PegasusN, a scalable and versatile graph mining system that runs on Hadoop and Spark. To handle enormous graphs, PegasusN provides and seamlessly integrates efficient algorithms for...

Given sparse multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we discover latent concepts/relations and predict missing values? Tucker factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most Tucker factorization algorithms regard...

Given a signed directed network, how can we learn node representations that fully encode the structural information of the network, including the sign and direction of edges? Node representation learning, or network embedding, learns a mapping of each node to a vector. The mapping encodes structural information of the network, providing low-dimensional dense nod...

Given a time-evolving graph, how can we track similarity between nodes in a fast and accurate way, with theoretical guarantees on the convergence and the error? Random Walk with Restart (RWR) is a popular measure to estimate the similarity between nodes and has been exploited in numerous applications. Many real-world graphs are dynamic with frequen...

Given graphs with millions or billions of vertices and edges, how can we efficiently make inferences based on partial knowledge? Loopy Belief Propagation (LBP) is a graph inference algorithm widely used in various applications including social network analysis, malware detection, recommendation, and image restoration. The algorithm calculates approx...

How can we estimate local triangle counts accurately in a graph stream without storing the whole graph? How can we handle duplicated edges in local triangle counting for graph streams? Local triangle counting, which computes the number of triangles attached to each node in a graph, is a very important problem with wide applications in social network ana...
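For reference, the quantity being estimated can be computed exactly on a small graph. The sketch below is an in-memory baseline that stores the whole graph, which is precisely what the streaming estimators described above avoid. It is useful only as ground truth for checking an estimator. It assumes the count is defined on the simple graph, so duplicate edges and self-loops are discarded first; multigraph semantics would differ.

```python
from collections import defaultdict
from itertools import combinations


def local_triangle_counts(edges):
    """Exact number of triangles touching each node of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; the set drops duplicate edges
            adj[u].add(v)
            adj[v].add(u)
    counts = {node: 0 for node in adj}
    for u in adj:
        # Each adjacent pair of u's neighbours closes one triangle at u.
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                counts[u] += 1
    return counts
```

For a triangle `0-1-2` with a pendant edge `2-3` (and the edge `0-1` streamed twice), every triangle node gets a count of 1 and node 3 gets 0.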

How can we discover interesting patterns from time-evolving high-speed data streams? How can we analyze the data streams quickly and accurately, with little space overhead? How can we guarantee that the found patterns are self-consistent? High-speed data streams have been receiving increasing attention due to their wide applications such as sensors, network traff...

Given a real-world graph, how can we measure relevance scores for ranking and link prediction? Random walk with restart (RWR) provides an excellent measure for this and has been applied to various applications such as friend recommendation, community detection, anomaly detection, etc. However, RWR suffers from two problems: 1) using the same restar...

How can we leverage social network data and observed ratings to correctly recommend proper items and provide a persuasive explanation for the recommendations? Many online services provide social networks among users, and it is crucial to utilize social information since recommendation by a friend is more likely to grab attention than the one from a...

How can we analyze enormous networks including the Web and social networks which have hundreds of billions of nodes and edges? Network analyses have been conducted by various graph mining methods including shortest path computation, PageRank, connected component computation, random walk with restart, etc. These graph mining methods can be expressed...

How can we capture hidden properties from tensor and matrix data simultaneously? Coupled matrix-tensor factorization (CMTF) is an effective method to solve this problem because it extracts latent factors from a tensor and matrices at once. Designing an efficient CMTF method has become more crucial as the size and dimension of real-world data are g...

Between matrix factorization and Random Walk with Restart (RWR), which method works better for recommender systems? Which method handles explicit or implicit feedback data better? Does additional side information help recommendation? Recommender systems play an important role in many e-commerce services such as Amazon and Netflix to recommend new...

Given a large graph, how can we determine similarity between nodes in a fast and accurate way? Random walk with restart (RWR) is a popular measure for this purpose and has been exploited in numerous data mining applications including ranking, anomaly detection, link prediction, and community detection. However, previous methods for computing exact...
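RWR itself can be computed by plain power iteration. With restart probability `c`, row-normalized adjacency `W`, and seed indicator `q`, repeat `r <- (1 - c) * W^T r + c * q` until the scores stop changing. The sketch below is this textbook baseline, not the fast exact or approximate methods the listed publications propose. The dictionary-based graph format and the choice to send a dangling node's walk mass back to the seed are assumptions.

```python
def rwr_scores(adj, seed, restart_prob=0.15, tol=1e-10, max_iter=1000):
    """RWR scores of every node with respect to `seed`, via power iteration.

    `adj` maps each node to a list of its out-neighbours; every neighbour
    must also be a key of `adj`. Scores always sum to 1.
    """
    nodes = list(adj)
    scores = {u: (1.0 if u == seed else 0.0) for u in nodes}
    for _ in range(max_iter):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            walk_mass = (1.0 - restart_prob) * scores[u]
            if adj[u]:
                # Walk step: spread mass evenly over out-edges (row-normalised W).
                share = walk_mass / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Assumption: a dangling node sends its walk mass back to the seed.
                nxt[seed] += walk_mass
        # Restart step: total restart mass is restart_prob because scores sum to 1.
        nxt[seed] += restart_prob
        diff = sum(abs(nxt[u] - scores[u]) for u in nodes)
        scores = nxt
        if diff < tol:
            break
    return scores
```

On the two-node cycle `{0: [1], 1: [0]}` with seed 0 and `c = 0.15`, the fixed point is `r0 = 0.15 / (1 - 0.85**2)`, about 0.5405. Each iteration costs time linear in the number of edges, which is why the methods above seek faster alternatives on large graphs.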

How can we measure similarity between nodes quickly and accurately on large graphs? Random walk with restart (RWR) provides a good measure, and has been used in various data mining applications including ranking, recommendation, link prediction and community detection. However, existing methods for computing RWR do not scale to large graphs contain...