About
Publications: 41
Reads: 4,899
Citations: 263
Introduction
Kangil Kim currently works at the Division of Computer Science and Engineering, Konkuk University. Kangil does research in theory of computation, artificial neural networks, and artificial intelligence. Their most recent research concerns regularization of neural networks and neural machine translation.
Publications (41)
Knowledge distillation is an approach to transfer information on representations from a teacher to a student by reducing their difference. A challenge of this approach is to reduce the flexibility of the student's representations inducing inaccurate learning of the teacher's knowledge. To resolve it in BERT transferring, we investigate distillation...
In this work, we seek new insights into the underlying challenges of the scene graph generation (SGG) task. Quantitative and qualitative analysis of the visual genome (VG) dataset implies: 1) ambiguity: even if interobject relationship contains the same object (or predicate), they may not be visually or semantically similar; 2) asymmetry: despite t...
Most neural machine translation models are implemented as a conditional language model framework composed of encoder and decoder models. This framework learns complex and long-distant dependencies, but its deep structure causes inefficiency in training. Matching vector representations of source and target sentences improves the inefficiency by shor...
The flexibility of decision boundaries in neural networks that are unguided by training data is a well-known problem typically resolved with generalization methods. A surprising result from recent knowledge distillation (KD) literature is that random, untrained, and equally structured teacher networks can also vastly improve generalization performa...
Identifying relations between objects is central to understanding the scene. While several works have been proposed for relation modeling in the image domain, there have been many constraints in the video domain due to challenging dynamics of spatio-temporal interactions (e.g., Between which objects are there an interaction? When do relations occur...
In this work, we seek new insights into the underlying challenges of the Scene Graph Generation (SGG) task. Quantitative and qualitative analysis of the Visual Genome dataset implies -- 1) Ambiguity: even if inter-object relationship contains the same object (or predicate), they may not be visually or semantically similar, 2) Asymmetry: despite the...
A common approach to jointly learn multiple tasks with a shared structure is to optimize the model with a combined landscape of multiple sub-costs. However, gradients derived from each sub-cost often conflict in cost plateaus, resulting in a subpar optimum. In this work, we shed light on such gradient conflict challenges and suggest a solution nam...
Visual understanding of the implied knowledge in line charts is an important task affecting many downstream tasks in information retrieval. Despite common use, clearly defining the knowledge is difficult because of ambiguity, so most methods used in research implicitly learn the knowledge. When building a deep neural network, the integrated approac...
Herein, we aim to assess mortality risk prediction in peritoneal dialysis patients using machine-learning algorithms for proper prognosis prediction. A total of 1,730 peritoneal dialysis patients in the CRC for ESRD prospective cohort from 2008 to 2014 were enrolled in this study. Classification algorithms were used for prediction of N-year mortali...
By applying a deep neural network to selective laser melting, we studied a classification model of melt-pool images with respect to 6 laser power labels. Laser power influenced the formation of pores and cracks, which determine part quality, and was positively and linearly related to the density of the part. Using the neural network of which the number of nodes...
In laser powder bed fusion, a convolutional neural network could build a good regression model to predict a laser power value from a melt-pool image. To empirically validate it, we used the acquired image data from a monitoring system inside metal additive manufacturing equipment and optimally configured a convolutional network by the grid search o...
Recently, neural approaches for transition-based dependency parsing have become one of the state-of-the-art methods for performing dependency parsing tasks in many languages. In neural transition-based parsing, a parser state representation is first computed from the configuration of a stack and a buffer, which is then fed into a feed-forward neura...
In the past decade, deep neural networks have been applied to many research areas after achieving dramatic accuracy improvements on complex problems in vision and computational linguistics. However, some problems, such as environmental modelling, still benefit little from deep networks because of the difficulty in collecting s...
Neural networks often penalize their loss functions with a regularization or constraint term dependent on training data. These penalty terms are defined on activations of hidden vectors and reduced along with the loss during training. By reducing the activations, networks condense hidden vectors and often overcompress specific regions in the hidden vector spa...
Standard approaches to named entity recognition (NER) are based on sequential labeling methods, such as conditional random fields (CRFs), which label each word in a sentence and extract entities from them that correspond to named entities. With the extensive deployment of deep learning methods for sequential labeling tasks, state-of-the-art NER per...
Regularization is an important issue for neural networks because of strong expression power causing overfitting to data. A regularization method is to penalize cost functions by activation-based penalty. In its applications to recurrent neural networks, the method usually assigns penalty uniformly distributed over time steps. However, required stre...
Neural machine translation decoders are usually conditional language models to sequentially generate words for target sentences. This approach is limited to find the best word composition and requires help of explicit methods as beam search. To help learning correct compositional mechanisms in NMTs, we propose concept equalization using direct...
Document length normalization is one of the fundamental components in a retrieval model, because term frequencies can readily be increased in long documents. The key hypotheses in the literature regarding document length normalization are the verbosity and scope hypotheses, which imply that document length normalization should consider the distingu...
In this paper, we introduce a new ensemble method specialized to sequential labeling for syntax analysis and propose a neural network framework adopting the ensemble for dependency parsing of natural sentences. The ensemble method assigns sliding input sites to component classifiers which commonly include the position of the label to predict. The m...
A widely used automatic translation approach, phrase-based statistical machine translation, learns a probabilistic translation model composed of phrases from a large parallel corpus with a large language model. The translation model is often enormous because of many combinations of source and target phrases, which leads to the restriction of applic...
In this paper, we propose a classification-based approach for hybridizing statistical machine translation and rule-based machine translation. Both the training dataset used in the learning of our proposed classifier and our feature extraction method affect the hybridization quality. To create one such training dataset, a previous approach used auto...
Estimation of distribution algorithms applied to genetic programming have been studied by a number of authors. Like all estimation of distribution algorithms, they suffer from biases induced by the model building and sampling process. However, the biases are amplified in the algorithms for genetic programming. In particular, many systems u...
Probabilistic model-building algorithms (PMBA), a subset of evolutionary algorithms, have been successful in solving complex problems, in addition providing analytical information about the distribution of fit individuals. Most PMBA work has concentrated on the string representation used in typical genetic algorithms. A smaller body of work has aim...
Estimation of Distribution Algorithms in Genetic Programming (EDA-GP) are algorithms applying stochastic model learning to genetic programming. In spite of various potential benefits, probabilistic prototype tree (PPT) based EDA-GPs recently appeared to have a critical problem of losing diversity easily. As an alternative learning method to reduce...
Genetic programming is very computationally intensive, particularly in CPU time. A number of approaches to evaluation cost reduction have been proposed, among them early termination of evaluation (applicable in problem domains where estimates of the final fitness value are available during evaluation). Like all cost reduction techniques, early term...
Some Genetic Programming (GP) systems have fewer structural constraints than expression tree GP, permitting a wider range of operators. Using one such system, TAG3P, we compared the effects of such new operators with more standard ones on individual fitness, size and depth, comparing them on a number of symbolic regression and tree structuring prob...
This work introduces hardware implementation of artificial neural networks (ANNs) with learning ability on field programmable gate array (FPGA) for dynamic system identification. The learning phase is accomplished by using the improved particle swarm ...
Much recent research in Estimation of Distribution Algorithms (EDA) applied to Genetic Programming has adopted a Stochastic Context Free Grammar (SCFG)-based model formalism. However, these methods generate biases which may be indistinguishable from selection bias, resulting in sub-optimal performance. The primary factor generating this bias is the c...
In estimation of distribution algorithms (EDAs), probability models hold accumulating evidence on the location of an optimum. Stochastic sampling drift has been heavily researched in EDA optimization but not in EDAs applied to genetic programming (EDA-GP). We show that, for EDA-GPs using probabilistic prototype tree models, stochastic drift in samp...
Directed protein evolution has led to major advances in organic chemistry, enabling the development of highly optimised proteins. The SELEX method has also been highly effective in evolving ribose nucleic acid (RNA) or deoxy-ribose nucleic acid (DNA) molecules; variants have been proposed which allow SELEX to be used in protein evolution. All of th...
We investigate the application of adaptive operator selection rates to Genetic Programming. Results confirm those from other areas of evolutionary algorithms: adaptive rate selection out-performs non-adaptive methods, and among adaptive methods, adaptive pursuit out-performs probability matching. Adaptive pursuit combined with a reward policy that...
Estimation of Distribution Algorithms were introduced into Genetic Programming over 15 years ago, and have demonstrated good performance on a range of problems, but there has been little research into their limitations. We apply two such algorithms - scalar and vectorial Stochastic Grammar GP - to Daida's well-known Lid problem, to better understan...
Analysis of artificial evolutionary systems uses post-processing to extract information from runs. Many effective methods have been developed, but format incompatibilities limit their adoption. We propose a solution combining XML and compression, which imposes modest overhead. We describe the steps to integrate our schema in existing systems and to...
Probabilistic models are widely used in evolutionary and related algorithms. In Genetic Programming (GP), the Probabilistic Prototype Tree (PPT) is often used as a model representation. Drift due to sampling bias is a widely recognised problem, and may be serious, particularly in dependent probability models. While this has been closely studied in...