Leonid Zhukov

National Research University Higher School of Economics | HSE · School of Applied Mathematics and Information Science

PhD

About

98
Publications
12,667
Reads
1,842
Citations
Citations since 2017
30 Research Items
595 Citations
[Chart: citations per year, 2017 to 2023; vertical axis 0 to 120]
Introduction
Professor of Computer Science, HSE University. CEO of Artificial Intelligence Research Institute & Sber AI Lab Director. Winter 2017 HSE course on Network Science http://www.leonidzhukov.net/hse/2017/networks/
Additional affiliations
April 2021 - present
Artificial Intelligence Research Institute (AIRI)
Position
  • CEO
September 2007 - present
National Research University Higher School of Economics
Position
  • Professor (Full)
October 2002 - May 2006
Yahoo
Position
  • Sr. Research Scientist
Education
September 1993 - June 1998
University of Utah
Field of study
  • Theoretical Physics
September 1986 - June 1993
National Research Nuclear University MEPhI
Field of study
  • Theoretical Physics

Publications (98)
Conference Paper
Full-text available
Record linkage, or entity resolution, is an important area of data mining. Name matching is a key component of systems for record linkage. Alternative spellings of the same name are a common occurrence in many applications. We use the largest collection of genealogy person records in the world together with user search query logs to build name-matc...
Article
Full-text available
We consider a spatiotemporal method for source localization, taking advantage of the entire EEG time series to reduce the configuration space we must evaluate. The EEG data are first decomposed into signal and noise subspaces using a principal component analysis (PCA) decomposition. This partitioning allows us to easily discard the noise subspace,...
Conference Paper
Full-text available
In this paper we develop a new technique for tracing anatomical fibers from 3D tensor fields. The technique extracts salient tensor features using a local regularization technique that allows the algorithm to cross noisy regions and bridge gaps in the data. We applied the method to human brain DT-MRI data and recovered identifiable anatomical struc...
Conference Paper
Full-text available
In this paper we present top-down and bottom-up hierarchical clustering methods for large bipartite graphs. The top down approach employs a flow-based graph partitioning method, while the bottom up approach is a multiround hybrid of the single-link and average-link agglomerative clustering methods. We evaluate the quality of clusters obtained by th...
Conference Paper
Full-text available
In this paper we use advanced tensor visualization techniques to study 3D diffusion tensor MRI data of a heart. We use scalar and tensor glyph visualization methods to investigate the data and apply a moving least squares (MLS) fiber tracing method to recover and visualize the helical structure and the orientation of the heart muscle fibers.
Article
Full-text available
Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. Most works focus on retrospective analysis of an...
Article
Efficient defect detection in solar cell manufacturing is crucial for stable green energy technology manufacturing. This paper presents a deep-learning-based automatic detection model SeMaCNN for classification and anomaly detection of electroluminescent images for solar cell quality evaluation. The core of the model is an anomaly detection algorit...
Article
Full-text available
The size and complexity of deep neural networks used in AI applications continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package eco2AI to help data scientists and researchers to track the energy consumption and equivalent CO2 emissions of their model...
Preprint
Full-text available
Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. In this work, we provide statistical evidence fo...
Preprint
Modern industrial facilities generate large volumes of raw sensor data during the production process. This data is used to monitor and control the processes and can be analyzed to detect and predict process abnormalities. Typically, the data has to be annotated by experts to be further used in predictive modeling. Most of today's research is focusing o...
Preprint
Full-text available
Efficient defect detection in solar cell manufacturing is crucial for stable green energy technology manufacturing. This paper presents a deep-learning-based automatic detection model SeMaCNN for classification and semantic segmentation of electroluminescent images for solar cell quality evaluation and anomalies detection. The core of the model is...
Preprint
Full-text available
Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. In this work, we provide statistical evidence fo...
Preprint
Full-text available
The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package eco2AI to help data scientists and researchers track energy consumption and equivalent CO2 emissions of their models in a straightforward way. I...
Article
Intermetallic compounds formed by two or more metals are characterized by wide structural diversity. The design of complex intermetallics, such as quasicrystals or their approximants, is a challenging scientific problem. We present a hybrid computational approach for searching for new stable 1/1 Mackay-type quasicrystal approximants in Sc-rich inte...
Preprint
Full-text available
Active learning (AL) is a prominent technique for reducing the annotation effort required for training machine learning models. Deep learning offers a solution for several essential obstacles to deploying AL in practice but introduces many others. One of such problems is the excessive computational resources required to train an acquisition model a...
Article
Full-text available
The COVID-19 pandemic created a significant interest and demand for infection detection and monitoring solutions. In this paper we propose a machine learning method to quickly triage COVID-19 using recordings made on consumer devices. The approach combines signal processing methods with fine-tuned deep learning networks and provides for signal deno...
Preprint
Full-text available
The COVID-19 pandemic created a significant interest and demand for infection detection and monitoring solutions. In this paper we propose a machine learning method to quickly triage COVID-19 using recordings made on consumer devices. The approach combines signal processing methods with fine-tuned deep learning networks and provides methods for sig...
Chapter
As part of the fourth industrial revolution, the way maintenance is performed has been significantly influenced by digital solutions. Predictive maintenance is one of the key approaches in that context. The idea of increasing equipment availability and reducing maintenance costs at the same time has led to a strong interest from industry. Dependi...
Article
Automated early process fault detection and prediction remains a challenging problem in industrial processes. Traditionally it has been done by multivariate statistical analysis of sensor readings and, more recently, with the help of machine learning methods. The quality of machine learning models strongly depends on feature engineering, that in tu...
Conference Paper
Full-text available
Introduction: The following page contains a full transcript of the discussion ‘Can machines think - 70?’, which was part of Artificial Intelligence Journey (AIJ), Sber’s annual conference. The discussion was timed to the 70th anniversary of the famous article ‘Computing Machinery and Intelligence’ written by Alan Turing. The article was published in...
Chapter
Much scientific work today aims to improve people’s quality of life, but this can be difficult without successful collaboration. Productive partnerships can increase research efficiency in many cases and make a huge impact on society. However, today there is no clear way to find such collaborators. In this paper, we p...
Article
Full-text available
We present a study on co-authorship network representation based on network embedding together with additional information on topic modeling of research papers and new edge embedding operator. We use the link prediction (LP) model for constructing a recommender system for searching collaborators with similar research interests. Extracting topics fo...
Chapter
Online social networks play a major role in the spread of information on a very large scale. One of the major problems is to predict information propagation using social network interactions. The main purpose of this paper is to construct a heuristic model of a weighted graph based on empirical data that can outperform the existing models. We sugge...
Chapter
One of the major problems for recommendation services is commercial astroturfing. This work is devoted to constructing a model capable of detecting astroturfing in customer reviews based on network analysis. The model projects a multipartite network onto a unipartite graph, for which we detect communities and represent actors with falsified opi...
Chapter
Full-text available
Human brain networks show modular organization: cortical regions tend to form densely connected modules with only weak inter-modular connections. However, little is known on whether modular structure of brain networks is reliable in terms of test–retest reproducibility and, most importantly, to what extent these topological modules are anatomically...
Chapter
In this paper, we consider a new formulation of a graph embedding algorithm that learns node and edge representations under common constraints. We evaluate our approach on the link prediction problem for the co-authorship network of HSE researchers’ publications. We compare it with existing structural network embeddings and feature-engineering models.
Chapter
Co-authorship networks contain invisible patterns of collaboration among researchers. The process of writing a joint paper can depend on different factors, such as friendship, common interests, and university policy. We show that, having a temporal co-authorship network, it is possible to predict future publications. We solve the problem of recomm...
Conference Paper
Co-authorship networks contain hidden structural patterns of research collaboration. While some people may argue that the process of writing joint papers depends on mutual friendship, research interests, and university policy, we show that, given a temporal co-authorship network, one could predict the quality and quantity of future research publica...
Chapter
Modern co-authorship networks contain hidden patterns of researchers' interaction and publishing activities. We aim to provide a system for selecting a collaborator for joint research or an expert on a given list of topics. We have improved a recommender system for finding a possible collaborator with respect to research interests and predicting quali...
Conference Paper
Human anatomical brain networks derived from the analysis of neuroimaging data are known to demonstrate modular organization. Modules, or communities, of cortical brain regions capture information about the structure of connections in the entire network. Hence, anatomical changes in network connectivity (e.g., caused by a certain disease) should tr...
Conference Paper
Modern bibliographic databases contain a significant amount of information on the publication activities of research communities. Researchers regularly encounter the challenging task of selecting a co-author for a joint research publication or searching for authors whose papers are worth reading. We propose a new recommender system for finding possible collab...
Conference Paper
We consider a task of predicting normal and pathological phenotypes from macroscale human brain networks. These networks (connectomes) represent aggregated neural pathways between brain regions. We point to properties of connectomes that make them different from graphs arising in other application areas of network science. We discuss how machine le...
Article
Full-text available
In this paper, we tackle a problem of predicting phenotypes from structural connectomes. We propose that normalized Laplacian spectra can capture structural properties of brain networks, and hence graph spectral distributions are useful for a task of connectome-based classification. We introduce a kernel that is based on earth mover's distance (EMD...
Conference Paper
Full-text available
This paper aims at tackling the problem of brain network classification using machine learning algorithms based on the spectra of the networks’ matrices. Two approaches are discussed: first, linear and tree-based models are run on the vectors of sorted eigenvalues of the adjacency matrix, the Laplacian matrix and the normalized Laplacian; next, SV...
Conference Paper
Full-text available
The problem of link prediction has attracted a lot of attention in the last few years, arising in different applications ranging from recommendation systems to social networks. In this paper, we describe the most popular similarity indices and compare their performance in their ability to show links with the highest probability of being removed from i...
Technical Report
Full-text available
In this paper we present an algorithm for layout and visualization of music collections based on similarities between musical artists. The core of the algorithm consists of a non-linear low dimensional embedding of a similarity graph constrained to the surface of a hyper-sphere. This approach effectively uses additional dimensions in the embeddin...
Technical Report
Full-text available
Name matching is a key component of systems for entity resolution or record linkage. Alternative spellings of the same names are a common occurrence in many applications. We use the largest collection of genealogy person records in the world together with user search query logs to build name matching models. The procedure for building a crowd-sou...
Article
Full-text available
Two novel approaches to triclustering of three-way binary data are proposed. A tricluster is defined as a dense subset of a ternary relation Y defined on sets of objects, attributes, and conditions, or, equivalently, as a dense submatrix of the adjacency matrix of the ternary relation Y. This definition is a scalable relaxation of the notion of trico...
Conference Paper
Full-text available
A novel approach to triclustering of three-way binary data is proposed. A tricluster is defined in terms of Triadic Formal Concept Analysis as a dense triset of a binary relation Y, describing the relationship between objects, attributes and conditions. This definition is a relaxation of the triconcept notion and makes it possible to find all triclusters...
Chapter
Geometrically, a diffusion tensor can be thought of as an ellipsoid with its three axes oriented along the tensor's three perpendicular eigenvectors and semi-axis lengths proportional to the square roots of the tensor's eigenvalues (the mean diffusion distances). This chapter develops a new technique for tracing anatomical fibers from 3D diffusion-tensor...
Chapter
Full-text available
Handbook of Biomedical Image Analysis: Registration Models (Volume III) is dedicated to the algorithms for registration of medical images and volumes. This volume is aimed at researchers and educators in imaging sciences, radiological imaging, clinical and diagnostic imaging, biomedical engineering, physicists covering different medical imaging mod...
Article
Full-text available
The influence of head tissue conductivity on magnetoencephalography (MEG) was investigated by comparing the normal component of the magnetic field calculated at 61 detectors and the localization accuracy of realistic head finite element method (FEM) models using dipolar sources and containing altered scalp, skull, cerebrospinal fluid, gray, and whi...
Conference Paper
Full-text available
In this paper, we consider the application of the singular value decomposition (SVD) to a search term suggestion system in a pay-for-performance search market. We propose a positive and negative refinement method based on orthogonal subspace projections. We demonstrate that SVD subspace-based methods: 1) expand coverage by reordering the results, a...
Article
In this manuscript, we evaluate the application of the singular value decomposition (SVD) to a search term suggestion system in a pay-for-performance search market. We propose a novel positive and negative relevance feedback method for search refinement based on orthogonal subspace projections. We apply these methods to the subset of Overture's mar...
Article
Full-text available
Many applications can benefit from soft clustering, where each datum is assigned to multiple clusters with membership weights that sum to one. In this paper we present a comparison of principal component analysis (PCA) and independent component analysis (ICA) when used for soft clustering. We provide a short mathematical background for these method...
Article
Full-text available
Estimating the location and distribution of current sources within the brain from electroencephalographic (EEG) recordings is an ill-posed inverse problem. The ill-posedness of the problem is due to a lack of uniqueness in the solution; that is, different configurations of sources can generate identical external fields. Additionally, the existence of...
Article
Full-text available
Estimating the location and distribution of electric current sources within the brain from electroencephalographic (EEG) recordings is an ill-posed inverse problem. The ill-posed nature of the inverse EEG problem is due to the lack of a unique solution such that different configurations of sources can generate identical external electric fields. In this...
Article
Full-text available
In this chapter, we examine the problem of Web community identification expressed in terms of the graph or network structure induced by the Web. While the task of community identification is obviously related to the more fundamental problems of graph partitioning and clustering, the basic task is differentiated from other problems by being within t...
Article
Full-text available
Segmentation of anatomical regions of the brain is one of the fundamental problems in medical image analysis. It is traditionally solved by iso-surfacing or through the use of active contours/deformable models on gray-scale MRI data. In this paper we develop a technique that uses anisotropic diffusion properties of brain tissue available from DTMR...
Article
Full-text available
In this paper, we introduce a novel method for localizing epileptogenic sources in patients with multifocal temporal lobe epilepsy. Localizing multiple deep sources is computationally challenging due to superposition of signal from the active regions and "blurring" of the signal as it projects to the scalp. We address these challenges by incorporating...
Article
Diffusion weighted magnetic resonance imaging (DW MRI) is sensitive to random thermal movement of water molecules known as Brownian motion. Consequently, DWI can be used to detect the diffusion of water molecules in tissues. Because water molecules can diffuse more easily along fiber tracts, for example in the brain, rather than across them, diffus...
Article
Introduction BioPSE is a scientific programming environment that allows the interactive construction, debugging, and steering of large-scale scientific computations. BioPSE can be envisioned as a "computational workbench," in which a scientist can design and modify simulations interactively via a dataflow programming model. As opposed to the typica...
Article
Full-text available
A pervasive problem in neuroscience is determining which regions of the brain are active, given voltage measurements at the scalp. If accurate solutions to such problems could be obtained, neurologists would gain non-invasive access to patient-specific cortical activity. Access to such data would ultimately increase the number of patients who could...
Conference Paper
Full-text available
Laparoscopic surgical procedures require precise hand and eye coordination based on a 2-dimensional representation of 3-dimensional space. Currently, no metric exists to guide the educational process while surgeons are still on the learning curve. In this paper, we propose to identify and qualify the patterns of movements recorded from the da Vinci...
Article
Full-text available
Segmentation of anatomical regions of the brain is one of the fundamental problems in medical image analysis. It is traditionally solved by iso-surfacing or through the use of active contours/deformable models on gray-scale magnetic resonance imaging (MRI) data. We develop a technique that uses anisotropic diffusion properties of brain tissue av...
Article
Laparoscopic surgical procedures require precise hand and eye coordination based on a 2-dimensional representation of 3-dimensional space. Currently, no metric exists to guide the educational process while surgeons are still on the learning curve. In this paper, we propose to identify and qualify the patterns of movements recorded from the da Vinci...
Conference Paper
Full-text available
Typically 3-D MR and CT scans have a relatively high resolution in the scanning X-Y plane, but much lower resolution in the axial Z direction. This non-uniform sampling of an object can miss small or thin structures. One way to address this problem is to scan the same object from multiple directions. In this paper we describe a method for deforming...