About
98
Publications
12,667
Reads
1,842
Citations
Introduction
Professor of Computer Science, HSE University.
CEO of Artificial Intelligence Research Institute & Sber AI Lab Director.
Winter 2017 HSE course on Network Science
http://www.leonidzhukov.net/hse/2017/networks/
Additional affiliations
April 2021 - present
Artificial Intelligence Research Institute (AIRI)
Position
- CEO
September 2007 - present
October 2002 - May 2006
Education
September 1993 - June 1998
September 1986 - June 1993
Publications (98)
Record linkage, or entity resolution, is an important area of data mining. Name matching is a key component of systems for record linkage. Alternative spellings of the same name are a common occurrence in many applications. We use the largest collection of genealogy person records in the world together with user search query logs to build name-matc...
We consider a spatiotemporal method for source localization, taking advantage of the entire EEG time series to reduce the configuration space we must evaluate. The EEG data are first decomposed into signal and noise subspaces using a principal component analysis (PCA) decomposition. This partitioning allows us to easily discard the noise subspace,...
In this paper we develop a new technique for tracing anatomical fibers from 3D tensor fields. The technique extracts salient tensor features using a local regularization technique that allows the algorithm to cross noisy regions and bridge gaps in the data. We applied the method to human brain DT-MRI data and recovered identifiable anatomical struc...
In this paper we present top-down and bottom-up hierarchical clustering methods for large bipartite graphs. The top down approach employs a flow-based graph partitioning method, while the bottom up approach is a multiround hybrid of the single-link and average-link agglomerative clustering methods. We evaluate the quality of clusters obtained by th...
In this paper we use advanced tensor visualization techniques to study 3D diffusion tensor MRI data of a heart. We use scalar and tensor glyph visualization methods to investigate the data and apply a moving least squares (MLS) fiber tracing method to recover and visualize the helical structure and the orientation of the heart muscle fibers.
Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. Most works focus on retrospective analysis of an...
Efficient defect detection in solar cell manufacturing is crucial for stable green energy technology manufacturing. This paper presents a deep-learning-based automatic detection model SeMaCNN for classification and anomaly detection of electroluminescent images for solar cell quality evaluation. The core of the model is an anomaly detection algorit...
The size and complexity of deep neural networks used in AI applications continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package eco2AI to help data scientists and researchers to track the energy consumption and equivalent CO2 emissions of their model...
Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. In this work, we provide statistical evidence fo...
Modern industrial facilities generate large volumes of raw sensor data during the production process. This data is used to monitor and control the processes and can be analyzed to detect and predict process abnormalities. Typically, the data has to be annotated by experts to be further used in predictive modeling. Most of today's research is focusing o...
Efficient defect detection in solar cell manufacturing is crucial for stable green energy technology manufacturing. This paper presents a deep-learning-based automatic detection model SeMaCNN for classification and semantic segmentation of electroluminescent images for solar cell quality evaluation and anomalies detection. The core of the model is...
Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, the announcements of clinical trial results tend to determine the future course of events, hence being closely monitored by the public. In this work, we provide statistical evidence fo...
The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package eco2AI to help data scientists and researchers track energy consumption and equivalent CO2 emissions of their models in a straightforward way. I...
Intermetallic compounds formed by two or more metals are characterized by wide structural diversity. The design of complex intermetallics, such as quasicrystals or their approximants, is a challenging scientific problem. We present a hybrid computational approach for searching for new stable 1/1 Mackay-type quasicrystal approximants in Sc-rich inte...
Active learning (AL) is a prominent technique for reducing the annotation effort required for training machine learning models. Deep learning offers a solution for several essential obstacles to deploying AL in practice but introduces many others. One of such problems is the excessive computational resources required to train an acquisition model a...
The COVID-19 pandemic created a significant interest and demand for infection detection and monitoring solutions. In this paper we propose a machine learning method to quickly triage COVID-19 using recordings made on consumer devices. The approach combines signal processing methods with fine-tuned deep learning networks and provides for signal deno...
The COVID-19 pandemic created a significant interest and demand for infection detection and monitoring solutions. In this paper we propose a machine learning method to quickly triage COVID-19 using recordings made on consumer devices. The approach combines signal processing methods with fine-tuned deep learning networks and provides methods for sig...
As part of the fourth industrial revolution, the way maintenance is performed has been significantly influenced by digital solutions. Predictive maintenance is one of the key approaches in that context. The idea of increasing equipment availability and reducing maintenance costs at the same time has led to a strong interest from industry. Dependi...
Automated early process fault detection and prediction remains a challenging problem in industrial processes. Traditionally it has been done by multivariate statistical analysis of sensor readings and, more recently, with the help of machine learning methods. The quality of machine learning models strongly depends on feature engineering, that in tu...
The following page contains a full transcript of the discussion ‘Can machines think - 70?’, which was part of Artificial Intelligence Journey (AIJ), Sber’s annual conference. The discussion was timed to the 70th anniversary of the famous article ‘Computing Machinery and Intelligence’ written by Alan Turing. The article was published in...
Nowadays, much scientific work aims to improve the quality of people’s lives, but this can be difficult without successful collaboration. Productive partnerships can increase research efficiency in many cases and make a huge impact on society. However, today there is no clear way to find such collaborators. In this paper, we p...
We present a study on co-authorship network representation based on network embedding together with additional information on topic modeling of research papers and new edge embedding operator. We use the link prediction (LP) model for constructing a recommender system for searching collaborators with similar research interests. Extracting topics fo...
Online social networks play a major role in the spread of information on a very large scale. One of the major problems is to predict information propagation using social network interactions. The main purpose of this paper is to construct a heuristic model of a weighted graph based on empirical data that can outperform the existing models. We sugge...
One of the major problems for recommendation services is commercial astroturfing. This work is devoted to constructing a model capable of detecting astroturfing in customer reviews based on network analysis. The model projects a multipartite network onto a unipartite graph, for which we detect communities and represent actors with falsified opi...
Human brain networks show modular organization: cortical regions tend to form densely connected modules with only weak inter-modular connections. However, little is known on whether modular structure of brain networks is reliable in terms of test–retest reproducibility and, most importantly, to what extent these topological modules are anatomically...
In this paper, we consider a new formulation of a graph embedding algorithm that learns node and edge representations under common constraints. We evaluate our approach on the link prediction problem for a co-authorship network of HSE researchers’ publications. We compare it with existing structural network embeddings and feature-engineering models.
Co-authorship networks contain invisible patterns of collaboration among researchers. The process of writing a joint paper can depend on different factors, such as friendship, common interests, and university policy. We show that, having a temporal co-authorship network, it is possible to predict future publications. We solve the problem of recomm...
Co-authorship networks contain hidden structural patterns of research collaboration. While some people may argue that the process of writing joint papers depends on mutual friendship, research interests, and university policy, we show that, given a temporal co-authorship network, one could predict the quality and quantity of future research publica...
Modern co-authorship networks contain hidden patterns of researcher interaction and publishing activities. We aim to provide a system for selecting a collaborator for joint research or an expert on a given list of topics. We have improved a recommender system for finding possible collaborators with respect to research interests and predicting quali...
Human anatomical brain networks derived from the analysis of neuroimaging data are known to demonstrate modular organization. Modules, or communities, of cortical brain regions capture information about the structure of connections in the entire network. Hence, anatomical changes in network connectivity (e.g., caused by a certain disease) should tr...
Modern bibliographic databases contain a significant amount of information on the publication activities of research communities. Researchers regularly encounter the challenging task of selecting a co-author for a joint research publication or searching for authors whose papers are worth reading. We propose a new recommender system for finding possible collab...
We consider a task of predicting normal and pathological phenotypes from macroscale human brain networks. These networks (connectomes) represent aggregated neural pathways between brain regions. We point to properties of connectomes that make them different from graphs arising in other application areas of network science. We discuss how machine le...
In this paper, we tackle a problem of predicting phenotypes from structural connectomes. We propose that normalized Laplacian spectra can capture structural properties of brain networks, and hence graph spectral distributions are useful for a task of connectome-based classification. We introduce a kernel that is based on earth mover's distance (EMD...
This paper aims at tackling the problem of brain network classification using machine learning algorithms based on the spectra of the networks’ matrices. Two approaches are discussed: first, linear and tree-based models are run on the vectors of sorted eigenvalues of the adjacency matrix, the Laplacian matrix and the normalized Laplacian; next, SV...
The problem of link prediction gathered a lot of attention in the last few years, arising in different applications ranging from recommendation systems to social networks. In this paper, we will describe the most popular similarity indices, compare their performance in their ability to show links with the highest probability of being removed from i...
In this paper we present an algorithm for layout and visualization of music collections based on similarities between musical artists. The core of the algorithm consists of a non-linear low dimensional embedding of a similarity graph constrained to the surface of a hyper-sphere. This approach effectively uses additional dimensions in the embeddin...
Name matching is a key component of systems for entity resolution or record linkage. Alternative spellings of the same names are a common occurrence in many applications. We use the largest collection of genealogy person records in the world together with user search query logs to build name matching models. The procedure for building a crowd-sou...
Two novel approaches to triclustering of three-way binary data are proposed. Tricluster is defined as a dense subset of a ternary relation Y defined on sets of objects, attributes, and conditions, or, equivalently, as a dense submatrix of the adjacency matrix of the ternary relation Y. This definition is a scalable relaxation of the notion of trico...
A novel approach to triclustering of three-way binary data is proposed. A tricluster is defined in terms of Triadic Formal Concept Analysis as a dense triset of a binary relation Y, describing the relationship between objects, attributes and conditions. This definition is a relaxation of the triconcept notion and makes it possible to find all triclusters...
Geometrically, a diffusion tensor can be thought of as an ellipsoid with its three axes oriented along the tensor's three perpendicular eigenvectors and semi-axis lengths proportional to the square root of eigenvalues of the tensor mean diffusion distances. This chapter develops a new technique for tracing anatomical fibers from 3D diffusion-tensor...
Handbook of Biomedical Image Analysis: Registration Models (Volume III) is dedicated to the algorithms for registration of medical images and volumes. This volume is aimed at researchers and educators in imaging sciences, radiological imaging, clinical and diagnostic imaging, biomedical engineering, physicists covering different medical imaging mod...
The influence of head tissue conductivity on magnetoencephalography (MEG) was investigated by comparing the normal component of the magnetic field calculated at 61 detectors and the localization accuracy of realistic head finite element method (FEM) models using dipolar sources and containing altered scalp, skull, cerebrospinal fluid, gray, and whi...
In this paper, we consider the application of the singular value decomposition (SVD) to a search term suggestion system in a pay-for-performance search market. We propose a positive and negative refinement method based on orthogonal subspace projections. We demonstrate that SVD subspace-based methods: 1) expand coverage by reordering the results, a...
In this manuscript, we evaluate the application of the singular value decomposition (SVD) to a search term suggestion system in a pay-for-performance search market. We propose a novel positive and negative relevance feedback method for search refinement based on orthogonal subspace projections. We apply these methods to the subset of Overture's mar...
Many applications can benefit from soft clustering, where each datum is assigned to multiple clusters with membership weights that sum to one. In this paper we present a comparison of principal component analysis (PCA) and independent component analysis (ICA) when used for soft clustering. We provide a short mathematical background for these method...
Estimating the location and distribution of current sources within the brain from electroencephalographic (EEG) recordings is an ill-posed inverse problem. The ill-posedness of the problem is due to a lack of uniqueness in the solution; that is, different configurations of sources can generate identical external fields. Additionally, the existence of...
Estimating the location and distribution of electric current sources within the brain from electroencephalographic (EEG) recordings is an ill-posed inverse problem. The ill-posed nature of the inverse EEG problem is due to the lack of a unique solution such that different configurations of sources can generate identical external electric fields. In this...
In this chapter, we examine the problem of Web community identification expressed in terms of the graph or network structure induced by the Web. While the task of community identification is obviously related to the more fundamental problems of graph partitioning and clustering, the basic task is differentiated from other problems by being within t...
Segmentation of anatomical regions of the brain is one of the fundamental problems in medical image analysis. It is traditionally solved by iso-surfacing or through the use of active contours/deformable models on gray-scale MRI data. In this paper we develop a technique that uses anisotropic diffusion properties of brain tissue available from DTMR...
In this paper, we introduce a novel method for localizing epileptogenic sources in patients with multifocal temporal lobe epilepsy. Localizing multiple deep sources is computationally challenging due to superposition of signal from the active regions and "blurring" of the signal as it projects to the scalp. We address these challenges by incorporating...
Diffusion weighted magnetic resonance imaging (DW MRI) is sensitive to random thermal movement of water molecules known as Brownian motion. Consequently, DWI can be used to detect the diffusion of water molecules in tissues. Because water molecules can diffuse more easily along fiber tracts, for example in the brain, rather than across them, diffus...
BioPSE is a scientific programming environment that allows the interactive construction, debugging, and steering of large-scale scientific computations. BioPSE can be envisioned as a "computational workbench," in which a scientist can design and modify simulations interactively via a dataflow programming model. As opposed to the typica...
A pervasive problem in neuroscience is determining which regions of the brain are active, given voltage measurements at the scalp. If accurate solutions to such problems could be obtained, neurologists would gain non-invasive access to patient-specific cortical activity. Access to such data would ultimately increase the number of patients who could...
Laparoscopic surgical procedures require precise hand and eye coordination based on a 2-dimensional representation of 3-dimensional space. Currently, no metric exists to guide the educational process while surgeons are still on the learning curve. In this paper, we propose to identify and qualify the patterns of movements recorded from the da Vinci...
Segmentation of anatomical regions of the brain is one of the fundamental problems in medical image analysis. It is traditionally solved by iso-surfacing or through the use of active contours/deformable models on gray-scale magnetic resonance imaging (MRI) data. We develop a technique that uses anisotropic diffusion properties of brain tissue av...
Laparoscopic surgical procedures require precise hand and eye coordination based on a 2-dimensional representation of 3-dimensional space. Currently, no metric exists to guide the educational process while surgeons are still on the learning curve. In this paper, we propose to identify and qualify the patterns of movements recorded from the da Vinci...
Typically 3-D MR and CT scans have a relatively high resolution in the scanning X-Y plane, but much lower resolution in the axial Z direction. This non-uniform sampling of an object can miss small or thin structures. One way to address this problem is to scan the same object from multiple directions. In this paper we describe a method for deforming...