Denis Nasonov
  • PhD
  • Senior Researcher at ITMO University

About

81 Publications
14,321 Reads
644 Citations
Introduction
Denis Nasonov currently works at the Department of High Performance Computing, ITMO University. He does research in Algorithms, Distributed Computing and Software Engineering. His current project is 'Supercomputer simulation of critical phenomena in complex social systems'.
Current institution
ITMO University
Current position
  • Senior Researcher
Additional affiliations
  • January 2012 - June 2015: ITMO University, Senior Researcher
  • February 2011 - present: ITMO University, Senior Researcher

Publications (81)
Preprint
Full-text available
In this work, we present the AutoTM 2.0 framework for optimizing additively regularized topic models. Compared to the previous version, this version includes such valuable improvements as a novel optimization pipeline, LLM-based quality metrics, and a distributed mode. AutoTM 2.0 is a convenient tool for specialists as well as non-specialists to work with...
Article
Full-text available
Highlights What are the main findings? Enhanced Event Detection Accuracy: The introduction of the SemConvTree model, which integrates improved versions of BERTopic, TSB-ARTM, and SBert-Zero-Shot, enables a significant enhancement in the detection accuracy of urban events. The model’s ability to incorporate semantic analysis along with statistical e...
Preprint
Full-text available
The digital world is increasingly invading our reality, which leads to a significant reflection of the processes and activities taking place in the smart city. Such activities include well-known urban events, celebrations, and events of a very local character. Due to their mass occurrence, events have a comparable influence on the f...
Chapter
The paper addresses a problem of tuning topic models with additive regularization by introducing a novel hybrid evolutionary approach that combines Genetic and Nelder-Mead algorithms to generate domain-specific topic models with better quality. Introducing Nelder-Mead into the Genetic Algorithm pursues the goal of enhancing exploitation capabilitie...
Conference Paper
Full-text available
Topic modeling is a popular unsupervised method for processing text corpora to obtain interpreted knowledge about the data. However, there is an automatic quality measurement gap between existing metrics, human evaluation, and performance on the target tasks. That is a big challenge for automatic hyperparameter tuning methods, as they heavily rely on the...
Article
This study describes a semi-automated pipeline created for the comprehensive analysis of urban areas with extremely low and extremely high popularity levels. It includes geo-frequency analysis of Russian-language Instagram publications for the St. Petersburg area and the selection of areas with extreme values of the popularity lev...
Article
Full-text available
Social media stores a significant amount of information which can be used for extraction of specific knowledge. A variety of topics that arise there concerns a lot of everyday life aspects, including urban-related problems. In this work, we demonstrate the way of using the texts from social media on the topic of housing and utility problems, such a...
Conference Paper
Full-text available
Online advertising is one of the most widespread ways to reach and increase a target audience for those selling products. Usually taking the form of a banner, advertising engages users into visiting a corresponding webpage. Professional generation of banners requires creative and writing skills and a basic understanding of the target products. The great...
Chapter
One of the areas gathering momentum is the investigation of location-based social networks (LBSNs), because understanding citizens' behavior on various scales can help to improve quality of living, enhance urban management, and advance the development of smart cities. But it is widely known that the performance of algorithms for data minin...
Article
Full-text available
We present a new approach to large-scale supervised heterogeneous graph classification. We decouple a large heterogeneous graph into smaller homogeneous ones. In this paper, we show that our model provides results close to the state-of-the-art model while greatly simplifying calculations and making it possible to process complex heterogeneous graphs...
Conference Paper
In this work, we show how social media data can be used to improve the touristic experience. We present an algorithm for automated construction of touristic paths. The score function for a location depends on three components: the location's social media popularity and rating, the distance of the place from others in the route, and the location's relevance to the city...
Chapter
It is common practice nowadays to use multiple social networks for different social roles. Despite this, these networks assume differences in content type, communications, and style of speech. If we intend to understand human behaviour as a key feature for recommender systems, banking risk assessments, or sociological research, it is better to a...
Preprint
It is common practice nowadays to use multiple social networks for different social roles. Despite this, these networks assume differences in content type, communications, and style of speech. If we intend to understand human behaviour as a key feature for recommender systems, banking risk assessments, or sociological research, it is better to a...
Article
Full-text available
The Orienteering Problem (OP) is a routing problem where the aim is to generate a path through a set of nodes that maximizes the total score while not exceeding the budget. In this paper, we present an extension of the classic OP, the Orienteering Problem with Functional Profits (OPFP), where the score of a specific point depends on its characteristics, posi...
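The classic OP described in this abstract (maximize total score along a path without exceeding a travel budget) can be illustrated with a minimal greedy sketch. All node names, scores, distances, and the budget below are invented for illustration and are not taken from the paper:

```python
# Greedy sketch of the Orienteering Problem (OP): repeatedly pick the
# reachable node with the best score-per-distance ratio until the travel
# budget is exhausted. Illustrative data only, not from the paper.

def greedy_op(start, nodes, score, dist, budget):
    """Build a path from `start`, maximizing total score within `budget`."""
    path, total, current = [start], 0.0, start
    remaining = set(nodes) - {start}
    while remaining:
        best = max(
            (n for n in remaining if dist[(current, n)] <= budget),
            key=lambda n: score[n] / dist[(current, n)],
            default=None,
        )
        if best is None:  # no node is reachable within the remaining budget
            break
        budget -= dist[(current, best)]
        total += score[best]
        path.append(best)
        remaining.remove(best)
        current = best
    return path, total

# Tiny example: 3 candidate nodes, pairwise distances, budget of 10.
score = {"a": 5, "b": 8, "c": 3}
dist = {("s", "a"): 4, ("s", "b"): 6, ("s", "c"): 2,
        ("a", "b"): 3, ("a", "c"): 5, ("b", "a"): 3,
        ("b", "c"): 4, ("c", "a"): 5, ("c", "b"): 4}
path, total = greedy_op("s", ["a", "b", "c"], score, dist, 10)
# path == ["s", "c", "b", "a"], total == 16
```

A greedy ratio heuristic like this is only an approximation of the exact OP; the OPFP extension in the paper additionally makes each point's score a function of its characteristics and position, which this fixed `score` dictionary does not capture.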
Chapter
Full-text available
Today, advanced research is based on complex simulations which require a lot of computational resources that are usually organized in a very complicated way from a technical point of view. It means that a scientist from physics, biology, or even sociology has to struggle with all the technical issues on the way to building distributed multi-scale applic...
Article
Full-text available
Time series data, along with its analysis and applications, has recently become increasingly important in different areas and domains. Many fields of science and industry rely on storing and processing large amounts of time series: economics and finance, medicine, the Internet of Things, environmental protection, hardware monitoring, and many others. This...
Article
Full-text available
The development of information technologies entails nonlinear growth in both the volume of data and the complexity of data processing itself. Scheduling is one of the main components for optimizing the operation of a computing system. Currently, there is a large number of scheduling algorithms. However, even despite existing hybrid schemes, t...
Article
Full-text available
Many companies want or prefer to use chatbot systems to provide smart assistants that accompany human specialists, especially newcomers, with automatic consulting. Implementation of a really useful smart assistant for a specific domain requires a knowledge base for that domain, which often exists only in the form of text documentation and manuals. Lac...
Article
Full-text available
In today's world, it is crucial to be proactive and to be prepared for events that have not yet happened. Thus, it is no surprise that in the field of social media analysis the research agenda has moved from the development of event detection methods to a brand new area: event prediction models. This research field is extremely important for all...
Article
Full-text available
Data provided by social media is becoming an increasingly important analysis material for social scientists, market analysts, and other stakeholders. Diversity of interests leads to the emergence of a variety of crawling techniques and programming solutions. Nevertheless, these solutions lack the flexibility to satisfy the requirements of different u...
Chapter
Full-text available
Today, Big Data occupies a crucial part of scientific research areas as well as the business analysis of large companies. Each company tries to find the best way to make the Big Data sets it generates valuable and profitable. However, in most cases, companies do not have enough opportunities and budget to solve this complex problem. On the other hand, ther...
Chapter
The current paper is devoted to the problem of identifying deviant users in social media. For this purpose, each user of a social media source should be described through a profile that aggregates open information about him/her within a special structure. Aggregated user profiles are formally described in terms of a multivariate random process. The...
Poster
Full-text available
Conference poster for paper "Kalyuzhnaya, A. V., Nikitin, N. O., Butakov, N., & Nasonov, D. (2018, June). Precedent-Based Approach for the Identification of Deviant Behavior in Social Media. In International Conference on Computational Science (pp. 846-852). Springer, Cham. "
Conference Paper
Full-text available
Modern composite scientific applications, also called scientific workflows, require large processing capacities. Cloud environments provide high-performance and flexible infrastructure, which can easily be employed for workflow execution. Since cloud resources are paid for in most cases, there is a need to utilize these resources with maximal effi...
Article
Full-text available
To provide fault tolerance, modern distributed storage systems use specialized network topologies and consensus protocols that create high overheads. The main disadvantage of existing specialized topologies is the difficulty of implementing efficient data placement that takes the locality of the data into account. In scientific problems, very often it is...
Conference Paper
The Multiscale Modelling and Simulation approach is a powerful methodological way to identify sub-models and classify their interaction. The execution order and interaction of computational modules are described in the form of a workflow. This workflow can be executed as a single HPC cluster job if there is a middleware which schedules modules executi...
Conference Paper
Efficient data placement and fast query processing are very important for modern distributed storages. In this paper, we present Exarch - modular distributed data storage, which utilizes data semantics for providing efficient data indexing and partitioning. It can be easily extended for any data format using three generic interfaces that are respon...
Article
Today, metocean investigations, combined with forecasts and analysis of extreme events, require new design and development approaches because of their complexity. Forecasting and preventing extreme metocean events is an urgent computing task from both the decision-making and the reaction point of view. In this case, an urgent computing scenario is an essential...
Article
Full-text available
The main objective of Decision Support Systems is the detection of critical states and a timely response to them. Such systems can be based on constant monitoring of continuously incoming data. Stream processing is carried out on the basis of computing infrastructure and specialized frameworks such as Apache Storm, Flink, and Spark Streaming. However, to pr...
Article
The importance of data collection, processing, and analysis is rapidly growing. Big Data technologies are in high demand in many fields, including bio-informatics, hydrometeorology, and high energy physics. One of the most popular computational paradigms used in large data processing frameworks is the MapReduce programming model. Today, the majority of...
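The MapReduce programming model mentioned in this abstract can be sketched in plain Python as a single-process illustration of its map, shuffle, and reduce phases. Word count is the standard textbook example; no actual framework API is used here:

```python
from collections import defaultdict

# Single-process sketch of the MapReduce model: the map phase emits
# (key, value) pairs, the shuffle phase groups values by key, and the
# reduce phase aggregates each group. Purely illustrative.

def map_phase(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    # Group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each group; for word count, sum the ones.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big compute", "big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 3, "data": 2, "compute": 1}
```

Real frameworks such as Hadoop distribute these same three phases across many machines; the point here is only the shape of the programming model, not any particular implementation.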
Article
The development of an efficient Early Warning System (EWS) is essential for the prediction and prevention of imminent natural hazards. In addition to providing a computationally intensive infrastructure with extensive data transfer, high-execution reliability and hard-deadline satisfaction are important requirements of EWS scenario processing. This...
Article
Information spreading analysis is an important problem in the scientific community and is widely studied today using different techniques, from data analysis to agent-based modelling. For some extreme situations, like fire or flood, there is little or no reliable information about users' activity available. That is why an efficient simulati...
Article
Full-text available
Estimation of the execution time is an important part of the workflow scheduling problem. The aim of this paper is to highlight common problems in estimating the workflow execution time and propose a solution that takes into account the complexity and the stochastic aspects of the workflow components as well as their runtime. The solution proposed...
Article
Full-text available
Simulation of agent-based models has several problems related to scalability and the accuracy of motion reproduction. An increase in the number of agents leads to additional computations, and hence the program run time also increases. This problem can be solved using distributed simulation and distributed computational environments such as clust...
Article
Urgent computing capabilities for early warning systems and decision support systems are vital in situations that require execution be completed before a specified deadline. The cost of missing the deadline in such situations can be unacceptable, while providing insufficient results can mean an ineffective solution that may come at a very high cost...
Article
Full-text available
Cloud computational platforms today are very promising for the execution of scientific applications, since they provide ready-to-go infrastructure for almost any task. However, complex tasks which contain a large number of interconnected applications, usually called workflows, require efficient task scheduling in order to satisfy user define...
Article
Full-text available
Information spreading simulation is an important problem in the scientific community and is widely studied nowadays using different techniques. Efficient simulation of users' activity for urgent scenarios is even more important, because a fast and accurate reaction in such situations can save human lives. In this paper we present a multi-layer agent-based net...
Article
Full-text available
The usage of Hadoop clusters is widespread in different business and academic spheres. The performance of Hadoop depends on various factors, such as the number and frequency of CPU cores, RAM capacity, storage throughput, dataflow intensity, network bandwidth and latency, etc. The heterogeneity of a computing environment raises such problems as...
Article
Full-text available
Modern scientific applications are composed of various methods, techniques and models to solve complicated problems. Such composite applications commonly are represented as workflows. Workflow scheduling is a well-known optimization problem, for which there is a great amount of solutions. Most of the algorithms contain parameters, which affect the...
Article
Full-text available
In this paper we propose an approach to scaling workload via the elastic quality of the solution provided by particular streaming applications. The contribution of this paper consists of a quality-based workload scaling model, implementation details for the quality assessment mechanism implemented on top of Apache Storm, and an experimental evaluation of the propo...
Article
Full-text available
The paper describes the problem of computer simulation of critical phenomena in complex social systems on petascale computing systems, in the frame of the complex networks approach. A three-layer system of nested models of complex networks is proposed, including an aggregated analytical model to identify critical phenomena, a detailed model of individualized n...
Article
The optimal workflow scheduling is one of the most important issues in heterogeneous distributed computational environments. Existing heuristic and evolutionary scheduling algorithms have their advantages and disadvantages. In this work we propose a hybrid algorithm based on heuristic methods and genetic algorithm (GA) that combines best characteri...
Conference Paper
In this paper, we present a new coevolutionary algorithm for workflow scheduling in a dynamically changing environment. Nowadays, there are many efficient algorithms for workflow execution planning, many of which are based on the combination of heuristic and metaheuristic approaches or other forms of hybridization. The coevolutionary genetic algori...
Article
This paper presents ongoing research aimed at developing a data-driven platform for clinical decision support systems (DSSs) that require integration and processing of various data sources within a single solution. Resource management is developed within a framework of an urgent computing approach to address changing requirements defined by the inc...
Article
Full-text available
In this work, a framework for detector layout optimization based on multi-agent simulation is proposed. Its main intention is to provide a decision support team with a tool for the automatic design of social threat detection systems for public crowded places. Containing a number of distributed detectors, this system performs detection and an identi...
Article
Full-text available
Nowadays the importance of data collection, processing, and analysis is growing tremendously. Big Data technologies are in high demand in different areas, including bio-informatics, hydrometeorology, high energy physics, etc. One of the most popular computation paradigms used in large data processing frameworks is the MapReduce programming...
Article
Full-text available
This paper presents a tool for visualization of the processes executed upon the infrastructure of the cloud computing platform CLAVIRE. Such a class of tools is extremely important for cloud platform developers and end users, because it gives extended opportunities for analyzing platform processes by providing interactive mechanisms to sup...
Article
Full-text available
Efficient scheduling is an essential part of processing complex scientific applications in distributed computational environments. The computational complexity comes both from environment heterogeneity and from the application structure, which is usually represented as a workflow containing different linked tasks. A lot of well-known techniques...
Conference Paper
Full-text available
Today, technological progress challenges the scientific community with more and more complex issues related to the organization of computation in distributed heterogeneous environments, which usually include cloud computing systems, grids, clusters, PCs, and even mobile phones. In such environments, traditionally, one of the most frequently used mechanisms...
Conference Paper
Full-text available
The paper presents a dynamic Domain-Specific Language (DSL) which is developed to provide the capability of high-level BigData task descriptions within e-Science applications. The dynamic structure of the DSL supports language structure extension depending on a particular problem domain defining specific requirements, data processing, and aggregati...
Conference Paper
Full-text available
The paper presents the technology for building e-Science cyberinfrastructure which enables integration of regular cloud computing environment with big data facilities and stream data processing. The developed technology is aimed to support uniform dynamic interaction with the user during composite application building and execution, as well as resu...
Article
Full-text available
Typical patterns of using scientific workflow management systems (SWMS) include periodic executions of prebuilt workflows with precisely known estimates of tasks' execution times. Combining such workflows into sets could significantly improve the resulting schedules in terms of fairness and meeting users' constraints. In this paper, we propose a clust...
Article
Coastal surge floods are extreme phenomena with low frequency of occurrence. In this paper, a combination of two approaches based on the stochastic model for multivariate extremes and the synthetic storm model was applied for coastal floods reconstruction in St. Petersburg. The stochastic model is based on multivariate distributions of cyclone’ par...
Article
The size of the digital data universe is growing exponentially from year to year and is currently estimated at more than 4.4 ZB. This compels the scientific community to find more efficient approaches to collecting, organizing, and processing information. A lot of enterprise solutions offer extended software tools based on MapReduce principles for big...
Article
Full-text available
The Saint-Petersburg Flood Warning System (FWS) is a life-critical system that requires permanent maintenance and development. Tasks that arise during these processes can be much more resource-intensive than the operational loop of the system and may involve complex research problems. Thereby it is essential to have a special software t...
Article
Full-text available
Investigations into the development of efficient early warning systems (EWS) are essential for the prediction of and warning about upcoming natural hazards. Besides the provision of communication- and computation-intensive infrastructure, high resource reliability and hard-deadline options are required for EWS scenario processing in order to get guaranteed info...
Article
Full-text available
The paper presents the technological platform for large data processing within Early Warning Systems (EWS). The core idea of general-purpose EWS platform is based on abstract data processing performed with the use of domain-specific imperative (procedures) and declarative (semantic structure) knowledge. The platform is based on the CLAVIRE cloud co...
Chapter
The optimal workflow scheduling is one of the most important issues in heterogeneous distributed computational environment. Existing heuristic and evolutionary scheduling algorithms have their advantages and disadvantages. In this work we propose a hybrid algorithm based on Heterogeneous Earliest Finish Time heuristic and genetic algorithm that com...
Article
State-of-the-art distributed computational environments require increasingly flexible and efficient workflow scheduling procedures in order to satisfy the growing requirements of the scientific community. In this paper, we present a novel, nature-inspired scheduling approach based on leveraging inherited populations in order to increase...
Article
Full-text available
This paper discusses two main classes of tasks for flood risk assessment. The first class analyzes feasible flood damage in the conditions when the barrier system has inoperable state and therefore there is no way to control flood flowing. In this case we use historical data and assessment methods based on principles of extreme value theory. Also f...
Article
Full-text available
Workflow became a mainstream formalism for representing complex scientific problems and is applied in different domains. In this paper we propose and analyze an interactive workflow model as the basis for urgent computing (UC) infrastructures. The majority of research works in the area of urgent computing are focused on deadline-driven s...
