
Denis Nasonov
- PhD
- Senior Researcher at ITMO University
About
81 Publications
14,321 Reads
644 Citations
Introduction
Denis Nasonov currently works at the Department of High Performance Computing, ITMO University. He does research in Algorithms, Distributed Computing, and Software Engineering. His current project is 'Supercomputer simulation of critical phenomena in complex social systems'.
Current institution
ITMO University
Additional affiliations
January 2012 - June 2015
February 2011 - present
Publications (81)
In this work, we present the AutoTM 2.0 framework for optimizing additively regularized topic models. Compared to the previous version, this version includes valuable improvements such as a novel optimization pipeline, LLM-based quality metrics, and a distributed mode. AutoTM 2.0 is a convenient tool for specialists as well as non-specialists to work with...
Highlights
What are the main findings?
- Enhanced Event Detection Accuracy: The introduction of the SemConvTree model, which integrates improved versions of BERTopic, TSB-ARTM, and SBert-Zero-Shot, enables a significant enhancement in the detection accuracy of urban events. The model’s ability to incorporate semantic analysis along with statistical e...
The digital world is increasingly invading our reality, which leads to a significant reflection of the processes and activities taking place in the smart city. Such activities include well-known urban events, celebrations, and those with a very local character. Due to their mass occurrence, events have a comparable influence on the f...
The paper addresses the problem of tuning topic models with additive regularization by introducing a novel hybrid evolutionary approach that combines the Genetic and Nelder-Mead algorithms to generate domain-specific topic models of better quality. Introducing Nelder-Mead into the Genetic Algorithm pursues the goal of enhancing exploitation capabilitie...
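The hybridization idea above can be sketched generically: run a genetic loop for exploration and polish the elite individuals with Nelder-Mead for exploitation. The sketch below is a minimal illustration with an invented placeholder objective, not the paper's actual fitness function, operators, or coefficient ranges; it assumes numpy and scipy are available.

```python
# Toy sketch: a genetic loop whose elite individuals are refined with
# Nelder-Mead. `fitness` is an invented placeholder for a topic-model
# quality metric; the real objective would train and score an
# additively regularized topic model.
import numpy as np
from scipy.optimize import minimize

def fitness(x):
    return -np.sum((x - 0.3) ** 2)  # placeholder objective, max at x = 0.3

def hybrid_ga_nelder_mead(dim=4, pop_size=20, generations=30, elite_k=3):
    rng = np.random.default_rng(0)
    pop = rng.uniform(0.0, 1.0, size=(pop_size, dim))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        elites = pop[np.argsort(scores)[::-1][:elite_k]].copy()
        # Exploitation: polish each elite with a short Nelder-Mead run.
        for i, elite in enumerate(elites):
            res = minimize(lambda x: -fitness(x), elite,
                           method="Nelder-Mead", options={"maxiter": 50})
            elites[i] = res.x
        # Exploration: mutated offspring cloned from random elites.
        parents = elites[rng.integers(elite_k, size=pop_size - elite_k)]
        children = parents + rng.normal(0.0, 0.1, size=parents.shape)
        pop = np.vstack([elites, children])
    return max(pop, key=fitness)

print(hybrid_ga_nelder_mead())
```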
Topic modeling is a popular unsupervised method for processing text corpora to obtain interpretable knowledge from the data. However, there is a gap in automatic quality measurement between existing metrics, human evaluation, and performance on the target tasks. That is a big challenge for automatic hyperparameter tuning methods, as they heavily rely on the...
This study describes a semi-automated pipeline created for the comprehensive analysis of urban areas with extremely low and extremely high popularity levels. It includes geo-frequency analysis of Russian-language Instagram publications for the St. Petersburg area and the selection of areas with extreme values of the popularity lev...
Social media stores a significant amount of information which can be used for the extraction of specific knowledge. The variety of topics that arise there concerns many aspects of everyday life, including urban-related problems. In this work, we demonstrate a way of using texts from social media on the topic of housing and utility problems, such a...
Online advertising is one of the most widespread ways to reach and increase a target audience for those selling products. Usually taking the form of a banner, advertising engages users into visiting a corresponding webpage. Professional generation of banners requires creative and writing skills and a basic understanding of target products. The great...
One of the areas gathering momentum is the investigation of location-based social networks (LBSNs), because understanding citizens’ behavior on various scales can help to improve quality of living, enhance urban management, and advance the development of smart cities. But it is widely known that the performance of algorithms for data minin...
We present a new approach to large-scale supervised heterogeneous graph classification. We decouple a large heterogeneous graph into smaller homogeneous ones. In this paper, we show that our model provides results close to the state-of-the-art model while greatly simplifying calculations and making it possible to process complex heterogeneous graphs...
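A minimal sketch of the decoupling step described above, assuming edge types are stored as a 'type' attribute (an invented convention, not the paper's model or its classification stage); it uses networkx only to make the idea concrete.

```python
# Illustrative decoupling of a heterogeneous graph into homogeneous
# subgraphs, one per edge type (a generic sketch, not the paper's model).
import networkx as nx

def decouple_by_edge_type(g: nx.Graph) -> dict:
    """Return one homogeneous subgraph per 'type' edge attribute."""
    subgraphs = {}
    for u, v, data in g.edges(data=True):
        etype = data.get("type", "default")
        subgraphs.setdefault(etype, nx.Graph()).add_edge(u, v)
    return subgraphs

g = nx.Graph()
g.add_edge("user1", "item1", type="buys")
g.add_edge("user1", "user2", type="follows")
g.add_edge("user2", "item1", type="buys")
for etype, sg in decouple_by_edge_type(g).items():
    print(etype, sg.number_of_edges())  # buys 2, follows 1
```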
In this work, we show how social media data can be used to improve the touristic experience. We present an algorithm for automated touristic path construction. The score function for a location depends on three components: the location's social media popularity and rating, its distance from the other places in the route, and its relevance to the city...
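A minimal sketch of such a three-component score, with invented weights and normalization rather than the paper's actual formula:

```python
# Invented weights and scaling, shown only to make the three named
# components concrete; this is not the paper's actual score function.
from math import dist

def location_score(popularity, rating, coords, route_coords, relevance,
                   w1=0.5, w2=0.3, w3=0.2):
    # 1) social media popularity and rating (both assumed in [0, 1])
    attractiveness = 0.5 * popularity + 0.5 * rating
    # 2) closeness of the place to the other places in the route
    avg_d = sum(dist(coords, c) for c in route_coords) / len(route_coords)
    proximity = 1.0 / (1.0 + avg_d)
    # 3) relevance of the location to the city (assumed in [0, 1])
    return w1 * attractiveness + w2 * proximity + w3 * relevance

print(location_score(0.8, 0.9, (0.0, 0.0), [(1.0, 0.0), (0.0, 2.0)], 0.7))
```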
It is common practice nowadays to use multiple social networks for different social roles. Despite this, these networks differ in content type, communication, and style of speech. If we intend to understand human behaviour as a key feature for recommender systems, banking risk assessment, or sociological research, it is better to a...
The orienteering problem (OP) is a routing problem where the aim is to generate a path through a set of nodes that maximizes the total score without exceeding the budget. In this paper, we present an extension of the classic OP: the Orienteering Problem with Functional Profits (OPFP), where the score of a specific point depends on its characteristics, posi...
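In standard textbook notation (the symbols below are assumed, not taken from the paper), the classic OP and the OPFP twist look like this:

```latex
% Classic OP: choose a path P that maximizes the collected score
% without exceeding the travel budget B.
\max_{P} \sum_{i \in P} s_i
\qquad \text{subject to} \qquad
\sum_{(i,j) \in P} c_{ij} \le B
% OPFP: the constant score s_i becomes a function of the point's
% characteristics and its position within the route,
% s_i \;\longrightarrow\; f(i, P).
```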
Today, advanced research is based on complex simulations which require a lot of computational resources that are usually organized in a very complicated way from the technical point of view. It means that a scientist from physics, biology, or even sociology has to struggle with all the technical issues on the way to building distributed multi-scale applic...
Time series data, its analysis, and its applications have recently become increasingly important in different areas and domains. Many fields of science and industry rely on storing and processing large amounts of time series: economics and finance, medicine, the Internet of Things, environmental protection, hardware monitoring, and many others. This...
The development of information technologies entails a nonlinear growth of both the volume of data and the complexity of data processing itself. Scheduling is one of the main components for optimizing the operation of a computing system. Currently, there is a large number of scheduling algorithms. However, even despite existing hybrid schemes, t...
Many companies want or prefer to use chatbot systems to provide smart assistants that accompany human specialists, especially newcomers, with automatic consulting. Implementation of a really useful smart assistant for a specific domain requires a knowledge base for this domain, which often exists only in the form of text documentation and manuals. Lac...
In today’s world, it is crucial to be proactive and be prepared for events which have not happened yet. Thus, it is no surprise that in the field of social media analysis the research agenda has moved from the development of event detection methods to a brand new area - event prediction models. This research field is extremely important for all...
Data provided by social media is becoming an increasingly important analysis material for social scientists, market analysts, and other stakeholders. Diversity of interests leads to the emergence of a variety of crawling techniques and programming solutions. Nevertheless, these solutions lack the flexibility to satisfy the requirements of different u...
Today, Big Data occupies a crucial place in scientific research as well as in the business analysis of large companies. Each company tries to find the best way to make the Big Data sets it generates valuable and profitable. However, in most cases, companies do not have enough opportunities and budget to solve this complex problem. On the other hand, ther...
The current paper is devoted to the problem of identifying deviant users in social media. For this purpose, each user of a social media source should be described through a profile that aggregates open information about him/her within a special structure. Aggregated user profiles are formally described in terms of a multivariate random process. The...
Conference poster for the paper: Kalyuzhnaya, A. V., Nikitin, N. O., Butakov, N., & Nasonov, D. (2018, June). Precedent-Based Approach for the Identification of Deviant Behavior in Social Media. In International Conference on Computational Science (pp. 846-852). Springer, Cham.
Modern composite scientific applications, also called scientific workflows, require large processing capacities. Cloud environments provide high-performance and flexible infrastructure, which can easily be employed for workflow execution. Since cloud resources are paid for in most cases, there is a need to utilize these resources with maximal effi...
To provide fault tolerance, modern distributed storage systems use specialized network topologies and consensus protocols that create high overheads. The main disadvantage of existing specialized topologies is the difficulty of implementing an efficient data placement that takes into account the locality of the data. In scientific problems, very often it is...
The Multiscale Modelling and Simulation approach is a powerful methodological way to identify sub-models and classify their interaction. The execution order and interaction of computational modules are described in the form of a workflow. This workflow can be executed as a single HPC cluster job if there is a middleware which schedules module executi...
Efficient data placement and fast query processing are very important for modern distributed storage systems. In this paper, we present Exarch, a modular distributed data storage which utilizes data semantics to provide efficient data indexing and partitioning. It can be easily extended to any data format using three generic interfaces that are respon...
Today, metocean investigations, combined with forecasts and analysis of extreme events, require new design and development approaches because of their complexity. Forecasting and preventing extreme metocean events is an urgent computing task from the decision-making and reaction points of view. In this case, an urgent computing scenario is an essential...
The main objective of Decision Support Systems is the detection of critical states and a timely response to them. Such systems can be based on constant monitoring of continuously incoming data. Stream processing is carried out on the basis of computing infrastructure and specialized frameworks such as Apache Storm, Flink, and Spark Streaming. However, to pr...
The importance of data collection, processing, and analysis is rapidly growing. Big Data technologies are in high demand in many fields, including bio-informatics, hydrometeorology, and high energy physics. One of the most popular computational paradigms used in large data processing frameworks is the MapReduce programming model. Today, the majority of...
The development of an efficient Early Warning System (EWS) is essential for the prediction and prevention of imminent natural hazards. In addition to providing a computationally intensive infrastructure with extensive data transfer, high execution reliability and hard-deadline satisfaction are important requirements of EWS scenario processing. This...
Information spreading analysis is an important problem in the scientific community and is widely studied today using different techniques, from data analysis to agent-based modelling. For some extreme situations, like fire or flood, there is little or no reliable information about users’ activity available. That is why an efficient simulati...
Estimation of the execution time is an important part of the workflow scheduling problem. The aim of this paper is to highlight common problems in estimating the workflow execution time and propose a solution that takes into account the complexity and the stochastic aspects of the workflow components as well as their runtime. The solution proposed...
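The snippet is cut off before the proposed solution, so the following is only a generic illustration of the stochastic aspect the abstract mentions, with invented runtime distributions: because the makespan takes a maximum over parallel branches, the expected makespan differs from the makespan computed from mean runtimes.

```python
# Generic illustration (not the paper's method): with stochastic task
# runtimes, E[makespan] differs from the makespan of mean runtimes,
# because the join task must wait for the slower branch (a max()).
import random

def sample_makespan(rng):
    branch_a = rng.gauss(10.0, 3.0) + rng.gauss(5.0, 1.0)  # two-task chain
    branch_b = rng.gauss(12.0, 4.0)                        # parallel branch
    join = rng.gauss(2.0, 0.5)                             # final join task
    return max(branch_a, branch_b) + join

rng = random.Random(42)
n = 100_000
mc = sum(sample_makespan(rng) for _ in range(n)) / n
point = max(10.0 + 5.0, 12.0) + 2.0  # plug in mean runtimes instead
print(f"Monte Carlo E[makespan] ~ {mc:.2f}, point estimate = {point:.2f}")
```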
Simulation of agent-based models has several problems related to scalability and the accuracy of motion reproduction. An increase in the number of agents leads to additional computations, and hence the program run time also increases. This problem can be solved using distributed simulation and distributed computational environments such as clust...
Urgent computing capabilities for early warning systems and decision support systems are vital in situations that require execution to be completed before a specified deadline. The cost of missing the deadline in such situations can be unacceptable, while providing insufficient results can mean an ineffective solution that may come at a very high cost...
Cloud computational platforms today are very promising for the execution of scientific applications, since they provide ready-to-go infrastructure for almost any task. However, complex tasks which contain a large number of interconnected applications, usually called workflows, require efficient task scheduling in order to satisfy user-define...
Information spreading simulation is an important problem in the scientific community and is widely studied nowadays using different techniques. Efficient simulation of users’ activity for urgent scenarios is even more important, because a fast and accurate reaction in such situations can save human lives. In this paper we present a multi-layer agent-based netw...
The usage of Hadoop clusters is widespread in different business and academic spheres. The performance of Hadoop depends on various factors, such as the number and frequency of CPU cores, RAM capacity, storage throughput, dataflow intensity, network bandwidth and latency, etc. The heterogeneity of a computing environment raises such problems as...
Modern scientific applications are composed of various methods, techniques, and models to solve complicated problems. Such composite applications are commonly represented as workflows. Workflow scheduling is a well-known optimization problem for which there is a great number of solutions. Most of the algorithms contain parameters which affect the...
In this paper we propose an idea to scale workload via the elastic quality of the solution provided by particular streaming applications. The contribution of this paper consists of a quality-based workload scaling model, implementation details for a quality assessment mechanism implemented on top of Apache Storm, and an experimental evaluation of the propo...
The paper describes a problem of computer simulation of critical phenomena in complex social systems on petascale computing systems within the framework of the complex networks approach. A three-layer system of nested models of complex networks is proposed, including an aggregated analytical model to identify critical phenomena, a detailed model of individualized n...
The optimal workflow scheduling is one of the most important issues in heterogeneous distributed computational environments. Existing heuristic and evolutionary scheduling algorithms have their advantages and disadvantages. In this work we propose a hybrid algorithm based on heuristic methods and a genetic algorithm (GA) that combines the best characteristi...
In this paper, we present a new coevolutionary algorithm for workflow scheduling in a dynamically changing environment. Nowadays, there are many efficient algorithms for workflow execution planning, many of which are based on the combination of heuristic and metaheuristic approaches or other forms of hybridization. The coevolutionary genetic algori...
This paper presents ongoing research aimed at developing a data-driven platform for clinical decision support systems (DSSs) that require the integration and processing of various data sources within a single solution. Resource management is developed within the framework of an urgent computing approach to address changing requirements defined by the inc...
In this work, a framework for detector layout optimization based on multi-agent simulation is proposed. Its main intention is to provide a decision support team with a tool for the automatic design of social threat detection systems for public crowded places. Containing a number of distributed detectors, this system performs detection and an identifi...
Nowadays the importance of data collection, processing, and analysis is growing tremendously. Big Data technologies are in high demand in different areas, including bio-informatics, hydrometeorology, high energy physics, etc. One of the most popular computation paradigms used in large data processing frameworks is the MapReduce programming...
This paper presents a tool for the visualization of processes executed on the infrastructure of the cloud computing platform CLAVIRE. This class of tools is extremely important for cloud platform developers and end users, because it gives extended opportunities for analyzing platform processes by providing interactive mechanisms to sup...
Efficient scheduling is an essential part of processing complex scientific applications in distributed computational environments. The computational complexity comes both from environment heterogeneity and from the application structure, which is usually represented as a workflow that contains different linked tasks. A lot of well-known techniques...
Today, technological progress forces the scientific community to face more and more complex issues related to computational organization in distributed heterogeneous environments, which usually include cloud computing systems, grids, clusters, PCs, and even mobile phones. In such environments, traditionally, one of the most frequently used mechanisms...
The paper presents a dynamic Domain-Specific Language (DSL) developed to provide the capability of high-level Big Data task descriptions within e-Science applications. The dynamic structure of the DSL supports language structure extension depending on a particular problem domain defining specific requirements, data processing, and aggregati...
The paper presents the technology for building e-Science cyberinfrastructure which enables integration of a regular cloud computing environment with big data facilities and stream data processing. The developed technology is aimed at supporting uniform dynamic interaction with the user during composite application building and execution, as well as resu...
Typical patterns of using scientific workflow management systems (SWMS) include periodic executions of prebuilt workflows with precisely known estimates of tasks' execution times. Combining such workflows into sets could substantially improve the resulting schedules in terms of fairness and meeting users' constraints. In this paper, we propose a clust...
Coastal surge floods are extreme phenomena with a low frequency of occurrence. In this paper, a combination of two approaches, based on the stochastic model for multivariate extremes and the synthetic storm model, was applied to coastal flood reconstruction in St. Petersburg. The stochastic model is based on multivariate distributions of cyclone par...
The size of the digital data universe is growing exponentially from year to year and is currently estimated at more than 4.4 ZB. This compels the scientific community to find more efficient approaches to collecting, organizing, and processing information. A lot of enterprise solutions offer extended software tools based on MapReduce principles for big...
The Saint-Petersburg Flood Warning System (FWS) is a life-critical system that requires permanent maintenance and development. Tasks that arise during these processes could be much more resource-intensive than the operational loop of the system and may involve complex research problems. Thereby it is essential to have a special software t...
Investigations in the development of efficient early warning systems (EWS) are essential for the prediction of and warning about upcoming natural hazards. Besides providing a communication- and computationally intensive infrastructure, high resource reliability and a hard-deadline option are required for EWS scenario processing in order to get guaranteed info...
The paper presents the technological platform for large data processing within Early Warning Systems (EWS). The core idea of the general-purpose EWS platform is based on abstract data processing performed with the use of domain-specific imperative (procedures) and declarative (semantic structure) knowledge. The platform is based on the CLAVIRE cloud co...
The optimal workflow scheduling is one of the most important issues in heterogeneous distributed computational environments. Existing heuristic and evolutionary scheduling algorithms have their advantages and disadvantages. In this work we propose a hybrid algorithm based on the Heterogeneous Earliest Finish Time heuristic and a genetic algorithm that com...
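One common way to combine the two, shown here as a hedged sketch rather than the paper's algorithm: seed the genetic algorithm's initial population with a HEFT-like schedule so evolution starts from a strong heuristic solution. heft_schedule() below is a stub, not a real HEFT implementation.

```python
# Sketch of the hybridization pattern: seed a genetic algorithm's
# initial population with a heuristic (HEFT-like) schedule so the GA
# starts from a good solution instead of purely random ones.
import random

TASKS = ["t1", "t2", "t3", "t4"]
RESOURCES = ["r1", "r2"]

def heft_schedule():
    # Stub standing in for Heterogeneous Earliest Finish Time:
    # returns a task -> resource mapping.
    return {t: RESOURCES[i % len(RESOURCES)] for i, t in enumerate(TASKS)}

def random_schedule(rng):
    return {t: rng.choice(RESOURCES) for t in TASKS}

def initial_population(size, rng):
    # One HEFT-seeded individual plus random diversity; crossover and
    # mutation would then evolve this population as usual.
    return [heft_schedule()] + [random_schedule(rng) for _ in range(size - 1)]

rng = random.Random(0)
for ind in initial_population(4, rng):
    print(ind)
```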
State-of-the-art distributed computational environments require increasingly flexible and efficient workflow scheduling procedures in order to satisfy the growing requirements of the scientific community. In this paper, we present a novel, nature-inspired scheduling approach based on leveraging inherited populations in order to increase...
This paper discusses two main classes of tasks for flood risk assessment. The first class analyzes feasible flood damage in conditions when the barrier system is in an inoperable state and therefore there is no way to control the flood flow. In this case we use historical data and assessment methods based on the principles of extreme value theory. Also f...
Workflow became a mainstream formalism for representing complex scientific problems and is applied to different domains. In this paper we propose and analyze the interactive workflow model as the basis for urgent computing (UC) infrastructures. The majority of research works in the area of urgent computing is focused on the deadline-driven s...