
Mikhail Zymbler
South Ural State University | SUSU · Data Mining and Virtualization
PhD
About
46 Publications
20,205 Reads
966 Citations
Additional affiliations
January 2011 - February 2017
September 2004 - November 2014
August 1993 - August 2004
Publications (46)
Summarization of a long time series often occurs in analytical applications related to decision-making, modeling, planning, and so on. Informally, summarization aims at discovering a small-sized set of typical patterns (subsequences) to briefly represent the long time series. Apparent approaches to summarization like motifs, shapelets, cluster cent...
Currently, big sensor data arise in a wide spectrum of Industry 4.0, Internet of Things, and Smart City applications. In such subject domains, sensors tend to have a high frequency and produce massive time series in a relatively short time interval. The data collected from the sensors are subject to mining in order to make strategic decisions. In t...
A discord is a refinement of the concept of an anomalous subsequence of a time series. Being one of the topical issues of time series mining, discords discovery is applied in a wide range of real-world areas (medicine, astronomy, economics, climate modeling, predictive maintenance, energy consumption, etc.). In this article, we propose a novel para...
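To make the discord notion concrete, here is a minimal brute-force sketch. It assumes the commonly used definition (the subsequence whose nearest non-overlapping neighbor is farthest away) and plain Euclidean distance without normalization. The function names are illustrative, and this is not the parallel algorithm proposed in the article.

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance between two equal-length subsequences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_discord(series, m):
    """Return (start index, nearest-neighbor distance) of the top-1 discord.

    Assumption: a discord is the length-m subsequence whose nearest
    non-overlapping neighbor is the farthest away.
    """
    n = len(series) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) >= m:  # skip overlapping (trivial) matches
                d = euclidean(series[i:i + m], series[j:j + m])
                nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist

# Usage: a periodic series with a single spike at position 9.
series = [0, 1] * 4 + [0, 9] + [0, 1] * 4
idx, dist = top_discord(series, 2)
```

The double loop costs on the order of n² distance computations, which is why pruning and parallelization matter for long series.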
Internet of Things (IoT) is a new paradigm that has changed the traditional way of living into a high-tech lifestyle. Smart cities, smart homes, pollution control, energy saving, smart transportation, and smart industries are such transformations due to IoT. A lot of crucial research studies and investigations have been done in order to enhance...
Breakout is the most expensive and dangerous issue in continuous casting, causing loss of production time and significant yield penalties. The common cause of breakout is a sticker, that is, a part of the strand shell that adheres to the mold surface. Stickers can be detected by a temperature pattern in a mold heat-map. SMS group GmbH (Germany) devel...
Currently, discovering subsequence anomalies in time series remains one of the most topical research problems. A subsequence anomaly refers to successive points in time that are collectively abnormal, although each point is not necessarily an outlier. Among numerous approaches to discovering subsequence anomalies, the discord concept is considered...
This paper proposed a short-term two-stage hybrid algorithmic framework for trade and trend analysis of the Forex market by augmenting the currency pair datasets with transformed attributes using a few technical indicators and statistical measures. In the first phase, an optimized deep predictive coding network (DPCN) based on a meta-heuristic rept...
Botanical plants suffer from several types of diseases that must be identified early to improve the production of fruits and vegetables. Mango fruit is one of the most popular and desirable fruits worldwide due to its taste and richness in vitamins. However, plant diseases also affect these plants’ production and quality. This study proposes a conv...
The crude oil market has become one of the emerging financial markets, and its volatility is considered an issue of utmost importance. This study examines the dynamics of this volatile market by employing a hybrid approach based on an extreme learning machine (ELM) as a regressor and t...
Brain tumors are most common in children and the elderly. A brain tumor is a serious form of cancer caused by uncontrollable brain cell growth inside the skull. Tumor cells are notoriously difficult to classify due to their heterogeneity. Convolutional neural networks (CNNs) are the most widely used machine learning algorithms for visual learning and brain tum...
Computer-aided diagnosis permits biopsy specimen analysis by creating quantitative images of brain diseases which enable the pathologists to examine the data properly. It has been observed from other image classification algorithms that the Extreme Learning Machine (ELM) demonstrates superior performance in terms of computational efforts. In this s...
This book constitutes refereed proceedings of the 15th International Conference on Parallel Computational Technologies, PCT 2021, held in March-April 2021. Due to the COVID-19 pandemic the conference was held online.
The 22 revised full papers presented were carefully reviewed and selected from 89 submissions. The papers are organized in topical se...
Currently, despite the widespread use of numerous NoSQL systems, relational DBMSs remain the basic tool for data processing in various subject domains. Integration of data mining methods with relational DBMS is a topical issue since such an approach avoids export-import bottleneck and provides the end-user with all the built-in DBMS services. Propr...
Glioblastoma (GBM) is a stage 4 malignant tumor in which a large portion of tumor cells are reproducing and dividing at any moment. These tumors are life threatening and may result in partial or complete mental and physical disability. In this study, we have proposed a classification model using hybrid deep belief networks (DBN) to classify magneti...
In this study, deep neural networks are developed and their performance is evaluated on the wine data set from the UCI repository. The data set consists of white and red wine samples from Portugal. Previous studies claimed that the Support Vector Machine (SVM) outperformed the simple ANN and Multiple Regression (MR) on the wine data set. We trained different neural net...
This book constitutes refereed proceedings of the 14th International Conference on Parallel Computational Technologies, PCT 2020, held in May 2020. Due to the COVID-19 pandemic the conference was held online.
The 22 revised full papers and 2 short papers presented were carefully reviewed and selected from 124 submissions. The papers are organized i...
A motif is a pair of subsequences of a longer time series, which are very similar to each other. Motif discovery is applied in a wide range of subject areas involving time series: medicine, biology, entertainment, weather prediction, and others. In this paper, we propose a novel parallel algorithm for motif discovery using Intel MIC (Many Integrate...
A discord is a refinement of the concept of an anomalous subsequence of a time series. The task of discovering discords is applied in a wide range of subject areas involving time series: medicine, economics, climate modeling, and others. In this paper, we propose a novel parallel algorithm for discord discovery using Intel MIC (Many Integrated Core...
Customer experience is one of the important concerns for the airline industry. Twitter is one of the popular social media platforms where flight travelers share their feedback in the form of tweets. This study presents a machine learning approach to analyze the tweets to improve the customer experience. Features were extracted from the tweets usin...
Nowadays, subsequence similarity search under the Dynamic Time Warping (DTW) similarity measure is applied in a wide range of time series mining applications. Since the DTW measure has a quadratic computational complexity w.r.t. the length of query subsequence, a number of parallel algorithms for various many-core architectures have been developed,...
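The quadratic cost mentioned above is visible in the textbook dynamic-programming formulation of DTW. The sketch below is a plain, unconstrained version with squared point-wise cost (no warping window, no lower bounds). The function name `dtw` is illustrative, and this is not one of the parallel algorithms the abstract refers to.

```python
import math

def dtw(q, c):
    """Unconstrained DTW between sequences q and c with squared point cost.

    Fills an (n+1) x (m+1) table, so time and memory grow as n * m.
    """
    n, m = len(q), len(c)
    # table[i][j] holds the cheapest warping cost of q[:i] against c[:j].
    table = [[math.inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            table[i][j] = cost + min(
                table[i - 1][j],      # q advances, c stays
                table[i][j - 1],      # c advances, q stays
                table[i - 1][j - 1],  # both advance
            )
    return table[n][m]

# Usage: a stretched copy of the same shape warps onto the original.
d = dtw([1, 2, 3], [1, 2, 2, 3])
```

In subsequence search this computation is repeated for every candidate subsequence of the long series, so the per-candidate quadratic cost dominates the running time.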
Discord is a refinement of the concept of anomalous subsequence of a time series. The task of discords discovery is applied in a wide range of subject domains related to time series: medicine, economics, climate modeling, etc. In this paper, we propose a novel parallel algorithm for discords discovery for the Intel Xeon Phi Knights Landing (KNL) ma...
This book constitutes the refereed proceedings of the 13th International Conference on Parallel Computational Technologies, PCT 2019, held in Kaliningrad, Russia, in April 2019.
The 24 revised full papers presented were carefully reviewed and selected from 96 submissions. The papers are organized in topical sections on high performance architecture...
Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional databases. Apriori is a classical frequent itemset mining algorithm, which employs iterative passes over the database combined with generation of candidate itemsets based on the frequent itemsets found at the previous iteration, and pruning o...
Nowadays, subsequence similarity search is required in a wide range of time series mining applications: climate modeling, financial forecasts, medical research, etc. In most of these applications, the Dynamic Time Warping (DTW) similarity measure is used since DTW is empirically confirmed as one of the best similarity measures for most subject domain...
Relational DBMSs (RDBMSs) remain the most popular tool for processing structured data in data-intensive domains. However, most stand-alone data mining packages process flat files outside an RDBMS. In-database data mining avoids the export-import data/results bottleneck, as opposed to using stand-alone mining packages, and keeps all the benefits provided...
Computation of a Euclidean distance matrix (EDM) is a typical task in a wide spectrum of problems connected with data analysis. Currently, many parallel algorithms for this task have been developed for GPUs. However, these developments cannot be directly applied to the Intel Xeon Phi many-core processor. In this paper, we address the task of accele...
Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional databases. Apriori is a classical frequent itemset mining algorithm, which employs iterative passes over the database combined with generation of candidate itemsets based on the frequent itemsets found at the previous iteration, and pruning o...
This book constitutes the refereed proceedings of the 12th International Conference on Parallel Computational Technologies, PCT 2018, held in Rostov-on-Don, Russia, in April 2018. The 24 revised full papers presented were carefully reviewed and selected from 167 submissions. The papers are organized in topical sections on high performance architect...
The paper presents a parallel implementation of a Dynamic Itemset Counting (DIC) algorithm for many-core systems, where DIC is a variation of the classical Apriori algorithm. We propose a bit-based internal layout for transactions and itemsets with the assumption that such a representation of the transaction database fits in main memory. This techn...
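As a rough illustration of what a bit-based layout can look like, the sketch below encodes each transaction and itemset as an integer bitmask and counts support with a bitwise AND. The encoding (item `k` maps to bit `k`) and the function names are assumptions made for this example, not the layout or the DIC implementation described in the paper.

```python
def to_mask(items):
    """Encode a collection of small non-negative item ids as a bitmask."""
    mask = 0
    for item in items:
        mask |= 1 << item  # assumed encoding: item k sets bit k
    return mask

def support(itemset_mask, transaction_masks):
    """Count the transactions that contain every item of the itemset."""
    return sum(
        1 for t in transaction_masks
        if (t & itemset_mask) == itemset_mask  # subset test in one AND
    )

# Usage: four transactions over items 0..2, held in memory as integers.
transactions = [to_mask(t) for t in ([0, 1, 2], [0, 2], [1, 2], [0, 1])]
count = support(to_mask([0, 2]), transactions)
```

With this representation the subset test is a single AND and compare per transaction, which is the kind of operation that vectorizes well on many-core hardware.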
This book constitutes the refereed proceedings of the 11th International Conference on Parallel Computational Technologies, PCT 2017, held in Kazan, Russia, in April 2017.
The 24 revised full papers presented were carefully reviewed and selected from 167 submissions. The papers are organized in topical sections on high performance architectures, to...
The paper addresses the problem of local-best-match time series subsequence similarity search. The problem assumes that a query sequence and a longer time series are given, and the task is to find all the subsequences whose distance from the query is minimal among their neighboring subsequences and is under a specified...
This paper presents an original approach to parallel processing of very large databases by means of encapsulation of partitioned parallelism into open-source database management systems (DBMSs). The architecture and methods for implementing a parallel DBMS through encapsulation of partitioned parallelism into PostgreSQL DBMS are described. Experime...
Partitioning Around Medoids (PAM) is a variation of the well-known k-Means clustering algorithm in which the center of each cluster must be chosen from among the objects being clustered. PAM is used in a wide spectrum of applications, e.g. text analysis, bioinformatics, intelligent transportation systems, etc. There are approaches to speed up k-Means and...
A video summary is a sequence of still or moving pictures that represents the content of a video. A personalized summary provides a person with brief information reflecting the essential message of the video according to his/her interests. Existing methods of discovering a user's personal interests often demand from the user either extra efforts or extra eq...
Subsequence similarity search is one of the most important problems of time series data mining. Nowadays there is empirical evidence that Dynamic Time Warping (DTW) is the best distance metric for many applications. However, in spite of sophisticated software speedup techniques, DTW is still computationally expensive. There are studies devoted to accele...
Subsequence similarity search is one of the basic problems of time series data mining. Nowadays Dynamic Time Warping (DTW) is considered the best similarity measure. However, despite various existing software speedup techniques, DTW is still computationally expensive. There are approaches to speed up DTW computation by means of parallel hardware...
The problem of time series subsequence matching occurs in a wide spectrum of subject areas. Currently Dynamic Time Warping (DTW) is the best similarity measure, but despite various existing speedup techniques it is still computationally expensive. For this reason, the scientific community is trying to accelerate DTW calculation by means of parallel hardw...
The paper introduces an approach to partitioning of very large graphs by means of a parallel relational database management system (DBMS) named PargreSQL. A very large graph and its intermediate data that do not fit into main memory are represented as relational tables and processed by the parallel DBMS. Multilevel partitioning is used. Parallel DBMS ca...
The paper describes the design and the implementation of the PargreSQL parallel database management system (DBMS) for cluster systems. PargreSQL is based on the PostgreSQL open-source DBMS and exploits partitioned parallelism. The presented experimental results show that this scheme is worthy of further development.
The paper describes a set of computer-aided design facilities used for prototyping the parallel DBMS (Database Management System) called Omega. This system is designed for an MVS-100/1000 massively parallel computer system. These computer-aided facilities include both software tools from third-party vendors and those especially designed for this p...
The paper describes the development principles and the program structure of the Omega File Management System (OFMS) for the Omega parallel DBMS engine. The paper gives the requirements for OFMS and a description of its general structure and components. The paper presents an effective protocol for interaction with the Disk Subsystem Unit and describes arch...