Dmitry Yurievich Mishin
  • PhD
  • Applications Programmer at University of California, San Diego

About

44 Publications
4,812 Reads
177 Citations
Introduction
Dmitry Yurievich Mishin currently works at the San Diego Supercomputer Center (SDSC), University of California, San Diego. His research is in distributed and parallel computing. His current projects are 'Comet Virtual Clusters' and the Pacific Research Platform's hyperconverged cluster, Nautilus.
Current institution
University of California, San Diego
Current position
  • Applications Programmer
Additional affiliations
June 2014 - present
San Diego Supercomputer Center
Position
  • Applications Programmer
September 2010 - May 2014
Johns Hopkins University
Position
  • Research Assistant
December 1999 - August 2010
Russian Academy of Sciences
Position
  • Researcher
Education
April 2005 - April 2009
Russian Academy of Sciences, Earth Physics Institute
Field of study
  • Geophysics

Publications (44)
Preprint
Full-text available
Influenced by advances in data and computing, scientific practice increasingly involves machine-learning- and artificial-intelligence-driven methods, which require specialized capabilities at the system, science, and service levels in addition to conventional large-capacity supercomputing approaches. The latest distributed architectures...
Conference Paper
Full-text available
Influenced by advances in data and computing, scientific practice increasingly involves machine-learning- and artificial-intelligence-driven methods, which require specialized capabilities at the system, science, and service levels in addition to conventional large-capacity supercomputing approaches. The latest distributed architectures...
Preprint
Full-text available
Unique scientific instruments designed and operated by large global collaborations are expected to produce Exabyte-scale data volumes per year by 2030. These collaborations depend on globally distributed storage and compute to turn raw data into science. While all of these infrastructures have batch scheduling capabilities to share compute, Researc...
Preprint
Full-text available
NVIDIA has been making steady progress in increasing the compute performance of its GPUs, resulting in order-of-magnitude compute throughput improvements over the years. With several GPU models coexisting in many deployments, the traditional accounting method of treating all GPUs as equal no longer reflects compute output. Moreover...
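The weighting idea lends itself to a small illustration. Below is a minimal sketch, not the paper's implementation, of how GPU wall-hours could be converted into performance-weighted service units; the model names and performance ratios are hypothetical placeholders, not measured values.

    # Illustrative sketch of performance-weighted GPU accounting (Python).
    # The performance factors below are hypothetical placeholders.
    PERF_FACTOR = {
        "V100": 1.00,  # assumed baseline model
        "A100": 2.00,  # placeholder ratio relative to the baseline
        "K80":  0.25,  # placeholder ratio relative to the baseline
    }

    def service_units(gpu_model: str, wall_hours: float) -> float:
        """Convert raw GPU-hours into performance-weighted service units."""
        return wall_hours * PERF_FACTOR[gpu_model]

    jobs = [("A100", 10.0), ("K80", 40.0), ("V100", 25.0)]
    print(sum(service_units(m, h) for m, h in jobs))  # 20.0 + 10.0 + 25.0 = 55.0

Under such a scheme, an hour on a faster GPU model simply counts for proportionally more than an hour on an older one.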
Preprint
Full-text available
This paper describes a vision and work in progress to elevate network resources and data transfer management to the same level as compute and storage in the context of service access, scheduling, life-cycle management, and orchestration. While domain science workflows often include active compute resource allocation and management, the data transf...
Conference Paper
Full-text available
The Comet petascale system is an XSEDE resource with the goal of serving a large user community. The Comet project has served a large number of users through traditional supercomputing as well as science gateways. In addition to these offerings, Comet also includes a non-traditional virtual machine framework that allows users to access entire V...
Chapter
This work analyzes the I/O traffic of users’ jobs on an HPC machine over a period of time. Monitoring tools collect the data on a continuous basis on the HPC system. We looked at aggregate I/O usage patterns of users’ jobs on the system, both on the parallel shared Lustre file system and on the node-local SSDs. Data mining tools are the...
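As a concrete illustration of the kind of data-mining step such an analysis might use (not the chapter's actual pipeline), the sketch below clusters jobs by aggregate I/O features with k-means; the features and values are invented for the example.

    # Illustrative only: cluster jobs by aggregate I/O features with k-means.
    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical per-job features: [GiB read, GiB written, fraction on node-local SSD]
    jobs = np.array([
        [500.0,  20.0, 0.1],  # read-heavy jobs hitting the shared file system
        [480.0,  25.0, 0.0],
        [ 10.0, 300.0, 0.9],  # write-heavy jobs using node-local SSDs
        [  5.0, 350.0, 0.8],
    ])

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(jobs)
    print(labels)  # jobs with similar I/O usage patterns land in the same cluster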
Preprint
Full-text available
Advances in data, computing, and networking over the last two decades have led to a shift in many application domains that includes machine learning on big data as part of the scientific process, requiring new capabilities for integrated and distributed hardware and software infrastructure. This paper contributes a workflow-driven approach for dyna...
Conference Paper
Full-text available
While NSF's recent Campus Cyberinfrastructure investments have catalyzed an enormous leap in campus networking capabilities, it remains necessary to integrate these capabilities into the routine workflows of researchers transferring data among their remote collaborators, data repositories, and visualization facilities. The Pacific Research Platform...
Conference Paper
Full-text available
The Comet petascale supercomputer was put into production as an XSEDE resource in early 2015 with the goal of serving a much larger user community than HPC systems of similar size. The Comet project set an audacious goal of reaching over 10,000 users in its four years of planned operation. That goal was achieved in less than two years, due in large...
Conference Paper
Hardware virtualization has been gaining a significant share of computing time in recent years. Using virtual machines (VMs) for parallel computing is an attractive option for many users. A VM gives users the freedom to choose an operating system, software stack, and security policies, leaving the physical hardware, OS management, and billing to p...
Conference Paper
Full-text available
Rapid secure data sharing and private online discussion are requirements for coordinating today's distributed science teams using High Performance Computing (HPC), visualization, and complex workflows. Modern HPC infrastructures enable fast computation, but the data produced remains within a site's storage and network environment tuned for performa...
Presentation
Full-text available
Part of good customer service in HPC is gracefully migrating file systems to different servers with minimal disruption to users. At SDSC, we define this to mean keeping the paths intact (i.e., the same mount point), limiting downtime, and not requiring user action. Over the last year we’ve been put to the test by the need to migrate petaby...
Data
Full-text available
We present the Environmental Scenario Search Engine (ESSE), a set of algorithms and software tools for distributed querying and mining large environmental data archives. The principal requirement of the ESSE system is to allow the user to query the data in meaningful “human linguistic” terms. The mapping between human language and computer systems...
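To make the "human linguistic" querying concrete, here is a minimal sketch, not the ESSE code itself, in which a trapezoidal fuzzy membership function maps a term like "hot" onto a 0-to-1 score; the temperature thresholds are hypothetical.

    # Illustrative only: fuzzy scoring of a linguistic term against numeric data.
    def trapezoid(x, a, b, c, d):
        """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear ramps between."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    def hot(temp_c):
        # Hypothetical mapping of "hot" to surface temperature in degrees Celsius.
        return trapezoid(temp_c, 25.0, 32.0, 60.0, 70.0)

    for t in (18.0, 27.0, 35.0, 41.0):
        print(t, round(hot(t), 2))  # higher scores better match the "hot" scenario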
Article
Full-text available
Despite recent progress in scientific data storage, the problem of a public data storage and sharing system for relatively small scientific datasets remains. These collections form the “long tail” of the power-law distribution of dataset sizes. The aggregated size of the long-tail data is comparable to the size of all data collections from l...
Article
Full-text available
The astronomical research community is now accustomed to accessing data through the web. In particular, we have ready access to large surveys as well as to observations from the major observatories. The latter data are typically available in raw form and often also as "level 1" products that have undergone basic, standard processing. There exists, ho...
Conference Paper
Full-text available
Stream processing methods and online algorithms are increasingly appealing in the scientific and large-scale data management communities due to increasing ingestion rates of scientific instruments, the ability to produce and inspect results interactively, and the simplicity and efficiency of sequential storage access over enormous datasets. This ar...
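A classic example of the online-algorithm style the abstract refers to (chosen here for illustration, not taken from the paper) is Welford's single-pass method for running mean and variance, which inspects each streamed value exactly once and stores nothing but summary state.

    # Illustrative only: Welford's online mean/variance over a data stream.
    class RunningStats:
        def __init__(self):
            self.n = 0
            self.mean = 0.0
            self.m2 = 0.0  # sum of squared deviations from the running mean

        def update(self, x: float) -> None:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        def variance(self) -> float:
            return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    stats = RunningStats()
    for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        stats.update(value)
    print(stats.mean, stats.variance())  # mean ≈ 5.0, sample variance ≈ 4.57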
Chapter
Full-text available
Data mining and visualization in very large spatiotemporal databases require three kinds of computing parallelism: file system, data processing, and visualization or rendering farm. The transparent data cube combines, on the same hardware, a database cluster for active storage of spatiotemporal data with an MPI compute cluster for data processing and ren...
Article
Cloud computing offers a scalable, on-demand resource allocation model for evolving needs in data-intensive geophysical applications, where computational needs in CPU and storage can vary over time depending on the modeling or field campaign. Separate, sometimes incompatible cloud platforms and services are already available from major computing vendors...
Article
ViRBO (Virtual Radiation Belt Observatory) is one of the domain-specific virtual observatories funded under the NASA Heliophysics Data Environment (HPDE) program that began development in 2006. In this work, we report on the search, display, and data access functionality of ViRBO along with plans for interaction with upcoming missions, including Ra...
Article
Full-text available
The recent Heliophysics Virtual Observatory (VxO) effort involves the development of separate observatories with a low overlap in physical domain or area of scientific specialization and a high degree of overlap in metadata management needs. VxOware is a content and metadata management system. While it is intended for use by a VxO specifically, it...
Article
Cloud computing is a new economic model for using large cluster computing resources that were previously managed by GRID. Reusing existing GRID infrastructure provides an opportunity to combine Cloud and GRID technologies on the same hardware and to give GRID users the ability to run high-performance computing tasks inside virtual mach...
Article
Full-text available
Seismic anisotropy presents a unique possibility to study tectonic processes at depths inaccessible to direct observation. In our previous study, to determine the mantle anisotropy parameters we performed a joint inversion of SKS and receiver-function waveforms, based on approximate methods because of the time-consuming synthetic seismogram calcula...
Chapter
The increasing data volumes from today's collection systems and the need of the scientific community to include an integrated and authoritative representation of the natural environment in their analysis require a new approach to data mining, management, and access. The natural environment includes elements from multiple domains such as space, terre...
Article
In a previous study, waveform inversion was based on approximate methods because synthetic seismogram calculation is time-consuming. Using parallel calculation and GRID technology allows us to obtain an exact solution of the problem: we can directly calculate the cost function on a uniform grid within the model parameter space. Calculations were perfo...
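The brute-force strategy described here is easy to sketch. The following toy example (the two parameters and the misfit function are hypothetical stand-ins, not the authors' model) evaluates a cost function on a uniform grid and keeps the best-fitting point; since every grid point is independent, the search parallelizes trivially across GRID resources.

    # Illustrative only: exhaustive cost evaluation on a uniform parameter grid.
    import itertools
    import numpy as np

    def cost(fast_axis_deg: float, delay_s: float) -> float:
        """Hypothetical misfit between observed and synthetic waveforms."""
        return (fast_axis_deg - 40.0) ** 2 / 900.0 + (delay_s - 1.2) ** 2

    axes = np.linspace(0.0, 180.0, 37)   # anisotropy fast-axis angle, degrees
    delays = np.linspace(0.0, 2.0, 21)   # split delay time, seconds

    # Each grid point is independent, so this loop distributes trivially.
    best = min(itertools.product(axes, delays), key=lambda p: cost(*p))
    print("best-fitting model:", best)   # ≈ (40.0, 1.2) for this toy misfit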
Article
The first major release of the Virtual Radiation Belt Observatory (ViRBO) is presented. ViRBO is a virtual observatory which allows access to and use of data and tools for radiation belt scientists. Data sets include data from the SAMPEX, GOES, POES, LANL GEO, Polar, and GPS satellites. A number of new data sets, not previously available, are avail...
Article
Full-text available
SPIDR (Space Physics Interactive Data Resource) is a standard data source for solar-terrestrial physics, functioning within the framework of the ICSU World Data Centers. It is a distributed database and application server network, built to select, visualize and model historical space weather data distributed across the Internet. SPIDR can work as a...
Conference Paper
Full-text available
We have developed the Environmental Scenario Search Engine (ESSE) for parallel data mining of a set of conditions inside distributed, very large databases from multiple environmental domains. The prime goal of the ESSE design is to allow a user to query environmental data archives in human linguistic terms. The mapping between the human language and t...
Article
The solar-terrestrial physics distributed database for the ICSU World Data Centers, and the NCEP/NCAR climate re-analysis data have been integrated into standard Grid environments using the OGSA-DAI framework. A set of algorithms and software tools for distributed querying and mining environmental archives using the UNIDATA Common Data Model concep...
Article
Full-text available
We present the Environmental Scenario Search Engine (ESSE), a set of algorithms and software tools for distributed querying and mining large environmental data archives. The principal requirement of the ESSE system is to allow the user to query the data in meaningful "human linguistic" terms. The mapping between human language and computer systems...
