
Dmitry Yurievich Mishin
- PhD
- Applications Programmer at University of California, San Diego
About
44 Publications
4,812 Reads
177 Citations
Introduction
Dmitry Yurievich Mishin currently works at the San Diego Supercomputer Center (SDSC), University of California, San Diego. He does research in Distributed Computing and Parallel Computing. His current projects are 'Comet Virtual Clusters' and the Pacific Research Platform's hyperconverged cluster, Nautilus.
Current institution
- San Diego Supercomputer Center (SDSC), University of California, San Diego
Additional affiliations
- June 2014 - present
- September 2010 - May 2014
- December 1999 - August 2010
Education
April 2005 - April 2009
Russian Academy of Sciences, Earth Physics Institute
Field of study
- Geophysics
Publications (44)
Influenced by the advances in data and computing, scientific practice increasingly involves machine learning and artificial intelligence driven methods, which require specialized capabilities at the system-, science-, and service-level in addition to the conventional large-capacity supercomputing approaches. The latest distributed architectures...
Unique scientific instruments designed and operated by large global collaborations are expected to produce Exabyte-scale data volumes per year by 2030. These collaborations depend on globally distributed storage and compute to turn raw data into science. While all of these infrastructures have batch scheduling capabilities to share compute, Researc...
NVIDIA has been making steady progress in increasing the compute performance of its GPUs, resulting in order-of-magnitude compute throughput improvements over the years. With several models of GPUs coexisting in many deployments, the traditional accounting method of treating all GPUs as equal no longer reflects compute output. Moreover...
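As a hedged illustration of the per-model weighting idea this abstract points at, here is a minimal sketch; the GPU model names and relative performance weights are assumptions for the example, not figures from the paper.

```python
# Hypothetical sketch: charge GPU-hours scaled by each model's relative
# throughput instead of treating all GPUs as equal.

# Assumed relative performance weights, normalized to a baseline model.
GPU_WEIGHTS = {
    "V100": 1.0,   # baseline
    "A100": 2.5,   # assumed ~2.5x the baseline throughput
    "H100": 5.0,   # assumed ~5x the baseline throughput
}

def weighted_gpu_hours(jobs):
    """Sum GPU-hours across jobs, scaling each by its GPU model's weight.

    `jobs` is an iterable of (gpu_model, wall_hours, num_gpus) tuples.
    """
    total = 0.0
    for model, hours, count in jobs:
        total += GPU_WEIGHTS[model] * hours * count
    return total

# 10 wall-hours on 4 V100s plus 10 wall-hours on 4 A100s:
print(weighted_gpu_hours([("V100", 10, 4), ("A100", 10, 4)]))  # 140.0
```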
This paper describes a vision and work in progress to elevate network resources and data transfer management to the same level as compute and storage in the context of services access, scheduling, life cycle management, and orchestration. While domain science workflows often include active compute resource allocation and management, the data transf...
The Comet petascale system is an XSEDE resource with the goal of serving a large user community. The Comet project has served a large number of users through traditional supercomputing as well as science gateways. In addition to these offerings, Comet also includes a non-traditional virtual machine framework that allows users to access entire V...
This work analyzes the I/O traffic of users' jobs on an HPC machine over a period of time. Monitoring tools collect the data on a continuous basis on the HPC system. We looked at aggregate I/O data usage patterns of users' jobs, both on the parallel shared Lustre file system and on the node-local SSDs. Data mining tools are the...
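A minimal sketch of the kind of aggregation such an analysis starts from; the column names and values below are an assumed schema for illustration, not the paper's actual monitoring format.

```python
# Hypothetical sketch: aggregate per-job I/O counters collected by monitoring
# tools to compare usage of a shared Lustre file system vs. node-local SSDs.
import pandas as pd

# Assumed schema; real monitoring output would differ.
io_log = pd.DataFrame({
    "user":          ["u1", "u1", "u2", "u2"],
    "filesystem":    ["lustre", "ssd", "lustre", "ssd"],
    "bytes_read":    [4e9, 1e9, 9e9, 2e9],
    "bytes_written": [2e9, 5e8, 3e9, 1e9],
})

# Total traffic per user per file system, as a starting point for
# mining usage patterns across the two storage tiers.
usage = io_log.groupby(["user", "filesystem"])[["bytes_read", "bytes_written"]].sum()
print(usage)
```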
The advances in data, computing and networking over the last two decades led to a shift in many application domains that includes machine learning on big data as a part of the scientific process, requiring new capabilities for integrated and distributed hardware and software infrastructure. This paper contributes a workflow-driven approach for dyna...
While NSF's recent Campus Cyberinfrastructure investments have catalyzed an enormous leap in campus networking capabilities, it remains necessary to integrate these capabilities into the routine workflows of researchers transferring data among their remote collaborators, data repositories, and visualization facilities. The Pacific Research Platform...
The Comet petascale supercomputer was put into production as an XSEDE resource in early 2015 with the goal of serving a much larger user community than HPC systems of similar size. The Comet project set an audacious goal of reaching over 10,000 users in its four years of planned operation. That goal was achieved in less than two years, due in large...
Hardware virtualization has been gaining a significant share of computing time in recent years. Using virtual machines (VMs) for parallel computing is an attractive option for many users. A VM gives users the freedom to choose an operating system, software stack, and security policies, leaving the physical hardware, OS management, and billing to p...
Rapid secure data sharing and private online discussion are requirements for coordinating today's distributed science teams using High Performance Computing (HPC), visualization, and complex workflows. Modern HPC infrastructures enable fast computation, but the data produced remains within a site's storage and network environment tuned for performa...
Part of good customer service in HPC means gracefully migrating file systems to different servers with a minimal disruption to the users. At SDSC, we define this to mean keeping the paths intact (i.e., the same mount point), limiting downtime, and not requiring user action. Over the last year we’ve been put to the test with a need to migrate petaby...
We present the Environmental Scenario Search Engine (ESSE), a set of algorithms and software tools for distributed querying and mining large environmental data archives. The principal requirement of the ESSE system is to allow the user to query the data in meaningful "human linguistic" terms. The mapping between human language and computer systems...
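One common way to realize such a linguistic-to-numeric mapping is a fuzzy membership function; the sketch below is a generic illustration of that idea, with made-up variable names and thresholds, not ESSE's actual implementation.

```python
# Hypothetical sketch: map a linguistic term like "strong wind" onto a fuzzy
# membership score in [0, 1] over numeric data, then rank samples by score.

def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 below a, ramps to 1 on [b, c], 0 above d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Assumed mapping of the term "strong" for wind speed in m/s.
def strong_wind(speed):
    return trapezoid(speed, 10.0, 17.0, 60.0, 80.0)

wind_series = [3.2, 12.5, 18.0, 25.4, 8.9]
scores = [strong_wind(v) for v in wind_series]
# The highest-scoring samples best match the linguistic query "strong wind".
print(sorted(zip(scores, wind_series), reverse=True))
```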
Despite the last years' progress in scientific data storage, the problem of a public data storage and sharing system for relatively small scientific datasets remains. These are the collections forming the "long tail" of the power-law distribution of dataset sizes. The aggregated size of the long-tail data is comparable to the size of all data collections from l...
The astronomical research community is now used to accessing data through the web. In particular, we have ready access to large surveys as well as to observations from the major observatories. The latter data are typically available in raw form and often also as "level 1" products that have undergone basic, standard processing. There exists, ho...
Stream processing methods and online algorithms are increasingly appealing in the scientific and large-scale data management communities due to increasing ingestion rates of scientific instruments, the ability to produce and inspect results interactively, and the simplicity and efficiency of sequential storage access over enormous datasets. This ar...
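As an illustration of the online-algorithm style this abstract refers to, here is a standard single-pass running mean/variance (Welford's algorithm), computable over a stream without storing it; this is a generic sketch, not code from the article.

```python
# Generic sketch of an online algorithm: Welford's method computes mean and
# variance in a single sequential pass, without keeping the data in memory,
# matching the sequential-storage-access style described in the abstract.

class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for sample in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.update(sample)
print(stats.mean, stats.variance)  # 5.0 and sample variance ~4.571
```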
Data mining and visualization in very large spatiotemporal databases requires three kinds of computing parallelism: file system, data processor, and visualization or rendering farm. Transparent data cube combines on the same hardware a database cluster for active storage of spatiotemporal data with an MPI compute cluster for data processing and ren...
Cloud computing offers a scalable on-demand resource allocation model to evolving needs in data intensive geophysical applications, where computational needs in CPU and storage can vary over time depending on modeling or field campaign. Separate, sometimes incompatible cloud platforms and services are already available from major computing vendors...
ViRBO (Virtual Radiation Belt Observatory) is one of the domain-specific virtual observatories funded under the NASA Heliophysics Data Environment (HPDE) program that began development in 2006. In this work, we report on the search, display, and data access functionality of ViRBO along with plans for interaction with upcoming missions, including Ra...
The recent Heliophysics Virtual Observatory (VxO) effort involves the development of separate observatories with a low overlap in physical domain or area of scientific specialization and a high degree of overlap in metadata management needs. VxOware is a content and metadata management system. While it is intended for use by a VxO specifically, it...
Cloud computing is a new economic model for using large cluster computing resources that were earlier managed by GRID. Reusing existing GRID infrastructure gives an opportunity to combine the Cloud and GRID technologies on the same hardware and to provide GRID users with the functionality to run high-performance computing tasks inside virtual mach...
Seismic anisotropy presents a unique possibility to study tectonic processes at depths inaccessible to direct observation. In our previous study, to determine the mantle anisotropic parameters, we performed a joint inversion of SKS and receiver-function waveforms, based on approximate methods because of the time-consuming synthetic seismogram calcula...
The increasing data volumes from today's collection systems and the need of the scientific community to include an integrated and authoritative representation of the natural environment in their analysis require a new approach to data mining, management, and access. The natural environment includes elements from multiple domains such as space, terre...
In a previous study, waveform inversion was based on approximate methods because of the time-consuming calculation of synthetic seismograms. Using parallel calculation and GRID technology allows us to obtain an exact solution to the problem: we can perform direct calculation of the cost function on a uniform grid within the model parameter space. Calculations were perfo...
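A minimal sketch of the direct grid-evaluation idea follows; the cost function and parameter ranges are illustrative placeholders, not the study's actual waveform misfit or anisotropy parameterization.

```python
# Hypothetical sketch: evaluate a misfit/cost function directly on a uniform
# grid of model parameters, in parallel, instead of using approximate inversion.
from itertools import product
from multiprocessing import Pool

def cost(params):
    """Placeholder misfit; a real one would compare synthetic and observed waveforms."""
    fast_axis_deg, layer_thickness_km = params
    return (fast_axis_deg - 42.0) ** 2 + (layer_thickness_km - 35.0) ** 2

if __name__ == "__main__":
    # Uniform grid over two assumed anisotropy parameters.
    axes = range(0, 180, 5)          # fast-axis azimuth, degrees
    thicknesses = range(10, 100, 5)  # layer thickness, km
    grid = list(product(axes, thicknesses))

    with Pool() as pool:
        costs = pool.map(cost, grid)  # embarrassingly parallel evaluation

    best_cost, best_params = min(zip(costs, grid))
    print("best-fit parameters:", best_params, "cost:", best_cost)
```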
The first major release of the Virtual Radiation Belt Observatory (ViRBO) is presented. ViRBO is a virtual observatory which allows access to and use of data and tools for radiation belt scientists. Data sets include data from the SAMPEX, GOES, POES, LANL GEO, Polar, and GPS satellites. A number of new data sets, not previously available, are avail...
SPIDR (Space Physics Interactive Data Resource) is a standard data source for solar-terrestrial physics, functioning within
the framework of the ICSU World Data Centers. It is a distributed database and application server network, built to select,
visualize and model historical space weather data distributed across the Internet. SPIDR can work as a...
We have developed Environmental Scenario Search Engine (ESSE) for parallel data mining of a set of conditions inside distributed, very large databases from multiple environmental domains. The prime goal for ESSE design is to allow a user to query the environmental data archives in human linguistic terms. The mapping between the human language and t...
The solar-terrestrial physics distributed database for the ICSU World Data Centers, and the NCEP/NCAR climate re-analysis data have been integrated into standard Grid environments using the OGSA-DAI framework. A set of algorithms and software tools for distributed querying and mining environmental archives using the UNIDATA Common Data Model concep...
We present the Environmental Scenario Search Engine (ESSE), a set of algorithms and software tools for distributed querying and mining large environmental data archives. The principal requirement of the ESSE system is to allow the user to query the data in meaningful "human linguistic" terms. The mapping between human language and computer systems...