The Geopoint Project: Cloud Optimized Processing for Hydrographic Data
Proof of Concept

"...a demonstration in principle, whose purpose is to verify that some concept or theory has the potential of being used."
http://www.nexusnet.com.au/2014/02/10/proof-of-concept-vs-pilot-program

Daniel Wright, Hydrographic Data Scientist, Sounding Science, and Charles E. Wright, Principal Software Engineer, National Board of Medical Examiners, Philadelphia, PA
For more information on this project and potential applications, contact Daniel Wright at dwright@soundingscience.com or visit our project page, www.geopointsolutions.us.

Background

The cloud computing environment presents an opportunity to develop hydrographic data processing algorithms optimized for a parallel processing architecture. The primary rationale for developing these APIs is to accelerate the data processing rate while provisioning only the resources that are needed. The Geopoint API is designed to operate across multiple compute nodes, which can be provisioned as required in both number and processor speed. An efficient solution of this type brings with it questions of job partitioning, storage and retrieval, reliability, and robustness. This study considers the architectural elements, the computational resources needed, and a comparison of the output to the corresponding NOAA BAG file.
Pre-Processing

For this study, we chose a test data set that represents a fairly common acquisition mode for hydrographic survey. The data were collected using Hypack Hysweep® acquisition software, and from the HSX output file a conversion methodology was determined that best accommodates the goals of parallel processing.

The first step in preparing the data for processing is uploading files to BLOB (Binary Large OBject) storage. A hierarchy of identifiers, typically Organization, Project, and Survey, is used as an internal reference for the API, and files are located according to their intended application: for example, vessel files are placed at the organization level, sound velocity can sit at the project level, and Hypack HSX files at the individual survey level.
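As an illustration of how this hierarchy might map onto flat blob names, the sketch below builds storage paths from Organization, Project, and Survey identifiers; the file-type rules and placement levels are assumptions for the example, not the Geopoint API's actual conventions.

```python
# Illustrative sketch only: the identifier levels come from the text, but the
# key layout and file-type rules below are assumptions, not the Geopoint API's.
from pathlib import PurePosixPath

# Which hierarchy level each kind of input file is stored at (assumed mapping).
PLACEMENT = {
    ".vessel": "organization",   # vessel/offset configuration shared by all projects
    ".svp": "project",           # sound velocity profiles shared within a project
    ".hsx": "survey",            # raw Hypack HSX lines belong to one survey
}

def blob_key(organization: str, project: str, survey: str, filename: str) -> str:
    """Return the blob name for a file, placed at the appropriate level."""
    level = PLACEMENT.get(PurePosixPath(filename).suffix.lower(), "survey")
    if level == "organization":
        return f"{organization}/{filename}"
    if level == "project":
        return f"{organization}/{project}/{filename}"
    return f"{organization}/{project}/{survey}/{filename}"

if __name__ == "__main__":
    print(blob_key("noaa", "h12007", "launch3101_d154", "line_0042.hsx"))
    # -> noaa/h12007/launch3101_d154/line_0042.hsx
```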
The pre-processing APIs create temporary table files populated with the relevant spatial data streams contained in the HSX file, parsed and geo-referenced sound velocity values, a group identification table, and a static multibeam parameters table. Time-segmented position, attitude, heading, speed, and tide correctors are combined into a single table and interpolated between time records. Multibeam data populate a separate table and are given a group identifier based on user-selected time segments. Multibeam records receive an additional identifier based on the method of acquisition, i.e. equidistant or equal-area beam spacing, and the corresponding beam angle information.
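The interpolation between time records can be illustrated with a few lines of numpy; the sketch below is a minimal stand-in, with invented time stamps and attitude values rather than the actual Geopoint table schema.

```python
# Minimal sketch: linear interpolation of time-segmented correctors onto ping
# times. Field names and values are illustrative, not the Geopoint table schema.
import numpy as np

# Corrector table: irregular time stamps (seconds of day) with attitude/heading.
corr_t  = np.array([100.0, 101.0, 102.0, 103.0])
roll    = np.array([ 1.20,  0.80,  0.40,  0.90])   # degrees
pitch   = np.array([-0.50, -0.30,  0.10, -0.20])   # degrees
heading = np.array([359.0, 359.5,   0.5,   1.0])   # degrees, wraps at 360

ping_t = np.array([100.25, 101.75, 102.50])         # multibeam ping times

roll_i  = np.interp(ping_t, corr_t, roll)
pitch_i = np.interp(ping_t, corr_t, pitch)

# Heading must be unwrapped before interpolation so the 360 -> 0 crossing does
# not produce a bogus ~180 degree value, then wrapped back into [0, 360).
heading_i = np.interp(ping_t, corr_t, np.degrees(np.unwrap(np.radians(heading)))) % 360.0

print(np.column_stack([ping_t, roll_i, pitch_i, heading_i]))
```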
[Figure: Pre-processing and comparison workflow: the HSX file, ShipConfig, SVP, and tide inputs are converted into spatial orientation tables and grouped multibeam records with vessel and sonar specifics; the processed data are gridded into a Geopoint CUBE surface and differenced against the reference surface. Qimera Clean was used for data cleaning and CUBE surface comparison.]
Processing API

The Processing API uses the Python programming language and object-oriented design to allow easy interpretation and modification of functions. Individual modules perform each of the required steps to determine the xyz and uncertainty solution. The computational models are based on peer-reviewed academic publications and are in common use in commercial and academic processing software. Each module is documented and includes appropriate test cases.

Processing follows the methods described by Beaudoin et al. (2004) [1]. It begins by constructing the sounding geometry at transmit and receive to determine each beam's geographic launch vector. A raytrace through the sound velocity profile then provides depth and horizontal range, and the beam azimuth is used to resolve the horizontal measurement into across-track and along-track components. To reduce the solution to the vessel reference point, the lever arms between the reference point and the transducer are rotated using the transmit orientation and added to the depth, across-track, and along-track offsets, yielding the sounding solution with respect to the reference point.
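The reduction step can be sketched in simplified form. The example below rotates an assumed lever arm by roll and pitch only and adds it to an illustrative raytraced beam solution; it is not the full Beaudoin et al. (2004) formulation, and the sign conventions and numeric values are assumptions.

```python
# Simplified illustration of the lever-arm reduction step. This is NOT the full
# Beaudoin et al. (2004) formulation: it works in a locally level, heading-
# relative frame, applies only roll and pitch, and all values are invented.
import numpy as np

def roll_pitch_matrix(roll_deg: float, pitch_deg: float) -> np.ndarray:
    """Body-to-locally-level rotation built as Ry(pitch) @ Rx(roll)."""
    r, p = np.radians([roll_deg, pitch_deg])
    rx = np.array([[1, 0, 0],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])
    ry = np.array([[ np.cos(p), 0, np.sin(p)],
                   [ 0,         1, 0        ],
                   [-np.sin(p), 0, np.cos(p)]])
    return ry @ rx

# Raytraced beam solution relative to the transducer:
# (along-track, across-track, depth) in meters (illustrative values).
beam = np.array([0.0, 35.2, 18.4])

# Lever arm from the vessel reference point to the transducer, vessel frame,
# (forward, starboard, down) in meters (illustrative values).
lever_arm = np.array([2.10, 0.45, 1.80])

# Rotate the lever arm by the transmit orientation, then add it to the beam
# solution so the sounding is expressed relative to the reference point.
sounding = beam + roll_pitch_matrix(roll_deg=1.2, pitch_deg=-0.4) @ lever_arm
print(sounding)   # along-track, across-track, depth w.r.t. the reference point
```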
Tide data are queried directly from reference gauges via the NOAA CO-OPS web portal and applied using the zone definition file included in the project data. The horizontal and vertical uncertainty assigned to each sounding is calculated using the methods described by R. Hare (2001) [2] and Hare, Godin, and Mayer (1995) [3].
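As one way of pulling verified water levels for a reference gauge, the sketch below queries the public NOAA CO-OPS data API with the requests library; the station ID and dates are placeholders and may not match the gauges used for this survey.

```python
# Sketch of querying verified water levels from a NOAA CO-OPS gauge. The station
# and dates are placeholders; this is not necessarily the query Geopoint issues.
import requests

COOPS_URL = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"

def fetch_water_levels(station: str, begin_date: str, end_date: str) -> list[tuple[str, float]]:
    """Return (timestamp, water level in meters above MLLW) pairs for a gauge."""
    params = {
        "product": "water_level",      # observed water levels
        "station": station,
        "begin_date": begin_date,      # yyyymmdd
        "end_date": end_date,          # yyyymmdd
        "datum": "MLLW",
        "units": "metric",
        "time_zone": "gmt",
        "format": "json",
        "application": "geopoint_demo",
    }
    resp = requests.get(COOPS_URL, params=params, timeout=30)
    resp.raise_for_status()
    return [(rec["t"], float(rec["v"])) for rec in resp.json()["data"]]

if __name__ == "__main__":
    # 8510560 is the Montauk, NY gauge; the station choice here is illustrative only.
    for t, wl in fetch_water_levels("8510560", "20090601", "20090602")[:5]:
        print(t, wl)
```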
Results

Converting the XYZ and uncertainty .csv files into a CUBE surface requires third-party software; for this demonstration we used QPS Qimera Clean. Data cleaning was performed using the CUBE surface as a reference to locate flyers. A one-meter NOAA surface was selected in the CUBE parameters for the test surface, as this was the resolution used to generate the corresponding NOAA BAG file.

A single vessel-day from Launch 3101, which used a Reson 8125 multibeam, was selected for comparison. The area has significant bathymetric features in the form of sand waves and provides a good reference for spatial comparison. Data from the other two platforms, which used the Reson 7125 in both equidistant and equiangle modes, were processed with similar results.

The processed Geopoint data follow the terrain of the NOAA BAG surface proportionately, although a centimeter-level horizontal shift could be seen on closer examination. A change to the code was necessary to correct the problem, which was caused by sign changes in the lever-arm offsets as the ship heading passed through rotational sectors.

With the horizontal offset accounted for, the comparison gave a median difference of +0.08 m for the entire test area, and the majority of data points lie within ±15 cm of the NOAA BAG surface. Other factors may have influenced the result, such as differences in raytracing algorithms, interpolation of spatial correctors, and zoned tide application; a detailed accounting of how various processing packages arrive at the final sounding solution is beyond the scope of this study.
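The comparison statistics can be reproduced from two co-registered grids with a few lines of numpy; the arrays below are synthetic stand-ins for the Geopoint and NOAA BAG surfaces, not survey data.

```python
# Sketch of the difference-surface statistics: median offset and the fraction of
# nodes within +/-15 cm. The grids here are synthetic stand-ins, not survey data.
import numpy as np

rng = np.random.default_rng(0)
noaa_bag = 15.0 + rng.normal(0.0, 3.0, size=(500, 500))                  # reference depths (m)
geopoint = noaa_bag + 0.08 + rng.normal(0.0, 0.06, size=noaa_bag.shape)  # test surface

diff = geopoint - noaa_bag        # positive = Geopoint deeper (sign convention assumed)
valid = np.isfinite(diff)         # ignore empty grid nodes

median_diff = np.median(diff[valid])
within_15cm = np.mean(np.abs(diff[valid]) <= 0.15)

print(f"median difference: {median_diff:+.2f} m")
print(f"nodes within ±15 cm: {within_15cm:.1%}")
```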
Future Work

A prototype user interface (UI) is under development, with the goal of transitioning from the current command-line interface to a secure, web-accessible portal that manages the core aspects of the processing cycle. This begins with user authentication, based on either individual or organization-managed access. Once the project-specific elements have been defined (Project, Vessels, Survey), the user will upload survey data such as multibeam, attitude, sound velocity profiles, and tide data through the Project Portal. Vessel, SVP, and tide editing capability will give the user the ability to quality-check these ancillary data.

The ability to grid and visualize data via a web interface presents significant technical challenges, but the protocols and standards for this are evolving rapidly, and opportunities exist to create a viable set of work tools. An important but complex requirement is the ability to edit individual data points that influence the final CUBE surface; alternatively, direct selection of CUBE hypotheses provides an equally robust editing path. In either case, data editing via a web interface will involve significant technical work.

As an extension of the multi-node processing used in this study, parallel processing for producing CUBE surfaces has been shown to have potential advantages, as described by Calder (2013) [4]. The application of scalable processing, combined with a responsive web interface, has the potential to give the user near-real-time results for large-scale configuration or parameter changes. With this as our end goal, we hope to add this level of capability to our cloud processing environment.
Cloud Functions

For this project, we have focused on three basic functional areas of cloud computing: batch compute processes, table and BLOB storage, and the linkage necessary to connect the two.

Batch computing is structured around the creation of multiple compute nodes, each provisioned with an operating system and an instance of a specific application. On start-up, a Linux operating system is booted and the Geopoint API is provisioned to each node. The application remains active until shut down automatically or by the user. The number of nodes provisioned is based on a user selection; for the data set used in this project, 10 nodes per session were sufficient and provided processing speeds roughly equivalent to comparable desktop processing applications. Processing speed increases proportionally with the number of nodes, so speed is limited only by what the user is willing to spend.
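The poster does not name the cloud provider, so the pool-provisioning sketch below assumes Azure Batch purely as an example; the account, key, VM size, and start-task command are placeholders, and the geopoint-api package name is hypothetical.

```python
# Sketch only: the provider is not named in the text, so Azure Batch is assumed
# here purely for illustration. Account, key, VM size, and the start task are
# placeholders, not Geopoint's actual configuration.
from azure.batch import BatchServiceClient
from azure.batch import models as batchmodels
from azure.batch.batch_auth import SharedKeyCredentials

credentials = SharedKeyCredentials("mybatchaccount", "<account-key>")
client = BatchServiceClient(credentials, batch_url="https://mybatchaccount.eastus.batch.azure.com")

pool = batchmodels.PoolAddParameter(
    id="geopoint-pool",
    vm_size="STANDARD_D2_V3",
    target_dedicated_nodes=10,           # 10 nodes were sufficient for this data set
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="canonical", offer="ubuntuserver", sku="18.04-lts", version="latest"),
        node_agent_sku_id="batch.node.ubuntu 18.04"),
    # Each node pulls the (hypothetically packaged) Geopoint API when it starts.
    start_task=batchmodels.StartTask(
        command_line="/bin/bash -c 'pip install --user geopoint-api'",
        wait_for_success=True),
)
client.pool.add(pool)
```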
As the user queues up each line (or set of lines) for processing, the multibeam group records are read from table storage and assigned to an available node. As each group is processed, the corresponding time-segmented spatial orientation record is retrieved and applied to each multibeam record. Upon completion, a .csv file containing the XYZ and uncertainty values for each sounding is written back to BLOB storage. After the last group for a given line has been processed, the individual group .csv files are recombined into the new line file. The combined .csv line file is only about 10% larger than the original HSX file, which keeps storage requirements modest.
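The dispatch-and-recombine pattern can be mimicked on a single machine, with a process pool standing in for the remote compute nodes; the group-processing function, records, and storage handling below are simplified placeholders.

```python
# Single-machine sketch of the dispatch pattern: multibeam groups are farmed out
# to workers (standing in for cloud nodes) and the per-group CSV outputs are
# recombined into one line file. Storage I/O is replaced by in-memory data.
from concurrent.futures import ProcessPoolExecutor
import csv, io

def process_group(group_id: int, records: list[dict]) -> str:
    """Placeholder for the per-node work: return CSV text of XYZ + uncertainty."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for rec in records:
        # Real processing would raytrace, rotate, and apply tides here.
        writer.writerow([rec["easting"], rec["northing"], rec["depth"], rec["tvu"]])
    return buf.getvalue()

def process_line(groups: dict[int, list[dict]], max_nodes: int = 10) -> str:
    """Dispatch each group to a worker and recombine the results in group order."""
    with ProcessPoolExecutor(max_workers=max_nodes) as pool:
        futures = {gid: pool.submit(process_group, gid, recs) for gid, recs in groups.items()}
        return "".join(futures[gid].result() for gid in sorted(futures))

if __name__ == "__main__":
    fake_groups = {
        0: [{"easting": 350001.2, "northing": 4550010.5, "depth": 18.42, "tvu": 0.11}],
        1: [{"easting": 350003.7, "northing": 4550012.1, "depth": 18.39, "tvu": 0.12}],
    }
    print(process_line(fake_groups), end="")
```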
For the program to interact with the storage containers effectively, proper authentication is required at each step. Unlike reading from a local disk, each read/write transaction requires an authenticated handshake to complete. These requirements are built into the security model of the cloud service provider and are integral to completing the processing task.
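Continuing the Azure assumption from the batch sketch above, an authenticated upload and download might look like the following with the azure-storage-blob client; the account URL, key, and container and blob names are placeholders.

```python
# Sketch of authenticated blob I/O, again assuming an Azure-style service purely
# for illustration. The account URL, key, and container/blob names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://geopointstorage.blob.core.windows.net",
    credential="<storage-account-key>",      # every read/write is authenticated
)

container = service.get_container_client("h12007")

# Upload a raw HSX line file under the organization/project/survey hierarchy.
with open("line_0042.hsx", "rb") as f:
    container.upload_blob(name="noaa/h12007/launch3101_d154/line_0042.hsx", data=f, overwrite=True)

# Later, pull back the processed XYZ + uncertainty CSV for this line.
csv_bytes = container.download_blob("noaa/h12007/launch3101_d154/line_0042.csv").readall()
```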
Test Data Set: NOAA Survey H12007

The data set used for test and development was NOAA survey H12007, acquired by NOAA Ship Thomas Jefferson and its survey launches in 2009. The raw data were archived at NGDC and retrieved by request along with the BAG files and survey documentation. The survey area has distinct sand wave features, a number of wrecks, and depths ranging from 4 m to 26 m. The total area is approximately 6 km² and required roughly two days per platform to acquire. Systems used included the Reson 8125, 7125b, and 7125c multibeams, Applanix POS/MV inertial motion units, a Moving Vessel Profiler (MVP), and CTDs for sound speed profiles. Due to the compact nature of the survey area, MVP data were used as the primary SVP source throughout the survey. Survey processing and BAG file generation were completed using CARIS HIPS, and discrete tide zoning was specified and applied as per the project instructions.
[Figure: Processing workflow: input files (Hypack HSX, SVP profiles, tide zoning, ShipConfig) are uploaded to cloud storage and pre-processed (multibeam datagrams parsed to table storage, attitude data partitioned to table storage); data inputs (multibeam ranges; heave, pitch, roll, heading; position; surface SV; SV profiles; tides) feed the processing steps (apply rotations, perform raytrace, apply heave, dynamic draft, tides, and uncertainty); data outputs (northings, eastings, depth, horizontal and vertical uncertainty) are written to a CSV file and gridded into DEM and BAG products with 3rd-party software.]

[Figure: Batch compute architecture: a line file is split into RMB groups held in table and BLOB storage, dispatched to processor nodes by the batch process, and recombined after processing.]

[Figure: Prototype UI, showing Data Upload, Project Details, and Tool Menu screens.]
REFERENCES
[1] Beaudoin, J.D., Hughes Clarke, J.E., and Bartlett, J.E., “Application of Surface
Sound Speed Measurements in Post-Processing for Multi-Sector Multibeam
Echosounder,” International Hydrographic Review, Vol. 5, No. 3, 2004.
[2] R. Hare, “Error Budget Analysis for US Naval Oceanographic Office
(NAVOCEANO) Hydrographic Survey Systems,” University of Southern Mississippi,
Hydrographic Science Research Center (HSRC), 2001.
[3] Hare, Godin and Mayer, "Canadian Multibeam Accuracy Assessment," Canadian
Hydrographic Service and University of New Brunswick - OMG, 1995.
[4] Calder, Brian, "Parallel and Distributed Performance of a Depth Estimation
Algorithm" (2013). Center for Coastal and Ocean Mapping.
Paper 858. http://scholars.unh.edu/ccom/858