All content in this area was uploaded by W.W.Y. Hsu on May 18, 2015
Constructing an Efficient State Space Query System for the Voyage Data Recorder
William W. Y. Hsu a,b,1, Yi-Wen Wu a, Min-Ruey You a, Cheng-Hsin Liao c,
Cheng-Yu Lu d, Hao-Hsun Wang a
a Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan
b Institute of Information Science, Academia Sinica, Nankang, Taiwan
c Department of Environmental Biology and Fisheries Science, National Taiwan Ocean University, Keelung, Taiwan
d PixNET, Taipei, Taiwan
Abstract. Voyage data recorders (VDRs) are devices installed on vessels to track their trajectories at sea. A VDR receives signals from Global Positioning System (GPS) satellites to compute its current location on Earth and records the current speed, course, and bearing in its internal memory. The information is downloaded each time the fishing vessel docks for fueling. Currently, over 10 thousand devices are installed and tracked, and they have collected over 3 billion GPS samples over an 8-year period. This paper presents our approach to cleaning and reorganizing the data and to designing an efficient, incrementally updatable database for fast queries on the states and spatial locations of vessels. We have also adopted WebGL technology to visualize trajectories with more than 100 thousand points. This work will provide a better platform for future fisheries management tasks such as resource estimation, catch per unit effort analysis, environmental protection, operational efficiency, and activity surveillance.
Keywords. Voyage data recorder (VDR), Fisheries, Trajectories, Global positioning
systems (GPS), Visualization
1. Introduction
Fishing is a major industry in Taiwan, and the government supports it by providing fuel stipends to fishing vessels. To detect unauthorized or illegal applications for fishing vessel fuel stipends and to support future marine resource management, the Fisheries Agency, Council of Agriculture, Executive Yuan of Taiwan, asked National Cheng-Kung University to develop the Voyage Data Recorder (VDR) system in 2006. After a year of research, development, prototyping, testing, and onboard evaluation, the Fisheries Agency mandated that all fishing vessels in Taiwan be equipped with the VDR system. Establishing such a system is essential for future research. Currently, coast guard records of vessels leaving and entering harbors, catch statistics (fish type, amount, and location) from seed vessels, fuel stipend records, and the VDR data are kept separately. Without an efficient system, it is difficult to join these data for further statistical analysis and estimation.
1 Corresponding author (Email: wwyhsu@ntou.edu.tw). The authors are supported by the Ministry of Science and Technology of Taiwan under grant MOST-103-2221-E-019-041, by the Fisheries Agency, Council of Agriculture, Taiwan, under grant 103AG-11.2-1-FA-F7, and by National Taiwan Ocean University under grant NTOU-RD-AA-2014-2-05021. This research has no affiliation with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.
The VDR system uses the Global Positioning System (GPS) for trajectory tracking. The latitude, longitude, course bearing, course speed, and timestamp are recorded every 3 minutes. When a vessel returns to the harbor for refueling, the raw VDR records are uploaded to a central server at the Center of Systems and Naval Mechatronic Engineering (CSNME) for storage. Currently, over 10 thousand unique devices are monitored. Due to budget limitations, the VDR cannot use satellite communication to relay data back in real time. However, the delayed VDR data can still be used for offline analysis, including trajectory analysis and operational efficiency analysis.
To our knowledge, the VDR data center at National Cheng-Kung University focuses on preserving raw data and maintains only a minimal relational database system for analysis and queries. Because of the vast amount of accumulated VDR data, the original system design cannot keep up with the need for fast responses. We propose a system that cleans and reorganizes the raw data, redesigns the database structure, and creates an environment for fast post-processing, queries, and data visualization.
2. Background
Research on the use of GPS data is already widespread. Wu et al. use GPS data to discover the movement patterns of monitored objects in different regions [1]. Using clustering techniques, trajectories with similar behavior can be joined to represent the conditions in a region at a specific time. Pao et al. proposed a Markov chain model to find hidden patterns embedded in trajectories produced by user inputs [2]. Although the data in Pao et al. do not come from GPS, they have similar properties. Mobile devices with GPS can also produce trajectory data. Yin et al. [3] and Chen et al. [4] used data mining techniques to predict subjects' possible future trajectories from past information. Semantic analysis has also been used in mining GPS information; e.g., Ying et al. tried to identify a GPS device bearer's possible future locations using semantic analysis [5]. Finding frequent trajectory patterns is another research topic. Wang et al. used vague space partitioning to find frequent patterns in spatiotemporal data [6]. This can be an important technique for VDRs, as fishing vessels travel irregularly and may span halfway around the Earth.
Vessel monitoring systems (VMS) are similar to VDRs: both are specialized GPS devices used on vessels, except that a VMS takes samples at much longer intervals and relays information using satellite communication. Walker et al. studied VMS data to estimate the behavior of vessels at sea [7], trying to identify whether a vessel is cruising, idling, or fishing using fuzzy logic. Mak et al. used the Vessel Performance Monitoring and Analysis System (VPMAS) to support efficient operations that reduce vessel fuel consumption [8]. All such research depends on a stable and fast database for storing VDR data, which contain GPS information.
[Figure 1 diagram. Components: user terminal, web portal, data computing and preprocessing module, voyage database service provider, internal server (VDR data processor), private terminal, internal GIS, and third-party GIS. Connection labels: HTTP requests (Ajax/jQuery), read/write, read only, upload raw data.]
Figure 1. System architecture. Our architecture isolates private and public data, and communication between modules is built on web services.
3. System Architecture
A cloud infrastructure may host an ensemble of software delivered as services under the Software as a Service (SaaS) model, together with computing hardware and networking [9]. It can meet high computing performance, security, and compliance requirements. Cloud infrastructure also helps users outsource their IT framework and reduce setup costs so that they can concentrate on their business. We modularize our system to provide isolation where required and to preserve flexibility for the future. Our system modules communicate using web service protocols, which makes each module independent of programming language and operating system. De-identification is necessary at this point because the data include personal information, which is highly confidential. As shown in Figure 1, we isolate the database from front-end users. Only through private, secured terminals can the data be viewed or modified directly without any de-identification process. For regular de-identified queries, users must go through the web portal, which prepares the data for display at the user terminal. The web portal requests data from the data computing and preprocessing module, where the data are prepared for display with Cesium WebGL [10]. The data computing and preprocessing module can only be accessed by authorized users with security clearance. This module extracts information from the central VDR voyage database and processes it for visualization. Finally, GIS information from either internal servers or third-party providers is loaded to form the geographical layer. The rendering process does not send any trajectory information to the third-party providers.
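The paper does not say how de-identification is implemented. One common option, sketched below purely as an assumption, is to replace vessel identifiers with a keyed hash (HMAC-SHA256) on the private side before anything reaches the web portal, so the portal only ever sees stable but non-reversible tokens. The secret key, the helper names, and the record fields (`vessel_id`, `owner`, `lat`, `lon`, `time`) are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that would live only on the private side of the system.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(vessel_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a vessel ID to a stable, non-reversible token using HMAC-SHA256."""
    digest = hmac.new(key, vessel_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "V-" + digest[:16]  # shortened token for display


def deidentify(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of a trajectory record that is safe to hand to the web portal.

    Drops the hypothetical identifying fields and adds a pseudonymous token.
    """
    public = {k: v for k, v in record.items() if k not in ("vessel_id", "owner")}
    public["vessel_token"] = pseudonymize(record["vessel_id"], key)
    return public
```

Because the same key always yields the same token, de-identified trajectories of one vessel can still be grouped together, while the mapping back to the real ID stays on the private side.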
The detailed flow is shown in Figure 2. Upon receiving the raw VDR data (text files of NMEA strings), we parse the data, sort the samples by timestamp, and insert them into the database. Next, we join related GPS points to form a trajectory, i.e., a voyage segment for each vessel. For voyages that span more than one month, we check whether a segment continues an ongoing voyage from the previous month or will continue into the following month. The database indexes are created after all analyses are done. At this point, our system is ready for visualization and queries.
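The parse, sort, load, then index order can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for MariaDB, and the table name `vdr_201406`, its columns, and the index name are hypothetical; the point is only that the index is built once after the bulk load instead of being maintained row by row during insertion.

```python
import sqlite3
from typing import List

# sqlite3 stands in for MariaDB here; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE vdr_201406 (
    vessel_id TEXT NOT NULL,
    ts        TEXT NOT NULL,   -- ISO-8601 UTC timestamp
    lat       REAL NOT NULL,
    lon       REAL NOT NULL,
    speed_kn  REAL,
    course    REAL
)
"""


def load_month(conn: sqlite3.Connection, rows: List[tuple]) -> None:
    """Bulk-insert parsed samples, then build the index once at the end."""
    conn.execute(SCHEMA)
    # Sort by vessel and timestamp before insertion, mirroring the parse/sort step.
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO vdr_201406 VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Creating the index after loading avoids updating it on every insert.
    conn.execute("CREATE INDEX idx_vessel_ts ON vdr_201406 (vessel_id, ts)")


def trajectory(conn: sqlite3.Connection, vessel_id: str) -> List[tuple]:
    """Fetch one vessel's samples in time order (served by the composite index)."""
    cur = conn.execute(
        "SELECT ts, lat, lon FROM vdr_201406 WHERE vessel_id = ? ORDER BY ts",
        (vessel_id,),
    )
    return cur.fetchall()
```

The composite `(vessel_id, ts)` key matches the most common query, one vessel's trajectory in time order.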
We chose MariaDB as our underlying database management system [11] for three reasons. First, according to its documentation, it includes many optimizer enhancements and faster, safer replication. Second, it supports NoSQL. Third, it supports sharding, which allows tables to be distributed across servers in the future.
[Figure 2 flowchart: Start → Raw Data Parsing → Construct Voyage Segments → Voyage Segment Analysis → decision "Is there any ongoing voyage?" (Yes: Implement the virtual double linked list; No) → Create Database Indexes → Query → Display.]
Figure 2. System flow diagram. The detailed procedures of our system.
4. Database Construction
4.1. Definition of GPRMC Strings
VDR information is stored in the NMEA (National Marine Electronics Association) format [12]. An example string from our raw data is
$GPRMC,033416.00,A,2619.06271,N,11952.82288,E,4.059,198.65,120514,,,A*6D
Detailed field definitions are given in Table 1.
Table 1. Fields in the VDR sample. The fields in our VDR sample are defined by the NMEA standard (obtained from [12]).

Field            Comment
$GPRMC           Recommended minimum sentence C
033416.00        Fix taken at 03:34:16.00 UTC
A                Status: A = active, V = void
2619.06271,N     Latitude 26° 19.06271′ N
11952.82288,E    Longitude 119° 52.82288′ E
4.059            Speed over the ground in knots
198.65           Track angle in degrees
120514           Date 2014/05/12
,,,A             Magnetic variation (empty) and mode indicator (A)
*6D              Checksum, always begins with *
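The field layout in Table 1 maps directly onto a simple parser. The sketch below assumes one sentence per line and that two-digit years belong to the 2000s; the `RmcSample` and `parse_rmc` names are our own, fractional seconds are dropped, and the checksum is not verified here. Malformed input raises `ValueError`, which a caller can treat as an unparsable sample.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RmcSample:
    """Fields of one GPRMC sentence, following Table 1 (names are our own)."""
    time_utc: datetime
    active: bool        # status field: A = active, V = void
    lat_raw: str        # ddmm.mmmmm, still in degrees + minutes
    lat_hemi: str       # N or S
    lon_raw: str        # dddmm.mmmmm
    lon_hemi: str       # E or W
    speed_knots: float  # speed over ground
    course_deg: float   # track angle


def parse_rmc(sentence: str) -> RmcSample:
    """Split a $GPRMC sentence into typed fields (checksum not verified here)."""
    body = sentence.strip()
    if not body.startswith("$GPRMC,"):
        raise ValueError("not a GPRMC sentence")
    body = body[1:].split("*", 1)[0]   # drop '$' and the '*hh' checksum
    f = body.split(",")
    # f[1] = hhmmss.ss, f[9] = ddmmyy; %y maps 14 to 2014.
    stamp = datetime.strptime(f[9] + f[1].split(".")[0], "%d%m%y%H%M%S")
    return RmcSample(
        time_utc=stamp.replace(tzinfo=timezone.utc),
        active=(f[2] == "A"),
        lat_raw=f[3], lat_hemi=f[4],
        lon_raw=f[5], lon_hemi=f[6],
        speed_knots=float(f[7]),
        course_deg=float(f[8]),
    )
```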
GPRMC expresses angles in degrees (D), minutes (M), and seconds (S), with the seconds carried in the fractional part of the minutes. We have converted this representation into decimal degree (DD) representation using
$$DD = D + \frac{M}{60} + \frac{S}{3600}.$$
The raw data have a precision of up to five decimal places in the minute field, so the finest resolution is $\frac{0.00001'}{60} \approx 0.00000017^\circ$, which lies at about the sixth decimal digit of a decimal degree. Six decimal digits of a degree correspond to 11.132 cm at the equator and 4.3496 cm (in longitude) at 67°N/S. This precision is adequate because the accuracy of GPS is around 7.8 meters at a 95% confidence level². Given this, truncating the decimal degrees to 6 decimal digits when storing information in our database introduces no meaningful error.
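The degree-minute conversion and the 6-digit truncation can be written in a few lines. This is a minimal sketch assuming the standard NMEA ddmm.mmmmm / dddmm.mmmmm layout, in which the seconds term is already folded into the fractional minutes (S = 0), and assuming southern and western hemispheres are stored as negative values; the function names are hypothetical.

```python
import math


def nmea_to_decimal_degrees(raw: str, hemisphere: str) -> float:
    """Convert an NMEA ddmm.mmmmm (or dddmm.mmmmm) field to signed decimal degrees.

    GPRMC carries seconds inside the fractional minutes, so this is
    DD = D + M/60 with S = 0.
    """
    value = float(raw)
    degrees = int(value // 100)          # leading digits are whole degrees
    minutes = value - degrees * 100      # remainder is decimal minutes
    dd = degrees + minutes / 60.0
    return -dd if hemisphere in ("S", "W") else dd


def truncate6(x: float) -> float:
    """Truncate (not round) to 6 decimal digits before storage."""
    return math.trunc(x * 1_000_000) / 1_000_000
```

For the example sentence, 2619.06271 N becomes 26.317711 and 11952.82288 E becomes 119.880381 after truncation; the discarded part is below 10⁻⁶ degrees, i.e. at most about 11 cm.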
The VDR equipment sometimes produces erroneous results because of the harsh environment on board fishing vessels at sea. We verify every sample by computing its checksum (see Table 1) and discard samples that are not parsable or have an incorrect checksum. Other sources of error should also be considered, including invalid triangulation caused by hardware faults, accidental human intervention that changes file contents, and power outages on board fishing vessels, which can cause long breaks between VDR samples. Erroneous data of these types account for 0.6% of the total samples over 8 years.
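The checksum in Table 1 is the bitwise XOR of every character between `$` and `*`, written as two hexadecimal digits. The sketch below applies that check and, as an extra assumption not stated in the paper, also drops fixes whose status field is V (void) as one simple way to catch invalid triangulation. The regular expression and function names are hypothetical.

```python
import re

# '$' + body + '*' + two hex digits; anything else is treated as unparsable.
_SENTENCE = re.compile(r"^\$(?P<body>[^*$]+)\*(?P<cs>[0-9A-Fa-f]{2})$")


def nmea_checksum(body: str) -> int:
    """XOR of every character between '$' and '*' (NMEA 0183 checksum)."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return value


def is_valid(sentence: str) -> bool:
    """True if the line is well formed and its checksum matches."""
    m = _SENTENCE.match(sentence.strip())
    if m is None:
        return False
    return nmea_checksum(m.group("body")) == int(m.group("cs"), 16)


def clean(lines):
    """Keep valid sentences only; also drop void fixes (status 'V'), an assumption."""
    kept = []
    for line in lines:
        if not is_valid(line):
            continue
        fields = line.strip()[1:].split(",")
        if len(fields) > 2 and fields[2] == "V":
            continue
        kept.append(line.strip())
    return kept
```

Applied to the example sentence in Section 4.1, the XOR of its body is 0x6D, matching the transmitted `*6D`.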
4.2. Distributing the Raw VDR Samples
The VDR data we obtain are not real-time data, because real-time reporting methods such as the vessel monitoring system (VMS), which relays information through satellite communication, are very costly. The government considers real-time relaying a tremendous financial burden because VDR data are recorded very frequently: VMS positions are recorded once every 2 to 6 hours, whereas VDR samples are recorded once every 3 minutes under the current configuration and regulations. As a result, VDR samples can only be acquired when a vessel docks for refueling. The VDR data are downloaded from the fishing vessel and uploaded to the server at CSNME during the fueling process. The downloaded files contain the trajectory from the last download up to the present, which may span over a month.
After parsing the raw VDR data, we redistribute the samples into their corresponding year and month tables. As illustrated in Figure 3, the data uploaded in 2014/06 (green) may contain information from before 2014/06, as the vessel may operate overseas for several months. After filtering out the samples that belong to 2014/06 (blue), we merge the remaining data (light purple) with the data of 2014/05 for the next iteration. This process continues until the original set is empty or contains only trash data.
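The redistribution in Figure 3 peels off one month at a time; a single pass that buckets each sample by its own (year, month) produces the same partition. The sketch below assumes samples are `(vessel_id, datetime, payload)` tuples and treats anything outside a caller-supplied `[earliest, latest]` window as trash data; the window bounds and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def distribute_by_month(samples, earliest: datetime, latest: datetime):
    """Bucket a downloaded batch into (year, month) tables.

    A batch uploaded in one month may hold samples from several earlier
    months. Returns ({(year, month): samples sorted by vessel and time}, trash).
    """
    buckets = defaultdict(list)
    trash = []
    for sample in samples:
        ts = sample[1]
        if not (earliest <= ts <= latest):
            trash.append(sample)   # implausible timestamps, e.g. a reset clock
            continue
        buckets[(ts.year, ts.month)].append(sample)
    for rows in buckets.values():
        rows.sort(key=lambda s: (s[0], s[1]))
    return dict(buckets), trash
```

An implausible timestamp such as 1980-01-06 (the GPS epoch, which a device with a reset clock might report) lands in the trash list instead of a month table.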
4.3. Voyage Construction
The next step constructs voyages from the VDR data. We define a single voyage of a vessel as one of the following:
• From when a vessel enters a harbor until it exits the harbor (docking).
• From when a vessel leaves a harbor until it re-enters a harbor (operating).
• When a vessel is at a harbor and the next sample appears in another harbor (possibly because the device was carried over land to be installed elsewhere).
We group related VDR samples together to form individual voyage segments. The pseudocode of the partitioning algorithm, based on the three criteria above, is shown in Algorithm 1. In this paper, we classify a vessel into two states. The first is the idling state, in which the vessel is within a port. As long as the vessel does not leave this port, all of its activities are considered part of the idling state, including docking, fueling, unloading goods, and maintenance. The second is the active state, in which the vessel leaves a port for operation.
2 Source: http://www.gps.gov/systems/gps/performance/accuracy
[Figure 3 diagram: month tables 2014/06, 2014/05, 2014/04, 2014/03, and 2014/02, connected by successive Filter steps.]
Figure 3. Distributing raw samples into the database. A single file may contain information not only from the current month but also from previous months.
Algorithm 1 Voyage partitioning algorithm.
Require: The set of vessels v ∈ {CT0.., CT1.., ..., CT9.., CTX..};
Ensure: Voyage fragments r for all v;
1: Read v's trajectory T from the database, sorted by timestamp;
2: Let h be the set of harbors in Taiwan;
3: Initialize a new voyage r;
4: for all VDR samples t_x ∈ T do
5:   if t_x ∈ h and t_{x−1} ∉ h then
6:     We have found a voyage r of v;
7:     Record it into the database and initialize r for the next voyage;
8:   else if t_x ∉ h and t_{x−1} ∈ h then
9:     We have found a voyage r of v; record it into the database and reset r;
10:  else if t_x and t_{x−1} are both ∉ h or both ∈ h then
11:    Merge t_x into the current voyage r;
12:  end if
13: end for
14: return
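Algorithm 1 can be expressed as a single pass over one vessel's time-ordered samples. The sketch assumes each sample has already been labeled with the harbor it lies in (or `None` at sea), which stands in for the membership test against the harbor set h; it also applies the third criterion by splitting when consecutive in-harbor samples fall in different harbors. The names and the `(timestamp, harbor)` tuple layout are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[int, Optional[str]]  # (timestamp, harbor id or None if at sea)


def partition_voyages(samples: Sequence[Sample]) -> List[List[Sample]]:
    """Split one vessel's time-ordered samples into voyage segments.

    A new segment starts when the vessel crosses a harbor boundary
    (sea -> harbor or harbor -> sea), or when two consecutive in-harbor
    samples are in different harbors (device moved over land).
    """
    voyages: List[List[Sample]] = []
    current: List[Sample] = []
    prev_harbor: Optional[str] = None
    for i, sample in enumerate(samples):
        harbor = sample[1]
        if i > 0:
            crossed = (harbor is None) != (prev_harbor is None)
            moved = harbor is not None and prev_harbor is not None and harbor != prev_harbor
            if crossed or moved:
                voyages.append(current)  # record the finished segment
                current = []
        current.append(sample)  # otherwise merge into the current voyage
        prev_harbor = harbor
    if current:
        voyages.append(current)
    return voyages
```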
4.4. Route Stitching
Since VDR data stored in our database are partitioned at monthly resolution, the voyages constructed using Algorithm 1 can be fragmented for vessels that are docked in a harbor or navigating around the world across month boundaries. Algorithm 2 is applied to stitch related voyage fragments into a complete trajectory. We maintain a virtual doubly linked list structure in our database, which eases future maintenance and requires minimal modification to the original structure. First, we do not need another database schema to store the full trajectory. Second, as data arrive asynchronously from different VDR downloading sites, the virtual doubly linked list pointers can be updated incrementally without affecting other data. For any given time frame, we can trace the voyage trajectory both forward and backward by following the links. This data structure allows us to trace the beginning and end of a voyage through months with linear O(n) queries, where n is the number of months this particular trajectory covers. For example, in Figure 4, CT8000001's voyage ends at month x, and its previous trajectory can be obtained by tracing backward to month x−1. CT1000001 starts at month x and extends into month x+1 and beyond. We made the following observations in designing our voyage stitching algorithm:
• If a vessel stays within the same harbor (or adjacent, connected harbors), it is stitched into the same voyage.
• If a vessel's voyage terminates somewhere outside any harbor, then the next voyage, starting in the next month, should be a continuation of the same trip.
• If a vessel ends in one harbor and begins in another, non-adjacent harbor (e.g., the VDR was moved manually), the two are considered separate voyages.
Algorithm 2 Voyage stitching algorithm.
Require: The set of voyages r from the database;
Ensure: Complete voyages R for all vessels;
1: for all voyages r_n of a vessel v do
2:   if r_n ends in a harbor h then
3:     if the next route r_{n+1} of v starts in the same harbor h then
4:       Update r_n to point to the next voyage r_{n+1};
5:       Update r_{n+1} to point back to the current voyage r_n;
6:     end if
7:   end if
8:   if r_n ends outside a harbor then
9:     Let the next voyage of v be r_{n+1};
10:    Update r_n to point to the next voyage r_{n+1};
11:    Update r_{n+1} to point back to the current voyage r_n;
12:  end if
13: end for
14: return
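The stitching rules and the pointer-following trace can be sketched together. The `Segment` record keeps `prev_id`/`next_id` in the row itself, mirroring the virtual doubly linked list. As assumptions, linking is restricted to the last segment of one month and the first segment of the next, segment ids are assumed to increase with time within a month, and months are integer indices (for example `year * 12 + month - 1`) so that December rolls over to January. All names are hypothetical, and adjacent connected harbors are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    seg_id: int
    vessel: str
    month: int                      # integer month index, e.g. year * 12 + month - 1
    start_harbor: Optional[str]     # None if the segment starts at sea
    end_harbor: Optional[str]       # None if the segment ends at sea
    prev_id: Optional[int] = None   # "virtual" back pointer stored in the row
    next_id: Optional[int] = None   # "virtual" forward pointer stored in the row


def stitch(segments: List[Segment]) -> None:
    """Link fragments that cross a month boundary, in place (cf. Algorithm 2)."""
    by_key: Dict[tuple, List[Segment]] = {}
    for s in segments:
        by_key.setdefault((s.vessel, s.month), []).append(s)
    for segs in by_key.values():
        segs.sort(key=lambda s: s.seg_id)
    for (vessel, month), segs in by_key.items():
        following = by_key.get((vessel, month + 1))
        if not following:
            continue
        last, first = segs[-1], following[0]
        ends_at_sea = last.end_harbor is None
        same_harbor = last.end_harbor is not None and last.end_harbor == first.start_harbor
        if ends_at_sea or same_harbor:
            last.next_id, first.prev_id = first.seg_id, last.seg_id


def trace(table: Dict[int, Segment], seg_id: int) -> List[int]:
    """Follow prev/next pointers from any segment to the whole voyage, O(n)."""
    node = table[seg_id]
    while node.prev_id is not None:   # walk back to the first segment
        node = table[node.prev_id]
    chain = [node.seg_id]
    while node.next_id is not None:   # then forward to the last one
        node = table[node.next_id]
        chain.append(node.seg_id)
    return chain
```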
As shown in Figure 5, we have a vessel whose complete voyage spans 4 months. Our algorithm joins the segments to form the complete trajectory shown in Figure 6. Partitioning voyages by the time-distance information between 2 consecutive VDR samples may not work for earlier VDR models, which are not equipped with internal batteries. Fishing vessels tend to anchor and power down the whole vessel when waiting or idling at sea to conserve fuel, which leads to an arbitrary gap between two consecutive VDR samples.
[Figure 4 diagram: segment tables for Month x−1, Month x, and Month x+1, each listing vessels such as CT1000001, CT8000001, and CT9000001.]
Figure 4. Doubly linked list implementation in the database. This structure allows our algorithm to find the initial and final positions of a voyage given any segment.
(a) The vessel initiates a voyage in month x. (b) The vessel is traveling through month x+1.
(c) The vessel is traveling through month x+2. (d) The vessel returns to its port in month x+3.
Figure 5. Voyage stitching. We join voyage segments that belong to a single trip into a single trajectory.
4.5. Incremental Data Flow
The VDR data are collected each time a vessel refuels, but the raw data are pooled and delivered once per season, so the database is updated quarterly. When new data arrive, the database is updated as follows:
1. The raw data are parsed and checked for errors.
2. The data are distributed among the months preceding the arrival month.
3. For each month whose data have increased, Algorithm 1 is executed to generate new voyage segments.
4. For months with newly generated voyage segments, Algorithm 2 is executed to extend (or merge) incomplete voyage segments.
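Steps 3 and 4 imply a small planning step: working out which months must be re-partitioned and which must be re-stitched. The sketch below assumes a stitching pass that, for each month it visits, links that month's last segment to the next month's first segment, so the month before each touched month must be revisited as well. The function names and the `(year, month)` tuple representation are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Set, Tuple

Month = Tuple[int, int]  # (year, month)


def month_before(m: Month) -> Month:
    """Previous calendar month, rolling January back to December."""
    y, mo = m
    return (y - 1, 12) if mo == 1 else (y, mo - 1)


def plan_update(new_timestamps: Iterable[datetime]) -> Tuple[List[Month], List[Month]]:
    """Return (months to re-partition, months to re-stitch) for a new batch.

    Step 3: every month that received samples re-runs the partitioning.
    Step 4: a forward-looking stitch links month m to m+1, so each touched
    month and the month before it are re-stitched.
    """
    touched: Set[Month] = {(t.year, t.month) for t in new_timestamps}
    restitch: Set[Month] = set(touched)
    for m in touched:
        restitch.add(month_before(m))
    return sorted(touched), sorted(restitch)
```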
5. Experiment Results
We have processed VDR data from January 2008 to June 2014. These are confidential government data, so masking and de-identification of vessel owners are required. We have processed approximately 3.4 billion VDR samples from over 12 thousand unique VDR devices. The front-end visualization is done using Cesium WebGL [10], which utilizes 3D acceleration in web browsers. GIS information is acquired from Microsoft Bing Maps and OpenStreetMap. The primary server is an Asus RS700/E7 with a Xeon 2.0 GHz processor, and the primary database storage is offloaded to a QNAP TS-469 network-attached storage (NAS) device.
Figure 6. Complete voyage constructed from segments. The complete voyage can be constructed with our algorithm with minimal changes to the database.
Figure 7 shows 7 trajectories of a vessel with over 8000 GPS points, plotted with Cesium WebGL (with some modifications to conserve browser memory) using Microsoft Bing Maps and OpenStreetMap as the GIS layer. We have added bearing information to the trajectories (see Figure 8), with longer and larger arrows indicating that the vessel is traveling faster. The query time for a single trajectory usually ranges from 0.5 to 3 seconds, depending on the length of the voyage and the number of months it spans.
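The bearing arrows in Figure 8 can be derived from each sample's track angle and speed. In this sketch, the course is measured clockwise from true north, the arrow length grows linearly with speed up to a cap, and the longitude component is widened by 1/cos(latitude) so arrows keep a similar ground length away from the equator. The scale constants and the function name are hypothetical display parameters, not values from the paper.

```python
import math
from typing import Tuple


def arrow_offset(course_deg: float, speed_knots: float, lat_deg: float = 0.0,
                 base_len: float = 0.002, per_knot: float = 0.0005,
                 max_len: float = 0.02) -> Tuple[float, float]:
    """Return (d_lon, d_lat) in degrees for an arrow drawn from a sample point."""
    # Faster vessels get longer arrows, capped so they do not clutter the map.
    length = min(base_len + per_knot * speed_knots, max_len)
    theta = math.radians(course_deg)   # clockwise from north: east = sin, north = cos
    d_lat = length * math.cos(theta)
    # A degree of longitude shrinks by cos(latitude); compensate, guarding the poles.
    d_lon = length * math.sin(theta) / max(math.cos(math.radians(lat_deg)), 1e-6)
    return d_lon, d_lat
```

The arrow tip is then drawn at `(lon + d_lon, lat + d_lat)` for each sample.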
In the next experiment, we visualize a larger number of points. We ran 2 queries for lengthy voyages. The first voyage contains 37,488 points and the second contains 53,869 points; the queries completed in 1.281 seconds and 2.57 seconds, respectively. The results are shown in Figure 9, which contains 91,357 points in total. In our tests, the regular Google 2D Maps API was unable to process such a number of data points, but resorting to 3D WebGL solved the problem.
6. Conclusion
Computers and equipment today generate information at a tremendous rate. In our case, VDR sampling could be increased to 1 sample per minute or even 1 sample per second. In other disciplines, approximation algorithms and methods have been used to analyze such masses of information (see [13,14]). The first difficulty lies in exactness. Our system needs to be exact in some aspects when it comes to computing stipends for fishing vessels. A small error could cause the government to overpay or, conversely, infringe on fishermen's right to receive their exact fuel compensation.
Figure 7. Trajectory of a vessel. This plot shows 7 trajectories of a vessel leaving a harbor for operation and returning.
Second, VDR data are collected each time fishing vessels return to harbor for fueling. This means that events at sea have already happened by the time we receive the data, so we cannot provide a real-time view. Although VMS uses satellite communication to relay information in real time, it does so only at 2- to 6-hour intervals. Relaying data this way would cost too much for VDRs, which take samples at a high frequency.
Third, our database provides efficient queries of vessel states and trajectories. A systematic way of parsing the raw data into the database and a well-designed data representation speed up the process of acquiring meaningful information. Combined with 3D WebGL visualization, we can show the positions of designated vessels on the globe.
Figure 8. Zooming the map. The arrows showing the bearing of the vessel are now visible. Larger arrows indicate that the vessel is traveling faster.
Figure 9. Plotting lengthy trajectories. The 2 trajectories span a period of over 6 months.
Finally, our research provides a platform for future fisheries management and surveillance. The monitored vessels use more than 20 fishing methods. With extra information such as catch yield reports, we can estimate the efficiency of each vessel. Moreover, we can check whether any vessel is trespassing into no-fishing zones or into territories belonging to other countries, which can cause international disputes. Providing this efficient system with clean data will save preprocessing time for future research.
Future development of this research will include merging coast guard records, catch statistics, fuel records, and hydrological information. In the current system, vessel states are broadly classified into two sets; these will be subdivided into more categories in the future. For example, the docking state will contain fueling, unloading, maintenance, and powered off. The operating state will contain navigating, seeking, net casting, harvesting, and idling. With this system, further analyses, including resource estimation and environmental monitoring, become possible.
References
[1] H.-R. Wu, M.-Y. Yeh, and M.-S. Chen, "Profiling moving objects by dividing and clustering trajectories spatiotemporally," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2615–2628, 2013.
[2] H.-K. Pao, J. Fadlil, H.-Y. Lin, and K.-T. Chen, "Trajectory analysis for user verification and recognition," Knowledge-Based Systems, vol. 34, pp. 81–90, 2012.
[3] P. Yin, M. Ye, W.-C. Lee, and Z. Li, "Mining GPS data for trajectory recommendation," in Advances in Knowledge Discovery and Data Mining. Springer, 2014, pp. 50–61.
[4] L. Chen, M. Lv, and G. Chen, "A system for destination and future route prediction based on trajectory mining," Pervasive and Mobile Computing, vol. 6, no. 6, pp. 657–676, 2010.
[5] J. J.-C. Ying, W.-C. Lee, T.-C. Weng, and V. S. Tseng, "Semantic trajectory mining for location prediction," in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2011, pp. 34–43.
[6] C. Wang, D. De, and W.-Z. Song, "Trajectory mining from anonymous binary motion sensors in smart environment," Knowledge-Based Systems, vol. 37, pp. 346–356, 2013.
[7] E. Walker and N. Bez, "A pioneer validation of a state-space model of vessel trajectories (VMS) with observers data," Ecological Modelling, vol. 221, no. 17, pp. 2008–2017, 2010.
[8] L. Mak, M. Sullivan, A. Kuczora, and J. Millan, "Ship performance monitoring and analysis to improve fuel efficiency," in Oceans-St. John's, 2014. IEEE, 2014, pp. 1–10.
[9] A. Agopyan, E. Sener, and A. Beklen, "Financial business cloud for high-frequency trading: A research on financial trading operations with cloud computing," International Journal On Advances in Intelligent Systems, vol. 4, no. 3-4, pp. 203–217, 2012.
[10] P. Cozzi and D. Bagnell, "A WebGL globe rendering pipeline," GPU Pro 4: Advanced Rendering Techniques, vol. 4, pp. 39–48, 2013.
[11] (2015) MariaDB. [Online]. Available: https://mariadb.org
[12] K. Betke, "The NMEA 0183 protocol," Standard for Interfacing Marine Electronics Devices, National Marine Electronics Association, Severna Park, Maryland, USA, 2001.
[13] T. Heinis, "Data analysis: Approximation aids handling of big data," Nature, vol. 515, no. 7526, pp. 198–198, 2014.
[14] G. Luo and G. Pei, "A novel multivariate polynomial approximation factorization of big data," in Intelligent Computation in Big Data Era. Springer, 2015, pp. 484–496.