Estimating Violation Risk for Fisheries Regulations

Hans Chalupsky1, Robert DeMarco2, Eduard H. Hovy3, Paul B. Kantor2, Alisa Matlin2, Priyam Mitra2, Birnur Ozbas2, Fred S. Roberts2, James Wojtowicz2, Minge Xie2

1 USC Information Sciences Institute, CA, USA

2 Rutgers, the State University of New Jersey, NJ, USA

3 Carnegie Mellon University, PA, USA

Acknowledgements. This report was made possible by a grant from the U.S. Coast

Guard District 1 Fisheries Law Enforcement Division to Rutgers University. The

statements made herein are solely the responsibility of the authors.

We extend special thanks to LCDR Ryan Hamel and LT Ryan Kowalske for working with us on this project and for their support and patience throughout this process.

Thanks also to CCICADA researchers Andrew Philpot and William Strawderman.

Dedication. This paper is dedicated in memoriam to Dr. Tayfur Altiok. Without

his efforts and motivation this project would not have been possible.

Abstract. The United States sets fishing regulations to sustain healthy fish pop-

ulations. The overall goal of the research reported on here is to increase the ef-

ficiency of the United States Coast Guard (USCG) when boarding commercial

fishing vessels to ensure compliance with those regulations. We discuss scoring

rules that indicate whether a given vessel might be in violation of the regula-

tions, depend on knowledge learned from historical data, and support the deci-

sion to board and inspect. We present a case study from work done in collabo-

ration with USCG District 1 (HQ in Boston).

Keywords: Regulatory compliance, Coast Guard, Fisheries, Machine learning,

Statistical models

1 Introduction

This paper describes a targeted risk-based approach to enforcing fisheries laws in the

United States Coast Guard First District (USCG D1), based in Boston, Massachusetts. The work is a joint project of the Laboratory for Port Security (based at Rutgers

University) and the Command, Control and Interoperability Center for Advanced

Data Analysis (CCICADA, a US nationwide consortium headed by Rutgers).

Fisheries rules and regulations have been established through a complex process

whose key aims include preservation of the fisheries biomass. The primary mission of

the fisheries law enforcement program is to maintain a balanced playing field among

industry participants (professional fishing companies) through effective enforcement

of the regulations. Over the years USCG D1 has developed an approach to fisheries

law enforcement, which among other things includes scheduling fishing vessel in-

spections using a scoring matrix. In this paper we describe a project aimed at validat-

ing and extending the scoring matrix by further refining the ability to determine the

risk target profile of active vessels within the population of the First District.

Our research seeks a model that determines which vessels pose a higher safety risk

through non-compliance with safety codes and which vessels are most likely to be

contravening fishing laws and regulations. The main measure of effectiveness ex-

plored here, “boarding efficiency” (BE), is defined as the fraction of recommended

boardings that yield either a fishery or a safety violation. We also formulate other

measures of effectiveness and study approaches to improving them.

Currently the USCG determines whether to board a fishing vessel using a rule

called OPTIDE (created by LCDR Ryan Hamel and LT Ryan Kowalske of USCG

D1), which constructs a score by assigning points to known factors describing a ves-

sel, such as the time since last boarding and the vessel’s history of fisheries violations.

The OPTIDE system recommends boarding if the sum of points exceeds a threshold.

The developers of the method used expert opinion to select the factors in the rule and to set their relative weights. This paper addresses the question: Can naïve researchers using methods

of data analysis approach the effectiveness of such expert rules?

The USCG made available 11 years of data on USCG boarding activities and vio-

lations incurred by commercial fishing vessels. Our project studied introducing other

features, such as weather, seasonality, fish price, fish migration, key fish species,

home port, and detailed vessel history. The project team worked with economic data

such as fish market prices and considered socio-economic factors such as family fish-

ing boats in comparison to large commercial fishing vessels and fishers’ attitudes

toward law enforcement. We looked at the seasonal variation in boardings and out-

comes. In the analysis, fisheries violations were separated from safety violations.

Machine learning methods were used to seek other features, or combinations of

present and added features, that might lead to decision rules increasing the BE. In

addition, alternative models for the boarding decision were considered. One model

poses a choice of which boat to board, within a set of K alternatives. Section 2 de-

scribes this approach. Another approach sought regression models that derive alterna-

tive weights for the same features used in OPTIDE. This method is discussed in detail

in Section 3. Section 4 discusses alternative goals, including balanced deterrence,

balanced policing, and balanced maintenance of safe operations. Here we discuss

alternative measures of effectiveness, e.g., violations found per hour rather than per

boarding. We also discuss alternative decision strategies: random strategies; varying

the number of boats used based on weather, season, or economics; and alternative searching protocols to find the candidate vessels for boarding.

2 RIPTIDE: A Machine Learning Approach

In this section we describe a scoring rule, RIPTIDE, which loosely stands for Rule

Induction OPTIDE. RIPTIDE extends OPTIDE by learning a more fine-grained and

data-driven prediction and ranking model from past activity data, using a machine

learning approach. Using the best model found so far, RIPTIDE outperforms OPTIDE

by roughly 75% on a specific ranking measure (choosing one of 20 candidate vessels), described in more detail below. A

software package implementing RIPTIDE can be used to experiment with the learned

models, and can be applied to rank operational data.

The OPTIDE rule was built based on expert judgment and intuition. It is an ab-

straction of a set of features that a commanding officer will routinely consider when

deciding whether to board a vessel. However, to our knowledge, there had been little

or no optimization of the rule based on historical data.

To extend OPTIDE, we used a data-driven machine learning approach to learn a

classification model from historic boarding activity data. RIPTIDE uses machine

learning to automatically find regularities in past boarding activity data and encodes

them in a model (or classifier) that can then be used to rank new, previously unseen

candidate boarding opportunities. The classifier takes a single (new) data instance and

applies the previously learned model to assign the new instance to one of two classes

(e.g., “violation” or “no violation”). In doing so, the classifier estimates a probability

that may be interpreted as the “confidence” of the prediction. This estimate is based

on how well the model performed for similar cases on the training data. These proba-

bilities can then be used to rank instances, as does the OPTIDE risk score.

Machine learning is built upon two core principles, data representation and gener-

alization. First, every data instance is represented in a computer-understandable form.

This is generally done by engineering a set of features or attribute/value pairs that

carry relevant information and that can be either directly observed or computed from

the data. In the generalization phase, the classifier uses many data instances for which

the class is known as training data, and seeks regularities in that data that allow it to

predict the class of a new data instance. There are many different data representation

schemes and learning algorithms that can be used (see, e.g., [2, 5, 9] for an overview).

For RIPTIDE, we chose a learning algorithm called a boosted decision tree that is a

good general-purpose tool for problems with a small to medium number of features.

One advantage of decision trees is that the learned models are (large) ‘if-then-else’

statements that can be inspected by humans, and that are therefore to some extent

understandable. This is useful for comparison to a rule-based approach such as

OPTIDE, as the experts want to be able to decide whether they should trust such a

model. Other learning methods such as support vector machines or neural nets pro-

duce largely if not completely opaque models, which can be judged only by their

input/output behavior.

Classification performance can be improved by combining multiple classifiers that

were trained using different algorithms, features, sections of the data, etc. One such

strategy is called boosting. In boosting, instead of learning a single decision tree, we

learn multiple trees on different subsets of the training data. An algorithm such as

AdaBoost [4] (for Adaptive Boosting) then learns the “best” weights for combining

the results of those individual decision trees into an overall boosted decision tree. For

our currently best-performing classifier (Model 58), boosting improves performance

on a boarding tradeoff task (described below) by about 25%.
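
As an illustration only (this is not the Weka-based RIPTIDE implementation), the following minimal Python sketch shows the reweighting form of AdaBoost, with one-feature threshold rules ("stumps") standing in for full decision trees. The toy data, the function names, and the use of stumps and reweighting are assumptions made for this sketch.

```python
import math

def stump_predict(stump, row):
    """Predict +1 (violation) or -1 (no violation) from one feature threshold."""
    j, thr, sign = stump
    return 1 if sign * (row[j] - thr) >= 0 else -1

def train_stump(X, y, w):
    """Return (weighted_error, stump) for the best single-feature threshold rule."""
    best_err, best_stump = None, None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                stump = (j, thr, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost(X, y, rounds=10):
    """Learn a weighted combination of stumps; labels y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified instances, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_score(ensemble, row):
    """Weighted vote; larger values mean a more confident 'violation'."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)

def ensemble_predict(ensemble, row):
    return 1 if ensemble_score(ensemble, row) >= 0 else -1

# Toy data (hypothetical): one feature, violations (+1) at both extremes.
X_toy = [[v] for v in range(1, 10)]
y_toy = [1, 1, -1, -1, -1, -1, 1, 1, 1]
model = adaboost(X_toy, y_toy, rounds=3)
ranked = sorted(X_toy, key=lambda row: ensemble_score(model, row), reverse=True)
```

On this toy set no single stump separates the two classes, whereas the three-round ensemble classifies every training instance correctly. The weighted vote can also be used directly as a ranking score.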

Some 10,000 boarding activities from 2002 to the end of 2011 were used as train-

ing data and a set of about 1000 boardings in 2012 was used as a held-out test set to

evaluate the models. To use a classifier such as RIPTIDE, one must set a threshold,

which we can estimate from the training data. If the estimated probability of finding a

violation is above the threshold, we recommend boarding a vessel; otherwise, not. Let

TP be the number of true positives, that is, cases where the score is above threshold,

and the boarding in fact found a violation; the remaining cases where the classifier

says “board” are the false positives FP. Standard measures of effectiveness (MOEs)

for classifiers are recall R (the fraction of vessels having some violation that are flagged for boarding), precision P = TP/(TP+FP) (the fraction of recommended boardings that actually find a violation), and their harmonic mean, known as the F1 value: F1 = 2*P*R/(P+R). Picking

a low probability threshold will give high recall but low precision; conversely, a high

threshold will give high precision but low recall. Every choice represents a tradeoff

between TP and FP, and what is acceptable depends on external factors such as task

objectives and resources. Using a generic rule such as maximizing R or F1 value will

generally not give the best compromise in practical applications.
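
The threshold tradeoff can be made concrete with a short Python sketch. The scores and outcomes below are invented for illustration, and the function name is an assumption of this sketch.

```python
def precision_recall_f1(scores, labels, threshold):
    """Compute P, R and F1 when boarding is recommended for score > threshold.

    labels[i] is 1 if boarding i found a violation, else 0.
    """
    flagged = [s > threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical estimated probabilities and observed outcomes.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 1, 0]

for thr in (0.2, 0.5, 0.85):
    p, r, f1 = precision_recall_f1(toy_scores, toy_labels, thr)
    print(f"threshold={thr:.2f}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

Lowering the threshold from 0.85 to 0.2 on this toy data raises recall from 1/3 to 1 while precision drops from 1 to 0.6.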

The best way to compare classifiers without setting a threshold is to plot ROC (Re-

ceiver Operating Characteristic) curves. An ROC curve shows the true positive rate

(or recall) plotted against the false-positive rate, that is, the ratio of false positives to

the number of non-violating vessels, for each possible threshold point. The curve

shows a tradeoff space showing how many more false positives one must accept to

get additional true positives.

We can use the area under the ROC curve to compare different classifiers; a higher

area under the curve generally means a better classifier. Figure 1 shows a comparison

of ROC curves for OPTIDE and Model 58 for the held-out test data covering the year

2012. Both models have more or less identical area under the curve (AUC) of about

0.65. This shows that they are doing better than random choice (the dotted line with

an AUC of 0.5), but not very much so, indicating that there is not a very strong signal

in the data to begin with. Model 58 is doing noticeably better at picking up the

higher yield boardings (the bump at the beginning of the curve), but it loses that ad-

vantage towards lower-risk boardings. It also is much more fine-grained than

OPTIDE, a feature we will explore in more detail below.
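
For readers who wish to reproduce this kind of comparison, the sketch below computes ROC points and the area under the curve by the trapezoid rule. The data are invented, and the function names are assumptions of this sketch. A coarse score that ties many vessels produces long straight segments.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoid rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical fine-grained scores versus a coarse score that ties every vessel.
labels = [1, 0, 1, 0, 1, 0]
fine = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
coarse = [1, 1, 1, 1, 1, 1]

print(auc(roc_points(fine, labels)))
print(auc(roc_points(coarse, labels)))
```

When every vessel receives the same score, the curve reduces to the diagonal and the area is 0.5, the random-choice baseline.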

In the current formulation of OPTIDE, for most values of the score, the yield distribution is very flat, which can be seen in the long straight sections of the OPTIDE ROC

curve. About 84% of all boardings fall in a very narrow band of yield close to the

threshold level. This means a large number of ships are apparently indistinguishable.

Our analysis of the data suggests that there are no standout “red flags” that positively

indicate that a ship might be in violation of some regulation. Even among vessels

having the highest risk score, only one third of boardings yield a violation. This

means we cannot assign a strong meaning to any of the OPTIDE risk categories.

Instead of focusing on absolute risk scores with a global interpretation, we explore

an alternative MOE: How well can a model select among a small set of alternative

vessels? For example, a set of ships may be encountered more or less simultaneously,

calling for an informed decision as to which ships to board, given available time and

resources. Technically, this calls for ranking the boats in the small candidate set.

Fig. 1. ROC curves for OPTIDE and Model 58 for the held-out test data in 2012. Model 58 is a

weighted combination of 20 different tree models, found using AdaBoost.

To evaluate ranking performance we consider the following MOE. Given a test set

of boarding activities such as the 2012 held-out set, we randomly pick a set (or buck-

et) of size k and rank the elements in the bucket according to our model. We then pick

the top-ranked boarding activity in the bucket (choosing randomly in case of ties) and

test whether it actually had a violation or not. We repeat this experiment many times

and compute the fraction of trials in which we picked a winner (i.e., a boarding with a

violation). The probability of picking a winner is strongly dependent on the bucket

size, since smaller buckets have a smaller chance of containing a vessel with a very

high score. For example, for the held-out set of 1002 boardings of which 14% yielded

a violation, the probability that a random set of two boardings contains at least one

with a violation is about 26%, for 5 it is 53%, for 10 it is 78% and for 20 it is 95%

(almost certain). Note that this high probability does not mean that it is easier to find

one with a violation; that aspect still requires a good ranking function to find the best

item in the bucket. Since all of our analysis is based on data collected under historical

boarding policies and, more recently, OPTIDE, the practical implications of the findings in this section remain to be worked out in future work, which our USCG partners are currently undertaking.
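
The ranking experiment can be sketched in a few lines of Python. The bucket probabilities quoted above for k = 2, 5 and 10 are reproduced by 1 − (1 − 0.14)^k, which treats the boardings in a bucket as independent draws (an approximation). The function names, the toy data, and the default number of trials are assumptions of this sketch.

```python
import random

def prob_bucket_has_violation(rate, k):
    """Chance that a random bucket of size k holds at least one violation,
    treating the k boardings as independent draws with the given rate."""
    return 1 - (1 - rate) ** k

def pick_a_winner_rate(scores, labels, k, trials=5000, seed=0):
    """Fraction of random size-k buckets whose top-ranked boarding had a violation."""
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    wins = 0
    for _ in range(trials):
        bucket = rng.sample(indices, k)
        best = max(scores[i] for i in bucket)
        top = [i for i in bucket if scores[i] == best]
        wins += labels[rng.choice(top)]    # break ties at random
    return wins / trials

# Hypothetical held-out set: 100 boardings, 14 with a violation.
toy_labels = [1] * 14 + [0] * 86
uninformative = [0.0] * 100                # every boarding ties
oracle = list(toy_labels)                  # a perfect ranking, for comparison

for k in (2, 5, 10):
    print(k, round(prob_bucket_has_violation(0.14, k), 2),
          pick_a_winner_rate(uninformative, toy_labels, k),
          pick_a_winner_rate(oracle, toy_labels, k))
```

An uninformative score picks a winner at roughly the base rate whatever the bucket size, while a perfect ranking picks a winner whenever the bucket contains one. Real models fall between these two extremes.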

Table 1 shows the results of these experiments. It compares our currently best

model, Model 58, to OPTIDE and two other models. Model 58 includes features not

used in OPTIDE, such as distance to coast and vessel subtype. An alternative model

(Model 57) omits a feature (distance to coast), and a third model (Model 48) adds a feature called observed activity. The top of Table 1 shows standard

AUC and Max-F1 metrics, and all models perform fairly similarly. In the lower portion, we show results on ranking experiments with bucket sizes ranging from 5 to 50.

We find that our best model improves by 76% over OPTIDE for a bucket size of

20, where we have an almost 45% chance to pick a winner, and even for a more real-

istic bucket size of 10, the improvement is still a good 38%. This shows that the ap-

parently small advantage of RIPTIDE at higher levels of yield can become a substan-

tial improvement if it is possible to batch the candidate vessels and choose the most

likely one to board.

Table 1. Evaluation results for OPTIDE and several alternate models

Metric        Random   OPTIDE   Model 57   Model 48   Model 58   58 vs. OPTIDE
N-Thresh      -        15       135        191        206        -
Max-F1        -        0.301    0.300      0.310      0.328      +9.0%
AUC           -        0.648    0.626      0.656      0.646      -0.3%

Choose 1 of k
Bucket Size   Random   OPTIDE   Model 57   Model 48   Model 58   58 vs. OPTIDE
5             0.135    0.210    0.217      0.236      0.243      +15.9%
10            0.135    0.237    0.279      0.311      0.328      +38.5%
15            0.135    0.244    0.328      0.364      0.393      +60.9%
20            0.135    0.251    0.363      0.403      0.443      +76.4%
25            0.134    0.261    0.399      0.440      0.484      +85.1%
30            0.135    0.276    0.422      0.466      0.516      +86.8%
35            0.135    0.290    0.447      0.488      0.542      +86.6%
40            0.134    0.307    0.464      0.505      0.567      +84.7%
50            0.137    0.336    0.492      0.542      0.601      +78.9%

We have developed a small RIPTIDE software suite that can be used to classify

and rank potential boardings based on the best models found so far, and to retrain

models if necessary. RIPTIDE builds upon the Weka toolkit [5] and adds a number of

methods for data translation and various other tasks. RIPTIDE is purely Java based

and can be run on Linux, MacOS and Windows platforms.

Using the RIPTIDE approach in practice will require the users to retrain the ma-

chine learning models at regular intervals, perhaps on a yearly basis, to ensure that

significant changes in behavior are incorporated. This would be an uncomplicated

task, as long as the basic set of features to consider remains the same or similar. An experimental implementation of RIPTIDE is underway at the USCG.

3 DE-OPTIDE

In this section, we describe an alternative approach that uses statistical regression methods and the historical data to derive alternative weights for the same features

used in OPTIDE. Based on this approach, a new decision rule was developed, called

Data-Enhanced OPTIDE (DE-OPTIDE). We compare its performance with the origi-

nal OPTIDE rule.

An underlying assumption of OPTIDE is that the probability of a violation is related to an underlying score that is a weighted sum of some predictor variables X1, X2, …, Xn (i.e., features used in the OPTIDE rule). The decision is made to board if the score

exceeds a threshold. This assumption, plus potential random errors, leads us directly

to a statistical model called a logistic regression model (see [6]). Logistic regression is

an instance of a generalized linear model [1, 8]. It allows one to analyze and predict a

discrete outcome (known as a response variable), such as group membership, from a

set of variables (known as predictor variables) that may be continuous, discrete, di-

chotomous, or a mix of any of these. Generally, the response variable is dichotomous,

such as presence/absence or success/failure. In our case the response variable is the

violation indicator (presence/absence) of a vessel.

When sample data from such a model are available, we can perform a statistical

analysis to estimate the unknown coefficients and thus estimate the relationship be-

tween the response and predictor variables. We can then use the logistic regression

model to predict the category to which new individual cases are likely to belong.

We assume a violation is related to an underlying latent score S, which is a weighted sum of some predictor variables plus a random error, i.e., S = W1X1 + W2X2 + … + WnXn + error, where the Ws are weights describing the contributions of the features and the random “error” has mean 0 and variance σ². As with the tree-based rules, if the score of a vessel exceeds a certain threshold value, the vessel should be boarded. Mathematically, these assumptions lead to a quantal response model [3, 10]: a logistically distributed error gives the aforementioned logistic regression, while a normally distributed error gives the closely related probit model. We used logistic regression and the data

set available to us to estimate the coefficients W1, W2, …, Wn and we then used these

weights to create a new decision rule. Since the new decision rule uses the same fea-

tures as in the original OPTIDE rule but their weights are determined by the historical

data, we call the new rule a Data-Enhanced OPTIDE (DE-OPTIDE) rule.
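
To make the estimation step concrete, the following Python sketch fits the weights of a logistic regression by plain gradient ascent on the log-likelihood. It is a minimal illustration under stated assumptions, not the statistical software used for DE-OPTIDE: the single binary feature, the toy counts, the intercept term W0, and the learning-rate settings are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Gradient ascent on the average log-likelihood; returns [W0, W1, ..., Wn]."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)            # w[0] is the intercept W0
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))
            grad[0] += yi - p
            for j, xj in enumerate(row, start=1):
                grad[j] += (yi - p) * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def violation_probability(w, row):
    """Estimated probability of a violation for one feature vector."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

# Toy data (hypothetical): one binary feature, e.g. "has a prior violation".
# Without the feature 2 of 10 boardings found a violation; with it, 5 of 10.
X_toy = [[0]] * 10 + [[1]] * 10
y_toy = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5
weights = fit_logistic(X_toy, y_toy)
print(weights, violation_probability(weights, [1]))
```

With a single categorical feature, the fitted probabilities simply recover the observed violation rates in each category. The value of the regression comes from combining several features additively on the log-odds scale.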

We note that in the original OPTIDE matrix, all of the features are categorical.

Although some of them are naturally continuous, they are categorized or binned for

the analysis, which may cause some loss of information. We therefore performed an

additional analysis using the same set of features, but retaining continuous values for

some of the features. Using the continuous versions does somewhat improve the per-

formance of the DE-OPTIDE rule. In treating the features as continuous, we em-

ployed standard imputation techniques for missing data.

In our analysis, we randomly split the entire boarding data set available to us into

two subsets: 50% used for training and 50% used for validating. We fit the logistic

regression model to the training data and used the estimated probabilities to determine

a new decision rule. Then we applied the new rule to the remaining 50% of data to

assess its effectiveness. In the new decision rule, the threshold for boarding was cho-

sen by either setting a required percentage of vessels to be boarded, or setting a target

boarding efficiency. To control variation caused by the random 50-50 splitting, the

calculations were repeated 10 times. Therefore, the results we describe do not corre-

spond to a single unique boarding rule.
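
The evaluation protocol can be sketched as follows. The sketch assumes each record is a pair (feature tuple, violation indicator) and that `fit` is any procedure returning a scoring function. The category-rate scorer shown is a hypothetical stand-in for the fitted regression, and all names are assumptions of this sketch.

```python
import random

def boarding_efficiency(scores, labels, fraction):
    """BE when only the top `fraction` of records (by score) are boarded."""
    n_board = max(1, round(fraction * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:n_board]) / n_board

def fit_category_rates(train):
    """Hypothetical scorer: empirical violation rate of each feature tuple."""
    totals, hits = {}, {}
    for x, y in train:
        totals[x] = totals.get(x, 0) + 1
        hits[x] = hits.get(x, 0) + y
    overall = sum(hits.values()) / max(1, len(train))
    return lambda x: hits[x] / totals[x] if x in totals else overall

def repeated_split_efficiency(records, fit, fraction, repeats=10, seed=0):
    """Average test-set BE over repeated random 50-50 train/test splits."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        shuffled = list(records)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        score = fit(train)
        scores = [score(x) for x, _ in test]
        labels = [y for _, y in test]
        results.append(boarding_efficiency(scores, labels, fraction))
    return sum(results) / len(results)

# Toy records (hypothetical): the single feature fully determines the outcome.
toy_records = [((1,), 1)] * 40 + [((0,), 0)] * 60
print(repeated_split_efficiency(toy_records, fit_category_rates, 0.10))
print(repeated_split_efficiency(toy_records, fit_category_rates, 1.00))
```

Setting `fraction` corresponds to fixing the required percentage of vessels to be boarded. A target boarding efficiency could be handled by scanning fractions until the target is met.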

Starting with just categorical data, we explored the relationship between the Board-

ing Efficiency BE and the percentage of recorded boardings (that is, the fraction of all

records in the data set for which boarding is recommended, at a given threshold).

Results are shown in Figure 2. When applied to the data that was not used to train the

model, DE-OPTIDE yields a somewhat higher or similar BE compared to OPTIDE

for almost the entire range of recorded boarding percentages. For DE-OPTIDE, effi-

ciency ranges from 20% to 35%, and setting the threshold to reduce the number of

boardings yields higher efficiency. This is because the rule ranks vessels by their

probability of yielding violations. Therefore, when fewer are boarded, the average

chance of finding a violation is higher. In choosing the threshold for the decision rule

one may need to take into account not just efficiency but also the fraction of recorded

boardings.

Fig. 2. Boarding Efficiency vs. percentage of recorded boardings using both OPTIDE and DE-

OPTIDE for different thresholds (test data 50%). The results are based on 10 repetitions of the

random selection of training data.

We also compared the efficiency of DE-OPTIDE with that of OPTIDE using another MOE. The threshold for DE-OPTIDE was chosen by examining the efficiency of the procedure over different percentages of recorded boardings. We found that the efficiency of DE-OPTIDE begins to rise once the percentage of recorded boardings falls below 10%. Thus, we chose the threshold corresponding to 10% of recorded boardings for DE-OPTIDE.

We also explored an alternative way of selecting the threshold for OPTIDE, i.e., letting the threshold correspond to 10% of the recorded boardings (RBs), as we did with DE-OPTIDE. We found that the efficiency of the DE-OPTIDE procedure reaches 32%, compared to 24% for OPTIDE when using an adjusted threshold (adjusted because our data omit values for some of the OPTIDE features) and 27% for OPTIDE with the threshold corresponding to 10% of RBs. We recognize that the USCG

would not cut boardings to one tenth of the current level. However, some combination

of this rule in a randomized or mixed strategy for boarding might be effective. Note

that selecting vessels for boarding purely at random yields only 16% efficiency.

Figure 3 presents the ROC curves for both the OPTIDE and DE-OPTIDE rules.

This plot helps to illustrate the performance of these two decision rules as the thresh-

old is varied over the entire range of possible values. The ROC curve for OPTIDE has

an area of 0.576 under the curve, while that for DE-OPTIDE has AUC = 0.605.

Again, this indicates that the DE-OPTIDE rule is somewhat better than the OPTIDE

rule. These plots are based on a single random selection of the training data. Plots

from nine other repetitions are similar.

Fig. 3. The ROC curves for both the OPTIDE and DE-OPTIDE rule for various choices of

thresholds (test data =50%). The plots are each based on a single run. Plots for 9 other runs

show the points for DE-OPTIDE lying almost always above those for OPTIDE itself.

Next we used logistic regression treating certain features as continuous. We com-

puted the relationship of the BE to the percentage of recorded boardings under the

modified DE-OPTIDE rule using some continuous features, a rule we call DE-

OPTIDE-C. DE-OPTIDE-C achieves better efficiency than OPTIDE. For OPTIDE,

efficiency ranges from 20% to 30%. For DE-OPTIDE-C efficiency rises to almost

35% at levels below 10% of recorded boardings. As with the discussion of batching in

Section 2, it is not known whether the set of candidates could be expanded enough for

such a low fraction of sightings to yield an acceptable number of boardings.

We also compared the efficiency of DE-OPTIDE-C to that of OPTIDE using alter-

native ways of setting the threshold. The efficiency of the DE-OPTIDE-C procedure

reaches 34%, compared to 32% for DE-OPTIDE.

4 Other Approaches

In this section we consider other MOEs, e.g., violations per hour of enforcement

activity rather than violations per boarding. We also mention alternative decision

strategies: random strategies; changing the number of patrol boats based on factors

such as weather, season, or economics; and varying the protocols for finding

candidates for boarding.

4.1 Other Ways of Measuring Effectiveness

The models discussed so far consider all violations to be equally important. From the

perspective of deterrence, this is plausible. But in terms of economic impact on

fisheries and lives saved it may be more appropriate to group violations into classes

$i = 1, 2, \ldots, I$ and seek to maximize the sum $\sum_{i=1}^{I} w_i x_i$, where $x_i$ is the number of violations in class $i$ and $w_i$ is the weight assigned to that class. For this to be meaningful the weights must be defined on an interval or ratio

scale, and not be simply ordinal [12,13].

The “denominator” in the MOE has been “boardings.” Alternatively, we may want

to measure effectiveness against time. Time is spent both in boarding and in seeking

the next candidate. The choice of which to use will lead to different decisions.

Suppose (based on the scoring rule) Vessel A has estimated 12% yield (probability a

violation will be found) and the predicted time for the boarding is 4 hours. Vessel B

has 15% yield and predicted boarding time 6 hours. If efficiency is violations per

boarding (VPB), Vessel A has 0.12 VPB, and Vessel B has 0.15 VPB. We prefer to

board Vessel B. If efficiency is violations per hour (VPH), then Vessel A has

0.12/4 = 0.03 VPH, and Vessel B has 0.15/6 = 0.025 VPH. So we prefer to board Vessel

A. In fact, boarding time varies randomly, according to some rule that could be

estimated from data. One might also include in the denominator time spent seeking

the next candidate.
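
The comparison of Vessel A and Vessel B can be written out as a small Python sketch. The dictionary layout and function names are assumptions of this sketch, and boarding time is treated as known rather than random.

```python
# (estimated yield, predicted boarding time in hours) from the example above.
vessels = {"A": (0.12, 4.0), "B": (0.15, 6.0)}

def violations_per_boarding(yield_prob, hours):
    """Expected violations per boarding (VPB): just the estimated yield."""
    return yield_prob

def violations_per_hour(yield_prob, hours):
    """Expected violations per hour of boarding time (VPH)."""
    return yield_prob / hours

def preferred(candidates, measure):
    """Name of the vessel that maximizes the chosen measure of effectiveness."""
    return max(candidates, key=lambda name: measure(*candidates[name]))

print(preferred(vessels, violations_per_boarding))
print(preferred(vessels, violations_per_hour))
```

Adding search time to the denominator would only require replacing `hours` with boarding time plus the expected time to reach the next candidate.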

4.2 Other Kinds of Enforcement Strategies

The OPTIDE-class rules discussed here are deterministic. Randomized strategies

make it harder for intentional violators to anticipate boardings. The variation in goals discussed in Section 4.1 might be incorporated into a randomized mixture: e.g., use OPTIDE 30% of the time, VPB 40% of the time, and VPH 30% of the time.
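
A randomized mixture of this kind is simple to implement. The sketch below draws which rule to apply for each decision, using the example proportions from the text. The rule names are labels only, and how each rule scores a vessel is left outside the sketch.

```python
import random

# Example mixture from the text: probability of using each decision rule.
MIXTURE = {"OPTIDE": 0.30, "VPB": 0.40, "VPH": 0.30}

def choose_rule(rng, mixture):
    """Draw the rule to apply to the next boarding decision."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)                     # seeded only to make the demo repeatable
draws = [choose_rule(rng, MIXTURE) for _ in range(20000)]
for name in MIXTURE:
    print(name, draws.count(name) / len(draws))
```

In operational use the generator would not be seeded, since predictability is exactly what the randomization is meant to remove.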

We can model the boarding decision as a choice between boarding and seeking

further targets. For simplicity we suppose that a patrol boat meets a fishing vessel

every T minutes, and must immediately decide whether to board it. That the decision

to board must be made immediately is based on observations from [7] that fishermen

can and do modify their behavior when they observe Coast Guard boats, seeking to

limit the violations found if boarded. One boat every T minutes is a simplifying model

of the random rate at which a patrol will encounter fishing vessels.

Suppose the yield p varies uniformly from 0 to 1. Suppose boarding takes time tT.

What value of p should be the threshold for boarding? It can be shown that under

certain assumptions, the optimal choice is

$$p = \frac{(2+2t) - \sqrt{(2+2t)^2 - 4t^2}}{2t}$$

As boarding time tT increases, the threshold yield p increases. This confirms the

intuition that the longer boarding takes, the pickier one must be in boarding. More

realistic models for T, t, and the distribution of p can be developed from log data.
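
The threshold can be checked numerically. Reading the closed form as p = ((2+2t) − √((2+2t)² − 4t²)) / (2t), it is the smaller root of t·p² − (2+2t)·p + t = 0. One set of assumptions that leads to this equation (our reading, not spelled out in the text) is that the patrol maximizes expected violations per unit time, where each encounter costs time T and each boarding an additional tT. The sketch below takes T = 1 and compares the closed form with a grid search under that assumed objective.

```python
import math

def boarding_threshold(t):
    """Closed-form threshold yield when a boarding takes t encounter intervals."""
    a = 2 + 2 * t
    return (a - math.sqrt(a * a - 4 * t * t)) / (2 * t)

def violation_rate(p_star, t):
    """Assumed objective: expected violations per unit time (T = 1) when
    boarding every vessel whose yield, uniform on [0, 1], is at least p_star."""
    expected_violations = (1 - p_star ** 2) / 2   # integral of p from p_star to 1
    expected_time = 1 + t * (1 - p_star)          # one encounter plus boarding time
    return expected_violations / expected_time

grid = [i / 1000 for i in range(1001)]
for t in (0.5, 1, 4):
    best = max(grid, key=lambda p: violation_rate(p, t))
    print(t, round(boarding_threshold(t), 4), best)
```

Under these assumptions a boarding that takes four encounter intervals (t = 4) gives a threshold of exactly 0.5: only vessels in the upper half of the yield distribution are worth boarding.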

Finally, we considered patrol strategies, using analogies to ecology where the

limiting resource is the energy available to predators [11]. In particular, we have

compared pure pursuers and pure searchers. The former expend little or no energy in

seeking food; they wait until sufficiently valuable prey (sufficiently risky vessel) is in

sight and then act (e.g., Anolis lizards). Pure searchers (e.g., warblers) spend time and

energy prowling to seek food; when they sight it they decide whether to try to catch it

and in that case spend little time on pursuit. We studied when a pure searcher should

adopt the patient strategy of waiting for the “best” type of food (vessel with highest

risk score) or the impatient strategy of waiting for a while for the “best” type of food

and then choosing what is available.

4.3 Bringing in Other Goals of Fisheries Law Enforcement

In addition to efficiency of boardings, fisheries law enforcement seeks other goals:

balanced deterrence, balanced policing, and balanced maintenance of safe operations.

To balance deterrence, the USCG might seek to board all vessels at least once a year.

This would require, at times, boarding a low yield vessel. When should this be done?

Should the rule depend on recent prior boardings? Suppose Vessel A has an estimated

yield of 13% and has been boarded twice in the past year while Vessel B has a 15%

yield and has been boarded six times in the past year. In some cases we might prefer

to board A rather than B. We might want to board neither, and wait for some boat that

has not been examined in two years.

We have developed a simple model representing a tradeoff between balance and

yield. The score is based on three parameters, y(v) = the yield assigned to Vessel v,

D(v) = days since Vessel v was last boarded, and α, a model parameter. The modified

score is S(v) = y(v) + αD(v). The probability y(v) depends on an initial class probabil-

ity for that boat and on its boarding history. The class probability reflects differences

that affect the probability of violation. Explicitly, we take y for a vessel with b past

boardings and u “successful” past boardings to be y = f(b,u) + 0.05Z, where Z is uniformly distributed between −1 and 1, and f(b,u) is presumed to come from observed

data.

We ran simulations of this model, with five candidates per day, selected uniformly

at random from the 100 vessels having the highest score at the start of the day. We do

not simply take the five with highest scores because they might not all be accessible:

the patrol might stay in a particular area and not all boats are fishing each day.

Running the model 20 times for 1095 simulated days (3 years), and for each α

between 0.0001 and 0.001 (incrementing α by 0.0001), we found the average output. A

scatter plot comparing average number of observed violations over the entire 3-year

period to average number of vessels boarded in the last year of the simulation can

offer predictions on what the outcome might be under different scoring rubrics. Future

work will consider more general scoring metrics.
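
A simulation of this kind can be sketched as follows. Several ingredients are assumptions made only for this sketch and are labeled in the code: the form of f(b,u), each vessel's hidden violation propensity, the fleet size, the convention that every vessel was last boarded on day 0, and the reading that all five selected candidates are boarded each day.

```python
import random

def f(b, u):
    """Hypothetical stand-in for the data-derived yield f(b, u):
    a smoothed success rate that starts near 14% for an unboarded vessel."""
    return (u + 1) / (b + 7)

def simulate(alpha, n_vessels=300, days=1095, per_day=5, pool=100, seed=0):
    """Return (violations found, distinct vessels boarded in the final 365 days)."""
    rng = random.Random(seed)
    # Assumption: each vessel has a fixed hidden probability of being in violation.
    propensity = [rng.uniform(0.05, 0.25) for _ in range(n_vessels)]
    boardings = [0] * n_vessels
    successes = [0] * n_vessels
    last_boarded = [0] * n_vessels         # assumption: all boarded on day 0
    violations = 0
    recent = set()
    for day in range(1, days + 1):
        scored = []
        for v in range(n_vessels):
            y = f(boardings[v], successes[v]) + 0.05 * rng.uniform(-1, 1)
            scored.append((y + alpha * (day - last_boarded[v]), v))
        top = [v for _, v in sorted(scored, reverse=True)[:pool]]
        for v in rng.sample(top, per_day):  # candidates drawn from the top pool
            found = rng.random() < propensity[v]
            boardings[v] += 1
            successes[v] += found
            violations += found
            last_boarded[v] = day
            if day > days - 365:
                recent.add(v)
    return violations, len(recent)

for alpha in (0.0001, 0.0005, 0.001):
    print(alpha, simulate(alpha))
```

Averaging the two outputs over repeated runs with different seeds, for each value of α, gives the points of the scatter plot described above.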

5 Conclusions

Our analysis supports several conclusions. First, the existing OPTIDE approach extracts a nearly optimal rule based on the data that are used in it: the ROC curves produced by state-of-the-art techniques for learning rules are only somewhat above the curve

for the existing OPTIDE rule. If the number of vessels considered could be increased,

operation at a higher threshold for boarding would likely result in discovering a larger

absolute number of violations per year, contributing to both fishery management and

safety goals. Second, automated methods, as described in this paper, can be used to

extract optimal rules by analysts who have no subject area expertise in this domain.

Indeed, such methods can find decision rules that perform as well as, or somewhat

better than, models that require substantial knowledge of the data and domain expertise to develop. This means that as the USCG considers adding variables to the rules that trigger boardings, the automated methods used here can assess, in advance, the effectiveness of using those additional data. All that is required is to develop

a data set in which the values of those new variables are reported along with the exist-

ing key variables and the results of the boarding. Finally, we have identified ways in

which the objectives of the scoring rule work can be made more complex and closer

to the operational realities of the USCG. Preliminary theoretical work has produced

simple models showing how to include those realities in the computation of the more

sophisticated yield representing complex goals of fisheries law enforcement.

We presented the results described here to USCG D1 in a briefing to the highest-

level Coast Guard leadership. The results were very well received and are in the pro-

cess of being implemented in USCG D1. In addition, the USCG Research and Devel-

opment Center is working with D1 to explore modifications in the methods that would

make them applicable to other Coast Guard districts around the country.

References

1. Agresti, A.: Categorical Data Analysis. Wiley-Interscience, New York (2002)
2. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2007)
3. Finney, D.J.: Probit Analysis (3rd edition). Cambridge University Press, Cambridge, UK (1971)
4. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Thirteenth International Conference on Machine Learning, pp. 148-156, San Francisco (1996)
5. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explorations, vol. 11, issue 1 (2009)
6. Hilbe, J.M.: Logistic Regression Models. Chapman & Hall/CRC Press, London (2009)
7. King, D.M., Porter, R.D., Price, E.W.: Reassessing the value of U.S. Coast Guard at-sea fishery enforcement. Ocean Development & International Law, vol. 40, pp. 350-372. Taylor and Francis, London (2009)
8. McCullagh, P., Nelder, J.A.: Generalized Linear Models (Second Edition). Chapman and Hall, London (1989)
9. Mitchell, T.: Machine Learning. McGraw Hill, New York (1997)
10. Morgan, B.J.T.: Analysis of Quantal Response Data. Chapman and Hall, London (1992)
11. Roberts, F.S., Marcus-Roberts, H.: Efficiency of energy use in obtaining food II: Animals. In: Marcus-Roberts, H., Thompson, M. (eds.), Life Science Models, pp. 286-348. Springer-Verlag, New York (1983)
12. Roberts, F.S.: Limitations on conclusions using scales of measurement. In: Barnett, A., Pollock, S.M., Rothkopf, M.H. (eds.), Operations Research and the Public Sector, pp. 621-671. Elsevier, Amsterdam (1994)
13. Roberts, F.S.: Measurement Theory, with Applications to Decisionmaking, Utility, and the Social Sciences. Cambridge University Press, Cambridge, UK (2009)