Proceedings of the 2012 Winter Simulation Conference
C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and A. M. Uhrmacher, eds.
APPLYING A FRAMEWORK FOR HEALTHCARE INCENTIVES SIMULATION
Joseph P. Bigus
Ching-Hua Chen-Ritzo
Keith Hermiz
Gerald Tesauro
IBM T.J. Watson Research Center
1101 Kitchawan Road
Yorktown Heights, NY 10598
Robert Sorrentino
IBM T.J. Watson Research Center
19 Skyline Drive
Hawthorne, NY 10532
Abstract
At WinterSim 2011, we originally proposed an agent-based framework for health care simulations, enabling
flexible integration of multiple simulation models, including models of disease progression, effects of provider
interventions, and provider behavior models that are responsive to contractual incentives. In this paper, we
report results using our proposed framework to integrate two examples of provider behavior models, two
examples of disease models, and four examples of payment models. We explore multiple combinations of
these models and simulate the impact that alternative payment models may have on health and financial
outcomes. These examples test the robustness of the simulation framework, and illustrate the value of
such simulations to the policy makers who design incentives to improve cost and health outcomes, and to
providers who wish to evaluate the financial impact of proposed incentives on their practice.
1 INTRODUCTION
The health care industry is a complex system involving payers, providers, and patients. This system is
under tremendous pressure for change as consumers, employers, and government health policymakers try
to control rapidly increasing costs while simultaneously maintaining or improving health outcomes. A wide
variety of major structural changes have been proposed, potentially impacting payer revenue and costs,
provider revenue and treatment plans, and patient costs and health outcomes. Using the flexible simulation
framework and Eclipse-based development tools introduced in Bigus, Chen-Ritzo, and Sorrentino (2011),
we have built a set of parameterized health system models that can be configured by executive-level
decision-makers to ask and answer what-if questions in realistic scenarios pertinent to their business.
In this paper, we use the framework of Bigus, Chen-Ritzo, and Sorrentino (2011) to implement two
examples of provider behavior models, two examples of disease models, and four examples of payment
models. We use these models to simulate the impact that alternative payment models may have on health and
financial outcomes. Since our simulations are not based on real data, our objective is to use these examples
to test the robustness of the simulation framework and illustrate the potential value of such simulations
to health policy-makers and health providers. Health policy-makers can use such simulations to aid the
design of provider incentives to improve population cost and health outcomes. Meanwhile, providers may
use such simulations to evaluate the financial impact of proposed incentives on their practices.
A depiction of the basic interaction cycle between components in our simulator can be seen in Figure 1.
A population of Patients interacts with a health care Provider over a sequence of several decision cycles,
typically of one year duration. Based on observation of the patient types and health states, the Provider
will select from among various options for treatment of each Patient. The selected treatments will result
in claims being submitted to a Payer, which will then issue payment based on the payment model being
used. Note that the payment model may take into account health states and outcomes, along with the actual
Bigus, Chen-Ritzo, Hermiz, Tesauro and Sorrentino
treatment procedures, in computing amounts paid. Outcomes of treatments, along with recalculated health
states for the next decision cycle, are computed using a Disease Progression model, based on the current
health states and the selected treatments.
Figure 1: Basic interaction loop in simulated health care scenarios.
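The interaction cycle described above can be sketched as a simple loop. This is a minimal illustration only, not the framework's actual interface: the function names (run_cycle, choose_treatment, pay, progress, simulate), the two-state disease model, and all numeric values are assumptions introduced here.

```python
import random

def run_cycle(states, choose_treatment, pay, progress, rng):
    """One decision cycle: select treatments, bill the payer, advance health states."""
    treatments = [choose_treatment(s) for s in states]        # Provider decision
    payment = sum(pay(s, t) for s, t in zip(states, treatments))  # Payer reimbursement
    next_states = [progress(s, t, rng) for s, t in zip(states, treatments)]
    return next_states, payment

# Hypothetical toy models, for illustration only.
def choose_treatment(state):
    return "high" if state == "sick" else "low"

def pay(state, treatment):
    # Fee-for-service style payment with assumed fees.
    return 1000.0 if treatment == "high" else 500.0

def progress(state, treatment, rng):
    # Assumed transition probabilities of a toy Disease Progression model.
    p_sick = 0.1 if treatment == "high" else 0.3
    return "sick" if rng.random() < p_sick else "healthy"

def simulate(n_patients=10, n_years=15, seed=0):
    """Run the loop for a panel of patients over a number of one-year cycles."""
    rng = random.Random(seed)
    states = ["healthy"] * n_patients
    total_paid = 0.0
    for _ in range(n_years):
        states, paid = run_cycle(states, choose_treatment, pay, progress, rng)
        total_paid += paid
    return states, total_paid
```

Swapping in a different payment function or provider decision rule leaves the loop unchanged, which is the kind of modularity the framework aims for.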
The key assumption that we make in determining how payment models affect financial and health
outcomes is that the Provider selects treatments on the basis of optimizing a utility function that is specified
by the payment model and the Provider’s profit margin on the various treatment options. As such, the
Provider’s objective in our simulations is solely to maximize profitability, but non-financial objectives
such as concern for patient welfare could also be included in the utility function definition. We compare
two different optimization methodologies: a one-step “myopic” optimization, based on a known model
of immediate outcomes of treatments, and a Markov Decision Process (MDP) formulation that aims to
optimize cumulative utility over multiple time steps.
The rest of the paper is organized as follows: In §2, we motivate our work by presenting and describing
screen shots of the main simulation outputs that policymakers and healthcare providers can use. In §3 we
briefly describe the two versions of diabetes models that we utilize in our simulations. This is followed by
a section that describes the available provider interventions for each of the diabetes models. The different
forms of provider incentives used in our simulations are presented in §5. Sections 6 and 7 present the
algorithms used for simulating provider behavior, along with the corresponding simulation outputs. Finally,
we conclude this paper with a summary discussion.
2 MEASURING THE IMPACT OF INCENTIVES ON HEALTH AND COST OUTCOMES
In this section, we describe a decision-support tool for use by health care executives (Chief Medical Officers,
Chief Financial Officers), policy strategists, and government agencies to aid in understanding the impact of
proposed changes to the standard fee-for-service payment model.
Figure 2 shows a screen shot of a web-based user interface or dashboard to specify scenarios for
simulation and to explore the financial and health outcome results. The dashboard comprises four major
areas: on the left hand side, a set of panels containing widgets or user-interface controls for configuring
the scenario; on the right hand side, a set of tabs containing graphs or charts showing results from the
simulation; on the top, a set of buttons for controlling the simulation run; and on the bottom, a status field
that shows progress and information regarding the current simulation.
The configuration controls include five panels: business models, patients, providers, payers, and
simulation. The business models section allows the user to specify two major parameters in the simulation:
the type of Provider business organization and the clinical model used by primary care physicians. The
Patient parameters, as illustrated, include the patient population type (representative of a particular country
or locale, employer, or specific disease group), the number of patients in the population, and the specific
disease models to be used for disease progressions and treatment impacts on future health states. The
Provider section specifies behavior models used for learning decision-making models and policies, and the
number of providers. For Payers the single parameter specifies the reimbursement model to be used for
the selected Providers. Finally, the Simulation parameters include the forecast period (number of steps,
typically years) and the number of simulation runs to perform.
Figure 2: Executive dashboard for payment incentives analysis
The configuration parameters are then mapped to a single simulation model instance, which is loaded
and run on the web server (cloud computing) environment. When the set of runs is complete, summary
results charts are displayed in the right hand tabs. In the example shown in Figure 2, these include a set of
Payer Results, the patient population Health Outcomes, and the resulting Provider Results. Key performance
indicators for Payers would include revenue vs costs (profits), any risk-sharing payments to providers, the
medical loss ratio, etc. For Providers, the results would include revenue vs costs as well as outcome based
measures such as patient panel biomarkers (e.g. average Hemoglobin A1C). Patient Health Outcomes would
present summary data on the patient population health states and levels of complications (hospitalization,
non-elective emergency treatments, etc.). The Cognos Reports button allows custom business intelligence
reports to be run for further analysis.
The purpose of the dashboard interface is to bring the locus of simulation control to the end users,
instead of requiring specialized programming for each variant of the model being used in the simulation.
In this way, a health policy leader could use our platform to forecast the clinical and financial outcomes
of managing one or more segments of the eligible population, e.g., all the diabetics in a given geographic
region, by specifying appropriate characteristics of the health care ecosystem. Our objective is to provide
a degree of usability to our platform that is seldom seen in health care simulation.
3 DIABETES MODEL
We consider two disease models in our simulations, both of which are diabetes models. The first of these
models was created at the University of Michigan (Barhak, Isaman, Ye, and Lee 2010) and calibrated
against real-world clinical studies. The second model was created at IBM. This model is a simplistic
Markov model and has not been calibrated against any real-world data. While the calibrated model is
more realistic, it is also more complex and its dynamics are more difficult to control using a few input
parameters. While both disease models are Markov models, the model developed at the University of
Michigan includes continuous state variables such as HbA1c levels, which influence the progression of the
disease. Meanwhile, all state variables are discrete in the IBM model.
An illustration of the simple diabetes model is provided in Figure 3. We consider the state ‘Macular
Edema Entered’ to be a complication state. This means that whenever a patient enters this disease state,
it will trigger a series of mandatory provider interventions. In the simple disease model, the probability
of leaving this state and entering the ‘History of Macular Edema’ state is 1.0. The reader is referred to
Barhak, Isaman, Ye, and Lee (2010) for more details on the Michigan diabetes model, which includes
multiple disease pathways, as is typical in diabetes. We chose to experiment with
two different disease models in order to demonstrate that our simulation framework is flexible
enough to accommodate different instantiations of a disease model. The fact that one of these models was
extremely simple and easy to control made it much easier for us to develop and debug our code before
we attempted to run simulations with the more complex and realistic disease model.
Figure 3: Simplistic model of retinopathy pathway for diabetes
4 PROVIDER INTERVENTIONS
In this section, we describe the elective and non-elective provider intervention models used in our simulations.
We first describe the modeling of elective interventions. In general, the elective provider interventions
available are determined by the interventions that the disease model was designed to support. For example,
in the University of Michigan diabetes model, there were three interventions: Treatment with Diet and
Exercise, Treatment of Oral Medications, and Treatment with Insulin. These three treatments corresponded
to the treatments that were used in the real-world clinical study that the disease model was based on.
Meanwhile, for the simple diabetes model, we invented four elective treatment options, which represented
various combinations of two types of treatments, each of which could be implemented at two
different levels of intensity. The first type of treatment is medication management (MM), and the second
type of treatment is self-management training (SMT). We assume that each of these treatments is applied at
either of two levels (i.e., 1 = high and 0 = low) of intensity, simultaneously. Therefore, the four treatment
combinations are: MM-0/SMT-0, MM-1/SMT-0, MM-0/SMT-1, MM-1/SMT-1.
For both diabetes models, we assume that all treatments are feasible options in all patient states. The
transition probabilities that determine the progression of the disease are dependent on the treatment selected
by the provider. Since the Michigan diabetes model does not specify the reimbursable procedures that are
associated with each treatment, we implemented our own set of procedures corresponding to each treatment.
As described in Bigus, Chen-Ritzo, and Sorrentino (2011), the simulation framework allows every treatment
to be decomposed into a sequence of encounters (or visits) with a provider, and each encounter to be
decomposed into a collection of actions, where each action corresponds to a billable procedure that can be
claimed from the payer. In addition to specifying a reimbursement rate for each procedure, we also specify
a provider cost associated with each procedure.
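The treatment, encounter, and action hierarchy described above can be pictured as a small nested data structure. The sketch below is an assumption-laden illustration: the class names (Action, Encounter, Treatment), the helper methods, and the fees and costs are hypothetical and are not taken from the framework itself.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Action:
    """A billable procedure with a payer reimbursement rate and a provider cost."""
    name: str
    fee: float
    cost: float

@dataclass
class Encounter:
    """One visit with a provider, made up of billable actions."""
    actions: List[Action] = field(default_factory=list)

@dataclass
class Treatment:
    """A treatment decomposed into a sequence of encounters."""
    name: str
    encounters: List[Encounter] = field(default_factory=list)

    def total_fee(self) -> float:
        return sum(a.fee for e in self.encounters for a in e.actions)

    def total_cost(self) -> float:
        return sum(a.cost for e in self.encounters for a in e.actions)

# Hypothetical example: two identical visits, each with two procedures.
visit = Encounter([Action("office visit", 100.0, 60.0),
                   Action("HbA1c test", 50.0, 30.0)])
insulin = Treatment("Treatment with Insulin", [visit, visit])
margin = insulin.total_fee() - insulin.total_cost()  # provider profit on the treatment
```

Keeping the fee and cost at the action level means a payment model can reprice individual procedures without touching the treatment definitions.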
With respect to non-elective interventions, we assume that these interventions only occur in the disease
state ‘Macular Edema Entered’. This state exists in both disease models. The patient enters this state with
some probability, but once in this state, a non-elective treatment comprising a series of laser eye
treatments will be implemented by an ophthalmologist with probability 1.0. Unlike the elective interventions,
the non-elective interventions are mandatory treatments.
5 INCENTIVE MODELS
We consider three types of provider incentives. The first is a fee-for-service (FFS) incentive, where providers
are reimbursed a fixed amount from the payer for each procedure or service that they perform. Hence,
provider revenue is directly proportional to the number of procedures performed, and the profitability per
procedure. Under this scheme, there is no (financial) incentive for the provider to limit the number of
procedures performed. However, in practice, aside from the provider’s own code of ethics, providers face
capacity constraints, and there exist laws and other policies that may bound the number of procedures that
can be delivered to a single patient.
The second type of incentive is the global capitation (GC) incentive. In this case, providers are paid
a flat rate for each patient each month, regardless of the number of procedures or services that may be
delivered to the patient. This rate is called the capitation rate, and may vary by the patient’s general state
of health. Under this scheme, there is no (financial) incentive for the provider to provide any services to
the patient. However, again, there are practical reasons why providers reimbursed under this scheme still
provide care to their patients.
The third type of incentive, called the ‘shared risk’ incentive, is applied as a variation to the previous
two incentives. When shared risk incentives are applied, the provider is liable for a portion of the losses or
savings achieved, relative to a pre-defined threshold. We refer to this threshold as the ‘reference cost’. In a
fee-for-service with shared risk incentive scheme, the provider is credited in a fee-for-service fashion, with
the exception that a portion of any positive difference between the reference cost and the fee-for-service
credit (i.e., the savings) is added to the provider’s fee-for-service reimbursement as a bonus. The fraction of
the savings that the provider is eligible for is referred to as the ‘reward rate’. At the same time, a portion of
any negative difference between the reference cost and the fee-for-service credit (i.e., the excess spending)
is deducted from the provider’s fee-for-service reimbursement. The fraction of the excess spending that
the provider is liable for is referred to as the ‘penalty rate’.
Since the shared risk incentive may be included with the fee-for-service or global capitation incentive,
we present simulation results for provider responses under four incentive scenarios: 1) fee-for-service; 2)
fee-for-service with shared risk; 3) global capitation; 4) global capitation with shared risk.
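The fee-for-service with shared risk rule described in this section can be written as a short function. This is a sketch under stated assumptions: the function name and argument names are hypothetical, and the comparison is made between the reference cost and the fee-for-service credit exactly as worded above (the global capitation variant is not sketched, since the text does not spell out its cost basis).

```python
def ffs_shared_risk_payment(ffs_credit, reference_cost, reward_rate, penalty_rate):
    """Provider reimbursement under fee-for-service with shared risk.

    ffs_credit     -- total fee-for-service credit for the period
    reference_cost -- pre-defined threshold ('reference cost')
    reward_rate    -- fraction of savings paid to the provider as a bonus
    penalty_rate   -- fraction of excess spending deducted from the provider
    """
    difference = reference_cost - ffs_credit
    if difference >= 0:
        # Savings: a portion is added to the reimbursement as a bonus.
        return ffs_credit + reward_rate * difference
    # Excess spending: a portion is deducted (difference is negative here).
    return ffs_credit + penalty_rate * difference

# Hypothetical numbers: 1,500 under and 1,500 over a 7,500 reference cost.
under = ffs_shared_risk_payment(6000.0, 7500.0, reward_rate=0.5, penalty_rate=0.5)
over = ffs_shared_risk_payment(9000.0, 7500.0, reward_rate=0.5, penalty_rate=0.5)
```

Setting both rates to zero recovers plain fee-for-service, which is one way to see shared risk as a variation layered on the base incentive.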
6 THE MYOPIC PROVIDER
The myopic provider is a provider whose choice of treatment is driven by maximizing his utility in the
current decision period only. In our simulations, the length of a decision period is assumed to be one
year. Additionally, the myopic provider assumes that patients’ diseases evolve in a deterministic fashion.
For example, such a provider assumes that the rate at which a diabetic without retinopathy develops non-
proliferative retinopathy is fixed. We refer to the algorithm that the provider uses to select a treatment for
a patient as the ‘provider decision model’. In the case of the myopic provider, the provider decision model
is formulated as a mixed-integer linear program and is implemented using CPLEX. This formulation is
provided in the Appendix. In this section, we present simulation results for scenarios in which the provider
is myopic.
As mentioned, the simple diabetes model requires that each patient be subjected to one of four possible
treatments by their primary care provider. These treatments vary in terms of intensity. Higher intensity results
in higher costs and slower progression through the disease states. Since the myopic model maximizes
provider profit over each year (step), a provider with no capacity constraints under the fee-for-service
incentive model will apply the maximum intervention to each patient. This results in the highest provider
profit and, consequently, the lowest rate of complications requiring non-elective interventions.
Figure 4: Results for FFS+Shared Risk vs. reward rate, using myopic optimization of the provider’s
treatment policy.
The fees for preventive care treatments are driven by the medication management intensity. The fee for
the most aggressive treatment is roughly $1,000 while the fee for the least aggressive is approximately half of that. All the fee-for-service
simulations shown are based on a reference cost of $7,500 per patient per year and assume a provider
cost of service equal to 60% of the fees associated with the chosen treatments. The reference cost will support
half of the patient population at the highest intensity of medication management, provided there are no
complications. The following results represent an average over forty simulations of a single provider with
a panel of ten patients progressed over 15 years.
The addition of a shared reward component to the fee-for-service incentive causes the primary care
provider to trade off the profit from treatments with higher intensity against the potential share in a savings
pool. The provider is eligible for a reward if the total cost of patient treatment across her panel, both
elective and non-elective, comes in below the reference cost. The trade-off is not attractive for very low
shared reward rates, as seen in Figure 4. When the reward rate is below the provider’s profitability (40% of
fees), her incentive is to apply the maximum intensity intervention to each patient. Once the reward rate
reaches the provider’s profitability, she begins to reduce the intensity level of treatment on selected patients
in favor of sharing in the pooled savings. As she continues to reduce the level of intensity of preventive
care across her patient panel, complications and the cost of non-elective interventions rise. These costs
reduce the pool, but the provider’s profit continues to rise as the reward rate is increased. In the end, she
has managed the intensity of treatments so that half of her patients are at the highest level and half at the
lower. Note that the total payer cost is nearly constant.
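The trade-off between treatment profit and a share of the savings pool can be illustrated with a one-patient, one-period calculation. This is a brute-force sketch, not the mixed-integer program used in the paper: the function name, the per-patient reference cost, and all fees, probabilities, and costs are hypothetical.

```python
def myopic_choice(fees, p_complication, complication_cost,
                  margin, reward_rate, reference_cost):
    """Index of the treatment with the highest expected one-period provider profit.

    Profit is the margin on the treatment fee plus the provider's share of any
    expected savings relative to the reference cost (assumed per patient here).
    """
    def expected_profit(k):
        expected_total = fees[k] + p_complication[k] * complication_cost
        savings = max(0.0, reference_cost - expected_total)
        return margin * fees[k] + reward_rate * savings

    return max(range(len(fees)), key=expected_profit)

# Hypothetical inputs: a low (index 0) and a high (index 1) intensity treatment.
fees = [500.0, 1000.0]
p_complication = [0.10, 0.05]

low_reward = myopic_choice(fees, p_complication, 2000.0,
                           margin=0.4, reward_rate=0.3, reference_cost=1500.0)
high_reward = myopic_choice(fees, p_complication, 2000.0,
                            margin=0.4, reward_rate=0.7, reference_cost=1500.0)
```

With these invented numbers the provider picks the high-intensity treatment at the lower reward rate and the low-intensity treatment at the higher one, which is the qualitative switch described above. The exact switching point depends on how much complication cost erodes the savings.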
Figure 5 shows the results for a penalty based fee-for-service incentive model. In this scenario the
provider shares the cost of elective treatments that exceed the reference cost. The provider once again
continues to treat her entire patient panel at maximum intensity at the lower penalty rates. However, her
profitability is impacted immediately. If she treats her patient panel in a way that results in the cost
exceeding the reference, she recoups her 40% profit but now is liable for a penalty as a percent of the
overrun cost. As the penalty rate exceeds the profitability rate she loses money on those treatments that
exceed the reference cost and she changes her strategy. She aggressively cuts back on treatment intensity
across her patient panel until she is treating each patient at the minimal level. The number of complications
and their associated cost rise and finally level off.
Figure 5: Results for FFS+Shared Risk vs. penalty rate.
Figure 6: Results for FFS+Shared Risk vs. provider cost.
We can further examine the role that the provider’s profitability plays in their treatment policy. Figure 6
shows the prior fee-for-service with penalty incentive model with the penalty rate fixed at 60%. In this case
we vary the provider’s cost as a percent of elective fees. Providers with very low cost of service will be
largely immune to incentives, whereas providers with very high cost of service will be highly compliant.
However, that compliance saturates when all patients are being treated at the minimum level of intensity.
The incentive model can achieve compliance, cost savings, and adequate care without adversely affecting
the financial health of the provider only if the provider’s costs are not at the extreme highs or lows.
Figure 7: Results for GC+Shared Risk vs. penalty rate.
A carefully crafted global capitation incentive model can balance the interests of patient, provider and
payer. Beyond establishing a reference cost and a rate for shared risk, the payer also has to establish
appropriate flat rates for care for patients in various health states. These bundled payments are significant
levers in encouraging provider behavior and need to consider downstream effects on patients’ health. Figure 7
shows the results of simulations under a global capitation incentive model with penalty. The reference cost
was set to $10,000 for these runs. As the penalty rate is increased we see a smooth change in the provider’s
treatment policy, which results in minimally increasing prevention costs while decreasing complication
costs. The overall cost to the payer decreases and the impact on provider profitability appears to be small.
7 REINFORCEMENT LEARNING PROVIDER
We now present a provider decision model based on an MDP formulation of treatment selection, in which
the goal is to optimize long-range cumulative utility and future utility may be progressively discounted
by a discount parameter γ. This allows leveraging a vast literature covering a wide variety of techniques
for exactly or approximately solving for optimal policies in MDPs. The MDP solution technique used here
is based on Reinforcement Learning (RL) (Sutton and Barto 1998), a trial-and-error technique known to
effectively optimize policies based on extensive simulation trials. The main advantages of the RL approach
are that an explicit formal specification of the MDP is not needed, and it effectively tackles large-scale
MDPs via use of linear or nonlinear function approximation.
For the IBM diabetes model, we specifically use Q-Learning (Watkins 1989), which learns a table of
values Q(s,a) estimating the long-range value of initially performing action a in state s, and thereafter using the
optimal policy. Q-Learning is a low complexity, step-by-step learning method that is guaranteed to find the
optimal policy given sufficient “exploration” of all available state-action pairs. Since there are six possible
patient health states in the IBM model, and four available treatments, a table of 24 cells can represent
the expected value of treating individual patients. The resulting single-patient treatment policy may not
coincide with the optimal joint policy over a population of patients, in cases where there are constraints
on the joint actions (e.g. due to limited capacity for patient visits) or non-trivial joint rewards such as
aggregate bonus payments. However, the aggregate bonus in Shared Risk models can be separated into
well-defined per-patient bonus payments in the case where the reward rate and penalty rates are equal.
This is the scenario that we study below.
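A tabular Q-Learning loop over six states and four actions can be sketched as follows. Only the table dimensions come from the text; the environment (step), its rewards and transition probabilities, and the learning parameters (alpha, epsilon) are invented for illustration and do not reproduce the IBM diabetes model.

```python
import random

N_STATES, N_ACTIONS = 6, 4  # six health states and four treatments, as in the text

def step(state, action, rng):
    """Hypothetical environment; rewards and probabilities are invented."""
    reward = 100.0 * (action + 1)   # assumed provider utility for this step
    p_worsen = 0.4 - 0.1 * action   # assumed chance of moving to the next state
    if state < N_STATES - 1 and rng.random() < p_worsen:
        state += 1
    return state, reward

def q_learning(gamma=0.0, alpha=0.1, epsilon=0.2, episodes=500, horizon=15, seed=0):
    """Tabular Q-Learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0  # every simulated patient starts in the first state
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)  # explore
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])  # exploit
            next_state, reward = step(state, action, rng)
            # Q-Learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Best action per state according to the learned table."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]
```

With gamma set to zero the update target is just the immediate reward, so the learned table estimates one-step utility only; a non-zero gamma lets value from later states flow back into earlier decisions.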
Figure 8 illustrates results using a treatment policy trained by Q-Learning with discount parameter γ set
to zero, so that the learned policy should match the myopic policy of Section 6. We utilize a fee-for-service
plus Shared Risk payment model, where we vary both the reward rate and penalty rate, keeping them equal
in each experiment. We use the same configuration of 10 patients, each beginning in the No-Retinopathy
state, and being treated annually for 15 years, that was described previously. The provider’s profit margin
is 40% of revenue for all available treatments. At a given value of reward rate, we first train the Q-table
until converged; this usually takes at most a few hundred simulation runs. We then benchmark the resulting
financial metrics over 500 additional runs. Similar to our earlier experiments, we see that a low reward
rate results in the provider treating all patients with the most expensive treatment, yielding the lowest
complications cost. When the reward rate exceeds the profit margin, the provider prefers to switch to the
least expensive treatments, except in the Proliferative and Non-Proliferative states. As these states risk
an immediate complication on the next time step, the provider prefers an increased level of treatment to
reduce the immediate complication risk. The best contract from the payer perspective has reward rate and
penalty rate set to 50%; this reduces the total cost of care by more than 20%. However, there are more
complications and the provider profit is reduced when compared with a straight fee-for-service contract.
Figure 8: Results for FFS+Shared Risk vs. reward rate, using Q-Learning with γ=0 to optimize the
provider’s treatment policy.
Metric (K$)           Myopic (γ=0)    Lookahead (γ=0.5)    Lookahead Improvement
Total Cost of Care    135.6           133.4                1.6%
Provider Profit       54.9            56.3                 2.6%
Complications Cost    19.9            17.5                 12.1%
Table 1: Illustration of differences in results using myopic vs. lookahead-based optimization of provider
treatment policies. rewardRate = penaltyRate = 0.5, provider profit margin = 40%.
Having matched the myopic policy with γ=0, we then examined how the Q-Learning policy changes
when we enable lookahead effects via non-zero γ. There is no change at reward rates less than 0.4, but
above that level, we see increased provider profits and reduced complications. The lookahead-based policy
switches in the No-Retinopathy state from MM-0/SMT-0 to MM-0/SMT-1. This shows that Q-Learning
is picking up the longer-term risks of complications in the No-Retinopathy state. The magnitude of
improvement over a myopic policy can be appreciable, as seen in Table 1. For γ=0.5, complications are
reduced by over 12%, while total cost of care drops by nearly 2% and provider profits increase by nearly
3%.
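The improvement percentages quoted from Table 1 can be reproduced from the myopic and lookahead values with a short calculation. The dictionary layout and the helper name improvement_pct are assumptions made for this check; the numbers are those reported in the table.

```python
# (myopic, lookahead) values in K$, as reported in Table 1.
table = {
    "Total Cost of Care": (135.6, 133.4),   # lower is better
    "Provider Profit": (54.9, 56.3),        # higher is better
    "Complications Cost": (19.9, 17.5),     # lower is better
}

def improvement_pct(myopic, lookahead):
    """Absolute change relative to the myopic value, in percent."""
    return abs(lookahead - myopic) / myopic * 100.0

improvements = {name: round(improvement_pct(m, l), 1)
                for name, (m, l) in table.items()}
```

Each improvement is measured relative to the myopic value, so a cost reduction and a profit increase both appear as positive percentages.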
8 SUMMARY
In this paper, we presented several examples illustrating how our healthcare simulation framework can
facilitate interaction among multiple types of simulated components of a multi-agent healthcare system,
including payment models, provider behavior models, and disease progression models. For an artificial
model of diabetes, we also gave several examples showing how health and financial outcomes can be
influenced by specific terms and parameter settings of Shared Risk payment models, and we discussed how
this type of analysis can be useful to decision-making executives in the healthcare industry. In our ongoing
work, we are extending the provider optimization techniques to accommodate increased complexity in the
more realistic Michigan diabetes model, and we expect to have results using that model for the final version
of the paper. Also, as many Shared Risk contracts utilize a multi-year timeframe for calculation of bonus
payments, we are working on requisite modifications to our MDP formulations that can accommodate
such delayed reimbursement schemes.
APPENDIX
In this section we present the formulation of the myopic provider’s decision model. A primary care provider
(PCP) cares for a panel of patients. We assume that the type of care provided by the PCP is preventive. If
complications arise, we assume that they are treated by a non-PCP. The reimbursements paid to the PCP
are referred to as the ‘cost of prevention’. Meanwhile, the medical costs incurred with non-PCPs are referred
to as the ‘cost of complications’. The total medical cost incurred by a patient is referred to as the patient’s
‘system cost’.
The PCP is assumed to be a profit maximizer. She is presented with alternative reimbursement contracts
by the payer and her decision problem is to select interventions for her patients in such a way that she
maximizes her profit. We assume a single period decision problem in which the PCP selects an intervention
policy at the start of the decision horizon.
The PCP’s panel comprises $n$ patients. These patients may be partitioned into $M$ subsets, based on
their health status. Let $n_m$ be the number of patients in health status $m$, where $m = 1, 2, \ldots, M$. We define
an intervention very generally to comprise any combination of various clinical tests, procedures and/or
consultations. Suppose that there are $K$ mutually exclusive interventions. Let $x_{km}$ be the fraction of patients in
health state $m$ that will receive the $k$th intervention. We assume that each patient must be assigned exactly
one intervention. This implies that $\sum_k x_{km} = 1$ for all $m$. Let $x_m = [x_{1m}, x_{2m}, \ldots, x_{Km}]$ be the PCP’s intervention
for patients in health state $m$. The matrix $X = [x_1, x_2, \ldots, x_M]^\top$ represents the PCP’s intervention
policy.
Each intervention $k$ is associated with a demand on the PCP’s time in the form of number of office
visits. Let $d_k$ be the number of visits required per year for intervention $k$. We assume that this number is
independent of the patient’s health state and that a visit for one intervention occurs separately from a visit
for another intervention. We consider the PCP’s time to be the primary resource constraint. Assuming
that there are $A$ appointment slots available per year, her intervention policy should be selected so that her
capacity is not exceeded. The total demand for appointments each year is given by $\sum_k \sum_m n_m d_k x_{km}$.
The cost (to the payer) of prevention for all patients, given interventions $X$, is $v(X)$. The cost of
prevention will depend on the reimbursement model being used, and will be defined for each model later.
The cost of complications for a patient in health state $m$, given intervention $k$, is given by $h_{km}$. The
probability of complications occurring in a patient in health state $m$ and who received intervention $k$ is
Bigus, Chen-Ritzo, Hermiz, Tesauro and Sorrentino
pkm. The expected system cost for all patients is the sum of the costs of prevention and the expected cost
of complications and is given by Q(X) = mhv(X) + knmpkmhkmxkmi.
We represent the PCP’s expected reimbursement from the payer by $R(X)$. In this section we present
three types of reimbursement models: fee-for-service, global capitation, and shared risk. In fee-for-service,
there exists a fee schedule that lists a fee for each procedure, test and consultation. In our model, each
intervention may comprise a collection of such procedures, tests and consultations. This collection may
comprise multiple encounters with the PCP. Let $f_k$ be the total fees that are reimbursed to the PCP when
implementing intervention $k$. Given the PCP’s intervention policy, $X$, the PCP’s expected reimbursement
under a fee-for-service model is $R_1(X) = \sum_k f_k \sum_m x_{km} n_m$. In any fee-for-service setting, we equate the cost
of prevention, $v(X)$, with the total fee-for-service reimbursements that the PCP receives. Therefore,
$v(X) = R_1(X)$.
Under a capitation contract, the PCP receives a fixed reimbursement for each patient on her panel. This
reimbursement may be risk-adjusted so that patients who are less healthy will bring the PCP a higher fixed
reimbursement. Let $r_m$ be the reimbursement that the PCP receives per year per patient in health state $m$.
The PCP’s expected reimbursement under a global capitation model, $R_2(X) = R_2 = \sum_m n_m r_m$, is a constant.
In any global capitation setting, we equate the cost of prevention, $v(X)$, with the total capitation amount
that the PCP receives for the patients on her panel. Therefore, in this case, $v(X) = R_2$.
In a shared risk setting, the PCP’s reimbursement contract specifies a reference cost, $\tau$, to the payer.
If the total medical cost, $Q$, exceeds the reference cost, the PCP is liable to the payer for $\alpha(Q - \tau)$.
If the total medical cost is less than the reference cost, the payer will pay the PCP a bonus in the amount
$\beta(\tau - Q)$. Assume that $0 \le \alpha, \beta \le 1$. This liability/bonus payment is in addition to the reimbursement
that the PCP receives for providing preventive care. Therefore, the PCP’s ‘gain’ is given by $G(X) =
\beta[\tau - Q(X)]^+ + \alpha[\tau - Q(X)]^-$, where $[y]^+ = \max(y, 0)$ and $[y]^- = \min(y, 0)$. A negative gain is really a
loss to the PCP. It is the amount that the PCP must remit to the payer.
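The bonus and liability rule above can be sketched as a small Python function. This is an illustration under our own assumptions: the function name and the dollar amounts are hypothetical, and the positive and negative parts of $\tau - Q$ are taken to be `max(tau - Q, 0)` and `min(tau - Q, 0)`, which matches the statement that a negative gain is a loss to the PCP.

```python
def shared_risk_gain(Q: float, tau: float, alpha: float, beta: float) -> float:
    """Gain G = beta * max(tau - Q, 0) + alpha * min(tau - Q, 0).

    Q     : total expected medical (system) cost
    tau   : reference cost specified in the contract
    alpha : share of any overrun the PCP must remit to the payer
    beta  : share of any savings the payer pays to the PCP as a bonus
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    surplus = tau - Q
    return beta * max(surplus, 0.0) + alpha * min(surplus, 0.0)


# Hypothetical contract: reference cost 100,000, alpha = 0.25, beta = 0.5.
bonus = shared_risk_gain(Q=90_000.0, tau=100_000.0, alpha=0.25, beta=0.5)
penalty = shared_risk_gain(Q=120_000.0, tau=100_000.0, alpha=0.25, beta=0.5)
```

In this toy case, coming in 10,000 under the reference cost earns the PCP half of the savings, while a 20,000 overrun costs her a quarter of the excess.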
The PCP’s expected reimbursement in a fee-for-service with shared risk setting is given by $R_3(X) =
R_1(X) + G(X)$. The PCP’s expected reimbursement in a global capitation with shared risk setting is given
by $R_4(X) = R_2(X) + G(X)$.
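To make the four contracts concrete, the sketch below evaluates all four expected reimbursements for one policy on a toy instance. It assumes, following the definitions above, that the cost of prevention entering the system cost $Q$ equals the fee-for-service total in the fee-for-service settings and the capitation total in the capitation settings. All function names, the `X[k][m]` layout, and every number are hypothetical.

```python
def fee_for_service(n, f, X):
    """R1: sum over k and m of f[k] * X[k][m] * n[m]."""
    return sum(f[k] * X[k][m] * n[m] for k in range(len(f)) for m in range(len(n)))


def capitation(n, r):
    """R2: sum over m of n[m] * r[m]; constant in the policy X."""
    return sum(n_m * r_m for n_m, r_m in zip(n, r))


def expected_complication_cost(n, p, h, X):
    """Sum over k and m of n[m] * p[k][m] * h[k][m] * X[k][m]."""
    return sum(
        n[m] * p[k][m] * h[k][m] * X[k][m]
        for k in range(len(X))
        for m in range(len(n))
    )


def gain(Q, tau, alpha, beta):
    """Shared-risk gain: bonus share of savings, or liability share of overrun."""
    return beta * max(tau - Q, 0.0) + alpha * min(tau - Q, 0.0)


def reimbursements(n, f, r, p, h, X, tau, alpha, beta):
    """Expected reimbursement under each of the four contracts."""
    r1 = fee_for_service(n, f, X)
    r2 = capitation(n, r)
    complications = expected_complication_cost(n, p, h, X)
    # System cost Q = cost of prevention + expected cost of complications.
    r3 = r1 + gain(r1 + complications, tau, alpha, beta)
    r4 = r2 + gain(r2 + complications, tau, alpha, beta)
    return {"R1": r1, "R2": r2, "R3": r3, "R4": r4}


# Hypothetical instance: healthy patients get intervention 0, sicker ones get 1.
n = [100, 50]                       # patients per health state
f = [100.0, 400.0]                  # fee per intervention
r = [200.0, 500.0]                  # capitation rate per health state
p = [[0.1, 0.4], [0.05, 0.2]]       # p[k][m]: complication probability
h = [[1000.0, 5000.0]] * 2          # h[k][m]: complication cost
X = [[1.0, 0.0], [0.0, 1.0]]
result = reimbursements(n, f, r, p, h, X, tau=100_000.0, alpha=0.5, beta=0.5)
```

In this example the same policy earns a bonus under fee-for-service with shared risk but incurs a liability under capitation with shared risk, because the capitation payment pushes the system cost above the reference cost.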
We ignore fixed costs at this time. Suppose that the PCP incurs a cost of $c_k$ per patient for intervention
$k$. Her total costs are given by $C(X) = \sum_k c_k \sum_m n_m x_{km}$. Given the PCP’s intervention policy, $X$, the
PCP’s profit, $P(X)$, is expressed as the difference between her reimbursements and her costs. That is,
$P(X) = R(X) - C(X)$. Her decision problem is to select an intervention policy, $X$, that will maximize her
profit, while satisfying her capacity constraints. This problem is expressed as the following optimization
problem.
$$\max_{X} \; P(X)$$
s.t.
$$\sum_k \sum_m n_m d_k x_{km} \le A \qquad (1)$$
$$0 \le x_{km} \le 1 \quad \forall k, m \qquad (2)$$
$P(X)$ is non-linear in the cases where there is shared risk. It is possible to perform a linearization of the
problem so that it can be solved using a mixed-integer linear programming solver such as CPLEX.
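One standard way to carry out such a linearization is sketched below. It is offered under our own assumptions and is not necessarily the formulation the authors used. It introduces a binary variable $z$, nonnegative auxiliary variables $s^+$ and $s^-$, and a constant $U$ that bounds $|\tau - Q(X)|$ over all feasible policies; none of these symbols appear in the paper.

$$
\begin{aligned}
\tau - Q(X) &= s^+ - s^-, \\
0 \le s^+ &\le U z, \\
0 \le s^- &\le U (1 - z), \\
z &\in \{0, 1\}, \\
G &= \beta s^+ - \alpha s^-.
\end{aligned}
$$

The binary variable $z$ forces at most one of $s^+$ and $s^-$ to be positive, so $s^+ = \max(\tau - Q(X), 0)$ and $s^- = \max(Q(X) - \tau, 0)$. Because $Q(X)$ is linear in $X$ under both fee-for-service and global capitation, every constraint above is linear, and the objective $R(X) - C(X)$ with $G$ substituted in becomes linear in $(X, s^+, s^-, z)$.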
REFERENCES
Barhak, J., D. J. Isaman, W. Ye, and D. Lee. 2010. “Chronic disease modeling and simulation software”.
Journal of Biomedical Informatics 43 (5): 791–799.
Bigus, J. P., C.-H. Chen-Ritzo, and R. Sorrentino. 2011. “A Framework for Evidence-Based Health Care
Incentives Simulation”. In Proceedings of the 2011 Winter Simulation Conference, edited by S. Jain
et al., 1103–1116.
Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press.
Watkins, C. 1989. Learning from Delayed Rewards. Ph. D. thesis, Cambridge University.
AUTHOR BIOGRAPHIES
JOSEPH P. BIGUS is a Senior Technical Staff Member at the Thomas J. Watson Research Center, where
he leads the ABLE research project. He is a member of the IBM Academy of Technology and is an IBM
Master Inventor, with over 25 U.S. patents. He was an architect of the IBM Neural Network Utility and
Intelligent Miner for Data products. Dr. Bigus received his M.S. and Ph.D. degrees in computer science
from Lehigh University, his MBA from U. Mass., Amherst, and a B.S. in computer science from Villanova
University. His current research interests include agent-based modeling and simulation and the application
to understanding healthcare systems payments and incentives. His email address is bigus@us.ibm.com.
CHING-HUA CHEN-RITZO is a Research Staff Member in the Business Analytics and Mathematical
Sciences Department at the IBM T.J. Watson Research Center. She received a B.S. degree in physics from
the University of North Carolina at Chapel Hill in 1997, an M.S. degree in Architectural Engineering in
1999, and a dual-title Ph.D. degree in Business Administration and Operations Research in 2006, both from
Penn State University. She has been at IBM since 2005, where she has worked on applied mathematical
optimization and simulation for applications in services and manufacturing operations. Her email address
is chenritzo@us.ibm.com.
KEITH HERMIZ is a Business Analytics Researcher at the IBM T.J. Watson Research Center. He has held
a variety of positions at IBM including leadership of internal and client facing analytics consulting practices.
He received his BS in Applied Mathematics from Brown University, MBA from the University of Rhode Island
and Ph.D. from the University of Maryland at College Park. His email address is khermiz@us.ibm.com.
GERALD TESAURO is a Research Staff Member at IBM’s TJ Watson Research Center. He has worked
on theoretical and applied machine learning in a wide variety of settings, including multi-agent learning,
dimensionality reduction, computer virus recognition, computer chess (Deep Blue), intelligent e-commerce
agents and autonomic computing. Most recently, he developed game-playing strategies for Watson, IBM’s
Jeopardy! supercomputer. Dr. Tesauro received BS and PhD degrees in physics from the University of
Maryland and Princeton University, respectively. His email address is gtesauro@us.ibm.com.
ROBERT SORRENTINO is a Research Staff Member in the Healthcare Transformation Department at
the T.J. Watson Research Center. He received an S.B. degree in physics from MIT in 1973, and an M.D. degree
from Stony Brook University School of Medicine in 1978. He practiced Emergency Medicine for fifteen
years at a high-acuity, inner-city hospital in Arizona. In addition to a long-term role as an independent
healthcare management consultant for Mercer Health and Benefits, Dr. Sorrentino was Chief Medical
Information Officer at CareMore Health Plan, and its MSO affiliate, CareMore Medical Enterprises, from
2002 through its acquisition by JPMorgan Partners in 2006. He subsequently served as Chief Medical
Officer for ARTA Medicare Healthplan, and its affiliate MSO, Western Medical Management in Southern
California. At IBM Dr. Sorrentino has worked on advanced healthcare analytics, and innovative payment
models for healthcare. His email address is sorrentino@us.ibm.com.