All content in this area was uploaded by R. Jerome Dixon on Aug 29, 2018
Submitted by:
R. Jerome Dixon
DAPT 681/682
Analytics Practicum
dixonrj@vcu.edu
1 May 2018
1.0 Introduction
1.1 Purpose
This research supports Dr. McCarty and Massey Cancer Center. Dr. McCarty
would like a predictive model, built from historical data, that estimates patient volume
through the various clinical phases of the Bone Marrow Transplant (BMT) process. He
plans to use the model for capacity planning and resource scheduling. We plan to use a System
Dynamics modeling framework, with machine learning to calibrate the model. Figure 1 shows our
initial model for Bone Marrow Transplant Patient Flow.
Figure 1. Initial System Dynamics Model for Bone Marrow Transplant Forecasting
Our initial model has three structures: one for each type of transplant (Allogenic
and Autologous) and one that accounts for changing population age demographics. Based
on our research from the Federal Reserve Bank of Richmond (Fifth District) and the University
of Virginia’s Center of Public Policy, we expect increases in the age bands that are most
likely to require bone marrow transplants.
1.2 Analytical Background
Machine learning is the process by which computers learn from data without being
explicitly programmed. In general, machine learning requires large training sets, high
computational power, and domain experts to set up experiments and interpret
results. As shown in Figure 2, machine learning falls between statistical reasoning
and artificial intelligence in the spectrum of decision-making algorithms.
Figure 2. Decision-making algorithm spectrum (Machine Learning). Slide from presentation by
Timo Selvaraj, SearchBlox Software, Inc.
Machine learning is further divided into supervised and unsupervised learning.
In supervised learning, the output (target) variable is known and the model structure is
known: the model learns how the key input variables predict that output. In unsupervised
learning, there is no target variable, and the model structure is unknown and must be inferred.
Figure 3. Machine Learning Methods (Supervised vs Unsupervised Machine Learning). Slide from
presentation by Timo Selvaraj, SearchBlox Software, Inc.
We use a range of these tools and methods to analyze Bone Marrow
Transplant (BMT) patient data from January 2010 to February 2018.
1.3 Domain Knowledge and Research
Our domain background was greatly influenced by meetings with Dr. McCarty,
Dr. Catherine Roberts, Judith Davis (RN), and Cheryl L. Jacocks-Terrell (MS). We also
leveraged the Center for International Blood and Marrow Transplant Research (CIBMTR),
University of Virginia’s Weldon Cooper Center for Public Service, and the Federal Reserve
Bank of Richmond for additional reports and statistics related to population health and
metrics regarding changing local demographics.
From meetings with Massey Cancer Center staff, we mapped the bone marrow transplant
process steps shown in Figure 4. These steps are performed by five (5) physicians and five (5)
nurse practitioners/physician assistants at a twenty-two (22) bed hospital facility.
Figure 4. Massey Bone Marrow Transplant Process
1.4 Model Selection
For model selection, we chose system dynamics. Time series analysis gives us great
insight into what has happened but will not necessarily tell us what will happen, especially
when the underlying drivers change. Based on our research and review, we believe that
changing population demographics, plus any progress on Medicaid expansion, will increase
Massey Cancer Center’s patient demand signal to some degree. We will start with a time series
model and then add causality to transform our final product into a System Dynamics model for
patient forecasting and policy exploration.
2.0 Data
2.1 Collection
The data we collected come from the Massey Cancer Core Informatics Center. We give special
thanks to Nevena Skoro and her team for providing the Cerner datasets listed below for analysis:
Figure 5. Cerner Datasets
bmt_pats_evals. Pre-bone marrow transplant evaluations data. Includes records where
either the consultation date or the evaluation date was in calendar year 2010 or later. Not
all patients were transplanted.
bmt_pats_transplants. Bone marrow transplant data. Includes records where transplant
date was in calendar year 2010 and later.
bmt_idx_clinic_visits. Appointments data. Includes arrived visits to the MCV Bone Marrow
Transplant clinic (based on scheduling department) where appointment date is in 2010
and later.
bmt_mcvhbill_hemonc_accounts. For identified patients, includes hospital accounts where
discharging division is Hematology Oncology or Pediatric Hematology Oncology and year of
admission or discharge is in 2010 and later.
bmt_mcvhbill_accounts_dx. For the above accounts, all diagnoses listed on the accounts.
Links to the accounts data on acctnum.
bmt_mcvhbill_accounts_px. For above accounts (inpatient and observation only), all
procedures listed on the accounts.
bmt_mcvpbill_hemonc_invoices. For identified patients, includes physician invoices where
Medical Department is Hematology Oncology or Pediatric Hematology Oncology and start
date is 2010 and later.
ref_icdcmcodes. Reference table of ICD-9 and ICD-10 diagnosis codes.
ref_icdpcscodes. Reference table of ICD-9 and ICD-10 procedure codes.
All data span January 2010 to February 2018.
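The account-level tables above share the acctnum key. As a minimal sketch of that link, the Python below performs an inner join of hospital accounts to their listed diagnoses. Only acctnum comes from the dataset descriptions; the other field names (patient_id, dx_code) and every record are hypothetical placeholders.

```python
from collections import defaultdict

# Toy stand-ins for bmt_mcvhbill_hemonc_accounts and bmt_mcvhbill_accounts_dx.
# Only `acctnum` comes from the dataset descriptions; other fields and values are hypothetical.
accounts = [
    {"acctnum": "A1", "patient_id": "P1"},
    {"acctnum": "A2", "patient_id": "P2"},
    {"acctnum": "A3", "patient_id": "P3"},  # account with no listed diagnoses
]
diagnoses = [
    {"acctnum": "A1", "dx_code": "DX01"},
    {"acctnum": "A1", "dx_code": "DX02"},
    {"acctnum": "A2", "dx_code": "DX03"},
]


def link_on_acctnum(accounts, diagnoses):
    """Inner join: one output row per (account, diagnosis) pair sharing an acctnum."""
    dx_by_acct = defaultdict(list)
    for row in diagnoses:
        dx_by_acct[row["acctnum"]].append(row["dx_code"])
    linked = []
    for acct in accounts:
        for dx in dx_by_acct.get(acct["acctnum"], []):
            linked.append({**acct, "dx_code": dx})
    return linked


linked = link_on_acctnum(accounts, diagnoses)
```

Account A3 has no diagnoses and drops out, as in any inner join. A left join would keep it with an empty diagnosis.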
2.2 Data Exploration
We used SAS Enterprise Miner and Tableau for data exploration. Below are some
key visualizations for understanding the data and developing our model parameters:
2.2.1 SAS Enterprise Miner Output
Figure 6. Distribution of Transplant ID (TPID)
Transplant ID (TPID) ranges from 1 to 7 and represents the number of transplants a
single patient receives (e.g., TPID 3 indicates the patient had three bone marrow transplants).
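This sketch assumes that TPID numbers each patient's transplants sequentially (1, 2, 3, ...). Under that reading, a patient's transplant count is the highest TPID recorded for that patient. The minimal Python below builds a per-patient distribution from hypothetical (patient, TPID) pairs. It is not necessarily how Figure 6 was computed.

```python
from collections import Counter

# Hypothetical (patient_id, TPID) records; assumes TPID numbers each patient's
# transplants sequentially, so the highest TPID equals that patient's transplant count.
records = [("P1", 1), ("P2", 1), ("P2", 2), ("P3", 1), ("P3", 2), ("P3", 3)]


def transplants_per_patient(records):
    """Return {patient_id: number of transplants} using the highest TPID seen."""
    counts = {}
    for patient, tpid in records:
        counts[patient] = max(counts.get(patient, 0), tpid)
    return counts


per_patient = transplants_per_patient(records)
# How many patients received 1, 2, 3, ... transplants.
distribution = Counter(per_patient.values())
# Share of patients with more than one transplant (a re-transplant proxy).
multi_share = sum(1 for n in per_patient.values() if n > 1) / len(per_patient)
```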
Figure 7. Distribution of Transplant Types
Autologous (Auto) transplants comprise 62% of all transplants, Allogenic Unrelated (AlloU)
25%, and Allogenic Related (AlloR) 13%.
Figure 8. Distribution of Patient Status after Transplant by Percent
Time Period: Jan 2010 to Feb 2018
We feed the percentage of Re-Transplants back into the model workflow. We use the
remaining status categories to calculate our attrition rates as patients flow through the
model.
Figure 9. Decision Tree for Transplant Patient Status
We set the decision tree ‘Target’ to Patient Status to see which variables drive patient
outcomes. Tandem transplants (multiple transplants for a single patient), age, failure to
engraft, and whether the patient went into remission all contribute significantly to final
patient outcomes.
Figure 10. Variable Importance for Proceeding to Transplant
Notably, this visualization shows that Primary Insurance Carrier contributes to whether a
patient proceeds to transplant. This raises a question for further exploration: how does
the state of Virginia’s Medicaid expansion policy affect our attrition rates and, ultimately,
the model’s bone marrow transplant forecast?
Figure 11. Variable Importance for Patient Status
Again, we see a significant contribution by Primary Insurance Carrier to Patient Status.
2.2.2 Tableau Output
Figure 12. Location of Bone Marrow Patients vs Census Age Distributions
Figure 12 overlays the locations of bone marrow patients on census age distributions. Our
first hypothesis was that, since bone marrow patients skew toward an older demographic
(Figure 13), we would see more bone marrow transplant patients in older areas. This
hypothesis was not supported. Our secondary hypothesis was that we would see additional
bone marrow patients in high-growth areas (Figure 14). This hypothesis was supported.
Figure 13. Age Distributions by Transplant Type
Figure 14. Age Distributions by Census Population Growth Areas
Figure 15. Transplant Activity by Clinic Location
Figure 16. Physician Activity by Clinic Location
These two charts (Figures 15 and 16) show bone marrow transplant activity by clinic
location and by physician.
Figure 17. Patient Transplant Status by Transplant Type
Figure 17 shows Patient Status by Transplant Type. We use these counts, together with
missing data (NAs), to calculate attrition rates as patients proceed through the various steps
of the bone marrow transplant process.
Figure 18. Diagnosis Codes by Record Count
Figure 18 shows the dominant disease types and conditions for autologous, allogenic related,
and allogenic unrelated bone marrow transplants performed at Massey Cancer Center.
3.0 Analysis
For our analysis, we will use Time Series Forecasting complemented with System
Dynamics and Machine Learning to capture additional process and system interactions.
3.1 Time Series Forecasting
We perform a few minor data preparation steps in RStudio before running the SAS
Enterprise Miner Time Series Analysis. We strip out extraneous columns and convert the
transaction column to ‘Numeric’ so that each bone marrow transplant counts as a single
transaction. Our initial time series forecasting results from SAS Enterprise Miner are below.
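Once each transplant counts as one transaction, the transactions can be summed by week to match the patients-per-week scale of the forecasts below. The minimal Python sketch below shows that aggregation. The original preparation used RStudio and SAS Enterprise Miner. The dates are hypothetical, and weeks are assumed to start on Monday.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical transplant dates; in practice one row per transplant in bmt_pats_transplants.
transplant_dates = [date(2018, 1, 2), date(2018, 1, 4), date(2018, 1, 10), date(2018, 1, 24)]


def week_start(d):
    """Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def weekly_counts(dates):
    """Count one transaction per transplant per week, filling empty weeks with 0."""
    counts = Counter(week_start(d) for d in dates)
    series = []
    week, last = min(counts), max(counts)
    while week <= last:
        series.append((week, counts[week]))  # Counter returns 0 for missing weeks
        week += timedelta(weeks=1)
    return series


series = weekly_counts(transplant_dates)
```

Filling empty weeks with zeros matters for time series tools, which generally expect evenly spaced observations.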
Figure 19. Allogenic Related Transplant Forecast
We see oscillations around 1 patient per week, with spikes to 2 patients per week around the
first quarter of the calendar year (CY).
Figure 20. Allogenic Unrelated Forecast
We see oscillations around 1-2 patients per week, with spikes to 2 patients per week around
the fourth quarter of the calendar year (CY).
Figure 21. Autologous Transplant Forecast
We see oscillations around 2-3 patients per week, with random spikes to 4 patients per week.
Figure 22. Forecast Comparison by Transplant Type
The forecast comparison plot shows spikes in activity around July/August, a dip in
activity in October, and an additional spike around November, just prior to the holidays.
3.2 System Dynamics
System dynamics is a methodology for modeling complex systems that uses differential
equations, together with feedback concepts from electrical engineering, to simulate real-world
dynamic problems. Jay Forrester developed this methodology at MIT in the late 1950s and
1960s. The core building blocks of system dynamics models are stocks and flows. A stock
stores an accumulation of objects. A flow is a rate that increases or depletes a stock.
Together, these two structures capture the feedback that is typically hard to represent in
nonlinear, dynamic systems.
For our bone marrow patient flow model, stocks represent patients at the various bone
marrow process steps, and flows represent the process cycle times and attrition rates that
move patients through the model.
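To make the stock-and-flow idea concrete, the minimal Python sketch below models one stock of patients. It has an inflow, a cycle-time outflow, and an attrition outflow, and it is integrated with Euler's method. Two choices are assumptions: the outflow is a first-order delay (stock divided by cycle time), and all parameter values are illustrative rather than calibrated Massey values.

```python
def simulate_stock(inflow, cycle_time, attrition, weeks, dt=0.25, initial=0.0):
    """Euler integration of one stock of patients:

        dS/dt = inflow - S / cycle_time - attrition * S

    inflow     : patients arriving per week (the flow into the stock)
    cycle_time : average weeks in this step; S / cycle_time is a first-order outflow
    attrition  : fraction of the stock leaving the process per week
    """
    stock = initial
    history = [stock]
    for _ in range(round(weeks / dt)):
        net_flow = inflow - stock / cycle_time - attrition * stock
        stock += dt * net_flow
        history.append(stock)
    return history


# With no attrition the stock settles at inflow * cycle_time (3 * 4 = 12 patients).
no_attrition = simulate_stock(inflow=3.0, cycle_time=4.0, attrition=0.0, weeks=200)
# With attrition it settles at inflow / (1 / cycle_time + attrition) = 3 / 0.5 = 6.
with_attrition = simulate_stock(inflow=3.0, cycle_time=4.0, attrition=0.25, weeks=200)
```

The equilibrium levels follow from setting dS/dt to zero. This is a quick check that any calibrated version of the model should also pass.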
3.2.1 Process Cycle Times
We use our Cerner data to calculate process cycle times for each step of the process.
We merge the Cerner bone marrow patient transplant dataset (bmt_pats_transplants) with
the bone marrow patient evaluations dataset (bmt_pats_evals) to get a complete picture of
each patient from evaluation to transplant. We perform one more data manipulation step
with the hospital billing dataset, since ‘Discharge Date’ is not in our initial
bmt_pats_transplants dataset. Once the data are in the required format, we calculate process
cycle times by subtracting the previous step’s date/time from the current step’s date/time in
our bone marrow transplant process. We do not model ‘Mobilization’ as a separate step, since
it is specific to autologous transplants. Our results appear below:
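The subtraction described above can be sketched in Python as follows. Each step's duration is its date minus the previous step's date. A missing date yields a missing duration, and the durations are then summarized per step. The step names and dates are hypothetical. The median is also an assumption, because the report does not state which summary statistic Figure 24 uses.

```python
from datetime import datetime
from statistics import median

# Hypothetical step order; the actual steps are those in Figure 4.
STEPS = ["consult", "evaluation", "admission", "transplant", "discharge"]


def cycle_times(record):
    """Days from each step to the next for one patient; None if either date is missing."""
    durations = {}
    for prev, step in zip(STEPS, STEPS[1:]):
        start, end = record.get(prev), record.get(step)
        if start is None or end is None:
            durations[step] = None
        else:
            durations[step] = (end - start).total_seconds() / 86400
    return durations


def median_cycle_times(records):
    """Median days per step across patients, skipping missing (NA) durations."""
    per_step = {step: [] for step in STEPS[1:]}
    for record in records:
        for step, days in cycle_times(record).items():
            if days is not None:
                per_step[step].append(days)
    return {step: median(vals) for step, vals in per_step.items() if vals}


patient = {
    "consult": datetime(2017, 3, 1),
    "evaluation": datetime(2017, 3, 15),
    "admission": datetime(2017, 4, 20),
    "transplant": datetime(2017, 4, 27),
    "discharge": None,  # e.g. still missing before the billing data are merged in
}
ct = cycle_times(patient)
```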
Figure 23. Bone Marrow Process Cycle Times by Transplant Type
Figure 24. Bone Marrow Process Cycle Times Table Format
3.2.2 Attrition Rates
We use missing values (NAs) as a proxy for attrition from step to step. The initial
count of patients in the ‘Evaluations’ database is 2,408, and the final number of patients
receiving transplants is 1,334, an overall attrition of roughly 45% (1 - 1,334/2,408 ≈ 0.446).
We count NAs at each process step to determine approximate attrition rates. Our step-by-step
attrition rates are below.
Figure 25. Bone Marrow Attrition Rates
We also add the results from Figure 26 (Distribution of Patient Status after
Transplant) to Step 6 (Att6) and to our follow-up time points (+30, +60, +90, +180, and +365
days) to calculate our final model attrition rates for each step of our patient flow model.
Figure 26. Distribution of Patient Status after Transplant by Percent
Time Period: Jan 2010 to Feb 2018
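The NA-counting approach can be sketched in Python as follows. A patient with a date for one step but a missing date for the next counts as lost at that step. The fraction of patients still in the flow after several steps is the product of (1 - attrition) across those steps. This minimal sketch assumes that patients reach the steps in order. The step names and toy records are hypothetical; the 2,408 and 1,334 counts come from Section 3.2.2.

```python
from math import prod


def step_attrition(records, steps):
    """Attrition at each step: share of patients with a date at the previous step
    whose date for this step is missing (None stands in for NA)."""
    rates = {}
    for prev, step in zip(steps, steps[1:]):
        reached_prev = [r for r in records if r.get(prev) is not None]
        lost = sum(1 for r in reached_prev if r.get(step) is None)
        rates[step] = lost / len(reached_prev) if reached_prev else None
    return rates


def retained_fraction(rates):
    """Fraction of entering patients who survive every step's attrition."""
    return prod(1 - r for r in rates.values() if r is not None)


# Hypothetical step names and records; True marks a recorded date, None a missing one.
steps = ["evaluation", "admission", "transplant"]
records = [
    {"evaluation": True, "admission": True, "transplant": True},
    {"evaluation": True, "admission": True, "transplant": None},
    {"evaluation": True, "admission": None, "transplant": None},
    {"evaluation": True, "admission": None, "transplant": None},
]
rates = step_attrition(records, steps)

# Overall evaluation-to-transplant attrition from the counts reported above.
overall_attrition = 1 - 1334 / 2408
```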
To simplify our final System Dynamics model, we perform one additional step: model
validation with time series analysis. The Center for International Blood and Marrow
Transplant Research (CIBMTR) collects data from bone marrow transplant centers in the
United States and abroad.
Figure 27. Center for International Blood and Marrow Transplant Research:
United States 2010-2014
We investigate whether including regional data for the state of Virginia improves the
accuracy and performance of our model. We perform an additional time series analysis with
regional data from Virginia and evaluate its correlation with our local data. Below is our
time series forecast for Virginia regional bone marrow transplants from CIBMTR.
The data presented here is preliminary and were obtained from the Coordinating Center of the Center for International Blood and
Marrow Transplant Research. The analysis has not been reviewed or approved by the Statistical or Scientific Committees of the CIBMTR.
Figure 28. CIBMTR Regional Forecast for Virginia Bone Marrow Transplant Clinics
Below is a comparison of our regional forecast with the Massey Cancer Center forecast.
Figure 29. CIBMTR Regional Forecast vs Massey Cancer Center Local Forecast
We see the expected step increase in our regional forecast but not in our local
Massey forecast. Figure 30 adds a comparison in which we scale the regional and local time
series so they can be compared directly. After scaling, the CIBMTR regional series and our
local Massey Cancer Center data are correlated, with some interesting behavior and trends:
the two series align until 2015. The regional forecast appears to capture the population
growth dynamic that we expect, showing a continual step trend of additional bone marrow
transplant patients. We do not see this continual step trend locally at Massey Cancer Center.
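The scaling and comparison can be sketched in Python as follows. Each series is z-scored so that counts of very different sizes share one axis. A Pearson correlation is then computed separately over an early window and a late window, which exposes a change in direction. The yearly numbers are invented solely to exercise the code and are not CIBMTR or Massey data. The report does not say which scaling method Figure 30 uses, so z-scoring is an assumption.

```python
from statistics import mean, pstdev


def zscore(series):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]


def pearson(x, y):
    """Pearson correlation: the average product of the two series' z-scores."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Invented yearly counts, chosen only to exercise the code (not CIBMTR or Massey data).
regional = [100, 110, 120, 130, 140, 150]
local = [50, 55, 60, 58, 55, 52]

scaled_regional, scaled_local = zscore(regional), zscore(local)
early = pearson(regional[:4], local[:4])  # first four years
late = pearson(regional[3:], local[3:])   # last three years
```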
Figure 30. Regional vs Local Bone Marrow Patients
This divergence is consistent with the ‘limits to growth’ structure in system dynamics
modeling. In our local system, bone marrow transplants reach a certain carrying capacity and
oscillate around that number, whereas the regional series effectively load-balances
transplants across all clinics. The regional and local series both peak, but in 2015 they
begin to trend in opposite directions. The period 2015-2016 warrants additional research. In
the meantime, we omit the additional population growth structure from our local Massey model
because this dynamic does not currently appear in the local data.
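The ‘limits to growth’ structure pairs a reinforcing growth loop with a balancing loop that weakens growth as a stock approaches its carrying capacity. The minimal Python sketch below uses the logistic form dS/dt = r * S * (1 - S / K), with arbitrary assumed values for r and K. This simple form approaches K smoothly. Reproducing the oscillation around capacity seen locally would require an added delay, for example in how capacity responds to demand.

```python
def limits_to_growth(initial, growth_rate, capacity, years, dt=0.1):
    """Euler integration of logistic growth, dS/dt = r * S * (1 - S / K).

    The reinforcing loop (r * S) drives growth; the balancing term (1 - S / K)
    shrinks growth to zero as the stock S approaches the carrying capacity K.
    """
    stock = initial
    path = [stock]
    for _ in range(round(years / dt)):
        stock += dt * growth_rate * stock * (1 - stock / capacity)
        path.append(stock)
    return path


# Arbitrary illustrative values: 10 transplants/year growing toward a capacity of 100.
path = limits_to_growth(initial=10.0, growth_rate=0.8, capacity=100.0, years=30)
```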
We combine our two datasets, Massey local data and CIBMTR regional Virginia data, to
estimate feature importance with machine learning (specifically, the XGBoost algorithm).
We use the resulting variable ranking to validate the final variables for our model.
Figure 31. Feature Importance with XGBoost
We implement our final forecast model with the SAS Enterprise Miner Neural Network.
We chose this model because it can accommodate additional variables in future models.
For a future iteration, we would like to include diagnosis codes to better predict the type of
transplant. For our current model, we focus primarily on predicting when transplants
will occur and total throughput for staffing requirements.
Figure 32. Final Patient Flow Forecast Model with SAS Enterprise Miner – HP Neural Network Node
4.0 Final Model
4.1 Final System Dynamics Model
After accounting for population growth with the regional forecast model and
aggregating transplant types into an array, we arrive at our final system dynamics model for
calculating staffing requirements during each phase of the bone marrow transplant
process.
Figure 33. Final Patient Flow System Dynamics Model
Model Summary:
Forecast (Input) + Cycle Times (System Flow Rates) - Attrition Rates (System Exit Rates)
As before, the cycle times and rates for each step are calculated from Figure 24.
Figure 24. Bone Marrow Process Cycle Times Table Format
The model’s final attrition rates are calculated from Figures 25 and 26.
Figure 25. Bone Marrow Attrition Rates
Figure 26. Distribution of Patient Status after Transplant
Figure 34. Secondary Calculation Table for Attrition Rates by Transplant Type
Final Attrition Rate Table:
Figure 35. Table Calculations for Attrition Rates
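The model summary (forecast input, cycle-time flows, attrition exits) can be sketched in Python as a chain of stocks, one per process step. The sketch also feeds a re-transplant fraction back into the first step, mirroring how the re-transplant percentage re-enters the workflow. It is a minimal weekly sketch under stated assumptions: first-order delay outflows, attrition applied to each step's outflow, and illustrative parameters that are not the calibrated values in Figures 24-26.

```python
def simulate_pipeline(arrivals, cycle_times, attrition, retransplant=0.0):
    """Weekly Euler simulation of a chain of stocks, one per process step.

    arrivals     : forecast new patients per week entering the first step (model input)
    cycle_times  : average weeks spent in each step; outflow of step i is S_i / cycle_times[i]
                   (values >= 1 keep this one-week time step stable)
    attrition    : attrition[i] is the fraction of step i's outflow that exits the process
                   instead of moving to step i + 1 (length = number of steps - 1)
    retransplant : fraction of the last step's outflow fed back into the first step
    Returns the stock levels after each week.
    """
    n = len(cycle_times)
    stocks = [0.0] * n
    history = []
    for new_patients in arrivals:
        outflows = [stocks[i] / cycle_times[i] for i in range(n)]
        for i in range(n):
            if i == 0:
                entering = new_patients + retransplant * outflows[-1]
            else:
                entering = outflows[i - 1] * (1 - attrition[i - 1])
            stocks[i] += entering - outflows[i]
        history.append(list(stocks))
    return history


# Illustrative parameters only (not the calibrated values in Figures 24-26).
arrivals = [2.0] * 500
steady = simulate_pipeline(arrivals, cycle_times=[2, 4, 1], attrition=[0.5, 0.25])[-1]
with_feedback = simulate_pipeline(arrivals, [2, 4, 1], [0.5, 0.25], retransplant=0.2)[-1]
```

Without feedback, the stocks settle at arrivals multiplied by the surviving fraction and the cycle time for each step (4, 4, and 0.75 patients here). The re-transplant loop raises the first stock above that level.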
5.0 Results
To implement the model’s results in the hospital workflow, we evaluated R’s deSolve
package, Python’s PySD package, and Microsoft Excel (MS Excel). Given the flexibility of our
datasets, we could either develop mathematical models to simulate our results (deSolve or
PySD) or use SAS Enterprise Miner output directly in a Microsoft Excel model. We chose
Microsoft Excel because it fits our client’s current technology stack and their
familiarity with the Microsoft suite of products.
We created an Excel dashboard that shows the time series forecast for all three
types of transplants (Autologous, Allogenic Unrelated, and Allogenic Related) by Quarter.
Figure 36. Forecasted Bone Marrow Transplants by Quarter
We then created a utility function: when the user enters the first consult date and
transplant type, the model auto-populates the remaining bone marrow process steps and
outpatient services based on our previously calculated process cycle times.
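The minimal Python sketch below shows a utility of this kind. It assumes that each step's date is the previous step's date plus an average cycle time. It also assumes that outpatient follow-ups are counted from the transplant date, which is the usual Day 0 convention in transplant follow-up. The step names and cycle-time values are hypothetical placeholders, not the calibrated values in Figure 24, and the client tool itself is an Excel workbook.

```python
from datetime import date, timedelta

# Placeholder cycle times in days, keyed by transplant type. These are NOT the values
# from Figure 24; they only illustrate the lookup. Step names are hypothetical.
CYCLE_TIMES_DAYS = {
    "Autologous": [("Evaluation", 14), ("Transplant", 40), ("Discharge", 20)],
    "Allogenic Related": [("Evaluation", 14), ("Transplant", 60), ("Discharge", 30)],
}
FOLLOW_UP_DAYS = [30, 60, 90]  # outpatient follow-ups, counted from the transplant date


def project_schedule(first_consult, transplant_type):
    """Chain cycle times forward from the first consult date to project each step's date."""
    schedule = {"First Consult": first_consult}
    current = first_consult
    for step, days in CYCLE_TIMES_DAYS[transplant_type]:
        current += timedelta(days=days)
        schedule[step] = current
    for offset in FOLLOW_UP_DAYS:
        schedule[f"Outpatient +{offset}"] = schedule["Transplant"] + timedelta(days=offset)
    return schedule


plan = project_schedule(date(2018, 1, 1), "Autologous")
```

To turn these schedules into staffing counts, one could tally how many projected dates fall in each week for each step.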
Figure 37. Forecasted Staffing Requirements (dummy data)
The resulting table contains projected staffing requirements for the transplant date,
length of stay, and outpatient visits at plus 30, plus 60, and plus 90 days, for planning
Massey Cancer Center staffing.
6.0 Next Steps
For next steps, we would like to validate the accuracy of both our bone marrow
transplant forecast and our staffing forecast model. To improve transplant forecast accuracy,
we would like to evaluate nonlinear time series forecasting methods as possible enhancements
to our model (note i). For staffing requirements, we would like to add diagnostic data and
additional clinical data (specifically genetic data) to improve our bone marrow transplant
cycle time estimates and staffing projections.
We would like to further explore the 2014-2016 regional data against local Massey
data for insights and additional policy considerations. Regional bone marrow transplants
are trending up, whereas Massey Cancer Center appears to be oscillating around a set
capacity. We would like to investigate this dynamic for its future strategic healthcare policy
implications.
Medicaid expansion is a current topic in Virginia and nationally. We would also like
to investigate the impact of Medicaid expansion on Massey’s attrition rates and subsequent
bone marrow transplant forecasts.
i Huffaker, R., Bittelli, M., and Rosa, R. Nonlinear Time Series Analysis with R. Oxford University Press.