Evaluation of Self-Scheduling Exercises Completed by Analog Crewmembers in NASA’s Human Exploration Research Analog (HERA)

Jack W. Gale¹
NASA Ames Research Center, Moffett Field, CA, 94035, USA

Melodie Yashar²
San José State University Research Foundation, Moffett Field, CA, 94035, USA

John Karasinski³ and Jessica J. Marquez⁴
NASA Ames Research Center, Moffett Field, CA, 94035, USA
NASA human spaceflight missions are inherently dynamic and require frequent scheduling changes to adapt to shifting mission priorities and objectives. Tactical-level changes to the mission plan are traditionally made by a team of expert planners and operations specialists on the ground. However, astronauts are expected to execute more autonomously during future long duration missions and will need to take on some of the responsibility of managing their own schedule while still abiding by the numerous constraints required by human spaceflight operations. This paper summarizes salient elements of crew performance in NASA’s Human Exploration Research Analog Campaign 3. Analog crewmembers completed a series of self-scheduling exercises to evaluate the usability of Playbook, a self-scheduling software tool designed and developed by our team, in enabling self-scheduling without support from ground control. We also investigated how to best communicate self-scheduling tasks and constraints to the crew to facilitate efficient self-scheduling during isolation in a realistic environment. Our analysis identified that 30 minutes was sufficient to complete complex self-scheduling tasks. Our evaluation also identified differences between individual and collaborative performance: analog crewmembers completed self-scheduling exercises more quickly as a team than individually and reported lower subjective difficulty ratings overall.
I. Nomenclature
HERA = Human Exploration Research Analog
C3 = Campaign 3
LDEMs = long duration exploration missions
PDAT = Playbook Data Analysis Tool
n = number of data points
M = mean
SD = standard deviation
r = Pearson's correlation coefficient
p = p-value
¹ Research AST Human/Machine Systems, NASA Ames HCI Group, AIAA Member.
² Senior Associate Researcher, NASA Ames HCI Group, AIAA Member.
³ Research AST Human/Machine Systems, NASA Ames HCI Group, AIAA Member.
⁴ Human Systems Engineer, NASA Ames HCI Group, AIAA Member.
II. Introduction
Self-scheduling is a novel concept of operations for in-flight astronauts, allowing crew to schedule or re-schedule
a shared timeline. Current NASA programs like the International Space Station do not support manipulation of
timelines by crewmembers during nominal operations; as such, the impacts related to self-scheduling need to be
understood. Future long duration exploration missions (LDEMs) will introduce new challenges, including communication delays between crewmembers and ground teams, which increase the need for crew autonomy. To better
understand self-scheduling in an operationally-relevant environment, our team conducted an experiment in the NASA
Human Exploration Research Analog (HERA), an isolation and confinement analog at NASA's Johnson Space Center
[1][2]. HERA Campaign 3 (C3) consisted of four missions in 2016. Each mission was 30 days in duration and was
supported by four crewmembers. While HERA is primarily focused on studying the effects of isolation, this testbed
is also used to evaluate hardware and software tools in conditions similar to those on the International Space Station.
This paper summarizes a post-hoc analysis using data from HERA C3 to understand how crewmembers schedule
activities not only individually but also collaboratively as a team using Playbook.
III. Background
A. HERA Research Objectives
In 2016, our principal goal in HERA was to evaluate Playbook in terms of its ability to support self-scheduling,
its usability, and crew’s overall task completion experience. However, a post-hoc analysis allowed our team to assess
crew self-scheduling performance in additional ways. Our retrospective, exploratory study of self-scheduling tasks
during HERA Campaign 3 seeks to draw lessons and insights by relating participants’ subjective ratings of difficulty to their performance on various self-scheduling exercises. A number of research questions were posed at the beginning of this exploratory analysis, including:
1. How might we characterize crew performance, given various measures of plan complexity?
2. How does subjective performance compare to actual performance?
Since crewmembers do not traditionally plan their missions, these data are key to understanding how crews plan and
understand activity constraints. Objectives of this retrospective analysis expand on existing and ongoing research in
crew self-scheduling with Playbook. One objective included identifying whether participant performance may be
measured as a function of different characteristics of the plan. The exercise plans given to HERA crew to self-schedule
were designed with varying levels of difficulty and complexity, including various types of constraints. This analysis
investigates whether a higher number of constraints (a function of plan characteristics) may be correlated with
participant performance (such as reduced instances of violations).
Another objective was to provide evidence for whether participants' subjective ratings of difficulty were related to plan characteristics. In instances when user difficulty ratings did not correspond with
measures of performance, such as successful completion of an exercise, our analysis investigated whether a different
facet of the experiment or the campaign may have contributed to that disparity. This study posed the question of
whether participants’ perceived performance might correspond with objective measures of plan completion—such as
reduced instances of violations remaining in the plan or instances of double banding.
B. Self-Scheduling with Playbook
Playbook is a mobile, web-based scheduling software tool that has been used on several analog missions and adapted to fit a variety of mission profiles [3][4]. Currently, Playbook is used for planning, self-scheduling, and viewing of a
mission plan at a strategic and tactical level. During the planning phase, activities are defined, their constraints are
modeled, and activities are scheduled. Self-scheduling allows crew to schedule and/or reschedule activities; at this
point, activities are already defined and may have constraints. In the spaceflight context, large time-consuming
schedule changes occur frequently mid-mission [5]. Planners leverage activity constraint modeling to ensure that
schedules abide by the complex set of requirements and constraints involved in a mission. While some planning and
scheduling systems leverage automatic planners and/or leverage temporal flexibility in timelines [6], our team has
found that mixed-initiative planning and scheduling best supports self-scheduling [13]. Modeling and visualizing
constraints gives planners, and hence crew, the flexibility to schedule activities while still abiding by the required complex set of spaceflight operational constraints.
Playbook displays the schedule of all crewmembers horizontally (Fig 1). Crew schedules are composed of multiple
activities that are assigned to the various crewmembers. Activities can have constraints, such as “must be done at a
particular time of day” or “requires communication bandwidth to complete.” At the time of the HERA experiment
(2016), most constraints could be modeled and visualized by Playbook, though some could not (e.g., “a crewmember can only be assigned 2 hours of exercise”). Modeled constraints are constraints that Playbook understands from
metadata associated with activities.
Fig 1. Timeline view within Playbook
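For illustration, the sketch below shows how a modeled constraint of the kind described above (“must be done at a particular time of day”) might be encoded as activity metadata and checked for violations. The class and field names are hypothetical and do not reflect Playbook’s actual schema.

```python
from dataclasses import dataclass, field
from datetime import time

@dataclass
class TimeOfDayConstraint:
    # Hypothetical modeled constraint: the activity must start within
    # a given window ("must be done at a particular time of day").
    earliest: time
    latest: time

    def violated_by(self, start: time) -> bool:
        return not (self.earliest <= start <= self.latest)

@dataclass
class Activity:
    name: str
    assignee: str
    duration_min: int
    constraints: list = field(default_factory=list)

    def count_violations(self, start: time) -> int:
        # Each unmet modeled constraint counts as one violation, which
        # the scheduling tool can surface visually (e.g., a red border).
        return sum(c.violated_by(start) for c in self.constraints)

exercise = Activity("Exercise", "CM-1", 60,
                    [TimeOfDayConstraint(time(6, 0), time(9, 0))])
print(exercise.count_violations(time(10, 30)))  # -> 1 (starts too late)
```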
IV. Experimental Design & Methods
A. Experimental Design
Each crewmember was asked to complete seven self-scheduling exercises over the course of their 30-day mission.
To ensure consistency, the seven self-scheduling exercises were done on the same mission day for each of the four
missions across Campaign 3. We scheduled one hour of the crew's time for the self-scheduling exercises. Three of
those exercises were done individually (without collaboration or assistance from other crewmembers) and four of
those exercises were done collaboratively as a team of four. A total of 16 crewmembers each completed seven
exercises. Crewmembers submitted a subjective rating of task difficulty (through a survey) after each exercise was
complete. Self-scheduling exercises were designed to increase in difficulty over time, presenting new challenges, varied scheduling complexity, and varied constraint types.
The crew was required to use the task list to self-schedule. The task list is a view within Playbook that presents a
list of activities and groups that are unscheduled or have yet to be placed in the timeline. Alongside each activity
within the task list is information related to that activity, such as duration, assignment, and constraints.
Crewmembers had to use the timeline, the task list, and other views to complete the self-scheduling exercises. The
exercises also asked the crew to follow a procedure matrix that listed each exercise’s respective activities and groups
along with the constraints associated with each. The self-scheduling procedure matrix consisted of a spreadsheet (Fig
2) that provided a high-level description of the exercise and asked the crew to schedule activities on either a single
day or across two mission days. The matrix listed each activity that needed to be scheduled along with its crew assignment, a description of the activity, and its equipment and schedule constraints. If a modeled constraint was not met when scheduled, the violated activity would have a red border,
indicating that the activity should be rescheduled. The matrix and procedure were provided in PDF format, separate
from the Playbook interface. Crewmembers were asked to create violation-free plans that satisfied activity constraints
and to not double band multiple activities in the timeline (i.e., in a violation-free plan, crewmembers can only be
assigned one activity at a particular time).
Fig 2. Example of an Exercise matrix listing activities and corresponding constraints
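To make the double-banding rule concrete, the following minimal sketch counts overlaps between consecutive same-crew activities in a plan; the dictionary fields are illustrative, not Playbook’s schema.

```python
from collections import defaultdict

# Count double bands: in a violation-free plan a crewmember holds at most
# one activity at any time, so an overlap between two activities assigned
# to the same crewmember is a double band. Times are minutes from the
# start of the mission day.
def count_double_bands(plan):
    by_crew = defaultdict(list)
    for act in plan:
        by_crew[act["crew"]].append((act["start"], act["end"]))
    double_bands = 0
    for intervals in by_crew.values():
        intervals.sort()
        # Compare consecutive activities after sorting by start time.
        for (s1, e1), (s2, e2) in zip(intervals, intervals[1:]):
            if s2 < e1:  # next activity starts before the previous ends
                double_bands += 1
    return double_bands

plan = [
    {"crew": "CM-1", "start": 540, "end": 600},  # 09:00-10:00
    {"crew": "CM-1", "start": 570, "end": 630},  # 09:30-10:30, overlaps
    {"crew": "CM-2", "start": 540, "end": 600},
]
print(count_double_bands(plan))  # -> 1
```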
B. Methods
Our post-hoc analysis leveraged the Playbook Data Analysis Tool (PDAT) to track the state of a particular plan and experimental trial before and after an exercise was completed. PDAT enables passive data capture through screen recordings, logging of interactions between the user and Playbook, and saving of plan states before and after exercises [10]. Each captured interaction is characterized by gesture type, feature, and the activity the user is interacting with.
Leveraging the data logs, we were able to identify several objective measures of participant performance from the
Campaign 3 exercises, including overall completion of plans, time to complete each exercise (time-on-task), number
of violations left in the timeline, and number of instances of double banding over the course of an exercise. Plan
completion was measured as either complete or incomplete. If an activity did not conform to the listed constraint in
the procedure matrix, it was “in violation.” Each constraint not met was counted as a single violation, regardless of its type (e.g., “Comm,” “Equipment,” or “Schedule” constraints; Fig 2). The total
number of violations remaining in a plan was determined by comparing the state of the plan from the beginning of the
trial with the end of the exercise, and verifying that each constraint listed in the procedure was met. Violations and
instances of double banding were counted through analysis of PDAT recordings or through the visual representations of modeled violations.
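As a rough sketch of this counting procedure, the snippet below checks a final plan state against the constraints listed in a procedure matrix, counting one violation per unmet constraint regardless of type; the data structures and checks are hypothetical.

```python
# Each procedure-matrix row is reduced to a predicate over the final plan
# state; any predicate that fails adds one violation, whatever its type
# ("Comm", "Equipment", "Schedule", ...).
def count_remaining_violations(final_plan, procedure_constraints):
    return sum(1 for check in procedure_constraints if not check(final_plan))

final_plan = {"Meal Prep": {"start": 720}, "Comm Pass": {"start": 780}}
constraints = [
    lambda p: p["Meal Prep"]["start"] >= 700,  # schedule constraint: met
    lambda p: p["Comm Pass"]["start"] <= 760,  # comm window: violated
]
print(count_remaining_violations(final_plan, constraints))  # -> 1
```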
Subjective measures of participant performance were identified via a survey administered at the end of each
exercise. The survey asked for the crewmember’s role, the level of perceived exercise difficulty on a Likert scale ranging from “Very Simple” (1) to “Very Difficult” (5), and any comments related to the exercise.
An additional survey was administered at the end of each mission to gather additional feedback on Playbook as well
as the experiment overall.
Despite our best efforts, several factors limited our analysis, including inconsistent application of scheduling task complexity factors across exercises, data log inconsistencies, and missing data logs. Trials with incomplete survey responses or missing time-on-task information were omitted from the analysis. Trials missing violation data were excluded only from the violation analyses. Subjective ratings of user difficulty were obtained from all participants for all exercises. While missing and incomplete data may have obscured
the identification of trends in the analysis, this study identifies salient elements of participant performance and the
self-scheduling experiment itself to be reconsidered for implementation in future experiments.
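A minimal sketch of this filtering rule, assuming trials are rows of a pandas DataFrame with illustrative column names (not the PDAT schema):

```python
import pandas as pd

trials = pd.DataFrame({
    "rating":       [3, 2, None, 4],
    "time_on_task": [32.8, None, 19.3, 28.0],
    "violations":   [5, 2, 0, None],
})
# Trials missing a survey rating or time-on-task are dropped entirely;
# trials missing violation counts are excluded only from violation analyses.
analyzed = trials.dropna(subset=["rating", "time_on_task"])
violation_subset = analyzed.dropna(subset=["violations"])
print(len(trials), len(analyzed), len(violation_subset))  # -> 4 2 1
```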
V. Analysis & Results
A. Approach
Participant data were grouped into three categories: plan complexity data, performance data, and survey results.
Our quantitative data was used to evaluate the role of plan complexity on performance and to understand how
individual performance compares to collaborative performance. Exercise difficulty was determined by evaluating
multiple elements of plan complexity (i.e., number of activities, number of constraints, and constraint types). Subject
matter experts were also asked to provide subjective ratings of exercise difficulty as a baseline for comparing crew
ratings.
B. Time-on-Task
Time-on-task (in minutes) is the length of time it took a participant to complete one self-scheduling exercise. The
mean time-on-task across C3 was 29.5 minutes for individual exercises and 22.9 minutes for collaborative exercises
(see Table 1).
                  Ex 1      Ex 2      Ex 3      Ex 4      Ex 5      Ex 6      Ex 7      Indiv.     Collab.
                  (Indiv.)  (Collab.) (Indiv.)  (Indiv.)  (Collab.) (Collab.) (Collab.) Exercises  Exercises
Mean              32.8      19.26     23.23     31.17     27.29     20.93     23.1      29.52      22.95
Median            34.94     19.3      20.43     28.02     22.11     15.87     17.7      26.77      20.46
Min               14.97     12.5      12.2      20.2      18.5      13.63     13.72     12.2       12.53
Max               43.7      25.9      52.6      47.1      46.43     30.4      45.7      47.1       46.43
Std. Deviation    8.7       5.49      11.96     10.11     12.96     8.05      15.1      10.69      10.35

Table 1. Descriptive Statistics of Time-on-Task (minutes) for HERA C3
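Summary statistics like those in Table 1 can be reproduced from the per-trial logs; below is a minimal pandas sketch with one row per trial and illustrative values (not the HERA data).

```python
import pandas as pd

trials = pd.DataFrame({
    "exercise":     [1, 1, 3, 4, 2, 5],
    "type":         ["indiv", "indiv", "indiv", "indiv", "collab", "collab"],
    "time_on_task": [32.8, 43.7, 23.2, 31.2, 19.3, 27.3],
})
# Descriptive statistics per exercise type, mirroring Table 1's rows.
summary = trials.groupby("type")["time_on_task"].agg(
    ["mean", "median", "min", "max", "std"])
print(summary.round(2))
```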
In Figure 3, time-on-task is plotted sequentially in order of completion. The median is represented as the black
line within the box range, and the mean is represented as the white dot within the box range. The individual exercises
(1, 3, and 4) were designed to be progressively more challenging. A learning effect was not observed. Overall, the
trend indicates exercises completed individually took slightly longer than those completed as a team (i.e.,
collaborative), though the average difference is only about 6 minutes.
Fig 3. Average Time-on-task, Individual & Collaborative Exercises
C. Subjective Difficulty & Time-on-Task
Figure 4 shows the distribution of subjective difficulty (as ranked by participants) for all the exercises. Notably,
only one participant (out of 16) rated one exercise (Exercise 5) as “Very Difficult”. In general, most participants rated the exercises a 2 or 3. The collaborative exercises (2, 5, 6, and 7) tended to be rated a 2. While subjective ratings are useful, they do not
necessarily correlate with objective scheduling performance measures, as will be described below.
Fig 4. Subjective Difficulty Rating per Exercise
Our analysis explored whether subjective difficulty was correlated with time-on-task (Fig 5). The exercises completed individually and those completed collaboratively were evaluated separately. Collaborative exercises have only one time-on-task value (the result of multiple participants working together), whereas each analog crewmember completing an individual exercise had a distinct measured time-on-task. A salient trend for individual exercises is that
time-on-task seems to increase with subjective difficulty rating. In collaborative exercises, most participant groups
rated plan difficulty lower than individual participants, with few difficulty ratings given as 4 or 5. Collaborative
exercises tended to be completed within 15 to 30 minutes, with few instances taking over 45 minutes.
Fig 5. Time-on-task v. User Difficulty; Individual & Collaborative Exercises
While we identified that time spent on individual exercises increases with subjective difficulty rating, we found
no relationship between time spent on collaborative exercises and subjective difficulty. Self-reported subjective ratings
of difficulty suggest that over time, crewmembers found the exercises easier; however, the variability suggests that
other factors (such as plan complexity) played a role.
D. Violations & Constraints
We expected that increased plan complexity would lead to increased violations, time-on-task, and overall
difficulty. However, this was not always the case; the number of violations across each exercise was not correlated
with plan complexity. With respect to the total number of violations remaining in the timeline at the conclusion of an
exercise, the data is quite sparse. For individual exercises across 4 missions, violation data was missing for roughly
20% of trials due to data loss. Nonetheless, in all individual exercises (1, 3, and 4), 50% or more of trials resulted in fewer than 10 violations remaining in the plan at the conclusion of the exercise. The average number of violations for
individual exercises was 2.5, whereas the average number of violations remaining in the plan for collaborative
exercises was 5. There appears to be a slight trend due to order—the more exercises completed by participants, the
fewer violations remained (see Table 2). However, the number of potential or possible violations across exercises is
not consistent, and this may be due to the exercise setup. Exercise 1 appears to have the most variability with respect
to the number of violations—this might indicate that participants did not receive sufficient training to easily complete
the first task. Data on violations remaining in collaborative exercises are too sparse to support any definitive trends, and violation data could not be retrieved for missions 3 and 4 due to data loss (missing files).
Comparing remaining violations to modeled constraints within exercises, a slight trend suggests that performance on individual exercises improved over time. However, this pattern is not indicative of performance on collaborative exercises, given that the data and sample size are severely limited. In comparing both violations and double banding remaining in the plan with user difficulty, subjective ratings of difficulty do not seem to be indicative of performance. In instances where more violations occurred yet users reported low subjective difficulty, the instructions for the plan activity may have been misinterpreted or may not have been clearly described.
                  Ex 1     Ex 2      Ex 3     Ex 4     Ex 5      Ex 6      Indiv.     Collab.
                  (Indiv)  (Collab)  (Indiv)  (Indiv)  (Collab)  (Collab)  Exercises  Exercises
Mean              6.58     6.5       4        5.09     4.5       3.86      2.5        5
Median            6.5      6.5       3        3        4.5       3         2.5        4
Min               0        1         0        1        4         3         2          1
Max               15       12        11       11       5         5         3          12
Std. Deviation    4.7      5.88      4.24     4.23     0.53      1.07      0.71       3.56

Table 2. Descriptive Statistics of Violations Remaining in Plan for HERA C3
E. Correlation Analysis
To understand the impacts of plan complexity on performance, we ran a correlation analysis on four variables: time-on-task, subjective difficulty, number of constraints, and number of violations remaining post-exercise. Our correlation analysis was broken down further into individual and collaborative exercise groups to understand the differences between the two exercise types. We used Pearson's correlation coefficient (r) [11] to understand the relationship between participants’ subjective ratings of difficulty and their performance. In Table 3, we present correlations between the four variables along with the number of data points (n), the mean (M), and the standard deviation (SD). We found a significant correlation between time-on-task and subjective difficulty for individual exercises, r(34) = .40, p = .019. The correlation decreases for the collaborative exercise data set, r(15) = .17, p = .52. Constraints and resulting violations were not correlated with time-on-task or subjective difficulty. There was a moderate negative correlation between violations and time-on-task for collaborative exercises. It may be that constraints were not fully understood by participants, likely due to the large number of constraints (ranging from 81 to 138 per exercise).
Individual Exercises
                          n    M       SD     1      2      3      4
1. Time-on-task           33   29.52   10.69  -
2. Subjective Difficulty  33   2.73    0.94   0.40*  -
3. Violations             20   8.85    7.14   -0.01  -0.05  -
4. Constraints            33   95.30   12.33  0.22   0.12   -0.23  -

Collaborative Exercises
                          n    M       SD     1      2      3      4
1. Time-on-task           17   23.12   10.04  -
2. Subjective Difficulty  17   2.18    0.88   0.17   -
3. Violations             6    5.67    4.08   -0.28  -0.37  -
4. Constraints            13   106.46  23.36  0.34   -0.27  -0.15  -

Table 3. Correlation Analysis, *p < .05
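Correlations of the form reported in Table 3 can be computed with scipy's pearsonr; the sketch below uses illustrative data, not the HERA measurements.

```python
from scipy import stats

time_on_task = [33, 26, 41, 18, 35, 29, 44, 22, 31, 38]  # minutes
difficulty   = [3,  2,  4,  1,  3,  2,  5,  2,  3,  4]   # Likert 1-5
r, p = stats.pearsonr(time_on_task, difficulty)
# Report in the r(df) = r, p = p form used above, with df = n - 2.
print(f"r({len(difficulty) - 2}) = {r:.2f}, p = {p:.3f}")
```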
F. Additional Qualitative Feedback
The crew was asked to complete an additional questionnaire at the conclusion of each mission, providing feedback on improvements, suggestions, and feature requests for Playbook. The survey also prompted users
to comment on the tool's general operability and ease-of-use within a communication-delayed environment. The data
from the surveys and questionnaires were placed into an affinity map (a thematic analysis method) [12] used to
generate emergent insights, recommendations, and feature improvements for Playbook. Our insights were grouped
into two categories: 1) improvements to the self-scheduling exercises and 2) Playbook's ability to communicate and represent
the implications of constraints within the tool.
The survey responses indicated that crewmembers experienced frustration due to the separation between the self-
scheduling exercise timeline and the actual operational mission timeline. Self-scheduling exercises used a separate
plan with unique activities and goals, which required reorientation. This insight suggests that, to better understand crew self-scheduling, future self-scheduling exercises should either take place within the actual operational mission timeline or use the same activities and groups as the mission timeline to reduce confusion.
Participants were also required to have two web browser tabs open at once: the exercise plan and the procedure matrix.
The survey results indicated that going back and forth between the matrix and the plan was inconvenient and often
frustrating for crewmembers. Crewmembers had to commit several tasks and constraints to memory which led to
increased frustration. Crewmembers suggested that physical paper copies of the matrix may help reduce cognitive load.
Participants also found the wording and complexity of the matrix and instructions confusing, likely adding time to
the self-scheduling exercise. One recurring request from participants was the ability to highlight matrix steps so that crew could easily identify their current step. Certain activities within the timeline contained modeled constraints; if these constraints were not met during trials, the activity would produce a violation (represented as a red border within Playbook). Other activities were not modeled and were represented only in the matrix, and therefore did not or could not produce a visual violation affordance. In future experiments such as HERA C6, Playbook is anticipated to model more constraints so that the visual affordance will apply to all violated activities.
Conversely, crewmembers suggested adding a positive affordance within the interface to indicate when an activity
was scheduled properly.
VI. Conclusions
After analyzing various measures associated with self-scheduling within NASA’s HERA C3, we found that subjective difficulty had a significant, positive correlation with time-on-task for individual exercises but did not identify a similar trend among collaborative exercises. During collaborative exercises, participants reported only average levels of difficulty even as exercises became more difficult over the course of the mission. Future self-scheduling experiments for long duration missions should further characterize the ways in which collaborative performance differs from individual crewmember performance in self-scheduling. Additional avenues of research may strive to improve constraint intelligibility to eliminate potential confusion or performance decrements by participants. HERA C6 (planned for 2021 and 2022) will use Playbook as a planning and scheduling
tool, and self-scheduling will occur within the operational mission timeline. In anticipation of C6, we have identified
new countermeasures and design aids including expanded constraint modeling and communication, priority indicators,
and expanded visualizations. This future experiment is larger in scope and will produce new insight into plan quality
as well as self-scheduling strategies used to solve difficult scheduling problems.
References
[1] Perez, J., “HERA - Human Exploration Research Analog,” URL: https://www.nasa.gov/analogs/hera [retrieved 1 August 2021].
[2] Mars, K., “HERA Research by Campaign,” URL: https://www.nasa.gov/analogs/hera/research [retrieved 1 August 2021].
[3] Marquez, J. J., Pyrzak, G., Hashemi, S., McMillin, K., and Medwid, J., “Supporting Real-Time Operations and Execution through Timeline and Scheduling Aids,” 43rd International Conference on Environmental Systems, Vail, CO, 2013, doi: 10.2514/6.2013-3519.
[4] Marquez, J. J., Hillenius, S., Kanefsky, B., Zheng, J., Deliz, I., and Reagan, M., “Increasing Crew Autonomy for Long Duration Exploration Missions: Self-Scheduling,” 2017 IEEE Aerospace Conference, 2017, doi: 10.1109/AERO.2017.7943838.
[5] Dempsey, R. C. (ed.), The International Space Station: Operating an Outpost in the New Frontier, National Aeronautics and Space Administration, 2018, URL: https://www.nasa.gov/connect/ebooks/the-international-space-station-operating-an-outpost.
[6] Muscettola, N., “HSTS: Integrating Planning and Scheduling,” Carnegie Mellon University, The Robotics Institute, Pittsburgh, PA, 1993, URL: https://apps.dtic.mil/sti/citations/ADA266991.
[7] Mailliez, M., Battaïa, O., and Roy, R., “Scheduling and Rescheduling Operations Using Decision Support Systems: Insights From Emotional Influences on Decision-Making,” Frontiers in Neuroergonomics, 2021, doi: 10.3389/fnrgo.2021.586532.
[8] Reppa, I., McDougall, S., Sonderegger, A., and Schmidt, W. C., “Mood Moderates the Effect of Aesthetic Appeal on Performance,” Cognition and Emotion, 2021, doi: 10.1080/02699931.2020.1800446.
[9] Moshagen, M., Musch, J., and Göritz, A., “A Blessing, Not a Curse: Experimental Evidence for Beneficial Effects of Visual Aesthetics on Performance,” Ergonomics, 2009, doi: 10.1080/00140130903061717.
[10] Kanefsky, B., Zheng, J., Deliz, I., Marquez, J. J., and Hillenius, S., “Playbook Data Analysis Tool: Collecting Interaction Data from Extremely Remote Users,” AHFE 2017, 2018, doi: 10.1007/978-3-319-60492-3_29.
[11] Freedman, D., Pisani, R., and Purves, R., Statistics, 4th ed., W. W. Norton & Company, New York, 2007.
[12] Pernice, K., “Affinity Diagramming: Collaboratively Sort UX Findings & Design Ideas,” Nielsen Norman Group, URL: https://www.nngroup.com/articles/affinity-diagram/ [retrieved 2 August 2021].
[13] Bresina, J. L., and Morris, P. H., “Mixed-Initiative Planning in Space Mission Operations,” AI Magazine, Vol. 28, No. 2, 2007, p. 75, doi: 10.1609/aimag.v28i2.2041.