# Estimating Completion Rates from Small Samples Using Binomial Confidence Intervals: Comparisons and Recommendations

Authors:
• MeasuringU

Jeff Sauro
Oracle
Denver, CO USA
jeff.sauro@oracle.com
James R. Lewis
IBM
Boca Raton, FL
jimlewis@us.ibm.com
The completion rate – the proportion of participants who successfully complete a task – is a common
usability measurement. As is true for any point measurement, practitioners should compute appropriate
confidence intervals for completion rate data. For proportions such as the completion rate, the appropriate
interval is a binomial confidence interval. The most widely-taught method for calculating binomial
confidence intervals (the “Wald Method,” discussed both in introductory statistics texts and in the human
factors literature) grossly understates the width of the true interval when sample sizes are small.
Alternative “exact” methods over-correct the problem by providing intervals that are too conservative.
This can result in practitioners unintentionally accepting interfaces that are unusable or rejecting interfaces
that are usable. We examined alternative methods for building confidence intervals from small sample
completion rates, using Monte Carlo methods to sample data from a number of real, large-sample usability
tests. It appears that the best method for practitioners to compute 95% confidence intervals for
small-sample completion rates is to add two successes and two failures to the observed completion rate, then
compute the confidence interval using the Wald method (the “Adjusted Wald Method”). This simple
approach provides the best coverage, is fairly easy to compute, and agrees with other analyses in the
statistics literature.
## Introduction
Estimating completion rates with small samples is an
important and challenging task. Confidence intervals
are taught as an appropriate way to qualify results from
small samples. The addition of confidence intervals to
completion rate estimates helps both the engineer and
readers of usability reports understand the variability
inherent in small samples. While the importance of
adding confidence intervals is widely agreed upon, the
best method for computing them is not.
Most practitioners interpret a 95% confidence interval
to indicate that in 95 out of 100 experiments, the
interval constructed from the sample will contain the
true value for the population. The extent to which this
is the case for any given method of computing intervals
is the “coverage” for that method.
The Wald method is the most commonly presented
formula for calculating binomial confidence intervals
(see Figure 1 below).
Figure 1: Wald Confidence Interval
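In its standard textbook form, the Wald interval for $x$ successes out of $n$ users is $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = x/n$ is the observed completion rate and $z_{1-\alpha/2} \approx 1.96$ for 95% confidence. The Python sketch below is a minimal illustration, not the authors' calculator. The function name `wald_interval` is hypothetical, and truncating the endpoints to the 0 to 1 range is an assumption that matches the 100% upper limit reported for the Wald method in Table 1.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), truncated to [0, 1]."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n  # observed completion rate
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Truncate to the range a proportion can take.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Four of five users complete the task (the Table 1 example).
low, high = wald_interval(4, 5)
print(f"{low:.1%} to {high:.1%}")  # prints: 44.9% to 100.0%
```

When all users pass or all fail, the estimated standard error is zero and the interval collapses to a single point, one concrete way the Wald method understates uncertainty with small samples.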
Task completion rates are often modeled using a
binomial distribution because the outcome of each task
attempt is usually binary (complete / didn’t
complete). The Wald interval is simple to compute, has
been around for some time (Laplace, 1812) and is
presented in most introductory statistics texts and some
writings in the human factors literature (e.g., Landauer,
1997). Unfortunately, it produces intervals that are too
narrow when samples are small, especially when the
completion rate is not near 50%. Under these
conditions its average coverage is approximately 60%,
not 95% (Agresti and Coull, 1998). This is a real
problem, considering that human factors (HF) practitioners rely on
confidence intervals to have true coverage that is equal
to nominal coverage in the long run.
To improve the poor average coverage of the Wald
interval, advanced statistics texts often present a more
complicated method called the Clopper-Pearson or
“Exact” method (see Figure 2 below).
PROCEEDINGS of the HUMAN FACTORS AND ERGONOMICS SOCIETY 49th ANNUAL MEETING—2005 2100
Figure 2: “Exact” / “Clopper-Pearson” Interval
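The Clopper-Pearson bounds are often written with beta or F quantiles. An equivalent definition inverts the binomial tail probabilities: the lower bound is the $p$ at which observing $x$ or more successes has probability $\alpha/2$, and the upper bound is the $p$ at which observing $x$ or fewer successes has probability $\alpha/2$. The sketch below solves those two equations by bisection so that it needs only the Python standard library. The bisection approach and the names `binom_cdf`, `bisect_root`, and `exact_interval` are illustrative choices, not necessarily the form shown in Figure 2.

```python
import math
from typing import Callable


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f: Callable[[float], float], lo: float = 0.0, hi: float = 1.0,
                iters: int = 60) -> float:
    """Root of a monotone f on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def exact_interval(successes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson ("Exact") interval, found by inverting binomial tail probabilities."""
    tail = alpha / 2
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= x | p) = tail, where P(X >= x) = 1 - P(X <= x - 1).
        lower = bisect_root(lambda p: (1 - binom_cdf(successes - 1, n, p)) - tail)
    if successes == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= x | p) = tail.
        upper = bisect_root(lambda p: binom_cdf(successes, n, p) - tail)
    return lower, upper


low, high = exact_interval(4, 5)
print(f"{low:.1%} to {high:.1%}")  # prints: 28.4% to 99.5%
```

For four of five users this reproduces the Exact row of Table 1, and for one of five users it gives the 0.5% to 71.6% interval quoted in the Method section.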
The Exact method provides more reliable confidence
intervals with small samples (Clopper and Pearson,
1934) and has also been discussed in the HF literature
(e.g., Lewis, 1996, and Sauro, 2004). In actual
practice, however, the Exact interval produces overly
conservative confidence intervals with true coverage
closer to 99% when the nominal confidence is 95%. It
is especially prone to this overly conservative
behavior when sample sizes are small (n < 15) (Agresti
and Coull, 1998). Thus, Exact intervals are too wide
and Wald intervals are too narrow.
A third method called the “Score” interval (Wilson,
1927) is not overly conservative, and provides average
coverage near 95% for nominal 95% intervals.
Unfortunately, its computation is as cumbersome as the
Exact method (see Figure 3 below), and it has some
serious coverage problems for certain values when the
completion rate is near 0 or 1 (Agresti and Coull,
1998).
Figure 3: “Score” / Approximate Interval
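One common closed form of the Wilson score interval, with $\hat{p} = x/n$ the observed completion rate for $x$ successes out of $n$ users and $z = 1.96$ for 95% confidence, is

$$\frac{\hat{p} + \frac{z^2}{2n}}{1 + \frac{z^2}{n}} \pm \frac{z}{1 + \frac{z^2}{n}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}$$

A minimal Python sketch of that formula follows; `score_interval` is a hypothetical name, and the formula may be arranged differently in Figure 3.

```python
import math


def score_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for `successes` out of `n` trials."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom  # pulled toward 0.5
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


low, high = score_interval(4, 5)
print(f"{low:.1%} to {high:.1%}")  # prints: 37.6% to 96.4%
```

Unlike the Wald interval, the score interval's endpoints always fall between 0 and 1, so no truncation is needed; for four of five users the sketch reproduces the Score row of Table 1.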
Another alternative method, named the Adjusted Wald
method by Agresti and Coull (1998, based on work
originally reported by Wilson, 1927), simply requires,
for 95% confidence intervals, the addition of two
successes and two failures to the observed completion
rate, then uses the Wald formula to compute the 95%
binomial confidence interval. Its coverage is as good as
that of the Score method for most values of the population completion rate p, and is usually
better when the completion rate approaches 0 or 1.
The method is astonishingly simple, and has been
recommended in the statistical literature (Agresti and
Coull, 1998). The “add two successes and two
failures” rule (equivalently, adding two to the numerator and four to the
denominator) is derived from the critical value of the
normal distribution for 95% intervals (1.96, which is
approximately 2). Squaring this critical value provides
the 4 for the denominator, and half of it, 2, is added to the numerator. For example, an observed
completion rate of 80% with 10 users (8 successes and
2 failures) would be converted to 10 successes and 4
failures (an adjusted rate of 10/14, or 71.4%), and these values would then be used in the
Wald formula.
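A minimal Python sketch of the Adjusted Wald calculation, assuming the general Agresti-Coull form in which $z^2/2$ successes and $z^2/2$ failures are added (1.92 each when $z = 1.96$, which the text rounds to 2). The name `adjusted_wald_interval` and the truncation of endpoints to the 0 to 1 range are assumptions for illustration.

```python
import math


def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Agresti-Coull ("Adjusted Wald") interval, truncated to [0, 1].

    Adds z**2 / 2 successes and z**2 / 2 failures, then applies the
    ordinary Wald formula to the adjusted counts.
    """
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# 8 of 10 users succeed; with z = 1.96 this is roughly 10 successes out of 14.
low, high = adjusted_wald_interval(8, 10)
print(f"{low:.1%} to {high:.1%}")  # prints: 47.9% to 95.4%
```

Adding exactly two successes and two failures instead (10 of 14) moves the endpoints for this example by less than half a percentage point, to 47.8% and 95.1%.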
Table 1 displays the four differing results for each of
the interval methods for a sample of five users with
four successes and one failure (80% completion rate).
Table 1: 95% confidence intervals by method for an
80% completion rate (4 successes, 1 failure)

| CI Method | Low (%) | High (%) | CI Width |
|---|---|---|---|
| Exact | 28.4 | 99.5 | 71.1 |
| Score | 37.6 | 96.4 | 58.8 |
| Wald | 44.9 | 100 | 55.1 |

As can be seen from Table 1, the different methods
provide different end points and differing confidence
interval widths. While one would like a narrower
confidence interval (which provides less uncertainty),
the interval should not be so narrow as to exclude more
completion rates than expected from the stated or
nominal rate – that is, a nominal 95% confidence
interval should have a likelihood of 95% of containing
the population parameter. The implication is clear:
depending on which method the HF practitioner
chooses, the boundaries presented with a completion
rate can lead to different conclusions about the usability of an interface.
The Wald and Exact methods are by far the most
popular ways of calculating confidence intervals.
Depending on which method practitioners are using to
calculate their intervals, they will either work with
intervals that provide a false sense of precision (Wald
method) or work with intervals that are consistently
wider than their nominal confidence level requires (Exact
method). If the Adjusted Wald method can provide the
best average coverage while still being relatively simple
to compute (as suggested in the statistical literature,
Agresti and Coull, 1998), it will provide the HF
practitioner with the easiest and most precise way of
computing binomial confidence intervals for small
samples.
## Method
One way to test the effectiveness of a confidence
interval calculation is to draw many samples
from a larger data set and see how often the calculated
confidence intervals contain the actual completion
rate of the data set. We took data from several tasks
across five usability evaluations with completion rates
between 20.4% and 97.8%. The usability analyses were
performed on commercially available desktop and web-based
software applications in the accounting industry.
We used these completion rates as the best estimate of the
population completion rate.
Using a Monte Carlo simulation method written in
Minitab, we took 10,000 unique random samples of 5,
10, and 15 task outcomes (hypothetical users) to test each of the
confidence interval methods (Wald, Exact, Score, and
Adjusted Wald). We then counted how many of the
10,000 intervals computed with each method failed to contain
the population completion rate. For example, one
sample of 5 users from a dataset with a population
completion rate of 65.3% had one success
and four failures (a 20% completion rate). The Exact
method provided a 95% confidence interval from 0.5%
to 71.6%, so it did contain the true population
completion rate of 65.3%. The Score method provided
an interval from 3.6% to 62.5%, so it did not contain the
true rate. Because we calculated nominal 95% confidence
intervals, we expect coverage of 95%. In other words,
about 9,500 of the 10,000 intervals computed during a
Monte Carlo simulation should contain the true value.
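The coverage check described above can be sketched in a few lines of Python. This is not the authors' Minitab macro: the population of 49 outcomes with 42 successes (about 85.7%) is a hypothetical stand-in for one of the large-sample tasks, only the Wald and Adjusted Wald methods are included to keep the example short, and drawing samples without replacement from the finite population is an assumption about how the original samples were taken.

```python
import math
import random
from typing import Callable

Interval = tuple[float, float]


def wald(x: int, n: int, z: float = 1.96) -> Interval:
    p = x / n
    h = z * math.sqrt(p * (1 - p) / n)
    return p - h, p + h


def adjusted_wald(x: int, n: int, z: float = 1.96) -> Interval:
    n_adj = n + z**2
    p = (x + z**2 / 2) / n_adj
    h = z * math.sqrt(p * (1 - p) / n_adj)
    return p - h, p + h


def coverage(method: Callable[[int, int], Interval], population: list[int], n: int,
             trials: int = 10_000, seed: int = 1) -> float:
    """Share of small-sample intervals that contain the population completion rate."""
    rng = random.Random(seed)
    true_rate = sum(population) / len(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)  # n distinct users, without replacement
        low, high = method(sum(sample), n)
        # Truncating to [0, 1] would not change containment of a rate in [0, 1].
        hits += low <= true_rate <= high
    return hits / trials


# Hypothetical population: 49 users, 42 of whom completed the task.
population = [1] * 42 + [0] * 7
for name, method in [("Wald", wald), ("Adjusted Wald", adjusted_wald)]:
    print(name, [f"{coverage(method, population, n):.1%}" for n in (5, 10, 15)])
```

With this hypothetical population, the five-user Wald coverage should land near 54% and the Adjusted Wald coverage near 98%, in line with the 85.7% column of Table 2.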
### A Note on the Methodology
We could have chosen any hypothetical completion
rates to test the confidence intervals (as is often the
case in the statistical literature) but we used values
from a known large sample usability study so as to
focus our analysis on likely completion rates for
commercially available software. While the HF
practitioner usually doesn’t know ahead of time what
the population completion rate is, this exercise allowed
us to work backwards to see how well the smaller
samples predicted the known completion rates. We
were in essence running 10,000 usability evaluations
with small samples, calculating the confidence interval
with the different methods, and seeing how many times
the known completion rate was contained within the
intervals. While a data set of 49 users may not seem large
enough to yield 10,000 distinct samples, even this
modest data set contains about 2 million unique
combinations of five users.
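The combination count is easy to confirm with Python's `math.comb` (available in Python 3.8 and later):

```python
import math

# Distinct groups of five users that can be drawn from a data set of 49 users.
print(f"{math.comb(49, 5):,}")  # prints: 1,906,884
```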
## Results
Table 2 contains the results of Monte Carlo simulations
for nine tasks with varying completion rates (e.g.,
91.8%, 93.8%, etc.) for sample sizes of 5, 10 or 15. As
expected, the Wald interval provided the worst
coverage, containing the actual proportion only 10% of
the time for the task with a 97.8% completion rate and
a sample of five users. To find this value, locate the Wald method
in the bottom left cell of Table 2. Next, find the
intersection with the completion rate of 97.8% (the
rightmost column). The first value in this cell (10.06)
means that 10.06% of the calculated intervals
contained the true value using the Wald method with a
sample of 5 users (the second and third values are for
10- and 15-user samples, respectively). For the Wald
method to be a legitimate method to apply to these
types of data, one would expect this value to be
approximately 95%. Even at the less extreme
completion rate of 85.7%, the Wald interval only
contained the true value about half of the time
(53.75%) – a far cry from the 95% many practitioners
would have expected from a nominal 95% confidence
interval calculation.
The Exact interval showed the expected conservative
coverage, with many of the nominally 95% confidence
intervals containing the true completion rate in over 99% of the
10,000 samples (see especially the completion rates above 90% in
Table 2). The Adjusted Wald and Score methods
provided average coverage closest to the 95% nominal
level, which confirms earlier recommendations in the
statistical literature (Agresti and Coull, 1998). The
mean and standard deviation of the coverage for each
of the methods appear in Table 3.

Table 2: Percent coverage for nine task completion rates by confidence interval method and number of users.
Expected coverage is 95.0. Values are derived from sampling 5, 10 or 15 completion rates (or hypothetical users)
10,000 times. Each cell lists the coverage for 5 / 10 / 15 users.

| CI Method | Users | 20.4% | 42.9% | 61.2% | 65.3% | 77.6% | 85.7% | 91.8% | 93.8% | 97.8% |
|---|---|---|---|---|---|---|---|---|---|---|
| Exact | 5 / 10 / 15 | 99.5 / 99.72 / 97.73 | 98.74 / 98.93 / 99.02 | 99.11 / 98.96 / 99.68 | 99.73 / 97.73 / 99.81 | 99.34 / 99.60 / 98.88 | 98.55 / 99.81 / 99.70 | 99.78 / 99.86 / 100 | 99.88 / 99.35 / 100 | 100 / 100 / 100 |
| Adjusted Wald | 5 / 10 / 15 | 94.98 / 98.23 / 99.36 | 98.74 / 98.93 / 99.02 | 99.11 / 96.54 / 98.92 | 96.05 / 97.73 / 97.89 | 93.48 / 96.89 / 97.96 | 98.55 / 97.46 / 97.88 | 95.40 / 97.50 / 99.43 | 97.50 / 99.35 / 97.38 | 89.94 / 100 / 100 |
| Score | 5 / 10 / 15 | 94.98 / 98.23 / 97.73 | 93.50 / 96.87 / 99.02 | 91.47 / 96.54 / 97.70 | 96.05 / 97.73 / 97.89 | 93.48 / 91.17 / 97.96 | 98.55 / 97.46 / 97.88 | 95.40 / 97.50 / 99.43 | 97.50 / 99.35 / 97.38 | 89.94 / 100 / 100 |
| Wald | 5 / 10 / 15 | 69.35 / 92.01 / 88.11 | 84.93 / 96.87 / 96.46 | 85.70 / 93.26 / 97.70 | 84.84 / 91.66 / 94.82 | 73.10 / 93.88 / 92.04 | 53.75 / 81.80 / 92.87 | 35.93 / 60.20 / 77.61 | 28.30 / 51.77 / 67.15 | 10.06 / 20.74 / 30.53 |

Table 3: Average coverage by confidence interval
method (n= 27 for each cell). Expected mean is 95.00.

| CI Method | Mean (%) | SD |
|---|---|---|
| Exact | 99.39 | 0.64 |
| Score | 97.56 | 2.17 |
| Wald | 72.06 | 26.43 |

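Assuming Table 3 reports the mean and sample standard deviation of each method's 27 cells in Table 2, the Exact and Wald rows can be recomputed from the published coverage values with Python's `statistics` module. The lists below are copied from Table 2 in column order (each completion rate for 5, 10, and 15 users); the dictionary name `table2` is illustrative.

```python
import statistics

# Coverage percentages from Table 2: nine completion rates x three sample sizes.
table2 = {
    "Exact": [99.5, 99.72, 97.73, 98.74, 98.93, 99.02, 99.11, 98.96, 99.68,
              99.73, 97.73, 99.81, 99.34, 99.60, 98.88, 98.55, 99.81, 99.70,
              99.78, 99.86, 100, 99.88, 99.35, 100, 100, 100, 100],
    "Wald": [69.35, 92.01, 88.11, 84.93, 96.87, 96.46, 85.70, 93.26, 97.70,
             84.84, 91.66, 94.82, 73.10, 93.88, 92.04, 53.75, 81.80, 92.87,
             35.93, 60.20, 77.61, 28.30, 51.77, 67.15, 10.06, 20.74, 30.53],
}

for method, cells in table2.items():
    assert len(cells) == 27  # n = 27 cells per method, as stated for Table 3
    mean = statistics.mean(cells)
    sd = statistics.stdev(cells)  # sample standard deviation (n - 1 denominator)
    print(f"{method}: mean {mean:.2f}, SD {sd:.2f}")
```

The recomputed values agree with Table 3 to within about 0.01; the Wald mean comes out at 72.05 from the rounded cell values.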
## Discussion
The Monte Carlo simulations show that the Adjusted
Wald method provides the coverage closest to 95%.
Another advantage of the method is its ease of calculation. Thus, HF practitioners should
use the Adjusted Wald method to calculate confidence
intervals for small sample completion rates. This can
be accomplished by simply adding two successes and
two failures to their observed sample, then computing a
95% confidence interval using the standard Wald
method. If a practitioner needs a higher level of
confidence than 95%, then he or she should replace
the 2 and 4 with values derived from the appropriate Z-critical value:
half of its square for the numerator and its full square for the denominator. For
example, a 99% confidence interval would use the Z-critical
value of 2.58, whose square is 6.63. The confidence interval would
then be calculated by adding 3.32 successes and 3.32
failures (6.63 in total) to the observed sample.
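The adjustment amounts for any confidence level follow from the two-sided critical value $z$: add $z^2/2$ successes and $z^2/2$ failures, so $z^2$ in total. A short check using the standard library's `statistics.NormalDist`; the helper name `adjustment` is hypothetical.

```python
from statistics import NormalDist


def adjustment(confidence: float) -> tuple[float, float, float]:
    """Return (z, amount added as successes and as failures, total added to n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z, z**2 / 2, z**2


for level in (0.95, 0.99):
    z, each, total = adjustment(level)
    print(f"{level:.0%}: z = {z:.2f}, add {each:.2f} successes and "
          f"{each:.2f} failures ({total:.2f} in total)")
# prints:
# 95%: z = 1.96, add 1.92 successes and 1.92 failures (3.84 in total)
# 99%: z = 2.58, add 3.32 successes and 3.32 failures (6.63 in total)
```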
The Score method provided coverage better than the
Exact and Wald methods but fell short of the Adjusted
Wald method. Its drawbacks are its
computational difficulty and its poor coverage for some
values when the population completion rate is around
98% or 2%, regardless of sample size (Agresti and
Coull, 1998). The only advantage in using the Score
method is that it provides more precise endpoints when
the ends of the intervals are close to 0 or 1. For some
values (e.g., 9/10) the Adjusted Wald’s crude intervals
go beyond 1, and a substitution of >.999 is used. For
the Score method, however, the upper limit is
calculated as a more precise .9975.
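The overshoot for 9 of 10 users is easy to see with an unclipped version of the Adjusted Wald calculation. This Python sketch assumes $z = 1.96$ and the Agresti-Coull adjustment of $z^2/2$ successes and $z^2/2$ failures; `adjusted_wald_raw` is a hypothetical helper name, and capping the reported value at 1.0 stands in for the ">.999" substitution described above.

```python
import math


def adjusted_wald_raw(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Unclipped Adjusted Wald (Agresti-Coull) endpoints."""
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width


low, high = adjusted_wald_raw(9, 10)
reported_high = min(high, 1.0)  # cap at 1 instead of reporting a rate above 100%
print(f"raw upper bound {high:.4f}; reported upper bound {reported_high}")
```

The raw upper bound comes out at about 1.004, slightly above the largest possible completion rate.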
The Exact method was designed to guarantee at least
95% coverage, whereas approximate methods (such as
the Adjusted Wald) provide an average coverage of
95% in the long run. HF practitioners should use the
Exact method when they need to be sure they are
calculating a 95% or greater interval – erring on the
conservative side. For example, at the population
completion rate of 97.8% both the Score and Adjusted
Wald methods had actual coverage that fell to 89.94% with five users
(see Table 2 above). When the risk of this level of
actual coverage is unacceptable for an application, then
the Exact method provides the necessary guarantee.
The Wald method should be avoided if calculating
confidence intervals for completion rates with sample
sizes less than 100. Its coverage is too far from the
nominal level to provide a reliable estimate of the
population completion rate. As the sample size
increases above 100, all four methods converge to
similar intervals. A calculator for all four methods is
available online at
http://www.measuringusability.com/wald.htm.
### When All Users Pass or Fail
With small sample sizes, it is a common occurrence
that all users in the sample will complete a task (100%
completion rate) or all will fail the task (0% completion
rate). For these scenarios, it is often unpalatable to
report 100% or 0%. After all, how likely is it that the
true population parameter is as extreme as 100% or
0%? One alternative is to use the midpoint of the
binomial confidence interval derived from the Adjusted
Wald method as the point estimate (called the Wilson
Point Estimator). For example, if 15 out of 15 users
complete a task, this method provides a 94.01% completion rate. While this
value may seem too far from the observed 100%, its
attractiveness is that it is a function of the sample
size—the greater the sample size, the closer this value
will be to 100%. Whether this method provides a
consistent advantage in improving the accuracy of
point estimates is a topic for future research.
## Conclusion
There is a strong need to continue to encourage HF
practitioners to include confidence intervals when
reporting estimates of completion rates. Because the
Adjusted Wald method is just a slight modification to
the widely-taught Wald method, it should be easy to
teach with other basic statistics without overwhelming
students.
Confidence intervals are a way to build a reasonable
boundary to capture unknown population completion
rates. For a 95% confidence interval, “reasonable
boundary” means a 5% chance of not containing the
population completion rate after repeated samples.
“Reasonable boundary” is not a 1% chance and
certainly not a 40% chance (the typical rates obtained
when using the Exact or Wald methods, respectively, to generate
binomial confidence intervals). To use the Adjusted
Wald interval, the HF practitioner can use their own
software, a spreadsheet calculation, or the calculator at
http://www.measuringusability.com/wald.htm, which
also computes the Exact, Score and Wald intervals for
comparison.
## Acknowledgements
We’d like to thank Lynda Finn of Statistical Insight for
providing the Monte Carlo macro in Minitab and
assistance with interpreting the statistical literature.
We’d also like to thank Erika Kindlund of Intuit for
providing the large sample completion rates.
## References
Agresti, A., and Coull, B. (1998). Approximate is
better than ‘exact’ for interval estimation of binomial
proportions. The American Statistician, 52, 119-126.
Clopper, C. J., and Pearson, E. (1934). The use of
confidence intervals for fiducial limits illustrated in the
case of the binomial. Biometrika, 26, 404-413.
Landauer, T. K. (1997). Behavioral research methods
in human-computer interaction. In M. Helander, T. K.
Landauer, and P. Prabhu (Eds.), Handbook of Human-
Computer Interaction (pp. 203-227). Amsterdam,
Netherlands: North Holland.
Laplace, P. S. (1812). Théorie analytique des
probabilités. Paris, France: Courcier.
Lewis, J. R. (1996). Binomial confidence intervals for
small sample usability studies. In G. Salvendy and A.
Ozok (Eds.), Advances in Applied Ergonomics:
Proceedings of the 1st International Conference on
Applied Ergonomics -- ICAE '96 (pp. 732-737).
Istanbul, Turkey: USA Publishing.
Sauro, J. (2004). Restoring confidence in usability
results. From Measuring Usability, article retrieved Jan
2005 from
http://www.measuringusability.com/conf_intervals.htm
Wilson, E. B. (1927). Probable inference, the law of
succession, and statistical inference. Journal of the
American Statistical Association, 22, 209-212.
Conference Paper
Efficiency is an important consideration in the design of industrial usability studies. One way to reduce the cost of a usability study is to reduce its sample size. Small samples are not always appropriate, but in this paper I will describe a way to use binomial confidence intervals to determine rapidly if a usability defect rate exceeds a criterion. In these situations, relatively small samples are often adequate to meet the goals of a usability evaluator. I will also discuss the risks of using small samples in these situations.
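The abstract does not say which binomial interval Lewis used. The following is a minimal Python sketch that assumes a one-sided exact (Clopper-Pearson) lower bound. The function names `binom_upper_tail`, `exact_lower_bound`, and `exceeds_criterion` are hypothetical. If the lower confidence bound on the observed defect rate lies above the criterion, the sample is taken as evidence that the true defect rate exceeds it.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_lower_bound(defects: int, n: int, alpha: float = 0.05) -> float:
    """One-sided exact (Clopper-Pearson) lower confidence bound on a defect rate.

    Assumption for illustration: the bound is the p at which
    P(X >= defects | p) == alpha, found by bisection. The upper tail grows
    monotonically with p, so bisection converges to that point.
    """
    if defects == 0:
        return 0.0  # no observed defects: the lower bound is zero
    lo, hi = 0.0, 1.0
    for _ in range(60):  # 60 halvings are far below float precision needs
        mid = (lo + hi) / 2
        if binom_upper_tail(defects, n, mid) < alpha:
            lo = mid  # tail too small, so the bound lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def exceeds_criterion(defects: int, n: int, criterion: float, alpha: float = 0.05) -> bool:
    """Hypothetical helper: True when the lower bound sits above the criterion."""
    return exact_lower_bound(defects, n, alpha) > criterion
```

For example, 4 defects among 5 participants give a lower bound of roughly 0.34, so a 10% criterion would be judged exceeded. By contrast, 1 defect among 5 gives a bound near 0.01 and is not enough evidence. A one-sided alpha of 0.05 corresponds to one side of a two-sided 90% interval.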
Article
For interval estimation of a proportion, coverage probabilities tend to be too large for "exact" confidence intervals based on inverting the binomial test and too small for the interval based on inverting the Wald large-sample normal test (i.e., sample proportion ± z-score × estimated standard error). Wilson's suggestion of inverting the related score test with null rather than estimated standard error yields coverage probabilities close to nominal confidence levels, even for very small sample sizes. The 95% score interval behaves similarly to the adjusted Wald interval obtained after adding two "successes" and two "failures" to the sample. In elementary courses, with the score and adjusted Wald methods it is unnecessary to provide students with awkward sample size guidelines.
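The Wald, adjusted Wald, and score intervals compared in the abstract above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not Agresti and Coull's code. The function names are hypothetical, and z = 1.96 is assumed for a 95% level. The adjusted Wald version adds z²/2 successes and z²/2 failures, which is about 1.92 of each, rather than exactly two. All endpoints are clipped to [0, 1].

```python
import math


def _wald(x: float, n: float, z: float) -> tuple[float, float]:
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1]."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Plain Wald interval on the observed counts."""
    return _wald(successes, n, z)


def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald: add z^2/2 successes and z^2/2 failures, then apply Wald.

    At z = 1.96 the adjustment is about 1.92, i.e. roughly two of each.
    """
    adj = z * z / 2
    return _wald(successes + adj, n + 2 * adj, z)


def wilson_score_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval: invert the score test using the null standard error."""
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

With 5 successes out of 5, the Wald interval collapses to the single point 1.0, which shows how badly it understates uncertainty in small samples. The adjusted Wald and score intervals both return lower bounds well below 1, about 0.51 and 0.57 respectively. The adjusted Wald centre is (x + z²/2)/(n + z²), which is the same as the score-interval centre. The two methods therefore agree closely whenever neither interval is clipped.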
Chapter
This chapter discusses the conduct of research to guide the development of more useful and usable computer systems. Experimental research in human-computer interaction involves varying the design or deployment of systems, observing the consequences, and inferring from observations what to do differently. For such research to be effective, it must be owned—instituted, trusted and heeded—by those who control the development of new systems. Thus, managers, marketers, systems engineers, project leaders, and designers as well as human factors specialists are important participants in behavioral human-computer interaction research. This chapter is intended as much for those with backgrounds in computer science, engineering, or management as for human factors researchers and cognitive systems designers. It is argued in this chapter that the special goals and difficulties of human-computer interaction research make it different from most psychological research as well as from traditional computer engineering research. The main goal, the improvement of complex, interacting human-computer systems, requires behavioral research but is not sufficiently served by the standard tools of experimental psychology such as factorial controlled experiments on pre-planned variables. The chapter contains about equal quantities of criticism of inappropriate general research methods, description of valuable methods, and prescription of specific useful techniques.
Sauro, J. (2004). Restoring confidence in usability results. From Measuring Usability. Article retrieved Jan 2005 from http://www.measuringusability.com/conf_intervals.htm

Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22, 209-212.