When, and Why, Do Teams Benefit from Self-Selection?
Mira Fischer Rainer Michael Rilke B. Burcin Yurtoglu
September 25, 2021
We investigate the effect of team formation and task characteristics on performance in
high-stakes team tasks. In two natural field experiments, we found that randomly assigned
teams performed significantly better than self-selected teams in a task that allowed for an
unequal work distribution. If the task required the two team members to contribute more
equally, the effect was reversed. Investigating mechanisms, we observe that teams become
more similar in terms of ability and cooperate better when team members can choose each
other. We show how different levels of skill complementarity across tasks may explain our
results: If team performance largely depends on the abilities of one team member, random
team assignment may be preferred because it leads to a more equal distribution of skills across
teams. However, if both team members’ abilities play a significant role in team production,
the advantage of random assignment is reduced, and the value of team cooperation increases.
Keywords: Team Performance, Self-selection, Field Experiment, Education
JEL Classification: I21, M54, C93
Fischer: WZB Berlin Social Science Center, Reichpietschufer 50, 10115 Berlin, Germany, email:; Rilke: WHU - Otto Beisheim School of Management, Economics Group, Burgplatz
2, 56176 Vallendar, Germany, email:; Yurtoglu: WHU - Otto Beisheim School of Manage-
ment, Finance Group, Burgplatz 2, 56176 Vallendar, Germany, email: This paper
analyzes two natural field experiments. The field experiments were pre-registered with the code AEARCTR-
0002757 and AEARCTR-0003646 under the title "Peer selection and performance - A field experiment in higher
education". We thank Steffen Loev, Marek Becker, and Andrija Denic for their extremely helpful assistance with
the data. We also thank Bernard Black, Robert Dur, Ayse Karaevli, Simeon Schudy, Gari Walkowitz, participants
of the Advances with Field Experiments Conference in Boston, and seminar participants at the Higher School
of Economics in Moscow, Humboldt University of Berlin, University of Trier, University of Duisburg-Essen,
University of Mannheim, Burgundy School of Business in Dijon, University of Amsterdam, and WHU - Otto
Beisheim School of Management for their helpful comments and suggestions on earlier versions of this paper.
Financial support by Deutsche Forschungsgemeinschaft through CRC TRR 190 (project number 280092119) is
gratefully acknowledged.
1 Introduction
In today’s highly complex economic environment, cooperation among individuals is crucial
for organizational success. As businesses become increasingly global and cross-functional, the
need for teamwork has been growing in all domains of work and life (O’Neill and Salas, 2018;
Cross et al., 2016). Indeed, firms and organizations create value by providing mechanisms
for people to work together, and to take advantage of complementarities in their skills and
interests (Lazear and Oyer, 2012). The nature and the effectiveness of teamwork in a variety
of productive activities matter for outcomes in diverse settings, ranging from entrepreneurial
ventures (Reagans and Zuckerman, 2019) to the mutual fund industry (Patel and Sarkissian,
2017), and from medical practices (Geraghty and Paterson-Brown, 2018) to research projects
seeking to achieve scientific breakthroughs (Wuchty et al., 2007).
Economists and management scholars have studied extensively the influence of various forms of
team incentives (e.g., team bonuses or tournaments) on team performance, while recognizing
the importance of cooperation in teams. Although research has shown that team bonuses and
team piece rates tend to have a positive effect on productivity (e.g., Englmaier et al., 2018;
Friebel et al., 2017; Hamilton et al., 2003; Erev et al., 1993), the evidence on the effects of team
tournament incentives on performance has been inconclusive (e.g., Delfgaauw et al., 2019, 2018;
Bandiera et al., 2013). Moreover, because the underlying team tasks these previous studies
examined varied, the transferability of existing findings to different types of tasks is limited. For
example, while some team tasks may require one person to be the team’s main driver, other
tasks may require all team members to pull in the same direction. Thus, as different team
tasks require different team compositions, which team assignment mechanism is used can have a
substantial impact on team performance.
Two potential mechanisms through which the team assignment process affects team performance
are the composition and the motivation of teams. For example, when people are allowed to
choose their teammates, they match with people they like (e.g., Currarini et al., 2009; Leider
et al., 2009), but they also trade off both the pecuniary benefits of better cooperation and
the non-pecuniary benefits of working in teams with friends against the pecuniary benefits of
working with higher-ability team members (Bandiera et al., 2013; Hamilton et al., 2003).1
Our study analyzes how self-selection and random assignment influence composition, cooperation,
and performance of teams on different team tasks using two natural field experiments. We argue
that the impact of the team formation process hinges on the degree of skill complementarity
among the team members and on the collaborative efforts required to perform well on a particular
team task. When the team task requires the team members’ abilities to be substitutes, which
renders collaboration relatively unimportant, we expect to find that randomly assigned teams
perform better. In such cases, self-selection is detrimental to average team performance, because
it leads to a concentration of skills in some teams. By contrast, when performing well requires
high levels of skill complementarity and collaboration, we hypothesize that self-selection is
beneficial for average team performance.
We embedded the experiments in a mandatory microeconomics course for first-year undergraduate
students at a major German business school. The course consisted of two parallel study groups
that received the same course content from the same instructor. In the winter quarters of
2017/18 and 2018/19, two cohorts of students were randomly assigned to those study groups. In
one class, students were allowed to choose a teammate during the first week of class (treatment
Self ). In the other class, students were randomly assigned to a team of two during the first week
of class (treatment Random).
The teams had to work on two types of high-stakes tasks that varied in the distribution of the
work required to achieve a high level of performance that counted towards students’ course
grades: either a written task that required the team members to submit a written team solution,
or a video task that required the team members to submit a videotaped team solution in which
each of the team members was equally visible. The teams’ scores depended solely on the accuracy
of their solutions. Since the written task required the team to submit a joint written solution,
the contributions of the individual team members could be unequal. By contrast, the video task
required the two team members to be equally visible.
We find that compared to teams that were randomly assigned, teams that were self-selected
were more homogeneous in terms of their abilities, and had higher levels of perceived team
cooperation.

1Laboratory experiments have examined different group formation mechanisms in cooperation games. This
literature has shown that contribution levels in endogenously formed groups were similar to those in groups
with exogenous matching (e.g., Gächter and Thöni, 2005; Guido et al., 2019; Chen, 2017).

Furthermore, our results show that self-selected teams performed significantly worse
than randomly assigned teams on the written task, but tended to be better on the video task.
These findings can be explained by a simple formal model that demonstrates that the benefits of
self-selection in terms of the homogeneity of the team members' abilities and the motivation
of the team members come into play only if the contributions and the cooperation of both
team members are needed to complete the task. However, if a task can be solved by one main
contributor, so that the ability of the other team member and their cooperation are of little
importance, random assignment may lead to superior average team performance, as it
generally results in a more equal distribution of abilities across teams. In other words, if the
skills needed to perform a task are substitutable, the task is, on average, performed better by
randomly assigned teams; whereas if the skills needed to perform a task are complementary,
and the level of cooperation required to complete the task is sufficiently high, the task is, on
average, performed better by self-selected teams.
This study adds to the small body of existing work on the consequences of team assignment
mechanisms in real-world settings. Chen and Gong (2018) found that university students who
self-selected their teammates performed better on a presentation task than students who were
randomly assigned to teams. Likewise, Dahlander et al. (2019) found that students who could
freely choose with whom they worked performed better on an entrepreneurial task than another
group of students who were instead free to choose their entrepreneurial task. Although Chen
and Gong (2018) showed that self-selection led to a process of team formation that was based
on the members' social connections rather than on their skills, neither they nor Dahlander
et al. (2019) examined the mechanisms that underlie their findings. The question of whether
their results can be generalized to other settings and to other types of team tasks thus remains
open. Our setting, by contrast, allows us to shed light on several important mechanisms (task
characteristics, team composition, and cooperation) and to advance a straightforward explanation
for the existing findings.
Our study makes three contributions to the literature. First, using two randomized natural
field experiments, we tested how the self-selection of teams affected the composition of the team
members’ abilities, their cooperation levels, and their performance across different tasks. Second,
we demonstrate that self-selection (compared to random selection) can have opposite effects on a
team’s performance depending on the task’s production function.2

2In a laboratory experiment, Büyükboyaci and Robbett (2019) investigated the interaction of complementarity
of skills and specialization. They found that when specialization was not possible, self-selection had no effect
on performance, and that the option to specialize had a positive effect on performance, which was significantly
magnified when agents had a say in who joined their team.

Finally, our study combines
these insights to offer an explanation for why self-selected teams may be expected to perform
better than randomly assigned teams on highly collaborative tasks, but not on other types of tasks.
The paper proceeds as follows: Section 2 presents a slightly formalized exposition of how random
team assignment versus self-selection may affect team performance on different tasks; Section 3
describes the field experiment; Section 4 presents the results; and Section 5 concludes.
2 Team performance on different tasks: Relative importance of
abilities and collaboration
Though our field setting did not allow us to impose a specific production function for the team
tasks, and we do not intend to test a theoretical model of team performance, we use a short,
slightly formalized exposition that captures the key features of our experiment to facilitate
the development of our hypotheses. To illustrate how the composition of the team members’
abilities and the intensity of their collaboration may affect the team’s performance depending on
the type of task they are engaged in, we assume a hypothetical setting that involves two team
tasks that vary in their production function: Two individuals, denoted as i and j, form a team.
Each teammate has a uni-dimensional cognitive ability level (a_i and a_j, respectively), and the
team can invest collaborative effort (q).
We assume that a team’s output, which determines their score, s, is given by:

s = max(a_i, a_j)^α · min(a_i, a_j)^β · q^γ.

Here, α, β, and γ represent the elasticities of the score with respect to the ability of the more able
teammate, the ability of the less able teammate, and the collaborative effort, respectively. In other
words, these parameters measure the responsiveness of the team’s output to a change in the levels of
the team members’ abilities and of the collaborative effort. This exposition allows us to capture
the intuition that the division of labor and the level of collaboration that tasks require may differ.
To illustrate this intuition, we discuss two extreme examples. First, if the structure of a task
requires that both team members implement a solution together, even if one team member’s
ability is more important in finding the solution (α > β), the abilities of both team members, as
well as the quality of their collaboration, matter for team performance; thus, α > 0, β > 0, and
γ > 0. Therefore, the team’s score on this kind of task – i.e., a task in which the team members’
abilities and collaborative efforts are complements – is determined by:

s_C = max(a_i, a_j)^α · min(a_i, a_j)^β · q^γ.
However, if the task is best done by the most able person alone, the ability of the ablest team
member may be of paramount importance for team performance; thus, in such cases, the ability
of the other team member and team collaboration may not matter. Under these assumptions,
α = 1, β = 0, and γ = 0. The team’s score on this kind of task – i.e., a task in which the team
members’ abilities are substitutes – is thus given by:

s_NC = max(a_i, a_j).
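For concreteness, the two polar cases can be sketched in code. This is our illustration only (the paper specifies no implementation); the function names and example values are ours:

```python
def score_complements(a_i, a_j, q, alpha, beta, gamma):
    """Team score when abilities and collaboration are complements:
    s_C = max(a_i, a_j)^alpha * min(a_i, a_j)^beta * q^gamma."""
    return max(a_i, a_j) ** alpha * min(a_i, a_j) ** beta * q ** gamma

def score_substitutes(a_i, a_j):
    """Team score when abilities are substitutes (alpha = 1, beta = gamma = 0):
    s_NC = max(a_i, a_j) -- only the ablest member's ability matters."""
    return max(a_i, a_j)
```

With α = 1 and β = γ = 0, `score_complements` reduces to `score_substitutes`, which is the sense in which the substitutes task is the limiting case of the general production function.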
If the score of one individual depends positively on the productivity of their teammate, there is
an incentive for subjects to match with a high-ability teammate. If the matching is two-sided –
i.e., if all individuals can actively search for a teammate – the subjects will assortatively match
by ability. This tendency results in high-ability individuals forming teams with other high-ability
individuals, and low-ability individuals forming teams with other low-ability individuals. If the
productivity of one individual additionally depends on the team’s collaborative efforts, there
is an incentive to choose teammates who are likely to put in considerable effort. In line with
this theoretical result, the empirical literature has suggested that when subjects are allowed to
choose their teammates, they tend to choose teammates who have similar abilities, and with
whom they are acquainted (Leider et al., 2009; Ai et al., 2016; Chen and Gong, 2018). Based
on this reasoning, we would expect to find that the maximum ability is, on average, lower in
self-selected teams than in randomly assigned teams, because high-ability individuals tend to
cluster in some of the teams. Nevertheless, we would expect the levels of collaborative effort to
be higher in self-selected teams, as the team members may enjoy working together more, and
may thus work together more productively than the team members in randomly assigned teams.
Combining the above strands of reasoning, we expect to observe that the performance of
randomly assigned teams is, on average, better and more heterogeneous if they are performing
a task in which the team members’ abilities are substitutes, and collaboration is unimportant.
Furthermore, we expect to find that the benefit of randomly assigned teams over self-selected
teams is smaller when they are performing a task in which the team members’ abilities are
complements, and collaboration matters. Thus, we also expect to observe smaller differences in
the performance of the randomly assigned teams and the self-selected teams on the video task
than on the written task. If α is sufficiently small (i.e., if the importance of the ability of the
lower-ability team member, as well as of the collaborative effort of both team members, is
sufficiently high for the team to perform well), then self-selected teams may even outperform
randomly assigned teams on the latter task.
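The comparative statics above can be made concrete with a small simulation. This is our sketch, not the authors' model code: the ability distribution, the elasticities (α = 0.6, β = 0.4, γ = 0.3), and the assumed collaboration advantage of self-selected teams (q_self > q_random) are all illustrative choices.

```python
import random

def simulate(n_pairs=5000, alpha=0.6, beta=0.4, gamma=0.3,
             q_self=1.2, q_random=1.0, seed=0):
    """Compare mean team scores under random vs. assortative matching.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    abilities = [rng.uniform(1.0, 2.0) for _ in range(2 * n_pairs)]

    # Random assignment: shuffle the pool and pair neighbours.
    shuffled = abilities[:]
    rng.shuffle(shuffled)
    random_pairs = [(shuffled[2 * k], shuffled[2 * k + 1]) for k in range(n_pairs)]

    # Self-selection: assortative matching, i.e. pair adjacent ability ranks.
    ranked = sorted(abilities)
    self_pairs = [(ranked[2 * k], ranked[2 * k + 1]) for k in range(n_pairs)]

    def mean(xs):
        return sum(xs) / len(xs)

    def s_sub(a, b):            # substitutes task: s_NC
        return max(a, b)

    def s_com(a, b, q):         # complements task: s_C
        return max(a, b) ** alpha * min(a, b) ** beta * q ** gamma

    return {
        "substitutes": (mean([s_sub(a, b) for a, b in self_pairs]),
                        mean([s_sub(a, b) for a, b in random_pairs])),
        "complements": (mean([s_com(a, b, q_self) for a, b in self_pairs]),
                        mean([s_com(a, b, q_random) for a, b in random_pairs])),
    }
```

Under these assumptions, random pairing yields the higher mean score on the substitutes task (assortative matching lowers the average maximum ability), while the collaboration advantage lets assortatively matched teams come out ahead on the complements task.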
3 Study
3.1 Context and background
The field experiment was conducted with students of the BSc program at a well-known German
business school between October 2017 and April 2019. The business school offers university
education in business administration, with degrees at the BSc, MSc, MBA, and PhD levels,
as well as executive education programs. The school has around 2,000 students. At the BSc
level, the school offers the International Business Administration program. In academic year
2017/2018, a total of 672 students were enrolled in the program, 26% of whom were female.
Studying the impact of team formation mechanisms on team performance requires an environment
in which participants can choose teammates, in which the selection mechanism can be exogenously
varied, and in which team performance can be objectively measured. The environment of the
business school class we studied fulfilled all of these criteria, while allowing us to maintain a high
degree of control. Furthermore, to observe self-selection not only on demographic characteristics,
but also on ability, we needed a sample of participants who were already acquainted with each
other. This was assumed to be the case for our student subjects, given that at the point in time
when they were attending the class, they had already completed courses together, and had ample
opportunities to get to know each other through extracurricular activities (e.g., through student
societies and sports teams; and through involvement in music, drama, political campaigning, or
community work) that took place at the business school.
Figure 1: Sequence of events and data sources

[Figure: timeline of the experiments, showing the data sources – post-experiment survey measures (team work quality, course quality, performance beliefs, social preferences) and admission-test components (discussion, presentation, interview, analytical test).]

Note: The figure displays the variables and the sequence of events in the experiments. The sequence of events is
the same for both Experiment I and Experiment II.
3.2 Experimental timeline and treatments
The field experiments took place in the Microeconomics I course, with two cohorts of first-year
students in the BSc program in International Business Administration participating. In each
cohort, students were randomly assigned to two separate classes, both taught by the same
instructor (one in the morning and one in the afternoon of the same day). During the first week,
students learned that to fulfil the course requirements, they had to complete two tasks in teams
of two, and to pass an exam at the end of the quarter. As the instructor did not announce any
task-specific details about the team tasks in the first week, the students only knew that these
tasks were take-home assignments that they had to complete during study hours, and that they
would have to complete both tasks together with the same team member, because re-matching
was not permitted.
For each cohort, in one class – i.e., the Self treatment – the instructor told the students on the
first day to form a team with a fellow student of their choice. The students had to write down
their team’s composition and submit it to the instructor before the second meeting. In the other
class – i.e., the Random treatment – the students were randomly assigned to a team of two, and
they were informed of their team’s composition by email before the second meeting.
The first team task was assigned to the students in mid-November, and had to be completed by
early December. The second team task was assigned to the students in early December, and
had to be completed by the end of January. The final exam took place in March. During the
course, the students received no feedback on their performance on the team tasks. After the
final exam, the feedback consisted only of the students’ overall course grades. Upon request, the
students could also receive detailed information about both their team’s performance on the
different tasks, and their individual performance on the exam. Figure 1 displays the timeline of
the experiment.
In the winter quarter of 2017/18 (Experiment I, n=190, 31% female) the students completed two
written team tasks. In the winter quarter of 2018/19 (Experiment II, n=192, 29% female), the
first task was a written task, and the second task was a video task. Across the two experiments,
the first task was identical, and the students were supposed to submit their solutions in written
form. By contrast, the second task differed across the two experiments, although it had very
similar content. In Experiment I, the students were supposed to submit their solutions in
written form; whereas in Experiment II, students were required to videotape their solution. This
design allowed us to identify interaction effects of the team formation mechanism with the task
characteristics, as well as heterogeneous trends in collaboration across the treatments.
3.3 Tasks
Given that we expected the effect of the team assignment mechanism on team performance to
hinge on the degree to which the abilities of both team members and their levels of collaboration
mattered for productivity, we aimed to design two types of tasks that required the same levels
of cognitive ability, but that differed in the extent to which they required both inputs and
collaboration from both team members. We chose to use microeconomics exercise sets that
required very similar cognitive skills to complete, but for which the solutions were submitted in
different forms: i.e., through a written or a videotaped presentation. The students’ submissions
for both types of tasks were evaluated based on whether they gave correct, concise, and coherent
answers to the microeconomics problems. However, the instructions for the video task contained
the additional requirement that both team members present part of the solution.
In the written
tasks, the students were required to reach an agreement about which of the teammates’ solutions
was best. However, the students could produce the correct solution by themselves. In contrast,
for the solution to the video task to be considered acceptable, the students had to jointly prepare
the presentation, and each student had to correctly present part of the solution, which required
a higher level of cooperation and inputs from both team members.

3The exercise sets appear in the online appendix.
The written task consisted of problems for which students had to submit written solutions. These
problems called for the application of the theoretical knowledge that the students had acquired
during lectures, such as analyzing demand patterns, calculating market outcomes, or designing
pricing strategies. Providing a solution involved explaining the theoretical background, applying
a correct approach to the solution, and performing a series of calculations that possibly included
one or two graphs. In addition, the instructions for the written tasks specified that the students
had to present their written answers clearly. The answers could be either typed or handwritten,
but they had to be legible.
The video task consisted of questions for which students had to submit their solutions in a
five-minute video. The questions required a level of microeconomics skills very similar to that
required in the written task, and the solutions also consisted of explaining the theoretical
background, applying a correct approach to the solution, and performing a series of calculations.
The teams were allowed to use whiteboards, graphs, illustrations, and slides to make their videos
more effective. In addition, the instructions specified that the video should be comprehensible;
i.e., that the presenters’ speech should be understandable. The instructions further stated that
the teams could use their smartphones to produce the video, and that the technical quality of the
video itself would not be graded. Finally, and crucially, the instructions stated that both team
members, along with their individual contributions, had to be visible in the video. The lecturer
explained that videos in which only one team member could be seen giving the presentation
were not acceptable. All students’ submissions met this criterion, and were thus evaluated for
their correctness, conciseness, and coherence.
3.4 Data
Data for the study were gathered from three sources (see Figure 1). The pre-experiment data
contained the students’ high school performance (GPA) and their performance on the business
school’s admission tests. Both the GPA and the results of the admission tests were independent
measures of each student’s academic ability prior to the experiment, as they were not affected
by their peers at the business school. Moreover, our endline data included information on each
student’s performance on the two team tasks and on the final course exam at the end of the
quarter. The data also included information on each student’s perceptions of the cooperative
behavior within their team, their relationship with their team member, their evaluation of
the teacher’s performance in the course, and an incentivized measure of pro-sociality collected
through a post-experiment survey that was conducted after the final exam and before the
students received feedback about their performance.
3.4.1 Pre-experiment ability measures
Our pre-experiment ability measures came from the business school’s student registry; specifically,
from its admissions data.
The business school’s program, which is known to be highly
competitive, uses a selective admissions procedure. In the first step of the admissions process,
applicants to the BSc program provide basic demographic information and their high school
grade point average (GPA).
The admissions office ranks applicants by their GPA, and invites the
top 10% to an admissions day, where the applicants take a written test designed to measure their
analytical reasoning (quantitative) skills. They also take an oral test that has a presentation,
a group discussion, and an interview component, and is intended to measure the applicants’
communication, social, and problem-solving skills, and to assess whether they are a good fit for
the program. The components of the oral test are each rated by two independent evaluators,
whose ratings are then averaged.
For our analysis, we will use the students’ GPA scores, the quantitative part of the admission
test (henceforth called the Analytical Test), and an aggregate measure of the oral part of the
admission test (henceforth called the Admission Test).6
4Because we did not have access to an IRB at the beginning of the project, we could not obtain formal IRB
approval. In the students’ contract with the school, they consent to the anonymous processing of their data. The
agreement stipulates that the university can use the administrative data for statistical and scientific purposes.
Moreover, the variation was implemented with the permission of the business school’s Dean of Studies and is
within the normal range of changes the private business school regularly implements to improve its teaching.
5The German GPA (Abiturnote) ranges from 4.0 (sufficient) to 1.0 (excellent) and is the most important
criterion for university admission in Germany (e.g., Fischer and Kampkötter, 2017). Our entire sample had an
average GPA of 1.79 (SD=.504). For our analysis, we inverted the GPA so that higher values indicated better
high school performance.
6For each student, we averaged the scores over all components and standardized them.
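The aggregation described in footnote 6 (averaging a student’s component scores, then z-standardizing across students) can be sketched as follows; the ratings shown are hypothetical:

```python
import statistics

def z_standardize(values):
    """Z-standardize: subtract the mean and divide by the sample SD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical component ratings (e.g., presentation, discussion, interview)
# for three students: average per student, then standardize across students.
components = [[2.0, 3.0], [4.0, 5.0], [3.0, 4.0]]
admission_test = z_standardize([statistics.mean(c) for c in components])
```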
3.4.2 Team outcomes
Each student’s performance on both team tasks and the individual exam determined their final
grade. Each team received a common grade for their performance per task, and each task had a
weight of 15% toward the individual final grade. The exam was written at the end of the course,
and contributed 70% to the final grade. A teaching assistant who had previous experience with
the course, but who was unaware that an experiment was taking place, graded the students’
performance on the written and the videotaped team tasks, and on the exam.7
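The weighting scheme just described can be sketched as follows (our illustration; the scale of the score inputs is left abstract):

```python
def final_grade(task1_score, task2_score, exam_score):
    """Final course grade: each of the two team tasks carries a 15% weight
    and the individual exam a 70% weight, as described in Section 3.4.2."""
    return 0.15 * task1_score + 0.15 * task2_score + 0.70 * exam_score
```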
3.4.3 Post-experiment survey
On the day following the final exam, we invited the students to take part in an online post-experiment
survey.8
This survey elicited the students’ perceptions of the quality of the collab-
oration in their team, of their relationship with their team member, and of the teaching. To
incentivize participation, we used a raffle in which one survey participant was picked randomly
to receive a 200 EUR reward. For an incentivized measure of the students’ pro-sociality, we
asked the students what fraction of this amount they would like to donate to UNICEF if they won the raffle.
4 Results
We begin our analysis by establishing the internal validity of our experimental approach. We
show that the student sample did not differ between the treatments on any observable variable
elicited before the experiments. We then present the experimental results together with an
analysis of the effects of the two assignment mechanisms on team performance, our primary
outcome measure. Next, we show how the two assignment mechanisms affected the team
formation process, while focusing on the effects of these mechanisms on the composition and the
cooperation levels of the teams.
7As we were concerned that the ratings of the video task might suffer from low reliability due to the video
format, we subjected them to a validation exercise. In this exercise, two additional independent raters rated the
videos based on the same instructions as those used by the teaching assistant who made the original assessments.
The additional ratings had correlations of 0.72 and 0.71 with the original rating, and a correlation of 0.81 between
each other. Thus, the reliability of the presentation ratings can be considered satisfactory. All results were found
to be robust to using these additional ratings.
8The survey was accessible until the exam grades were published, which usually takes up to six weeks.
Table 1: Randomization checks
Experiment I Experiment II Experiment I + II
Variable Self Random p-value Self Random p-value Self Random p-value
GPA .057 -.053 .195 .074 -.084 .523 .066 -.068 .134
Analytical test .022 -.010 .889 -.056 .064 .428 -.019 .026 .666
Admission test -.039 .035 .479 .037 -.042 .591 .001 -.002 .864
% female .287 .323 .593 .356 .220 .038 .323 .273 .282
Note: Descriptive statistics of pre-experiment data, admission test scores, and pro-sociality. GPA is inverted and
z-standardized, with a higher GPA indicating better school performance. The Analytical Test and the Admission Test
are z-standardized. The p-values are from a two-sided Mann-Whitney U (MWU) test comparing differences in
mean ranks between the two treatments. The p-values for the comparison of % female are from a χ2-test (one-sided).
Unstandardized values (Table A.1) and a correlation matrix (Table A.3) appear in the appendix.
4.1 Randomization checks
Table 1 provides an overview of the properties of our sample in the treatments and the experiments.
We show separate summary statistics for Experiment I and Experiment II, and pooled statistics
for both experiments. The table shows that the randomization was successful in producing
highly similar groups based on observable characteristics, such as high school performance (GPA)
and performance on the admission tests. The only characteristic that differed significantly
between treatments was the percentage of female students in Experiment II (p = .038).
We therefore provide results from two regression specifications, both with and
without controls for gender (and other observables).
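Balance checks of this kind can be scripted directly. The sketch below is a minimal illustration on simulated data (group sizes and variable names are assumptions, not the study's): a two-sided Mann-Whitney U test for a continuous covariate and a χ2 test for the share of female students.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical pre-experiment data for the two treatment arms
# (standardized GPA and a female indicator; illustrative only)
gpa_self = rng.normal(0.05, 1.0, 140)
gpa_random = rng.normal(-0.05, 1.0, 130)
female_self = rng.binomial(1, 0.32, 140)
female_random = rng.binomial(1, 0.27, 130)

# Continuous covariate: two-sided Mann-Whitney U test on mean ranks
u, p_gpa = stats.mannwhitneyu(gpa_self, gpa_random, alternative="two-sided")

# Binary covariate (gender): chi-squared test on the 2x2 contingency table
table = np.array([
    [female_self.sum(), len(female_self) - female_self.sum()],
    [female_random.sum(), len(female_random) - female_random.sum()],
])
chi2, p_fem, dof, _ = stats.chi2_contingency(table)
print(f"GPA balance p = {p_gpa:.3f}, gender balance p = {p_fem:.3f}")
```

Large p-values on all covariates would indicate that randomization produced comparable groups, mirroring Table 1.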
4.2 Team performance
Our primary outcome measure, team performance, is the score that the teams received for their
work on two separate tasks during the quarter. We summarize our results in Figure 2, which
plots the standardized average team score for each task by treatment, and also shows individual
exam performance. The left panel shows the outcomes for Experiment I, while the right panel
shows the outcomes for Experiment II.
For Experiment I, in which the solutions to the first and the second team tasks had to be
submitted in written form, the figure indicates that, on average, the teams in Random performed
better than the teams in Self. A non-parametric comparison of average team scores yielded
a significantly lower score for the teams in Self than for the teams in Random (Mann-Whitney U test; hereafter, MWU test).9
9 Unless otherwise stated, all p-values are based on two-sided tests.
Figure 2: Team assignment, performance, and task characteristics
[Figure: average team performance (z-standardized) on the first and second team tasks, by treatment (Self vs. Random), with 95% confidence intervals. The left panel shows the results from Experiment I; the right panel shows the results from Experiment II.]
The results of a non-parametric test for the equality of variances between the treatments underlined this pattern,
showing that the variance of team performance was significantly larger in the Self treatment (p = .002, Levene's test).
We also observed no change in performance over time. A comparison of the average
performance between the first and the second team tasks revealed no significant differences (Self:
p = .885, Random: p = .9291, MWU test).
First, for Experiment II, the figure indicates that the teams in Self performed worse on the
written task than those in Random, while the effect appears to have reversed when the teams
were working on the video task. Indeed, in the first team task of Experiment II, we replicated
the pattern observed in Experiment I: the teams in Self performed marginally significantly
worse than those in Random when the task was written (p = .064, MWU test), but this time the
variances were not significantly different (p = .194, Levene's test).10
10 A separate analysis of the first and the second team tasks yielded similar significant differences in averages
(1st team task: p = .011; 2nd team task: p = .068, MWU test), and (marginally) significant differences in variances
(1st team task: p = .104; 2nd team task: p = .001, Levene's test). A detailed pairwise comparison appears in
Table A.2 in the appendix.
Figure 2 appears to show that
the average team performance was higher in Self than in Random in the second team task (the
video task). However, the results of non-parametric tests comparing the mean and the variance
of average team performance between Self and Random did not reject the null hypothesis that
the performance in both treatments was equal (p = .156, MWU test; p = .381, Levene's test).
This time, however, we observed a large change in performance between the two tasks. The
performance of the self-selected teams was marginally significantly better on the video task than
on the written task (p = .0790, Wilcoxon signed-rank test; hereafter, WSR test), while the
performance of the randomly assigned teams was marginally significantly worse on the video
task than on the written task (p = .0556, WSR test).
More evidence for this reversal across task types was provided by a difference-in-differences
analysis. Calculating the difference between the first and the second team tasks for both
experiments and comparing it between treatments yielded a significant difference for Experiment
II (p = .0095, MWU test), but no significant difference for Experiment I (p = .9424, MWU test).
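The within-treatment comparisons and the difference-in-differences check described above can be sketched as follows. The team scores are simulated for illustration and do not reproduce the reported p-values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical z-standardized team scores on task 1 and task 2 for
# teams in each treatment arm (illustrative only)
self_t1, self_t2 = rng.normal(-0.2, 1, 60), rng.normal(0.1, 1, 60)
rand_t1, rand_t2 = rng.normal(0.2, 1, 60), rng.normal(-0.1, 1, 60)

# Within-treatment change: paired Wilcoxon signed-rank (WSR) test
w, p_self = stats.wilcoxon(self_t1, self_t2)

# Difference-in-differences: compare per-team gains (task 2 - task 1)
# between treatments with a two-sided Mann-Whitney U test
gain_self = self_t2 - self_t1
gain_rand = rand_t2 - rand_t1
u, p_did = stats.mannwhitneyu(gain_self, gain_rand, alternative="two-sided")
print(f"within-Self change p = {p_self:.3f}, diff-in-diff p = {p_did:.3f}")
```

The non-parametric gain comparison avoids distributional assumptions on the team scores, matching the paper's reliance on rank-based tests.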
Furthermore, the figure shows that exam performance was unaffected by the treatment.
Neither the average student performance nor the variance of the student performance on the
final exam differed significantly across treatments (Experiment I: p = .455, MWU test;
Experiment II: p = .984, MWU test, p = .603, Levene's test). This finding indicates
that the team assignment mechanism did not have a spillover effect on exam performance. It
can also be seen as evidence that the effectiveness of teaching did not differ between the two
treatment groups, and, therefore, that the lecturer's behavior was unlikely to have influenced
the different levels of team performance.11
Second, we ran regressions controlling for pre-experiment observables to verify these observations.
To do so, we analyzed the teams’ performance on the first (written) and the second (written
or video) team tasks. The first team task in both experiments was identical, with the students
submitting their work in writing. To test the influence of the task characteristics on the team
performance, we varied the second team task. In Experiment I, the teams had to submit their
solutions in written form, while in Experiment II, the teams had to submit video clips (as
described earlier).
11 To check whether the students perceived the quality of the teaching differently between the two treatments,
we included four items in our post-experiment survey. We did not observe a significant difference between the
two experimental conditions for any of these questions. This finding supports our assumption that the teacher
had no influence on the study results. Panel C of Table 6 displays the respective items and results.
Table 2: Predicting team performance
Dependent variable:
Performance on 1st team task Performance on 2nd team task
(Exp. I and II: written) (Exp. I: written, II: video)
Independent variables (1) (2) (3) (4) (5) (6)
1 if Self -0.415*** -0.473** -0.478** -0.107 -0.496** -0.521***
(0.141) (0.201) (0.201) (0.145) (0.201) (0.199)
1 if Experiment II -0.045 0.031 -0.392** -0.320*
(0.163) (0.149) (0.177) (0.163)
Self x Experiment II 0.114 0.029 0.775*** 0.728**
(0.283) (0.276) (0.288) (0.286)
GPA 0.117** 0.123**
(0.057) (0.061)
1 if female -0.024 -0.250
(0.119) (0.154)
Admission Test 0.006 -0.080
(0.052) (0.053)
Constant 0.212*** 0.234** 0.240** 0.054 0.245*** 0.332***
(0.081) (0.114) (0.121) (0.089) (0.086) (0.107)
Observations 382 382 377 382 382 377
R-squared 0.043 0.044 0.067 0.003 0.041 0.067
Note: Columns (1) - (3) show OLS regressions of z-standardized team performance on the first task.
In both experiments, the students had to submit a written solution to the task. Columns (4) - (6)
show OLS regressions of z-standardized team performance on the second task. In Experiment I, the
students had to submit a written solution to the task; while in Experiment II, the students had to
submit a video clip. The control variables are GPA, admission test scores, and gender. GPA and
Admission Test have been z-standardized. Standard errors clustered on teams are in parentheses.
Significance indicators: ∗∗∗ p <.01, ∗∗ p <.05, p <.1.
Table 2 shows the results of OLS regressions with standard errors clustered at the team level,
where the dependent variable is the team performance (z-standardized), separately for each of the two
team tasks. In Models (1)-(3), we predicted the team performance on the first team task.
Model (1) included only a dummy variable for the Self treatment (“1 if Self”). The self-selected
teams performed, on average, .415 (p = .004; CI = [-.694; -.136]) standard deviations worse on
the first task than the randomly assigned teams. Model (2) included a dummy variable for
the experiment (“1 if Experiment II”) and an interaction term of the Self treatment and the
experiment (“Self x Experiment II”) to control for potential interactions. While both of
these control variables remained insignificant, the coefficient on the treatment dummy Self
remained significant and almost unchanged at -.473 (p = .02; CI = [-.870; -.077]), which indicates
that for the first task, the treatment effect was not significantly different across the experiments.
In Model (3), we included additional controls, and found that the treatment effect was not
affected by their inclusion (p = .018; CI = [-.875; -.082]). Interestingly, we found that
the students' GPAs, but not their admission test scores, predicted the team performance.
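A specification in the spirit of Model (1), with standard errors clustered on teams, might look as follows in statsmodels. The data are simulated and the variable names (`self_selected`, `score`) are assumptions for illustration, not the study's dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_teams = 190

# Hypothetical student-level data: the two members of a team share the
# team's score, so errors are correlated within teams and standard
# errors must be clustered at the team level
df = pd.DataFrame({
    "team": np.repeat(np.arange(n_teams), 2),
    "self_selected": np.repeat(rng.integers(0, 2, n_teams), 2),
    "gpa": rng.normal(0, 1, 2 * n_teams),
})
team_noise = np.repeat(rng.normal(0, 1, n_teams), 2)
df["score"] = -0.4 * df["self_selected"] + 0.12 * df["gpa"] + team_noise

# OLS with cluster-robust covariance (clusters = teams)
model = smf.ols("score ~ self_selected + gpa", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["team"]}
)
print(model.params.round(3))
```

Without clustering, the two within-team observations would be treated as independent and the standard errors on the treatment dummy would be understated.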
Next, we studied the second team task. The regression results appear in Models (4)-(6). In
Model (4), we pooled observations from both experiments (ignoring the type of task), and
included only a treatment dummy. Consistent with the results of the non-parametric analysis,
we found no significant effect of self-selection, which suggests that a meaningful investigation of
the effects of the team assignment process on performance should take into account the task
characteristics. After we controlled for the experiment and its interaction with the treatment, we
found that the teams in the Self treatment in Experiment I performed .496 (p = .014; CI = [-.892;
-.100]) standard deviations worse on the second task than the teams in the Random treatment
(Model 5). We thus found very similar treatment effects for the first and the second tasks in
Experiment I, which suggests that there was no heterogeneous learning across treatments, and
that the ordering of the tasks did not matter. In addition, after adding up the first and the third
coefficients in Model (5), we found that in Experiment II, the teams in Self tended to perform
.279 standard deviations better than the teams in Random on the second task (a video). In line
with the non-parametric analysis, a joint F-test showed that this difference was not significant
(p = .1774). Adding additional control variables did not significantly change the coefficients.
Interestingly, we again found that GPA positively predicted the performance on the second team
task.
4.2.1 Heterogeneity analysis
When we split the sample at the median high school GPA, we found (in Table 3) that the Self
treatment had significant negative effects on the team performance on the first written task
(Model 1) for both low-ability (p = .040; CI = [-.979; -.026]) and high-ability students
(p = .001; CI = [-.669; -.169]). These effects were not significantly different from each other.
Furthermore, the Self treatment had a significantly negative effect on the team performance of
high-ability (β = -.371, p = .067; CI = [-.749; .05]) and low-ability (p = .010; CI = [-1.377;
-.198]) students on the second written task (Experiment I only). The negative effect of the
Self treatment was significantly larger for low-ability students than for high-ability students.
When we looked at the video task (only in Experiment II), we found that high-ability students
tended to perform better in the Self treatment than in the Random treatment; however, this
difference was not significant (p = .174; CI = [-.141; .73]). We did not find that the team
performance of low-ability students on the video task differed significantly between the
two treatments (p = .984; CI = [-.564; .573]). Overall, the results of this heterogeneity
analysis suggested that allowing for self-selection into groups harmed the performance of both
low- and high-ability students on the written task, but that low-ability students tended to suffer
more. However, self-selection did not harm the performance of low-ability students on the video
task, while it tended to benefit the performance of high-ability students on this task.
Table 3: Heterogeneity analysis
Dependent variable: Performance on ...
... 1st team task ... 2nd team task
(Exp. I + II: written) (Exp. I: written) (Exp. II: video)
Independent variables (1) (2) (3)
1 if Self -0.491** -0.782*** -0.006
(0.238) (0.296) (0.293)
Self x GPA >Median 0.046 0.411 0.303
(0.237) (0.268) (0.313)
1 if GPA >Median 0.156 0.037 0.037
(0.163) (0.283) (0.265)
Constant 0.160 0.348** -0.048
(0.142) (0.152) (0.207)
Self + (Self x GPA >Median) -0.445*** -0.371* 0.297
(0.133) (0.200) (0.217)
Observations 377 189 188
R-squared 0.070 0.130 0.043
This table displays the results of an OLS regression analysis (robust standard errors clustered at the team level in
parentheses). All specifications include GPA, Admission Test, and female as control variables. All scores have been
z-standardized. Significance indicators: ∗∗∗ p < .01, ∗∗ p < .05, ∗ p < .1.
4.3 Team formation
4.3.1 Ability composition
In this subsection, we investigate how allowing team members to self-select affected the team
composition, and how the team composition affected the team performance. We begin by
looking at how students (in the Self treatment) formed teams. To do so, we used pre-experiment
Table 4: Self selection and composition of teams
Observed Simulation
Variable Self Random Random
GPA .978 1.093 ∗∗ 1.118∗∗∗
% female .204 .348 ∗∗ .409 ∗∗
Analytical Test 1.012 1.150 ∗∗ 1.167∗∗∗
Admission Test 1.237 1.072 1.096
Note: The table displays the average absolute differences between teammates
on the pre-experiment observables. Simulation Random denotes the average
absolute difference for the respective variable from a simulation in which we
pairwise matched all students within a treatment within an experiment. The
stars indicate the two-sided significance level of a WSR test comparing
the observed score from Self with the simulated value from Simulation Random,
or of a MWU test comparing the observed score from Self
with the observed score from Random. Significance indicators:
∗∗∗ p < .01, ∗∗ p < .05, ∗ p < .1.
registry data on each student's ability (measured as their performance on the various tasks in
the admission test and their GPA), gender, and an incentivized measure of pro-sociality from
the post-experiment survey. For each team and measure x, we calculate the absolute difference
between the two teammates, m_ij = |x_i − x_j|, where i and j denote the teammates.
Thus, lower absolute differences indicate that the teammates were more similar, and higher values
indicate that they were less similar. If the students in the Self treatment were matched on certain
measures, we would observe a higher degree of similarity; i.e., a lower average absolute difference.
Moreover, as a reference point, we calculated the average absolute difference after simulating
the matching of each student with all potential teammates from the respective treatment. This
simulation provided us with information about what a hypothetical within-sample random team
composition might look like.
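The similarity measure and the within-sample random-matching benchmark described above can be sketched as follows. The abilities and the assortative pairing are simulated for illustration only; the benchmark averages the absolute difference over all possible within-sample pairs, as in the paper's simulation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical standardized GPAs for 40 students (illustrative only)
gpa = rng.normal(0, 1, 40)

# Assortative (self-selected) pairing: sort by GPA and pair neighbours
order = np.argsort(gpa)
self_pairs = [(order[i], order[i + 1]) for i in range(0, 40, 2)]

def mean_abs_diff(pairs):
    # Mean within-team absolute difference |x_i - x_j|
    return float(np.mean([abs(gpa[i] - gpa[j]) for i, j in pairs]))

observed_self = mean_abs_diff(self_pairs)

# Benchmark: average |x_i - x_j| over ALL possible pairings within the
# sample, i.e., what random matching would produce in expectation
all_pairs = list(itertools.combinations(range(40), 2))
simulated_random = mean_abs_diff(all_pairs)

print(f"self-selected: {observed_self:.3f}, "
      f"random benchmark: {simulated_random:.3f}")
```

Under assortative matching, the observed within-team difference falls well below the random benchmark, which is exactly the pattern Table 4 documents for the Self treatment.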
The results appear in Table 4. The first column shows the absolute differences for all measures in
Self, while the second and third columns show the absolute differences for all measures in Random and the
simulated “Random” treatment. A comparison of the values suggests that the students sorted
themselves into teams with students of similar levels of ability and of the same gender. More
specifically, we observed that the self-selected teams were more similar in terms of their GPAs,
their scores on the analytical test, and their gender. These differences were significant
(GPA: p = .0111; Analytical test: p = .0155; Female: p = .0155, MWU test). Interestingly, we
did not find significant differences between Self and Random for the Admission Test.
Up to now, we have made two main observations: First, we have established that the teams in
Random performed better on the written task, for which less skill complementarity was needed;
and that the team performance on the video task, for which more skill complementarity was
needed, did not differ between Random and Self. Second, we showed that the skill composition
of the teams differed between Random and Self ; i.e., that in the latter treatment, the students
tended to choose a partner with similar skills.
To better understand the role of individual skills in team tasks, we will now focus on the
relationship between skills and team performance in Random.
We operationalized each
student’s ability with their exam score, as this measure appeared to be a reliable measure of
ability in the context of this course.14
In Table 5, we predicted team performance. As independent variables, we used the maximum
ability and the minimum ability of the team members. For the written task (Models 1 and 2),
we observed that – if anything – the maximum ability tended to positively influence the team
outcome (β = [.102, .164]). For the same models, the coefficients for the minimum ability were
very small and negative (β = [-.0719, -.0132]). For the video task (Model 3), we observed that
both coefficients tended to positively influence the performance (β = [.359; .108]).15
While the signs of the coefficients on max(a_i, a_j) and min(a_i, a_j) were consistent with our line of
reasoning, both variables lacked statistical significance, and the predictive power of the model
was low. Given the small sample size, we cautiously interpret these results as being mildly
The Admission Test score also did not correlate with the performance on the different team tasks (see Table
A.3). In principle, it is possible that we were simply lucky with the team composition in Random. For this reason, we show
the results of the simulation in the third column of the table. Comparing Self with the results of our simulation
yielded similar results.
We concentrated our analysis on this treatment, since we can be sure that in Random, the composition of
the team members' abilities was exogenous and not confounded by other factors of the team-member selection
process, unlike in Self.
The students' GPAs or scores on the analytical test contained more noise, but yielded qualitatively similar
results. In Figure A.2, we display the linear relationships between the team performance and individual abilities. The
black line shows the relationship for the team member with the highest ability (max(a_i, a_j)), and the gray line
shows the relationship for the team member with the lowest ability (min(a_i, a_j)). In line with our reasoning above,
we observed a positive relationship between the team performance and the maximum ability for the task that
required low levels of skill complementarity. It appears that the minimum ability had no impact on the teams'
outcomes. For the video task, in which higher levels of skill complementarity were required, both the maximum
and the minimum ability had a positive impact on the team performance.
Table 5: Team performance and individual abilities
Dependent variable: Log Performance on ...
...1st team task ...2nd team task
(written) (written) (video)
Independent variables (1) (2) (3)
max(a_i; a_j)    0.164    0.102    0.359
                (0.88)   (0.62)   (1.22)
min(a_i; a_j)   -0.0719  -0.0132   0.108
               (-1.03)  (-0.24)   (0.79)
Constant 2.100∗∗∗ 2.116∗∗∗ 1.205∗∗∗
(4.87) (5.72) (1.84)
Observations 92 48 44
Experiment (I+II) I II
R2  0.020  0.008  0.084
Note: The table displays coefficients from OLS regressions (t-statistics in parentheses). Column (1)
predicts the log-transformed team performance on the first task across
both experiments in Random. In both experiments, the students had to submit a written
solution to the task. This model includes a dummy variable for the experiments, which was
insignificant (β = -.0000188) and is not displayed. Columns (2) and (3) show OLS regressions
for the second task. Significance indicators: ∗∗∗ p < .01, ∗∗ p < .05, ∗ p < .1.
suggestive of differences in the relationship between the composition of the team members’
abilities and the team performance.
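A regression of team performance on max(a_i, a_j) and min(a_i, a_j), in the spirit of Table 5, can be sketched as follows. The data are simulated under an assumed low-complementarity production function in which only the best member's ability matters; the variable names are illustrative, not the study's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_teams = 90

# Hypothetical exam-score abilities for the two members of each
# randomly assigned team (illustrative only)
a_i = rng.normal(0, 1, n_teams)
a_j = rng.normal(0, 1, n_teams)
df = pd.DataFrame({
    "max_ability": np.maximum(a_i, a_j),
    "min_ability": np.minimum(a_i, a_j),
})

# Assumed low-complementarity task: performance driven by the best member
df["log_perf"] = 2.1 + 0.15 * df["max_ability"] + rng.normal(0, 0.5, n_teams)

fit = smf.ols("log_perf ~ max_ability + min_ability", data=df).fit()
print(fit.params.round(3))
```

Under high skill complementarity one would instead generate performance from both `max_ability` and `min_ability`, and both coefficients would tend to be positive, as in Model (3).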
4.3.2 Perceived cooperation
A second mechanism that we hypothesized to be affected by the treatment variation and to
influence the team performance was the quality of cooperation. In our post-experiment survey,
we asked the students to evaluate their collaboration experience in their team during the course
(see Table 6 for an overview of all of the questions).
We asked the students to agree or disagree (on a 7-point Likert scale) with several statements
aimed at capturing various aspects of team collaboration and organization. More specifically, we
also asked questions about the perceived quality of the cooperation and the pleasure of working together.
Table 6 displays the results from the post-experiment survey, pooled for Experiments I and
II, and for each experiment separately. When asked about their experience during the task,
the students in Self reported that they communicated more (“We communicated a lot”;
p < .0001, MWU test) and that they cooperated better (“We helped each other a lot”)
than the students in Random. Moreover, they indicated that the teammates' contributions were
more equally distributed (“Both team members contributed equally”; p = .021), and that both
teammates exerted effort (“Both team members exerted effort”; p = .002). These comparisons
clearly show that the teams in Random used a different approach to solving the problem sets
than the teams in Self, likely by assigning the task to the more able teammate, but also by
cooperating less.17
Furthermore, we found that the students' moods (“The mood in our team was good”),
levels of stress (“Our team was very stressed”; p = .134), and motivation levels (“Our team
was very motivated”; p = .151) for the teams in Self did not differ from those for the teams in Random.
Although the students in Self were more likely to report being friends (“My team member was a
We also asked a battery of questions about the perceived teaching quality, which might have influenced
performance. However, we found no significant differences between the treatments and experiments, which
indicates that the lecturer's teaching was of the same quality in both classes and experiments.
As Table 6 shows, these differences between the treatments were mostly driven by the reports of students from
Experiment I, in which the students were not explicitly required to collaborate. As the video clip in Experiment
II required each teammate to cooperate equally and to appear in the video to present the results, the students
might have tried to fulfil this expectation. Therefore, a desirability bias might explain why we did not find as
strong a difference in self-reported cooperation in Experiment II as we did in Experiment I. Our finding that the
average ratings of cooperation also tended to be higher in Self in Experiment II than in Experiment I points in
the same direction.
Table 6: Overview of survey items and survey results
Experiment I Experiment II Experiment I + II
Survey item Random Self Random Self Random Self
A. Perceived quality of cooperation (1=Not agree, 7=Completely agree)
We communicated a lot. 5.16 <6.08 5.58 <∗∗ 6.14 5.35 <∗∗∗ 6.11
We helped each other a lot. 5.41 <∗∗ 5.95 5.82 <6.07 5.60 <∗∗ 6.01
Both team members exerted effort. 5.46 <∗∗ 6.07 5.93 <6.34 5.67 <∗∗∗ 6.20
Both team members contributed equally. 5.12 <5.59 5.44 <5.87 5.26 <∗∗ 5.73
Our individual skills complemented very well. 4.99 <∗∗ 5.56 5.26 <5.52 5.11 <∗∗ 5.54
Our team was very stressed. 2.87 <3.30 2.53 <2.69 2.71 <3.00
Our team was very motivated. 5.53 <5.78 5.70 <5.85 5.61 <5.81
The mood in our team was good. 5.79 <6.10 6.19 <6.27 5.98 <6.18
The coordination of our team was very good. 5.03 <5.38 5.56 <5.59 5.27 <5.49
I was dominant in leading the team. 4.49 >4.22 4.30 >4.21 4.40 >4.22
One person was dominant in leading the team. 4.10 >3.84 4.14 >3.85 4.12 >3.84
B. Attitude towards the other (1=Not agree, 7=Completely agree)
My team member is a friend. 3.82 <∗∗∗ 6.25 4.33 <∗∗∗ 6.06 4.06 <∗∗∗ 6.15
I knew the team member very well before the course. 2.60 <∗∗∗ 6.19 2.93 <∗∗∗ 5.66 2.75 <∗∗∗ 5.93
C. Perceived teaching quality (1=Not agree, 7=Completely agree)
I learned a lot from the professor to complete the exercises. 5.57 <5.66 5.77 >5.34 5.66 >5.50
The professor asked questions to test our understanding. 5.72 <5.74 5.88 >5.55 5.79 >5.65
Professor was too fast in explaining the contents. 2.41 <2.62 2.58 <2.86 2.49 <2.74
The lecturer spent too much time on simple things. 3.28 <3.30 3.16 <3.38 3.22 <3.34
The professor gave too complicated answers. 2.24 <2.23 2.18 <2.62 2.21 <2.42
Observations 68 73 57 71 125 144
Table reports descriptive statistics of student responses in the post-experimental survey. P-values stem from a two-sided Mann-Whitney U test for a comparison of
averages between Self and Random. Significance indicators: ∗∗∗ p <.01, ∗∗ p <.05, p <.1.
friend”; p < .0001) or having been acquainted with their teammate before the course (“I knew
the team member very well before the course”; p < .0001), it was not the overall pleasure of
working together, but rather the higher level of cooperation, that was different between the
teams in the two treatments.
These findings highlight a potentially important channel through which random assignment
may have increased performance on the written task, while tending to decrease performance
on the video task. For the written task, which required less collaborative effort, letting the
ablest student perform the task was most efficient; whereas for the video task, which required
more collaborative effort, both the communication and the coordination worked better in the
self-selected teams.18
5 Conclusion
This paper has provided evidence from natural field experiments that studied how team formation
processes influenced team performance. We used data on students’ individual characteristics
and behavior at a business school to examine the effects on team performance of varying both
the team formation process and the skill complementarity needed to perform well on different
tasks. The results of our randomized field experiments add a new dimension to the debate
on the effects of the team formation process on team performance. Previous experiments did
not use objective ability measures to capture team formation patterns, and they did not offer
an explanation for the observed effects of the team formation process on the team members’
performance on different tasks. By contrast, we used data on student ability generated prior to
the experiments to study how the team formation process affected the teams’ abilities and social
composition, which, in turn, affected the teams’ cooperation and performance on team tasks
with different skill complementarities.
We found that the team formation mechanism chosen for assigning subjects to teams was a
useful tool for strategically influencing performance.
70% (Experiment I: 74%, Experiment II: 67%) of the students responded to our request to participate in the
survey. We found no significant difference in the fraction of participating students between the Random
and the Self treatments (Experiment I: p = .282, χ2 test). Furthermore, participation in
the survey was balanced in terms of GPA (p = .466, MWU test), the analytical test scores
(p = .334, MWU test), and gender (p = .730, p = .822, χ2 test).
Importantly, this relationship hinged on
the specific requirements of the underlying task. When the subjects were allowed to choose
their teammate, the team assignment mechanism substantially influenced their performance on
the team tasks through assortative selection patterns. These selection patterns proved to be
performance-enhancing when the underlying task required a high degree of skill complementarity.
In contrast, the random assignment of teammates led to better team performance when the
task required little or no skill complementarity. After the students completed the team tasks,
we measured the individual performance of the subjects, and found no differences between the
team formation mechanisms, which indicates that the effect observed at the team level did not
translate into individual performance differences.
Our study offers insights for managers and team leaders; i.e., for individuals who decide how
teams are put together in firms and other organizations. If managers want to maximize team
performance, they first need to consider the type of task involved before deciding whether
employees should be able to self-select their teammates. Given that randomly assigned teams
can produce superior outcomes for tasks that are characterized by a low level of collaboration
intensity, our findings also reveal a weakness in the trends towards more “agile work practices”
(e.g., Mamoli and Mole, 2015), which give employees the freedom to choose their working groups
regardless of the circumstances.
Moreover, our results also provide insights into the trade-off between diversity and ability. When
managers want to create a more inclusive work environment by forming more diverse teams or
teams with similar average skill levels, random team assignment might prove more beneficial.
Our field experiment showed that students are more likely to match with teammates of the same
gender when they are allowed to self-select. This finding suggests that self-selection might create
not just inequalities in abilities across teams, but also less gender-diverse teams.
In this study, we focused on the contrast between self-selection and random assignment. An alternative
approach would be to assign subjects based on algorithms that maximize team performance (e.g., Wei et al.,
2020). For tasks with low collaboration intensity, this could be an algorithm that maximizes the differences in the
team members’ abilities.
Ai, W., R. Chen, Y. Chen, Q. Mei, and W. Phillips (2016). Recommending teams promotes prosocial
lending in online microfinance. Proceedings of the National Academy of Sciences 113(52),
Bandiera, O., I. Barankay, and I. Rasul (2013). Team incentives: Evidence from a firm level
experiment. Journal of the European Economic Association 11(5), 1079–1114.
Büyükboyaci, M. and A. Robbett (2019). Team formation with complementary skills. Journal
of Economics & Management Strategy 28 (4), 713–733.
Chen, R. (2017). Coordination with endogenous groups. Journal of Economic Behavior &
Organization 141 (5), 177–187.
Chen, R. and J. Gong (2018). Can self selection create high-performing teams? Journal of
Economic Behavior & Organization 148, 20–33.
Cross, R., R. Rebele, and A. Grant (2016). Collaborative overload. Harvard Business Review.
Currarini, S., M. O. Jackson, and P. Pin (2009). An economic model of friendship: Homophily,
minorities, and segregation. Econometrica 77(4), 1003–1045.
Dahlander, L., V. Boss, C. Ihl, and R. Jayaraman (2019). The effect of choosing teams and
ideas on entrepreneurial performance: Evidence from a field experiment. Mimeo.
Delfgaauw, J., R. Dur, O. A. Onemu, and J. Sol (2019). Team incentives, social cohesion, and
performance: A natural field experiment. Tinbergen Institute Discussion Paper .
Delfgaauw, J., R. Dur, and M. Souverijn (2018). Team incentives, task assignment, and
performance: A field experiment. The Leadership Quarterly.
Englmaier, F., S. Grimm, D. Schindler, and S. Schudy (2018). The effect of incentives in
non-routine analytical team tasks - Evidence from a field experiment. CESifo Working Paper
Series (6903).
Erev, I., G. Bornstein, and R. Galili (1993). Constructive intergroup competition as a solution to
the free rider problem: A field experiment. Journal of Experimental Social Psychology 29(6),
Fischer, M. and P. Kampkötter (2017). Effects of German universities’ excellence initiative on
ability sorting of students and perceptions of educational quality. Journal of Institutional and
Theoretical Economics 173 (4), 662.
Friebel, G., M. Heinz, M. Krüger, and N. Zubanov (2017). Team incentives and performance:
Evidence from a retail chain. American Economic Review 107 (8), 2168–2203.
Gächter, S. and C. Thöni (2005). Social learning and voluntary cooperation among like-minded
people. Journal of the European Economic Association 3 (2), 303–314.
Geraghty, A. and S. Paterson-Brown (2018). Leadership and working in teams. Surgery
(Oxford) 36 (9), 503–508.
Guido, A., A. Robbett, and R. Romaniuc (2019). Group formation and cooperation in so-
cial dilemmas: A survey and meta-analytic evidence. Journal of Economic Behavior &
Organization 159, 192 – 209.
Hamilton, B. H., J. A. Nickerson, and H. Owan (2003). Team incentives and worker heterogeneity:
An empirical analysis of the impact of teams on productivity and participation. Journal of
Political Economy 111 (3), 465–497.
Lazear, E. P. and P. Oyer (2012). Personnel economics. In R. Gibbons and J. Roberts (Eds.),
The Handbook of Organizational Economics, pp. 479–519.
Leider, S., M. M. Möbius, T. Rosenblat, and Q.-A. Do (2009). Directed altruism and enforced
reciprocity in social networks. Quarterly Journal of Economics 124 (4), 1815–1851.
Mamoli, S. and D. Mole (2015). Creating Great Teams: How Self-selection Lets People Excel.
Pragmatic Bookshelf.
O’Neill, T. A. and E. Salas (2018). Creating high performance teamwork in organizations.
Human Resource Management Review 28 (4), 325–331.
Patel, S. and S. Sarkissian (2017). To group or not to group? Evidence from mutual fund
databases. Journal of Financial and Quantitative Analysis 52(5), 1989–2021.
Reagans, R. and E. W. Zuckerman (2001). Networks, diversity, and productivity: The social
capital of corporate R&D teams. Organization Science 12 (4), 502–517.
Wei, A., Y. Chen, Q. Mei, J. Ye, and L. Zhang (2020). Putting teams into the gig economy: A
field experiment at a ride-sharing platform. Working Paper.
Wuchty, S., B. F. Jones, and B. Uzzi (2007). The increasing dominance of teams in production
of knowledge. Science 316 (5827), 1036–1039.
A Appendix
Table A.1: Randomization checks (unstandardized)
Experiment I Experiment II Experiment I + II
Variable Self Random p-value Self Random p-value Self Random p-value
GPA 5.260 5.208 .196 5.214 5.128 .524 5.236 5.170 .167
[.520] [.406] [.488] [.595] [.503] [.506]
Analytical Test 5.324 5.267 .889 5.207 5.438 .429 5.263 5.349 .649
[1.720] [1.865] [1.794] [2.060] [1.755] [1.958]
Admission Test 6.139 6.225 .480 6.391 6.295 .592 6.269 6.259 .957
[1.195] [1.124] [1.260] [1.127] [1.232] [1.123]
Presentation 6.351 6.174 .550 7.085 6.727 .125 6.729 6.440 .078
[1.860] [1.710] [1.474] [1.652] [1.708] [1.701]
Interview 6.548 6.837 .128 6.085 6.091 .879 6.309 6.478 .282
[1.623] [1.750] [1.770] [1.523] [1.712] [1.682]
Discussion 5.516 5.649 .689 5.959 6.018 .932 5.742 5.823 .776
[1.709] [1.647] [1.723] [1.520] [1.726] [1.595]
Descriptive statistics (unstandardized) of pre-experiment data. The p-values are from a Mann-Whitney U test (two-sided)
comparing the differences in the mean ranks of the two treatments. Standard deviations are in brackets.
Table A.2: Average and standard deviation of performance (z-standardized)
Experiment I Experiment II
Variable Self Random p-value Self Random p-value
Total team task -.303*** .297 .007 .049 -.089 .451
[1.214]*** [.621] .002 [.967] [1.046] .534
1st team task -.239** .234 .011 -.196* .183 .064
[1.140] [.791] .104 [1.150] [.791] .193
2nd team task -.251* .245 .068 .114 -.153 .156
[1.248]*** [.599] .001 [.956] [1.047] .381
Exam -.022 .028 .455 .004 -.005 .984
[1.004] [1.002] .995 [.989] [1.018] .603
Descriptive statistics (z-scores) of the students' performance in the experiment. Average [standard
deviation] of performance for the team tasks at the team level and for the exam at the individual
student level. The p-values stem from a two-sided Mann-Whitney U test comparing averages
between Self and Random. Levene's p-values (reported next to the bracketed standard deviations)
result from a comparison of variances between the two treatments. Significance indicators:
∗∗∗ p < .01, ∗∗ p < .05, ∗ p < .1.
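The two tests reported in Table A.2 can be reproduced from raw performance scores with a few lines of SciPy. The sketch below uses simulated, illustrative data (the sample sizes and distribution parameters are assumptions, not the experiment's data); only the test calls mirror the paper's procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative data only: z-standardized team-task scores for two treatments.
self_scores = rng.normal(-0.3, 1.2, size=40)    # Self-selected teams
random_scores = rng.normal(0.3, 0.6, size=40)   # Randomly assigned teams

# Two-sided Mann-Whitney U test: compares the mean ranks of the two treatments.
u_stat, mwu_p = stats.mannwhitneyu(self_scores, random_scores,
                                   alternative="two-sided")

# Levene's test: compares the variances of the two treatments.
lev_stat, lev_p = stats.levene(self_scores, random_scores)

print(f"Mann-Whitney U p-value: {mwu_p:.3f}")
print(f"Levene p-value: {lev_p:.3f}")
```

Both tests are nonparametric in spirit (rank-based and robust to non-normality, respectively), which is why they are a common choice for comparing treatment groups in field-experiment data.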
Table A.3: Pairwise correlations of variables
(1) (2) (3) (4) (5) (6) (7) (8)
Pre-Experiment data
(1) GPA 1.000
(2) Female 0.135*** 1.000
(3) Analytical Test 0.205*** -0.310*** 1.000
(4) Admission Test 0.176*** 0.033 -0.112** 1.000
Experiment data
(5) Total team task 0.126** -0.056 0.056 -0.020 1.000
(6) 1st team task 0.101* 0.000 -0.011 0.027 0.566*** 1.000
(7) 2nd team task 0.092* -0.073 0.073 -0.057 0.919*** 0.220*** 1.000
(8) Exam score 0.320*** -0.090* 0.291*** -0.009 0.122** 0.127** 0.078 1.000
The table displays correlation coefficients of pairwise correlations. All scores are z-standardized. The table includes data
from all treatments and experiments. Significance indicators: ∗∗∗ p < .01, ∗∗ p < .05, ∗ p < .1.
Figure A.1: Distribution of performance for (a) the 1st team task in Experiment I and Experiment II
(both written) and (b) the 2nd team task in Experiment I (written) and Experiment II (video),
across treatments
[Histograms omitted. Panels: Experiment I, Random; Experiment I, Self; Experiment II, Random;
Experiment II, Self. Horizontal axes: performance on the written task (panel a) and the video
task (panel b), unstandardized, 0–15.]
Figure A.2: Team performance and individual abilities
(a) 1st team task (Experiment I + II, written); (b) 2nd team task (Experiment II, written);
(c) 2nd team task (Experiment II, video)
[Scatter plots omitted. Vertical axis: log team performance; horizontal axis: log individual
ability (range 2–2.4). Separate fits shown for max(ai, aj) and min(ai, aj), with 90% confidence
intervals.]
Note: The figure shows the relationship between team performance and the individual exam
performance (as a measure of ability) of team members. The lines show linear fits; all variables
are log-transformed.
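The linear fits in Figure A.2 regress log team performance separately on the log ability of the stronger and the weaker team member. A minimal sketch of that fitting step, using simulated data (the coefficients and noise levels below are assumptions for illustration, not estimates from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data only: per-team log ability of the stronger and weaker
# member, and log team performance (the figure's variables, simulated here).
n = 60
log_max_ability = rng.normal(2.2, 0.1, n)                        # log max(ai, aj)
log_min_ability = log_max_ability - abs(rng.normal(0.05, 0.03, n))  # log min(ai, aj)
log_performance = (0.8 * log_max_ability + 0.2 * log_min_ability
                   + rng.normal(0, 0.05, n))

# One linear fit per ability measure, as in the figure's two lines.
slope_max, intercept_max = np.polyfit(log_max_ability, log_performance, 1)
slope_min, intercept_min = np.polyfit(log_min_ability, log_performance, 1)

print(f"slope on log max ability: {slope_max:.2f}")
print(f"slope on log min ability: {slope_min:.2f}")
```

Comparing the two slopes across tasks is one way to gauge skill complementarity: a steep slope on the stronger member's ability alone suggests a task where one member can carry the team, while similar slopes on both suggest that both members' abilities matter.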