Assessing professional competence: From methods to programmes

Department of Educational Development and Research, University of Maastricht, Maastricht, The Netherlands.
Medical Education (Impact Factor: 3.2). 04/2005; 39(3):309-17. DOI: 10.1111/j.1365-2929.2005.02094.x
Source: PubMed


INTRODUCTION: We use a utility model to illustrate, firstly, that selecting an assessment method involves context-dependent compromises and, secondly, that assessment is not a measurement problem but an instructional design problem, comprising educational, implementation and resource aspects. In the model, assessment characteristics are weighted differently depending on the purpose and context of the assessment.

EMPIRICAL AND THEORETICAL DEVELOPMENTS: Of the characteristics in the model, we focus on reliability, validity and educational impact, and argue that none of them is an inherent quality of any instrument. Reliability depends not on structuring or standardisation but on sampling. The key issues for validity are authenticity and the integration of competencies. Assessment in medical education addresses complex competencies and therefore requires quantitative and qualitative information from different sources, as well as professional judgement. Adequate sampling across judges, instruments and contexts can ensure both validity and reliability. Although it is widely recognised that assessment drives learning, this relationship has been little researched, possibly because of its strong context dependence.

ASSESSMENT AS INSTRUCTIONAL DESIGN: If assessment is to stimulate learning and requires adequate sampling, in authentic contexts, of the performance of complex competencies that cannot be broken down into simple parts, we need to shift from individual methods to an integral assessment programme, intertwined with the education programme. This requires an instructional design perspective.

IMPLICATIONS FOR DEVELOPMENT AND RESEARCH: Programmatic instructional design hinges on a careful description and motivation of choices, whose effectiveness should be measured against the intended outcomes. We should not evaluate individual methods in isolation, but should provide evidence of the utility of the assessment programme as a whole.
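
The utility model above is conceptual, but a toy calculation can make the idea of context-dependent weighting concrete. The sketch below is a minimal illustration, assuming a multiplicative form in which each assessment characteristic is raised to a context-specific weight; the characteristic scores, the weights and the two contexts are hypothetical and are not taken from the paper.

```python
# Hypothetical weighted utility index for an assessment method.
# The paper treats the utility model conceptually; this arithmetic form and
# the example numbers are assumptions made purely for illustration.

def utility(characteristics: dict, weights: dict) -> float:
    """Multiplicative utility: a low value on any heavily weighted
    characteristic drags the whole index down, reflecting the
    compromise involved in selecting a method for a given context."""
    u = 1.0
    for name, value in characteristics.items():
        u *= value ** weights.get(name, 1.0)
    return u

# Hypothetical scores (0-1) for a single method, e.g. an OSCE-style examination.
method = {"reliability": 0.8, "validity": 0.7, "educational_impact": 0.6,
          "acceptability": 0.9, "cost_efficiency": 0.4}

# Context-dependent weights: a high-stakes licensing context stresses
# reliability, whereas an in-training context stresses educational impact.
licensing = {"reliability": 2.0, "validity": 1.5, "educational_impact": 0.5,
             "acceptability": 1.0, "cost_efficiency": 1.0}
in_training = {"reliability": 0.5, "validity": 1.0, "educational_impact": 2.0,
               "acceptability": 1.0, "cost_efficiency": 1.0}

print(f"licensing context:   {utility(method, licensing):.3f}")
print(f"in-training context: {utility(method, in_training):.3f}")
```

The point is not the particular numbers but that the index for the same method differs across contexts; comparing several methods this way is what turns method selection into an explicit, context-dependent trade-off.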

    • "The means, standard deviations and reliability estimates are similar within each administration. The reliability estimates under all models are moderately high, ranging from 0.74 to 0.78, consistent with reliability for OSCE examinations such as the MCCQE Part II of two to four hours in length (Van der Vleuten & Schuwirth 2005). More importantly, the three simpler scoring models yielded scores that are as reliable "
    ABSTRACT: Background: Past research suggests that the use of externally applied scoring weights may not appreciably impact measurement qualities such as reliability or validity. Nonetheless, some credentialing boards and academic institutions apply differential scoring weights based on expert opinion about the relative importance of individual items or test components of Objective Structured Clinical Examinations (OSCEs). Aims: To investigate the impact of simplified scoring models that make little to no use of differential weighting on the reliability of scores and decisions on a high-stakes OSCE required for medical licensure in Canada. Method: We applied four weighting models of varying complexity to data from three administrations of the OSCE. We compared score reliability, pass/fail rates, correlations between the scores, and classification decision accuracy and consistency across the models and administrations. Results: Less complex weighting models yielded reliability and pass rates similar to those of the more complex weighting model. Minimal changes in candidates' pass/fail status were observed, and there were strong, statistically significant correlations between the scores for all scoring models and administrations. Classification decision accuracy and consistency were very high and similar across the four scoring models. Conclusions: Adopting a simplified weighting scheme for this OSCE did not diminish its measurement qualities. Instead of developing complex weighting schemes, experts' time and effort could be better spent on other critical test development and assembly tasks, with little to no compromise in the quality of scores and decisions on this high-stakes OSCE.
    Medical Teacher 05/2014; 36(7). DOI:10.3109/0142159X.2014.899687 · 1.68 Impact Factor
    • "Many of the articles we found showed that there is no single assessment tool that can provide sufficient information about the development or current level of competence in trainees and that a combination of tools would be necessary to measure residents’ level of care-management competence in a valid way. Our findings fall in line with the report by van der Vleuten and Schuwirth41 who proposed that the combination of qualitative and quantitative assessments in evaluating professional behavior during training was apparently superior for assessing clinical competence. They argued that it was impossible to have a single method of assessment capable of covering all aspects of competencies of the layers of Miller’s pyramid. "
    ABSTRACT: The increasing demands for effective and efficient health care delivery systems worldwide have resulted in an expansion of the desired competencies that physicians need to possess upon graduation. Presently, medical residents require additional professional competencies that can prepare them to practice adequately in a continuously changing health care environment. Recent studies show that despite the importance of competency-based training, the development and evaluation of management competencies in residents during residency training is inadequate. The aim of this literature review was to find out which assessment methods are currently being used to evaluate trainees' management competencies and which, if any, of these methods make use of valid and reliable instruments. In September 2012, a thorough search of the literature was performed using the PubMed, Cochrane, Embase®, MEDLINE®, and ERIC databases. Additional searches included scanning the references of relevant articles and sifting through the "related topics" displayed by the databases. A total of 25 out of 178 articles were selected for final review. Four broad categories emerged after analysis that best reflected their content: 1) measurement tools used to evaluate the effect of implemented curricular interventions; 2) measurement tools based on recommendations from consensus surveys or conventions; 3) measurement tools for assessing general competencies, which included care-management; and 4) measurement tools focusing exclusively on care-management competencies. Little information was found about (validated) assessment tools being used to measure care-management competence in practice. Our findings suggest that a combination of assessment tools should be used when evaluating residents' care-management competencies.
    Advances in Medical Education and Practice 02/2014; 5:27-37. DOI:10.2147/AMEP.S58476
    • "That is, instead of considering the individual TBA when judging students’ PBL performance, all TBAs in a given period (for example a year or a specific phase within a program) should be aggregated. This approach to assessment is in-line with current preoccupations and developments that aim to promote the use of programs of assessment instead of individual assessment strategies [11-13]. This study investigated the overall generalizability of the Tutotest-Lite; a shortened version of the Tutotest [5] when considered as a program of assessment. "
    ABSTRACT: Tutorial-based assessment, commonly used in problem-based learning (PBL), is thought to provide information about students that differs from that gathered with traditional assessment strategies such as multiple-choice questions or short-answer questions. Although multiple observations within units in an undergraduate medical education curriculum foster more reliable scores, that evaluation design is not always practically feasible. Thus, this study investigated the overall reliability of a tutorial-based program of assessment, namely the Tutotest-Lite. More specifically, scores from multiple units were used to profile clinical domains for the first two years of a system-based PBL curriculum. G-study analysis revealed an acceptable level of generalizability, with g-coefficients of 0.84 and 0.83 for Years 1 and 2, respectively. Interestingly, D-studies suggested that as few as five observations over one year would yield sufficiently reliable scores. Overall, the results from this study support the use of the Tutotest-Lite to judge clinical domains over different PBL units.
    BMC Medical Education 02/2014; 14(1):30. DOI:10.1186/1472-6920-14-30 · 1.22 Impact Factor
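
The BMC Medical Education entry above rests on a standard generalizability-theory argument: once person and residual variance components have been estimated in a G-study, a D-study projects how the generalizability coefficient grows as tutorial-based assessments are aggregated. The sketch below shows that projection for a simple persons-crossed-with-observations design; the variance components are invented for illustration and are not the Tutotest-Lite estimates.

```python
# D-study projection for a persons x observations design. The variance
# components below are hypothetical, not the values estimated in the study.

def projected_g(var_person: float, var_residual: float, n_obs: int) -> float:
    """Relative generalizability coefficient when averaging over n_obs
    observations: E(rho^2) = var_p / (var_p + var_residual / n_obs)."""
    return var_person / (var_person + var_residual / n_obs)

var_person, var_residual = 1.0, 1.2  # hypothetical variance components

for n in (1, 3, 5, 8, 12):
    print(f"n_obs = {n:2d} -> projected g = {projected_g(var_person, var_residual, n):.2f}")
```

The curve flattens quickly, which is why aggregating a modest number of observations over a year can already yield scores reliable enough to support judgements about clinical domains.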
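
The Medical Teacher entry listed first above compares differential and unit weighting of OSCE components. A rough simulation illustrates why the two schemes tend to behave alike when station scores share a common ability factor; the data, the "expert" weights and the use of Cronbach's alpha as the reliability estimate are all assumptions, not the study's own analyses.

```python
# Illustrative only: simulated OSCE station scores, hypothetical weights, and
# Cronbach's alpha as a stand-in for the reliability analyses in the study.
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_stations = 200, 12

# Station scores driven by a shared ability factor plus station-specific noise.
ability = rng.normal(0.0, 1.0, n_candidates)
scores = 0.6 * ability[:, None] + rng.normal(0.0, 1.0, (n_candidates, n_stations))

def cronbach_alpha(x: np.ndarray) -> float:
    """Internal consistency of a candidates-by-stations score matrix."""
    k = x.shape[1]
    return k / (k - 1) * (1.0 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

unit_weights = np.ones(n_stations)
expert_weights = rng.uniform(0.5, 2.0, n_stations)  # hypothetical differential weights

print(f"alpha, unit weights:         {cronbach_alpha(scores * unit_weights):.2f}")
print(f"alpha, differential weights: {cronbach_alpha(scores * expert_weights):.2f}")
print(f"correlation of totals:       "
      f"{np.corrcoef(scores @ unit_weights, scores @ expert_weights)[0, 1]:.3f}")
```

Because a single ability factor dominates each station, the weighted and unweighted totals correlate very highly and their internal-consistency estimates barely differ, which is the mechanism behind the finding that simplified weighting loses little.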

