List of Figures  xvii
Foreword  xix
Preface  xxi

Part I  Fundamentals of Assessment  1
|
Chapter 1  Introduction to Assessment in the Health Professions  3
    Four Major Assessment Methods  5
        Workplace-Based Assessment (Clinical Observational Methods)  7
        Narrative Assessments and Portfolios  7
    Some Basic Terms and Definitions  7
        Competency-Based Education  7
        Instruction and Assessment  9
        Assessment, Measurement, and Tests  10
        Fidelity to the Criterion  11
        Formative and Summative Assessment  12
        Norm- and Criterion-Referenced Measurement  12
        High-Stakes and Low-Stakes Assessments  13
        Large-Scale and Local or Small-Scale Assessments  13
    Translational Science in Education  14
|
Chapter 2  Validity and Quality  17
    The Principle of Purpose-Driven Assessment  17
    Understanding Your Purposes  18
    Tying Methods to Your Purposes  19
    Investigating Validity, Evaluating Quality  20
    Kane's Validity Framework: The Validity Argument  21
    Messick's Validity Framework: Sources of Evidence  24
    Taking a Prevention Focus: Understanding Threats to Validity  27
|
|
Chapter 3  Reliability  33
    Theoretical Framework for Reliability  34
        Classical Test Theory (CTT)  34
    Reliability Indices: Test-Based Methods  35
        Internal-Consistency Reliability: Cronbach's Alpha  39
    Reliability Indices: Raters  41
        Conceptual Basis for Inter-Rater Reliability  41
        Illustrative Example: Inter-Rater Reliability  41
        Implications: Inter-Rater Reliability  41
    Standard Error of Measurement (SEM)  42
    How to Increase Reliability  43
        Projections in Reliability: Spearman-Brown Formula  43
    Composite Scores and Composite Score Reliability  44
    Appendix Supplement: Composite Scores and Composite Score Reliability  47
        Calculating the Composite Score Reliability  48
        Calculating the Composite Scores of the Three Exemplar Students  49
|
Chapter 4  Generalizability Theory  51
    The Hypothetical Measurement Problem: An Example  53
    Defining the G Study Model  53
    Obtaining G Study Results  54
    Interpreting G Study Results  55
    G and D Study Model Variations  59
    Multivariate Generalizability  61
    Additional Considerations  63
    Appendix 4.1: Statistical Foundations of a Generalizability Study  65
    Appendix 4.2: Statistical Foundations of a Decision Study  67
    Appendix 4.3: Software Syntax for Estimating Variance Components From Table 4.1 Data  68
        SPSS Syntax  68
        SAS Syntax  68
        GENOVA Syntax  68
|
Chapter 5  Statistics of Testing  70
    Number Correct Scores or Raw Scores  71
    Derived Scores or Standard Scores  72
        Normalized Standard Scores  73
    Corrections for Guessing (Formula Scores)  73
    Correlation and Disattenuated Correlation  75
    Item Analysis Report for Each Test Item  76
        Point Biserial Correlation as Discrimination Index  78
        What Is Good Item Discrimination?  78
        General Recommendations for Item Difficulty and Item Discrimination  79
        Number of Examinees Needed for Item Analysis  80
    Summary Statistics for a Test  80
    Appendix: Some Useful Formulas With Example Calculations  83
        Standard Error of Measurement (SEM)  83
        Spearman-Brown Prophecy Formula  84
        Disattenuated Correlation: Correction for Attenuation  84
|
Chapter 6  Standard Setting  86
    Eight Steps for Standard Setting  87
        Step 1: Select a Standard-Setting Method  87
        Step 3: Prepare Descriptions of Performance Categories  88
        Step 5: Collect Ratings or Judgments  89
        Step 6: Provide Feedback and Facilitate Discussion  90
        Step 7: Evaluate the Standard-Setting Procedure  90
        Step 8: Provide Results, Consequences, and Validity Evidence to Final Decision Makers  91
    Special Topics in Standard Setting  92
        Combining Standards Across Components of an Examination: Compensatory vs. Non-Compensatory Standards  92
        Setting Standards for Performance Tests  92
        Setting Standards for Clinical Procedures  93
        Setting Standards for Oral Exams, Essays, and Portfolios  93
        Setting Standards in Mastery Learning Settings  93
        Multiple Category Cut Scores  93
        Setting Standards Across Institutions  93
    Seven Methods for Setting Performance Standards  94
        Contrasting Groups Method  100

Part II  Assessment Methods  107

Chapter 7  Written Tests: Writing High-Quality Constructed-Response and Selected-Response Items  109
    Assessment Using Written Tests  109
    CR and SR Item Formats: Definition  111
    Constructed-Response Items: Short-Answer vs. Long-Answer Formats  113
    Constructed-Response Items: Scoring  114
    Computer-Based Ratings of CR Tasks  117
    Constructed-Response Items: Threats to Score Validity  117
    SR Item Formats: General Guidelines for Writing MCQs  118
    SR Item Formats: Avoiding Known MCQ Flaws  120
    SR Item Formats: Number of MCQ Options  122
    SR Item Formats: MCQ Scoring Methods  122
    SR Item Formats: Non-MCQ Formats  123
|
Chapter 8  Oral Examinations  127
    Oral Examinations Around the World  127
    Threats to the Validity of Oral Examinations  129
    Structured Oral Examinations  130
    Scoring and Standard Setting  132
    Preparation of the Examinee  134
    Selection, Training, and Evaluation of the Examiners  134
|
Chapter 9  Performance Tests  141
    Strengths of Performance Tests  141
    Defining the Purpose of the Test  142
    Multiple-Station Performance Tests: The Objective Structured Clinical Exam (OSCE)  148
    Scoring an OSCE: Combining Scores Across Stations  150
    Threats to the Validity of Performance Tests  151
    Consequential Validity: Educational Impact  156
|
Chapter 10  Workplace-Based Assessment  160
    Competency-Based Medical Education  161
    Workplace-Based Assessment  163
    The Rater and Learner Dyad  166
    Assessment Administration  168
    Making Sense of Assessment Data  169
|
Chapter 11  Narrative Assessment  173
    Narrative Assessment Instruments  173
    Pragmatic Issues in Using Narrative Assessment  174
        Define the Purpose of Narrative Assessment  174
        Manage the Collected Data  175
        Combine Narratives Across Observers (Qualitative Synthesis)  176
        Provide Feedback to Trainees  176
    Evaluating the Validity of Narrative Assessment  177
|
Chapter 12  Assessment Portfolios  181
    Formative Purposes of Portfolios: Learner Driven, Mentor Supported  183
    The Importance of Mentors in Portfolio-Facilitated Lifelong Learning  184
    Summative Purposes of Portfolios: Learner Driven, Supervisor Evaluated  185
    Addressing Threats to Validity  186
    Potential Challenges of Portfolio Use  188

Part III  Special Topics  197

Chapter 13  Key Features Approach  199
    Preparing KF Test Material  201
    Construct Underrepresentation  206
    Construct-Irrelevant Variance  206
|
Chapter 14  Simulations in Assessment  208
    What Is Simulation and Why Use It?  209
    When to Use Simulation in Assessment  209
    How to Use Simulation in Assessment  210
        Determine Learning Outcome(s)  210
        Choose an Assessment Method  212
        Choose a Simulation Modality  214
        Develop Assessment Scenario  217
        Set Standards for the Assessment  221
        Standardize Assessment Conditions  223
    Threats to Validity of Simulation-Based Assessments  223
    Faculty Development Needs  225
    Consequences and Educational Impact  225
|
Chapter 15  Situational Judgment Tests  229
    Recent History of SJT Development for Assessment in Health Professions Education  231
    Defining Key Components of SJTs and Desired Outcomes  233
        Converting Research Data Into Practice  233
    Designing a Situational Judgment Test  235
        Phase I: How Toxic Is It? Potentiating Diversity, aka Construct Specificity  235
        Phase II: Does It Work? Identifying Better- and Lower-Performing Learners, aka Construct Sensitivity  237
        Phase III: Should I Use It? Real-World Considerations  239
|
Chapter 16  Programmatic Assessment: An Avenue to a Different Assessment Culture  245
    The Traditional Approach to Assessment  245
    Pass/Fail Decisions Are Not Based on a Single Data Point  247
    The Program Includes a Deliberate Mix of Different Assessment Methods  247
    Feedback Use and Self-Directed Learning Are Promoted Through a Continuous Dialogue With the Learner  248
    The Number of Data Points Needed Is Proportionally Related to the Stakes of the Assessment Decision  248
    High-Stakes Decisions Are Professional Judgments Made by a Committee of Assessors  248
    Evaluation of Programmatic Assessment  251
|
Chapter 17  Assessment Affecting Learning  257
    Reconsidering Key Concepts and Terms  258
    Mechanisms of Action in Assessment for Learning  260
        Mechanism of Action #1: Course Development  260
        Mechanism of Action #2: Anticipation of an Assessment Event  262
        Mechanism of Action #3: The Assessment Event Itself  264
        Mechanism of Action #4: Post-Assessment Reflection and Improvement  265
            Identifying Performance Gaps  265
            Generating New Approaches  266
            Applying and Reinforcing New Approaches  267
|
Chapter 18  Assessment in Mastery Learning Settings  272
    Interpretations of and Uses for Mastery Learning Assessments  272
    Sources of Validity Evidence: Content  275
    Sources of Validity Evidence: Response Process  276
    Sources of Validity Evidence: Internal Structure and Reliability  276
    Sources of Validity Evidence: Relationships to Other Variables  277
    Sources of Validity Evidence: Consequences of Assessment Use  277
        Standard Setting in Mastery Settings  278
        Other Validity Evidence Regarding Consequences  282
|
Chapter 19  Item Response Theory  287
    Classical Measurement Theory: Challenges in Sample-Dependent Inference  288
    Comparison Between CMT and IRT  288
    Item Response Theory: An Overview  289
        IRT Model: Logistic Parameter Model  289
        Item Characteristic Curve  289
        Assumptions for Conducting IRT  291
    Application of IRT: An Example  292
    Other Applications of IRT  293
        Computer-Adaptive Testing  293
|
Chapter 20  Engaging With Your Statistician  298
    Planning for Data Analysis  298
        Finding the Appropriate Statistician  298
        When and What to Present to Your Statistician  299
    Collaborating During Analyses  301
        Talking Through the Analysis Plan  301
        Exploratory Work Before Planned Analyses  303
        Exploratory Work After Planned Analyses  303
    Writing the Manuscript and Beyond  304

List of Contributors  307
Index  315