Part I Statistical Interpretation of Test/Exam Results

1 On Average: How Good Are They?  3
1.1 Average Is Attractive and Powerful  3
1.2 Is Average a Good Indicator?  4
1.3 Two Meanings of Average  5
1.5 Additional Information Is Needed  7
1.6 The Painful Truth of Average  8

2 On Percentage: How Much Are There?  9
2.1 Predicting with Non-perfect Certainty  9
2.2 Danger in Combining Percentages  11
2.3 Watch Out for the Base  12
2.4 What Is in a Percentage?  13
2.5 Just Think About This  13

3 On Standard Deviation: How Different Are They?  15
3.1 First, Just Deviation  15
3.3 Discrepancy in Computer Outputs  17
3.4 Another Use of the SD  18
3.6 Scores Are not of the Same Type of Measurement  20

4 On Difference: Is That Big Enough?  25
4.1 Meaningless Comparisons  25
4.2 Meaningful Comparison  26
4.3 Effect Size: Another Use of the SD  27
4.4 Substantive Meaning and Spurious Precision  29
4.6 Common but Unwarranted Comparisons  31

5 On Correlation: What Is Between Them?  35
5.1 Correlations: Foundation of Education Systems  35
5.2 Correlations Among Subjects  36
5.3 Calculation of Correlation Coefficients  37
5.4 Interpretation of Correlation  40

6 On Regression: How Much Does It Depend?  47
6.1 Meanings of Regression  47
6.3 Procedure of Regression  49

7 On Multiple Regression: What Is the Future?  51
7.1 One Use of Multiple Regression  51
7.2 Predictive Power of Predictors  53
7.3 Another Use of Multiple Regression  53
7.4 R-Square and Adjusted R-Square  54

8 On Ranking: Who Is the Fairest of Them All?  57
8.1 Where Does Singapore Stand in the World?  57
8.3 Is There a Real Difference?  61
8.4 Forced Ranking/Distribution  61
8.5 Combined Scores for Ranking  62

9 On Association: Are They Independent?  65
9.1 The Simplest Case: 2 × 2 Contingency Table  65
9.2 A More Complex Case: 2 × 4 Contingency Table  67
9.3 An Even More Complex Case  68
9.4 If the Worse Come to the Worse  70

Part II Measurement Involving Statistics

10 On Measurement Error: How Much Can We Trust Test Scores?  75
10.1 An Experiment in Marking  76
10.2 A Score (Mark) Is not a Point  78
10.3 Minimizing Measurement Error  79

11 On Grades and Marks: How not to Get Confused?  83
11.1 Same Label, Many Numbers  83
11.2 Two Kinds of Numbers  84
11.3 From Labels to Numbers  85
11.4 Possible Alternatives  87
11.5 Quantifying Written Answers  88

12 On Tests: How Well Do They Serve?  91

13 On Item-Analysis: How Effective Are the Items?  97
13.5 Post-assessment Analysis  102

14 On Reliability: Are the Scores Stable?  105
14.1 Meaning of Reliability  105
14.2 Factors Affecting Reliability  106
14.3 Checking Reliability  107
14.3.1 Internal Consistency  107
14.3.2 Split-Half Reliability  109
14.3.3 Test-Retest Reliability  109
14.3.4 Parallel-Forms Reliability  109
14.4 Which Reliability and How Good Should It Be?  110

15 On Validity: Are the Scores Relevant?  111
15.2 Relation Between Reliability and Validity  115

16 On Consequences: What Happens to the Students, Teachers, and Curriculum?  117
16.1 Consequences to Students  117
16.2 Consequences to Teachers  120
16.3 Consequences to Curriculum  121

17 On Above-Level Testing: What's Right and Wrong with It?  125
17.1 Above-Level Testing in Singapore  126
17.3 Probable (Undesirable) Consequences  127
17.4 Statistical Perspective  129

18 On Fairness: Are Your Tests and Examinations Fair?  133
18.1 Dimensions of Test Fairness  134
18.2 Ensuring High Qualities  134
18.3 Ensuring Test Fairness Through Item Fairness  137

Epilogue  141
Appendix A A Test Analysis Report  143
Appendix B A Note on the Calculation of Statistics  149
Appendix C Interesting and Useful Websites  153