
E-book: Educational Measurement for Applied Researchers: Theory into Practice

  • Format: PDF+DRM
  • Publication date: 02-Jan-2017
  • Publisher: Springer Verlag, Singapore
  • Language: English
  • ISBN-13: 9789811033025
  • Price: 147,58 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this e-book in encrypted form, which means you need to install special software in order to read it. You will also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorised with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, install Adobe Digital Editions (this is a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is most likely already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

This book is a valuable read for a diverse group of researchers and practitioners who analyze assessment data and construct test instruments. It focuses on the use of classical test theory (CTT) and item response theory (IRT), which are often required in the fields of psychology (e.g. for measuring psychological traits), health (e.g. for measuring the severity of disorders), and education (e.g. for measuring student performance), and makes these analytical tools accessible to a broader audience. 

Having taught assessment subjects to students from diverse backgrounds for many years, the three authors have a wealth of experience in presenting educational measurement topics, in-depth concepts, and applications in an accessible format. As such, the book addresses the needs of readers who use CTT and IRT in their work but do not necessarily have an extensive mathematical background. The book also sheds light on common misconceptions in applying measurement models, and presents an integrated approach to different measurement methods, such as contrasting CTT with IRT, and multidimensional IRT models with unidimensional ones. Wherever possible, comparisons between models are made explicit. In addition, the book discusses concepts of test equating and differential item functioning, as well as Bayesian IRT models and plausible values, using simple examples.

This book can serve as a textbook for introductory courses on educational measurement, as supplementary reading for advanced courses, or as a valuable reference guide for researchers interested in analyzing student assessment data.


1 What Is Measurement?  1(18)
  Measurements in the Physical World  1(1)
  Measurements in the Psycho-social Science Context  1(1)
  Psychometrics  2(1)
  Formal Definitions of Psycho-social Measurement  3(1)
  Levels of Measurement  3(3)
  Nominal  4(1)
  Ordinal  4(1)
  Interval  4(1)
  Ratio  5(1)
  Increasing Levels of Measurement in the Meaningfulness of the Numbers  5(1)
  The Process of Constructing Psycho-social Measurements  6(3)
  Define the Construct  7(1)
  Distinguish Between a General Survey and a Measuring Instrument  7(1)
  Write, Administer, and Score Test Items  8(1)
  Produce Measures  9(1)
  Reliability and Validity  9(4)
  Reliability  10(1)
  Validity  11(1)
  Graphical Representations of Reliability and Validity  12(1)
  Summary  13(1)
  Discussion Points  13(2)
  Car Survey  14(1)
  Taxi Survey  14(1)
  Exercises  15(2)
  References  17(1)
  Further Reading  18(1)
2 Construct, Framework and Test Development---From IRT Perspectives  19(22)
  Introduction  19(1)
  Linking Validity to Construct  20(1)
  Construct in the Context of Classical Test Theory (CTT) and Item Response Theory (IRT)  21(3)
  Unidimensionality in Relation to a Construct  24(2)
  The Nature of a Construct---Psychological Trait or Arbitrarily Defined Construct?  24(1)
  Practical Considerations of Unidimensionality  25(1)
  Theoretical and Practical Considerations in Reporting Sub-scale Scores  25(1)
  Summary About Constructs  26(1)
  Frameworks and Test Blueprints  27(1)
  Writing Items  27(4)
  Item Format  28(1)
  Number of Options for Multiple-Choice Items  29(1)
  How Many Items Should There Be in a Test?  30(1)
  Scoring Items  31(3)
  Awarding Partial Credit Scores  32(1)
  Weights of Items  33(1)
  Discussion Points  34(1)
  Exercises  35(3)
  References  38(1)
  Further Reading  38(3)
3 Test Design  41(18)
  Introduction  41(1)
  Measuring Individuals  41(5)
  Magnitude of Measurement Error for Individual Students  42(1)
  Scores in Standard Deviation Unit  43(1)
  What Accuracy Is Sufficient?  44(1)
  Summary About Measuring Individuals  45(1)
  Measuring Populations  46(2)
  Computation of Sampling Error  47(1)
  Summary About Measuring Populations  47(1)
  Placement of Items in a Test  48(3)
  Implications of Fatigue Effect  48(1)
  Balanced Incomplete Block (BIB) Booklet Design  49(2)
  Arranging Markers  51(2)
  Summary  53(1)
  Discussion Points  54(1)
  Exercises  54(2)
  Appendix 1 Computation of Measurement Error  56(1)
  References  57(1)
  Further Reading  57(2)
4 Test Administration and Data Preparation  59(14)
  Introduction  59(1)
  Sampling and Test Administration  59(5)
  Sampling  60(2)
  Field Operations  62(2)
  Data Collection and Processing  64(4)
  Capture Raw Data  64(1)
  Prepare a Codebook  65(1)
  Data Processing Programs  66(1)
  Data Cleaning  67(1)
  Summary  68(1)
  Discussion Points  69(1)
  Exercises  69(1)
  School Questionnaire  70(2)
  References  72(1)
  Further Reading  72(1)
5 Classical Test Theory  73(18)
  Introduction  73(1)
  Concepts of Measurement Error and Reliability  73(3)
  Formal Definitions of Reliability and Measurement Error  76(6)
  Assumptions of Classical Test Theory  76(1)
  Definition of Parallel Tests  77(1)
  Definition of Reliability Coefficient  77(2)
  Computation of Reliability Coefficient  79(2)
  Standard Error of Measurement (SEM)  81(1)
  Correction for Attenuation (Dis-attenuation) of Population Variance  81(1)
  Correction for Attenuation (Dis-attenuation) of Correlation  82(1)
  Other CTT Statistics  82(6)
  Item Difficulty Measures  82(2)
  Item Discrimination Measures  84(1)
  Item Discrimination for Partial Credit Items  85(2)
  Distinguishing Between Item Difficulty and Item Discrimination  87(1)
  Discussion Points  88(1)
  Exercises  88(1)
  References  89(1)
  Further Reading  90(1)
6 An Ideal Measurement  91(18)
  Introduction  91(1)
  An Ideal Measurement  91(1)
  Ability Estimates Based on Raw Scores  92(2)
  Linking People to Tasks  94(1)
  Estimating Ability Using Item Response Theory  95(7)
  Estimation of Ability Using IRT  98(3)
  Invariance of Ability Estimates Under IRT  101(1)
  Computer Adaptive Tests Using IRT  102(1)
  Summary  102(3)
  Hands-on Practices  105(1)
  Task 1  105(1)
  Task 2  105(1)
  Discussion Points  106(1)
  Exercises  106(1)
  Reference  107(1)
  Further Reading  107(2)
7 Rasch Model (The Dichotomous Case)  109(30)
  Introduction  109(1)
  The Rasch Model  109(2)
  Properties of the Rasch Model  111(11)
  Specific Objectivity  111(1)
  Indeterminacy of an Absolute Location of Ability  112(1)
  Equal Discrimination  113(1)
  Indeterminacy of an Absolute Discrimination or Scale Factor  113(2)
  Different Discrimination Between Item Sets  115(1)
  Length of a Logit  116(1)
  Building Learning Progressions Using the Rasch Model  117(3)
  Raw Scores as Sufficient Statistics  120(1)
  How Different Is IRT from CTT?  121(1)
  Fit of Data to the Rasch Model  122(1)
  Estimation of Item Difficulty and Person Ability Parameters  122(1)
  Weighted Likelihood Estimate of Ability (WLE)  123(1)
  Local Independence  124(1)
  Transformation of Logit Scores  124(1)
  An Illustrative Example of a Rasch Analysis  125(5)
  Summary  130(1)
  Hands-on Practices  131(5)
  Task 1  131(3)
  Task 2 Compare Logistic and Normal Ogive Functions  134(1)
  Task 3 Compute the Likelihood Function  135(1)
  Discussion Points  136(1)
  References  137(1)
  Further Reading  138(1)
8 Residual-Based Fit Statistics  139(20)
  Introduction  139(1)
  Fit Statistics  140(1)
  Residual-Based Fit Statistics  141(2)
  Example Fit Statistics  143(1)
  Interpretations of Fit Mean-Square  143(7)
  Equal Slope Parameter  143(2)
  Not About the Amount of "Noise" Around the Item Characteristic Curve  145(1)
  Discrete Observations and Fit  146(1)
  Distributional Properties of Fit Mean-Square  147(3)
  The Fit t Statistic  150(1)
  Item Fit Is Relative, Not Absolute  151(2)
  Summary  153(2)
  Discussion Points  155(1)
  Exercises  155(2)
  References  157(2)
9 Partial Credit Model  159(28)
  Introduction  159(1)
  The Derivation of the Partial Credit Model  160(1)
  PCM Probabilities for All Response Categories  161(1)
  Some Observations  161(1)
  Dichotomous Rasch Model Is a Special Case  161(1)
  The Score Categories of PCM Are "Ordered"  162(1)
  PCM Is not a Sequential Steps Model  162(1)
  The Interpretation of δ_k  162(5)
  Item Characteristic Curves (ICC) for PCM  163(1)
  Graphical Interpretation of the Delta (δ) Parameters  163(1)
  Problems with the Interpretation of the Delta (δ) Parameters  164(1)
  Linking the Graphical Interpretation of δ to the Derivation of PCM  165(1)
  Examples of Delta (δ) Parameters and Item Response Categories  165(2)
  Tau's and Delta Dot  167(3)
  Interpretation of δ̇ and τ_k  168(2)
  Thurstonian Thresholds, or Gammas (γ)  170(3)
  Interpretation of the Thurstonian Thresholds  170(1)
  Comparing with the Dichotomous Case Regarding the Notion of Item Difficulty  171(1)
  Compare Thurstonian Thresholds with Delta Parameters  172(1)
  Further Note on Thurstonian Probability Curves  173(1)
  Using Expected Scores as Measures of Item Difficulty  173(2)
  Applications of the Partial Credit Model  175(6)
  Awarding Partial Credit Scores to Item Responses  175(2)
  An Example Item Analysis of Partial Credit Items  177(4)
  Rating Scale Model  181(1)
  Graded Response Model  182(1)
  Generalized Partial Credit Model  182(1)
  Summary  182(1)
  Discussion Points  183(1)
  Exercises  184(1)
  References  185(1)
  Further Reading  185(2)
10 Two-Parameter IRT Models  187(20)
  Introduction  187(1)
  Discrimination Parameter as Score of an Item  188(1)
  An Example Analysis of Dichotomous Items Using Rasch and 2PL Models  189(5)
  2PL Analysis  191(3)
  A Note on the Constraints of Estimated Parameters  194(2)
  A Note on the Parameterisation of Item Difficulty Parameters Under 2PL Model  196(1)
  Impact of Different Item Weights on Ability Estimates  196(1)
  Choosing Between the Rasch Model and 2PL Model  197(2)
  2PL Models for Partial Credit Items  197(1)
  An Example Data Set  198(1)
  A More Generalised Partial Credit Model  199(1)
  A Note About Item Difficulty and Item Discrimination  200(3)
  Summary  203(1)
  Discussion Points  203(1)
  Exercises  204(1)
  References  205(2)
11 Differential Item Functioning  207(20)
  Introduction  207(1)
  What Is DIF?  208(2)
  Some Examples  208(2)
  Methods for Detecting DIF  210(7)
  Mantel-Haenszel  210(2)
  IRT Method 1  212(1)
  Statistical Significance Test  213(2)
  Effect Size  215(1)
  IRT Method 2  216(1)
  How to Deal with DIF Items?  217(5)
  Remove DIF Items from the Test  219(1)
  Split DIF Items as Two New Items  220(1)
  Retain DIF Items in the Data Set  220(1)
  Cautions on the Presence of DIF Items  221(1)
  A Practical Approach to Deal with DIF Items  222(1)
  Summary  222(1)
  Hands-on Practice  223(1)
  Discussion Points  223(2)
  Exercises  225(1)
  References  225(2)
12 Equating  227(18)
  Introduction  227(2)
  Overview of Equating Methods  229(11)
  Common Items Equating  229(1)
  Checking for Item Invariance  229(4)
  Number of Common Items Required for Equating  233(1)
  Factors Influencing Change in Item Difficulty  233(1)
  Shift Method  234(1)
  Shift and Scale Method  235(1)
  Shift and Scale Method by Matching Ability Distributions  236(1)
  Anchoring Method  237(1)
  The Joint Calibration Method (Concurrent Calibration)  237(1)
  Common Person Equating Method  238(1)
  Horizontal and Vertical Equating  239(1)
  Equating Errors (Link Errors)  240(2)
  How Are Equating Errors Incorporated in the Results of Assessment?  241(1)
  Challenges in Test Equating  242(1)
  Summary  242(1)
  Discussion Points  243(1)
  Exercises  244(1)
  References  244(1)
13 Facets Models  245(16)
  Introduction  245(1)
  DIF Can Be Analysed Using a Facets Model  246(1)
  An Example Analysis of Marker Harshness  246(8)
  Ability Estimates in Facets Models  250(3)
  Choosing a Facets Model  253(1)
  An Example---Using a Facets Model to Detect Item Position Effect  254(4)
  Structure of the Data Set  254(1)
  Analysis of Booklet Effect Where Test Design Is not Balanced  255(2)
  Analysis of Booklet Effect---Balanced Design  257(1)
  Discussion of the Results  257(1)
  Summary  258(1)
  Discussion Points  258(1)
  Exercises  259(1)
  Reference  259(1)
  Further Reading  259(2)
14 Bayesian IRT Models (MML Estimation)  261(22)
  Introduction  261(1)
  Bayesian Approach  262(5)
  Some Observations  266(1)
  Unidimensional Bayesian IRT Models (MML Estimation)  267(6)
  Population Model (Prior)  267(1)
  Item Response Model  267(1)
  Some Simulations  268(1)
  Simulation 1: 40 Items, 2000 Persons, 500 Replications  269(2)
  Simulation 2: 12 Items, 2000 Persons, 500 Replications  271(1)
  Summary of Comparisons Between JML and MML Estimation Methods  272(1)
  Plausible Values  273(4)
  Simulation  274(2)
  Use of Plausible Values  276(1)
  Latent Regression  277(3)
  Facets and Latent Regression Models  277(2)
  Relationship Between Latent Regression Model and Facets Model  279(1)
  Summary  280(1)
  Discussion Points  280(1)
  Exercises  281(1)
  References  281(1)
  Further Reading  281(2)
15 Multidimensional IRT Models  283(16)
  Introduction  283(1)
  Using Collateral Information to Enhance Measurement  284(1)
  A Simple Case of Two Correlated Latent Variables  285(3)
  Comparison of Population Statistics  288(3)
  Comparisons of Population Means  289(1)
  Comparisons of Population Variances  289(1)
  Comparisons of Population Correlations  290(1)
  Comparison of Test Reliability  291(1)
  Data Sets with Missing Responses  291(4)
  Production of Data Set for Secondary Data Analysts  292(1)
  Imputation of Missing Scores  293(2)
  Summary  295(1)
  Discussion Points  295(1)
  Exercises  296(1)
  References  296(1)
  Further Reading  296(3)
Glossary  299

Margaret Wu has taught a number of Educational Measurement courses at the University of Melbourne. She has been involved in large-scale assessment programs such as PISA and TIMSS. Margaret's research interests include the development of item response models and mathematics problem solving. She has co-authored two IRT software programs (ACER ConQuest and the TAM package in R).

Hak Ping Tam has many years of experience teaching various topics in educational measurement and statistics. He has been involved in the design and administration of various Taiwan public examinations, and he advises examination bodies on procedures for analysing student assessment data. He has also participated in standard-setting procedures and delivered numerous training sessions on item writing.

Tsung-Hau Jen has been extensively involved in Taiwan's participation in TIMSS. He has expertise in large-scale assessment methodologies, particularly in the area of complex sampling. Tsung-Hau has taught university courses on large-scale assessments and quantitative methods. He frequently provides advice to researchers involved in Taiwan's national student monitoring programs.