tristanz / OpenIRT (http://people.fas.harvard.edu/~tzajonc/openirt.html)

Bayesian and Maximum Likelihood Estimation of Item Response Theory (IRT) Models

$ hg clone http://bitbucket.org/tristanz/openirt
commit 26: b0e343d41be4
parent 25: 09fe34d62567
branch: default
Updated help, examples.
Tristan Zajonc / tristanz
8 months ago

Changed (Δ601 bytes):

Stata/openirt.sthlp (45 lines added, 21 lines removed)

Stata/openirt_examples.do (19 lines added, 12 lines removed)

Stata/openirt.sthlp:

29
29
{synopt :{opt fixed_item_file("filename")}}filename holding fixed item parameters{p_end}
30
30
31
31
{syntab:MCMC options}
32
{synopt :{opt samplesize(integer 2000)}}Sample size for MCMC estimation{p_end}
33
{synopt :{opt burnin(integer 1000)}}Burn in period for MCMC estimation{p_end}
32
{synopt :{opt samplesize(integer 2000)}}sample size for MCMC estimation{p_end}
33
{synopt :{opt burnin(integer 1000)}}burn in period for MCMC estimation{p_end}
34
34
35
35
{title:Description}
36
36
56
56
{dlgtab:Parameter Options}
57
57
58
58
{phang}
59
{opt model("3PL")} specifies the default model for items.  The default model is the three parameter logistic model (3PL).  {opt model("2PL")} forces the guessing parameter to zero, given the two parameter logistic model.  Fixed items specified in {opt fixed_item_file(filename)} will override the default model type.  This allows a mixture of 3PL and 2PL models, and fixed and free items.
59
{opt model("3PL")} specifies the default model for items.  The default model is the three parameter logistic model (3PL).  {opt model("2PL")} forces the guessing parameter to zero, given the two parameter logistic model.
60
Fixed items specified in {opt fixed_item_file(filename)} will override the default model type.  This allows a mixture of 3PL and 2PL models, and fixed and free items.
60
61
61
62
{phang}
62
{opt theta(varname)} specifies variable name holding fixed trait or ability parameters.  Any missing entries will be treated as free parameters.  In most cases applications, theta is free and therefore this options should be left out.{p_end}
63
{opt theta(varname)} specifies variable name holding fixed trait or ability parameters.  Any missing entries will be treated as free parameters.  In most applications, theta is free and therefore this option should be left out.{p_end}
63
64
64
65
{phang}
65
{opt fixed_item_file("filename")} specifies filename holding fixed item types and parameters, such as items from the TIMSS or NAEP item bank.  The file must include at least four variables: {it:id}, {it:type}, {it:a}, {it:b}, {it:c}.  {it:id} gives the unique numeric item identifier that matches the {opt item_prefix("prefix")} postfix.  {it:type} should equal 1 for 2PL items and 2 for 3PL items. {it:a} is the item discrimination parameter, {it:b} is the item difficulty parameter, and {it:c} is the  item guessing parameter. Note: {cmd: openirt} assumes all items use the normal ogive metric ({it:D = 1.7}).{p_end}
66
{opt fixed_item_file("filename")} specifies filename holding fixed item types and parameters, such as items from the TIMSS or NAEP item bank.  The file must include at least four variables: {it:id}, {it:type}, {it:a}, {it:b}, {it:c}.  
67
{it:id} gives the unique numeric item identifier that matches the {opt item_prefix("prefix")} postfix.
68
{it:type} should equal 1 for 2PL items and 2 for 3PL items.
69
{it:a} is the item discrimination parameter,
70
{it:b} is the item difficulty parameter, and {it:c} is the  item guessing parameter.
71
Note: {cmd: openirt} assumes all items use the normal ogive metric ({it:D = 1.7}).{p_end}
66
72
67
73
{dlgtab:MCMC Options}
68
74
69
75
{phang}
70
{opt samplesize(2000)} specifies the number of post burn in MCMC iterations (default = 2000). Plausible values are drawn at evenly spaced intervals from this sample, and EAP estimates are based on the mean of the entire sample. Larger sample sizes will reduce the monte carlo standard error.  In most applications the standard error of measurement dominates the monte carlo standard error after several thousand iterations, although longer chains should be used in any final analysis.{p_end}
76
{opt samplesize(2000)} specifies the number of post burn in MCMC iterations (default = 2000). Plausible values are drawn at evenly spaced intervals from this sample, and EAP estimates are based on the mean of the entire sample. Larger sample sizes will reduce the monte carlo standard error.
77
In most applications the standard error of measurement dominates the monte carlo standard error after several thousand iterations, although longer chains should be used in any final analysis.{p_end}
71
78
72
79
{phang}
73
{opt burnin(1000)} specifies the number of burn in MCMC iterations (default = 1000).  MCMC estimates rely on the chain converging to a stationary distribution.  In most IRT applications this occurs quite quickly -- within several hundred iterations.  If estimates do not appear to be converging, increase the burn in period.{p_end}
80
{opt burnin(1000)} specifies the number of burn in MCMC iterations (default = 1000).  MCMC estimates rely on the chain converging to a stationary distribution.  
81
In most IRT applications this occurs quite quickly -- within several hundred iterations.  If estimates do not appear to be converging, increase the burn in period.{p_end}
74
82
75
83
{title:Discussion}
76
84
77
85
{pstd}OpenIRT estimates 2PL and 3PL Item Response Theory (IRT) models for dichotomous (correct / incorrect) data using both Bayesian and Maximum Likelihood methods.{p_end}
78
86
79
{pstd}The software allows for missing (free) and fixed item parameters, abilities, and responses. This allows, for instance, equating of multiple overlapping test forms; equating test forms using a known reference population; and placing students on a known ability metric using fixed item parameters from an item data bank such as TIMSS or NAEP (see, e.g., Das and Zajonc(2009)).{p_end}
87
{pstd}The software allows for missing (free) and fixed item parameters, abilities, and responses. 
88
This allows, for instance, equating of multiple overlapping test forms; equating test forms using a known reference population; and placing students on a known ability metric using 
89
fixed item parameters from an item data bank such as TIMSS or NAEP (see, e.g., Das and Zajonc(2009)).{p_end}
80
90
81
{pstd}Unlike some other IRT software, OpenIRT includes both Bayesian MCMC and Maximum Likelihood estimates.  The software estimates expected posterior (EAP), plausible value (PV), and maximum likelihood (MLE) estimates of the underlying latent trait -- often called theta or ability.{p_end}
91
{pstd}Unlike some other IRT software, OpenIRT includes both Bayesian MCMC and Maximum Likelihood estimates.  
92
The software estimates expected posterior (EAP), plausible value (PV), and maximum likelihood (MLE) estimates of the underlying latent trait -- often called theta or ability.{p_end}
82
93
83
{pstd}Plausible values, or multiple imputations (Rubin, 1987), are draws from the posterior of each respondent's ability parameter.  While potentially poor measures of each respondent's ability, multiple imputations allow accurate estimation of distributional quantities, such as the upper and lower quartiles, or fraction of students passing a particular threshold.  If the number of items is small, EAP and MLE estimates will generally yield very poor estimates of such quantities, with EAP underestimating the standard deviation and MLE estimates overestimating the standard deviation.  See Das and Zajonc (2009) and Mislevy et al (1992).{p_end}
94
{pstd}Plausible values, or multiple imputations (Rubin, 1987), are draws from the posterior of each respondent's ability parameter.  
95
While potentially poor measures of each respondent's ability, multiple imputations allow accurate estimation of distributional quantities,
96
such as the upper and lower quartiles, or fraction of students passing a particular threshold.  
97
If the number of items is small, EAP and MLE estimates will generally yield very poor estimates of such quantities, with EAP underestimating the standard deviation and MLE estimates overestimating the standard deviation.
98
See Das and Zajonc (2009) and Mislevy et al (1992).{p_end}
84
99
85
{pstd}The exact priors used can be seen and changed by examining openirt.ini in the usersite directory.  The priors were calibrated using the NAEP item bank and should perform well under a broad range of scenarios.{p_end}
100
{pstd}The exact priors used can be seen and changed by examining openirt.ini in the usersite directory.  
101
The priors were calibrated using the NAEP item bank and should perform well under a broad range of scenarios.{p_end}
86
102
87
{pstd}{it:Note on speed}: Estimation can be slow due to the large number of free  parameters estimated using MCMC simulation.  Users with large data sets may wish to use small subsamples of data before running an analysis on the full sample.  On many systems a built in progress bar does not currently display in Stata.{p_end}
103
{pstd}{it:Note on speed}: Estimation can be slow due to the large number of free  parameters estimated using MCMC simulation.  
104
Users with large data sets may wish to use small subsamples of data before running an analysis on the full sample.  
105
On many systems a built in progress bar does not currently display in Stata.{p_end}
88
106
89
107
{title:General instructions}
90
108
92
110
93
111
{phang}1. For complex tests, create a {opt fixed_item_file}.  A fixed item file is required if you have both 2PL and 3PL items or if any of the items are fixed.{p_end}
94
112
95
{phang}2. Load the response data.  Responses should be coded 0/1 (numeric) for incorrect/correct.  For multiple tests forms, each row (unit) should include all possible items; items that a unit did not receive should be set to missing.  Items must all have the same prefix, e.g., item1, item2, etc.{p_end}
113
{phang}2. Load the response data.  Responses should be coded 0/1 (numeric) for incorrect/correct.  
114
For multiple test forms, each row (unit) should include all possible items; items that a unit did not receive should be set to missing.  Items must all have the same prefix, e.g., item1, item2, etc.{p_end}
96
115
97
116
{phang}3. Run the appropriate openirt command.  {p_end}
98
117
113
132
{phang}Score overlapping exams:{p_end}
114
133
{phang2}{cmd:. openirt, id(id) save_item_parameters("items.dta") save_trait_parameters("traits.dta") item_prefix("item")}{p_end}
115
134
116
{title:Examples:  Linking to fixed item parameters, such as TIMSS or NAEP.}
135
{title:Examples:  Linking to fixed item parameters, e.g. TIMSS.}
117
136
118
{phang}Create 10 fixed items:{p_end}
119
{phang2}{cmd:. sysuse naep_items, clear}{p_end}
120
{phang2}{cmd:. keep if id < 10}{p_end}
137
{phang}Create item parameter file:{p_end}
138
{phang2}{cmd:. sysuse timss_items, clear}{p_end}
121
139
{phang2}{cmd:. save fixed_items, replace}{p_end}
122
140
123
{phang}Score exam using mixed fixed and free items:{p_end}
124
{phang2}{cmd:. sysuse naep_children, clear}{p_end}
125
{phang2}{cmd:. openirt, id(id) save_item_parameters("items.dta") save_trait_parameters("traits.dta") fixed_item_file("fixed_items.dta") item_prefix("item")}{p_end}
141
{phang}Score exam:{p_end}
142
{phang2}{cmd:. sysuse timss_children, clear}{p_end}
143
{phang2}{cmd:. openirt, id(id) save_item_parameters("items.dta") save_trait_parameters("traits.dta") fixed_item_file("fixed_items.dta") item_prefix(q)}{p_end}
144
{phang2}{cmd:. use traits, clear}{p_end}
126
145
127
146
{phang}Rescale to TIMSS scale (mu=500 sd=100), see TIMSS 1999.{p_end}
147
{phang2}{cmd:. foreach x of varlist theta_eap theta_mle theta_pv1 theta_pv2 theta_pv3 theta_pv4 theta_pv5 {	replace `x' = `x'*100 + 500 } }{p_end}
128
148
{title:References}
129
149
130
150
{phang}
@@ -137,6 +157,10 @@ Mislevy, R.J. and Beaton, A.E. and Kapla
137
157
{phang}
138
158
Patz, R.J. and Junker, B.W. (1999) "A straightforward approach to Markov chain Monte Carlo methods for item response models" {it:Journal of Educational and Behavioral Statistics}. 24:2
139
159
160
{phang}
161
TIMSS 1999, "Scaling Methodology and Procedures for the TIMSS Mathematics and Science Scales", Chapter 13.
162
Online: http://timss.bc.edu/timss1999b/pdf/T99B_TR_Chap13.pdf
163
140
164
{phang} 
141
165
Van der Linden, W.J. and Hambleton, R.K. (1997) {it:Handbook of modern item response theory}. Springer Verlag.
142
166
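
Note on the item parameters documented in this help file: the changeset does not state the response function itself. As a sketch, assuming OpenIRT follows the conventional 3PL parameterization, the variables {it:a}, {it:b}, {it:c} in the fixed item file and the constant D = 1.7 would enter as follows. The symbols i (respondent) and j (item) are introduced here for illustration only.

```latex
% Conventional 3PL item response function (assumed, not quoted from OpenIRT).
% \theta_i : ability (trait) of respondent i
% a_j, b_j, c_j : discrimination, difficulty, and guessing parameters of item j
% D = 1.7 : scaling constant that puts the logistic on the normal ogive metric
P(x_{ij} = 1 \mid \theta_i)
  = c_j + (1 - c_j)\,\frac{1}{1 + \exp\{-D\, a_j\,(\theta_i - b_j)\}}

% 2PL special case: the guessing parameter is fixed at zero
P(x_{ij} = 1 \mid \theta_i)
  = \frac{1}{1 + \exp\{-D\, a_j\,(\theta_i - b_j)\}}
```

Setting c_j = 0 in the first line gives the second, which is consistent with the help text's statement that {opt model("2PL")} forces the guessing parameter to zero.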

Stata/openirt_examples.do:

@@ -168,20 +168,27 @@ twoway (scatter theta_mle theta) (functi
168
168
* Example 3: Link to TIMSS using test formed from TIMSS item bank.
169
169
sysuse timss_items, clear
170
170
save fixed_items, replace
171
172
171
sysuse timss_children, clear
173
172
openirt, id(id) save_item_parameters("items.dta") save_trait_parameters("traits.dta") ///
174
	fixed_item_file("fixed_items.dta") item_prefix(item)
173
	fixed_item_file("fixed_items.dta") item_prefix(q)
174
* load results
175
use traits, clear
176
* place on TIMSS scale (mu=500 sd=100), see TIMSS 1999.
177
foreach x of varlist theta_eap theta_mle theta_pv1 theta_pv2 theta_pv3 theta_pv4 theta_pv5 {
178
	replace `x' = `x'*100 + 500
179
}
175
180
176
* Merge in ability estimates
177
merge id using traits, sort
181
kdensity(theta_pv1), bw(15) gen(x1 d1)
182
kdensity(theta_pv2), bw(15) gen(x2 d2) at(x1)
183
kdensity(theta_pv3), bw(15) gen(x3 d3) at(x1)
184
kdensity(theta_pv4), bw(15) gen(x4 d4) at(x1)
185
kdensity(theta_pv5), bw(15) gen(x5 d5) at(x1)
186
egen d = rowmean(d*)
187
line(d x1)
178
188
179
* Graph TRUE vs EAP
180
twoway (scatter theta_eap theta) (function y=x, range(-3 3)), ///
181
	xtitle("Theta (True)") ytitle("Theta (EAP)") title("") ///
182
  text(3 3 "y = x", place(e)) legend(off)
189
twoway (kdensity theta_eap, bw(20)) ///
190
	(line d x1) (kdensity theta_mle, bw(20)), ///
191
	xtitle("Theta") title("EAP, PV, MLE Estimates") ///
192
	legend(order(1 "EAP" 2 "PV" 3 "MLE"))
183
193
184
* Graph TRUE vs MLE
185
twoway (scatter theta_mle theta) (function y=x, range(-3 3)), ///
186
	xtitle("Theta (True)") ytitle("Theta (MLE)") title("") ///
187
  text(3 3 "y = x", place(e)) legend(off)
194
drop x* d*
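
Note on the new Example 3 code: the added lines rescale the estimates and then plot a plausible-value density that is the average of five kernel density estimates. The arithmetic can be written out as below. This is a reading of the do-file, not a statement from the author. It assumes that d1 through d5 are the only variables matching d* when rowmean is called, and that the estimates are on a mean 0, sd 1 metric before rescaling. The symbols n, h, K, and m are introduced here for illustration.

```latex
% Rescaling applied by the foreach loop to theta_eap, theta_mle, theta_pv1..theta_pv5:
\theta^{\mathrm{scaled}} = 500 + 100\,\theta

% Kernel density estimate for the m-th plausible value, evaluated on the grid x
% generated by the first kdensity call (x1), with bandwidth h = 15 after rescaling.
% K is the kernel used by kdensity (Epanechnikov unless another is requested);
% n is the number of respondents.
\hat f_m(x) = \frac{1}{n h} \sum_{i=1}^{n}
  K\!\left(\frac{x - \theta^{(m)}_i}{h}\right), \qquad m = 1, \dots, 5

% Averaged plausible-value density, stored in d by egen d = rowmean(d*):
\hat f_{\mathrm{PV}}(x) = \frac{1}{5} \sum_{m=1}^{5} \hat f_m(x)
```

Each of the five densities is evaluated on the same grid via at(x1), which is what allows them to be averaged row by row before being plotted against the EAP and MLE densities.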