/*************************************************/ /* Written by: */ /* */ /* G. John Geldhof */ /* Graduate Student, Department of Psychology */ /* University of Kansas */ /* */ /* For: */ /* */ /* HTTP://QUANT.KU.EDU */ /* */ /*************************************************/ /*This program reads in LISREL parameter estimates and standard errors, then combines them using Rubin's rules (Rubin, 1987; Shafer & Olsen, 1998). The program will provide output that specifies a point estimate, 95% confidence interval, and gamma for each unique parameter estimate. Gamma represents the 'fraction of missing data,' which is slightly different than the mathematical amount of missing data (see Graham , Olchowski, & Gilreath, 2007)./* Multiple imputed data sets must be run in LIREL using the "RP=" option, with all parameters and standard errors outputted into separate text files. This can be done by adding "PV=" and "SV=" to the OU line of your LISREL syntax. You will also need to provide your fit statistics, which can be outputted using the "GF=" option on LISREL's OU line. For details, see the "Special Topics" LISREL help document available from quant.ku.edu (Geldhof, McConnell, & Selig, 2008).*/ /* THE USER NEEDS TO PROVIDE LOCATIONS FOR THE ESTIMATES, STANDARD ERRORS, AND FIT STATISTICS, AND NEEDS TO INDICATE HOW MANY PARAMETERS WERE ESTIMATED IN LISREL. PLEASE FOLLOW THE BELOW INSTRUCTIONS: */ /*Indicate the location of the parameter estimates (the pv file from LISREL)*/ filename pv "C:\Documents and Settings\geldhof\Desktop\Completed Projects\LISREL Compiler\params.pv"; /*Indicate the location of the estimates of the standard errors (the sv file from LISREL)*/ filename sv "C:\Documents and Settings\geldhof\Desktop\Completed Projects\LISREL Compiler\STDERR.sv"; /*Indicate the location of the fit statistics (the gf file from LISREL)*/ filename gf "C:\Documents and Settings\geldhof\Desktop\Completed Projects\LISREL Compiler\fit.gf"; /*You will need to tell SAS how many parameter estimates should be read in from LISREL's output files. The largest value in the "Parameter Specifications" section of your LISREL output (the section that counts the number of unique parameter estimates, later used to calculate DF) is a good place to start, but this number will not include constrained or alternative parameters. The best way to get this number is to physically open up your PV file and count; each row in this file will contain six parameter estimates. A NOTE ABOUT CONSTRAINED PARAMETERS AND EFFECTS CODING: LISREL does not output constrained parameters in the PV matrix. If your model contains such estimates (for instance when the effects coding method of identification is used), you will need to create an "Alternative Parameter" that is equated IN THE SAME WAY as the constrained estimate. If you simply equate the two, the standard error will be outputted in the SV file, but are not displayed in the LISREL output. Alternative parameters are included in PV and SV files, with the parameter and standard error estimates added to the end of the file. See Geldhof et al., 2008 for details concerning alternative parameters.*/ /*The files that LISREL will output are in the format: so it is important to always leave the "imp t1 t2" statement on the input lines. Start your list of each parameter (or standard error) after the "t2" on the input lines; I find it easiest to label these "p1" "p2" "p3," etc. To make inputting easier, you can just specify a range of variables. In the following example, I have 49 parameter estimates that are read in using the following syntax. You can tailor this to read in however many parameters and standard errors are in your model. EXAMPLE CODE: input imp t1 t2 (p1-p49) (13.) The (13.) is an important aspect of this code, and needs to be left in. LISREL's output files are not space delimited, and this piece of code tells SAS that all of the variables listed in the parentheses are 13 characters long. Preform this process for both the usepv and usesv data sets. */ data usepv; infile pv; input imp t1 t2 (p1-p49) (13.); run; data usesv; infile sv; input imp t1 t2 (s1-s49) (13.); run; /*END OF USER INPUT*/ /************************************** *************************************** ********** DO NOT EDIT BELOW ********** ********** THIS POINT ********** *************************************** **************************************/ data usegf; infile gf; input imp t1 t2 df minfitchi minp normalchi normalp t3 t4 t5 t6 ncp ncplo ncphi minfitval popdiscrepval f0lo f0hi RMSEA RMSEAlo RMSEAhi closefit ECVI ECVIlo ECVIhi ECVIsat ECVIind chiind AICind AIC AICsat CAICind CAIC CAICsat RMR SRMR GFI AGFI PGFI NFI NNFI PNFI CFI IFI RFI CriticalN; run; data svu; set usesv (drop = imp t1 t2); /*drops the imp, t1, and t2 variables from the standard errors dataset*/ run; data pvu; set usepv (drop = imp t1 t2); /*drops the imp, t1, and t2 variables from the parameter estimates dataset*/ run; data gfu; set usegf (drop = imp t1 t2 t3 t4 t5 t6); /*drops uneeded variables from gf file*/ run; proc IML; use pvu;/* }*/ read all into pv;/* } reads the data from both data sets into separate matrices*/ use svu;/* }*/ read all into sv;/* }*/ sv = sv##2; /* turns standard errors into variances*/ use gfu; /*ready to input fit statistics*/ read all into gf; /*input fit statistics*/ avgfit = gf[:,]; /*averages statistics and places in a row vector*/ /*The parameter vector 'pv' contains the estimates that are labelled 'Q' in Rubin's Rules. The vector 'sv' contains the standard errors, which were labelled 'U' by Rubin (1987).*/ m=nrow(pv); /*sets a scalar to equal the number of imputations*/ divm = m##-1; /*one over the number of imputations*/ divm1 = (m-1)##-1; /*one over one minus the number of imputation*/ divmpl = divm + 1; /*creates a scalar for later operations*/ qbar = pv[:,]; /*get averages of each estimate across all imputations*/ ubar = sv[:,]; /*get averages of each standard error across all imputations*/ qmean= j(m,ncol(qbar),1); /*prepare an imputations by variables matrix for ss computation*/ qmean = qmean # qbar; /*impose the means on the qmean matrix using element-wise multiplication*/ ss = (pv-qmean)##2; /*create a matrix of the squared deviations*/ ss = ss[+,]; /*compute the sums of squares for each estimate*/ DO count=1 to m; /*Sets between-imputation SS to 0 for variables that had no missing data*/ IF ss[1,count] < .00000000000000000001 THEN ss[1,count] = 0; end; b = ss # divm1; /*compute the between-imputation variance; note that I had to use the SS method because other methods divide by M, not M-1*/ varcheck = b[+,+]; warning = 'The multiple datasets are not different: You should check your imputation program to make sure the data are correct'; if varcheck < .0000001 then do; print warning[rowname='' colname='']; end; T= ubar + (b # divmpl); /*calculates total variance (T)*/ sqrrtT = T##.5; /*calculates std. deviation for computing confidence intervals*/ df = (m-1) # ((1 + ((ubar#m)/(b#(m+1))))##2); /*degrees of freedom for t-test*/ DO count=1 to m; /*for variables with no missingness*/ IF df[1,count] > 99999999 THEN df[1,count] = 99999999; end; ci95 = tinv(.975,df) # sqrrtT; /*Deviation from the mean for a 95% CI*/ *ci99 = tinv(.995,df) # sqrrtT; do count = 1 to m;/*Adjusts the 'ci' variable for variables with no missingness*/ /*Note that this equation uses total model DF instead of the adjusted DF*/ if df[1,count] = 99999999 then ci95[1,count] = tinv(.975,avgfit[1,1]) # sqrrtT[1,count]; *if df[1,count] = 99999999 then ci99[1,count] = tinv(.995,avgfit[1,1]) # sqrrtT[1,count]; if df[1,count] = 99999999 then df[1,count] = .; end; CIL95 = qbar - ci95; /*lower bounds of the 95% CI*/ CIU95 = qbar + ci95; /*upper bounds for the 95% CI*/ *CIL99 = qbar - ci99; /*lower bounds of the 99% CI*/ *CIU99 = qbar + ci99; /*upper bounds for the 99% CI*/ r = (b#(divm+1))/ubar; /*relative increase in variance due to nonresponse*/ gamma = (r+(2/(df+3)))/(r+1); /*gamma, for calculating efficiency*/ param = 1:ncol(pv); /*create a vector of parameter numbers for output*/ results = param // qbar// CIL95 // CIU95 // ciu99 //gamma; /*concatenates relevant matrices into an output-friendly single matrix*/ rowpar = {'Parameter #', 'Point Estimate', '95%-lower', '95%-higher', 'Gamma'}; /*provide labels for the rows; these go into their own matrix*/ print results[rowname=rowpar colname='']; /*print the results matrix using the labels provided in the 'rows' matrix*/ dfstat = avgfit[1,1]; /*Moves particular fit indices to their own scalar matrices*/ chi = avgfit[1,2]; rmsea = avgfit[1,13]; nnfi = avgfit[1,35]; cfi = avgfit[1,37]; SRMR = avgfit[1,30]; rowfit = {'DF', 'Min. Fit Chi-Sq', 'RMSEA', 'NNFI', 'CFI', 'SRMR'}; Please_Note = 'It is not currently possible to pool fit statistics across imputations. The previous data presents only the average fit statistics across all imputations, which may or may not be an adequate measure of model fit. '; FIT_STATISTICS = dfstat // chi // rmsea // nnfi // cfi // srmr; /*Concactenates fit indices*/ print FIT_STATISTICS[rowname=rowfit colname='']; print Please_Note[rowname='' colname='']; quit;