Blog posts

2023

ppsy: An R Package for Operational Psychometric Work

5 minute read

Published:

I haven’t been updating my blog ever since graduation. One main reason is that being away from the research field, I don’t have much to say. I have been working as an operational psychometrician for the recent three years and I have to say, the real world is so much different from the theoretical world that I have learned from the college. Although operational world are full of unexpectations and surprises, it does instilled me a love of measurement and psychometrics. Without this love, I would never develop this ppsy package.

What is ppsy

ppsy is an R package that aims to facilitate the daily routine for operational psychometricians. Currently, it can call local IRT software and equating software to run calibration and equating and conduct drift analysis. It can also generate fit plots and conversion tables.

Why I develop ppsy

This idea came to my mind about two years ago when I was writing a script to do a similar job that I had done not long ago. Even though I could reuse most of my code, I still had to modify it because of new conditions and restrictions. When this type of things came back to me again and again, I ended up having multiple versions of scripts to do a “same” job (“same” means that they are not identical but very similar), and they were stored under different folders and administrations. It made me hard to keep track of the changes. I like to code and I enjoy coding with complicated conditions or rules, but reading, comparing, and debugging between different versions of scripts hurts my eye. After I decided that having separate versions of scripts for different administrations is not a good solution for me, I finally took some time and integrated them into one.

How I feel about ppsy

First, the developing process is fun. Even though I already had scripts before I got into this, deciding which blocks of code is the foundational source code and what variables is the function parameters is an interesting experience. I remember there was once that I had written a function, but the next day I had a new idea and I ended up rewriting it completely. The rewritten function turned out to be more concise, more efficient, and better aligned to the overall system of the package.

Ever since ppsy was born, I have been testing and running it for two months and it worked amazingly for my current project. I know that operational world is dynamic and I am always ready to rewrite, add, or give it a complete update.

One thing to note is that each project is different. When I was developing this package, I only have my own projects in my mind. Thus, it will not perfectly suit every needs. However, the logic and system behind it is generalizable.

The logic of ppsy

In my opinion, operational work has a very systematic structure for each administration. Each year or administration is a cycle. Thus, the first step is to build the strucure: create directories for each new administration (newadmin_setup()).

With directories created, the next step is to locate the correct source files and paths for various outputs (get_path()). To be honest, I didn’t expect this would take a seperate step. In the past, I just had it hardcoded in my script. However, when conditions getting more and more complicated, I feel it is much cleaner and simpler to put it into a seperate function. In this way, I will only need to think about this once and it will take care every files I need in different scenarios. This function serves as a hub in the sense that it controls all of the input and output paths.

Due to the requirements of my current project, currently, calibration (irt_cal()) and equating processes (alt_drift and eqSTUIRT_4drift()) utilize IRTPRO, WINSTEPS, and STUIRT softwares and they only support rasch, 2PL, 3PL, and GPC models. Model fit function (modfit_fun()) calculates Q1 and G2 statistics. Drift analysis is based on D2 and fit plots. I may add more model and software to it in future.

Lastly, conversion table is created based on inverse TCC method (get_rsss()).

In the development

I still have some scripts that I wrote before and I plan to add them into this package, including:

  • About sample, e.g., sample representativeness check, oversampling and downsampling, sample plan, etc.
  • Some reliability stuff, e.g. raw score reliability, scale score reliability, etc.
  • Unidimentionality tests
  • Summative reports
  • Propensity score matching
  • Keycheck and adjudication

As I said, each project is different. This package may not be a good fit for your project but I hope the logic I described above could give you some idea. If you are interested in ppsy or need help to build your own, please feel free to contact me. I will be very happy to hear or help.

Enjoy coding!!

2020

Pairwise Deletion v.s. Listwise Deletion

2 minute read

Published:

This summer I am very fortunate to have the chance to validate a newly developed statistical software. One thing I learned is the differences between pairwise deletion and listwise deletion. When both of these two methods are common practices in taking care of missing values, results can be different and choosing which method to use requires more consideration.

2019