Computer Adaptive Testing (CAT) Technology Substantially Reduces Patient-Reported Outcome Measure (PROM) Survey Completion Time While Maintaining Results Accuracy*
PROMs Quantify and Express Outcomes From Orthopedic Treatments
Patient-reported outcome measures (PROMs) are well-recognized tools for evaluating the impact of orthopedic treatments. These surveys are an accepted (and often the only) way to obtain quantitative, meaningful assessments of the effect of treatment on the quality of life and level of function experienced by patients. The idea that patient-produced assessments are a valuable, and perhaps the most pertinent, measure of health outcomes has widespread support in the clinical and medical research community.
CAT Technology Offers Opportunity for Reduced Completion Times
Adoption of these survey tools in everyday clinical practice is increasing, but one limiting factor has been the burden that survey completion places on patients and physicians. For patients, completing these often-numerous forms is time-consuming, and the forms should be repeated at appropriate intervals after surgery to evaluate outcomes longitudinally. For elderly patients, the length of some PROMs, and of the Quality of Life surveys that frequently accompany them, can be physically and mentally challenging. For physicians, encouraging and managing patients’ completion of outcome measures can interrupt clinic workflow, demand staff and physician time, and complicate secure data storage.
*Selected findings and charts are presented from a forthcoming article in JSES (in press); https://doi.org/10.1016/j.jse.2018.11.068.
While the use of computer-administered media rather than paper-and-pencil forms has lessened these burdens,1 Computer Adaptive Testing (CAT) technology has been shown to meaningfully reduce the time and effort patients need to complete the surveys, which can result in higher patient satisfaction with this activity and higher completion rates. By deploying “Machine Learning”2 techniques that tailor questions to the specific traits of the respondent, CAT technology allows accurate outcome evaluations to be obtained with fewer questions and thus less patient effort. The defining characteristic of CAT is that, at any point, the next question to be asked is chosen based on the information already obtained.
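The defining characteristic described above can be sketched in a few lines of Python. This is a minimal illustration only, not the OBERD algorithm: the item names, the routing rule in `next_question`, and the stopping condition are all invented for the example, and a dictionary of answers stands in for the respondent.

```python
from typing import Callable, Dict, Optional

Answers = Dict[str, int]


def next_question(answers: Answers) -> Optional[str]:
    """Hypothetical routing rule: choose the next item from the answers so far.

    Returns None when no further question is needed.
    """
    if "pain" not in answers:
        return "pain"  # always start with the pain item
    if "reach_high_shelf" not in answers:
        return "reach_high_shelf"
    # Branch on what is already known: probe a harder task when the
    # easier one was manageable, otherwise probe a basic task.
    if answers["reach_high_shelf"] >= 2:
        follow_up = "throw_overhand"
    else:
        follow_up = "put_on_coat"
    return None if follow_up in answers else follow_up


def run_adaptive_survey(ask: Callable[[str], int]) -> Answers:
    """Ask questions one at a time until the routing rule says to stop."""
    answers: Answers = {}
    while True:
        item = next_question(answers)
        if item is None:
            return answers
        answers[item] = ask(item)


# A dictionary of answers stands in for the respondent here.
respondent = {"pain": 2, "reach_high_shelf": 3, "throw_overhand": 1, "put_on_coat": 3}
asked = run_adaptive_survey(respondent.__getitem__)
print(list(asked))  # ['pain', 'reach_high_shelf', 'throw_overhand']
```

Only three of the four available items are asked in this run, because the answer to the second item determines which follow-up is informative. A production system would derive such rules from data rather than write them by hand.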
OBERD CAT Optimizes Question Response Time While Maintaining Results Integrity
OBERD has developed CAT technology as a clinical practice tool for outcome measure collection. Using machine learning programs that analyze how response patterns affect the overall score for completed forms, the OBERD-CAT technology constructs rules for optimal ways to ask the questions. The initial application was for the ASES form used to assess outcomes from a variety of shoulder and elbow surgeries. This application was validated first, with results published in a forthcoming article in the Journal of Shoulder and Elbow Surgery.3 The technology was subsequently applied successfully to all of the other PROMs recommended for use by the American Academy of Orthopaedic Surgeons (AAOS)4, as well as the Oxford Hip and Knee surgery surveys (OHS and OKS, respectively)5.
The analysis used to generate the OBERD-CAT for all of the procedures included different collection sites, diagnoses, ages, and both pre-operative and post-operative assessments, so that the generalizability of the OBERD-CAT system could be adequately evaluated. Each patient completed a full survey as routinely employed in the clinic; the algorithms constructed by the OBERD-CAT system were then retrospectively applied to the actual patient responses. For each patient, when the CAT version was run, responses were supplied from the stored instrument rather than the live patient (which does not affect CAT functionality), and a score was calculated according to the proprietary CAT-specified algorithm developed by OBERD.
The significance of the accuracy achieved by the CAT model for each PROM instrument was viewed in the context of the minimum clinically important difference (MCID) relevant to each form, which is the minimum change in score that must occur for a patient to notice a difference in functional outcome. As an example, previous studies have identified a 12-point change in score as the threshold of noticeability (MCID) for the ASES form.
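As a small illustration of how the MCID serves as a yardstick, the check below flags a pair of scores whose difference reaches the threshold. It is a sketch under stated assumptions: the function name is hypothetical, the default of 12 points is the ASES value cited above, and treating a difference exactly equal to the MCID as noticeable is a choice made here, not one stated in the source.

```python
ASES_MCID = 12.0  # points; the ASES threshold cited in the text


def is_clinically_noticeable(score_a: float, score_b: float,
                             mcid: float = ASES_MCID) -> bool:
    """Return True when two scores differ by at least the MCID.

    Assumption: a difference exactly equal to the MCID counts as noticeable.
    """
    return abs(score_a - score_b) >= mcid


print(is_clinically_noticeable(50.0, 62.6))  # True: 12.6-point difference
print(is_clinically_noticeable(63.5, 63.7))  # False: 0.2-point difference
```

A scoring error smaller than the MCID is, by this definition, smaller than any change a patient would be expected to notice.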
Based on this analysis, the OBERD-CAT system has proven to be a valid and reliable tool for completing, at a minimum, all of the PROM surveys recommended by the AAOS and the TKA and THA surveys developed by Oxford University. The OBERD-CAT system can thus be used interchangeably with the aforementioned full PROM surveys for both research and clinical purposes. Further, OBERD-CAT significantly reduces question burden, which should save patient time and reduce the administrative and practice resources required.
Appendix – OBERD-CAT Results for ASES
As an example of results produced, the ASES full form has 11 questions, with one question addressing pain and 10 questions addressing shoulder function. The OBERD-CAT model determined that the pain question needed to be asked first, likely because it accounts for 50% of the total score. Of the ten remaining questions in the analyzed data set, OBERD-CAT required 55% (1520/2763) of patients to answer five questions, 23% (635/2763) to answer six questions, 11% (304/2763) to answer seven questions, and 11% to answer eight. Counting the pain question, this produces an average of 6.6 questions per case.
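The 50% weight of the pain question can be seen in the commonly published ASES scoring formula, sketched below. The assumptions are that pain is reported on a 0 to 10 visual analog scale (10 = worst pain) and that each of the ten function items is scored 0 to 3 (3 = not difficult); the function name `ases_score` is hypothetical, and this is the standard full-form calculation, not the proprietary CAT scoring.

```python
from typing import Sequence


def ases_score(pain_vas: float, function_items: Sequence[int]) -> float:
    """Full-form ASES score on a 0-100 scale (higher is better).

    Assumes pain_vas in [0, 10] and ten function items each in [0, 3].
    """
    if not 0 <= pain_vas <= 10:
        raise ValueError("pain_vas must be between 0 and 10")
    if len(function_items) != 10 or any(not 0 <= v <= 3 for v in function_items):
        raise ValueError("expected ten function items scored 0 to 3")
    pain_component = (10 - pain_vas) * 5                # 0 to 50 points
    function_component = sum(function_items) * 5 / 3    # 0 to 50 points
    return pain_component + function_component


print(ases_score(0, [3] * 10))   # 100.0 (no pain, full function)
print(ases_score(10, [0] * 10))  # 0.0 (worst pain, no function)
```

Because the single pain item spans 50 of the 100 points, its answer narrows the range of possible totals more than any one function item can, which is consistent with it being asked first.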
There was near-complete identity between the distributions of the OBERD-CAT and full-form ASES scores (see charts below). Additionally, the differences between each pair of CAT and full-form ASES scores clustered around zero. The OBERD-CAT result was within five points of the full-form result in 95% (2625/2763) of ASES forms (see table below). The mean OBERD-CAT score for the ASES forms studied was 0.14 points higher than the mean full-form score, with a similar spread in values (63.53 ± 26.51 for the full form vs 63.67 ± 26.42 for OBERD-CAT); both Pearson’s and the intra-class correlation coefficients were greater than 0.99. The maximum difference between OBERD-CAT scores and full-form scores was 12.6 points, and only two ASES forms (0.07%; 2/2763) had differences greater than the ASES MCID of 12.
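The agreement figures reported above can be computed from paired scores roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the five-point tolerance and 12-point MCID defaults come from the text, the intra-class correlation is omitted for brevity, and the sample scores are invented for illustration and are not study data.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def agreement_summary(cat: Sequence[float], full: Sequence[float],
                      tolerance: float = 5.0, mcid: float = 12.0) -> Dict[str, float]:
    """Summarize how closely CAT scores track full-form scores."""
    diffs = [c - f for c, f in zip(cat, full)]
    n = len(diffs)
    return {
        "mean_difference": sum(diffs) / n,
        "max_abs_difference": max(abs(d) for d in diffs),
        "share_within_tolerance": sum(abs(d) <= tolerance for d in diffs) / n,
        # Strictly greater than the MCID, as counted in the text.
        "count_beyond_mcid": sum(abs(d) > mcid for d in diffs),
        "pearson_r": pearson_r(cat, full),
    }


# Invented scores for illustration only.
cat_scores = [20.0, 45.0, 63.0, 80.0, 95.0]
full_scores = [18.0, 45.0, 65.0, 79.0, 82.0]
for name, value in agreement_summary(cat_scores, full_scores).items():
    print(f"{name}: {value:.3f}")
```

Reporting the share within a tolerance alongside the count beyond the MCID separates routine small deviations from the rare cases that could matter clinically.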
Summary Statistics for full form versus OBERD CAT ASES scores3
Comparison of Long Form and OBERD CAT Distribution Density For ASES3
Difference in ASES Scores Between Long Form and OBERD CAT3
1 Movsas B, Hunt D, Watkins-Bruner D, Lee WR, Tharpe H, Goldstein D, et al. “Can electronic web-based technology improve quality of life data collection? Analysis of Radiation Therapy Oncology Group 0828”, Pract Radiat Oncol 2014;4:187-91. doi: 10.1016/j.prro.2013.07.014.
2 David Champagne, Sastry Chilukuri, Martha Imprialou, Saif Rathore, and Jordan VanLare. “Machine learning and Therapeutics 2.0”, McKinsey and Company, 2019. Machine learning is a form of artificial intelligence in which algorithms learn from data, with or without explicit guidance, to improve predictions from, or classifications of, current data.
3 Otho R. Plummer, PhD, Joseph A. Abboud, MD, John-Erik Bell, MD, Anand M. Murthi, MD, Anthony A. Romeo, MD, Priyanks Singh, PhD, Benjamin M. Zmistowski, MD. “A concise shoulder outcome measure: application of computerized adaptive testing to the American Shoulder and Elbow Surgeons Shoulder Assessment”, Journal of Shoulder and Elbow Surgery [In Press]. https://doi.org/10.1016/j.jse.2018.11.068.
4 American Academy of Orthopaedic Surgeons website, https://aaos.org, “Instruments For Collection Of Orthopaedic Quality Data”, as recommended by AAOS in 2018.
5 P. B. Pynsent, D. J. Adams, S. P. Disney, “The Oxford hip and knee outcome questionnaires for arthroplasty”, Outcomes and Standards For Surgical Audit, The Journal of Bone and Joint Surgery (Br), 2005.