Using self-generated identification codes to match anonymous longitudinal data in a sexual health study of secondary school students: a cohort study

Bibliographic Details
Main Authors: Edmond Pui Hang Choi, Ellie Bostwick Andres, Heidi Sze Lok Fan, Lai Ming Ho, Alice Wai Chi Fung, Kevin Wing Chung Lau, Neda Hei Tung Ng, Monique Yeung, Janice Mary Johnston
Format: Article
Language: English
Published: BMC 2025-06-01
Series: BMC Medical Informatics and Decision Making
Online Access: https://doi.org/10.1186/s12911-025-03028-1
Description
Summary: Abstract
Objective: This study aimed to (i) describe the procedures for generating self-generated identification codes (SGICs) in a prospective longitudinal evaluation of a sexual health program for secondary school students in Hong Kong; (ii) outline the matching strategies and processes; (iii) examine rates of successful matching and associated factors; and (iv) compare the responses of participants whose data could be matched to those whose data could not.
Methods: A prospective longitudinal cohort study was conducted. The SGIC comprised a 5-element code with 4 digits and 3 letters. A matching algorithm was developed to link baseline and follow-up data collected from students in Years 1 to 3 (n = 1,064) during the 2019–2020 school year. Matching success and associated factors were analyzed, and responses from matched and unmatched participants were compared.
Results: The rate of perfectly matched cases was 49.06%, while 23.59% were partially matched, and 27.35% were unmatched. Logistic regression analysis revealed that male students (adjusted odds ratio [aOR]: 0.63) and Year 1 students (vs. Year 3; aOR: 0.56) were less likely to be perfectly matched. Compared to unmatched cases, perfectly and partially matched cases were less likely to have missing values and more likely to exhibit positive attitudes toward the sexual health program and related topics, such as the importance of sexual health, equal relationships, and condom use.
Conclusion: The use of SGICs successfully matched approximately 72.65% of the study sample over a one-year period. These findings highlight the potential of SGICs as a tool for longitudinal data matching while underscoring the need for further refinement of code generation processes and matching algorithms to minimize data wastage and improve effectiveness.
ISSN: 1472-6947
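
Illustration: The summary distinguishes perfectly matched, partially matched, and unmatched cases but does not state the matching rules. The Python sketch below is therefore only a hypothetical illustration of how a 5-element SGIC might be linked across waves. It is not the authors' algorithm. The function names, the example element values, and the rule that a partial match means agreement on at least 4 of 5 elements are all assumptions made for this example.

```python
# Hypothetical sketch of SGIC matching; not the algorithm used in the study.
N_ELEMENTS = 5  # the summary describes a 5-element code


def agreement(code_a, code_b):
    """Count the elements on which two SGICs agree, position by position."""
    return sum(a == b for a, b in zip(code_a, code_b))


def classify(baseline_code, followup_codes, min_partial=4):
    """Classify one baseline code against all follow-up codes.

    Returns (status, index), where index is the position of the linked
    follow-up record, or None if no unambiguous link exists.
    ASSUMPTION: a partial match agrees on at least `min_partial` elements.
    """
    scores = [agreement(baseline_code, code) for code in followup_codes]
    perfect = [i for i, s in enumerate(scores) if s == N_ELEMENTS]
    if len(perfect) == 1:
        return "perfect", perfect[0]
    partial = [i for i, s in enumerate(scores) if min_partial <= s < N_ELEMENTS]
    if not perfect and len(partial) == 1:
        return "partial", partial[0]
    # No candidate, or several equally good candidates: leave unlinked.
    return "unmatched", None


# Made-up codes: five elements holding 4 digits and 3 letters in total.
followups = [
    ("07", "K", "3", "LM", "9"),
    ("07", "K", "3", "LN", "9"),
    ("11", "A", "2", "BC", "4"),
]
print(classify(("07", "K", "3", "LM", "9"), followups))
print(classify(("07", "K", "3", "LN", "8"), followups))
print(classify(("99", "Z", "0", "QQ", "0"), followups))
```

Treating ambiguous candidates as unmatched is a conservative choice in this sketch: it trades a lower match rate for fewer false links, which is one of the tensions the summary points to when it calls for refining matching algorithms.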