Using self-generated identification codes to match anonymous longitudinal data in a sexual health study of secondary school students: a cohort study

Abstract. Objective: This study aimed to (i) describe the procedures for generating self-generated identification codes (SGICs) in a prospective longitudinal evaluation of a sexual health program for secondary school students in Hong Kong; (ii) outline the matching strategies and processes; (iii) exam...

Bibliographic Details
Main Authors: Edmond Pui Hang Choi, Ellie Bostwick Andres, Heidi Sze Lok Fan, Lai Ming Ho, Alice Wai Chi Fung, Kevin Wing Chung Lau, Neda Hei Tung Ng, Monique Yeung, Janice Mary Johnston
Format: Article
Language: English
Published: BMC 2025-06-01
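Series: BMC Medical Informatics and Decision Making
Subjects: Adolescents; Longitudinal study; Cohort studies; Anonymity; Self-generated identification codes; Sexual health
Online Access: https://doi.org/10.1186/s12911-025-03028-1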
collection DOAJ
description Objective: This study aimed to (i) describe the procedures for generating self-generated identification codes (SGICs) in a prospective longitudinal evaluation of a sexual health program for secondary school students in Hong Kong; (ii) outline the matching strategies and processes; (iii) examine rates of successful matching and associated factors; and (iv) compare the responses of participants whose data could be matched to those whose data could not. Methods: A prospective longitudinal cohort study was conducted. The SGIC comprised a 5-element code with 4 digits and 3 letters. A matching algorithm was developed to link baseline and follow-up data collected from students in Years 1 to 3 (n = 1,064) during the 2019–2020 school year. Matching success and associated factors were analyzed, and responses from matched and unmatched participants were compared. Results: Of all cases, 49.06% were perfectly matched, 23.59% were partially matched, and 27.35% were unmatched. Logistic regression analysis revealed that male students (adjusted odds ratio [aOR]: 0.63) and Year 1 students (vs. Year 3; aOR: 0.56) were less likely to be perfectly matched. Compared to unmatched cases, perfectly and partially matched cases were less likely to have missing values and more likely to exhibit positive attitudes toward the sexual health program and related topics, such as the importance of sexual health, equal relationships, and condom use. Conclusion: SGICs matched 72.65% of the study sample (perfect and partial matches combined) over a one-year period. These findings highlight the potential of SGICs as a tool for longitudinal data matching while underscoring the need for further refinement of code generation processes and matching algorithms to minimize data wastage and improve effectiveness.
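The record does not describe the matching rules themselves, so the following is only a minimal sketch of how a two-pass SGIC match could work, not the authors' algorithm. It assumes that a "perfect" match means all five code elements agree and that a "partial" match means at least four of five agree with exactly one remaining candidate. The function names, the `min_partial` threshold, and the sample codes are all hypothetical.

```python
from typing import Dict, List, Tuple

# A code is five elements; the sample values are made up (4 digits, 3 letters in total).
Code = Tuple[str, str, str, str, str]


def agreement(a: Code, b: Code) -> int:
    """Count the elements on which two codes agree, position by position."""
    return sum(x == y for x, y in zip(a, b))


def match_codes(
    baseline: List[Code],
    followup: List[Code],
    min_partial: int = 4,  # assumed threshold, not taken from the study
) -> Dict[int, Tuple[int, str]]:
    """Map each matched baseline index to (follow-up index, match type)."""
    matches: Dict[int, Tuple[int, str]] = {}
    used = set()

    # Pass 1: perfect matches, where every element agrees.
    for i, b in enumerate(baseline):
        for j, f in enumerate(followup):
            if j not in used and b == f:
                matches[i] = (j, "perfect")
                used.add(j)
                break

    # Pass 2: partial matches, accepted only when a single unused
    # follow-up code reaches the agreement threshold.
    for i, b in enumerate(baseline):
        if i in matches:
            continue
        candidates = [
            j for j, f in enumerate(followup)
            if j not in used and agreement(b, f) >= min_partial
        ]
        if len(candidates) == 1:
            matches[i] = (candidates[0], "partial")
            used.add(candidates[0])

    return matches


baseline = [
    ("07", "K", "3", "LM", "9"),
    ("12", "A", "5", "BC", "1"),
    ("30", "Z", "8", "QR", "4"),
]
followup = [
    ("12", "A", "5", "BC", "2"),  # differs from baseline[1] in one element
    ("07", "K", "3", "LM", "9"),  # identical to baseline[0]
    ("99", "X", "0", "YY", "0"),  # agrees with nothing
]
result = match_codes(baseline, followup)
```

Requiring a single candidate in the second pass is a conservative design choice: it leaves ambiguous cases unmatched rather than risking a link between two different students.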
format Article
id doaj-art-39715bf4710140bbbb20c97b3594e757
institution OA Journals
issn 1472-6947
language English
publishDate 2025-06-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj-art-39715bf4710140bbbb20c97b3594e757 | 2025-08-20T02:05:41Z | eng | BMC | BMC Medical Informatics and Decision Making | 1472-6947 | 2025-06-01 | vol. 25, no. 1, pp. 1-12 | 10.1186/s12911-025-03028-1
Edmond Pui Hang Choi (School of Nursing, University of Hong Kong)
Ellie Bostwick Andres (Duke-NUS Medical School)
Heidi Sze Lok Fan (The University of British Columbia - Okanagan Campus)
Lai Ming Ho (School of Public Health, University of Hong Kong)
Alice Wai Chi Fung (Mother’s Choice)
Kevin Wing Chung Lau (Mother’s Choice)
Neda Hei Tung Ng (Mother’s Choice)
Monique Yeung (Mother’s Choice)
Janice Mary Johnston (School of Public Health, University of Hong Kong)
title Using self-generated identification codes to match anonymous longitudinal data in a sexual health study of secondary school students: a cohort study
topic Adolescents
Longitudinal study
Cohort studies
Anonymity
Self-generated identification codes
Sexual health
url https://doi.org/10.1186/s12911-025-03028-1