HIV-phyloTSI: subtype-independent estimation of time since HIV-1 infection for cross-sectional measures of population incidence using deep sequence data

Abstract. Background: Estimating the time since HIV infection (TSI) at population level is essential for tracking changes in the global HIV epidemic. Most methods for determining TSI give a binary classification of infections as recent or non-recent within a window of several months, and cannot assess...

Bibliographic Details
Main Authors: Tanya Golubchik, Lucie Abeler-Dörner, Matthew Hall, Chris Wymant, David Bonsall, George Macintyre-Cockett, Laura Thomson, Jared M. Baeten, Connie L. Celum, Ronald M. Galiwango, Barry Kosloff, Mohammed Limbada, Andrew Mujugira, Nelly R. Mugo, Astrid Gall, François Blanquart, Margreet Bakker, Daniela Bezemer, Swee Hoe Ong, Jan Albert, Norbert Bannert, Jacques Fellay, Barbara Gunsenheimer-Bartmeyer, Huldrych F. Günthard, Pia Kivelä, Roger D. Kouyos, Laurence Meyer, Kholoud Porter, Ard van Sighem, Mark van der Valk, Ben Berkhout, Paul Kellam, Marion Cornelissen, Peter Reiss, Helen Ayles, David N. Burns, Sarah Fidler, Mary Kate Grabowski, Richard Hayes, Joshua T. Herbeck, Joseph Kagaayi, Pontiano Kaleebu, Jairam R. Lingappa, Deogratius Ssemwanga, Susan H. Eshleman, Myron S. Cohen, Oliver Ratmann, Oliver Laeyendecker, Christophe Fraser, the HPTN 071 (PopART) Phylogenetics protocol team, the BEEHIVE consortium and the PANGEA consortium
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Bioinformatics
Subjects: HIV; Next-generation sequencing; Random forest; Recency of infection; Time since infection
Online Access: https://doi.org/10.1186/s12859-025-06189-y
author Tanya Golubchik
Lucie Abeler-Dörner
Matthew Hall
Chris Wymant
David Bonsall
George Macintyre-Cockett
Laura Thomson
Jared M. Baeten
Connie L. Celum
Ronald M. Galiwango
Barry Kosloff
Mohammed Limbada
Andrew Mujugira
Nelly R. Mugo
Astrid Gall
François Blanquart
Margreet Bakker
Daniela Bezemer
Swee Hoe Ong
Jan Albert
Norbert Bannert
Jacques Fellay
Barbara Gunsenheimer-Bartmeyer
Huldrych F. Günthard
Pia Kivelä
Roger D. Kouyos
Laurence Meyer
Kholoud Porter
Ard van Sighem
Mark van der Valk
Ben Berkhout
Paul Kellam
Marion Cornelissen
Peter Reiss
Helen Ayles
David N. Burns
Sarah Fidler
Mary Kate Grabowski
Richard Hayes
Joshua T. Herbeck
Joseph Kagaayi
Pontiano Kaleebu
Jairam R. Lingappa
Deogratius Ssemwanga
Susan H. Eshleman
Myron S. Cohen
Oliver Ratmann
Oliver Laeyendecker
Christophe Fraser
the HPTN 071 (PopART) Phylogenetics protocol team, the BEEHIVE consortium and the PANGEA consortium
author_sort Tanya Golubchik
collection DOAJ
description Abstract. Background: Estimating the time since HIV infection (TSI) at population level is essential for tracking changes in the global HIV epidemic. Most methods for determining TSI give a binary classification of infections as recent or non-recent within a window of several months, and cannot assess the cumulative impact of an intervention. Results: We developed a Random Forest Regression model, HIV-phyloTSI, which combines measures of within-host diversity and divergence to generate continuous TSI estimates directly from viral deep-sequencing data, with no need for additional variables. HIV-phyloTSI provides a continuous measure of TSI up to 9 years, with a mean absolute error of less than 12 months overall and less than 5 months for infections with a TSI of up to a year. It performs equally well for all major HIV subtypes based on data from African and European cohorts. Conclusions: We demonstrate how HIV-phyloTSI can be used for incidence estimates on a population level.
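The abstract reports accuracy as a mean absolute error (MAE), both overall and for infections with a TSI of up to a year. As a rough illustration of how such a stratified MAE could be computed, here is a minimal sketch. The function names, the one-year cutoff parameter, and all numeric values are assumptions made for this example; they are not taken from the paper or from the HIV-phyloTSI software.

```python
from statistics import mean


def mean_absolute_error(true_tsi, estimated_tsi):
    """Mean absolute difference between known and estimated TSI (years)."""
    if len(true_tsi) != len(estimated_tsi) or not true_tsi:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(t - e) for t, e in zip(true_tsi, estimated_tsi))


def stratified_mae(true_tsi, estimated_tsi, cutoff_years=1.0):
    """MAE over all samples and over samples with true TSI <= cutoff_years."""
    recent = [(t, e) for t, e in zip(true_tsi, estimated_tsi) if t <= cutoff_years]
    overall = mean_absolute_error(true_tsi, estimated_tsi)
    recent_mae = (
        mean_absolute_error([t for t, _ in recent], [e for _, e in recent])
        if recent
        else None  # no samples fall at or below the cutoff
    )
    return {"overall": overall, "recent": recent_mae}


# Hypothetical values in years, invented for illustration (not data from the paper).
true_tsi = [0.25, 0.5, 1.0, 3.0, 6.0]
estimated_tsi = [0.5, 0.25, 1.5, 2.0, 7.5]
result = stratified_mae(true_tsi, estimated_tsi)
print(result)
```

Reporting the two figures separately matters because errors on long-standing infections can dominate the overall average while recent infections, which drive incidence estimates, are estimated more tightly.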
format Article
id doaj-art-d0cd34dad9aa4ecb9eede01d14c17bbd
institution DOAJ
issn 1471-2105
language English
publishDate 2025-08-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj-art-d0cd34dad9aa4ecb9eede01d14c17bbd; 2025-08-20T03:06:43Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-08-01; 26; 1; 1; 21; 10.1186/s12859-025-06189-y
Author affiliations (in author order):
Tanya Golubchik: Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford
Lucie Abeler-Dörner: Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford
Matthew Hall: Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford
Chris Wymant: Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford
David Bonsall: Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford
George Macintyre-Cockett: Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford
Laura Thomson: Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford
Jared M. Baeten: Department of Global Health, University of Washington
Connie L. Celum: Department of Global Health, University of Washington
Ronald M. Galiwango: Rakai Health Sciences Program
Barry Kosloff: London School of Hygiene and Tropical Medicine
Mohammed Limbada: London School of Hygiene and Tropical Medicine
Andrew Mujugira: Infectious Diseases Institute, Makerere University
Nelly R. Mugo: Department of Global Health, University of Washington
Astrid Gall: European Molecular Biology Laboratory, European Bioinformatics Institute
François Blanquart: Centre for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, PSL Research University
Margreet Bakker: Medical Microbiology and Infection Prevention, Amsterdam UMC, Location AMC
Daniela Bezemer: Stichting HIV Monitoring, Amsterdam UMC, Location AMC
Swee Hoe Ong: Wellcome Sanger Institute
Jan Albert: Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet
Norbert Bannert: Division for HIV and Other Retroviruses, Robert Koch Institute
Jacques Fellay: School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne
Barbara Gunsenheimer-Bartmeyer: Department of Infectious Disease Epidemiology, Robert Koch-Institute
Huldrych F. Günthard: Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
Pia Kivelä: Department of Infectious Diseases, Helsinki University Hospital
Roger D. Kouyos: Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich
Laurence Meyer: INSERM CESP U1018, APHP, Service de Santé Publique, Hôpital de Bicêtre, Université Paris Saclay
Kholoud Porter: Institute for Global Health, University College London
Ard van Sighem: Stichting HIV Monitoring, Amsterdam UMC, Location AMC
Mark van der Valk: Amsterdam UMC Location Meibergdreef
Ben Berkhout: Medical Microbiology and Infection Prevention, Amsterdam UMC, Location AMC
Paul Kellam: Department of Infectious Diseases, Department of Medicine, Imperial College London
Marion Cornelissen: Medical Microbiology and Infection Prevention, Amsterdam UMC, Location AMC
Peter Reiss: Stichting HIV Monitoring, Amsterdam UMC, Location AMC
Helen Ayles: London School of Hygiene and Tropical Medicine
David N. Burns: Division of AIDS, National Institute of Allergy and Infectious Diseases, National Institutes of Health
Sarah Fidler: Department of Infectious Disease Epidemiology, Imperial College
Mary Kate Grabowski: Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health
Richard Hayes: London School of Hygiene and Tropical Medicine
Joshua T. Herbeck: Institute for Disease Modeling
Joseph Kagaayi: Rakai Health Sciences Program
Pontiano Kaleebu: Medical Research Council (MRC), Uganda Virus Research Institute (UVRI)
Jairam R. Lingappa: Department of Global Health, University of Washington
Deogratius Ssemwanga: Medical Research Council (MRC), Uganda Virus Research Institute (UVRI)
Susan H. Eshleman: Department of Pathology, Johns Hopkins University School of Medicine
Myron S. Cohen: Department of Medicine, University of North Carolina at Chapel Hill
Oliver Ratmann: Department of Mathematics and Imperial-X, Imperial College
Oliver Laeyendecker: Division of Intramural Research, National Institute of Allergy and Infectious Disease, National Institutes of Medicine
Christophe Fraser: Pandemic Sciences Institute and Big Data Institute, Nuffield Department of Medicine, University of Oxford
the HPTN 071 (PopART) Phylogenetics protocol team, the BEEHIVE consortium and the PANGEA consortium
title HIV-phyloTSI: subtype-independent estimation of time since HIV-1 infection for cross-sectional measures of population incidence using deep sequence data
topic HIV
Next-generation sequencing
Random forest
Recency of infection
Time since infection
url https://doi.org/10.1186/s12859-025-06189-y