Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering

Bibliographic Details
Main Authors:	Samin Poudel, Marwan Bikdash
Format:	Article
Language:	English
Published:	Tsinghua University Press 2023-03-01
Series:	Big Data Mining and Analytics
Subjects:	collaborative filtering; subsampling; accuracy loss models; performance loss; recommendation system; simulation; rating matrix; root mean square error
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2022.9020024

collection	DOAJ
description	We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters like the fraction of users dropped (FUD) and the fraction of items dropped (FID). We seek to investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when using the SVD collaborative filtering (CF) algorithm. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical symmetrical SSAL models in terms of FID and FUD whose coefficients depend only on the data characteristics. The SSAL models came out to be multi-linear in terms of odds ratios of dropping a user (or an item) vs. not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds where their interaction term has a zero coefficient. Most importantly, the models are constant in the sense that they are written in closed-form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460 000 subsampled rating matrices were then simulated and subjected to the singular value decomposition (SVD) CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all 3 datasets.
id	doaj-art-33ce6682ee824d5fb3454634dd238d60
institution	Kabale University
issn	2096-0654
spelling	doaj-art-33ce6682ee824d5fb3454634dd238d60 | 2025-02-02T07:53:41Z | eng | Tsinghua University Press | Big Data Mining and Analytics | ISSN 2096-0654 | 2023-03-01 | vol. 6, no. 1, pp. 72-84 | DOI 10.26599/BDMA.2022.9020024 | Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering | Samin Poudel; Marwan Bikdash | Department of Computational Data Science and Engineering, North Carolina A & T State University, Greensboro, NC 27401, USA (both authors) | https://www.sciopen.com/article/10.26599/BDMA.2022.9020024 | collaborative filtering; subsampling; accuracy loss models; performance loss; recommendation system; simulation; rating matrix; root mean square error


Similar Items