LLM-Guided Crowdsourced Test Report Clustering

This paper proposes a large language model (LLM)-based clustering method for crowdsourced test reports that addresses the limitations of existing methods in handling duplicate reports and exploiting multi-modal information. Existing crowdsourced test report clustering methods have significant shortcomings...

Bibliographic Details
Main Authors:	Ying Li, Ye Zhong, Lijuan Yang, Yanbo Wang, Penghua Zhu
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Large language model; crowdsourced testing; test report clustering
Online Access:	https://ieeexplore.ieee.org/document/10844085/

_version_	1823857167562702848
author	Ying Li; Ye Zhong; Lijuan Yang; Yanbo Wang; Penghua Zhu
author_facet	Ying Li; Ye Zhong; Lijuan Yang; Yanbo Wang; Penghua Zhu
author_sort	Ying Li
collection	DOAJ
description	This paper proposes a large language model (LLM)-based clustering method for crowdsourced test reports that addresses the limitations of existing methods in handling duplicate reports and exploiting multi-modal information. Existing crowdsourced test report clustering methods handle duplicate reports poorly, ignore the semantic information in screenshots, and underuse the relationship between text and images. LLMs offer a new way to address these problems. The semantic understanding ability of an LLM allows key information to be extracted from test reports more accurately, and the semantic relationship between screenshots and text descriptions can guide the clustering process, improving clustering accuracy and effectiveness. The method uses a pre-trained LLM (such as GPT-4) to encode the text of each test report and a visual model such as CLIP to encode the application screenshots, converting text descriptions and images into high-dimensional semantic vectors. Cosine similarity between the vectors is then computed, and semantic binding rules are constructed to guide the clustering process so that semantically related reports are assigned to the same cluster and semantically different reports to different clusters. In experiments, the method significantly outperforms traditional methods on several evaluation metrics, indicating its potential to improve the efficiency and quality of crowdsourced test report processing. The method is expected to see wide use in software testing and maintenance.
format	Article
id	doaj-art-e529e1d6bef44620a1a7034523ae2796
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-e529e1d6bef44620a1a7034523ae2796; 2025-02-12T00:02:26Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 24894; 24904; 10.1109/ACCESS.2025.3530960; 10844085; LLM-Guided Crowdsourced Test Report Clustering; Ying Li (0; https://orcid.org/0009-0000-4846-7054); Ye Zhong (1; https://orcid.org/0009-0000-4727-1551); Lijuan Yang (2); Yanbo Wang (3); Penghua Zhu (4); North China Institute of Aerospace Engineering, Langfang, China; Dalian University of Technology, Dalian, China; North China Institute of Aerospace Engineering, Langfang, China; Beijing Aerospace Automatic Control Institute, Beijing, China; North China Institute of Aerospace Engineering, Langfang, China; https://ieeexplore.ieee.org/document/10844085/; Large language model; crowdsourced testing; test report clustering
title	LLM-Guided Crowdsourced Test Report Clustering
topic	Large language model; crowdsourced testing; test report clustering
url	https://ieeexplore.ieee.org/document/10844085/

LLM-Guided Crowdsourced Test Report Clustering
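
The pipeline outlined in the record's description (embed text and screenshots, compare with cosine similarity, let pairwise rules guide clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy vectors stand in for LLM and CLIP embeddings, which are not computed here. The equal text/image weighting, the two thresholds, the reading of "semantic binding rules" as must-link and cannot-link pairs, and the greedy merge procedure are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def report_similarity(r1, r2, text_weight=0.5):
    """Weighted mix of text and screenshot similarity (the weighting is an assumption)."""
    text_sim = cosine(r1["text"], r2["text"])
    image_sim = cosine(r1["image"], r2["image"])
    return text_weight * text_sim + (1 - text_weight) * image_sim


def cluster_reports(reports, must_link=0.8, cannot_link=0.3):
    """Greedy constrained clustering; both thresholds are hypothetical values.

    Pairs at or above `must_link` are merged, most similar first, unless the
    merge would put a pair at or below `cannot_link` into the same cluster.
    """
    n = len(reports)
    sim = {
        (i, j): report_similarity(reports[i], reports[j])
        for i, j in combinations(range(n), 2)
    }
    cannot = {pair for pair, s in sim.items() if s <= cannot_link}

    parent = list(range(n))            # union-find forest
    members = {i: {i} for i in range(n)}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), s in sorted(sim.items(), key=lambda kv: -kv[1]):
        if s < must_link:
            break                      # remaining pairs are too dissimilar
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        violates = any(
            (min(a, b), max(a, b)) in cannot
            for a in members[ri]
            for b in members[rj]
        )
        if violates:
            continue                   # respect the cannot-link rule
        parent[rj] = ri
        members[ri] |= members.pop(rj)

    return [find(i) for i in range(n)]


# Toy reports: 2-d placeholder vectors instead of real LLM/CLIP embeddings.
reports = [
    {"text": [1.0, 0.0], "image": [1.0, 0.1]},
    {"text": [0.9, 0.1], "image": [1.0, 0.0]},
    {"text": [0.0, 1.0], "image": [0.1, 1.0]},
]
labels = cluster_reports(reports)
```

In this toy input the first two reports point in nearly the same direction in both modalities, so they share a cluster label, while the third is kept apart. With real embeddings the thresholds would need tuning against labeled duplicate reports.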

Similar Items