LLM-Guided Crowdsourced Test Report Clustering
This paper proposes a clustering method for crowdsourced test reports based on a large language model to solve the limitations of existing methods in processing repeated reports and utilizing multi-modal information. Existing crowdsourced test report clustering methods have significant shortcomings...
Main Authors: | Ying Li, Ye Zhong, Lijuan Yang, Yanbo Wang, Penghua Zhu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Large language model; crowdsourced testing; test report clustering |
Online Access: | https://ieeexplore.ieee.org/document/10844085/ |
_version_ | 1823857167562702848 |
---|---|
author | Ying Li, Ye Zhong, Lijuan Yang, Yanbo Wang, Penghua Zhu |
author_sort | Ying Li |
collection | DOAJ |
description | This paper proposes a clustering method for crowdsourced test reports based on a large language model to solve the limitations of existing methods in processing repeated reports and utilizing multi-modal information. Existing crowdsourced test report clustering methods have significant shortcomings: they handle duplicate reports poorly, ignore the semantic information in screenshots, and underuse the relationship between text and images. The emergence of large language models (LLMs) provides a new way to address these problems. By drawing on the semantic understanding ability of an LLM, key information can be extracted from a test report more accurately, and the semantic relationship between screenshots and text descriptions can be used to guide the clustering process, improving clustering accuracy and effectiveness. The method uses a pre-trained LLM (such as GPT-4) to encode the text of each test report and a visual model such as CLIP to encode the application screenshots, converting text descriptions and images into high-dimensional semantic vectors. Cosine similarity is then computed between the vectors, and semantic binding rules are constructed to guide the clustering process, so that semantically related reports are assigned to the same cluster and semantically different reports to different clusters. In experiments, the method significantly outperforms traditional methods on several evaluation metrics, demonstrating its potential to improve the efficiency and quality of crowdsourced test report processing. The authors expect the method to be widely used in software testing and maintenance. |
format | Article |
id | doaj-art-e529e1d6bef44620a1a7034523ae2796 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-e529e1d6bef44620a1a7034523ae2796; 2025-02-12T00:02:26Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 24894; 24904; 10.1109/ACCESS.2025.3530960; 10844085; LLM-Guided Crowdsourced Test Report Clustering; Ying Li (https://orcid.org/0009-0000-4846-7054), Ye Zhong (https://orcid.org/0009-0000-4727-1551), Lijuan Yang, Yanbo Wang, Penghua Zhu; North China Institute of Aerospace Engineering, Langfang, China; Dalian University of Technology, Dalian, China; North China Institute of Aerospace Engineering, Langfang, China; Beijing Aerospace Automatic Control Institute, Beijing, China; North China Institute of Aerospace Engineering, Langfang, China; https://ieeexplore.ieee.org/document/10844085/; Large language model; crowdsourced testing; test report clustering |
title | LLM-Guided Crowdsourced Test Report Clustering |
topic | Large language model; crowdsourced testing; test report clustering |
url | https://ieeexplore.ieee.org/document/10844085/ |
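
The pipeline summarized in the description field (embed report text and screenshots, compare them by cosine similarity, then cluster under semantic rules) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy vectors stand in for LLM and CLIP embeddings, and the names `cluster_reports`, `alpha`, `merge`, and `cannot`, together with the specific cannot-link rule, are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_reports(text_vecs, image_vecs, alpha=0.5, merge=0.8, cannot=0.3):
    """Greedy constrained clustering; returns one cluster label per report.

    Hypothetical rules (assumptions, not taken from the paper):
      - pair similarity = alpha * text cosine + (1 - alpha) * image cosine
      - pairs with similarity below `cannot` may never share a cluster
      - remaining pairs are merged, most similar first, while similarity >= `merge`
    """
    n = len(text_vecs)
    sim = {
        (i, j): alpha * cosine(text_vecs[i], text_vecs[j])
        + (1 - alpha) * cosine(image_vecs[i], image_vecs[j])
        for i, j in combinations(range(n), 2)
    }
    cannot_link = {pair for pair, s in sim.items() if s < cannot}

    clusters = {i: {i} for i in range(n)}  # cluster id -> member report indices
    owner = list(range(n))                 # report index -> cluster id

    for (i, j), s in sorted(sim.items(), key=lambda kv: -kv[1]):
        if s < merge:
            break  # pairs are sorted, so no later pair can qualify
        a, b = owner[i], owner[j]
        if a == b:
            continue
        # Skip the merge if it would put a cannot-link pair in one cluster.
        if any((min(x, y), max(x, y)) in cannot_link
               for x in clusters[a] for y in clusters[b]):
            continue
        for y in clusters[b]:
            owner[y] = a
        clusters[a] |= clusters.pop(b)

    # Relabel clusters as 0, 1, 2, ... in order of first appearance.
    seen = {}
    return [seen.setdefault(cid, len(seen)) for cid in owner]


# Toy embeddings: reports 0 and 1 describe one issue, reports 2 and 3 another.
text = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
image = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(cluster_reports(text, image))  # -> [0, 0, 1, 1]
```

The cannot-link check is what keeps a chain of pairwise merges from pulling clearly dissimilar reports into one cluster; real embeddings would need thresholds tuned on labeled data.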