Chinese Legal Case Similarity Matching Based on Text Importance Extraction

Bibliographic Details
Main Authors: Aman Fan, Shaoxi Wang, Yanchuan Wang
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11062845/
Description
Summary: Similar case matching can effectively enhance the efficiency of case adjudication and promote judicial fairness. Recent advances in natural language processing (NLP), particularly those based on deep learning, have significantly advanced the intelligent handling of similar case judgments. The BERT model can efficiently extract features from legal texts through its self-attention mechanism, facilitating subsequent matching tasks. However, traditional BERT models are constrained by their input text length. To achieve better comparison results for long case descriptions, an iterative unsupervised clustering method is employed during contrastive learning to evaluate the importance of legal case texts, so that the extracted texts align more closely with cluster centers in the feature space and thus become more representative. Extracting key information and retaining only the important legal text reduces the input length for the BERT model, thereby improving its performance. The selected case statement text is then fed into a BERT-based model for similar case matching. Compared with inputting the original case statement text, the approach proposed in this paper retains critical case information more effectively. Our method achieves an accuracy of 75.08% on the test set, outperforming all existing methods on the public CAIL2019-SCM dataset.
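The importance-extraction step described in the abstract (cluster sentence embeddings, then keep the sentences nearest the cluster centers so the shortened text fits BERT's input limit) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' actual implementation: the function name, the deterministic farthest-point initialization, and the toy 2-D "embeddings" are all assumptions.

```python
import numpy as np

def select_representative_sentences(embeddings, k=2, n_iter=10, per_cluster=1):
    """Cluster sentence embeddings with a tiny k-means and keep the
    sentence(s) closest to each cluster center (illustrative sketch only)."""
    X = np.asarray(embeddings, dtype=float)

    # Deterministic farthest-point initialization (an assumption; the
    # paper's initialization scheme is not described in the abstract).
    centers = [X[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(dists.argmax())])
    centers = np.stack(centers)

    # Standard Lloyd iterations: assign each sentence to its nearest
    # center, then recompute each center as the mean of its members.
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)

    # Keep the sentence(s) nearest each final center: these are the
    # "representative" texts retained to shorten the BERT input.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    keep = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if members.size:
            keep.extend(members[np.argsort(d[members, j])][:per_cluster])
    return sorted(int(i) for i in keep)

# Toy 2-D "embeddings": two obvious groups of sentences. In the paper's
# setting these vectors would come from a sentence encoder instead.
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(select_representative_sentences(emb, k=2))  # one index per cluster
```

In the actual pipeline, the retained sentence indices would be used to rebuild a shortened case statement that is then tokenized and passed to the BERT-based matching model.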
ISSN:2169-3536