Real Estate Attribute Value Extraction Using Large Language Models
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10976655/ |
| Summary: | Attribute value extraction (AVE) is critical for transforming unstructured text into structured data for various applications. While existing AVE datasets predominantly focus on e-commerce and English-language data, publicly available datasets tailored to other domains are lacking. This paper introduces the Real Estate Attribute Value Extraction (RAVE) dataset, specifically designed for extracting structured attributes from unstructured real estate advertisements. The RAVE dataset consists of manually annotated Slovak real estate listings, which have been translated into English for broader applicability. The paper evaluates the performance of multiple publicly available large language models on the AVE task on RAVE. Through extensive experimentation, we analyse the impact of additional attribute descriptions, selecting relevant sentences, and using ground-truth-based attribute definition in structured output generation. The findings indicate that providing a schema with only the relevant attributes (Oracle Attributes) significantly improves the F1 score while reducing computational overhead. Under basic conditions, without any input modifications, the largest model tested, Qwen2.5:32b, achieved a micro F1 score of 10.04%. Applying all tested input modifications (Oracle Attributes, Oracle Sentences, and Additional Descriptions) allowed the same model to achieve a micro F1 score of 97.92%, demonstrating the effectiveness of these techniques in improving extraction accuracy and efficiency. The RAVE dataset is publicly available, facilitating further research in AVE and real estate information extraction. |
|---|---|
| ISSN: | 2169-3536 |
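
The summary reports results as micro F1 scores. As a rough illustration of what that metric measures for attribute value extraction, the sketch below pools true positives, false positives, and false negatives across all listings before computing precision and recall. It assumes exact-match scoring on (attribute, value) pairs; this record does not state the paper's matching criteria, and the function name `micro_f1` and the attribute names in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

# One listing's extraction result: attribute name -> extracted value.
Record = Dict[str, str]


def micro_f1(gold: List[Record], pred: List[Record]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), micro-averaged over all listings.

    Assumption: a predicted pair counts as correct only if both the
    attribute name and the value match the annotation exactly.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of listings")

    tp = fp = fn = 0
    for gold_rec, pred_rec in zip(gold, pred):
        gold_pairs = set(gold_rec.items())
        pred_pairs = set(pred_rec.items())
        tp += len(gold_pairs & pred_pairs)  # correctly extracted pairs
        fp += len(pred_pairs - gold_pairs)  # extracted but not annotated
        fn += len(gold_pairs - pred_pairs)  # annotated but not extracted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, with gold annotations `[{"rooms": "3", "area_m2": "72"}, {"rooms": "2"}]` and predictions `[{"rooms": "3", "area_m2": "70"}, {"rooms": "2", "floor": "4"}]`, the pooled counts are 2 true positives, 2 false positives, and 1 false negative. Because counts are pooled before averaging, frequent attributes weigh more heavily than rare ones, which is the usual distinction between micro and macro averaging.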