Real Estate Attribute Value Extraction Using Large Language Models

Bibliographic Details
Main Authors: Michal Kvet, Miroslav Potocar, Slavomir Tatarka
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10976655/
Description
Summary: Attribute value extraction (AVE) is critical in transforming unstructured text into structured data for various applications. Existing AVE datasets focus predominantly on e-commerce and English-language data, and publicly available datasets tailored to other domains are lacking. This paper introduces the Real Estate Attribute Value Extraction (RAVE) dataset, specifically designed for extracting structured attributes from unstructured real estate advertisements. The RAVE dataset consists of manually annotated Slovak real estate listings, which have been translated into English for broader applicability. The paper evaluates the performance of multiple publicly available large language models on the AVE task on RAVE. Through extensive experimentation, we analyse the impact of three input modifications: adding attribute descriptions, selecting relevant sentences, and using ground-truth-based attribute definitions in structured output generation. The findings indicate that providing a schema with only the relevant attributes (Oracle Attributes) significantly improves the F1 score while reducing computational overhead. In the baseline setting, with no input modifications, the largest model tested, Qwen2.5:32b, achieved a micro F1 score of 10.04%. With all tested input modifications applied (Oracle Attributes, Oracle Sentences, and Additional Descriptions), the same model achieved a micro F1 score of 97.92%, demonstrating the effectiveness of these techniques in improving extraction accuracy and efficiency. The RAVE dataset is publicly available, facilitating further research in AVE and real estate information extraction.
ISSN: 2169-3536
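
The summary reports results as micro F1 scores. As a rough illustration of what that metric measures in an attribute value extraction setting, the sketch below pools true positives, false positives, and false negatives over all listings before computing precision, recall, and F1. It is a minimal sketch under stated assumptions, not the paper's evaluation code: the exact-match rule, the convention that a wrong value counts as both a false positive and a false negative, the function name `micro_f1`, and the attribute names are all hypothetical, and the authors' scoring may differ.

```python
from typing import Dict, List, Tuple

# One listing's annotations: attribute name -> value (hypothetical format).
Record = Dict[str, str]


def micro_f1(gold: List[Record], pred: List[Record]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all listings.

    Assumption: a prediction is correct only if both the attribute name
    and the value match the gold annotation exactly.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        # Every predicted pair is either a true or a false positive.
        for attr, value in p.items():
            if g.get(attr) == value:
                tp += 1
            else:
                fp += 1
        # Every gold pair that was missed or given a wrong value is a false negative.
        for attr, value in g.items():
            if p.get(attr) != value:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data with made-up attribute names, for illustration only.
gold = [{"price": "120000", "rooms": "3"}, {"area": "54"}]
pred = [{"price": "120000", "rooms": "2"}, {"area": "54", "floor": "2"}]
precision, recall, f1 = micro_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Because the counts are pooled before the ratios are taken, listings with many attributes weigh more heavily than short ones, which is what distinguishes micro from macro averaging.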