Real Estate Attribute Value Extraction Using Large Language Models

Attribute value extraction (AVE) is critical in transforming unstructured text into structured data for various applications. While existing datasets for AVE predominantly focus on e-commerce and English language data, there is a lack of publicly available datasets tailored to other domains. This pa...

Bibliographic Details
Main Authors: Michal Kvet, Miroslav Potocar, Slavomir Tatarka
Format: Article
Language: English
Published: IEEE 2025-01-01
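Series: IEEE Access
Subjects: Additional attribute description, attribute value extraction, instructor AI, JSON, large language model, Pydantic
Online Access: https://ieeexplore.ieee.org/document/10976655/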
author Michal Kvet
Miroslav Potocar
Slavomir Tatarka
collection DOAJ
description Attribute value extraction (AVE) is critical in transforming unstructured text into structured data for various applications. While existing datasets for AVE predominantly focus on e-commerce and English language data, there is a lack of publicly available datasets tailored to other domains. This paper introduces the Real Estate Attribute Value Extraction (RAVE) dataset, specifically designed for extracting structured attributes from unstructured real estate advertisements. The RAVE dataset consists of manually annotated Slovak real estate listings, which have been translated into English for broader applicability. The paper evaluates the performance of multiple publicly available large language models on the AVE task using RAVE. Through extensive experimentation, we analyse the impact of three input modifications: adding attribute descriptions (Additional Descriptions), selecting only the relevant sentences (Oracle Sentences), and defining the structured-output schema from the ground-truth attributes (Oracle Attributes). The findings indicate that providing a schema with only relevant attributes (Oracle Attributes) significantly improves the F1 score and reduces computational overhead. Without any input modifications, the largest model tested, Qwen2.5:32b, achieved a micro F1 score of 10.04%. With all three modifications applied (Oracle Attributes, Oracle Sentences, and Additional Descriptions), the largest model tested achieved a micro F1 score of 97.92%, demonstrating the effectiveness of these techniques in improving extraction accuracy and efficiency. The RAVE dataset is publicly available, facilitating further research in AVE and real estate information extraction.
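The micro F1 scores quoted in the description pool true positives, false positives, and false negatives over all listings before computing precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes exact string matching on (attribute, value) pairs and counts a wrongly extracted value as both a false positive and a false negative; the function name `micro_f1` and the sample attribute names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # attribute name -> extracted value


def micro_f1(gold: List[Record], pred: List[Record]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all listings.

    Assumption: a prediction is correct only if the attribute is present
    in the gold record with exactly the same value.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        # Every predicted pair is either a true or a false positive.
        for attr, value in p.items():
            if g.get(attr) == value:
                tp += 1
            else:
                fp += 1
        # Every gold pair that was missed or extracted wrongly is a false negative.
        for attr, value in g.items():
            if p.get(attr) != value:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical annotations for two listings.
gold = [{"rooms": "3", "area": "75 m2"}, {"price": "120000"}]
pred = [{"rooms": "3", "area": "80 m2"}, {"price": "120000", "floor": "2"}]
precision, recall, f1 = micro_f1(gold, pred)
```

Because the counts are pooled, frequent attributes weigh more heavily in the score than rare ones, which is the usual distinction between micro and macro averaging.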
format Article
id doaj-art-a386687d2d45489c8fb75e474b65af18
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-a386687d2d45489c8fb75e474b65af18
timestamp 2025-08-20T03:48:42Z
language eng
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2025-01-01
volume 13
pages 73076-73095
doi 10.1109/ACCESS.2025.3564511
IEEE Xplore document 10976655
title Real Estate Attribute Value Extraction Using Large Language Models
author Michal Kvet (https://orcid.org/0000-0003-3937-7473), Department of Informatics, University of Žilina, Žilina, Slovakia
author Miroslav Potocar (https://orcid.org/0009-0005-9401-4183), Department of Informatics, University of Žilina, Žilina, Slovakia
author Slavomir Tatarka (https://orcid.org/0009-0005-9455-2264), Department of Information Networks, University of Žilina, Žilina, Slovakia
url https://ieeexplore.ieee.org/document/10976655/
keywords Additional attribute description; attribute value extraction; instructor AI; JSON; large language model; Pydantic
title Real Estate Attribute Value Extraction Using Large Language Models
topic Additional attribute description
attribute value extraction
instructor AI
JSON
large language model
Pydantic
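The topics above relate to schema-constrained JSON output, and the abstract reports that restricting the schema to the attributes actually present in a listing (Oracle Attributes) and adding attribute descriptions both help. The following is a minimal sketch of how such a restricted schema could be assembled. It uses plain JSON Schema dictionaries from the standard library rather than Pydantic or instructor, and the attribute catalogue, its names, and the `build_schema` helper are all hypothetical, not taken from the RAVE dataset.

```python
import json
from typing import Dict, Iterable, Tuple

# Hypothetical attribute catalogue: name -> (JSON type, description).
FULL_CATALOGUE: Dict[str, Tuple[str, str]] = {
    "rooms": ("integer", "Number of rooms in the property"),
    "floor_area_m2": ("number", "Usable floor area in square metres"),
    "price_eur": ("number", "Asking price in euros"),
    "has_balcony": ("boolean", "Whether the listing mentions a balcony"),
}


def build_schema(attributes: Iterable[str], with_descriptions: bool = True) -> dict:
    """Build a JSON Schema object limited to the given attribute names."""
    properties = {}
    for name in attributes:
        json_type, description = FULL_CATALOGUE[name]
        # Allow null so the model can signal that a value is absent.
        prop = {"type": [json_type, "null"]}
        if with_descriptions:
            prop["description"] = description
        properties[name] = prop
    return {
        "type": "object",
        "properties": properties,
        "additionalProperties": False,
    }


# Baseline: every attribute in the catalogue is offered to the model.
full_schema = build_schema(FULL_CATALOGUE)
# Oracle-style setting: only attributes known to occur in this listing.
oracle_schema = build_schema(["rooms", "price_eur"])

print(json.dumps(oracle_schema, indent=2))
```

The restricted schema serialises to fewer characters than the full one, which is one plausible reason a smaller schema would lower the prompt cost per listing.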
url https://ieeexplore.ieee.org/document/10976655/