Intelligent and Adaptive Web Data Extraction System Using Convolutional and Long Short-Term Memory Deep Learning Networks
Main Authors: | Sudhir Kumar Patnaik, C. Narendra Babu, Mukul Bhave |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2021-12-01 |
Series: | Big Data Mining and Analytics |
Subjects: | |
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2021.9020012 |
author | Sudhir Kumar Patnaik, C. Narendra Babu, Mukul Bhave |
collection | DOAJ |
description | In today’s world of demanding, hyper-personalized consumer experiences, data collected with advanced web scraping technologies are crucial to the growth of e-commerce. However, core data extraction engines fail because they cannot adapt to dynamic changes in website content. This study investigates an intelligent and adaptive web data extraction system built on convolutional and Long Short-Term Memory (LSTM) networks: the You Only Look Once (YOLO) algorithm automatically detects product details as images on web pages, and Tesseract LSTM then extracts the product details from those images. Because this state-of-the-art system does not need a core data extraction engine, it can adapt to dynamic changes in website layout. Experiments on real-world retail cases yield an image detection precision of 97% and a character extraction precision of 99%. In addition, a mean average precision (mAP) of 74% is obtained with an input dataset of 45 objects or images. |
format | Article |
id | doaj-art-a826c864be2d4aa081bc5069b7b2b972 |
institution | Kabale University |
issn | 2096-0654 |
language | English |
publishDate | 2021-12-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | doaj-art-a826c864be2d4aa081bc5069b7b2b972 (2025-02-02T06:14:04Z); eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2021-12-01; vol. 4, no. 4, pp. 279-297; DOI 10.26599/BDMA.2021.9020012. Affiliations: Sudhir Kumar Patnaik and C. Narendra Babu, Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bangalore 560054, India; Mukul Bhave, Gibraltar India Solutions LLP, Bangalore 560103, India. URL: https://www.sciopen.com/article/10.26599/BDMA.2021.9020012. Keywords: adaptive web scraping; deep learning; long short-term memory (lstm); web data extraction; you only look once (yolo). |
title | Intelligent and Adaptive Web Data Extraction System Using Convolutional and Long Short-Term Memory Deep Learning Networks |
topic | adaptive web scraping; deep learning; long short-term memory (lstm); web data extraction; you only look once (yolo) |
url | https://www.sciopen.com/article/10.26599/BDMA.2021.9020012 |
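The record reports a detection precision of 97% and a mean average precision of 74% but does not state how these figures were computed. The following is a minimal Python sketch of one common convention, not the authors' evaluation code. It assumes axis-aligned boxes, greedy matching of predictions (highest confidence first) to still-unmatched ground-truth boxes at an IoU threshold of 0.5, and all-point interpolated average precision averaged over classes. The function names, the threshold, and the toy `price` and `title` classes are hypothetical illustrations.

```python
# Sketch only: IoU-matched detection precision and mAP (hypothetical helpers,
# one common evaluation convention; not the paper's own evaluation code).
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds: list[tuple[float, Box]], gts: list[Box],
                     thr: float = 0.5) -> list[bool]:
    """Flag each prediction as TP (True) or FP (False), in descending score order.

    Each prediction is matched to the unmatched ground-truth box it overlaps
    most; it counts as a TP only if that IoU is at least `thr` (assumed 0.5)."""
    used = [False] * len(gts)
    flags: list[bool] = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not used[j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        hit = best_j >= 0 and best_iou >= thr
        if hit:
            used[best_j] = True  # each ground-truth box can be matched once
        flags.append(hit)
    return flags


def precision(flags: list[bool]) -> float:
    """TP / (TP + FP) over all predictions."""
    return sum(flags) / len(flags) if flags else 0.0


def average_precision(flags: list[bool], n_gt: int) -> float:
    """All-point interpolated area under the precision-recall curve."""
    if n_gt == 0:
        return 0.0
    tp = fp = 0
    recalls: list[float] = []
    precisions: list[float] = []
    for hit in flags:
        tp, fp = tp + hit, fp + (not hit)
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p  # only TP steps increase recall
        prev_recall = r
    return ap


def mean_average_precision(
        per_class: dict[str, tuple[list[tuple[float, Box]], list[Box]]],
        thr: float = 0.5) -> float:
    """Mean of per-class AP, skipping classes with no ground-truth boxes."""
    aps = [average_precision(match_detections(preds, gts, thr), len(gts))
           for preds, gts in per_class.values() if gts]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data (hypothetical): (confidence, box) predictions and ground-truth boxes.
EXAMPLE = {
    "price": ([(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 0, 30, 10))],
              [(0, 0, 10, 10), (20, 0, 30, 10)]),
    "title": ([(0.6, (0, 20, 40, 30))], [(0, 20, 40, 30)]),
}

if __name__ == "__main__":
    flags = match_detections(*EXAMPLE["price"])
    print(flags, round(precision(flags), 4))          # [True, False, True] 0.6667
    print(round(average_precision(flags, 2), 4))      # 0.8333
    print(round(mean_average_precision(EXAMPLE), 4))  # 0.9167
```

With these toy inputs, the `price` class has one false positive ranked between two true positives, so its precision is 2/3 and its AP is 5/6, while the single correct `title` detection gives an AP of 1; the mean over the two classes is 11/12. The IoU threshold and matching rule change the resulting numbers, so values computed this way are only comparable to the reported 97% and 74% if the paper used the same convention.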