Unified Visual-Aware Representations for Data Analytics

Bibliographic Details
Main Authors: Ladislav Peska, Ivana Sixtova, David Hoksza, David Bernhauer, Jakub Lokoc, Tomas Skopal
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10854212/
Description
Summary: One of the characteristics of big data is its internal complexity and variety, manifested in the many types of datasets that must be managed, searched, or analyzed. In their natural forms, some data entities are unstructured, such as texts or multimedia objects, while others are structured but too complex (e.g., high-dimensional tabular data). Because data take many different forms across domain-specific problems, many different data representations are in use, each tailored to a specific data form, domain, and task. In this paper, we propose a framework for universal visual representations of complex data. The desired property of the visualizations is the ability to visually encode the semantic features of the original data. Hence, processing the visualizations (images) with generic deep learning models yields deep feature vectors that can be used uniformly in standard data retrieval and analytics tasks. Specifically, we develop a semi-automated transfer learning pipeline that transforms arbitrary tabular input data into visual representations. The visual representations support data analytics tasks performed by human users and also serve as universal data representations in machine learning models for automated tasks. In a large study, we show that visual representations of complex data are effective in a number of domains, and we propose a recommender to help parameterize the entire pipeline for certain domains and use cases. In summary, the proposed framework enables rapid prototyping of data representations in an arbitrary domain using a shared concept: visual representations applicable in data analytics using generic deep learning models.
ISSN: 2169-3536
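
Illustrative sketch (not part of the record): the summary above describes turning tabular data into images whose features can then be used for retrieval. The following minimal Python sketch shows one possible reading of that idea and is not the authors' pipeline. The pixel-grid encoding, the function names (`rows_to_images`, `extract_features`, `nearest`), and the flatten-and-normalize feature step are all assumptions made for illustration; in particular, `extract_features` is only a placeholder for the generic deep learning model the summary mentions.

```python
import math
from typing import List, Optional

Image = List[List[float]]


def rows_to_images(table: List[List[float]], side: Optional[int] = None) -> List[Image]:
    """Render each row of a numeric table as a square grayscale image.

    Hypothetical encoding: min-max scale every column to [0, 1], pad the row
    with zeros up to side * side values, and cut it into `side` pixel lines.
    """
    n_cols = len(table[0])
    if side is None:
        side = math.ceil(math.sqrt(n_cols))
    if side * side < n_cols:
        raise ValueError("side is too small to hold every column")

    columns = list(zip(*table))
    lows = [min(col) for col in columns]
    # A constant column has zero range; use 1.0 so its pixels become 0.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    images = []
    for row in table:
        pixels = [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        pixels += [0.0] * (side * side - n_cols)
        images.append([pixels[i * side:(i + 1) * side] for i in range(side)])
    return images


def extract_features(image: Image) -> List[float]:
    """Placeholder for a generic deep model: flatten and L2-normalize."""
    flat = [p for line in image for p in line]
    norm = math.sqrt(sum(p * p for p in flat)) or 1.0
    return [p / norm for p in flat]


def nearest(features: List[List[float]], query_index: int) -> int:
    """Return the index of the row most cosine-similar to the query row.

    Returns -1 when there is no other row to compare against.
    """
    query = features[query_index]
    best, best_sim = -1, -math.inf
    for i, candidate in enumerate(features):
        if i == query_index:
            continue
        sim = sum(a * b for a, b in zip(query, candidate))
        if sim > best_sim:
            best, best_sim = i, sim
    return best
```

In a fuller version, the placeholder would be replaced by a pretrained image model, and the encoding would be chosen per domain; the summary indicates that the paper's recommender addresses that parameterization.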