Docker Unified UIMA Interface: New perspectives for NLP on big data

Bibliographic Details
Main Authors: Giuseppe Abrami, Markos Genios, Filip Fitzermann, Daniel Baumartz, Alexander Mehler
Format: Article
Language: English
Published: Elsevier 2025-02-01
Series: SoftwareX
Subjects: Docker, Kubernetes, UIMA, Distributed NLP
Online Access: http://www.sciencedirect.com/science/article/pii/S2352711024004047
Collection: DOAJ
Description: Processing large amounts of natural language text with machine learning-based models is becoming important in many disciplines. This demand is being met by a variety of approaches, resulting in the heterogeneous deployment of separate, partly incompatible applications that do not scale natively. To overcome this technological bottleneck, we have developed the Docker Unified UIMA Interface (DUUI), a standardized, parallel, platform-independent, distributed, microservice-based solution for processing large text corpora with any NLP method. We present DUUI as a framework that automates the orchestration of GPU-based NLP processes beyond the existing Docker Swarm cluster variant and adapts to new runtime environments such as Kubernetes. To this end, we introduce a new DUUI driver that enables lightweight, scalable orchestration of DUUI processes within a Kubernetes environment. In this way, the paper opens up novel text-technological perspectives for existing practices in disciplines that rely on NLP for the scientific analysis of large amounts of data.
ISSN: 2352-7110
Volume: 29, article 102033
DOI: 10.1016/j.softx.2024.102033
Affiliation: Goethe University Frankfurt, Texttechnology, Germany (all authors)
Corresponding author: Giuseppe Abrami