Docker Unified UIMA Interface: New perspectives for NLP on big data
Processing large amounts of natural language text using machine learning-based models is becoming important in many disciplines. This demand is being met by a variety of approaches, resulting in the heterogeneous deployment of separate, partly incompatible, not natively scalable applications. To ove...
| Main Authors: | Giuseppe Abrami, Markos Genios, Filip Fitzermann, Daniel Baumartz, Alexander Mehler |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-02-01 |
| Series: | SoftwareX |
| Subjects: | Docker, Kubernetes, UIMA, Distributed NLP |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352711024004047 |
| _version_ | 1849717468787900416 |
|---|---|
| author | Giuseppe Abrami; Markos Genios; Filip Fitzermann; Daniel Baumartz; Alexander Mehler |
| author_facet | Giuseppe Abrami; Markos Genios; Filip Fitzermann; Daniel Baumartz; Alexander Mehler |
| author_sort | Giuseppe Abrami |
| collection | DOAJ |
| description | Processing large amounts of natural language text using machine learning-based models is becoming important in many disciplines. This demand is being met by a variety of approaches, resulting in the heterogeneous deployment of separate, partly incompatible, not natively scalable applications. To overcome the technological bottleneck involved, we have developed Docker Unified UIMA Interface, a system for the standardized, parallel, platform-independent, distributed and microservices-based solution for processing large and extensive text corpora with any NLP method. We present DUUI as a framework that enables automated orchestration of GPU-based NLP processes beyond the existing Docker Swarm cluster variant, and in addition to the adaptation to new runtime environments such as Kubernetes. Therefore, a new driver for DUUI is introduced, which enables the lightweight orchestration of DUUI processes within a Kubernetes environment in a scalable setup. In this way, the paper opens up novel text-technological perspectives for existing practices in disciplines that deal with the scientific analysis of large amounts of data based on NLP. |
| format | Article |
| id | doaj-art-1316e661c3664b698bde481cc8f3fbcf |
| institution | DOAJ |
| issn | 2352-7110 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | Elsevier |
| record_format | Article |
| series | SoftwareX |
| spelling | doaj-art-1316e661c3664b698bde481cc8f3fbcf · 2025-08-20T03:12:39Z · eng · Elsevier · SoftwareX · 2352-7110 · 2025-02-01 · 29 · 102033 · 10.1016/j.softx.2024.102033 · Docker Unified UIMA Interface: New perspectives for NLP on big data · Giuseppe Abrami (0) · Markos Genios (1) · Filip Fitzermann (2) · Daniel Baumartz (3) · Alexander Mehler (4) · Corresponding author.; Goethe University Frankfurt, Texttechnology, Germany · Goethe University Frankfurt, Texttechnology, Germany · Goethe University Frankfurt, Texttechnology, Germany · Goethe University Frankfurt, Texttechnology, Germany · Goethe University Frankfurt, Texttechnology, Germany · Processing large amounts of natural language text using machine learning-based models is becoming important in many disciplines. This demand is being met by a variety of approaches, resulting in the heterogeneous deployment of separate, partly incompatible, not natively scalable applications. To overcome the technological bottleneck involved, we have developed Docker Unified UIMA Interface, a system for the standardized, parallel, platform-independent, distributed and microservices-based solution for processing large and extensive text corpora with any NLP method. We present DUUI as a framework that enables automated orchestration of GPU-based NLP processes beyond the existing Docker Swarm cluster variant, and in addition to the adaptation to new runtime environments such as Kubernetes. Therefore, a new driver for DUUI is introduced, which enables the lightweight orchestration of DUUI processes within a Kubernetes environment in a scalable setup. In this way, the paper opens up novel text-technological perspectives for existing practices in disciplines that deal with the scientific analysis of large amounts of data based on NLP. · http://www.sciencedirect.com/science/article/pii/S2352711024004047 · Docker · Kubernetes · UIMA · Distributed NLP |
| spellingShingle | Giuseppe Abrami Markos Genios Filip Fitzermann Daniel Baumartz Alexander Mehler Docker Unified UIMA Interface: New perspectives for NLP on big data SoftwareX Docker Kubernetes UIMA Distributed NLP |
| title | Docker Unified UIMA Interface: New perspectives for NLP on big data |
| title_full | Docker Unified UIMA Interface: New perspectives for NLP on big data |
| title_fullStr | Docker Unified UIMA Interface: New perspectives for NLP on big data |
| title_full_unstemmed | Docker Unified UIMA Interface: New perspectives for NLP on big data |
| title_short | Docker Unified UIMA Interface: New perspectives for NLP on big data |
| title_sort | docker unified uima interface new perspectives for nlp on big data |
| topic | Docker; Kubernetes; UIMA; Distributed NLP |
| url | http://www.sciencedirect.com/science/article/pii/S2352711024004047 |
| work_keys_str_mv | AT giuseppeabrami dockerunifieduimainterfacenewperspectivesfornlponbigdata AT markosgenios dockerunifieduimainterfacenewperspectivesfornlponbigdata AT filipfitzermann dockerunifieduimainterfacenewperspectivesfornlponbigdata AT danielbaumartz dockerunifieduimainterfacenewperspectivesfornlponbigdata AT alexandermehler dockerunifieduimainterfacenewperspectivesfornlponbigdata |