Entities as Topic Labels: Combining Entity Linking and Labeled LDA to Improve Topic Interpretability and Evaluability

Bibliographic Details
Main Authors: Anne Lauscher, Pablo Ruiz Fabo, Federico Nanni, Simone Paolo Ponzetto
Format: Article
Language: English
Published: Accademia University Press 2016-12-01
Series: IJCoL
Online Access: https://journals.openedition.org/ijcol/392
Collection: DOAJ
Description: Digital humanities scholars strongly need a corpus exploration method whose topics are easier to interpret than those of standard LDA topic models. Towards this goal, we propose combining two techniques, Entity Linking and Labeled LDA. Our method identifies, in an ontology, a series of descriptive labels for each document in a corpus, and then generates a specific topic for each label. A direct relation between topics and labels makes interpretation easier, and using an ontology as background knowledge limits label ambiguity. Because our topics are described with a limited number of clear-cut labels, they promote interpretability and support quantitative evaluation of the results. We illustrate the potential of the approach by applying it to three datasets: transcriptions of speeches from the fifth mandate of the European Parliament, the Enron Corpus, and the Hillary Clinton Email Dataset. While some of these resources have already been adopted by the natural language processing community, they still hold considerable potential for humanities scholars, part of which could be exploited by studies that adopt the fine-grained exploration method presented in this paper.
ISSN: 2499-4553
Volume: 2
Issue: 2
Pages: 67-87
DOI: 10.4000/ijcol.392