Recherche d’indices lexicosyntaxiques de segmentation et de liage par une analyse automatique de corpus

Bibliographic Details
Main Author: Yves Bestgen
Format: Article
Language: English
Published: Presses universitaires de Caen 2019-12-01
Series: Discours
Online Access: https://journals.openedition.org/discours/10256
Description
Summary: This study uses automated corpus analysis to complement more qualitative studies of segmentation and linking indices, such as adverbial expressions, connectors and anaphora. Its specific aim is to determine whether sentences that open a discourse segment can be automatically distinguished from those that do not, and to identify the indices that make this possible. Sentences in a situation of continuity or discontinuity were identified from the segments made visible in the texts by sections and paragraphs. The candidate indices were n-grams of lemmas and part-of-speech tags. Analyses were conducted on three collections of texts of different genres: Wikipedia entries, newspaper articles and novels. Overall, supervised learning proved relatively effective, with accuracy ranging from 64% to 74%, against a chance level of 50%. Most of the indices most useful for discrimination can be interpreted within linguistic theory on segmentation and linking marks. Although paragraph detection performance is equivalent across the three genres, the most useful indices differ significantly from one genre to another. After discussing some limitations of the study, the conclusion considers taking fuller account of coreference indices, which proved particularly useful.
ISSN: 1963-1723
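Illustration: the labeling and feature scheme outlined in the summary can be sketched roughly as follows. This is not the author's implementation. It assumes paragraphs are separated by blank lines, splits sentences with a naive regular expression, and uses lowercased word forms as a stand-in for the lemmas and part-of-speech tags used in the study. The function names and the sample text are hypothetical.

```python
import re
from collections import Counter


def label_sentences(text):
    """Return (sentence, opens_segment) pairs.

    Assumption: a sentence opens a segment when it is the first
    sentence of a blank-line-separated paragraph.
    """
    pairs = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Naive sentence split on whitespace that follows ., ! or ?
        parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        sentences = [s.strip() for s in parts if s.strip()]
        for i, sentence in enumerate(sentences):
            pairs.append((sentence, i == 0))
    return pairs


def ngrams(tokens, n_max=2):
    """All n-grams of length 1..n_max, joined with spaces."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def feature_counts(pairs, n_max=2):
    """Count n-grams separately for segment-opening and other sentences."""
    counts = {True: Counter(), False: Counter()}
    for sentence, opens in pairs:
        # Stand-in for lemmatization and tagging: lowercased word forms.
        tokens = re.findall(r"\w+", sentence.lower())
        counts[opens].update(ngrams(tokens, n_max))
    return counts


# Hypothetical two-paragraph sample text.
SAMPLE = (
    "In 1990, the town grew. It built a bridge.\n\n"
    "Later, the bridge closed. It was never repaired."
)

pairs = label_sentences(SAMPLE)
counts = feature_counts(pairs)
```

In this toy sample the pronoun "it" occurs only in sentences that do not open a paragraph, which loosely echoes the summary's remark on coreference indices. A supervised classifier would take such counts as input features.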