Recherche d’indices lexicosyntaxiques de segmentation et de liage par une analyse automatique de corpus

This study uses an automated corpus analysis technique to try to provide a complementary point of view to that of more qualitative studies of segmentation and linking indices, such as adverbial expressions, connectors and anaphora. The study is specifically aimed at determining whether it is possibl...

Bibliographic Details
Main Author: Yves Bestgen
Format: Article
Language: English
Published: Presses universitaires de Caen 2019-12-01
Series: Discours
Subjects:
Online Access: https://journals.openedition.org/discours/10256
author Yves Bestgen
collection DOAJ
description This study uses automated corpus analysis to complement more qualitative studies of segmentation and linking indices, such as adverbial expressions, connectors and anaphora. It aims to determine whether sentences that open a discourse segment can be automatically distinguished from those that do not, and to identify the indices that make this possible. Sentences were classified as being in a situation of continuity or discontinuity on the basis of the segments made visible in the texts by sections and paragraphs. The candidate indices were n-grams of lemmas and part-of-speech tags. Analyses were conducted on three collections of texts from different genres: Wikipedia entries, newspaper articles and novels. Overall, supervised learning was reasonably effective, with accuracy ranging from 64% to 74% against a chance level of 50%. Most of the indices that contribute most to discrimination can be interpreted within linguistic theories of segmentation and linking marks. Although paragraph detection performance is comparable across the three genres, the most useful indices differ significantly from one genre to another. After discussing some limitations of the study, the conclusion considers taking fuller account of coreference indices, which proved particularly useful.
format Article
id doaj-art-77523120e85647e884418d1b55e7994a
institution Kabale University
issn 1963-1723
language English
publishDate 2019-12-01
publisher Presses universitaires de Caen
record_format Article
series Discours
spelling doaj-art-77523120e85647e884418d1b55e7994a | 2025-01-30T09:53:06Z | eng | Presses universitaires de Caen | Discours | 1963-1723 | 2019-12-01 | 25 | 10.4000/discours.10256 | https://journals.openedition.org/discours/10256
title Recherche d’indices lexicosyntaxiques de segmentation et de liage par une analyse automatique de corpus
topic adverbials
discourse markers
text linguistics
onomasiological approach
co-referential expressions
connectors
url https://journals.openedition.org/discours/10256
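
The description above mentions two steps that a short sketch can make concrete: labelling sentences by whether they open a paragraph, and extracting n-grams of lemmas and part-of-speech tags as candidate indices. The Python below is only an illustration under stated assumptions, not the author's implementation: the record does not give the n-gram order, the tagset, the tagger or the classifier, so `max_n=2`, the toy lemmas and tags, and the function names `label_sentences` and `ngram_features` are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Token = Tuple[str, str]          # (lemma, part-of-speech tag); tagset is assumed
Sentence = List[Token]
Paragraph = List[Sentence]


def label_sentences(paragraphs: List[Paragraph]) -> List[Tuple[Sentence, int]]:
    """Label each sentence 1 if it opens a paragraph, 0 otherwise."""
    labelled = []
    for paragraph in paragraphs:
        for position, sentence in enumerate(paragraph):
            labelled.append((sentence, 1 if position == 0 else 0))
    return labelled


def ngram_features(sentence: Sentence, max_n: int = 2) -> Counter:
    """Count lemma n-grams and POS-tag n-grams up to length max_n (assumed)."""
    features: Counter = Counter()
    lemmas = [lemma for lemma, _ in sentence]
    tags = [tag for _, tag in sentence]
    for prefix, sequence in (("lemma", lemmas), ("pos", tags)):
        for n in range(1, max_n + 1):
            for start in range(len(sequence) - n + 1):
                gram = " ".join(sequence[start:start + n])
                features[f"{prefix}:{gram}"] += 1
    return features


# Toy, hand-tagged input (hypothetical data, for illustration only).
paragraphs = [
    [
        [("in", "ADP"), ("1990", "NUM"), ("she", "PRON"), ("move", "VERB")],
        [("she", "PRON"), ("then", "ADV"), ("work", "VERB")],
    ],
    [
        [("later", "ADV"), ("the", "DET"), ("author", "NOUN"), ("return", "VERB")],
    ],
]

labelled = label_sentences(paragraphs)
labels = [label for _, label in labelled]
second_sentence_features = ngram_features(labelled[1][0])
```

The resulting feature counts and labels could then be passed to any supervised classifier; which one the study used is not stated in this record.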