Recherche d’indices lexicosyntaxiques de segmentation et de liage par une analyse automatique de corpus
This study uses an automated corpus analysis technique to try to provide a complementary point of view to that of more qualitative studies of segmentation and linking indices, such as adverbial expressions, connectors and anaphora. The study is specifically aimed at determining whether it is possibl...
Main Author: | Yves Bestgen |
---|---|
Format: | Article |
Language: | English |
Published: | Presses universitaires de Caen, 2019-12-01 |
Series: | Discours |
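Subjects: | adverbials; discourse markers; text linguistics; onomasiological approach; co-referential expressions; connectors |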
Online Access: | https://journals.openedition.org/discours/10256 |
_version_ | 1832581854069260288 |
---|---|
author | Yves Bestgen |
collection | DOAJ |
description | This study uses automated corpus analysis to complement more qualitative studies of segmentation and linking indices, such as adverbial expressions, connectors and anaphora. It asks whether sentences that open a discourse segment can be automatically distinguished from those that do not, and which indices make this distinction possible. Sentences in a situation of continuity or discontinuity were identified from the segments made visible in the texts by sections and paragraphs. The candidate indices were n-grams of lemmas and part-of-speech tags. Analyses were conducted on three collections of texts of different genres: Wikipedia entries, newspaper articles and novels. Overall, supervised learning was relatively effective, with accuracy ranging from 64% to 74%, against a chance level of 50%. Most of the indices that are most useful for discrimination can be interpreted within linguistic theory on segmentation and linking marks. Although paragraph detection performance is equivalent across the three genres, the most useful indices differ significantly from one genre to another. After discussing some limitations of the study, the conclusion considers taking fuller account of coreference indices, which proved particularly useful. |
format | Article |
id | doaj-art-77523120e85647e884418d1b55e7994a |
institution | Kabale University |
issn | 1963-1723 |
language | English |
publishDate | 2019-12-01 |
publisher | Presses universitaires de Caen |
record_format | Article |
series | Discours |
title | Recherche d’indices lexicosyntaxiques de segmentation et de liage par une analyse automatique de corpus |
topic | adverbials; discourse markers; text linguistics; onomasiological approach; co-referential expressions; connectors |
url | https://journals.openedition.org/discours/10256 |
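
The description above outlines a pipeline: label each sentence by whether it opens a paragraph, represent it with n-grams of lemmas and part-of-speech tags, train a supervised classifier, then inspect which indices discriminate best. The record does not name the classifier or the feature encoding, so the following is only a minimal sketch under stated assumptions: a small multinomial Naive Bayes stands in for the study's learner, the toy sentences are invented, and the names `ngrams`, `label_sentences`, `NaiveBayes` and `strongest_indices` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return every n-gram of length 1..n_max as a tuple."""
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def label_sentences(paragraphs):
    """Label each sentence True if it opens its paragraph, else False."""
    return [(sentence, position == 0)
            for paragraph in paragraphs
            for position, sentence in enumerate(paragraph)]


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing.

    Assumption: a stand-in learner, not the method used in the study.
    """

    def fit(self, samples):
        self.counts = {True: Counter(), False: Counter()}
        self.n_sentences = Counter()
        for tokens, label in samples:
            self.n_sentences[label] += 1
            self.counts[label].update(ngrams(tokens))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def log_prob(self, gram, label):
        """Smoothed log-probability of an n-gram given the class."""
        return math.log((self.counts[label][gram] + 1)
                        / (self.totals[label] + len(self.vocab)))

    def predict(self, tokens):
        """Return True if the sentence is predicted to open a paragraph."""
        scores = {}
        for label in (True, False):
            prior = self.n_sentences[label] / sum(self.n_sentences.values())
            scores[label] = math.log(prior) + sum(
                self.log_prob(g, label) for g in ngrams(tokens))
        return scores[True] > scores[False]

    def strongest_indices(self, k):
        """Return the k n-grams most associated with paragraph openings."""
        return sorted(
            self.vocab,
            key=lambda g: self.log_prob(g, True) - self.log_prob(g, False),
            reverse=True)[:k]


# Invented toy corpus: each paragraph is a list of sentences, each sentence
# a list mixing lemmas and part-of-speech tags (an assumed encoding).
paragraphs = [
    [["in", "1990", ",", "NOUN", "VERB"],
     ["he", "VERB", "it"],
     ["however", ",", "he", "VERB"]],
    [["in", "2001", ",", "NOUN", "VERB"],
     ["she", "VERB", "it"],
     ["then", "she", "VERB"]],
]

model = NaiveBayes().fit(label_sentences(paragraphs))
print(model.predict(["in", "1985", ",", "NOUN", "VERB"]))
print(model.strongest_indices(4))
```

In this toy setting, a sentence-initial temporal adverbial followed by a full noun phrase pulls the prediction towards "opens a paragraph", while a pronoun subject pulls it the other way. That only illustrates the kind of index the description mentions; it says nothing about the study's actual results.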