Recherche d’indices lexicosyntaxiques de segmentation et de liage par une analyse automatique de corpus


Bibliographic Details
Main Author:	Yves Bestgen
Format:	Article
Language:	English
Published:	Presses universitaires de Caen 2019-12-01
Series:	Discours
Subjects:	adverbials; discourse markers; text linguistics; onomasiological approach; co-referential expressions; connectors
Online Access:	https://journals.openedition.org/discours/10256

author	Yves Bestgen
collection	DOAJ
description	This study uses automated corpus analysis to complement more qualitative studies of segmentation and linking indices, such as adverbial expressions, connectors and anaphora. It specifically aims to determine whether sentences that open a discourse segment can be automatically distinguished from those that do not, and to identify the indices that make this possible. Sentences in a situation of continuity or discontinuity were identified on the basis of the segments made visible in the texts by sections and paragraphs. The potential indices were n-grams of lemmas and part-of-speech tags. Analyses were conducted on three collections of texts of different genres: Wikipedia entries, newspaper articles and novels. Overall, supervised learning was relatively effective, with accuracy ranging from 64% to 74%, against a chance level of 50%. Most of the indices that are most useful for discrimination can be interpreted within linguistic theory on segmentation and linking marks. While paragraph detection performance is equivalent across the three genres, the most useful indices differ significantly from one genre to another. After discussing some limitations of the study, the conclusion considers taking fuller account of coreference indices, which proved particularly useful.
format	Article
id	doaj-art-77523120e85647e884418d1b55e7994a
institution	Kabale University
issn	1963-1723
language	English
publishDate	2019-12-01
publisher	Presses universitaires de Caen
record_format	Article
series	Discours
spelling	Discours, 25 (2019-12-01), doi:10.4000/discours.10256
title	Recherche d’indices lexicosyntaxiques de segmentation et de liage par une analyse automatique de corpus
topic	adverbials; discourse markers; text linguistics; onomasiological approach; co-referential expressions; connectors
url	https://journals.openedition.org/discours/10256


Similar Items