When Text and Speech are Not Enough: A Multimodal Dataset of Collaboration in a Situated Task
| Main Authors: | , , , , , , , , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Ubiquity Press, 2024-01-01 |
| Series: | Journal of Open Humanities Data |
| Subjects: | |
| Online Access: | https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/168 |
| Summary: | Modeling the information exchanged in real human-human interactions with speech or text alone leaves out many critical modalities. The channels that contribute to the “making of sense” in human-human interactions include, but are not limited to, gesture, speech, user-interaction modeling, gaze, joint attention, and involvement/engagement, all of which must be adequately modeled to automatically extract correct and meaningful information. In this paper, we present a multimodal dataset of a novel situated and shared collaborative task, with the above channels annotated to encode these different aspects of the participants' situated and embodied involvement in the joint activity. |
| ISSN: | 2059-481X |