Data-driven research: what does data science know about politeness?
Krongauz, Maxim Anisimovich D.Sc. in Philology, Professor, Head of the Laboratory of Linguistic Conflict Resolution Studies and Contemporary Communicative Practices, HSE University, Head of the Research Group, Interdisciplinary Research Center, MIPT University, Moscow, Russia mkronhaus@yandex.ru
Shulginov, Valery Alexandrovich PhD in Philology, Leading Researcher, Laboratory of Linguistic Conflict Resolution Studies and Contemporary Communicative Practices, HSE University, Leading Research Fellow, Interdisciplinary Research Center, MIPT University, Moscow, Russia shulginov.val@yandex.ru
Klokova, Ksenia Sergeevna Junior Research Fellow, Interdisciplinary Research Center, MIPT University, Moscow, Russia kseniaklokova@gmail.com
Udina, Tatiana Alexandrovna Junior Research Fellow, Interdisciplinary Research Center, MIPT University, Moscow, Russia yudina.tatiana22@gmail.com
Abstract Politeness is fundamental to effective communication. There is a need for a comprehensive study of politeness strategies and markers in order to solve theoretical and applied problems in linguistics, psychology and sociology. One relevant approach is computational representation, e.g. modeling polite behavior or automatically correcting human communication. Such research needs a corpus that captures both the lexical and grammatical characteristics of politeness markers and their interaction with sociocultural parameters (the context of interaction). Building it requires a data collection procedure that accounts for oral communication, criteria for separating cooperative from confrontational communication, and principles for formalizing and annotating etiquette situations. The multimedia speech etiquette corpus will include typical etiquette frames from Russian media that reflect everyday communication in different historical periods. Each fragment will be annotated with linguistic and extralinguistic features of communication, together with details of the communicative situation and the etiquette markers used. The annotated data can be used for machine learning tasks such as language modeling and autocorrection. Etiquette both changes under the influence of social processes and reflects them, so the speech etiquette corpus can be used in both linguistics and sociology. One important application is the study of the relationship between the concept of “politeness” and categories such as “power”, “gender” and “status”.