Standardization of text markup and evaluation of predictive models in natural language understanding tasks
Vorontsov, Konstantin Vyacheslavovich, D.Sc. in Physics and Mathematics, Professor of the RAS, Head of the Department of Mathematical Forecasting Methods, Head of the Laboratory of Machine Learning and Semantic Analysis, Institute for Advanced Research of Artificial Intelligence and Intelligent Systems, MSU, Head of the Department of Machine Learning and Digital Humanities, MIPT University, Moscow, Russia. firstname.lastname@example.org
Karabulatova, Irina Sovetovna, D.Sc. in Philology, Professor, Deputy Head of the Laboratory of Machine Learning and Semantic Analysis, Institute for Advanced Studies of Artificial Intelligence and Intelligent Systems, MSU, expert of the Department of Machine Learning and Digital Humanities, MIPT University, Moscow, Russia. Radogost2000@mail.ru
Abstract. Extensive experience has by now accumulated in applying deep neural network models to automatic text processing and natural language understanding. This experience concerns tasks formulated as far back as the mid-twentieth century, yet most of them have only recently received solutions exceeding human-level quality: sentiment analysis, named entity recognition, syntactic parsing, fact extraction, text summarization, machine translation, question answering, and many others. Modern research has opened up opportunities for tackling problems previously considered dauntingly difficult: detecting deception in text, speech manipulation, psycho-emotional influence, and polarization of public opinion. Among the most important of these tasks we also include the technological competition PRO//CHTENIYE – ABOUT//READING (ai.upgrade.one), whose task is to find semantic and factual errors in school essays from the Unified State Exam in humanities subjects (Russian language, literature, social studies, history, English). The competition ran for almost three years and has recently concluded. Generalizing this experience allows us to speak of standardizing the stages of data markup and of evaluating the quality of language models. We propose a markup standard based on highlighting, tagging, and linking text fragments, with the possibility of creating "overtext"/"subtext": additional text fields that are tagged and linked along with fragments of the marked text. The proposed formalism generalizes most known text markup tasks and makes it possible to build deep neural network models based on transformers. The proposed evaluation standard rests on a multi-criteria approach and collective expert assessments. It defines a criterion of relative accuracy uniformly for a wide class of markup tasks.
A value above the 100% threshold on this criterion means that the model exceeds human-level accuracy on the given task. The proposed approaches are illustrated on the task of detecting speech manipulation in text.
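The markup standard described above rests on three operations (highlighting fragments, tagging them, linking them) plus "overtext"/"subtext" fields that behave like fragments. A minimal sketch of what one such markup record might look like follows; all field names and the sample text are illustrative assumptions, not the authors' published schema.

```python
# Illustrative sketch of a fragment-markup record: highlighted character
# spans carry tags, typed links connect units, and an "overtext" field is
# an additional text unit that is tagged and linked just like a fragment.
# Field names here are assumptions for illustration only.

text = "The senator claimed the reform would bankrupt every family."

markup = {
    "text": text,
    "fragments": [
        # character spans highlighted in the source text
        {"id": "f1", "start": 12, "end": 19, "tags": ["claim"]},       # "claimed"
        {"id": "f2", "start": 37, "end": 58, "tags": ["exaggeration"]},
    ],
    "overtext": [
        # an additional text field, tagged and linked like a fragment
        {"id": "o1", "text": "appeal to fear", "tags": ["manipulation"]},
    ],
    "links": [
        # typed links between fragments and overtext units
        {"source": "f2", "target": "o1", "type": "instance-of"},
    ],
}

def check_markup(m):
    """Basic consistency checks: spans fit the text, links resolve."""
    ids = {u["id"] for u in m["fragments"]} | {u["id"] for u in m["overtext"]}
    for f in m["fragments"]:
        assert 0 <= f["start"] < f["end"] <= len(m["text"])
    for link in m["links"]:
        assert link["source"] in ids and link["target"] in ids
    return True
```

A record like this covers many classical markup tasks as special cases: named entity recognition uses only tagged fragments, relation extraction adds links, and summarization or error explanation can be expressed as overtext units linked to the fragments they describe.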
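The relative accuracy criterion can be sketched as follows, under one loud assumption: that quality is measured as agreement with a collectively produced reference markup, and the model's agreement is divided by the mean agreement of the human experts. The exact definition used by the authors is not given in the abstract; the functions and the toy data below are hypothetical.

```python
# Hedged sketch of a "relative accuracy" criterion: model quality is
# reported relative to human expert quality, so values above 100% mean
# the model exceeds the human level. Assumes agreement with a reference
# markup as the stand-in quality metric.

def agreement(pred, reference):
    """Fraction of reference markup units reproduced by a markup."""
    return len(set(pred) & set(reference)) / len(set(reference))

def relative_accuracy(model_pred, expert_preds, reference):
    """Model quality divided by mean expert quality, in percent."""
    mean_expert = sum(agreement(e, reference) for e in expert_preds) / len(expert_preds)
    return 100.0 * agreement(model_pred, reference) / mean_expert

# Toy example: the reference markup contains four manipulation spans
# (identified here by ids); each expert finds only some of them.
reference = ["s1", "s2", "s3", "s4"]
experts = [["s1", "s2", "s3"], ["s1", "s2"], ["s2", "s3", "s4"]]
model = ["s1", "s2", "s3", "s4"]  # the model finds all four

print(round(relative_accuracy(model, experts, reference), 1))  # 150.0
```

In this toy run the experts average 2/3 agreement while the model reaches 1.0, giving 150%: above the 100% threshold, i.e. better than the human level on this (artificial) example.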
Keywords: natural language understanding, deep learning, text markup tasks