Идентификация единиц тезаурусного описания при интеграции лексических ресурсов RussNet и YARN

Ирина Владимировна Азарова
Павел Исаакович Браславский
Виктор Павлович Захаров
Юрий Александрович Киселев
Дмитрий Алексеевич Усталов
Мария Владимировна Хохлова

Thesauri and ontologies are widely used in many natural language processing tasks and applications. Wordnets are considered “a standard NLP tool”, along with part-of-speech taggers, syntactic parsers, and the like. The paper describes the basic procedure for integrating two lexicographic resources, RussNet and YARN, with the aim of building an online computer lexicon for Russian. The main issue is the vague boundaries between synsets, the core wordnet “building blocks”. Synsets consist of lexical components (lexemes and multiword expressions) that are semantic equivalents, a relation traditionally viewed as synonymy. Nevertheless, RussNet and YARN still do not agree on how to handle this relation. The authors present methods for unifying the synsets of the two resources. An important aspect of the project is its combination of crowdsourcing-based and expert-based approaches. Crowd management methodology is a new and relevant research direction in many areas.

Translated title of the contribution	Identification of Thesaurus Units in the Process of Integration RussNet and YARN
Original language	Russian
Title of host publication	Структурная и прикладная лингвистика : К 60-летию отделения прикладной, компьютерной и математической лингвистики СПбГУ
Subtitle of host publication	Interuniversity collection
Place of Publication	Saint Petersburg
Publisher	Saint Petersburg State University
Pages	34-52
Number of pages	19
Volume	Issue 12
Publication status	Published - 2019

GRNTI

16.00.00 LINGUISTICS

ID: 11276335

Publication type: article in a collection of articles
