Thesauri and ontologies are widely used in natural language processing tasks and applications. Wordnets are considered “a standard NLP tool” alongside part-of-speech taggers, syntactic parsers, etc. The paper describes the basic procedure for integrating two lexicographic resources, RussNet and YARN, with the aim of building an online computer lexicon for Russian. The main difficulty lies in the vague borders between synsets, the core wordnet “building blocks”. Synsets consist of lexical components (lexemes and multiword expressions) that are semantic equivalents, a relation traditionally viewed as synonymy. Nevertheless, there is still no agreement on how to treat this relation in RussNet and YARN. The authors present methods for unifying the synsets of the two resources. An important aspect of the project is its combination of crowdsourcing-based and expert-based approaches. Crowd management methodology is a new and relevant direction of research in many areas.
Translated title of the contribution: Identification of Thesaurus Units in the Process of Integration RussNet and YARN
Original language: Russian
Title of host publication: Структурная и прикладная лингвистика : К 60-летию отделения прикладной, компьютерной и математической лингвистики СПбГУ
Subtitle of host publication: interuniversity collection
Place of Publication: Saint Petersburg
Publisher: Saint Petersburg State University
Pages: 34-52
Number of pages: 19
Volume: Issue 12
Publication status: Published - 2019

GRNTI: 16.00.00 LINGUISTICS

ID: 11276335