Existing methods for interpreting the results of differential gene expression analysis fall into three main categories: cluster analysis, enrichment analysis, and the construction of genetic networks. Despite their rich capabilities, all of these approaches are time-consuming to compute, and their final results are not always sufficient for understanding the logic that binds genes into groups.

In this paper, we propose a complete pipeline that makes understanding the results of differential gene expression analysis faster, easier, and more efficient. The pipeline takes Gene Ontology terms together with descriptions of the collected genes as input and returns gene clusters, the topics they relate to, and a filtered list of the most common words found in each cluster. Processing involves the BERT artificial neural network model for semantic information extraction, BERTopic for unsupervised topic extraction, dimensionality reduction for data simplification, and clustering to search for dependencies.

The pipeline was tested with an ablation study, and its performance was evaluated by an expert on gene expression datasets from NCBI GEO covering different types of cardiomyopathy (dilated, inflammatory, ischemic, and non-ischemic) as well as healthy individuals.
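The flow described in the abstract (embed gene descriptions, reduce dimensionality, cluster, then list the most common words per cluster) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: term-count vectors stand in for BERT embeddings, a variance filter stands in for the dimensionality reduction step, plain k-means stands in for the clustering step, and a word count per cluster stands in for BERTopic. The gene labels, descriptions, stop-word list, and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

STOP_WORDS = {"and", "in", "the", "of"}  # tiny illustrative stop list (assumption)

def tokenize(text):
    """Lowercase, split into alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def embed(texts):
    """Stand-in for BERT embeddings: plain term-count vectors."""
    vocab = sorted({w for t in texts for w in tokenize(t)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w in tokenize(t):
            v[index[w]] += 1.0
        vectors.append(v)
    return vectors

def reduce_dims(vectors, keep):
    """Stand-in for dimensionality reduction: keep the highest-variance columns."""
    n = len(vectors)

    def variance(j):
        col = [v[j] for v in vectors]
        mean = sum(col) / n
        return sum((x - mean) ** 2 for x in col) / n

    ranked = sorted(range(len(vectors[0])), key=lambda j: (-variance(j), j))
    cols = sorted(ranked[:keep])
    return [[v[j] for j in cols] for v in vectors]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=10):
    """Basic k-means with deterministic farthest-point initialisation."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def top_words(texts, labels, n=3):
    """Most common non-stop words per cluster."""
    counts = {}
    for text, label in zip(texts, labels):
        counts.setdefault(label, Counter()).update(tokenize(text))
    return {label: [w for w, _ in c.most_common(n)] for label, c in counts.items()}

# Made-up gene descriptions, not taken from the paper or from Gene Ontology.
descriptions = {
    "gene_a": "cardiac muscle contraction and sarcomere organization",
    "gene_b": "sarcomere organization in cardiac muscle tissue",
    "gene_c": "inflammatory response and cytokine signaling",
    "gene_d": "cytokine signaling in the inflammatory response",
}
texts = list(descriptions.values())
labels = kmeans(reduce_dims(embed(texts), keep=8), k=2)
clusters = dict(zip(descriptions, labels))
topics = top_words(texts, labels)
print(clusters)
print(topics)
```

In a real setting each stand-in would be replaced by the corresponding component named in the abstract; the sketch only shows how the stages hand data to one another.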
Original language: English
Title of host publication: 2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences, SIBIRCON 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 560-565
ISBN (Print): 978-166546480-2
Publication status: Published - 11 Nov 2022
Event: 2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON) - Yekaterinburg, Russian Federation
Duration: 11 Nov 2022 to 13 Nov 2022

Conference

Conference: 2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)
Period: 11/11/2022 to 13/11/2022

    ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Signal Processing
  • Information Systems and Management
  • Energy Engineering and Power Technology
  • Electrical and Electronic Engineering
  • Health Informatics

ID: 34716358