Standard

Stylometry and Numerals Usage: Benford’s Law and Beyond. / Zenkov, Andrei V.
In: Stats, Vol. 4, No. 4, 2021, p. 1051-1068.

Research output: Contribution to journal › Article › peer-review

Vancouver

Zenkov AV. Stylometry and Numerals Usage: Benford’s Law and Beyond. Stats. 2021;4(4):1051-1068. doi: 10.3390/stats4040060

Author

Zenkov, Andrei V. / Stylometry and Numerals Usage: Benford’s Law and Beyond. In: Stats. 2021 ; Vol. 4, No. 4. pp. 1051-1068.

BibTeX

@article{50b0678b361e4458b0b1afb6afc37c25,
title = "Stylometry and Numerals Usage: Benford{\textquoteright}s Law and Beyond",
abstract = "We suggest two approaches to the statistical analysis of texts, both based on the study of numerals occurrence in literary texts. The first approach is related to Benford{\textquoteright}s Law and the analysis of the frequency distribution of various leading digits of numerals contained in the text. In coherent literary texts, the share of the leading digit 1 is even larger than prescribed by Benford{\textquoteright}s Law and can reach 50 percent. The frequencies of occurrence of the digit 1, as well as, to a lesser extent, the digits 2 and 3, are usually a characteristic the author{\textquoteright}s style feature, manifested in all (sufficiently long) literary texts of any author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach is the extension of the first one and requires the study of the frequency distribution of numerals themselves (not their leading digits). The approach yields non-trivial information about the author, stylistic and genre peculiarities of the texts and is suited for the advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of the literary texts in English and Russian.",
author = "Zenkov, {Andrei V.}",
year = "2021",
doi = "10.3390/stats4040060",
language = "English",
volume = "4",
pages = "1051--1068",
journal = "Stats",
issn = "2571-905X",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "4",
}

RIS

TY - JOUR

T1 - Stylometry and Numerals Usage: Benford’s Law and Beyond

AU - Zenkov, Andrei V.

PY - 2021

Y1 - 2021

N2 - We suggest two approaches to the statistical analysis of texts, both based on the study of numerals occurrence in literary texts. The first approach is related to Benford’s Law and the analysis of the frequency distribution of various leading digits of numerals contained in the text. In coherent literary texts, the share of the leading digit 1 is even larger than prescribed by Benford’s Law and can reach 50 percent. The frequencies of occurrence of the digit 1, as well as, to a lesser extent, the digits 2 and 3, are usually a characteristic the author’s style feature, manifested in all (sufficiently long) literary texts of any author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach is the extension of the first one and requires the study of the frequency distribution of numerals themselves (not their leading digits). The approach yields non-trivial information about the author, stylistic and genre peculiarities of the texts and is suited for the advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of the literary texts in English and Russian.

AB - We suggest two approaches to the statistical analysis of texts, both based on the study of numerals occurrence in literary texts. The first approach is related to Benford’s Law and the analysis of the frequency distribution of various leading digits of numerals contained in the text. In coherent literary texts, the share of the leading digit 1 is even larger than prescribed by Benford’s Law and can reach 50 percent. The frequencies of occurrence of the digit 1, as well as, to a lesser extent, the digits 2 and 3, are usually a characteristic the author’s style feature, manifested in all (sufficiently long) literary texts of any author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach is the extension of the first one and requires the study of the frequency distribution of numerals themselves (not their leading digits). The approach yields non-trivial information about the author, stylistic and genre peculiarities of the texts and is suited for the advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of the literary texts in English and Russian.

UR - https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=tsmetrics&SrcApp=tsm_test&DestApp=WOS_CPL&DestLinkType=FullRecord&KeyUT=000836840900001

U2 - 10.3390/stats4040060

DO - 10.3390/stats4040060

M3 - Article

VL - 4

SP - 1051

EP - 1068

JO - Stats

JF - Stats

SN - 2571-905X

IS - 4

ER -

ID: 30725566