After Benford’s law, which describes the frequency distribution of leading digits in datasets and is often used to detect accounting fraud, data scientists have developed a new method that helps to uncover fraud through data analysis. In a 2016 paper, Dmitry Kobak, Sergey Shpilkin and Maxim S. Pshenichnikov show that the results of Russian elections contain some very disturbing artefacts indicating fraudulent behavior at polling stations.
The authors started with a simple assumption: people who make up numbers tend to choose round integers. So if polling stations report made-up numbers instead of the real results, a disproportionate number of polling stations would report neat, round percentages. This natural inclination towards round numbers can only be intensified by thresholds that the central authority regards as a “success”.
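To make the idea concrete, here is a minimal sketch of how one might flag polling stations whose reported share is a whole-number percentage. The function name, the tolerance of 0.05 percentage points and the station figures are all illustrative assumptions, not the definitions or data used in the paper.

```python
def is_integer_percentage(votes: int, total: int, tol: float = 0.05) -> bool:
    """Return True if 100 * votes / total lies within `tol` percentage
    points of a whole number. `tol` is an assumed tolerance."""
    if total <= 0:
        return False
    pct = 100.0 * votes / total
    return abs(pct - round(pct)) <= tol


# Hypothetical (votes for the leading candidate, ballots cast) pairs.
stations = [(700, 1000), (431, 977), (850, 1000), (512, 803), (600, 800)]

# Count how many stations report a round percentage.
n_round = sum(is_integer_percentage(v, n) for v, n in stations)
print(n_round)
```

A count like this means little on its own, because some stations land on a round percentage purely by chance. It only becomes evidence once it is compared with what honest counting would be expected to produce.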
To test whether some polling stations really do make up the numbers they report, the authors used a Monte Carlo simulation to estimate how likely each result is across the whole spectrum of percentages. This allowed them to find 99.99% confidence intervals for the number of polling stations reporting round results. Their careful analysis shows that the empirical distributions do contain improbable spikes that can hardly be explained by anything other than fraud.
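The logic of such a test can be sketched in a few lines. The example below does not reproduce the paper's procedure. It assumes a simple binomial resampling scheme in which every station keeps its number of ballots and its observed vote share, and the votes are redrawn at random. All names, the tolerance and the synthetic stations are hypothetical.

```python
import random


def is_integer_percentage(votes, total, tol=0.05):
    """True if the vote share is within `tol` points of a whole percentage."""
    pct = 100.0 * votes / total
    return abs(pct - round(pct)) <= tol


def simulate_round_counts(stations, n_sims=100, seed=0):
    """Redraw each station's votes as Binomial(total, votes / total) and
    return the sorted number of round-percentage stations per simulation."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        n_round = 0
        for votes, total in stations:
            p = votes / total
            # Binomial draw as a sum of Bernoulli trials (standard library only).
            sim_votes = sum(rng.random() < p for _ in range(total))
            n_round += is_integer_percentage(sim_votes, total)
        counts.append(n_round)
    return sorted(counts)


def upper_quantile(sorted_counts, level=0.9999):
    """Empirical upper quantile. With few simulations this is simply the
    maximum; a real 99.99% bound needs far more than 100 runs."""
    idx = min(len(sorted_counts) - 1, int(level * len(sorted_counts)))
    return sorted_counts[idx]


# Synthetic data: 20 stations with ordinary-looking results ...
rng = random.Random(1)
honest = []
for _ in range(20):
    total = rng.randint(400, 800)
    honest.append((sum(rng.random() < 0.55 for _ in range(total)), total))

# ... and 20 stations reporting exactly round percentages.
fabricated = [(10 * pct, 1000) for pct in (65, 70, 75, 80, 85)] * 4
stations = honest + fabricated

observed = sum(is_integer_percentage(v, n) for v, n in stations)
sim_counts = simulate_round_counts(stations)
bound = upper_quantile(sim_counts)
print(observed, bound, observed > bound)
```

If the observed number of round-percentage stations lies above the simulated upper bound, chance alone is an unlikely explanation. The handful of simulations here keeps the sketch fast. A tight 99.99% interval would require many more runs.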
Interestingly, this phenomenon has been observable since the 2004 presidential election, when Vladimir Putin was seeking his first reelection. The analysis covers Russian elections from 2000 to 2012, and only the early elections of 2000 and 2003 show no sign of manipulation in the vote count. The paper does not attempt to explain what caused this turning point, but it is telling that the regions showing persistent anomalies are largely located in the North Caucasian Federal District (e.g. Chechnya).
As a control, the same method was applied to data from German, Polish and Spanish elections. None of these countries shows suspicious spikes in the data. Could a group of statisticians serve as a watchdog for national elections? It certainly seems so! Although fraudulent governments can randomize their vote count manipulation or simply use other dishonest methods to influence democratic elections, let us hope that honest data scientists will always be a step ahead in uncovering dirty practices.
Reference: Kobak, D., Shpilkin, S., Pshenichnikov, M.S., 2016. Integer percentages as electoral falsification fingerprints. The Annals of Applied Statistics 10, 54–73.