Checking our top users

We want to compare several types of users. The comparison gives insight into how rapidly posts spread, whether particular people are implicated by a post, whether content is being pushed by specific users, and whether a topic is trending. To do so, we track the following variables:

  • Top users who send Twitter posts,
  • Top users who mention others,
  • Top users who are mentioned by others,
  • Top users who reply to others,
  • Top users who are replied to.

Together, these variables help us identify anomalies and detect outliers in the data for further exploration. Each variable undergoes a series of data processing steps so that trends associated with those anomalies become apparent during the analysis.
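
The five variables listed above can be sketched as simple frequency counts. The snippet below is a minimal illustration, not the pipeline used for this report: the tweet fields `user`, `mentions` and `in_reply_to` are hypothetical names chosen for the example, and usernames are lower-cased on the assumption that handles should be compared case-insensitively.

```python
from collections import Counter

def top_users(tweets, n=20):
    """Return the n most frequent users for each of the five categories."""
    counts = {
        "posters": Counter(),
        "mentioners": Counter(),
        "mentioned": Counter(),
        "repliers": Counter(),
        "replied_to": Counter(),
    }
    for tweet in tweets:
        author = tweet["user"].lower()
        counts["posters"][author] += 1
        mentions = [m.lower() for m in tweet.get("mentions", [])]
        if mentions:
            # A mentioner is counted once per post, however many users it mentions.
            counts["mentioners"][author] += 1
            counts["mentioned"].update(mentions)
        target = tweet.get("in_reply_to")
        if target:
            counts["repliers"][author] += 1
            counts["replied_to"][target.lower()] += 1
    return {name: counter.most_common(n) for name, counter in counts.items()}

# Hypothetical sample data, for illustration only.
sample = [
    {"user": "alice", "mentions": ["MYANC", "Our_DA"], "in_reply_to": None},
    {"user": "bob", "mentions": ["myanc"], "in_reply_to": "MYANC"},
    {"user": "alice", "mentions": [], "in_reply_to": "bob"},
]
result = top_users(sample)
print(result["mentioned"])  # [('myanc', 2), ('our_da', 1)]
```

Counting a mentioner once per post rather than once per mention is a design choice made for this sketch; either convention works as long as it is applied consistently across categories.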

Top mentioned and top mentioners

Top mentioned users are the accounts mentioned most frequently across the posts in the dataset. The table below lists them. As expected, the top accounts are directly connected to the election itself, including political parties and their members, politicians themselves, and state institutions.

    user              number_of_mentions
 0  myanc                          39799
 1  cyrilramaphosa                 25033
 2  effsouthafrica                 16352
 3  our_da                         16339
 4  action4sa                       9730
 5  julius_s_malema                 9354
 6  presidencyza                    7509
 7  hermanmashaba                   7476
 8  iecsouthafrica                  5263
 9  jsteenhuisen                    4712
10  mbalulafikile                   3350
11  governmentza                    2292
12  enca                            1914
13  helenzille                      1882
14  sabcnews                        1783
15  ancparliament                   1606
16  a_c_d_p                         1561
17  forgoodza                       1367
18  sapoliceservice                 1324

These are the top mentioner patterns:

Note: To preserve the privacy of users who are not known in the public domain and are not part of public organisations, we aggregated their information in a graph. Where people are well known in the public domain, we will share later some of the users that were classified as likely to spread misinformation.

[Figure: aggregated top mentioner patterns. Source: https://dsfsi.github.io/zaelection2021/]
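
One way to aggregate per-user activity so that no individual username is exposed is to report only how many users fall into each activity bucket. The sketch below is an assumption about how such an aggregation could be done, not a description of how the graph in this report was produced; the bucket edges and the sample counts are made up for illustration.

```python
from bisect import bisect_right

def activity_histogram(user_counts, edges=(1, 2, 5, 10, 50)):
    """Map {username: count} to {bucket_label: number_of_users}, dropping usernames."""
    labels = []
    for low, high in zip(edges, edges[1:]):
        labels.append(str(low) if high - 1 == low else f"{low}-{high - 1}")
    labels.append(f"{edges[-1]}+")  # open-ended last bucket
    histogram = dict.fromkeys(labels, 0)
    for count in user_counts.values():
        if count < edges[0]:
            continue  # below the first bucket edge: not counted
        histogram[labels[bisect_right(edges, count) - 1]] += 1
    return histogram

# Hypothetical mention counts per (anonymous) user.
mentioners = {"user_a": 1, "user_b": 3, "user_c": 3, "user_d": 12, "user_e": 120}
hist = activity_histogram(mentioners)
print(hist)  # {'1': 1, '2-4': 2, '5-9': 0, '10-49': 1, '50+': 1}
```

Only the bucket totals would then be plotted, so the usernames never leave the analysis step.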

Top replied to and top repliers

In addition to the mention counts above, tracking the most replied-to accounts is also important, because replies indicate engagement. Again, as expected, the accounts that received the most replies belong to political figureheads, organisations, and people well known in the public domain.

These are the top replied to accounts:

    user              number_of_replies
 0  cyrilramaphosa                74793
 1  myanc                         71759
 2  effsouthafrica                64435
 3  hermanmashaba                 63329
 4  julius_s_malema               56567
 5  our_da                        42929
 6  action4sa                     30225
 7  jsteenhuisen                  18387
 8  iecsouthafrica                14265
 9  mbalulafikile                 12506
10  niehaus_carl                  11023
11  presidencyza                  10617
12  zungulavuyo                    8926
13  helenzille                     8214
14  mzwanelemanyi                  7842
15  governmentza                   5436
16  sapoliceservice                5297
17  bantuholomisa                  5277
18  a_c_d_p                        4949

These are the top replier patterns:

Note: To preserve the privacy of people who are not in the public domain, their data were aggregated and shown in a graph. Where people are well known in the public domain, we will share later some of the users that were classified as likely to spread misinformation.

[Figure: aggregated top replier patterns. Source: https://dsfsi.github.io/zaelection2021/]

BOT analytics

To separate human from BOT interaction in the data, we used Botometer to extract analytics on the users. We checked 1000 accounts for each type of top user explored above, so that the results can be reproduced and understood from the collected data by anyone following our methodological approach. As mentioned before, it is important to distinguish human posts from BOT posts: we want to understand the narrative associated with people and to identify trends in the data produced by humans rather than by BOTs, since BOTs can post at a higher frequency than humans.
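
Selecting 1000 accounts per type of top user can be sketched as taking the head of each ranked list and removing duplicates. This is a minimal illustration under assumptions: the function name, the category keys and the sample rankings are hypothetical, and the ranked lists are assumed to be ordered from most to least active.

```python
def accounts_to_check(top_by_category, per_category=1000):
    """Take the top `per_category` users of each category and deduplicate them.

    The order of first appearance is kept, so the same input always
    yields the same list of accounts.
    """
    selected = []
    seen = set()
    for ranked_users in top_by_category.values():
        for user in ranked_users[:per_category]:
            if user not in seen:
                seen.add(user)
                selected.append(user)
    return selected

# Hypothetical rankings, truncated to two accounts per category for the example.
top = {
    "mentioned": ["myanc", "our_da", "enca"],
    "replied_to": ["cyrilramaphosa", "myanc", "our_da"],
}
to_check = accounts_to_check(top, per_category=2)
print(to_check)  # ['myanc', 'our_da', 'cyrilramaphosa']
```

Because an account can rank highly in more than one category, the number of unique accounts to check can be smaller than the number of categories multiplied by the per-category limit.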

Check a single account by screen name

Here we show the BOT scores returned for individual accounts, each looked up by screen name. We include this output so that the reader can contextualise what a BOT score means and how it differs from the equivalent score for a human.

User already in dict:  effsouthafrica
{'cap': {'english': 0.7967206940193189, 'universal': 0.8474634546636374},
 'display_scores': {'english': {'astroturf': 1.2,
   'fake_follower': 2.0,
   'financial': 0.0,
   'other': 3.3,
   'overall': 3.3,
   'self_declared': 1.4,
   'spammer': 0.4,
   'username': 'effsouthafrica'},
  'universal': {'astroturf': 1.2,
   'fake_follower': 1.8,
   'financial': 0.0,
   'other': 4.4,
   'overall': 4.4,
   'self_declared': 2.2,
   'spammer': 0.4}},
 'raw_scores': {'english': {'astroturf': 0.23,
   'fake_follower': 0.4,
   'financial': 0.01,
   'other': 0.66,
   'overall': 0.66,
   'self_declared': 0.27,
   'spammer': 0.07},
  'universal': {'astroturf': 0.24,
   'fake_follower': 0.35,
   'financial': 0.0,
   'other': 0.87,
   'overall': 0.87,
   'self_declared': 0.45,
   'spammer': 0.08}},
 'user': {'majority_lang': 'en',
  'user_data': {'id_str': '932163222', 'screen_name': 'EFFSouthAfrica'}}}
User already in dict:  myanc
{'cap': {'english': 0.7717813288270262, 'universal': 0.7334998320027682},
 'display_scores': {'english': {'astroturf': 1.4,
   'fake_follower': 0.4,
   'financial': 0.0,
   'other': 2.2,
   'overall': 1.4,
   'self_declared': 0.0,
   'spammer': 0.0,
   'username': 'myanc'},
  'universal': {'astroturf': 1.3,
   'fake_follower': 0.8,
   'financial': 0.0,
   'other': 1.5,
   'overall': 1.1,
   'self_declared': 0.0,
   'spammer': 0.0}},
 'raw_scores': {'english': {'astroturf': 0.29,
   'fake_follower': 0.08,
   'financial': 0.0,
   'other': 0.43,
   'overall': 0.29,
   'self_declared': 0.0,
   'spammer': 0.0},
  'universal': {'astroturf': 0.26,
   'fake_follower': 0.17,
   'financial': 0.0,
   'other': 0.3,
   'overall': 0.22,
   'self_declared': 0.0,
   'spammer': 0.0}},
 'user': {'majority_lang': 'en',
  'user_data': {'id_str': '18759465', 'screen_name': 'MYANC'}}}
User already in dict:  our_da
{'cap': {'english': 0.7971037475964349, 'universal': 0.7982287282125168},
 'display_scores': {'english': {'astroturf': 2.6,
   'fake_follower': 1.6,
   'financial': 0.0,
   'other': 2.7,
   'overall': 2.7,
   'self_declared': 0.0,
   'spammer': 0.0,
   'username': 'our_da'},
  'universal': {'astroturf': 1.6,
   'fake_follower': 1.0,
   'financial': 0.0,
   'other': 2.7,
   'overall': 1.8,
   'self_declared': 0.0,
   'spammer': 0.0}},
 'raw_scores': {'english': {'astroturf': 0.52,
   'fake_follower': 0.32,
   'financial': 0.01,
   'other': 0.54,
   'overall': 0.54,
   'self_declared': 0.01,
   'spammer': 0.0},
  'universal': {'astroturf': 0.32,
   'fake_follower': 0.2,
   'financial': 0.01,
   'other': 0.53,
   'overall': 0.35,
   'self_declared': 0.0,
   'spammer': 0.0}},
 'user': {'majority_lang': 'en',
  'user_data': {'id_str': '23594033', 'screen_name': 'Our_DA'}}}
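
To read a result like the ones above, it helps to pull out a few headline numbers. In the output shown, each `display_scores` value is the matching `raw_scores` value rescaled from a 0 to 1 range to a 0 to 5 range, and `cap` is Botometer's complete automation probability. The helper below is a sketch: `summarise_result` is a hypothetical name, and choosing the English scores when `majority_lang` is `'en'` and the universal scores otherwise is an assumption made for this example.

```python
def summarise_result(result):
    """Pick the English or universal scores based on the account's majority language."""
    lang = result["user"]["majority_lang"]
    model = "english" if lang == "en" else "universal"
    return {
        "screen_name": result["user"]["user_data"]["screen_name"],
        "model": model,
        "cap": result["cap"][model],
        "overall_display": result["display_scores"][model]["overall"],
        "overall_raw": result["raw_scores"][model]["overall"],
    }

# Trimmed copy of the effsouthafrica result shown above (only the fields used here).
example = {
    "cap": {"english": 0.7967206940193189, "universal": 0.8474634546636374},
    "display_scores": {"english": {"overall": 3.3}, "universal": {"overall": 4.4}},
    "raw_scores": {"english": {"overall": 0.66}, "universal": {"overall": 0.87}},
    "user": {
        "majority_lang": "en",
        "user_data": {"id_str": "932163222", "screen_name": "EFFSouthAfrica"},
    },
}
summary = summarise_result(example)
print(summary)
```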

How we got the Botometer scores

To collect all the BOT scores, we used the Botometer v4 API. Given the 1000 accounts we checked for each category, the total number of scores we saved is:

Number in botometer cache:  2651
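
The "User already in dict" and "Number in botometer cache" messages suggest that scores were kept in a dictionary so that no account is requested twice. A minimal sketch of such a cache follows. It is an assumption about the approach, not the code used for this report: `check_with_cache` and `fake_fetch` are hypothetical names, and in real use `fetch` could be the `check_account` method of a client from the `botometer` Python package.

```python
def check_with_cache(screen_name, cache, fetch):
    """Return the cached score for a user, calling `fetch` only on a cache miss."""
    key = screen_name.lower()
    if key in cache:
        print("User already in dict: ", key)
        return cache[key]
    cache[key] = fetch(screen_name)
    return cache[key]

# Stand-in for the real API call, so the example runs offline.
calls = []

def fake_fetch(screen_name):
    calls.append(screen_name)
    return {"user": {"user_data": {"screen_name": screen_name}}}

cache = {}
check_with_cache("MYANC", cache, fake_fetch)   # cache miss: fetch is called
check_with_cache("myanc", cache, fake_fetch)   # cache hit: nothing is fetched
check_with_cache("Our_DA", cache, fake_fetch)  # cache miss: fetch is called
print("Number in botometer cache: ", len(cache))
```

Keying the cache on the lower-cased screen name avoids fetching the same account twice when it appears with different capitalisation.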

Resources and References

  • Moodley, V Marivate. Topic Modelling of News Articles for Two Consecutive Elections in South Africa. 2019 6th International Conference on Soft Computing & Machine Intelligence (ISCMI). [Paper URL][Preprint]