Understanding our Top Users
In this post we begin an exploratory analysis of the user information in the social media microblog post data we have mined. We also queried a BOT evaluation service to score each account on whether it is more likely to be a social media BOT or a person. This exploration matters because we want to understand the trends that people are actually spreading about the elections, not the information spread by BOTs. **Note:** *This post was updated on 1 March 2022 to take into account the removal of retweeted content.*
Checking our top users
There are many different types of users we want to compare. Comparing them gives us insight into how rapidly posts spread, whether people are implicated by a post, whether content is being pushed by specific users, or whether it belongs to a trending category. To do so, we track the following variables:
- Top users who send Twitter posts,
- Top users who mention others,
- Top users who are mentioned by others,
- Top users who reply to others,
- Top users who are replied to.
These variables help us identify anomalies and detect outliers in the data that merit further exploration. Each variable goes through a series of data processing steps so that trends associated with anomalies become apparent in the analysis.
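The five variables listed above can be tallied in a single pass over the posts. The following is a minimal sketch rather than our actual pipeline: the record layout (`author`, `mentions`, `reply_to`), the handles, and the function name `top_user_counts` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical post records; field names and handles are made up for illustration.
posts = [
    {"author": "user_a", "mentions": ["party_x", "user_b"], "reply_to": None},
    {"author": "user_b", "mentions": ["party_x"], "reply_to": "party_x"},
    {"author": "user_a", "mentions": [], "reply_to": "user_b"},
    {"author": "user_c", "mentions": ["party_x"], "reply_to": "party_x"},
]

CATEGORIES = ("posters", "mentioners", "mentioned", "repliers", "replied_to")


def top_user_counts(posts):
    """Return one Counter per category, keyed by user handle."""
    counts = {name: Counter() for name in CATEGORIES}
    for post in posts:
        author = post["author"]
        counts["posters"][author] += 1
        # Each mention counts once for the author and once for the target.
        for handle in post["mentions"]:
            counts["mentioners"][author] += 1
            counts["mentioned"][handle] += 1
        # A reply counts once for the author and once for the account replied to.
        if post["reply_to"] is not None:
            counts["repliers"][author] += 1
            counts["replied_to"][post["reply_to"]] += 1
    return counts


counts = top_user_counts(posts)
for category, counter in counts.items():
    # most_common(n) gives the n highest-frequency handles in each category.
    print(category, counter.most_common(3))
```

Keeping one `Counter` per category makes it straightforward to pull the top N accounts later with `most_common(N)`.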
Top mentioned and mentioners
Top mentioned users are the users mentioned most frequently across posts. The table below shows the top mentioned users. As expected, they are directly connected to the election itself, including members of political parties, politicians themselves, and state institutions.
These are the top mentioner patterns:
Note: To preserve the privacy of users who are not known in the public domain and do not form part of public organisations, we aggregated their information in a graph. For people who are well known in the public domain, we will later share some of the users that were classified as likely to spread misinformation.
Top replied to and top repliers
In addition to the metric above, tracking the top replied-to accounts is also important, as replies indicate factors such as engagement. Again, as expected, the accounts that received the most replies were associated with political figureheads, organisations, and people well known in the public domain. The table below shows the top replied-to accounts.
These are the top replied-to accounts:
These are the top replier patterns:
Note: To preserve the privacy of people who are not part of the public domain, their data were aggregated and shown in the graph to protect their identity. For people who are well known in the public domain, we will later share some of the users that were classified as likely to spread misinformation.
BOT analytics
To distinguish human from BOT interaction in the data, we used Botometer to extract analytics on the users, checking 1000 accounts for each type of top user we explored. We chose this approach so that the results can be reproduced and understood from the data we collected by anyone following our methodology. As mentioned before, separating human posts from BOT posts is vital for understanding the narrative associated with people and for identifying trends in human data rather than BOT data, since BOTs can post at a much higher frequency than humans.
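The per-category sampling described above can be sketched as follows. This is an illustration under stated assumptions, not the code we ran: `collect_scores` and `stub_scorer` are hypothetical names, the handles and scores are made up, and the scoring callable is injected so the example runs without network access. In practice that callable could wrap a call to a BOT scoring client. We also assume here that some lookups can fail, in which case no score is saved for that account.

```python
def collect_scores(top_accounts, score_account, limit=1000):
    """Score up to `limit` accounts per category and keep the scores that succeed.

    top_accounts: dict mapping category name -> list of handles, most frequent first.
    score_account: callable taking a handle and returning a score; may raise on failure.
    """
    cache = {}  # handle -> score, or None if the lookup failed
    saved = {}
    for category, handles in top_accounts.items():
        saved[category] = {}
        for handle in handles[:limit]:
            # An account can appear in several categories; score it only once.
            if handle not in cache:
                try:
                    cache[handle] = score_account(handle)
                except Exception:
                    # Assumption: a failed lookup simply means no score is saved.
                    cache[handle] = None
            if cache[handle] is not None:
                saved[category][handle] = cache[handle]
    return saved


# Made-up scores standing in for a real scoring service.
fake_scores = {"party_x": 0.4, "user_b": 3.1, "user_a": 4.6}


def stub_scorer(handle):
    # Raises KeyError for unknown handles, imitating a failed lookup.
    return fake_scores[handle]


top_accounts = {
    "mentioned": ["party_x", "user_b", "gone_user"],
    "repliers": ["user_b", "user_a"],
}

saved = collect_scores(top_accounts, stub_scorer)
totals = {category: len(scores) for category, scores in saved.items()}
print(totals)
```

Caching by handle avoids spending a second request on an account that appears in more than one category, while still counting its score in each category where it appears.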
How we got the Botometer scores
To collect the BOT scores, we used the Botometer v4 API. The total number of scores we saved, given the 1000 accounts we checked for each category, is: