Extracting the user mention graph - What can we learn?
(Draft - early version) So far, much of our analysis for this project has focused on users and the content they create. Engaging with what users say helps us understand what type of information they are spreading. In this post, we explore the connections between users, that is, how users interact with each other. To do so, we build a network graph in which two users are connected when one mentions the other. We perform this analysis on the data we have already collected and use the examples from previous posts as a template. Finally, we unpack what insights this type of association analysis can offer, so that you too can gain value from applying it to text you have mined.
- Prior Posts
- Building the mention graph
- Constructing the full graph
- Extracting relationships within the graph #1 - Node2Vec
- Resources and References
Building the mention graph
To construct the graph mentioned in the introduction, we need to unpack who mentions whom across our full dataset. We go through each Twitter post and extract who was mentioned (on Twitter, the “@user”), then build up a network of connections between the associated users. It is like constructing a mind map of a very large conversation to see who spoke about whom, so that insights can be gained from the associations. If we refer back to the example where we used the IEC and NMF topic modelling (in a previous blog post, Automated extraction of discussed topics using topic modelling), we observe that there are associations in the data.
Following the same logic as the IEC example from the NMF topic modelling post, where we already have context, let us consider the following example of a conversation:
By the way, since the election wasn't weird enough already, the voter's roll has shrunk by 1.1 million people since 2019 from the recent IEC data I've seen.
— Dawie Scholtz (@DawieScholtz) September 3, 2021
Voters registered in 2019: 26.7 mn
Voters registered in 2021: 25.6 mn
@IECSouthAfrica what's up with that?
In this conversation we see that DawieScholtz mentions the IEC. For the purpose of this example, we reveal the names of both users, as we classify them as "organisations" or "public persons". Dawie, in this case, analyses elections and shares his insights with the public. We will expand on this later in the blog.
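As a rough illustration of the extraction step, the sketch below pulls the mentioned handles out of the example post with a regular expression and turns them into (author, mentioned user) pairs. The pattern and the helper name `extract_mentions` are our own assumptions for this example; a simple pattern like this one would also match the "@" inside an email address, so real data may need more careful handling.

```python
import re

# Twitter handles are 1-15 characters: letters, digits or underscores.
# Assumption: this simple pattern is good enough for illustration.
MENTION_PATTERN = re.compile(r"@([A-Za-z0-9_]{1,15})")


def extract_mentions(text):
    """Return the lower-cased handles mentioned in a post, in order of appearance."""
    return [handle.lower() for handle in MENTION_PATTERN.findall(text)]


# The part of the example post that contains the mention, as a plain string.
tweet_text = (
    "Voters registered in 2019: 26.7 mn\n"
    "Voters registered in 2021: 25.6 mn\n"
    "@IECSouthAfrica what's up with that?"
)
author = "DawieScholtz".lower()

# Each (author, mentioned user) pair becomes one directed edge.
edges = [(author, mentioned) for mentioned in extract_mentions(tweet_text)]
print(edges)
```

Lower-casing the handles keeps "IECSouthAfrica" and "iecsouthafrica" from ending up as two separate nodes.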
Given the data from this conversation, we can construct a very simple graph, given the association between the users.
To create this graph (a directed one), we added a connection between DawieScholtz and IECSouthAfrica because of the aforementioned Twitter post. The illustration below is a representation of this directed graph, to show the association between the two users.
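import networkx as nx

# Directed graph: each edge points from the author to the account they mention.
G_example = nx.DiGraph()
G_example.add_edge('dawiescholtz','iecsouthafrica')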
Note that the connection between the two has an arrow from Dawie to the IEC (showing that it is directed). If we take a similar approach, but with more data and at scale, we can apply this technique to our large Twitter dataset from previous posts. The only difference is that we need to ensure that we respect the privacy of our users as we construct these graphs. We therefore work to preserve the privacy of people whom we do not deem to be public persons. Let's now talk about who is in our public person list.
Let's say the ANC account (myanc) mentions IECSouthAfrica and CyrilRamaphosa in the same tweet. Here is how we would add the new edges.
G_example.add_edge('myanc','iecsouthafrica')
G_example.add_edge('myanc','cyrilramaphosa')
Here is how the graph changes now with added context.
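For readers following along without the illustration, a small sketch like the one below rebuilds the example graph from the edges above and prints it as text. The in-degree and out-degree summaries are our own addition for illustration.

```python
import networkx as nx

# Rebuild the small example graph from the three mentions discussed above.
G_example = nx.DiGraph()
G_example.add_edge('dawiescholtz', 'iecsouthafrica')
G_example.add_edge('myanc', 'iecsouthafrica')
G_example.add_edge('myanc', 'cyrilramaphosa')

# Every edge points from the author to the mentioned account.
for source, target in G_example.edges():
    print(f"{source} -> {target}")

# In-degree: how many distinct accounts mention a user.
# Out-degree: how many distinct accounts a user mentions.
print(dict(G_example.in_degree()))
print(dict(G_example.out_degree()))
```

In this toy graph, iecsouthafrica is mentioned by two different accounts, which is the kind of signal that becomes more informative once the full dataset is loaded.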
print("List of public persons, total so far :", len(public_list))
public_list
Hashing usernames
For users who are not on the public list, we hash their usernames. A hash converts an input string of arbitrary size (e.g. a username) into another string of fixed size. We do this to hide the original usernames of users whom we treat as private persons.
This allows us to look at the graph with you (the reader) and lets you navigate it, without exposing individuals who still have some expectation of privacy.
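A minimal sketch of this step is shown below, assuming SHA-256 from Python's `hashlib`. The salt, the 12-character truncation, the helper name `anonymise` and the contents of `public_list` here are illustrative assumptions, and not necessarily what was used to build the graph in this post.

```python
import hashlib

# Hypothetical stand-in for the list of public persons and organisations.
public_list = {"iecsouthafrica", "dawiescholtz", "myanc", "cyrilramaphosa"}

# Assumed secret value: mixing it in makes it harder to recover a
# username simply by hashing a list of guesses.
SALT = "replace-with-a-secret-value"


def anonymise(username, public_accounts=public_list, salt=SALT):
    """Return public usernames unchanged and a short hash for everyone else."""
    username = username.lower()
    if username in public_accounts:
        return username
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    # Truncated for readability; the full SHA-256 digest has 64 hex characters.
    return digest[:12]


print(anonymise("IECSouthAfrica"))     # public account, shown as-is
print(anonymise("some_private_user"))  # private account, shown as a hash
```

Because the same username always produces the same hash, a private user still appears as a single consistent node in the graph, just without a recognisable name.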
Number of nodes in the full graph.
Number of edges (mentions)
A snapshot of the graph is shown below. You can see many users (dots), which are the nodes. If you click on the interactive graph (lower), you will also be able to see the edges (created when one user mentions another). Apart from the public persons, usernames are not revealed (you will just see a hashed username).
In the graph, we weight each edge (a connection between two nodes) by how many times one user has mentioned the other. A weight of 5 on an edge means that the user has mentioned the other user 5 times (most likely in 5 different Twitter posts).
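One way to accumulate such weights, sketched here with made-up (author, mentioned user) pairs, is to add an edge the first time a mention is seen and to increment its `weight` attribute on every repeat. The account names below are hypothetical.

```python
import networkx as nx

# Hypothetical (author, mentioned user) pairs, one entry per mention in a post.
mentions = [
    ("user_a", "iecsouthafrica"),
    ("user_a", "iecsouthafrica"),
    ("user_b", "iecsouthafrica"),
    ("user_a", "user_b"),
]

G = nx.DiGraph()
for author, mentioned in mentions:
    if G.has_edge(author, mentioned):
        # Repeat mention: bump the existing edge weight.
        G[author][mentioned]["weight"] += 1
    else:
        # First mention: create the edge with weight 1.
        G.add_edge(author, mentioned, weight=1)

for author, mentioned, weight in G.edges(data="weight"):
    print(f"{author} -> {mentioned}: {weight}")
```

Since the graph is directed, user_a mentioning user_b does not add any weight in the opposite direction.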
Reduced number of nodes
Reduced number of edges
Extracting relationships within the graph #1 - Node2Vec
What is Node2Vec?
- Beginner:
- Advanced: Node2vec explained graphically
- Advanced: How node2vec works — and what it can do that word2vec can’t
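As a rough intuition for what these articles describe: node2vec generates many biased random walks over the graph and then trains a word2vec-style model on those walks, treating each walk like a sentence of node names. The sketch below implements only the walk-generation part in plain Python on a toy undirected graph. The graph, the function name `biased_walk` and the parameter values are assumptions for illustration, and edge weights are ignored for simplicity.

```python
import random

# Toy undirected graph as an adjacency dict (hypothetical accounts).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Generate one node2vec-style second-order random walk of `length` nodes."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # return to the node we came from
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away from it
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


rng = random.Random(42)  # fixed seed so the walks are reproducible
walks = [
    biased_walk(graph, node, length=6, p=1.0, q=0.5, rng=rng)
    for node in sorted(graph)
    for _ in range(3)
]
print(walks[0])
```

A small `q` pushes walks outward to explore more distant parts of the graph, while a small `p` makes walks more likely to step back and stay local. The embeddings themselves would come from training a word2vec model on `walks`, which is not shown here.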
Let's check who is similar to our_da according to the node2vec algorithm.
sample_user = 'our_da'
print("Top 10 similar to: ", sample_user)
print("===================================")
get_most_similar(sample_user, node2vec_model)
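The `get_most_similar` helper is not defined in this post. A common way to rank similar nodes once each node has an embedding vector is cosine similarity, and the sketch below shows that idea on tiny made-up vectors. The function `rank_similar`, the account names and the vectors are all hypothetical, and this is not necessarily how `get_most_similar` works.

```python
import math

# Hypothetical 3-dimensional embeddings; real node2vec vectors
# typically have many more dimensions.
embeddings = {
    "party_account": [0.9, 0.1, 0.0],
    "party_leader": [0.8, 0.2, 0.1],
    "party_spokesperson": [0.7, 0.3, 0.0],
    "unrelated_account": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_similar(user, vectors, top_n=10):
    """Return up to `top_n` (other user, similarity) pairs, most similar first."""
    scores = [
        (other, cosine(vectors[user], vec))
        for other, vec in vectors.items()
        if other != user
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


for name, score in rank_similar("party_account", embeddings):
    print(f"{name}: {score:.3f}")
```

Accounts that tend to appear in the same walks end up with vectors pointing in similar directions, which is why they rank highly for one another.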
What we note immediately is that the different parties clearly manage their social media campaigns differently. For the DA and the EFF, the top similar users include party leaders in different roles. For the ANC, we did not find even one. Let us switch this around and look at a leader's account instead.
CyrilRamaphosa Similar Users
We see above a mix of other African leaders as well as ANC leaders. This is fascinating and requires further scrutiny.
Enter PartyOfAction
PartyOfAction is anti-vaccination and has been spreading vaccine misinformation during South Africa's battle with COVID-19. See the information about the Infodemic, and see our references below, which will give you more insight into the party.
PartyOfAction has spread so much misinformation that its account has been suspended a number of times by Twitter. A recent example (highlighted by the party leader):
10 November 2021
They have suspended the mighty @PartyOfAction account. 😄 Truth hurts neh? pic.twitter.com/x9wxYwqzCW
— Billy Nyaku 🇿🇦 (@billynyaku) November 10, 2021
We delved into their similar top users.
The users above are leaders or anonymous accounts that also spread anti-vaccination messages.