Cyxtera Blog: Brainspace

Written by David Aitel and Zeshan Aziz | Acknowledgment: William Grayson Hilliard - Cyxtera Data Engineering on April 24, 2019

How Bots Shape Our Politics

Social Media Sentiment Analysis of Brexit


Introduction

The Cyxtera research team was searching for a use case to test the continuous multimodal learning (CMML) insights featured in Brainspace, our investigative analytics platform. CMML is a predictive modeling capability that speeds discovery of insights.

We opted to dig into social media sentiment around Brexit, a topic of interest worldwide and one that appears frequently in online conversations. We focused our research on followers of Theresa May’s Twitter account. At the time of writing, Theresa May was the Prime Minister of the U.K. and a central figure in Brexit. As a control subject, we selected U.S. President Donald Trump’s Twitter account because of the amount of traffic it generates. We sought to compare and contrast highly ideological political viewpoints involving both Brexit and Donald Trump to look for bot versus human activity.

We ingested Twitter data into Brainspace and utilized the Communication Analysis tool and the Brainspace API. Additionally, we performed manual external OSINT research to obtain more metrics about automated account activity. After training the predictive models, we were able to analyze thousands of Twitter accounts in seconds. We now have portable models that can be imported into other politically-oriented Twitter datasets for similar research.

Executive Summary

Within a dataset of Theresa May’s Twitter account, we discovered bots that amplify Pro-Brexit themes and bots that amplify Pro-Trump themes. On average, bots make up about half of highly political discussion (both Pro-Trump and/or Pro-Brexit) on Twitter. We expected to find a much higher bot percentage in accounts discussing both ideologies but did not. This may be evidence of how those that manage political bots are evolving their tactics and procedures to avoid detection. This could lead to a potential classification methodology in that humans tend to hold multiple ideologies instead of a singular focus.

In our research dataset, we determined that bots accounted for 50% of the most Pro-Trump accounts and 67% of the most Pro-Brexit accounts; of the accounts that scored highly on both, 38% were bots.

Bot Analysis of Top Pro-subject Accounts
Graph 1: Bots amplifying Pro-Trump and Pro-Brexit Ideology

Introduction

As expected, many of the previously useful automated heuristics for bot detection have become less relevant as bot creators have updated their operational security (OPSEC) and tactics, techniques, and procedures (TTPs) to account for common detection mechanisms. Bots that display obvious tells, like posting at odd hours, creating extremely high volumes of interaction, or retweeting without ever posting original content, are routinely suspended by Twitter’s internal team. This has produced a Darwinian effect on inauthentic accounts, making them harder to discern using automated statistics.

What remains, especially for researchers without access to internal Twitter telemetry (such as log-in IP addresses or associated metadata such as email addresses or phone numbers), is analysis and classification of accounts based purely on behavior and content. Aside from building some historical record, bots are useless to their creators unless they influence a conversation. As a result, visually exposing accounts that communicate in a largely broadcast manner has traditionally been the most useful classification used by Brainspace and other data analytics tools (also known as “star-pattern” analysis; see Figure 1).

Star Pattern Bot Analysis
Figure 1: A bot account with a star-pattern. It only Tweets outward, never receives Tweets
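
The star-pattern heuristic can be approximated outside a graph UI by comparing how many distinct accounts a sender tweets at with how many accounts tweet back. The sketch below is only an illustration and not Brainspace's implementation; the edge-list input, the function name `find_star_accounts`, and the `min_targets` cutoff are assumptions made for the example.

```python
from collections import defaultdict

def find_star_accounts(edges, min_targets=3):
    """Return senders that tweet at many accounts but are never tweeted at.

    edges: iterable of (sender, recipient) pairs, one per directed Tweet.
    min_targets: hypothetical cutoff for "many" distinct recipients.
    """
    targets = defaultdict(set)   # sender -> distinct accounts it tweets at
    inbound = defaultdict(set)   # account -> distinct accounts tweeting at it
    for sender, recipient in edges:
        targets[sender].add(recipient)
        inbound[recipient].add(sender)
    return sorted(
        account
        for account, out in targets.items()
        if len(out) >= min_targets and not inbound.get(account)
    )

# Made-up example: @hub broadcasts to three accounts and receives nothing.
edges = [
    ("@hub", "@a"), ("@hub", "@b"), ("@hub", "@c"),
    ("@a", "@b"), ("@b", "@a"),
]
star_accounts = find_star_accounts(edges)
```

A match on this pattern is a lead for manual review, not proof of automation; broadcast-style human accounts can look the same.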

However, content-based sentiment analysis can also prove useful. Many sentiments are unusual for humans to hold in conjunction with one another, simply because humans have a limited set of interests about which they Tweet. An early example of this was discussed by Cyxtera analysts in research pertaining to seemingly Pro-Trump bots posting heavily about the U.S. leaving the Syrian war.

Dataset Composition

We ingested 4.8 million tweets posted by followers of Theresa May’s Twitter account and isolated her most recent 100,000 followers. Of those, 77,000 accounts had content and 33,000 accounts were dormant. “Peripheral” accounts were also included in the data. These are accounts that are referenced by another account but have no Tweets of their own in the dataset. For example:

  • @AccountA is tagged in a tweet by @AccountB
  • @AccountB is an account included in the 100,000 followers scraped
  • @AccountA would be included as a peripheral account because there aren't any @AccountA tweets in the dataset
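
The peripheral-account rule in the example above amounts to a set difference: accounts that are mentioned minus accounts that authored Tweets. The following is a minimal sketch under assumed inputs; the `author` and `mentions` field names are hypothetical and do not reflect the actual ingestion schema.

```python
def peripheral_accounts(tweets):
    """Accounts that are mentioned in the data but authored no Tweets in it.

    tweets: iterable of dicts with hypothetical keys "author" and "mentions".
    """
    authors = set()
    mentioned = set()
    for tweet in tweets:
        authors.add(tweet["author"])
        mentioned.update(tweet.get("mentions", []))
    return mentioned - authors

tweets = [
    {"author": "@AccountB", "mentions": ["@AccountA"]},
    {"author": "@AccountB", "mentions": []},
]
peripheral = peripheral_accounts(tweets)
```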

The data we analyzed consisted of the Tweet bodies, the sender, recipient, the date and time, possible threading information, and in some cases the unshortened URLs in the Tweet.

Judging Botometer Results

As part of this work, we gathered scores from the Botometer service (formerly BotOrNot) and judged them for accuracy. Botometer assigns each account a score, called the complete automation probability (CAP), which can be read as its confidence that the account is a bot (CAP > 50% indicates a bot). Botometer agreed with our manual analysis 78% of the time. When it didn’t, it invariably classified an account as human when it was in fact a bot. We conclude that while services like Botometer help a layperson be more cognizant of social media verification, they do not keep up with the evolving tactics, techniques, and procedures (TTPs) of bot makers, who are under constant pressure to outsmart these services. Subject matter experts and social media/OSINT analysts still need to manually inspect samples of accounts to determine whether an account is controlled by a bot or a human.
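
The comparison described above can be sketched as follows: threshold each CAP score at 50%, compare it with the manual label, and collect the accounts where the score said "human" but the analyst said "bot". This is an illustrative sketch only; the function name, the dictionary inputs, and the sample values are assumptions, and it does not call the Botometer API.

```python
def compare_to_manual(cap_scores, manual_is_bot, threshold=0.5):
    """Compare CAP-based bot calls with manual labels.

    cap_scores: dict of account -> CAP in [0, 1].
    manual_is_bot: dict of account -> bool from manual review.
    Returns (agreement rate, accounts scored as human that analysts labeled bot).
    """
    shared = sorted(set(cap_scores) & set(manual_is_bot))
    if not shared:
        raise ValueError("no accounts have both a CAP score and a manual label")
    agree = 0
    missed_bots = []
    for account in shared:
        predicted_bot = cap_scores[account] > threshold
        if predicted_bot == manual_is_bot[account]:
            agree += 1
        elif manual_is_bot[account]:
            missed_bots.append(account)
    return agree / len(shared), missed_bots

# Made-up illustration data, not figures from the study.
cap_scores = {"@u1": 0.91, "@u2": 0.12, "@u3": 0.30, "@u4": 0.08}
manual_is_bot = {"@u1": True, "@u2": False, "@u3": True, "@u4": False}
agreement, missed_bots = compare_to_manual(cap_scores, manual_is_bot)
```

Tracking the missed bots separately matters here because the disagreements reported above all ran in one direction.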

Top Pro-Brexit and Pro-Trump Accounts

We found 21,000 accounts in our dataset that discussed Brexit a significant number of times (at least 5 Tweets). We then looked at the top 30 most Pro-Brexit accounts in the set, as classified by Brainspace’s machine learning tool and a small Python script. Of those, 67% were bots. (Figure 2)

We found 10,000 accounts in our dataset that discussed Trump a significant number of times (at least 5 Tweets). We then looked at the top 30 most Pro-Trump accounts and found that 50% of them were bots. (Figure 3)
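
The selection steps above (keep accounts with at least 5 on-topic Tweets, rank by classifier score, take the top 30, then count manually confirmed bots) can be sketched as below. This is not the authors' script; the record fields and the idea of a single per-account score are assumptions, since the source does not say how per-tweet scores were rolled up to accounts.

```python
def top_account_bot_share(accounts, min_tweets=5, top_n=30):
    """Bot share among the top_n highest-scoring accounts with enough Tweets.

    accounts: list of dicts with hypothetical keys "handle", "topic_tweets",
        "score" (classifier score) and "is_bot" (manual label).
    Returns (bot share, handles of the accounts that were reviewed).
    """
    eligible = [a for a in accounts if a["topic_tweets"] >= min_tweets]
    top = sorted(eligible, key=lambda a: a["score"], reverse=True)[:top_n]
    if not top:
        return 0.0, []
    bots = sum(1 for a in top if a["is_bot"])
    return bots / len(top), [a["handle"] for a in top]

# Made-up example with top_n=2 to keep it short.
accounts = [
    {"handle": "@a", "topic_tweets": 12, "score": 0.97, "is_bot": True},
    {"handle": "@b", "topic_tweets": 3, "score": 0.99, "is_bot": True},  # too few Tweets
    {"handle": "@c", "topic_tweets": 8, "score": 0.91, "is_bot": False},
    {"handle": "@d", "topic_tweets": 6, "score": 0.40, "is_bot": False},
]
share, top_handles = top_account_bot_share(accounts, top_n=2)
```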

Figure 2: The shape of the classifier's scores for Pro-Brexit tweets (queried with "brexit")
Figure 3: The shape of the classifier's scores for Pro-Trump tweets (queried with "trump")

Overlap Between Pro-Brexit and Pro-Trump Accounts

It is well documented that Russian Twitter accounts amplified both the Trump campaign and Brexit. But what are the implications when you find accounts promoting both at the same time?

We found an overlap between accounts that post Pro-Brexit Tweets and accounts that post Pro-Trump Tweets; some accounts in our data discussed both topics. Comparing the top 500 Pro-Brexit accounts with the top 500 Pro-Trump accounts, we found 29 accounts in both lists. Of those, 38% were suspicious and exhibited bot-like or automated activity.

We also note that most accounts (62%) in this sample weren’t bots and appeared to reflect normal human activity.
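
The overlap analysis is a set intersection of two ranked lists followed by a bot count over the shared accounts. The sketch below illustrates that idea under assumed inputs (two lists of handles ordered by classifier score and a dictionary of manual labels); the names are hypothetical and the sample data is made up.

```python
def overlap_bot_share(ranked_a, ranked_b, bot_labels, top_n=500):
    """Accounts in the top_n of both rankings, plus the bot share among them.

    ranked_a, ranked_b: handles ordered from highest to lowest classifier score.
    bot_labels: dict of handle -> bool from manual review (hypothetical input).
    """
    both = sorted(set(ranked_a[:top_n]) & set(ranked_b[:top_n]))
    if not both:
        return both, 0.0
    bots = sum(1 for handle in both if bot_labels.get(handle, False))
    return both, bots / len(both)

# Made-up example with top_n=3 to keep it short.
pro_brexit = ["@a", "@b", "@c", "@d"]
pro_trump = ["@c", "@x", "@a", "@y"]
labels = {"@a": True, "@c": False}
both, share = overlap_bot_share(pro_brexit, pro_trump, labels, top_n=3)
```

The same function applies to the other pairings discussed later by swapping in the relevant rankings and list depth.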

Pro-Brexit and Pro-Trump
Graph 2: Sentiment that is Pro-Brexit and Pro-Trump

Top Anti-Trump Accounts and Top Anti-Brexit Accounts

We found 2,014 accounts that posted Anti-Trump content a significant number of times (at least 5 Tweets). We then reviewed the top 30 most Anti-Trump accounts in the dataset (as classified by Brainspace’s built-in machine learning tool). Of these, 17% were bots. (Figure 4)

We found 4,130 accounts that discussed Brexit in opposition to it. Of those, we determined that 17% of the top 30 most Anti-Brexit accounts were bots. (Figure 5)

We believe it is coincidental that both groups showed 17% bot-related activity.

Figure 4: The shape of the classifier's scores for Anti-Trump tweets (queried with "trump")
Figure 5: The shape of the classifier’s scores for Anti-Brexit tweets (queried with "brexit")

Overlap Between Anti-Trump and Anti-Brexit Accounts

Looking at the top 1,035 Anti-Brexit and Anti-Trump accounts respectively, we found 30 accounts in both lists. Of those, our manual analysis determined that 23% were bots. Compared to the Pro-Trump and Pro-Brexit data, accounts in this subset seem to be more authentic, although there is a slight increase in bot load for accounts that combine the ideologies.

Graph 3: Anti-Brexit and Anti-Trump

Overlap Between Pro-Brexit and Anti-Trump Accounts

Looking at the top 1,250 Pro-Brexit and Anti-Trump accounts respectively, we found 30 accounts in both lists. At the intersection of these two groups, 20% of accounts showed automated behaviors. This is lower than the rate for Anti-Trump + Anti-Brexit accounts (23%) and for Pro-Trump + Pro-Brexit accounts (38%).

Graph 4: Pro-Brexit and Anti-Trump

Overlap Between Anti-Brexit and Pro-Trump Accounts

Looking at the top 1,430 Anti-Brexit and Pro-Trump accounts respectively, we found 30 accounts in both lists. For accounts that are both Pro-Trump and Anti-Brexit, 10% display automated behaviors.

Graph 5: Anti-Brexit and Pro-Trump

Conclusion

Our research has shown that a large percentage of highly ideological Pro-Brexit + Pro-Trump Tweets were bot-related. This result wasn’t a surprise, as prior research has concluded the same. On the other hand, bots are pushing out unexpected combinations of ideologies (e.g., Pro-Brexit + Anti-Trump). We are currently unsure why a bot network would push competing ideologies, and the question merits additional research.

Appendix I: Identifying Bots with Our Classifier using the Brainspace GUI

How the Pro-Brexit Classifier works:

Figure 6: Our Pro-Brexit classifier as shown on the Brainspace UI. There are 110 tweets manually tagged. We applied training to the entire dataset so it would score every tweet (4.8M in total).
Figure 7: This is a histogram separated by years and frequency of tweets. When we "turn on" the classifier and look at the top 70% to 100% confidence set, we get just over 90 thousand tweets.
Figure 8: The frequency table shows the top terms. The more purple a term is, the more likely it is anomalous and pushed by bots. We see Trump, #brexit, @nigel_farage, and other people associated with the alt-right in the UK and US mentioned in the table.

Finding Bots with Star-Pattern Analysis

Visualizing the graph, we can find potential bots by looking for star patterns. @Oluwastevens is an account our research team picked at random.

Figure 9: Located at the top right, @Oluwastevens is an account connected to both @realDonaldTrump and a cluster of UK accounts, many of them Pro-Brexit
Figure 10: The top terms tweeted by the account; they are all hashtags

This account belongs to a Nigerian user who claims to live in America. The account likes and retweets Trump. It also advertises a social media platform that pays users to post inauthentic activity on other social media sites, clearly violating the terms of service of Facebook and Twitter. So, what about Brexit? The account often tweets directly to the Telegraph and to Nigel Farage (see Figure 11). By the time our analysts reviewed the account manually, these tweets to the Telegraph and Nigel Farage had been deleted, indicating an attempt by the account to cover its tracks once tweets have been posted for a set amount of time.

Figure 11: The top terms tweeted by @Oluwastevens
Figure 12: The account bio says the user is from Los Angeles, CA. It links to a social media site for getting paid to post inauthentic content and also has Pro-Trump hashtags.

Mapping the Top Brexit Bots

Back on the dashboard, we can see the top accounts exhibiting bot-like behavior and the accounts they tweet at:

Figure 13: In the left column are the top accounts tweeted "at". On the right are the accounts with the most tweets to the people on the left

We picked an account at random from the dashboard list: @president_the

Figure 14: Days after we ingested the data, Twitter marked this account as suspicious, therefore validating our analysis.
Figure 15: This account masquerades as an official U.S. Government account in Africa, even linking to a blog site in an attempt to appear more legitimate.

These accounts mostly retweet U.S. government verified accounts such as USAID, Department of State, The White House, FBI, and various U.S. embassies and missions in Africa. Strangely they also retweet the Ministry of Foreign Affairs Russia (Figure 16), which indicates they are not official accounts of the U.S. government. In Brainspace we can select the top bot accounts and visually see relationships. We selected 11 accounts from the dashboard list. We can see how bots are connected based on Tweet targets. (Figure 17)
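
Connecting suspected bots by their Tweet targets can be approximated by linking any two selected accounts that tweet at the same handle. The sketch below is an illustration under assumed inputs, not a description of how Brainspace builds its graph; the function name and the edge-list format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def shared_target_links(edges, accounts):
    """Link selected accounts that tweet at the same target.

    edges: iterable of (sender, target) pairs.
    accounts: set of handles to include (for example, suspected bots).
    Returns dict of (account_1, account_2) -> sorted list of shared targets.
    """
    targets = defaultdict(set)
    for sender, target in edges:
        if sender in accounts:
            targets[sender].add(target)
    links = {}
    for a, b in combinations(sorted(targets), 2):
        shared = targets[a] & targets[b]
        if shared:
            links[(a, b)] = sorted(shared)
    return links

# Made-up example: @bot1 and @bot2 share one target; @other is not selected.
edges = [
    ("@bot1", "@target1"), ("@bot1", "@target2"),
    ("@bot2", "@target1"),
    ("@bot3", "@target3"),
    ("@other", "@target1"),
]
links = shared_target_links(edges, {"@bot1", "@bot2", "@bot3"})
```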


Figure 16: Here the account retweeted Russia's Ministry of Foreign Affairs alongside USAID
Figure 17: Suspected Brexit bot accounts and their network

These bots frequently tweet/retweet to Donald Trump and Jacob Rees-Mogg. Previous research has shown that people in the Brexit movement, like Jacob Rees-Mogg, have been amplified by bots. Statistics below:

Figure 18: Jacob Rees-Mogg and the bots that tweet at him the most

These bots tweet/retweet Donald Trump:

Figure 19: Bot tweets directed at Donald Trump

The Top “Remain” Bots

On the opposite side of the spectrum, “remain” or anti-Brexit sentiment exhibits the following characteristics, as shown by Brainspace. These are the accounts in the bottom 1% to 30% of the Pro-Brexit classifier, that is, the least Pro-Brexit accounts in the dataset. Of note, at 23,000 tweets, the “remain” set is about one quarter the size of the Pro-Brexit set.
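
Selecting the Pro-Brexit set (top 70% to 100%) and the “remain” set (bottom 1% to 30%) can be thought of as a range filter over classifier scores. The sketch below assumes the percentages are score ranges rather than percentiles, which the source does not specify; the function name and the sample scores are made up for illustration.

```python
def select_band(scored_tweets, low, high):
    """Return tweets whose classifier score falls within [low, high].

    scored_tweets: iterable of (tweet_id, score) with score in [0, 1].
    """
    return [(tid, s) for tid, s in scored_tweets if low <= s <= high]

# Made-up scores; higher means more Pro-Brexit.
scored = [("t1", 0.95), ("t2", 0.72), ("t3", 0.50), ("t4", 0.22), ("t5", 0.005)]
pro_brexit = select_band(scored, 0.70, 1.00)
remain = select_band(scored, 0.01, 0.30)
```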

Figure 20: Histogram of Remain tweets

In 2019, the term “brexit” and tweets to @jeremycorbyn (the leader of the UK Labour Party) are among the most amplified and anomalous terms.

Figure 21: The frequency table of Remain accounts shows that brexit and @jeremycorbyn are among the most anomalous terms
Figure 22: List of top Remain accounts (suspected bots)

Represented here are the tweets and top terms broadcast toward Jeremy Corbyn (Figure 23). Most of these accounts oppose Brexit.

Figure 23: Graph of the main Remain accounts. Most of these accounts show star patterns, which are indicative of potential bots

These are the tweets and top terms broadcast to Theresa May. (Figures 24 & 25)

Figure 24: Accounts tweeting at Jeremy Corbyn, some using the same terms
Figure 25: Accounts tweeting at Theresa May, some using the same terms