Earlier today we captured seventy-one Twitter accounts that we classified into three groups: Durant’s Dullards (21), Team Pillow Forts (23), and TheShed (27). The first group is associated with RowdyPolitics[.]com and the second with CitJourno[.]org and Patribotics[.]blog, while the members of the last are unified by being stable, long-term personas who are often forced to replace accounts due to suspension.
Visually, the Fortress of Pillowtude is on the left, the cluster of red accounts belongs to the RowdyPolitics people, and The Shed’s frequent reincarnations leave them scattered around the perimeter on the right with fewer mentions.
This particular graphic has been filtered to remove 934 ‘CMP’ accounts – Celebrities, Media, and Politicians. The working theory behind this is that those accounts are ubiquitous, they cross group boundaries, and thus are not terribly useful for diagnostics. That thinly populated space in the middle holds less notable CMP figures that haven’t been removed yet … but more importantly, some of those are ‘weak ties’, as covered in Mark Granovetter’s classic 1973 social network analysis paper The Strength of Weak Ties.
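The CMP filtering step can be pictured as a simple pass over the mention edge list before it goes to the graphing tool. What follows is a minimal sketch, not the actual pipeline: the function name `drop_cmp_edges`, the `(source, target)` tuple layout, and all of the sample account names are assumptions made for illustration.

```python
def drop_cmp_edges(edges, cmp_accounts):
    """Remove every mention edge that touches a CMP account.

    edges: iterable of (source, target) screen-name pairs (assumed layout).
    cmp_accounts: iterable of screen names to exclude.
    """
    # Compare case-insensitively, since Twitter screen names are not case-sensitive.
    blocked = {name.lower() for name in cmp_accounts}
    return [
        (source, target)
        for source, target in edges
        if source.lower() not in blocked and target.lower() not in blocked
    ]


# Hypothetical sample data, not taken from the real capture.
edges = [
    ("alice", "bob"),
    ("alice", "BigNewsNetwork"),
    ("bob", "carol"),
    ("SenatorExample", "carol"),
]
cmp_accounts = ["bignewsnetwork", "senatorexample"]

filtered = drop_cmp_edges(edges, cmp_accounts)
print(filtered)
```

Dropping the edge rather than just the node keeps the remaining graph free of dangling references when it is imported elsewhere.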
Seeing The Whole Forest
While these groups lead in the creation, curation, and elevation of content, we want to be able to see them in the context of their operating environment. Graphs like this are useful for discerning structure and for identifying certain types of relationships, but those accounts generated over 262,000 mentions, and over 12,000 other accounts were mentioned twice or more. This is where we set aside Gephi and take up Elasticsearch.
Selecting the 2,739 accounts mentioned ten or more times strikes a good balance between capturing what is important and not overrunning our available resources. Recent performance tuning means our collection system can now handle forty-eight accounts in parallel. This run took 70 minutes to collect 6.48M tweets from the 2,235 accounts that were actually available, an average of 32 accounts/minute. The 504 missing accounts are mostly those from The Shed that have been banned.
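For readers who want to try the same kind of cut on their own data, here is a minimal sketch of a mention-count threshold, followed by the arithmetic behind the run figures quoted above. The helper `accounts_over_threshold` and the sample mention list are hypothetical; only the numbers 2,739, 2,235, and 70 come from this post.

```python
from collections import Counter


def accounts_over_threshold(mentions, minimum=10):
    """Return the accounts mentioned at least `minimum` times, sorted by name."""
    counts = Counter(name.lower() for name in mentions)
    return sorted(name for name, n in counts.items() if n >= minimum)


# Hypothetical mention stream: one entry per mention of an account.
mentions = ["alice"] * 12 + ["bob"] * 10 + ["carol"] * 9 + ["Alice"]
selected = accounts_over_threshold(mentions, minimum=10)
print(selected)

# Figures quoted in the post.
requested = 2739   # accounts mentioned ten or more times
collected = 2235   # accounts that were actually available
minutes = 70       # wall-clock duration of the run

missing = requested - collected
rate = collected / minutes
print(f"missing accounts: {missing}")
print(f"accounts per minute: {rate:.1f}")
```

Raising or lowering `minimum` is the knob that trades coverage against collection time.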
We want to see both overall features and group specifics, so JSON filters were created for each group. Applying them, we can see the top hashtags in use by each group over the last week. The fourth cloud is the overall set of hashtags employed by every account they mentioned. Here we begin to see what each group’s contribution to the overall conversation may have been.
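The exact filters are not reproduced in this post, but one plausible shape for a per-group filter is an Elasticsearch `terms` query wrapped in a `bool` filter. The sketch below builds such filters as plain Python dictionaries and prints one as JSON. The field name `user.screen_name`, the group labels, and the member names are all assumptions; the real index mapping may differ.

```python
import json


def group_filter(screen_names, field="user.screen_name"):
    """Build a query that matches tweets authored by any account in the group.

    `field` is an assumed field name and must be an exact-match (keyword) field.
    """
    return {"bool": {"filter": [{"terms": {field: sorted(screen_names)}}]}}


# Hypothetical group membership, for illustration only.
groups = {
    "example_group_a": ["account_two", "account_one"],
    "example_group_b": ["account_three"],
}

filters = {name: group_filter(members) for name, members in groups.items()}
print(json.dumps(filters["example_group_a"], indent=2))
```

Keeping one filter per group makes it easy to switch between a single group's view and the combined view without rewriting the query.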
6.5 million lines of text is a lot to digest. When we employ Kibana we have powerful ways to search, filter, and abstract content, coupled with fine-grained control of time. If we want to know the top hashtags over the prior seven days, limited to those that occurred with #MAGA or #Anonymous, and see how they compare by volume, that’s easily done.
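Behind the mouse clicks, a question like that maps onto an Elasticsearch query with a time range, a hashtag filter, and a `terms` aggregation. The sketch below only builds the request body; it does not contact a cluster. The field names `hashtags` and `created_at`, and the assumption that hashtags are stored lowercased in a keyword field, are ours and not taken from the actual index.

```python
import json


def top_hashtags_body(required_tags, days=7, size=20,
                      tag_field="hashtags", time_field="created_at"):
    """Build a search body that counts hashtags co-occurring with `required_tags`.

    tag_field and time_field are assumed names; tag_field must be a keyword field.
    """
    return {
        "size": 0,  # return only the aggregation, no individual tweets
        "query": {
            "bool": {
                "filter": [
                    # Date math: everything from `days` days ago until now.
                    {"range": {time_field: {"gte": f"now-{days}d"}}},
                    # Keep tweets carrying at least one of the required tags.
                    {"terms": {tag_field: list(required_tags)}},
                ]
            }
        },
        "aggs": {
            "top_hashtags": {"terms": {"field": tag_field, "size": size}}
        },
    }


body = top_hashtags_body(["maga", "anonymous"])
print(json.dumps(body, indent=2))
```

The required tags will themselves appear at the top of the resulting buckets, which gives a direct volume comparison between them and everything that travels alongside them.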
What if we want to see who first noticed the news of Elena Khusyaynova’s indictment on Friday? A few mouse clicks and we have the data from when the story broke. Long-term observations are just as smooth – if we set the system up to spool content, it’ll just continuously capture the accounts that we decide are interesting.
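A "who saw it first" question amounts to a text match sorted by timestamp, oldest first. Here is a minimal sketch of such a request body, again without contacting a cluster. The field names `text` and `created_at` and the `since` value are assumptions chosen for illustration.

```python
import json


def earliest_mentions_body(phrase, since, size=10,
                           text_field="text", time_field="created_at"):
    """Build a search body returning the oldest tweets that contain `phrase`.

    text_field and time_field are assumed names for the tweet body and timestamp.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match_phrase": {text_field: phrase}}],
                "filter": [{"range": {time_field: {"gte": since}}}],
            }
        },
        # Ascending order puts the first account to mention the phrase at the top.
        "sort": [{time_field: {"order": "asc"}}],
    }


# Hypothetical window: anything in the past seven days.
body = earliest_mentions_body("Khusyaynova", since="now-7d")
print(json.dumps(body, indent=2))
```

Narrowing `since` to just before the story broke keeps older, unrelated uses of the same name out of the results.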
We are just getting started with the Kibana interface to Elasticsearch, using it as an advanced text search engine, and doing some simple infographics in the spirit of descriptive statistics. There are complex, powerful tools out there, such as Timesketch and Wazuh, that are built on the Elasticsearch foundation. If we find just the right person, we may start branching in that direction.