This week we had a chance to work with an analyst who is new to our environment. The conversation revealed some things we find pedestrian that are exciting to a new person, so we’re going to detail them.
Many people use Google’s Alerts, but far fewer are familiar with the service Talkwalker offers. This company offers social media observation tools and their free alerts service seems to be a way to gather cognitive excess, to learn what things might matter to actual humans. These alerts arrive as email, or as an RSS feed, which is a very valuable format.
Google Reader used to be a good feed reader, but it was canceled some years ago. Alternatives today include Feedly and Inoreader. The first is considered the best for day to day reading activity, while Inoreader gets high marks for archival and automation. The paid version, just $49 per year, will comfortably handle hundreds of feeds, including the RSS output from the above mentioned Talkwalker.
Talkwalker Alerts never sleep, Inoreader provides all sorts of automation, but how does one preserve some specific aspect of the overall take? We like Hunch.ly for faithful capture. This $129 tool is a Chrome extension that faithfully saves every page visited, it offers ‘selectors’, text strings that are standing queries in an ‘investigation’, which can be exported as a single zip file, which another user can then import. That is an amazingly powerful capability for small groups, who are otherwise typically trying to synchronize with an incomplete, error filled manual process.
Alerting, feed tracking, and content preservation are important, but the Hunch.ly investigation is the right quantum of information for an individual or a small group. Larger bodies of information where linkages matter are best handled with Maltego Community Edition, which is free. There are transforms (queries) that will pull information from a Hunch.ly case, but the volume of information returned exceeds the CE version’s twelve item limit.
Maltego Classic is $1,000 with a $499 annual maintenance fee. This is well worth the cost for serious investigation work, particularly when there is a need to live share data among multiple analysts.
Costs Of Doing Business
We are extremely fond of FOSS tools, but there are some specialized tasks where it simply makes no sense to try to “roll your own”. This $1,200 kit of tools is a force multiplier for any investigator, dramatically enhancing accuracy and productivity.
One of the perennial problems in this field is the antiquated notion of jurisdiction, as well as increasing pressure on Westphalian Sovereignty. JP and I touched on this during our November 5th appearance on The View Up Here. The topic is complex and visual, so this post offers some images to back up the audio there.
Regional Internet Registries
The top level administrative domains for the network layer of the internet are the five Regional Internet Registries. These entities were originally responsible for blocks of 32 bit IPv4 addresses and 16 bit Autonomous System numbers. Later we added 128 bit IPv6 addresses and 32 bit Autonomous System numbers as the original numbers were being exhausted.
When you plug your home firewall into your cable modem it receives an IP address from your service provider and a default route. That outside IP is globally unique, like a phone number, and the default route is where any non-local traffic is sent.
Did you ever stop to wonder where your cable modem provider gets their internet service? The answer is that there is no ‘default route’ for the world, they connect at various exchange points, and they share traffic there. The ‘default route’ for the internet is a dynamic set of not quite 700,000 blocks of IP addresses, known as prefixes, which originate from 59,000 Autonomous Systems.
The Autonomous System can be though of as being similar to an telephone system country code. It indicates from a high level where a specific IP address prefix is located. The prefix can be thought of as an area code or city code, it’s a more specific location within the give Autonomous System.
There isn’t a neat global map for this stuff, but if you’re trying to make a picture, imagine a large bunch of grapes. The ones on the outside of the bunch are the hosting companies and smaller ISPs, who only touch a couple neighbors. The ones in the middle of the bunch touch many neighbors and are similar in position to the big global data carriers.
Domain Name Service
Once a new ISP has circuits from two or more upstream providers they can apply for an Autonomous System number and ask for IP prefixes. Those prefixes used to come straight from the RIRs, but any more you have to be a large provider to do that. Most are issued to smaller service providers by the large ones, but the net effect is the same.
Having addresses is just a start, the next step is finding interesting things to do. This requires the internet’s phone book – the Domain Name System. This is how we map names, like netwarsystem.com, to an IP address, like 184.108.40.206. There is also a reverse DNS domain that is meant to associate IP addresses with names. If you try to check that IP I just mentioned it’ll fail, which is a bit funny, as that’s not us, that’s kremlin[.]ru.
Domain Name Registrars & Root DNS Servers
How do you get a DNS name to use in the first place? Generally speaking, you have to pay a Registrar a fee for your domain name, there is some configuration done regarding your Start Of Authority, which is a fancy way of saying which name servers are responsible for your domain, then this is pushed to the DNS Root Servers.
There are nominally thirteen root servers. That doesn’t mean thirteen computers, it means there are twelve different organizations manage them (Verisign handles two), and their addresses are ‘anycast’, which means they originate from multiple locations, while the actual systems themselves are hidden from direct access. This is sort of a CDN for DNS data, and it exists due to endless attacks that are directed at these systems.
Verisign’s two systems are in datacenters on every continent and have over a hundred staff involved in their ongoing operation.
Layers Of Protection
And then things start to get fuzzy, because people who are in conflict will protect both their servers and their access.
Our web server is behind the Cloudflare Content Distribution Network. There are other CDNs out there and they exist to accelerate content as well as protect origin servers from attack. We like this service because it keeps our actual systems secret. This would be one component of that Adversary Resistant Hosting that we don’t otherwise discuss here.
When accessing the internet it is wise to conceal one’s point of origin if there may be someone looking back. This is Adversary Resistant Networking, which is done with Virtual Private Networks, the Tor anonymizing network, misattribution services like Ntrepid, and other methods that require some degree of skill to operate.
Peeling The Onion
Once you understand how all the pieces fit together there are still complexity and temporal issues.
Networked machines can generate enormous amounts of data. We previously used Splunk and recently shifted to Elasticsearch, both of which are capable of handling tens of millions of datapoints per day, even on the limited hardware we have available to us. Both systems permit time slicing of data as well as many other ways to abstract and summarize.
Data visualization can permit one to see relationships that are impenetrable to a manual examination. We use Paterva‘s Maltego for some of this sort of work and we reach for Gephi when there are larger volumes to handle.
Some of the most potent tools in our arsenal are RiskIQ and Farsight. These services collect passive DNS resolution data, showing bindings between names and IP addresses when they were active. RiskIQ collects time series domain name registration data. We can examine SSL certificates, trackers from various services, and many other aspects of hosting in order to accurately attribute activity.
The world benefits greatly from citizen journalists who dig into all sorts of things. This is less than helpful when it comes to complex infrastructure problems. Some specific issues that have arisen:
People who are not well versed in the technologies used can manage to sound credible to the layman. There have been numerous instances where conspiracy theorists have made comical attribution errors, in particular geolocation data for IPs being used to assert correlations where none exists.
There is a temporal component that arises when facing any opponent with even a bit of tradecraft and freely available tools don’t typically address that, so would-be investigators are left piecing things together, often without all of the necessary information.
Free access to quality tools like Maltego and RiskIQ are both intentionally limited. RiskIQ in particular cases problems in the hands of the uninitiated – a domain hosted on a Cloudflare IP will have thousands of fellows, but the free system will only show a handful. There have been many instances of people making inferences based on that limited data that have no connection to objective reality.
We do not have a y’all come policy in this area, we specifically seek out those who have the requisite skills to do proper analysis, who know when they are out on a limb. When we do find such an individual who has a legitimate question, we can bring a great deal of analytical power to bear.
That specific scenario happened today, which triggered the authoring of this article. We may never be able to make the details public, but an important thing happened earlier, and the world is hopefully a little safer for it.
This is one of nearly two dozen sites pushing fringe right wing views that are all associated with Mark Edward Baker, as detailed in this story by McClatchy. When I first heard of this the initial thought was that something so slippery could be a foreign influence operation. I came to a much different conclusion, but it took many hours of digging.
Here are the full list of domains involved:
The physical plant for this is a circus – 434 unique IP addresses and they all seem to be tied to the operation.
A simpler exam of the SOA for each domain yielded a deeper clue in the form of the [email protected] address used for registration. It’s connected to another cluster of domains.
We are not going to revisit the merry chase this guy provides – fire up hunch.ly and go at it. He uses the alias Mark Bentley, be on the lookout for LOP, which is short of League of Power, and his wife Jennifer is a signatory on some of the paperwork. He has at least half a dozen PO boxes in Florida and a similar setup in Reno, Nevada, which appears to he his origin. Once I was sure I had a real name, I was more interested in what his business model is and if there were any foreign ties.
If you poke around for League of Power you’ll find complaints about his $27 scam work from home DVD. This guy’s ideology is getting other people’s money and providing little to nothing in return. This is pretty common to see on both sides of the aisle – grifters working the earnest, but naive masses. This guy clearly focuses on the right – different skills are needed to run a similar game against the left.
Here is the one image that more or less sums up what he is doing:
You would have to be in the business of examining attribution resistant hosting to notice this, but it was like a flashing neon sign for me. Domains that don’t want to be traced typically have no email handling at all. This guy’s business model is list building, which he’ll use for maybe some political stuff, but it will be an ongoing bulk mail target after the election.
What about foreign influence being behind this? The article mentions that conservativezone[.]com[.]com had been used. That’s a spearphishing move. It resolves to these geniuses:
What is AS206349? A dinky autonomous system in Bulgaria with a history of IP address hijacking. Baker got some service from these guys, but it wasn’t anything he wanted to receive. The may have noticed he was gathering lists of the easily duped and decided this would be a good phishing hole for them.
People who are way into the bipolar politics in the U.S. tend to judge things as either on their side, or the opposition. There are nuances on that spectrum that get ignored, foreign influence, weird cyberstalker types, and just outright fraud, like we’re seeing here. Don’t jump to sticking a red or blue label on something until you’ve had a good chance to inspect it and conjure up some alternate theories.
Let’s take a look at a curious thing in RiskIQ – the October 22nd registration for the 0hour1[.]com domain.
The nameservers at westcall[.]ru are part of a large ISP in St. Petersburg. The registrant’s trail is an intentional mess, but lets see what we can find on Brian Durant. It’s helpful to know his birth name – Fiore DiPietrantonio.
Durant came to my attention because his crew are bothering a CVE researcher I know, and threatening a man in Brooklyn that they mistakenly identified as them. There is a decent Threadreader on Durant from @trebillion that provided me a starting point.
Achtung! If you choose to pursue this, you must turn your OSINT tradecraft up to eleven. The name change is an attempt to leave behind a shady past, there is intentional deception at work prior to the politics, and don’t chase after a pretty face on a largely empty persona.
As a sign of how much of a hassle this backtrail was, take a look at my hunch.ly case for it. And this is the second one – the first draft had so much crud in it I found it easier to just start over and revisit the stuff I confirmed.
About that empty female persona … here is where it starts.
Which of these do you think are legitimate?
If you picked only #1 and #2, which a faint urge to check #3, give yourself a gold star.
I got distracted writing this and spent half an hour playing with the RiskIQ response to the Maltego Domain Analysis transform. GoDaddy is a terrible swamp that typically reveals nothing, so I collapsed it to a point to better see the other things. The westcall[.]ru nameserver mistake only showed in the registration, they caught it before it started turning up in passive DNS. So what are these three other things we see?
220.127.116.11 is part of The Swamp – the tiny allocations in 192.0.0.0/7 that were handed out for free back in the eighties. RiskIQ shows over 9,000 names for it, Maltego finds 552, ARIN says it’s a point to point link from Limestone Networks to a customer. It’s been passed around for thirty years or more and I don’t think it tells us much. The reverse lookup is the last one someone carded to enter and those don’t seem to matter much on shared servers. Make a mental note to come back, only if all else fails.
18.104.22.168 is a European address, you can tell just by the leading octet, and with a little poking we find it’s in AS44901 – BelCloud Hosting, but it’s listed with over 10,000 other names.
What about 22.214.171.124? Looking at the times it was active we find that it had the system to itself, running what looks to be WordPress under cPanel.
This is turning into a common theme – people trying to do their own hosting and then giving up after a couple days. The DNS tab provided another interesting clue that I just noticed as I was drafting this article.
And what’s going on here? Every time I look at this thing I find another eastern European/Russian link.
So … I thought this was going to be a declaration and instead it’s a problem statement – there is more digging to do here. But there is one piece of digging that is done – the ID of Brian Durant’s associate who threatened me when I first started probing is within easy reach. Check the DM that @NetwarSystem received on October 17th and the menacing voicemail from October 27th.
This “very reasonable dude” is @MistaBRONCO, which has been a stable alias for him for at least six years. It was on his Flickr account, where a close inspection of cat photos turned up this gem. Handsome boy, isn’t he?
So our internet tough guy who cleverly pressed *69 before he left me a message on a Google Voice number that hasn’t a phone attached to it in five years is laid low by the ol’ surname & multiple phone numbers on a pet’s nametag. Amateur hour here – stable name, personal details all over the place. This is all recorded in hunch.ly and the good bits of the YouTube channel that put the voice with this cat are safe, too.
So I’ve got a voice threat, another guy that this genius misidentified as Ca1m has received threats, the source lives on Long Island, the target is in Brooklyn. Since these are both covered by the NYC FBI field office and I was pointedly told to “come correct”, I had the following exchange with the counter-terrorism SA in that office, whom I’ve known since Occupy days.
That’s as correct as I can play it in the wee hours of a Friday right before a midterm in which Russian influence is certainly still a problem.
We have a special treat today, an actual investigation and preservation exercise on a Russian company we have had our eye on for the last year. First, to understand why today is their lucky day, peruse this New York Times article:
Here’s the lede from the article for those who refuse to click through:
On the same day Facebook announced that it had carried out its biggest purge yet of American accounts peddling disinformation, the company quietly made another revelation: It had removed 66 accounts, pages and apps linked to Russian firms that build facial recognition software for the Russian government.
This is a good step, but it isn’t the only housecleaning needed.
If you open the current Maltego client you will see the Transform Hub, a listing of all service providers you can integrate. This is what you see this morning in the left hand column if you open the application on a 4k monitor. Note what is first and what is last.
I am the admin for the Maltego Group on LinkedIn. This is a screen shot I collected this morning. Implicit in this is a violation of Facebook terms of service, the source is SocialLinks CEO Alexandr Aleexev.
The company’s YouTube page has a variety of other demonstrations that indicate similar practices on other social media platforms. I snagged a screen shot from the video, which shows what they were doing a year ago.
This alone is enough to set off alarm bells given the current climate. We know a bit more about this company, because I assumed their appearance on the Transform Hub meant they were a quality provider. I spent $130 for a month of service in November of 2017, setting off six weeks of curious encounters, both with them, and disinterested counter-intelligence investigators on three continents.
Needing to capture some Instagram content for a civil case, I grew weary of chasing threads after I got acquainted with the players, so I went looking for an automation solution that would capture their social network. I had an Instagram account to use from one of the parties to the case, but I assumed it would be burned the minute any evidence from it showed in a filing.
What I received just plain didn’t work. Not with Instagram, not with my personal LinkedIn network, not with Github, which should be fairly open.
This went on for a while, at first with me sharing details of DNS resolution issues for their servers and other stuff that seemed like normal troubleshooting. When I went to LinkedIn and made contact with CEO Alexandr Alexeev, I was immediately suspicious given his Moscow location.
We had an initial conversation on LinkedIn, below is an example of how things started. We agreed to meet on Wire, and there I was asked about VPNs, provisioning proxies in the U.S., and other means to circumvent access restrictions. I played along, thinking that some counter-intel attention might be forthcoming.
I felt there was plenty of reason for attention on this situation, but after a month of asking for attention, no fewer than three governments who should have been interested all struck out without so much as a single swing.
United States – target of Russian election interference.
United Kingdom – target of Russia referendum interference.
South Africa – home of Maltego maker Paterva.
I’m not saying that I typed this all into IC3, pressed ‘send’, and crossed my fingers. I had conversations with two people in the U.S. IC community, I talked to James Patrick at Liberty STRATCOM, and I have an associate who has a personal relationship with a brigadier in Hawks, South Africa’s Directorate for Priority Crime Investigation.
There are two ways to interpret this – either the system has already noticed and is working the problem, or the system is utterly overloaded and we are on our own. The fact that I got no response from any of the three countries made me think it was the latter. Having covered my own position, I sat back to see what happened next.
Further Cause To Act
Two months ago I met up with someone who had experiences with Social Links that were very similar to mine. I had already come to believe I was dealing with someone trying to “handle” me, albeit clumsily, and probably seeking advice from someone else in the process. Talking to this other party hardened that impression. That’s all I am going to say about this.
As a last ditch effort, I reached out to an FBI Special Agent I know last week. They had the weekend to think it over, Monday to act, and the NYT article is the last piece of stimulus I need. I just removed Alexeev from the LinkedIn Maltego Group and later today I am going to remove his posts, after announcing why he was removed.
Preservation & Publication
The particulars of what happened are important so I set out to preserve the details. Here are the steps I took:
Started Hunch.ly, preserved forty pages, both public and private. This application preserves content in an admissible fashion.
Preserved the @_SocialLinks_ Twitter account for further analysis using both Maltego and our internal tools.
Wrapped up the exported Hunch.ly casefile, the Maltego graphs, and the videos, transferred it to a person who will hand carry it to a former U.S. Attorney we employ for certain sorts of touchy situations.
That last bit is just for my protection, given that our President and Attorney General both appear to be compromised by the Russian government. I will not be subject to a raid that deprives me of exculpatory evidence, followed by a politically motivated prosecution.
In the absence of any response from the agencies that ought to handle problems like this, I guess we are in charge for the moment. If you’ve had similar troubles with this company, I advise you to back up everything, transfer it to legal counsel so that it is protected from seizure by attorney client privilege, then reach out to your local FBI field office.
I have shared the particulars with a couple reporters, too. If you have similar experiences and wouldn’t mind being interviewed on this, feel free to contact me.
Manchild scammer extraordinaire @JacobAWohl has been the center of a vortex of public attention for the last day or so, but it isn’t working out quite as he had envisioned. The scheme as revealed thus far has involved:
A front agency, Surefire Intelligence, with a claimed global footprint, based entirely on virtual office spaces.
A staff, several of whom had stock photos from A list celebrities, but only two of which had any contacts.
A second front agency, Atrium Private Intelligence, with a domain registered after Surefire’s, apparently existing to provide depth to the legend for the first agency.
Domains registered by proxy, but configured with a web site construction kit that included Wohl’s personal email in the SOA (Start Of Authority) for the domain.
Google Voice phone numbers, one of which Wohl validated with a real number belonging to his mother.
That being said, the failures can be summarized as “youthful enthusiasm”. Hertz will not rent to anyone under twenty five, our society’s firmest recognition that the neurological stuff that starts with the teen years doesn’t at high school graduation. Whatever role Jack Burkman and Jim Hoft played in this scheme, the execution was left in the hands of a lad that won’t be able to drink legally for another six weeks.
The primary reason you are reading this site (we think) is not for our slick production quality or our mastery of algorithms. You are here because we have been there. We try to keep what we do simple, rugged, and recoverable, because we are used to having too few resources, too many tasks, and not nearly enough time.
There has been a gap in the social network analysis world since @ladamic stopped offering her excellent class via Coursera. I received my copy of Complex Network Analysis in Python earlier today, devoured the first five chapters, and I am pleased to report this book is a quality alternative to the class.
The book presumes you have enough Python to load stuff with pip and some pre-existing motivation to explore networks. Some key things to know:
The features and performance considerations of four Python network analysis modules are explained in detail; invaluable for those who are trying to scale up their efforts.
The visualization package Gephi is introduced in a very accessible fashion and advice regarding how to move data between it and your Python scripts is clear and simple.
There are a variety of real world examples included
The thing that is missing is Twitter – which is mentioned just once in passing on page 63. This seems like a good opening – the Complex Network Analysis Github repository is going to contain the Twitter related code we produce as we work through the examples in this text.
The initial wave of NPCs were taken out by Twitter, about 1,500 all together according to reporting. A small number lingered, somehow slipping past the filter, and now they are regrouping. A tweet regarding the initial outbreak collected several new likes, among them this group of five:
And their 740 closest friends are all pretty homogenous:
A fast serial collection of these 740 accounts they follow was undertaken. Their mentions reveal some accounts that are early adopters, survivors of the first purge, or otherwise influential. 735 of them came through collection, the missing were empty, locked, or suspended.
These accounts made 469,889 mentions of others. First we’ll look at 285,102 mentions of normal accounts, then we’ll see 184,787 mentions of Celebrity, Media, and Political accounts. Given that there are 67,000 accounts involved in this mention map, we’ll employ some methods we don’t normally use. This layout was done with OpenOrd rather than Force Atlas 2 and the name size denotes volume of mentions produced.
The large names here are based on Eigenvector centrality – they are likely popular members of the group, or in the case of Yotsublast, a popular content creator aligned with NPC messaging.
Usually we filter CMP – Celebrities, Media, and Politicians. These accounts are actively seeking attention so it is interesting to see who they reach out to in order to achieve that in these 184,787 mentions to about 18,500 others.
Attempting different splines with Eigenvector centrality leads to, after several tries, this mess.
Beyond the core at the bottom, Kathy Griffin, Alexandra Ocasio-Cortez, and Hillary Clinton are singled out for attention.
Mentions are directed but the best way to handle them at this scale seems to be treating them as undirected and using the KBrace filter. This is a manageable set of accounts to examine and the groupings make intuitive sense.
The 742 accounts were placed into our “slow cooker” but only 397 were visible. It isn’t clear why 350 were missed, but Twitter’s quality filter may have something to do with that.
Unlike the group of accounts in yesterday’s A Deadpool Of Bots, this wake/sleep cycle over the last ten days looks like humans making their own accounts to join in the fun. Given a good sized sample of tweets, an average adult will only consistently be inactive from 0200 – 0500, so those empty three hour windows, except for the first day, are a pretty convincing sign.
Their hashtag usage is entirely what one would expect.
Given the tight timeframe it was interesting to look at an area graph of their daily hashtag use for the last ten days.
As a society we have barely begun to adapt to automated propaganda, and now we’re facing a human wave playing at being automation. This is an interesting, helpful thing, as it provides a perfect contrast to what we explored in A Deadpool Of Bots.
Yesterday in chat someone pointed out a small set of accounts that followed this Dirty Dozen of known harassment artists.
The accounts all had the format <first name><last name><two digits>. We extracted two dozen from the followers of these twelve accounts, then ran their followers and found a total of 112 accessible accounts that had this same format. Suspecting a botnet, we created a mention map to see how they have been spending their time.
This image is immediately telling for those used to examining mention maps. There are too many communities (denoted by different colors) present for such a small group and the isolated or nearly isolated islands just don’t look like a human interaction pattern.
Adjusting from outbound degree to Eigenvector centrality, it was immediately clear what the focus of this group of accounts was. The next level of zoom in the names revealed two other cryptocurency news sites and a leader in the field as being the targets of the these accounts.
Thinking that 112 was a small number, we extracted their 7,029 unique followers’ IDs and got their names. Nearly 600 matched the first/last/digits format, but there were other similarities as well. We placed all 7,029 in our “slow cooker”, set to capture all of their tweets.
We were expecting to find signs of a botnet, but it appears the entire set of accounts are part of the same effort. The 6,897 we managed to collect were all created in the same twelve week period. The gap between creation times and the steady production of about eighty accounts per day seems to indicate a small hand run operation in a country with cheap labor.
The network is transparently focused on cryptocurrency over the long haul. Adjusting the timeframe to the last thirty days moved the keywords around a bit but the word cloud is largely the same. The clue to how these accounts got into the mix is there in the lower right quadrant – #followme and #followback indicate a willingness to engage whomever from the world at large, in addition to their siblings.
Why we have pursued this so far when it looks like just a cryptocurrency botnet is due to this clue.
Here are five bad actors with two other accounts created right in between them time wise. This is the most striking example, but there are others like it. And understand the HUMINT that triggered this – a group of people who do nothing but take down racist hate talkers all day feel besieged by a group that manages to immediately regenerate after losing an account.
Our Working Theory
What we think we are seeing here is a pool of low end crypto pump and dump accounts that were either created for or later sold to a ringleader in this radicalized right wing group.
Now that we have roughly 7,000 of them on record, we have to decide what to do. This is just such a blatant example of automation that Twitter might immediately take it down if they notice. The 6.5 million tweets we collected are utterly dull – the prize here is the user profile dataset. We’d need some mods to our software, but maybe we need to collect all the followers for this group of 7,000 and figure out what the actual boundaries of this botnet truly are.
This has been a tiresome encounter for those who make it their business to drive hate speech from Twitter, but this may be the light at the end of the tunnel. If one group is using pools of purchased accounts to put their foot soldiers back in play the minute they get suspended, others are doing this, too. No effort was made to conceal this one from even moderate analysis efforts. If we demonstrate this is a pattern and Twitter is forced to act, we may well find that a lot of the heat will go out of political discourse on that platform.
Earlier today we captured seventy one Twitter accounts that we classified into three groups. These are Durant’s Dullards (21), Team Pillow Forts (23), and TheShed (27). The first are associated with RowdyPolitics[.]com, the second group are associated with CitJourno[.]org and Patribotics[.]blog, while the last group are unified by being stable, long term personas who are often forced to replace accounts due to suspension.
Visually, the Fortress of Pillowtude is on the left, the cluster of red accounts are the RowdyPolitics people, and The Shed’s frequent reincarnations leave them scattered around the perimeter on the right with fewer mentions.
This particular graphic has been filtered to remove 934 ‘CMP’ accounts – Celebrities, Media, and Politicians. The working theory behind this is that those accounts are ubiquitous, they cross group boundaries, and thus are not terribly useful for diagnostics. That thinly populated space in the middle are less notable CMP figures that haven’t been removed yet … but more importantly, some of those are ‘weak ties’, as covered in Mark Granovetter‘s 1973 classic social network analysis paper The Strength of Weak Ties.
Seeing The Whole Forest
While these groups lead in the creation, curation, and elevation of content, we want to be able to see them in the context of their operating environment. Graphs like this are useful for discerning structure, for identifying certain types of relationships, but those accounts generated over 262,000 mentions and over 12,000 others were mentioned twice or more. This is where we set aside Gephi and take up Elasticsearch.
Selecting the 2,739 accounts mentioned ten or more times is a good balance between getting what is important and not overrunning out available resources. Recent performance tuning means our collection system can now handle forty eight accounts in parallel. This run took 70 minutes to collect 6.48M tweets from 2,235 accounts that were actually available, an average of 32 accounts/minute. The 504 missing accounts are mostly those from The Shed that have been banned.
We want to see both overall features as well as group specifics, so JSON filters were created for each group. Applying them, we can see the top hashtags in use by each group over the last week. The fourth cloud is the overall set of hashtags employed by every account they mentioned. Here we begin to see what each group’s contribution to the overall conversation may have been.
6.5 million lines of text is a lot to digest. When we employ Kibana we have powerful ways to search, filter, and abstract content, coupled with fine grained control of time. If we want to know the top hashtags over the prior seven days, limited to those that occurred with #MAGA or #Anonyous, and see how they compare volume wise, that’s easily done.
What if we want to see who first noticed the news of Elena Khusyaynova’s indictment on Friday? A few mouse clicks and we have the data from when the story broke. Long term observations are just as smooth – if we set the system up to spool content, it’ll just continuously capture the accounts that we decide are interesting.
We are just getting started with the Kibana interface to Elasticsearch, using it as an advanced text search engine, and doing some simple infographics in the spirit of descriptive statistics. There are complex, powerful tools out there, such as Timesketch and Wazuh, that are built on the Elasticsearch foundation. If we find just the right person, we may start branching in that direction.