Exploring Conversation Spaces

Earlier today we captured seventy-one Twitter accounts that we classified into three groups. These are Durant’s Dullards (21), Team Pillow Forts (23), and The Shed (27). The first group is associated with RowdyPolitics[.]com, the second is associated with CitJourno[.]org and Patribotics[.]blog, while the last group is unified by being stable, long-term personas who are often forced to replace accounts due to suspension.

Three Co-Traveler Groups

Visually, the Fortress of Pillowtude is on the left, the cluster of red accounts is the RowdyPolitics people, and The Shed’s frequent reincarnations leave them scattered around the perimeter on the right with fewer mentions.

This particular graphic has been filtered to remove 934 ‘CMP’ accounts – Celebrities, Media, and Politicians. The working theory behind this is that those accounts are ubiquitous, they cross group boundaries, and thus are not terribly useful for diagnostics. That thinly populated space in the middle holds less notable CMP figures that haven’t been removed yet … but more importantly, some of those are ‘weak ties’, as covered in Mark Granovetter’s 1973 classic social network analysis paper The Strength of Weak Ties.
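The CMP filtering step can be sketched in a few lines. This is only an illustration of the idea, not our actual import procedure: the function name, the tuple layout of the edges, and the sample account names are all hypothetical.

```python
def drop_cmp_edges(edges, cmp_accounts):
    """Remove every mention edge that touches a CMP account.

    edges: iterable of (source, target) screen-name pairs (assumed layout).
    cmp_accounts: iterable of screen names flagged as Celebrities, Media,
    or Politicians.
    """
    # Twitter screen names are case-insensitive, so compare in lower case.
    blocked = {name.lower() for name in cmp_accounts}
    return [
        (source, target)
        for source, target in edges
        if source.lower() not in blocked and target.lower() not in blocked
    ]


# Hypothetical sample data, for illustration only.
sample_edges = [
    ("persona_one", "persona_two"),
    ("persona_one", "BigNewsNetwork"),
    ("SenatorExample", "persona_two"),
]
sample_cmp = ["bignewsnetwork", "senatorexample"]

filtered_edges = drop_cmp_edges(sample_edges, sample_cmp)
```

Only the edge between the two ordinary personas survives, which is why the filtered graph shows group structure more clearly.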

Seeing The Whole Forest

While these groups lead in the creation, curation, and elevation of content, we want to be able to see them in the context of their operating environment. Graphs like this are useful for discerning structure and for identifying certain types of relationships, but those accounts generated over 262,000 mentions, and over 12,000 other accounts were mentioned twice or more. This is where we set aside Gephi and take up Elasticsearch.

Selecting the 2,739 accounts mentioned ten or more times is a good balance between getting what is important and not overrunning our available resources. Recent performance tuning means our collection system can now handle forty-eight accounts in parallel. This run took 70 minutes to collect 6.48M tweets from 2,235 accounts that were actually available, an average of 32 accounts/minute. The 504 missing accounts are mostly those from The Shed that have been banned.
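As a rough sketch of the selection step, the threshold can be applied with a simple counter, and the throughput figures above can be checked with basic arithmetic. The function name and the toy mention list are hypothetical; only the numbers in the last few lines come from the run described here.

```python
from collections import Counter


def accounts_over_threshold(mentions, minimum=10):
    """Return (account, count) pairs mentioned at least `minimum` times."""
    # Screen names are case-insensitive, so fold case before counting.
    counts = Counter(name.lower() for name in mentions)
    return [(name, n) for name, n in counts.most_common() if n >= minimum]


# Toy data standing in for the real list of mentioned accounts.
sample_mentions = ["alice"] * 12 + ["Bob"] * 10 + ["carol"] * 3 + ["bob"]
selected = accounts_over_threshold(sample_mentions, minimum=10)

# Sanity check on the figures reported for this run.
accounts_selected = 2739
accounts_available = 2235
minutes = 70
missing = accounts_selected - accounts_available  # accounts we could not reach
rate = accounts_available / minutes               # accounts per minute
```

Here `missing` works out to 504 and `rate` to just under 32 accounts per minute, matching the numbers quoted above.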

We want to see both overall features as well as group specifics, so JSON filters were created for each group. Applying them, we can see the top hashtags in use by each group over the last week. The fourth cloud is the overall set of hashtags employed by every account they mentioned. Here we begin to see what each group’s contribution to the overall conversation may have been.
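A per-group filter of this kind might look something like the following sketch, which builds an Elasticsearch query body as a plain Python dictionary. The field names (`user.screen_name`, `entities.hashtags.text`, `created_at`) assume tweets are indexed in the layout of Twitter's JSON and that the hashtag and screen-name fields are mapped as keywords; the account names are placeholders, and none of this is our production filter.

```python
import json


def top_hashtags_query(screen_names=None, size=25, days=7):
    """Build a query body returning the top hashtags for a set of accounts.

    With screen_names=None the query covers every account in the index.
    """
    filters = [{"range": {"created_at": {"gte": f"now-{days}d/d"}}}]
    if screen_names:
        # Restrict to the members of one group (assumed field name).
        filters.append({"terms": {"user.screen_name": sorted(screen_names)}})
    return {
        "size": 0,  # we only want the aggregation, not the matching tweets
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "top_hashtags": {
                "terms": {"field": "entities.hashtags.text", "size": size}
            }
        },
    }


# Placeholder group membership, for illustration only.
group_query = top_hashtags_query(["example_account_a", "example_account_b"])
overall_query = top_hashtags_query()

# The JSON text is what would be pasted into Kibana or sent to the search API.
group_query_json = json.dumps(group_query, indent=2)
```

Keeping one small filter per group and reusing the same aggregation is what makes the four clouds directly comparable.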

Durant's Dullards Top 25 Hashtags
Team Pillow Forts Top 25 Tags
The Shed Top 25 Hashtags
Top 25 Hashtags From All Accounts Mentioned

Temporal Matters

6.5 million lines of text is a lot to digest. When we employ Kibana we have powerful ways to search, filter, and abstract content, coupled with fine-grained control of time. If we want to know the top hashtags over the prior seven days, limited to those that occurred with #MAGA or #Anonymous, and see how they compare by volume, that’s easily done.
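Behind the Kibana clicks, a search like that one reduces to a filtered aggregation. The sketch below is one plausible shape for it, not the query Kibana actually generates: the field names are assumed, the hashtag field is assumed to be a keyword (so matching is case-sensitive unless a normalizer is configured), and `calendar_interval` assumes Elasticsearch 7.2 or later.

```python
def cooccurring_hashtags_query(anchor_tags, days=7, size=10):
    """Top hashtags in tweets that also carry at least one anchor tag."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    # Prior `days` days, rounded down to the start of the day.
                    {"range": {"created_at": {"gte": f"now-{days}d/d"}}},
                    # Tweet must contain any one of the anchor hashtags.
                    {"terms": {"entities.hashtags.text": list(anchor_tags)}},
                ]
            }
        },
        "aggs": {
            "top_hashtags": {
                "terms": {"field": "entities.hashtags.text", "size": size},
                # Daily buckets per hashtag, to compare volume over the week.
                "aggs": {
                    "per_day": {
                        "date_histogram": {
                            "field": "created_at",
                            "calendar_interval": "day",
                        }
                    }
                },
            }
        },
    }


weekly_query = cooccurring_hashtags_query(["MAGA", "Anonymous"])
```

The anchor tags themselves will appear at the top of the results, since every matching tweet contains at least one of them; the interesting part is what follows.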

Top Hashtags Prior Week

What if we want to see who first noticed the news of Elena Khusyaynova’s indictment on Friday? A few mouse clicks and we have the data from when the story broke. Long term observations are just as smooth – if we set the system up to spool content, it’ll just continuously capture the accounts that we decide are interesting.
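In query terms, "who noticed first" is a phrase match inside a time window, sorted oldest first. The following is a minimal sketch under the same assumptions as before (tweet text in a `text` field, timestamps in `created_at`, author in `user.screen_name`); the window in the example uses relative date math as a placeholder and would be replaced with the actual period of interest.

```python
def first_mentions_query(phrase, start, end, size=20):
    """Earliest tweets containing `phrase` between `start` and `end`."""
    return {
        "size": size,
        # Return only the fields needed to see who said what, and when.
        "_source": ["created_at", "user.screen_name", "text"],
        "query": {
            "bool": {
                "must": [{"match_phrase": {"text": phrase}}],
                "filter": [
                    {"range": {"created_at": {"gte": start, "lte": end}}}
                ],
            }
        },
        # Oldest first, so the earliest mentions lead the result list.
        "sort": [{"created_at": {"order": "asc"}}],
    }


# Placeholder window: the last three days up to now.
earliest_query = first_mentions_query("Khusyaynova", "now-3d/d", "now")
```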

Khusyaynova Indictment

Future Explorations

We are just getting started with the Kibana interface to Elasticsearch, using it as an advanced text search engine, and doing some simple infographics in the spirit of descriptive statistics.  There are complex, powerful tools out there, such as Timesketch and Wazuh, that are built on the Elasticsearch foundation. If we find just the right person, we may start branching in that direction.

Tools Of The Trade

Articles here are written by a single author (thus far) but represent the collective views of a loose group of two dozen collaborators, hence the use of the first person plural ‘we’. We take on civil investigations, criminal defense, penetration testing, and geopolitical/cybersecurity threat assessments.

Group members have native fluency in English, French, German, Spanish, and Romanian, and we do a fair job with Arabic when it is required. Several of us have corporate or ISP infrastructure backgrounds, and our tools, both chosen and created, reflect this internal integration capability.

This is an inventory of the major systems we currently employ.

Gephi

The Gephi data visualization package is a piece of free software which permits the handling of networks with tens of thousands of nodes and hundreds of thousands of links. We use this for macro scale examinations of Twitter and some types of financial data, coding import procedures to express complex metrics, when required. When you see colorful network maps, this is likely the source.
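A simple import procedure might resemble the sketch below, which collapses raw mentions into a weighted edge list in CSV form. It assumes Gephi's spreadsheet importer is given an edges table with `Source`, `Target`, and `Weight` columns; the function name and sample data are hypothetical, and real imports carry more attributes than this.

```python
import csv
import io
from collections import Counter


def mention_edges_csv(mentions):
    """Collapse (author, mentioned) pairs into a weighted edge list as CSV."""
    # Count repeated mentions so each pair becomes one weighted edge.
    weights = Counter((src.lower(), dst.lower()) for src, dst in mentions)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (src, dst), weight in sorted(weights.items()):
        writer.writerow([src, dst, weight])
    return buffer.getvalue()


# Hypothetical mentions, for illustration only.
csv_text = mention_edges_csv([("A", "b"), ("a", "B"), ("c", "a")])
```

Writing `csv_text` to a file gives something that can be loaded through Gephi's spreadsheet import, with edge weight then available for layout and filtering.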

Maltego

The Maltego OSINT link analysis system began life as a penetration tester’s toolkit. It offers a rich set of entities, integration of many free and paid services, and local transform creation. There is a team collaboration feature for paid subscribers and the free Community Edition can read any graph we produce. This is used internally in the same way a financial audit firm would employ a spreadsheet – it is a de facto standard for recording and sharing investigation information.

Sentinel Visualizer

Sentinel Visualizer is a law enforcement/intel grade link analysis package that supports both geospatial and temporal analysis. This only comes out in the face of paying engagements with large volumes of data, as it has a somewhat intimidating learning curve.

Hunch.ly

Hunch.ly is a Google Chrome extension that preserves the trail of web sites one visits, applying a standing list of selectors to each page and permitting the addition of investigator’s notes. This tool supports the notion of multiple named investigations, preserves content statically, and can export in a variety of formats. Users are free to follow their noses without the burden of bookmarking and making screen shots while investigating, then later attempting to share their findings in a coherent fashion. The system recently began supporting local Maltego transforms.

RiskIQ

The RiskIQ service is an aggregator of a dozen passive threat data repositories in addition to its own native tracking of domain registrations, DNS, SSL certificates, and other threat assessment data. The service is delivered as a web based search engine and a companion set of Maltego transforms. This system is a panopticon for bad actor infrastructure which we use daily.

Elasticsearch

The Elasticsearch platform is used for many things, but for us it is a full text search engine with temporal analysis capabilities that will easily handle tens of thousands of Twitter accounts that have produced tens of millions of tweets. This is a construction kit for us, the right way to collate and correlate the work of teams of Actors, Collectors, and Directors. We currently curate 25 million tweets from ISIS accounts that were collected by TRAC, we support Liberty STRATCOM with collection and analysis, and the botnetsu.press system is in use by activists who track violent right wing groups in the west.

Negative Decisions

What not to do is just as important as the right stuff. Here are some things we avoided, that we tested but did not implement, or that we have used but later abandoned.

Analyst’s Notebook – nonstarter, 2x the cost of Sentinel Visualizer, and not nearly as open.

Windows – with the exception of Sentinel Visualizer, we don’t have anything that is Windows dependent. Generally speaking, things have to behave for Linux and OSX, with Windows support being nice, but not required.

Splunk – we tried to love it, truly we did. It just didn’t work out.

OSSIM – largely abandonware from what we hear. AlienVault’s Open Threat Exchange is doing fine though, and it all turns up in RiskIQ.

Aeon, Timeline, etc – we always jump at collaborative timeline tools, then later end up sitting back and being annoyed. SaaS solutions are out there, but we have confidentiality concerns that hold us back from using them.

Timesketch – very cool, an Elastic based tool, but more incident response focused than intel oriented.

SpiderFoot – very cool, but we settled on RiskIQ/Maltego installed on a remotely accessible workstation. This is one we should put back up and use enough to advise others.

There have been many more digressions over the years; these are some of the more formative ones.