The United Kingdom is divided into twelve top-level NUTS regions (NUTS stands for Nomenclature des Unités Territoriales Statistiques). These are Northern Ireland, Scotland, Wales, and nine regional divisions within England.
We had been running a Brexit-related Twitter stream for some time, but as the offerings from Social Media Intelligence Unit matured, we needed some finer-grained detail. After a bit of grubbing around to match Twitter accounts to MP names, and then MP names to regions, we had twelve text files of account names. We collected the followers for each account, then consolidated those profiles into region-specific indices. Those twelve indices were concatenated into a single national index. Finally, this national index was merged with our master index of user accounts.
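The consolidation steps above can be sketched roughly as follows. This is only an in-memory stand-in: plain Python dicts take the place of the Elasticsearch indices, and the function names, field names, and sample handles are all hypothetical, chosen for illustration.

```python
def build_region_indices(mps_by_region, followers_of):
    """Consolidate follower profiles into one index (a dict) per region.

    mps_by_region: {region: [mp_account, ...]}, standing in for the text files.
    followers_of:  {mp_account: [follower_profile_dict, ...]}.
    """
    region_indices = {}
    for region, mp_accounts in mps_by_region.items():
        index = {}
        for mp in mp_accounts:
            for profile in followers_of.get(mp, []):
                # Key on the account id so a follower of several MPs in the
                # same region is stored only once.
                doc = index.setdefault(profile["id"], {**profile, "follows": set()})
                doc["follows"].add(mp)
        region_indices[region] = index
    return region_indices


def build_national_index(region_indices):
    """Combine the regional indices, recording which regions each profile came from."""
    national = {}
    for region, index in region_indices.items():
        for account_id, doc in index.items():
            merged = national.setdefault(
                account_id, {**doc, "follows": set(), "regions": set()}
            )
            merged["follows"] |= doc["follows"]
            merged["regions"].add(region)
    return national


def merge_into_master(master, national):
    """Merge the national index into the master index of user accounts."""
    for account_id, doc in national.items():
        master.setdefault(account_id, {}).update(doc)
    return master


# Hypothetical sample data: two regions, three MPs, three followers.
mps_by_region = {"Scotland": ["mp_a"], "Wales": ["mp_b", "mp_c"]}
followers_of = {
    "mp_a": [{"id": 1, "screen_name": "alice"}, {"id": 2, "screen_name": "bob"}],
    "mp_b": [{"id": 2, "screen_name": "bob"}],
    "mp_c": [{"id": 2, "screen_name": "bob"}, {"id": 3, "screen_name": "carol"}],
}

regions = build_region_indices(mps_by_region, followers_of)
national = build_national_index(regions)
master = merge_into_master({}, national)
```

Keeping the regional layer separate means each region can be checked on its own before anything touches the master index, and the `regions` field preserves where each profile came from.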
This triple redundancy does use space, but not much. A smaller index is a faster index, and there is some advantage to knowing the source of the content in great detail. The approach is also conditioned a bit by our upcoming Neo4j graph database implementation, where the only thing we are sure of is that it will be an iterative process. Smaller pieces are easier to test than one massive corpus that requires much filtering to untangle.
Having done the organizing work, we made the MP accounts available in a GitHub repository. A number of people have access to our Elasticsearch system, and not all of them are able to whip up filters out of text files, so we prebuilt the pieces for most of the things they might want to do.
And with this we can do things that won’t be seen elsewhere …
For example, you can easily see the friends (the accounts they follow) and followers for an MP’s account. Did you ever wonder how many friends the MPs have altogether? About 1.28m relationships to 506k unique accounts. We decided to drill down a bit further. How many of those 506k friends do not follow back any MPs at all?
Some of those accounts that have an MP audience but do not follow back will be family or friends, but if we squelch the ones with only one MP contact, those will vanish. Intuitively, we should see regional media, NGO, and government accounts next. Those that cross regions are likely influencers. Early returns on processing show a fairly steady 25% of those 506k do not follow any MPs. A set of 127k nodes with fewer than 600 hubs is going to be a dense graph, but not an impossible one to explore.
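The filtering described above might look something like this in outline. It is a minimal sketch using Python sets rather than our actual Elasticsearch queries; the function names, the `min_contacts` parameter, and the sample accounts are invented for illustration.

```python
def non_reciprocating(mp_friends, mp_followers):
    """Accounts that at least one MP follows but which follow no MP at all.

    mp_friends:   {mp: set of accounts that MP follows}
    mp_followers: {mp: set of accounts following that MP}
    """
    all_friends = set().union(*mp_friends.values())
    follows_some_mp = set().union(*mp_followers.values())
    return all_friends - follows_some_mp


def mp_audience(accounts, mp_friends):
    """Map each account of interest to the set of MPs who follow it."""
    audience = {account: set() for account in accounts}
    for mp, friends in mp_friends.items():
        for account in friends:
            if account in audience:
                audience[account].add(mp)
    return audience


def squelch_and_classify(audience, region_of, min_contacts=2):
    """Drop accounts with too few MP contacts, then flag cross-region ones."""
    kept = {a: mps for a, mps in audience.items() if len(mps) >= min_contacts}
    cross_region = {
        a for a, mps in kept.items() if len({region_of[mp] for mp in mps}) > 1
    }
    return kept, cross_region


# Hypothetical sample data.
mp_friends = {
    "mp_a": {"bbc_scot", "ngo", "cousin"},
    "mp_b": {"ngo", "bbc_scot", "fan"},
    "mp_c": {"ngo"},
}
mp_followers = {"mp_a": {"fan"}, "mp_b": {"fan"}, "mp_c": set()}
region_of = {"mp_a": "Scotland", "mp_b": "Scotland", "mp_c": "Wales"}

all_friends = set().union(*mp_friends.values())
silent = non_reciprocating(mp_friends, mp_followers)
share = len(silent) / len(all_friends)  # fraction that follow no MPs
kept, cross_region = squelch_and_classify(mp_audience(silent, mp_friends), region_of)
```

In this toy data the single-contact account (`cousin`) drops out, the regional outlet (`bbc_scot`) stays within one region, and only `ngo` is flagged as crossing regions.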
Time passes …
The final count proved to be 123,862 accounts that follow no MPs, or 24.46%. How closely that matched was a trifle surprising, as the estimate above was made when the set was only 10% processed.
And as a pleasant bonus, while those 123k accounts were processing we made good progress on being able to identify MPs by both party and region. U.K. politics watchers will note we got the colour right for four of the largest parties, and managed to visually distinguish one more. The accounts that were followed by MPs but did not reciprocate helped the layout arrange the graph so clearly.
Applying Eigenvector centrality for name sizing revealed some influencers.
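For readers unfamiliar with the measure, eigenvector centrality scores a node highly when it is connected to other high-scoring nodes. The sketch below is a plain power iteration in standard-library Python, not Gephi's own implementation (its normalization and stopping rule may differ); the function name, parameters, and sample handles are assumptions made for illustration.

```python
import math


def eigenvector_centrality(edges, max_iter=200, tol=1e-9):
    """Power-iteration eigenvector centrality.

    edges: iterable of (source, target) pairs; each edge passes the source's
    score to the target. Supply both directions for an undirected graph.
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(max_iter):
        # Start from the previous scores (i.e. iterate on A + I), which keeps
        # the iteration from oscillating on bipartite-like graphs.
        new = dict(score)
        for source, target in edges:
            new[target] += score[source]
        # Rescale to unit Euclidean length so the values stay bounded.
        norm = math.sqrt(sum(v * v for v in new.values())) or 1.0
        new = {node: v / norm for node, v in new.items()}
        if sum(abs(new[n] - score[n]) for n in nodes) < tol * len(nodes):
            return new
        score = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical star graph: three MPs with mutual follows to one hub account.
follows = [("mp_a", "hub"), ("mp_b", "hub"), ("mp_c", "hub")]
edges = follows + [(target, source) for source, target in follows]
scores = eigenvector_centrality(edges)
top = max(scores, key=scores.get)
```

On this toy star graph the hub ends up with the highest score, which is the property used for sizing names: an account's label grows with the weight of the accounts connected to it.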
The region data didn’t sort as neatly, but this was only 5% of the total network. Once it’s complete we should see it resolve into regional groupings, but they’ll be a bit more ragged than the cleaner ideological divides.
Being able to include and employ custom attributes is something we can do with Gephi, but not with Graphistry. If we wanted to accomplish that there, we’d need a graph-database-capable back end, like Neo4j, and the license required to connect it.
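One way to carry custom attributes such as party and region into Gephi is a GraphML file, which Gephi can import. The sketch below writes one with the Python standard library; it is an illustration rather than our actual export path, and the function name, attribute names, and sample nodes are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
NODE_ATTRS = ("party", "region")


def to_graphml(nodes, edges):
    """Serialize a directed graph with per-node string attributes to GraphML.

    nodes: {node_id: {"party": str, "region": str}}
    edges: iterable of (source, target) pairs
    """
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare each custom attribute once, before the graph element.
    for name in NODE_ATTRS:
        ET.SubElement(
            root,
            "key",
            {"id": name, "for": "node", "attr.name": name, "attr.type": "string"},
        )
    graph = ET.SubElement(root, "graph", {"edgedefault": "directed"})
    for node_id, attrs in nodes.items():
        node = ET.SubElement(graph, "node", {"id": node_id})
        for name in NODE_ATTRS:
            data = ET.SubElement(node, "data", {"key": name})
            data.text = attrs.get(name, "")
    for source, target in edges:
        ET.SubElement(graph, "edge", {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: two MPs, one follow relationship.
nodes = {
    "mp_a": {"party": "Labour", "region": "Wales"},
    "mp_b": {"party": "SNP", "region": "Scotland"},
}
xml_text = to_graphml(nodes, [("mp_a", "mp_b")])
```

Once imported, the `party` and `region` columns can be used for colouring or partitioning nodes in the same way as any built-in attribute.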
This is an interesting step, one that was easy in retrospect, but which we should have done some time ago. We’ll make up for lost time, now that we’ve got the capability.