Migrating To Elasticsearch 6.8.1

Consulting our combined collection this morning, we found that the oldest item is this 2008 anniversary tweet from @vipe248.

This tweet came into the collection thanks to a study of #Qanon we did earlier this year. Our current cluster appears to have been created on this hardware on January 29th, 2019. The very earliest it could have been created was December 19th, 2018, the release date of Elasticsearch 6.5.4.

The system is resilient to the loss of any one server, a property that got an unintended test last night when one of the servers in the cluster was inadvertently shut down. Recovery takes a couple of minutes, given the services and virtual machines involved, but processing was never interrupted.
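During an outage like last night’s, the quickest way to see how the cluster is coping is Elasticsearch’s `GET _cluster/health` API. The sketch below only interprets an already-fetched response; the three-node count and the sample values are assumptions for illustration, not readings from our cluster.

```python
import json

def describe_health(health: dict, expected_nodes: int) -> str:
    """Summarize a _cluster/health response.

    `expected_nodes` is how many nodes the cluster normally has; it is an
    input we supply (an assumption), not a field Elasticsearch reports.
    """
    status = health["status"]  # "green", "yellow" or "red"
    nodes = health["number_of_nodes"]
    missing = expected_nodes - nodes
    if status == "red":
        return "red: at least one primary shard is unassigned"
    if status == "yellow":
        # All primaries are assigned, so searches and indexing keep working.
        return (f"yellow: {missing} node(s) missing, "
                f"{health['unassigned_shards']} replica shard(s) unassigned, "
                "searches and indexing continue")
    return f"green: all {nodes} node(s) present, all shards assigned"

# Hypothetical response while one of three nodes is down.
sample = json.loads('{"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 5}')
print(describe_health(sample, expected_nodes=3))
```

A yellow status with a node missing is the expected state while a server restarts; only red means some data is unavailable.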

Today we began upgrading to Elasticsearch 6.8.1, released June 20th, 2019. There are a number of reasons for doing this:

  • Index Lifecycle Management (6.6)
  • Cross Cluster Replication (6.6)
  • Elasticsearch 7 Upgrade Assistant (6.6)
  • Rolling Upgrade To Elasticsearch 7 (6.7)
  • Better Index Type Labeling (6.7)
  • Security Features Bundled for Community Edition (6.8)
  • Conversion From Ubuntu to Debian Linux
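Because 6.5.4 to 6.8.1 stays within the 6.x line, the upgrade can be done one node at a time. Elastic’s documented rolling-upgrade procedure is roughly: restrict shard allocation, perform a synced flush, stop, upgrade and restart one node, re-enable allocation, and wait for the cluster to recover before moving on. The minimal sketch below just lays those steps out per node; the node names are hypothetical, and the plan should be checked against the official 6.8 upgrade guide before use.

```python
import json

# Cluster settings bodies used around each node restart.
DISABLE_ALLOCATION = {"persistent": {"cluster.routing.allocation.enable": "primaries"}}
ENABLE_ALLOCATION = {"persistent": {"cluster.routing.allocation.enable": None}}  # null resets to default

def rolling_upgrade_plan(nodes):
    """Return (action, target, body) steps, one node at a time.

    "API" steps are HTTP calls against any live node; "MANUAL" steps are
    done on the host itself (package upgrade, plugin upgrade, restart).
    """
    steps = []
    for node in nodes:
        steps += [
            ("API", "PUT /_cluster/settings", DISABLE_ALLOCATION),
            ("API", "POST /_flush/synced", None),
            ("MANUAL", f"stop Elasticsearch on {node}", None),
            ("MANUAL", f"upgrade Elasticsearch and its plugins on {node}", None),
            ("MANUAL", f"start Elasticsearch on {node}", None),
            ("API", "PUT /_cluster/settings", ENABLE_ALLOCATION),
            ("API", "GET /_cat/health?v", None),  # wait for recovery before the next node
        ]
    return steps

# Print the steps for the first of three hypothetical nodes.
for action, target, body in rolling_upgrade_plan(["es1", "es2", "es3"])[:7]:
    print(action, target, json.dumps(body) if body else "")
```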

We are not jumping directly to Elasticsearch 7.x due to some fairly esoteric issues involving field formats and concerns regarding some of the Python libraries that we use. Ubuntu has been fine for both desktop and server use, but we recently began using the very fussy Open Semantic Search, and it behaves well on Debian. Best of all, the OVA of a working virtual machine with the Netwar System code installed and running is just 1.9 GB.

Alongside the production-ready Elasticsearch-based system, we are including Neo4j with some example data and working code. The example data is a small network drawn from members of the Canadian Parliament, and the code produces flat files suitable for import as well as native GML output for Gephi. We ought to be storing relationships in Neo4j as we see them in streams, but this is still new enough that we are not confident shipping it.
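Our Neo4j export code isn’t reproduced here, but the GML half of the idea is simple enough to sketch. This stand-alone, standard-library version takes (source, target) label pairs and writes a GML graph that Gephi can open; the `to_gml` name and the placeholder member labels are ours for illustration, not the Netwar System code.

```python
def to_gml(edges, directed=False):
    """Render (source_label, target_label) pairs as a GML string."""
    ids = {}
    for a, b in edges:
        for label in (a, b):
            ids.setdefault(label, len(ids))  # integer ids in order of first appearance
    lines = ["graph [", f"  directed {1 if directed else 0}"]
    for label, node_id in ids.items():
        safe = label.replace('"', "'")  # GML strings cannot contain raw double quotes
        lines += ["  node [", f"    id {node_id}", f'    label "{safe}"', "  ]"]
    for a, b in edges:
        lines += ["  edge [", f"    source {ids[a]}", f"    target {ids[b]}", "  ]"]
    lines.append("]")
    return "\n".join(lines)

# Placeholder labels; real data would come from the Neo4j example network.
gml = to_gml([("Member A", "Member B"), ("Member B", "Member C")])
print(gml)
```

Writing the file is then just `open("network.gml", "w").write(gml)`.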

Some questions that have cropped up and our best answers as of today:

Is Open Semantic Search going to be part of Netwar System?

We are certainly going to be doing a lot of work with OSS, and this seems like a likely outcome, given that it has both Elasticsearch and Neo4j connectors. The driver here is the desire to maintain visibility into Mastodon instances as communities shift off Twitter: we can use OSS to capture RSS feeds.
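OSS has its own feed handling, so the following is not how it works internally. It is a standard-library sketch of the basic step: turning an RSS 2.0 document into flat records ready for indexing. Mastodon can serve public account timelines as RSS; the feed URL below is a placeholder, and field names such as `published` are our own choice.

```python
import xml.etree.ElementTree as ET

def rss_items(xml_text, feed_url):
    """Turn an RSS 2.0 document into flat dicts suitable for indexing."""
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.iter("item"):
        docs.append({
            "feed": feed_url,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "text": item.findtext("description", default=""),
        })
    return docs

# Hypothetical feed content; a real run would fetch the feed over HTTP first.
FEED_URL = "https://mastodon.example/@someone.rss"
SAMPLE = """<rss version="2.0"><channel><title>example</title>
<item><title>First post</title><link>https://mastodon.example/@someone/1</link>
<pubDate>Mon, 24 Jun 2019 09:00:00 +0000</pubDate><description>Hello</description></item>
</channel></rss>"""

docs = rss_items(SAMPLE, FEED_URL)
print(docs)
```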

Will Netwar System still support Search Guard?

Yes, because their academic licensing permits things that the community edition of Elasticsearch does not. We are not going to integrate Search Guard into the OVA, however, for several reasons:

  • Doesn’t make sense on a single virtual machine.
  • Every copy would share the same configuration, so a bad actor with a copy would have certificate-based access to the system.
  • Eager, unprepared system operators could expose much more than just their collection system if they try to use it online.
  • Netdata monitoring provides new users insight into Elasticsearch behavior, and we have not managed to make that work with SSL secured systems.
  • We are seeking a sensible free/paid break point for this system. It’s not clear where a community system would end and an enterprise system would begin.

Is there a proper FOSS license?

Not yet, but we are going to follow the customs in this area. A university professor should expect to be able to run a secure system for a team-oriented class project without incurring any expense. Commercial users who want phone support will incur an annual cost, and there will be value-add components available only to paying customers. Right now 100% of our revenue comes from software as a service, and we expect that to remain the norm.

So the BSD license seems likely.

When will the OVA be available?

It’s online this morning for internal users. If it doesn’t explode during testing today, a version with our credentials removed should be available Tuesday or Wednesday. Most of the work required to support HTTP and HTTPS transparently was finished during the first quarter. Once it’s up, we’ll post a link to it here and make announcements on Twitter and LinkedIn.

Impressive Performance With Elastic 6.5.1 & Search Guard

After Implementing Search Guard ten days ago, I was finally pushed into using Elasticsearch 6. Having noticed that 6.5.0 was out, I decided to wait until Search Guard, which seems to lag about a week behind, had its update done.

The 6.5.0 release proved terribly buggy, but now here we are with 6.5.1, running tests in A Small Development Environment, and the results are impressive. The combination of this release and an upgrade from Ubuntu 16.04 to 18.04 has made the little test machine, which we refer to as ‘hotpot’, as speedy as our three-node VPS-based cluster.

Perflog Dashboard

This is a solid long-term average of over eleven accounts fully collected per minute, but the curious thing is that it’s not obvious which resource is limiting throughput. RAM utilization eventually ratcheted up to 80%, but the CPU load has stayed at or below 20% the whole time.

Utilization Dashboard

There is still a long learning curve ahead, but what I think I see here is that an elderly four-core i7 with a properly tuned zpool disk subsystem will be able to support a group of eight users in constant collection mode.
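Some rough arithmetic behind that estimate. It assumes the observed average holds and is shared evenly across the team, both of which are assumptions; since the measured figure was "over eleven", treat the result as a floor.

```python
ACCOUNTS_PER_MINUTE = 11  # long-term average from the perflog dashboard ("over eleven")
USERS = 8                 # team size we think one machine can serve (an estimate)

per_day = ACCOUNTS_PER_MINUTE * 60 * 24   # accounts fully collected per day
per_user_per_day = per_day / USERS        # assumes an even split between users
print(per_day, per_user_per_day)  # 15840 1980.0
```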

Kimsufi Servers

That makes this page of Kimsufi Servers intriguing. The KS-9 looks to be the sweet spot, since it has SSDs instead of spinning disks. If our monthly hardware cost is $21, a $99/month small team setup might make sense to offer.
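The pricing arithmetic, under the same assumptions: the $21 and $99 figures are the estimates above, the eight-seat team comes from the capacity estimate, and labor, bandwidth, and licensing are deliberately left out.

```python
HARDWARE_PER_MONTH = 21.0  # KS-9 class server, as estimated above
PRICE_PER_MONTH = 99.0     # the hypothetical small team offering
TEAM_SIZE = 8              # seats one machine might support (an estimate)

margin = PRICE_PER_MONTH - HARDWARE_PER_MONTH  # before labor, bandwidth, licenses
per_seat = PRICE_PER_MONTH / TEAM_SIZE
print(f"margin ${margin:.2f}/month, ${per_seat:.3f} per seat")
```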

There is much to be done with Search Guard before this can happen, but hopefully we’ll be ready at the start of 2019.

Implementing Search Guard

Our conversion to Elasticsearch began almost a year ago. Aided by the marvelous O’Reilly book Elasticsearch: The Definitive Guide, we grew comfortable with the system, exploring Timesketch and implementing Wazuh for our internal monitoring. Our concern here was the same one we faced during prior Splunk adventures: how do we fund the annual cost of an enterprise license?

Search Guard solves the explicit cost question, and it does a good job on the implicit barrier-to-entry problem. What you see below is the Search Guard tab on our prototype system, which more or less installed itself with a single command.

Search Guard

The initial experience was so smooth that we decided to implement Search Guard on our cluster, which has been a learning experience. The system requires Elasticsearch 6.x, but we had clung to the familiar environment of Elasticsearch 5.6. The switch required a solid day of fiddling with Bash scripts and Python code to make everything work with the newer Elasticsearch, and the cluster upgrade itself was not nearly so straightforward.
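As we understand them, two 6.x changes account for much of that kind of script fiddling: requests with a body must carry an explicit `Content-Type` header, and an index created in 6.x may hold only one mapping type. The minimal standard-library sketch below prepares a `_bulk` request that satisfies both. The index name and the `_doc` type name are assumptions, and an index created with a different type name would need that name instead.

```python
import json
import urllib.request

def bulk_body(index, docs, doc_type="_doc"):
    """Build an NDJSON _bulk payload with a single mapping type per index."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return ("\n".join(lines) + "\n").encode("utf-8")  # _bulk needs a trailing newline

def bulk_request(base_url, index, docs):
    """Prepare (but do not send) a _bulk request with an explicit Content-Type."""
    return urllib.request.Request(
        f"{base_url}/_bulk",
        data=bulk_body(index, docs),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

docs = [{"text": "a"}, {"text": "b"}]
req = bulk_request("http://localhost:9200", "tweets-example", docs)
print(req.full_url, req.get_method())
# Sending it would be: urllib.request.urlopen(req)
```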

The self-installing demo reuses a stock demo PKI setup. That’s great for lowering the barrier to entry for initial experiments, but there is no way it can be used safely on a publicly accessible system. Having done a bit of PKI work here and there, we found the instructions and scripts they offer for building your own fairly smooth.
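Before any node faces the internet, it is worth confirming that nothing from the demo setup survived. As far as we know, Search Guard 6’s demo installer marks its configuration with `searchguard.allow_unsafe_democertificates: true` in `elasticsearch.yml`. The hypothetical helper below flags that setting using a deliberately naive line parser rather than a real YAML library; verify the key name against the Search Guard documentation for your version.

```python
def demo_cert_settings(elasticsearch_yml: str):
    """Return active setting lines suggesting demo certificates are allowed.

    Naive parsing (an assumption that settings are flat "key: value" lines);
    anything after "#" is treated as a comment.
    """
    flagged = []
    for raw in elasticsearch_yml.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "searchguard.allow_unsafe_democertificates" and value.lower() == "true":
            flagged.append(line)
    return flagged

sample = """
cluster.name: example
# searchguard.allow_unsafe_democertificates: true
searchguard.allow_unsafe_democertificates: true
"""
print(demo_cert_settings(sample))
```

An empty list is what a production node should produce.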

The troubles began when we moved from Elasticsearch 5.6.13 to 6.4.3. A stumble on our part during Search Guard install left us with a system that was stuck tight. Their install procedure could not continue from the state we had put the system into, while our command line tools and system knowledge were insufficient to back out of the partially completed process.

Resolving that took the better part of a day, but it proved beneficial in the end. Their voluminous documentation did not address our specific problem, but it did offer many pointers. Think of it as six months of troubleshooting experience in an afternoon, and questions posted to their Google Group yielded authoritative answers within hours.

There are six weeks left in 2018, and during this time we are going to accomplish the following:

  • Finish converting botnetsu.press to Search Guard Enterprise
  • Index Twitter, RSS data, and one chat service for the team
  • Explore roles and permissions against real world considerations
  • Create a public facing dashboard for botnetsu.press
  • Implement Search Guard for a Wazuh system

Best of all, Search Guard offers a gratis Enterprise license to non-profits. We have applied for one on behalf of both botnetsu.press here in the U.S. and a similar effort in the U.K. Given just a bit of luck, we’ll have two teams active in the field by the end of the first quarter, and maybe some of the commercial opportunities we are pursuing will come to fruition as well.