Migrating To Elasticsearch 6.8.1

Consulting our combined collection this morning, we found the oldest to be this 2008 anniversary tweet from @vipe248.

This tweet came into the collection thanks to a study of #Qanon we did earlier this year. The actual inception of our current cluster hardware appears to have been on January 29th of 2019. The very earliest it could have been created was December 19th of 2018 – the release date for Elasticsearch 6.5.4.

The system is resilient to the loss of any one system, which was given an unintended test last night, with an inadvertent shutdown of one of the servers in the cluster. Recovery takes a couple of minutes given the services and virtual machines, but there was not even an interruption in processing.

Today, for a variety of reasons, we began the process of upgrading to the June 20th, 2019 release of Elasticsearch 6.8.1. There are a number of reasons for doing this:

  • Index Life Cycle Management (6.6)
  • Cross Cluster Replication (6.6)
  • Elasticsearch 7 Upgrade Assistant (6.6)
  • Rolling Upgrade To Elasticsearch 7 (6.7)
  • Better Index Type Labeling (6.7)
  • Security Features Bundled for Community Edition (6.8)
  • Conversion From Ubuntu to Debian Linux

We are not jumping directly to Elasticsearch 7.x due to some fairly esoteric issues involving field formats and concerns regarding some of the Python libraries that we use. Ubunt1.u has been fine for both desktop and server use, but we recently began using the very fussy Open Semantic Search, and it behaves well with Debian. Best of all, the OVA of a working virtual machine with the Netwar System code installed and running is just 1.9 gig.

Alongside the production ready Elasticsearch based system we are including Neo4j with some example data and working code. The example data is a small network taken from Canadian Parliament members and the code produces flat files suitable for import as well as native GML file output for Gephi. We ought to be storing relationships to Neo4j as we see them in streams, but this is still new enough that we are not confident shipping it.

Some questions that have cropped up and our best answers as of today:

Is Open Semantic Search going to be part of Netwar System?

We are certainly going to be doing a lot of work with OSS and this seems like a likely outcome, given that it has both Elasticsearch and Neo4j connectors. The driver here is the desire to maintain visibility into Mastodon instances as communities shift off Twitter – we can use OSS to capture RSS feeds.

Will Netwar System still support Search Guard?

Yes, because their academic licensing permits things that the community edition of Elasticsearch does not. We are not going to do Search Guard integration into the OVA, however. There are a couple reasons for that:

  • Doesn’t make sense on a single virtual machine.
  • Duplicate configs means a bad actor would have certificate based access to the system.
  • Eager, unprepared system operators could expose much more than just their collection system if they try to use it online.
  • Netdata monitoring provides new users insight into Elasticsearch behavior, and we have not managed to make that work with SSL secured systems.
  • We are seeking a sensible free/paid break point for this system. It’s not clear where a community system would end and an enterprise system would begin.
Is there a proper FOSS license?

Not yet, but we are going to follow customs in this area. A university professor should expect to be able to run a secure system for a team oriented class project without incurring any expense. Commercial users who want phone support will incur an annual cost. There will be value add components that will only be available to paying customers. Right now 100% of revenue is based on software as a service and we expect that to continue to be the norm.

So the BSD license seems likely.

When will the OVA be available?

It’s online this morning for internal users. If it doesn’t explode during testing today, a version with our credentials removed should be available Tuesday or Wednesday. Most of the work required to support http/https transparently was finished during first quarter. One it’s up we’ll post a link to it here and there will be announcements on Twitter and LinkedIn.