Installing Netwar System

Online conflicts are evolving rapidly and escalating. Three years ago we judged that it was best to not release a conflict oriented tool, even one that is used purely for observation. Given the events since then, this notion of not proliferating seems … quaint.

So we released the Netwar System code and the companion ELK utilities, and this week we are going to revisit the Twitter Utils, a set of small scripts from our first-generation software that are still used for some day-to-day tasks.

When you live with a programming language and a couple of fairly complex distributed systems, problems arise that can be dispatched almost without thought. A new person attempting to use such a system might founder on one of these, so this post memorializes what is required for a from-scratch install on a fresh Ubuntu 18.04 system.

Python

We converted to Python 3 a while ago. The default install includes Python 3.6.7, but you also need pip and git.

apt install python3-pip
apt install git
ln -s /usr/bin/python3 /usr/bin/python
ln -s /usr/bin/pip3 /usr/bin/pip

The next step is to clone the Netwar System repository into your local directory, make the commands executable, and place them on your path.

git clone [email protected]:NetwarSystem/NetwarSystem.git

chmod 755 tw-*

chmod 755 F-queue

cp tw-* /usr/local/bin/

cp F-queue /usr/local/bin/

Once that’s done, it’s time to install lots of packages. This is normally done like this:

pip install -r REQUIREMENTS.txt

But our REQUIREMENTS.txt for the Netwar System was pretty stale. We think it’s OK now, but here is how we updated it: a little bit of grep/sort/uniq produced this list of missing packages.

configparser
elasticsearch
elasticsearch_dsl
psutil
py2neo
redis
setproctitle
squish2
tweepy
walrus
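The grep/sort/uniq pass itself was not shown above. Here is a hypothetical reconstruction of the idea; the file names and layout are assumptions, and it is demonstrated on fabricated files rather than the real repository:

```shell
# Collect top-level module names imported by the code, then diff them
# against REQUIREMENTS.txt; demonstrated on small fabricated files.
mkdir -p /tmp/reqdemo
printf 'import redis\nimport tweepy\nfrom walrus import Database\n' > /tmp/reqdemo/code.py
printf 'redis\n' > /tmp/reqdemo/REQUIREMENTS.txt

grep -hE '^(import|from) ' /tmp/reqdemo/code.py \
  | awk '{print $2}' | cut -d. -f1 | sort -u > /tmp/reqdemo/imported.txt
sort -u /tmp/reqdemo/REQUIREMENTS.txt > /tmp/reqdemo/required.txt

# Lines only in the first file: imported but not listed as a requirement
comm -23 /tmp/reqdemo/imported.txt /tmp/reqdemo/required.txt
```

comm -23 suppresses lines unique to the second file and lines common to both, so what remains is exactly the imported-but-unlisted packages (here, tweepy and walrus).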

You can install those manually and they will all work, except for squish2, the name of our internal package containing the code that “squishes” bulky, low-value fields out of tweets and user profiles. It requires special handling:

cd NetwarSystem/squish2
pip install -e .

If you hit any errors related to urllib3, SSL, or XML, those might be subtle dependency problems; please post them as issues on GitHub.

Elasticsearch Commands

There are a bunch of Elasticsearch-related scripts in the ELKSG repository. Clone them and then copy them into your path.

git clone [email protected]:NetwarSystem/ELKSG.git

cd ELKSG

chmod 755 elk*

cp elk* /usr/local/bin/

The ELK software can handle a simple install or one with Search Guard. This is the simple setup, so add this line to your ~/.profile so the scripts know where to find Elasticsearch.

export ELKHOST="http://localhost:9200"

Debian Packages

You need the following four pieces of software to get the system running in standalone mode.

  • Redis
  • Netdata
  • Elasticsearch
  • Neo4j

Redis and Netdata are simple.

apt update
apt install redis

Netdata has a really slick install procedure: copy one command from its site, paste it into a shell, and it performs the install and brings the service up on port 19999.

Elasticsearch and Neo4j require a bit more work to get the correct version:

add-apt-repository ppa:webupd8team/java
apt update
apt install oracle-java8-installer

apt install curl apt-transport-https
curl -s https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | tee /etc/apt/sources.list.d/elastic-6.x.list
apt update
apt install elasticsearch=6.5.4
apt install kibana=6.5.4
mv /etc/apt/sources.list.d/elastic-6.x.list /etc/apt/sources.list.d/elastic-6.x.idle

systemctl enable elasticsearch
systemctl start elasticsearch
systemctl enable kibana
systemctl start kibana

The mv line leaves the Elasticsearch repository file in your sources directory, but it disables it. This is so you can update the rest of your system without stepping on the specific version needed.

Neo4j is similar, but it’s fine to track the latest version. Also note that Neo4j is a Java app, so it needs the same Java installer we added for Elasticsearch.

wget -O - https://debian.neo4j.org/neotechnology.gpg.key | apt-key add -
echo 'deb https://debian.neo4j.org/repo stable/' | tee -a /etc/apt/sources.list.d/neo4j.list
apt update
apt install neo4j=1:3.5.4

The version pinned there is just what happened to be in the Neo4j install instructions on the day this article was written; unlike Elasticsearch, the exact version is not sensitive.

At this point you should have all four applications running. The one potential problem is Kibana, which may fail to start because it depends on Elasticsearch, and Elasticsearch takes a couple of minutes to come alive the first time it runs; if that happens, start Kibana again once Elasticsearch is up. Try these commands to verify:

systemctl status redis
systemctl status elasticsearch
systemctl status kibana
systemctl status neo4j

In terms of open TCP ports, try the following, which checks the listen ports for Kibana (5601), Redis (6379), Neo4j (7474), Elasticsearch (9200), and Netdata (19999).

netstat -lan | awk '/:5601|:6379|:7474|:9200|:19999/'
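The awk pattern simply keeps any line that mentions one of those five port numbers. For example, run against two fabricated netstat lines, it keeps only the Elasticsearch listener:

```shell
# Only the first line matches one of the five ports of interest.
printf 'tcp  0  0 0.0.0.0:9200   0.0.0.0:*  LISTEN\ntcp  0  0 0.0.0.0:22     0.0.0.0:*  LISTEN\n' \
  | awk '/:5601|:6379|:7474|:9200|:19999/'
```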

And that’s that – you’ve got the software installed. Now we need to configure some things.

Linux & Packages Configuration

There are a number of things that need adjusting in order for the system to run smoothly. Elasticsearch will cause dropped packets under load, so let’s add these two lines to /etc/sysctl.conf:

net.core.netdev_budget=3500
net.core.netdev_budget_usecs=35000

And then make them immediately active:

sysctl -w net.core.netdev_budget=3500
sysctl -w net.core.netdev_budget_usecs=35000
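If you fold this step into a setup script, a small guard (our own habit, not anything the packages require) keeps repeated runs from appending duplicate lines; it is demonstrated here against a temporary file rather than the real /etc/sysctl.conf:

```shell
# Append each setting only if an identical line is not already present.
CONF=/tmp/sysctl-demo.conf
touch "$CONF"
for line in 'net.core.netdev_budget=3500' 'net.core.netdev_budget_usecs=35000'; do
    grep -qxF "$line" "$CONF" || echo "$line" >> "$CONF"
done
cat "$CONF"
```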

We also need to raise the file handle and process limits for Elasticsearch’s Lucene component and Neo4j’s worker threads. Add these lines to /etc/security/limits.conf; note that the actual file uses tab stops between the columns. The simplest way to make these settings active is to reboot.

elasticsearch   -      nofile   300000
neo4j           -      nofile   300000
root            -      nofile   300000
neo4j           hard   nproc    10000
neo4j           soft   nproc    10000

If you’re running this software on your desktop, pointing a web browser at port 5601 will show Kibana and 7474 will show Neo4j. If you’re using a standalone or virtual machine, you’ll need to open some access. Here are three sed one-liners that will do that.

sed -i 's/#network.host: 192.168.0.1/network.host: 0.0.0.0/' /etc/elasticsearch/elasticsearch.yml

sed -i 's/#server.host: \"localhost\"/server.host: 0.0.0.0/' /etc/kibana/kibana.yml

sed -i 's/#dbms.connectors.default_listen/dbms.connectors.default_listen/' /etc/neo4j/neo4j.conf

systemctl restart elasticsearch

systemctl restart kibana

systemctl restart neo4j

Elasticsearch doesn’t require a password in this configuration, but Neo4j does, and it’ll make you change it from the default of ‘neo4j’ the first time you log in to the system.

OK, point your browser at port 19999, and you should see this:

Netdata status on a working system.

Notice the elasticsearch local and Redis local tabs at the lower right. You can get really detailed information on what Elasticsearch is doing, which is helpful when you are just starting to explore its capabilities.

Configuring Your First Twitter Account

You must have a set of Twitter application keys to take the next step. Add your Consumer Key and Consumer Secret to the tw-auth command, run it, paste the URL it offers into a browser, log in with your Twitter account, and enter the seven-digit PIN from the browser into the script; it will create a ~/.twitter file that looks something like this.
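A purely hypothetical sketch of the file’s layout follows; every section name and key here is an assumption, inferred from the variables discussed below and from the configparser dependency:

```ini
[twitter]
consumer_key = XXXXXXXXXXXXXXXXXXXXXXXXX
consumer_secret = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
access_token = XXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
access_secret = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[system]
neo4jpass = the-password-you-set-for-neo4j
elksg = http://localhost:9200
elksguser = placeholder
elksgpass = placeholder
```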



You’ll need to enter the Neo4j password you set earlier. The elksg variable has to point to the correct host and port. The elksguser/elksgpass entries are just placeholders. If you got this right, this command will cough up your login shell name and Twitter screen name.

tw-myname

Next, you can check that your Elasticsearch commands are working:

elk-health

Now is the time to get Elasticsearch ready to accept Twitter data. Mostly this involves making sure it recognizes timestamps. Issue these commands:

elk-userids

elk-tuindices

elk-newidx

elk-mylog

elk-set2k

The first three ensure that timestamps work for the master user index, any tu* index related to a specific collection, and any tw* index containing tweets. The mylog command ensures the perflog index is searchable. The last command bumps the field limit on indices. Experienced Elasticsearch users will be scratching their heads at this one; we still have much to learn here, so feel free to educate us on how to handle that problem permanently.
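This post does not show the bodies of those elk-* scripts, but conceptually, making Elasticsearch recognize timestamps and raising the field limit comes down to applying settings like the following index template. The index patterns, field name, and date format below are assumptions (the Twitter-style created_at format shown is a common choice), not the scripts’ actual contents:

```json
{
  "index_patterns": ["tu*", "tw*"],
  "settings": {
    "index.mapping.total_fields.limit": 2000
  },
  "mappings": {
    "_doc": {
      "properties": {
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    }
  }
}
```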

If you want to see what these did, this command will show you a lot of JSON.

elk-showfmt

And now we’re dangerously close to actually getting some content in Elasticsearch. Try the following commands:

tw-friendquick NetwarSystem > test.txt

tw-load4usertest test.txt

tw-showusertest

The first command should produce a file with around 180 numeric Twitter IDs that are followed by @NetwarSystem, the second loads them into Redis for processing, and the third gives you a count of how many are loaded. This is the big moment; try this command next:

time tw-queue2usertest

That command should spew a bunch of JSON as it runs. The time prefix will tell you how long it took, a useful thing when performance tuning long-running processes.
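If the counts ever look off, test.txt is easy to inspect by hand. Assuming the format is one numeric Twitter ID per line (which matches the description above), something like this counts the well-formed lines; it is demonstrated on a fabricated file:

```shell
# Count lines that consist solely of digits, i.e. plausible Twitter IDs.
printf '12345\n67890\n2244994945\n' > /tmp/test-ids.txt
grep -cE '^[0-9]+$' /tmp/test-ids.txt
```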

Now try this one:

elk-list

You should get back two very long lines of text: one for the usertest index, showing about 180 documents, and one for perflog, which will have just a few.

There, you’ve done it! Now let’s examine the results.

Into Kibana

Your next steps require the Kibana graphical interface. Point your browser at port 5601 on your system. You’ll be presented with the Kibana welcome page. You can follow their tutorial if you’d like. Once you’ve done that, or skipped it, you will do the following:

  • Go to the Management tab
  • Select Index Patterns
  • Create an Index Pattern for the usertest index

There should be a couple of choices for time fields: one for when the user account was created, the other for the date of their last tweet. Once you’ve done this, go to the Discover tab, which should default to your newly created Index Pattern. Play with the time picker at the upper right, find the Relative option, and set it to 13 years. You should see a creation date histogram something like this:

Conclusion

Writing this post involved grinding off every burr we found in the GitHub repositories, which was an all-day job, but we’ve come to the point where you have cut & pasted all you can. The next steps will involve watching videos about how to use Kibana, laying hands on a copy of Elasticsearch: The Definitive Guide, and installing Graphileon so you can explore the Neo4j data.

If you need help, DM @NetwarSystem your email address, and we’ll send you an invite to the Netwar System Slack server.