The first article gave an overview of the goals and architecture of log processing. The next two articles cover inputs and outputs: how data (both logs and metrics) can be forwarded into monitoring, and how it can be viewed after processing.
There are two ways of forwarding data into the monitoring platform: automatic and manual. The automatic way is currently used in testing environments, where both logs and metrics are continuously collected and forwarded for processing. When YSoft SafeQ is deployed at a customer's site, however, this approach is seldom possible because of security concerns and the additional performance requirements for the monitoring server. Instead, only the specific log files containing the problem are transferred from the customer, and these have to be uploaded manually.

Automatic log forwarding

The simplest way to forward logs would be to configure the logging framework to send them directly over the network. However, such a solution does not cope with network outages, which can be part of the tests. Some logging frameworks can be configured with a failover logging destination (if the network does not work, logs are written into files), but these files would then need another mechanism to upload them automatically.
Instead, logs are sent to a local port of a log forwarder, which has to be installed on the host. We currently use Logstash, which has had a persistent queue since version 5.0. If the network works properly, logs are sent before they are flushed to disk; if there is a network outage, logs are written to disk, so there is no danger of running out of RAM.
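The buffering idea can be illustrated with a small Python sketch. This is not how Logstash implements its persistent queue; it only shows the general pattern of sending immediately while the network works and spilling to disk during an outage. The class name SpillQueue, the send callable and the spill file format are all assumptions made for this example.

```python
import json
import os


class SpillQueue:
    """Send events right away; spill them to a local file while sending fails."""

    def __init__(self, send, path):
        self.send = send  # hypothetical callable, raises OSError on network failure
        self.path = path  # spill file on local disk (one JSON event per line)

    def put(self, event):
        try:
            self.flush()      # older spilled events go first to keep ordering
            self.send(event)
        except OSError:
            # Network is down: append the event to disk instead of keeping it in RAM.
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(event) + "\n")

    def flush(self):
        """Try to send everything that was spilled to disk earlier."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            pending = [json.loads(line) for line in f if line.strip()]
        sent = 0
        try:
            for event in pending:
                self.send(event)
                sent += 1
        finally:
            # Keep only the events that were not delivered.
            remaining = pending[sent:]
            if remaining:
                with open(self.path, "w", encoding="utf-8") as f:
                    for event in remaining:
                        f.write(json.dumps(event) + "\n")
            else:
                os.remove(self.path)
```

A real forwarder would also need to limit the size of the spill file and handle concurrent writers, which this sketch leaves out.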
Logstash serves two other goals. The first is to unify log formats, since logs generated by different logging frameworks have different formats. That could be done on the monitoring servers, but doing it in the forwarder makes the processing simpler. The second is to enhance logs with additional information, such as the hostname and the name of the deployment group.
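As a rough illustration of these two steps, the following Python sketch renames framework-specific fields to common names and then adds host metadata. In the real setup this work is done by Logstash configuration, not Python; the field aliases and the names normalize, enrich and deploymentGroup are hypothetical and chosen only for this example.

```python
import socket

# Hypothetical mapping from framework-specific field names to unified ones.
FIELD_ALIASES = {
    "severity": "level",
    "logger": "loggerName",
    "msg": "message",
}


def normalize(event):
    """Return a copy of the event with known aliases renamed to unified names."""
    return {FIELD_ALIASES.get(key, key): value for key, value in event.items()}


def enrich(event, deployment_group, hostname=None):
    """Return a copy of the event with hostname and deployment group added."""
    enriched = dict(event)
    enriched["hostname"] = hostname or socket.gethostname()
    enriched["deploymentGroup"] = deployment_group
    return enriched
```

Both functions return new dictionaries instead of modifying their input, so the same raw event can be processed repeatedly without side effects.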
Telegraf is deployed next to Logstash to collect various host metrics, which are again forwarded to the monitoring servers. Note that Telegraf does not support a persistent queue, so it sends metrics to Logstash, which provides the necessary buffering.
Logstash and Telegraf are installed by Calf, our internal tool. Calf can be easily configured and installed as a service; it is responsible for installing, configuring and running both Logstash and Telegraf, which makes both tools much easier to use.

Log and metrics collection schema

Manual log uploading

The goal of manual log uploading is clear: forward logs to the monitoring servers in the same format that the automatic method produces. That requires parsing the logs and adding the additional information.
The logs for automatic processing are generated directly in JSON format, whereas logs written into files are plain text lines. These lines have to be parsed, and GROK patterns (basically named regexes) are used for this purpose. More can be found in the Logstash documentation of the grok filter, and there is also a simple way to construct GROK patterns.

2017-05-19 10:03:19,368 DEBUG pool-9-thread-14| RemotePeerServer| [RemotePeer{name='1dc5e474-1abc-43fc-85c9-7e5e786919ef', state='ONLINE', session='ZeroMQSessio

grok pattern:
^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} *(?<thread>[\w-]*)\| *%{WORD:loggerName}\| *%{GREEDYDATA:message}

{
  "timestamp": "2017-05-19 10:03:19,368",
  "level": "DEBUG",
  "thread": "pool-9-thread-14",
  "loggerName": "RemotePeerServer",
  "message": "[RemotePeer{name='1dc5e474-1abc-43fc-85c9-7e5e786919ef', state='ONLINE', session='ZeroMQSessio"
}
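To show what "named regexes" means in practice, here is a minimal Python sketch that parses the same kind of line with named groups. It is an approximation, not the grok engine: the timestamp and level expressions are simplified stand-ins for grok's TIMESTAMP_ISO8601 and LOGLEVEL, which accept more variants, and the names LINE_RE and parse_line are made up for this example.

```python
import re

# Simplified equivalent of the grok pattern above, written with named groups.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "  # ~ TIMESTAMP_ISO8601
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL) *"              # ~ LOGLEVEL
    r"(?P<thread>[\w-]*)\| *"
    r"(?P<loggerName>\w+)\| *"                                     # ~ WORD
    r"(?P<message>.*)"                                             # ~ GREEDYDATA
)


def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None
```

Lines that do not match, such as the continuation lines of a stack trace, return None here; how such lines are handled in the real pipeline is not covered by this sketch.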

However, when uploading files manually, it is necessary to provide additional information about the log file: the hostname, the name of the deployment group and the component name, since each component of YSoft SafeQ has a different log format. Logstash is again used for uploading the logs, but it is wrapped in a Python script for better usability.

cat spoc*.log | python -c spoc -ip localhost -g default