Setting up a multi-tiered log infrastructure Part 11 -- Cluster Tuning

Tuning Graylog, Elasticsearch, and MongoDB for optimized cluster performance

This article has been a long time in the making. One problem with making changes to a complex clustered environment is that you may have to wait a long time to gather enough data to show whether a change helped or hurt. Some other choices are obvious wins, if you can afford them: SSDs will perform far better than spinning disks. Slow disks may be one of the biggest performance bottlenecks, and only new hardware can fix that.

When tuning Graylog, Elasticsearch, and MongoDB, you have to take into account the specific workload and how the environment is used. If there is going to be little deep searching, tune the environment for faster ingestion. If the environment is going to be used for longer-term analysis, optimizing for deep searching makes more sense. Each use case requires different optimizations to get the best results. In short: test any change you make and validate that it actually helps.

General System Tuning

Elasticsearch Tuning

The Elasticsearch tuning should be performed on all nodes running Elasticsearch.

Disable transparent hugepages in grub config

vi /etc/default/grub

Add transparent_hugepage=never at the end of the line for GRUB_CMDLINE_LINUX

It should look similar to this (do NOT copy the line below)

GRUB_CMDLINE_LINUX="rd.lvm.lv=vg_local/lv_usr crashkernel=auto rd.lvm.lv=vg_local/swap rd.lvm.lv=vg_local/lv_root rhgb quiet transparent_hugepage=never"

Rebuild the grub.cfg menu

grub2-mkconfig -o /boot/grub2/grub.cfg
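
If you would rather script the GRUB_CMDLINE_LINUX edit than make it by hand in vi, something like the following could work. It is only a sketch: add_kernel_arg is a made-up helper name, it assumes the GRUB_CMDLINE_LINUX value is wrapped in double quotes on a single line, and it relies on GNU sed for the -i option. Try it on a copy of the file first.

```shell
# Hypothetical helper: append a kernel argument to GRUB_CMDLINE_LINUX
# in the given file, unless the argument is already present.
add_kernel_arg() {
    arg=$1
    file=$2
    # Nothing to do if the argument is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX=.*$arg" "$file"; then
        return 0
    fi
    # Insert the argument just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"/\1 $arg\"/" "$file"
}
```

It would be called as add_kernel_arg transparent_hugepage=never /etc/default/grub (as root), before rebuilding grub.cfg with the grub2-mkconfig command. Running it twice leaves the line unchanged the second time.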

Set ES_HEAP_SIZE=16g in the file below (the heap should be about half the system memory, but not more than 32GB)

vi /etc/sysconfig/elasticsearch
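
The heap rule above (half the system memory, capped at 32GB) can be turned into a small calculation. This is a rough sketch, not an official formula: suggest_heap_gb is a hypothetical helper, the cap defaults to the 32GB figure used in this article, and the memory lookup assumes a Linux /proc/meminfo.

```shell
# Hypothetical helper: suggest a heap size in GB.
# $1 = total RAM in GB, $2 = optional cap in GB (default 32, as in the article).
suggest_heap_gb() {
    total_gb=$1
    cap_gb=${2:-32}
    half=$((total_gb / 2))
    if [ "$half" -gt "$cap_gb" ]; then
        half=$cap_gb
    fi
    # Never suggest less than 1 GB.
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# On Linux, read total memory (in kB) and round up to whole GB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    total_gb=$(( (total_kb + 1048575) / 1048576 ))
    echo "ES_HEAP_SIZE=$(suggest_heap_gb "$total_gb")g"
fi
```

Treat the printed value as a starting point only. If other services such as Graylog or MongoDB share the node, they need memory too.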

Configure resource limits for the elasticsearch user

echo "elasticsearch soft nofile 64000" >> /etc/security/limits.d/90-elasticsearch.conf
echo "elasticsearch hard nofile 64000" >> /etc/security/limits.d/90-elasticsearch.conf
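
Because >> appends, running those echo commands a second time leaves duplicate lines in the file. A possible alternative, sketched below, rewrites the file in one go so it can be run repeatedly. write_nofile_limits is a hypothetical helper name, and it only handles the nofile limit.

```shell
# Hypothetical helper: write soft and hard nofile limits for one user.
# The target file is overwritten, so repeated runs give the same result.
# $1 = user, $2 = limit, $3 = target file
write_nofile_limits() {
    user=$1
    limit=$2
    file=$3
    printf '%s soft nofile %s\n%s hard nofile %s\n' \
        "$user" "$limit" "$user" "$limit" > "$file"
}
```

As root it would be called like this: write_nofile_limits elasticsearch 64000 /etc/security/limits.d/90-elasticsearch.conf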

Create a new tuned profile for elasticsearch (tuned should be enabled by default on RHEL7 based distros)

mkdir -p /etc/tuned/elasticsearch
vi /etc/tuned/elasticsearch/tuned.conf

Add this content

# cat /etc/tuned/elasticsearch/tuned.conf
[main]
include=throughput-performance

[vm]
transparent_hugepages=never

Enable it as the default

tuned-adm profile elasticsearch
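
After the reboot (for the grub change) and with the tuned profile active, it is worth confirming that transparent hugepages are really off. The kernel reports the active setting in square brackets, for example "always madvise [never]". The sketch below pulls out that value. thp_active is a made-up helper name, and the default path is the one used on RHEL7-era kernels, which is an assumption about your system.

```shell
# Hypothetical helper: print the active value (the one in square brackets)
# from a transparent hugepage status file.
# $1 = status file (defaults to the usual sysfs location)
thp_active() {
    file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
}

# Only report when the sysfs file exists on this machine.
if [ -r /sys/kernel/mm/transparent_hugepage/enabled ]; then
    echo "THP enabled: $(thp_active)"
fi
```

The goal is for this to print never. Running tuned-adm active shows which tuned profile is currently applied.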

Set kernel override for vm.swappiness and vm.max_map_count

vi /etc/sysctl.d/10-ES_KernelOverride.conf

Add this content

# RHEL7 Elasticsearch tweaks
vm.swappiness=10
vm.max_map_count=262144
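
Files in /etc/sysctl.d are read at boot. To load them without rebooting, run sysctl --system as root. The sketch below then compares each key=value line in a sysctl file against the live value under /proc/sys. Both function names are hypothetical, and the parsing assumes simple key=value lines like the ones above.

```shell
# Map a sysctl key such as vm.swappiness to its /proc/sys path.
sysctl_proc_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print wanted and live values for every setting in a sysctl.d file.
# $1 = sysctl configuration file
check_sysctl_file() {
    # Skip comment lines and blank lines, then split each line on '='.
    grep -Ev '^[[:space:]]*(#|$)' "$1" | while IFS='=' read -r key want; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        want=$(printf '%s' "$want" | tr -d '[:space:]')
        path=$(sysctl_proc_path "$key")
        if [ -r "$path" ]; then
            echo "$key wanted=$want live=$(cat "$path")"
        else
            echo "$key wanted=$want live=unavailable"
        fi
    done
}
```

Example: check_sysctl_file /etc/sysctl.d/10-ES_KernelOverride.conf prints one line per setting, so a mismatch between the wanted and live values is easy to spot.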

MongoDB Tuning

The MongoDB tuning should be performed on all nodes running MongoDB.

Configure resource limits for the mongod user

echo "mongod soft nofile 64000" >> /etc/security/limits.d/90-mongodb.conf
echo "mongod hard nofile 64000" >> /etc/security/limits.d/90-mongodb.conf

echo "mongod soft nproc 32000" >> /etc/security/limits.d/90-mongodb.conf
echo "mongod hard nproc 32000" >> /etc/security/limits.d/90-mongodb.conf
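
Depending on how the service is started, the values in limits.d may not be the ones the running process actually gets, so it can be worth checking after a restart. The kernel exposes the effective limits of a process in /proc/PID/limits. The sketch below extracts the soft and hard values for one limit from such a file. limit_values is a hypothetical helper name.

```shell
# Hypothetical helper: print "soft hard" for a named limit from a
# /proc/PID/limits style file.
# $1 = limit name exactly as shown in the file, e.g. "Max open files"
# $2 = limits file
limit_values() {
    awk -v name="$1" '
        index($0, name) == 1 {
            # Drop the limit name, then split the remainder on whitespace.
            rest = substr($0, length(name) + 1)
            split(rest, f, " ")
            print f[1], f[2]
        }
    ' "$2"
}
```

Assuming the process is named mongod, a check might look like this: limit_values "Max open files" "/proc/$(pgrep -o mongod)/limits". The same call with "Max processes" shows the nproc limit.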

Graylog Tuning

Update the HEAP for Graylog

vi /etc/sysconfig/graylog-server

Change the -Xms2g and -Xmx2g values to an appropriate size and leave the remaining options on the line as they are (Remember to test your changes)

GRAYLOG_SERVER_JAVA_OPTS="-Xms8g -Xmx8g ..."

The following changes are made within the graylog server.conf

vi /etc/graylog/server/server.conf

Find these entries and adjust as needed (Remember to test your changes)

# Enable GZIP support for REST API. This compresses API responses and therefore helps to reduce
# overall round trip times. This is disabled by default. Uncomment the next line to enable it.
rest_enable_gzip = true

# The size of the thread pool used exclusively for serving the REST API.
rest_thread_pool_size = 20

# Enable/disable GZIP support for the web interface. This compresses HTTP responses and therefore helps to reduce
# overall round trip times. This is enabled by default. Uncomment the next line to disable it.
web_enable_gzip = true

# The size of the thread pool used exclusively for serving the web interface.
web_thread_pool_size = 20

# The number of parallel running processors.
# Raise this number if your buffers are filling up.
processbuffer_processors = 5
outputbuffer_processors = 3

# Number of threads used exclusively for dispatching internal events. Default is 2.
async_eventbus_processors = 4

# Global timeout for communication with Graylog server nodes; default: 5s
timeout.DEFAULT=15s
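
When raising the processor counts, a common rule of thumb (an assumption here, not something this article has measured) is to keep the total number of buffer processors at or below the number of CPU cores on the node. The sketch below reads the two values from server.conf and adds them up. conf_value and processor_total are hypothetical helper names, and the parsing assumes plain "key = value" lines.

```shell
# Hypothetical helper: read the value of "key = value" from a
# properties-style file, ignoring commented lines. The last match wins.
# $1 = key, $2 = file
conf_value() {
    awk -F'=' -v key="$1" '
        /^[ \t]*#/ { next }
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v); found = v }
        END { print found }
    ' "$2"
}

# Sum the process and output buffer processors from a server.conf.
# Missing settings count as 0.
processor_total() {
    p=$(conf_value processbuffer_processors "$1")
    o=$(conf_value outputbuffer_processors "$1")
    echo $(( ${p:-0} + ${o:-0} ))
}
```

For the values shown above, processor_total /etc/graylog/server/server.conf gives 8, which can then be compared with the output of nproc on the same node.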

These settings may have a negative performance impact but increase usability

# Do you want to allow searches with leading wildcards? This can be extremely resource hungry and should only
# be enabled with care. See also: https://www.graylog.org/documentation/general/queries/
allow_leading_wildcard_searches = true

# Do you want to allow searches to be highlighted? Depending on the size of your messages this can be memory hungry and
# should only be enabled after making sure your Elasticsearch cluster has enough memory.
allow_highlighting = true

Some additional tweaks that may or may not give you good results

I don’t perform any of these tweaks on my setup ATM but they have been discussed on the Graylog mailing list in the past. https://groups.google.com/forum/#!searchin/graylog2/cgroup_disable$3Dmemory/graylog2/gm6NrMMBdY8/Ztib7r2jCgAJ

  • Changing the disk elevator
  • Disabling numad
  • Disabling cgroups (add cgroup_disable=memory to /etc/default/grub)
  • Disabling tmpfs (comment out the entry in /etc/fstab)
  • Adjusting mount options (noatime,nodiratime,nobarrier)

I am sure there are numerous other things to add and tweak in general, but this post should be a good starting point. For a more general look at performance tuning, this is another good article: https://wiki.mikejung.biz/OS_Tuning
