Tuning Graylog, Elasticsearch, and MongoDB for optimized cluster performance
This has been an article a long time in the making. One problem with making changes to a complex clustered environment is that you may have to wait a long time to gather data that shows either an improvement or a negative impact. Some other considerations simply make sense if you can afford them: running on SSDs will perform far better than running on spinning disks. Slow disks can be one of the biggest performance bottlenecks, and that is a problem only new hardware will solve.
When tuning Graylog, Elasticsearch, and MongoDB, you have to take into account the specific workload and how the environment is used. If there will be little deep searching, tune the environment for faster ingestion. If the environment will be used for longer-term analysis, optimizing for deep searching makes more sense. Each use case requires different optimizations to get the best results. In short: test every change you make and validate its effectiveness.
General System Tuning
Elasticsearch Tuning
The Elasticsearch tuning should be performed on all nodes running Elasticsearch.
Disable transparent hugepages in grub config
vi /etc/default/grub
Add transparent_hugepage=never at the end of the line for GRUB_CMDLINE_LINUX
It should look similar to this (do NOT copy the line below)
GRUB_CMDLINE_LINUX="rd.lvm.lv=vg_local/lv_usr crashkernel=auto rd.lvm.lv=vg_local/swap rd.lvm.lv=vg_local/lv_root rhgb quiet transparent_hugepage=never"
Rebuild the grub.cfg menu
grub2-mkconfig -o /boot/grub2/grub.cfg
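Editing the line by hand works fine, but if you manage several nodes a small helper can add the argument for you. This is a minimal sketch, not part of any tool: grub_add_kernel_arg is a hypothetical function name, and it assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX line like the example above. It does nothing if the argument is already present, so it is safe to re-run.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel argument to GRUB_CMDLINE_LINUX in a
# grub defaults file, but only if it is not already there (safe to re-run).
# Assumes GNU sed and a line of the form GRUB_CMDLINE_LINUX="...".
grub_add_kernel_arg() {
    grub_file=$1   # e.g. /etc/default/grub
    kernel_arg=$2  # e.g. transparent_hugepage=never
    # Already present (bounded by a quote or a space)? Then leave the file alone.
    if grep -q "^GRUB_CMDLINE_LINUX=.*[\" ]${kernel_arg}[\" ]" "$grub_file"; then
        return 0
    fi
    # Insert the argument just before the closing double quote of the line.
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"$/\1 ${kernel_arg}\"/" "$grub_file"
}
```

As root, run grub_add_kernel_arg /etc/default/grub transparent_hugepage=never, then run the grub2-mkconfig command and reboot for the change to take effect.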
Configure ES_HEAP_SIZE=16g (the heap should be half of the system memory, but not more than 32GB)
vi /etc/sysconfig/elasticsearch
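The heap rule of thumb above can be computed from /proc/meminfo to get a starting value. This is a sketch under stated assumptions: suggest_es_heap is a hypothetical helper, it rounds half of MemTotal to the nearest gigabyte, and the 31g cap is my own choice for staying safely below 32GB, the point where the JVM stops using compressed object pointers.

```shell
#!/bin/sh
# Hypothetical helper: suggest ES_HEAP_SIZE as half of total RAM, rounded to
# the nearest GB, capped at 31g (an assumption: stay below the ~32GB
# compressed-oops threshold) and never below 1g.
suggest_es_heap() {
    meminfo=${1:-/proc/meminfo}
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    half_mb=$(( mem_kb / 1024 / 2 ))
    heap_gb=$(( (half_mb + 512) / 1024 ))   # round to nearest GB
    if [ "$heap_gb" -gt 31 ]; then heap_gb=31; fi
    if [ "$heap_gb" -lt 1 ]; then heap_gb=1; fi
    echo "ES_HEAP_SIZE=${heap_gb}g"
}
```

Run suggest_es_heap on each Elasticsearch node and treat the printed line as a starting point for /etc/sysconfig/elasticsearch, then test as usual.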
Configure resource limits for the elasticsearch user
echo "elasticsearch soft nofile 64000" >> /etc/security/limits.d/90-elasticsearch.conf
echo "elasticsearch hard nofile 64000" >> /etc/security/limits.d/90-elasticsearch.conf
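Appending with >> leaves duplicate entries if the commands are run twice, and copy-pasting from a web page can introduce curly quotes that break the shell. A minimal sketch that writes the file in one go with printf, assuming you are happy to overwrite it; write_es_limits is a hypothetical helper name. Keep in mind that on RHEL 7, services started by systemd take their limits from the unit file (LimitNOFILE) rather than from limits.d, so it is worth checking /proc/<pid>/limits of the running process too.

```shell
#!/bin/sh
# Hypothetical helper: (over)write the Elasticsearch nofile limits so that
# re-running it never produces duplicate lines. Target path is a parameter.
write_es_limits() {
    target=${1:-/etc/security/limits.d/90-elasticsearch.conf}
    printf '%s\n' \
        'elasticsearch soft nofile 64000' \
        'elasticsearch hard nofile 64000' > "$target"
}
```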
Create a new tuned profile for Elasticsearch (tuned should be enabled by default on RHEL 7-based distros)
mkdir -p /etc/tuned/elasticsearch
vi /etc/tuned/elasticsearch/tuned.conf
Add this content
# cat /etc/tuned/elasticsearch/tuned.conf
[main]
include=throughput-performance
[vm]
transparent_hugepages=never
Enable it as the default
tuned-adm profile elasticsearch
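After a reboot you can confirm that transparent hugepages are really disabled, and tuned-adm active shows which tuned profile is in use. A small sketch: thp_mode is a hypothetical helper that extracts the active value, which the kernel marks in brackets in the sysfs file (for example "always madvise [never]").

```shell
#!/bin/sh
# Hypothetical helper: print the active transparent hugepage mode, i.e. the
# value the kernel shows in [brackets] in the sysfs file.
thp_mode() {
    thp_file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$thp_file"
}
```

On a correctly tuned node, thp_mode should print never.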
Set kernel override for vm.swappiness and vm.max_map_count
vi /etc/sysctl.d/10-ES_KernelOverride.conf
Add this content
# RHEL7 Elasticsearch tweaks
vm.swappiness=10
vm.max_map_count=262144
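Files in /etc/sysctl.d are read at boot; to load them right away, run sysctl --system as root. The sketch below then compares the live values with the ones you expect. check_sysctl is a hypothetical helper; it reads /proc/sys directly, where the dots in a key such as vm.max_map_count become slashes in the path.

```shell
#!/bin/sh
# Hypothetical helper: compare a live sysctl value with the expected one.
# Returns 0 and prints "OK" on a match, returns 1 and prints "DIFF" otherwise.
check_sysctl() {
    key=$1                 # e.g. vm.max_map_count
    expected=$2            # e.g. 262144
    sys_root=${3:-/proc/sys}
    path="$sys_root/$(echo "$key" | tr '.' '/')"
    actual=$(cat "$path")
    if [ "$actual" = "$expected" ]; then
        echo "OK   $key=$actual"
    else
        echo "DIFF $key=$actual (expected $expected)"
        return 1
    fi
}
```

For example: check_sysctl vm.swappiness 10 && check_sysctl vm.max_map_count 262144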
MongoDB Tuning
The MongoDB tuning should be performed on all nodes running MongoDB.
Configure resource limits for the mongod user
echo "mongod soft nofile 64000" >> /etc/security/limits.d/90-mongodb.conf
echo "mongod hard nofile 64000" >> /etc/security/limits.d/90-mongodb.conf
echo "mongod soft nproc 32000" >> /etc/security/limits.d/90-mongodb.conf
echo "mongod hard nproc 32000" >> /etc/security/limits.d/90-mongodb.conf
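What ultimately matters is the limit the running mongod process received; for a service started by systemd, that can come from the unit file (LimitNOFILE, LimitNPROC) rather than from limits.d. A sketch that reads the soft limits back from /proc: show_proc_limits is a hypothetical helper, and it assumes the standard column layout of /proc/<pid>/limits.

```shell
#!/bin/sh
# Hypothetical helper: print the soft nproc and nofile limits from a
# /proc/<pid>/limits file (soft limit is the first numeric column).
show_proc_limits() {
    limits_file=$1   # e.g. /proc/$(pidof mongod)/limits
    awk '/^Max processes/  {print "nproc",  $3}
         /^Max open files/ {print "nofile", $4}' "$limits_file"
}
```

For example: show_proc_limits /proc/$(pidof mongod)/limits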
Graylog Tuning
Update the heap size for Graylog
vi /etc/sysconfig/graylog-server
Change the -Xms2g and -Xmx2g values to an appropriate size and leave the other options on that line as they are (remember to test your changes)
GRAYLOG_SERVER_JAVA_OPTS="-Xms8g -Xmx8g ..."
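If you would rather not edit the line by hand, sed can swap just the two heap values and leave the other JVM options alone. A minimal sketch assuming GNU sed; set_graylog_heap is a hypothetical helper, and the file path is passed in (/etc/sysconfig/graylog-server on RPM-based installs).

```shell
#!/bin/sh
# Hypothetical helper: replace only -Xms/-Xmx on the GRAYLOG_SERVER_JAVA_OPTS
# line; all other options and all other lines are left unchanged.
set_graylog_heap() {
    gl_conf=$1   # e.g. /etc/sysconfig/graylog-server
    gl_size=$2   # e.g. 8g
    sed -i \
        -e "/^GRAYLOG_SERVER_JAVA_OPTS=/ s/-Xms[0-9]*[kmgKMG]/-Xms$gl_size/" \
        -e "/^GRAYLOG_SERVER_JAVA_OPTS=/ s/-Xmx[0-9]*[kmgKMG]/-Xmx$gl_size/" \
        "$gl_conf"
}
```

For example, set_graylog_heap /etc/sysconfig/graylog-server 8g, then restart graylog-server.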
The following changes are made in the Graylog server.conf
vi /etc/graylog/server/server.conf
Find these entries and adjust as needed (Remember to test your changes)
# Enable GZIP support for REST API. This compresses API responses and therefore helps to reduce
# overall round trip times. This is disabled by default. Uncomment the next line to enable it.
rest_enable_gzip = true
# The size of the thread pool used exclusively for serving the REST API.
rest_thread_pool_size = 20
# Enable/disable GZIP support for the web interface. This compresses HTTP responses and therefore helps to reduce
# overall round trip times. This is enabled by default. Uncomment the next line to disable it.
web_enable_gzip = true
# The size of the thread pool used exclusively for serving the web interface.
web_thread_pool_size = 20
# The number of parallel running processors.
# Raise this number if your buffers are filling up.
processbuffer_processors = 5
outputbuffer_processors = 3
# Number of threads used exclusively for dispatching internal events. Default is 2.
async_eventbus_processors = 4
# Global timeout for communication with Graylog server nodes; default: 5s
timeout.DEFAULT=15s
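When the same settings have to be applied on several Graylog nodes, a helper that sets one key in server.conf can save some typing. This is a sketch, not a Graylog tool: set_conf is a hypothetical name, and it treats a line as a match when its key (after stripping any leading "#") equals the one given. The first matching line is replaced, other matching lines are dropped so the file never ends up with two values for the same key, and the setting is appended if the key is missing.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a server.conf-style file.
# Matching is by exact key name, so keys with dots (timeout.DEFAULT) are safe.
set_conf() {
    cfg_file=$1 cfg_key=$2 cfg_value=$3
    cfg_tmp=$(mktemp)
    awk -v k="$cfg_key" -v v="$cfg_value" '
        {
            line = $0
            sub(/^[ \t]*#?[ \t]*/, "", line)   # ignore a leading comment marker
            split(line, kv, "=")
            name = kv[1]
            gsub(/[ \t]+$/, "", name)          # trim spaces before "="
            if (name == k) {
                if (!done) { print k " = " v; done = 1 }
                next                           # drop any further matches
            }
            print
        }
        END { if (!done) print k " = " v }     # key not found: append it
    ' "$cfg_file" > "$cfg_tmp" && cat "$cfg_tmp" > "$cfg_file"
    rm -f "$cfg_tmp"
}
```

For example, set_conf /etc/graylog/server/server.conf processbuffer_processors 5, then restart Graylog so the change is picked up.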
These settings may have a negative performance impact, but they increase usability:
# Do you want to allow searches with leading wildcards? This can be extremely resource hungry and should only
# be enabled with care. See also: https://www.graylog.org/documentation/general/queries/
allow_leading_wildcard_searches = true
# Do you want to allow searches to be highlighted? Depending on the size of your messages this can be memory hungry and
# should only be enabled after making sure your Elasticsearch cluster has enough memory.
allow_highlighting = true
Some additional tweaks that may or may not give you good results:
I don't currently apply any of these tweaks on my setup, but they have been discussed on the Graylog mailing list in the past: https://groups.google.com/forum/#!searchin/graylog2/cgroup_disable$3Dmemory/graylog2/gm6NrMMBdY8/Ztib7r2jCgAJ
- Changing the disk elevator
- Disabling numad
- Disabling cgroups (add cgroup_disable=memory to GRUB_CMDLINE_LINUX in /etc/default/grub)
- Disabling tmpfs (comment out the tmpfs entry in /etc/fstab)
- Adjust mount options (noatime,nodiratime,nobarrier)
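Before changing the disk elevator, it helps to know which scheduler each device uses today. A sketch: show_schedulers is a hypothetical helper that reads /sys/block/<device>/queue/scheduler, where the kernel marks the active scheduler in brackets. A scheduler can be switched at runtime by writing its name to that file (for example echo deadline > /sys/block/sda/queue/scheduler), but that change does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: print "<device> <active scheduler>" for every block
# device that exposes a queue/scheduler file.
show_schedulers() {
    sys_block=${1:-/sys/block}
    for sched_file in "$sys_block"/*/queue/scheduler; do
        [ -r "$sched_file" ] || continue   # no match or unreadable: skip
        dev=$(basename "$(dirname "$(dirname "$sched_file")")")
        active=$(sed -n 's/.*\[\([^]]*\)].*/\1/p' "$sched_file")
        echo "$dev $active"
    done
}
```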
I am sure there are many other things to add and tweak, but this post should be a good starting point. For performance tuning from a more general point of view, this is another good article: https://wiki.mikejung.biz/OS_Tuning