Mongos is the MongoDB query router (https://docs.mongodb.com/manual/reference/program/mongos/), and the usual recommendation is to run a mongos process locally, alongside the service that will use it.
For traditional applications that is totally fine, but when you run in containers you need to be aware of two things.
1- The dynamic nature of containers makes the usage of mongos a bit inefficient
2- Mongos is not cgroups aware
Let me go into details:
1- The dynamic nature of containers makes the usage of mongos a bit inefficient:
Let’s say, for example, that you are running a Kubernetes cluster with a Pod that contains both the application container and a mongos container. The flexibility that Kubernetes, or any other container orchestrator, gives us means that containers are frequently removed and recreated, so the mongos process will be restarted often.
This puts a lot of load on the config servers: each mongos process reads all of the cluster’s config metadata every time it starts (yes, including the chunk metadata). If we are constantly restarting (or rescheduling, or reassigning) mongos containers, that metadata will be read over and over, causing a lot of overhead on the config servers of a decent-sized cluster (ours has more than 4 million chunks).
The way to solve this is not to follow the recommendation of running a mongos process alongside the application, but instead to run one mongos per server. In Kubernetes this can be done with a DaemonSet; the tricky part is forcing the connections to stay local instead of going through kube-proxy, which would redirect them to a mongos on another server.
I can explain how to achieve this in the next post, or if someone wants to know now, send me a message.
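To give a rough idea of the per-server approach, here is a minimal DaemonSet sketch (the names, image tag and config-server addresses are placeholders, not from a real setup); note it only covers running one mongos per node, not the local-routing part:

```yaml
# Hypothetical sketch: one mongos per Kubernetes node via a DaemonSet.
# Image tag and --configdb connection string are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mongos
spec:
  selector:
    matchLabels:
      app: mongos
  template:
    metadata:
      labels:
        app: mongos
    spec:
      containers:
      - name: mongos
        image: mongo:4.0
        command:
        - mongos
        - --configdb
        - configRS/cfg0.example.com:27019,cfg1.example.com:27019,cfg2.example.com:27019
        ports:
        - containerPort: 27017
          hostPort: 27017   # expose mongos on each node's own IP
```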
2- Mongos is not cgroups aware:
A lot has been said about tips for running Java in containers because the JVM is not cgroups aware. The same happens with mongos.
The issue is related to the connection pools that mongos creates. By default, it creates one connection pool per CPU core, and since it is not cgroups aware, it will ‘see’ all the CPUs of the host, no matter how many you assign to the container.
Example: you run a mongos container on a host with 32 cores and assign only 2 of them to the container. Mongos will still create 32 connection pools. If multiple mongos containers on the same host each create 32 connection pools, they will fight for the CPU, slowing down connections and thrashing the performance of your system.
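The same host-vs-container mismatch is easy to demonstrate in any non-cgroups-aware process. The sketch below (assuming cgroup v1 CFS quota paths, which may differ on your distribution) compares what the process ‘sees’ with the CPU quota actually granted to its container:

```python
import os

# A non-cgroups-aware process (like mongos) sizes itself from the
# host's CPU count, which is what os.cpu_count() reports here.
host_cpus = os.cpu_count()

def cgroup_cpu_limit(quota_path="/sys/fs/cgroup/cpu/cpu.cfs_quota_us",
                     period_path="/sys/fs/cgroup/cpu/cpu.cfs_period_us"):
    """Return the container's effective CPU limit from the cgroup v1
    CFS quota files, or None if unlimited or unavailable."""
    try:
        with open(quota_path) as f:
            quota = int(f.read())
        with open(period_path) as f:
            period = int(f.read())
    except (OSError, ValueError):
        return None
    if quota <= 0:  # -1 means "no limit"
        return None
    return quota / period

print("host CPUs seen:", host_cpus)
print("cgroup CPU limit:", cgroup_cpu_limit())
```

Inside a 2-CPU container on a 32-core host, the first line reports 32 while the second reports 2.0, which is exactly the mismatch that leads mongos to create 32 connection pools.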
The way to avoid this is a ‘kind of internal’ parameter in the mongos configuration: taskExecutorPoolSize (https://docs.mongodb.com/manual/reference/parameters/#param.taskExecutorPoolSize).
Set it to the same number of cores you assigned to the container. Take into account, however, that the minimum value is 4, so if you assign 2 cores to the container there will still be a bit of fighting for the CPU with other mongos containers.
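For example, with 4 cores assigned to the container, the parameter can be set in the mongos configuration file like this (a sketch; adjust the value to match your container’s CPU allocation):

```yaml
# mongos configuration file: pin the number of connection pools
# instead of letting mongos derive it from the host's core count.
setParameter:
  taskExecutorPoolSize: 4   # minimum accepted value is 4
```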