How an exhausted inotify watch limit caused kubelet instability

Problem

After restarting kubelet, one of the Kubernetes nodes started behaving unpredictably.

The symptoms were unusual and did not point to an obvious root cause.

The following issues were observed:

The most likely causes were checked first:

None of these hypotheses were confirmed.

The root cause turned out to be an exhausted inotify watch limit on the node.
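The write-up does not name the exact tunable. Assuming it was the watch limit, the relevant kernel setting is `fs.inotify.max_user_watches`, which sits next to `max_user_instances` and `max_queued_events` under `/proc/sys/fs/inotify`. Below is a minimal Python sketch for reading all three. The function name is hypothetical, and the `base` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path

# Kernel tunables that cap inotify usage; on Linux they live under /proc/sys/fs/inotify.
INOTIFY_TUNABLES = ("max_user_watches", "max_user_instances", "max_queued_events")


def read_inotify_limits(base: str = "/proc/sys/fs/inotify") -> dict[str, int]:
    """Return the current inotify limits as integers, keyed by tunable name.

    `read_inotify_limits` is a hypothetical helper for illustration.
    """
    return {
        name: int((Path(base) / name).read_text().strip())
        for name in INOTIFY_TUNABLES
    }
```

On a node, `read_inotify_limits()` returns the live values. The same number for the watch limit can be checked with `sysctl fs.inotify.max_user_watches`.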

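inotify is a Linux kernel subsystem that lets processes monitor files and directories for changes. Each watched file or directory consumes one watch, and the kernel caps the number of watches per user.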
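To make the "watch" unit concrete, here is a minimal sketch that calls the `inotify_init1(2)` and `inotify_add_watch(2)` system calls through Python's `ctypes`, assuming a Linux host. Each successful `inotify_add_watch` on a file or directory not yet watched by that instance consumes one watch from the per-user budget. Once that budget is spent, the call fails with `ENOSPC`, which surfaces as the misleading "No space left on device". `watch_directory` is a hypothetical helper, not part of any library.

```python
import ctypes
import os

# Assumption: Linux. CDLL(None) exposes the libc symbols already loaded into Python.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_init1.restype = ctypes.c_int
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
_libc.inotify_add_watch.restype = ctypes.c_int

# Event masks from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200


def _check(ret: int) -> int:
    """Turn a negative libc return value into an OSError carrying errno."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def watch_directory(path: str) -> tuple[int, int]:
    """Create an inotify instance and add one watch for `path` (hypothetical helper).

    Returns (instance_fd, watch_descriptor). When the per-user watch limit
    is exhausted, inotify_add_watch fails and this raises OSError(ENOSPC).
    """
    # IN_CLOEXEC has the same value as O_CLOEXEC.
    fd = _check(_libc.inotify_init1(os.O_CLOEXEC))
    try:
        mask = IN_CREATE | IN_DELETE | IN_MODIFY
        wd = _check(_libc.inotify_add_watch(fd, os.fsencode(path), mask))
    except OSError:
        os.close(fd)  # do not leak the instance if adding the watch failed
        raise
    return fd, wd
```

Closing the returned file descriptor releases every watch the instance held, so a process that leaks inotify instances also leaks watches.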

Many infrastructure components depend on inotify:

When the limit is reached, unexpected symptoms can appear:

These issues are difficult to diagnose because they can remain invisible for a long time.
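Part of what makes this hard to spot is that procfs exposes the limits but no obvious counter of current usage. One common way to find the largest consumers is to walk `/proc`. An inotify file descriptor is a symlink to `anon_inode:inotify`, and each watch it holds appears as one `inotify wd:` line in the matching `/proc/<pid>/fdinfo/<fd>` file. Below is a minimal sketch of that scan. The function names are hypothetical, and it needs root to see processes owned by other users.

```python
import os
from pathlib import Path


def count_watches(fdinfo_text: str) -> int:
    """Each watch held by an inotify fd is one 'inotify wd:' line in its fdinfo."""
    return sum(1 for line in fdinfo_text.splitlines() if line.startswith("inotify wd:"))


def top_inotify_consumers(proc_root: str = "/proc", limit: int = 10) -> list[tuple[int, int, str]]:
    """Return (watch_count, pid, command) tuples, largest consumers first.

    Processes that exit mid-scan or cannot be read are skipped.
    """
    results = []
    for pid_dir in Path(proc_root).iterdir():
        if not pid_dir.name.isdigit():
            continue  # skip non-process entries such as /proc/sys or /proc/self
        try:
            watches = 0
            for fd_link in (pid_dir / "fd").iterdir():
                # inotify instances show up as symlinks to "anon_inode:inotify".
                if os.readlink(fd_link) == "anon_inode:inotify":
                    fdinfo = (pid_dir / "fdinfo" / fd_link.name).read_text()
                    watches += count_watches(fdinfo)
            if watches:
                comm = (pid_dir / "comm").read_text().strip()
                results.append((watches, int(pid_dir.name), comm))
        except OSError:
            continue  # process exited or permission denied
    results.sort(reverse=True)
    return results[:limit]
```

Running `top_inotify_consumers()` on an affected node should show which processes hold most of the watches. That is usually the first step before deciding whether to raise the limit or fix a leaking component.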

Most teams actively monitor:

inotify watch usage rarely makes it onto that list.

Not every Kubernetes issue originates inside Kubernetes itself.

Sometimes the root cause is hidden several layers lower in the stack.

A forgotten Linux system limit can cause more operational pain than a lack of compute resources.

As infrastructure grows, Linux system limits should be reviewed periodically alongside cluster resources.

After this incident, inotify watch usage was added to the list of proactively monitored metrics.
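Here is a sketch of the kind of check that could feed such monitoring, assuming a Prometheus-style text exposition. Because `fs.inotify.max_user_watches` is enforced per user, watches are summed per UID. Attributing them to each process's real UID from `/proc/<pid>/status` only approximates how the kernel charges them. The metric names, helper names, and the 80% threshold are hypothetical choices, not details taken from the incident.

```python
from pathlib import Path

# Hypothetical alerting threshold: flag a UID once it uses 80% of its watch limit.
WARN_RATIO = 0.8


def _real_uid(pid_dir: Path) -> int:
    """Read the real UID (first field of the 'Uid:' line) from /proc/<pid>/status."""
    for line in (pid_dir / "status").read_text().splitlines():
        if line.startswith("Uid:"):
            return int(line.split()[1])
    raise ValueError(f"no Uid line in {pid_dir / 'status'}")


def watches_per_uid(proc_root: str = "/proc") -> dict[int, int]:
    """Sum 'inotify wd:' fdinfo lines across all processes, grouped by real UID."""
    totals: dict[int, int] = {}
    for pid_dir in Path(proc_root).iterdir():
        if not pid_dir.name.isdigit():
            continue
        try:
            count = 0
            for info in (pid_dir / "fdinfo").iterdir():
                text = info.read_text()
                count += sum(1 for line in text.splitlines() if line.startswith("inotify wd:"))
            if count:
                uid = _real_uid(pid_dir)
                totals[uid] = totals.get(uid, 0) + count
        except OSError:
            continue  # process exited or is not readable
    return totals


def render_metrics(totals: dict[int, int], max_user_watches: int) -> str:
    """Render Prometheus text-format gauges (metric names are hypothetical)."""
    lines = [
        "# HELP node_inotify_watches inotify watches held by processes of a UID.",
        "# TYPE node_inotify_watches gauge",
    ]
    for uid, count in sorted(totals.items()):
        lines.append(f'node_inotify_watches{{uid="{uid}"}} {count}')
    lines += [
        "# HELP node_inotify_max_user_watches Value of fs.inotify.max_user_watches.",
        "# TYPE node_inotify_max_user_watches gauge",
        f"node_inotify_max_user_watches {max_user_watches}",
    ]
    return "\n".join(lines) + "\n"


def uids_near_limit(totals: dict[int, int], max_user_watches: int, ratio: float = WARN_RATIO) -> list[int]:
    """Return UIDs whose watch count has reached `ratio` of the per-user limit."""
    return sorted(uid for uid, count in totals.items() if count >= ratio * max_user_watches)
```

The rendered text could be served by any exporter that accepts the Prometheus text format, for example written to a `.prom` file for node_exporter's textfile collector. An alert on the ratio of the two gauges would then fire well before components start failing with `ENOSPC`.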