
Monitoring for email delivery problems

This post describes how I monitor for mail delivery issues, in particular mails whose delivery is blocked by the recipient's policy, on a small server running Debian Bookworm with Postfix and the monitoring-plugins-check-logfiles package.

If the system you are delivering to is RFC-compliant, i.e. follows the normal internet rules, and it refuses an email from you due to policy (such as denying email from your IP address), it should return an enhanced status code which starts with 5.7. This will show up in your /var/log/mail.log as a status containing

'dsn=5.7'
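
Before wiring this into monitoring, a quick manual count of such lines can be useful. The following is only a minimal sketch: the function name count_dsn57 is made up for illustration, and the dot is escaped so that it matches a literal '.' rather than any character.

```shell
#!/bin/sh
# count_dsn57 is a hypothetical helper name, not part of any package.
# It reads log lines on stdin and prints how many contain "dsn=5.7".
count_dsn57() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so "|| true" keeps the exit status clean.
    grep -c 'dsn=5\.7' || true
}
```

It could be used as: count_dsn57 < /var/log/mail.log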

I run an icinga2 system to monitor systems for errors. Whenever I discover a problem which monitoring had not picked up, and which could have been detected earlier, I don’t consider the issue resolved until I have added a check to look for it in future. For this particular case I use check_logfiles, which can be found in the Debian package monitoring-plugins-check-logfiles, installed on the outgoing mail server with these configuration items.

Mail server with /var/log/mail.log

Here are the items I added to a mail server running postfix with logs going to /var/log/mail.log, with other items already being monitored.

/etc/nagios/nrpe.d/check_logfiles_dsn57.cfg
command[check_logfiles_dsn57]=/usr/bin/sudo /usr/local/bin/check_logfiles_dsn57
/etc/sudoers.d/nagios_check_logfiles_dsn57
nagios ALL=(root) NOPASSWD:/usr/local/bin/check_logfiles_dsn57
/usr/local/bin/check_logfiles_dsn57
#!/bin/sh

/usr/lib/nagios/plugins/check_logfiles --tag=dsn57 --criticalpattern="dsn=5.7" --logfile=/var/log/mail.log
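
When testing the wrapper by hand, the exit status is what NRPE passes back to the monitoring server. The sketch below maps the conventional plugin exit codes (0, 1, 2, 3) to their state names; nagios_state_name is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# nagios_state_name is a hypothetical helper for illustration only.
# It maps a plugin exit status to the conventional state name.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3, and anything unexpected
    esac
}
```

For example, run /usr/local/bin/check_logfiles_dsn57 as root and then nagios_state_name $? to see the resulting state. Bear in mind that check_logfiles remembers its position in the log between runs, so a manual run is likely to consume the lines it reports.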

Mail server with journald

When transitioning away from traditional syslog, so that the postfix logs are in journald, check_logfiles needs a slight tweak. I patched it as shown in the patch attached to Debian bug #1060859, and changed the local script to be

/usr/local/bin/check_logfiles_dsn57
#!/bin/sh

/usr/lib/nagios/plugins/check_logfiles_journal_identifier --tag=dsn57 --type=journald:identifier='postfix/smtp' --criticalpattern="dsn=5.7"
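
To see by hand roughly what such a check would match, the journal can be queried directly. This is a sketch only and not how the patched plugin works internally: dsn57_from_journal is a made-up name, and it assumes the Postfix SMTP client logs under the identifier postfix/smtp, as in the script above.

```shell
#!/bin/sh
# dsn57_from_journal is a hypothetical helper name for illustration.
# $1 is a systemd time specification such as "1 hour ago".
dsn57_from_journal() {
    # --identifier selects journal entries by syslog identifier;
    # "postfix/smtp" is assumed to match the check configured above.
    journalctl --identifier=postfix/smtp --since "$1" --no-pager \
        | grep 'dsn=5\.7'
}
```

For example: dsn57_from_journal "1 hour ago". The function's exit status is grep's, so it is non-zero when nothing matched.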

Icinga2 server configuration

Because check_logfiles only reports log lines added since its previous run, the service check for check_logfiles_dsn57 is unlikely to find a new administratively denied mail the next time it runs (I have a check_interval of 1h), and so would clear the alert. I therefore use the following non-standard parameters for the nrpe service which does the check:

  check_command = "nrpe"
  check_interval = 1h
  retry_interval = 24h
  volatile = true
  vars.nrpe_command = "check_logfiles_dsn57"

Setting ‘retry_interval = 24h’ keeps the critical alert visible until the problem has been investigated; the alert is then cleared by re-running the check.

I have ‘nagstamon’ (from the Debian package) running on my desktop as a constant overview; it needs less screen space than the icingaweb page, which provides more information for a detailed investigation.