This post describes how I monitor for mail delivery issues, in particular mail whose delivery is blocked by the recipient's policy, on a small server running Debian Bookworm with Postfix and the monitoring-plugins-check-logfiles package.
If the system you are delivering to is RFC compliant, i.e. follows the normal internet rules, then when it refuses an email from you due to policy (such as denying email from your IP address), it should return a status code which starts with 5.7. This will show up in your /var/log/mail.log as a status containing
'dsn=5.7'
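As a quick manual check before setting up any monitoring, something like the following sketch can count such entries. The function name count_dsn57 is my own invention for illustration, and the default path assumes Postfix is logging to /var/log/mail.log as described in this post.

```shell
#!/bin/sh
# Count log lines that report a policy rejection (status 5.7.x).
# The dots are escaped so they match literally, not as regex wildcards.
# count_dsn57 is a hypothetical helper name, not part of any package.
count_dsn57() {
    # $1: path to a mail log; defaults to the path used in this post.
    # grep -c prints the number of matching lines (and exits 1 if none).
    grep -c 'dsn=5\.7\.' "${1:-/var/log/mail.log}"
}
```

Calling `count_dsn57` with no argument reads /var/log/mail.log, which normally needs root or membership of the adm group on Debian.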
I run an icinga2 system to monitor systems for errors. Whenever I discover a problem which monitoring had not picked up, but which could have been detected earlier, I don’t consider the issue resolved until I have added a check to look for it in future. For this particular case I use check_logfiles, which can be found in the Debian package monitoring-plugins-check-logfiles, installed on the outgoing mail server with these configuration items.
Mail server with /var/log/mail.log
Here are the items I added to a mail server running postfix with logs going to /var/log/mail.log, with other items already being monitored.
/etc/nagios/nrpe.d/check_logfiles_dsn57.cfg
command[check_logfiles_dsn57]=/usr/bin/sudo /usr/local/bin/check_logfiles_dsn57
/etc/sudoers.d/nagios_check_logfiles_dsn57
nagios ALL=(root) NOPASSWD:/usr/local/bin/check_logfiles_dsn57
/usr/local/bin/check_logfiles_dsn57
#!/bin/sh
/usr/lib/nagios/plugins/check_logfiles --tag=dsn57 --criticalpattern="dsn=5.7" --logfile=/var/log/mail.log
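Before relying on NRPE, it can help to run the wrapper by hand as the nagios user and look at its exit code. The sketch below assumes the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN); nagios_state is a hypothetical helper name of my own, not something shipped with the plugins.

```shell
#!/bin/sh
# Translate a Nagios-style plugin exit code into its state name.
# nagios_state is a hypothetical helper used only for illustration.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Usage sketch, commented out so the block has no side effects when sourced.
# Run as root: become the nagios user, then call sudo as the NRPE command does.
# sudo -u nagios /usr/bin/sudo /usr/local/bin/check_logfiles_dsn57
# nagios_state "$?"
```

If the sudoers entry is missing or wrong, the first command will prompt for a password or fail rather than printing plugin output, which is the same failure NRPE would otherwise hit silently.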
Mail server with journald
With the transition away from traditional syslog, the postfix logs go to journald, and check_logfiles needs a slight tweak to read them. I patched it as shown in the patch attached to Debian bug #1060859, and changed the local script to be
/usr/local/bin/check_logfiles_dsn57
#!/bin/sh
/usr/lib/nagios/plugins/check_logfiles_journal_identifier --tag=dsn57 --type=journald:identifier='postfix/smtp' --criticalpattern="dsn=5.7"
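To see by hand what the journald variant of the check should be matching, a sketch along these lines may be useful. It assumes the same syslog identifier as above (postfix/smtp); recent_dsn57 is a hypothetical function name, and the default time window of one hour is only chosen to mirror my check_interval.

```shell
#!/bin/sh
# Show recent journal entries from postfix/smtp that carry a 5.7.x status.
# recent_dsn57 is a hypothetical helper name used only for illustration.
recent_dsn57() {
    # $1: how far back to look, in journalctl --since syntax.
    journalctl --identifier=postfix/smtp --since "${1:-1 hour ago}" --no-pager \
        | grep 'dsn=5\.7\.'
}
```

For example, `recent_dsn57 "yesterday"` widens the window; reading the system journal usually needs root or membership of the systemd-journal or adm group.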
Icinga2 server configuration
The next run of the service check for check_logfiles_dsn57 is unlikely to find a new administratively denied mail (I have a check_interval of 1h), so the alert would clear before I had looked at it. I therefore use the following non-standard parameters for the nrpe service which does the check
check_command = "nrpe"
check_interval = 1h
retry_interval = 24h
volatile = true
vars.nrpe_command = "check_logfiles_dsn57"
The ‘retry_interval = 24h’ setting leaves the critical alert for the problem visible until it has been investigated; re-running the check then clears it.
I have ‘nagstamon’ (from the Debian package) running on my desktop as a constant overview; it needs less screen space than the icingaweb page, which provides more information for a detailed investigation.