Blocking Spam in Exim with URI Block Lists

Current Version: 2.4
Released: 21 Aug 22

Written and maintained by Erik Mugele.

Introduction | Exim Configuration | Script Installation
License | Download | Change Log

Introduction

This document describes using SURBL (Spam URI Realtime Blocklists), URIBL, and the Spamhaus DBL in conjunction with the Exim MTA to block spam containing "spamvertizing" URLs. To achieve this, one can use the Perl script that is found below. This utilizes Exim's MIME and/or DATA ACLs and Exim's embedded Perl engine.

The Perl routine from this page should be relatively easy to modify to use in any other MTA that can call an external script to scan a message.

The SURBL, URIBL and Spamhaus DBL systems functions just like a normal DNSBL system but instead of containing a list of IPs of servers that send spam, they maintain a list of the domains that are found in the bodies of messages. These are the domains that are part of the URL the spammers want you to click on to buy their wares. It's quite an effective way of filtering for spam and can be used in conjunction with traditional DNSBLs for maximum effectiveness.

Exim Configuration

Exim MUST be compiled with the option to enable the embedded Perl engine. See Chapter 12 of the Exim Specification for details. By default, it is commented out in the official source distribution and thus will not be enabled when compiled. Exim's perl_startup runtime option is used to call the embedded Perl engine and define what file contains any Perl routines you want Exim to use. For example:

perl_startup=do '/etc/exim/exim.pl'

In addition to scanning the body of a plain text mail message, the Perl subroutine can scan any MIME attachment if Exim is using the exiscan additions (included in Exim 4.60 and higher).

The following two ACLs are working examples of Exim MIME and DATA ACLs that call the Perl subroutine to scan the message for blacklisted domain names in URLs. See Chapter 40 of the Exim Specification for details on Exim ACLs.

MIME ACL - This ACL should go in the MIME ACL section of the Exim configuration.

   deny condition = ${if <{$message_size}{100000}{yes}{no}}
        set acl_m0 = ${perl{surblspamcheck}}
	condition = ${if eq{$acl_m0}{false}{no}{yes}}
        message = $acl_m0

DATA ACL - This ACL should go in the DATA ACL section of the Exim configuration.

   deny condition = ${if <{$message_size}{100000}{yes}{no}}
        condition = ${if eq{$acl_m0}{}{yes}{no}}
	set acl_m1 = ${perl{surblspamcheck}}
	condition = ${if eq{$acl_m1}{false}{no}{yes}}
        message = $acl_m1

The second condition statement in the DATA ACL above ensures that the DATA ACL is only called if no MIME ACL was called (i.e. there were no MIME parts). This keeps the message from being scanned inefficiently twice by both the MIME and DATA ACLs.

The following two Exim options should be set in the main configuration area:

message_body_visible = 5000
message_body_newlines = true

The message_body_visible option will determine how much of of the body is scanned during the DATA ACL and the default value won't catch much. A value of 5000 is recommended.

The message_body_newlines option should be set to "true". This ensures the message body will be parsed correctly.

Script Installation and Configuration

Download and extract the contents of the gzipped tar file (see below). The archive contains four files:

README - README file similar to this page
two-level-tlds - List of two-level Top Level Domains
three-level-tlds - List of three-level Top Level Domains
surbl_whitelist.txt - The whitelist file with default entries
exim_surbl.pl - The Exim-SURBL Perl subroutine

If this is the only Perl subroutine in the Exim installation, copy exim_surbl.pl to the location specified in the perl_startup Exim configuration setting (mentioned above). If other subroutines are in use, append the contents of exim_surbl.pl to the existing file defined in perl_startup.

The two-level-tlds, three-level-tlds, and surbl_whitelist.txt files should be copied to the same location as the Perl subroutine script. It is recommended that these files be updated regularly (e.g. monthly) via a cron job from their original sources.

Two-level-tlds and Three-level-tlds Files

The Perl subroutine script follows the SURBL Implementation Guidelines found at the SURBL website. As such, the script makes use of files that contain two-level and three-level TLDs.

These files define domains generally have sub domains and need to have those sub-domains checked. For example, blogspot.com generally has sub-domains such as foo.blogspot.com that need to be checked.

Near the top of the script are the following variable definitions:

my $twotld_file = "/etc/exim/two-level-tlds";
my $threetld_file = "/etc/exim/three-level-tlds";

These variables MUST be set to the full path of the file containing the these lists.

If a domain has two or more elements, all levels from two through four are checked against the URIBL and Spamhaus DBL lists.

Surbl_whitelist File

As part of the SURBL Implementation Guidelines, the Perl subroutine script makes use of a whitelist file which contains certain known good domains such as yahoo.com which will never be blacklisted. The use of this whitelist file will prevent unnecessary queries.

Near the top of the script is the following variable definition:

my $whitelist_file = "/etc/exim/surbl_whitelist.txt";

This variable MUST be set to the full path of the file containing the whitelisted domains.

The file of whitelisted domains can contain additional domains that need to be whitelisted locally. The domains should be entered exactly one domain per line. Blank lines and those beginning with # (comments) are ignored. Entries that are IP addresses should be in IN-ADDR format (reversed). Here is an example of some simple whitelist entries:

### BEGIN SAMPLE WHITELIST ENTRIES
# This is a sample SURBL whitelist file
#
test.surbl.org
# The following is an example of an IP address entry for 127.0.0.2
2.0.0.127
### END SAMPLE WHITELIST ENTRIES

If a domain is listed in the whitelist file, not only will that domain be exempt from checking but all sub-domains to that domain will be exempt as well. For example, if example.com is whitelisted, foo.example.com and foo.bar.example.com will be also be exempt.

Enabling/Disabling Individual URI Checks

Since the Perl subroutine script now has the ability to check multiple databases, the ability has been added to disable individual checks. All are enabled by default. Near the top of the script are the following variables:

my $surbl_enable = 1;
my $uribl_enable = 1;
my $dbl_enable = 1;

Set any of these variables to 0 to disable the desired list check. While it will not produce an error, it should go without saying that disabling all of these checks would be a waste of resources.

Limiting MIME Attachment Scanning Based on Attachment Size

Scanning large MIME attachment can cause excessive load on the mail system. This situation can be exacerbated by the way Exim decodes the MIME attachments prior to scanning.

Near the top of the script is the following variable definition:

my $max_file_size = 50000;

Set this variable to be the maximum size of an attachment that will be scanned. If the attachment is larger than this size, scanning of that attachment will be skipped. By default, this size is 50KB. The variable is specified in bytes.

License

This script is under a BSD style license. See the script for details.

Download

Version 2.4 - exim_surbl-2.4.tar.gz
Version 2.3 - exim_surbl-2.3.tar.gz
Version 2.2 - exim_surbl-2.2.tar.gz
Version 2.1 - exim_surbl-2.1.tar.gz
Version 2.0 - exim_surbl-2.0.tar.gz
Version 1.9 - exim_surbl-1.9.tar.gz
Version 1.8 - exim_surbl-1.8.tar.gz
Version 1.7 - exim_surbl-1.7.tar.gz
Version 1.6 - exim_surbl-1.6.tar.gz
Version 1.5 - exim_surbl-1.5.tar.gz
Version 1.4 - exim_surbl-1.4.tar.gz
Version 1.3 - exim-1.3.pl
Version 1.2 - exim-1.2.pl
Version 1.1 - exim-1.1.pl
Version 1.0 - exim-1.0.pl

Changes

Version 2.4 - 21 Aug 22

Change the parsing of the "http://" to also check for "https://".
Made updates to support SURBL's not so new classification system.
Updated the Spamhause URL referenced in rejection messages.
Other debug changes.
Thanks to CJ Fearnley (www.LinuxForce.net) for all of these changes and for providing a patch.
Regenerated surble_whitelist.txt, two-level-tlds, and three-level-tlds files.

Version 2.3 - 06 Mar 12

Added functionality to better parse "http://" strings that are being encoded. Thanks to Gordon Dickens for spotting this in the wild and providing test data.
Regenerated surbl_whitelist.txt, two-level-tlds, and three-level-tlds files.

Version 2.2 - 26 Dec 10

Fixed a problem with the wrong variable used when checking DBL lists. Copy/paste problem.
Fixed a problem with tables being initialized in the wrong place causing some domains not to be checked when multiple domains exist in a message.
Regenerated surble_whitelist.txt, two-level-tlds, and three-level-tlds files.

Version 2.1 - 10 Jun 10

Major rewrite to support new SURBL Implementation Guidelines.

The script is significantly longer than previous versions. While the script could be compacted, it is more maintainable.

Added support for the Spamhaus DBL list.
Removed ccTLD.txt file.
Added two-level-tlds and three-level-tlds files.
Regenerated surbl_whitelist.txt file.

Version 2.0 - 06 Jan 07

Fixed a bug that could cause false positives in VERY rare circumstances when legitimate and spam messages come in the same connection.
Fixed a parsing problem with quoted-printable messages when the domain name has a line break in it.
Regenerated ccTLD.txt file.
Regenerated surbl_whitelist.txt file.

Version 1.9 - 12 Oct 06

Moved some file IO functions to make them more efficient.
Made the lookups more efficient by only looking up domains once even if they appear in the message more than once.
Make whitelisting mandatory and include a whitelisting file with data that the SURBL will never blacklist.
Regenerated ccTLD.txt file.
Regenerated surbl_whitelist.txt file.

Version 1.8 - 26 Sep 06

Fixed a problem with the decoding of ASCII obfuscated URLs that cause the script to crash.
Fixed several other logic errors related to the decoding of obfuscated URLs in general.
Thanks to Craig Whitmore for pointing out the original issue and especially for providing excellent diagnostics when reporting the problem.

Version 1.7 - 4 Sep 06

Added support for limiting the size of MIME attachments scanned.
Updated ccTLD.txt file.

Version 1.6 - 1 May 06

Add support for checking decoded MIME attachments (i.e. base64).
Updated ccTLD.txt file.

Version 1.5 - 29 Mar 06

Add support for the URIBL.
Moved the error message from the Exim data ACL into the Perl script.
Updated ccTLD.txt file.

Version 1.4 - 20 Feb 06

Rewrite a good portion of the script to more closely follow the Implementation Guidelines found on the SURBL website.
This means that the script now REQUIRES the use of a file containting a list of Country Code Top Level Domains (ccTLD).
The script is now more efficient and does much fewer lookups for each URL found in the message.

Version 1.3 - 17 Jun 05

Make the http:// delimeter case insensitive. Yes, spammers were trying this.

Version 1.2 - 1 May 05

Fix regression introduced in 1.1.

Version 1.1 - 29 Apr 05

Added support for whitelists and a license.

Version 1.0

9 Nov 04 - Added support to handle quoted-printable hex characters.
8 Nov 04 - Semi-major rewrite to handle domains of any length. Previously it was limited to just two level domains.
3 Oct 04 - To handle % ASCII obfuscated URLs commonly found in Phishing spam. Also to handle SURBL's [jp] list.

Last modified: Sunday, 21-Aug-2022 21:14:56 MDT :: Erik Mugele :: Main Page :: Contact Page :: Powered by Teuton

xyzzy