mod_antispam
mod_antispam is an Apache module that controls referer spam.
mod_antispam for Apache-2.0 / Apache-2.1
WHAT IS THIS ?
With this module you can control referer spam accesses.
As you may know, referer spam sometimes shows up in your log files. Its purpose is to lure you to a spam website by recording that website's address in your log files.
For more about referer spam, see http://www.spywareinfo.com/articles/referer_spam/
Spammers typically use bots or tools to connect to your website with an invalid referer.
When the HTTP server receives an HTTP_REFERER from a client, mod_antispam connects to that website and looks for a link to your website on the target page.
If your address is not found, the module automatically adds the referer to the blacklist so that it does not have to connect there again. If your address is found, it adds the referer to the whitelist instead, again so that it does not have to connect there again.
You can also edit the white/black lists by hand, using regular expressions.
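The check described above can be sketched in a few lines of Python. This is only an illustration, not the module's actual C code: the function names `links_back` and `classify` are hypothetical, and the plain substring search is an assumption standing in for whatever matching the module really performs on the downloaded page.

```python
def links_back(page_body: str, my_site: str) -> bool:
    """Return True if the downloaded referer page mentions my_site.

    Assumption: a plain substring search is used here; the real module
    may match links differently.
    """
    return my_site in page_body


def classify(page_body: str, my_site: str) -> str:
    """Name the automatic list ("white" or "black") the referer belongs in."""
    return "white" if links_back(page_body, my_site) else "black"


if __name__ == "__main__":
    honest = '<a href="http://www.example.com/">a site I like</a>'
    spam = '<a href="http://pills.example.net/">buy now</a>'
    print(classify(honest, "http://www.example.com/"))
    print(classify(spam, "http://www.example.com/"))
```

The first page contains the site address and would go to the automatic whitelist; the second does not and would go to the automatic blacklist.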
REFERER SPAM MECHANISM
The most important point is that the HTTP_REFERER in your log files is generated by the client's web browser. Therefore, anyone who understands the referer mechanism can fake HTTP_REFERER, either with a tool or by hand.
Here is an example.
% telnet your.website.example.com 80
GET / HTTP/1.1
Host: your.website.example.com
Referer: http://www.google.com/
Connection: close

(contents will be displayed here)
http://www.google.com/ is then recorded in your access log, even though http://www.google.com/ has no link to your website.
mod_antispam ACTION
When this module finds a spam URI, it can take one of the following actions.
(1) [Test]
Record the spam address in the blacklist and allow the access (test mode).
(2) [Replace]
Record the spam address in the blacklist, rewrite HTTP_REFERER to none, and allow the access.
With this method, access is allowed and the spam address is not written to your log file.
(3) [Reject]
Record the spam address in the blacklist and return HTTP_FORBIDDEN (access denied).
(4) [ReplaceReject]
Record the spam address in the blacklist, rewrite HTTP_REFERER to none, and deny the access.
With this method, access is denied and the spam address is not written to your log file.
In some cases (3) or (4) is dangerous, because some websites need cookies to display their pages, some sites are protected by authentication (e.g. a BBS in a groupware system), and some HTTP_REFERER values may be intranet addresses
(e.g. http://127.0.0.1/bookmark.html, http://intranet/bookmarks.html).
This module does not support cookies and cannot connect to websites that require authentication, because it does not know the username or password. Legitimate referers of this kind can therefore end up in the blacklist.
Start with Test or Replace mode, and switch to another mode, if you need to, once you have been able to analyze the spam URIs.
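The four actions can be summarized as a small decision function. The sketch below is illustrative Python, not the module's code: the name `apply_action` is hypothetical, `None` stands in for a referer rewritten to none, and 200/403 stand in for normal handling and HTTP_FORBIDDEN. The blacklist update, which every action performs, is left out.

```python
HTTP_OK = 200
HTTP_FORBIDDEN = 403
ACTIONS = ("Test", "Replace", "Reject", "ReplaceReject")


def apply_action(action, referer, is_spam):
    """Return (status, referer_to_log) for one request.

    Assumption: None represents "HTTP_REFERER rewritten to none".
    """
    if action not in ACTIONS:
        raise ValueError("unknown action: %r" % (action,))
    if not is_spam:
        return HTTP_OK, referer  # ham is never touched
    replace = action in ("Replace", "ReplaceReject")
    reject = action in ("Reject", "ReplaceReject")
    logged = None if replace else referer
    status = HTTP_FORBIDDEN if reject else HTTP_OK
    return status, logged


if __name__ == "__main__":
    for name in ACTIONS:
        print(name, apply_action(name, "http://spam1.example.com/", True))
```

Only Replace and ReplaceReject keep the spam address out of the log, and only Reject and ReplaceReject deny the request.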
INSTALL
If your Apache supports shared modules, installation is easy.
# /usr/local/apache2/bin/apxs -a -i -c mod_antispam.c
CONFIGURATION
required section
- AntispamEnable (on/off, default=off)
Enable or disable this module.
- AntispamWhiteList (filename, default=none)
Whitelist file path. You can edit this file by hand, using regular expressions. The file is not created automatically: you have to create it and set proper permissions (writable by the HTTP user) before running Apache.
- AntispamBlackList (filename, default=none)
Blacklist file path. You can edit this file by hand, using regular expressions. The file is not created automatically: you have to create it and set proper permissions (writable by the HTTP user) before running Apache.
- AntispamAutoWhiteList (filename, default=none)
Whitelist file that the module updates automatically. You should not edit it by hand. The file itself is not created automatically: you have to create it and set proper permissions (writable by the HTTP user) before running Apache.
- AntispamAutoBlackList (filename, default=none)
Blacklist file that the module updates automatically. You should not edit it by hand. The file itself is not created automatically: you have to create it and set proper permissions (writable by the HTTP user) before running Apache.
optional section
- AntispamAction (Test/Replace/Reject/ReplaceReject, default=Test)
Defines the action taken when spam is detected.
Test: update white/black lists. All accesses are allowed.
Replace: update white/black lists and replace the spam referer with none. All accesses are allowed.
Reject: update white/black lists and deny referer spam with HTTP_FORBIDDEN. The spam URI is still stored in the log files.
ReplaceReject: update white/black lists, replace the spam referer with none, and deny referer spam with HTTP_FORBIDDEN.
- AntispamTarget (FQDN/FULL, default=FULL)
mod_antispam updates the white/black lists automatically by adding spam/ham URIs to the files. If this setting is FQDN, only the FQDN part of the HTTP_REFERER is saved in the data file. If it is FULL, the full URI is saved.
- AntispamSizeLimit (integer: bytes, default=100000)
When this module receives an HTTP_REFERER from a client, it connects to that target and downloads its contents. This setting limits the download size.
- AntispamTimeout (integer: seconds, default=5)
Connection timeout.
- AntispamRetry (integer, default=3)
Retry count for connection errors. If errors persist after this many retries, the referer is added to the blacklist.
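To make the difference between the two AntispamTarget values concrete, here is a small Python sketch. It is an assumption about what "only the FQDN part" means (the bare host name, without scheme or path); the exact text the module writes to its data files may differ. The function name `list_entry` is hypothetical.

```python
from urllib.parse import urlsplit


def list_entry(referer: str, target: str = "FULL") -> str:
    """Return the text that would be stored for a referer.

    FULL keeps the whole URI; FQDN keeps only the host name
    (assumed format, see the note above).
    """
    if target == "FULL":
        return referer
    if target == "FQDN":
        host = urlsplit(referer).hostname
        # Fall back to the raw value if no host can be parsed.
        return host if host is not None else referer
    raise ValueError("target must be FQDN or FULL")


if __name__ == "__main__":
    url = "http://foo.bar.example.org/foo/bar.html"
    print(list_entry(url, "FULL"))
    print(list_entry(url, "FQDN"))
```

With FQDN, one entry covers every page on the same host, which keeps the lists shorter but also makes each entry broader.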
STEP BY STEP
When you first install this module, the following configuration is recommended. As explained above, you have to create the black/white list files and set proper permissions so that the HTTP user can update them.
AntispamEnable on
AntispamAction Test
AntispamWhiteList logs/antispam.white
AntispamBlackList logs/antispam.black
AntispamAutoWhiteList logs/antispam.white.auto
AntispamAutoBlackList logs/antispam.black.auto
After some days or months, you will find many spam accesses in antispam.black.auto. You should then copy the spam URIs and paste them into antispam.black by hand. Likewise, if you find non-spam URIs in antispam.black.auto, copy them and paste them into antispam.white. Of course, you can also define them with regular expressions.
Here is an example.
After some weeks, you might have addresses like these.
(Note: in this case a non-spam URI has been recorded in the blacklist.)
- logs/antispam.black.auto
http://spam1.example.com/
http://spam2.example.net/
- logs/antispam.white.auto
http://bluecoara.net/
http://foo.bar.example.org/foo/bar.html
- logs/antispam.black
(empty unless you edit it by hand)
- logs/antispam.white
(empty unless you edit it by hand)
You should then edit these files by hand, for example as follows. This is not required, but it is recommended as a way to manage and understand spam.
- logs/antispam.black.auto
(empty)
- logs/antispam.white.auto
(empty)
- logs/antispam.black
http://spam1.example.com/
http://spam2.example.net/
- logs/antispam.white
http://bluecoara.net/
http://foo.bar.example.org/foo/bar.html
http://www.example.net/this/is/not/spam.html
After editing, modify httpd.conf and change AntispamAction to Replace, Reject, or ReplaceReject.
USEFUL SAMPLE
- allow all *.jp referers
^http://[^/]+\.jp
- allow google referers
^http://([^/]+\.|)google\.com
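The two sample patterns can be tried out before they go into a whitelist. The Python sketch below assumes that the module's regular expression engine treats these particular patterns the same way Python's `re` module does, and that a rule matches when it is found anywhere in the referer; the helper name `is_allowed` is hypothetical.

```python
import re

# The two whitelist samples, copied verbatim from above.
ALLOW_JP = re.compile(r"^http://[^/]+\.jp")
ALLOW_GOOGLE = re.compile(r"^http://([^/]+\.|)google\.com")


def is_allowed(referer: str) -> bool:
    """Return True if any sample whitelist pattern matches the referer."""
    return any(p.search(referer) for p in (ALLOW_JP, ALLOW_GOOGLE))


if __name__ == "__main__":
    for url in (
        "http://www.example.co.jp/page",
        "http://www.google.com/search?q=x",
        "http://google.com/",
        "http://spam1.example.com/",
        "http://google.com.example.net/",
    ):
        print(url, is_allowed(url))
```

Under these assumptions the first three referers are allowed and http://spam1.example.com/ is not. The last one is also allowed, because neither pattern is anchored after the domain name; appending (/|$) to a pattern is one way to tighten it if that matters for your site.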
LOOP ?
If you are using this module on "http://www.example.com/" and someone connects to your website with HTTP_REFERER forged to "http://www.example.com/", mod_antispam will connect to your own website.
However, once this module has connected to a website, the white/black lists are updated, and if an address is already in your lists, the module never connects to it again, provided your settings are correct. Therefore you do not need to worry about connection loops.
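The reason no loop occurs is the order of operations: the lists are consulted first, and a connection is made only for a referer that is in neither list. The following Python sketch illustrates that order. It is not the module's code: the class `RefererChecker` and the `fetch` callable are hypothetical, the lists are kept in memory instead of in files, and entries are compared literally rather than as regular expressions.

```python
class RefererChecker:
    """Check referers, consulting the lists before any network access."""

    def __init__(self, my_site, fetch):
        self.my_site = my_site
        self.fetch = fetch  # hypothetical callable: url -> page body
        self.white = set()
        self.black = set()

    def is_spam(self, referer):
        # Known referers are answered from the lists, with no connection.
        if referer in self.white:
            return False
        if referer in self.black:
            return True
        # Unknown referer: connect once, then remember the verdict.
        body = self.fetch(referer)
        if self.my_site in body:
            self.white.add(referer)
            return False
        self.black.add(referer)
        return True


if __name__ == "__main__":
    calls = []

    def fake_fetch(url):
        calls.append(url)
        return "<html>no links here</html>"

    checker = RefererChecker("http://www.example.com/", fake_fetch)
    checker.is_spam("http://www.example.com/")
    checker.is_spam("http://www.example.com/")
    print(len(calls))  # 1: the second request is answered from the list
```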
USER-AGENT
When mod_antispam connects to the target, it sends "User-Agent: mod_antispam" by default. You can change the User-Agent by modifying the source.
PERFORMANCE
When a client connects to Apache, this module connects to the HTTP_REFERER it sent, which takes a few seconds the first time.
Once mod_antispam has connected to the target, it updates the white/black lists. After that, the module no longer connects to the target and only refers to the white/black lists on the server. However, reading the white/black lists and comparing the referer against them also takes time. Therefore, if the white/black lists are too large, Apache will slow down.
Here is some performance data.
- apache default
Concurrency Level: 10
Time taken for tests: 0.267426 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 271000 bytes
HTML transferred: 27000 bytes
Requests per second: 3739.35 [#/sec] (mean)
Time per request: 2.674 [ms] (mean)
Time per request: 0.267 [ms] (mean, across all concurrent requests)
Transfer rate: 987.19 [Kbytes/sec] received
- mod_antispam enabled (1000 lines each)
Each of the white/black/autowhite/autoblack lists had 1000 lines, and the target URI was added at the bottom of the blacklist.
Concurrency Level: 10
Time taken for tests: 41.905376 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 271000 bytes
HTML transferred: 27000 bytes
Requests per second: 23.86 [#/sec] (mean)
Time per request: 419.054 [ms] (mean)
Time per request: 41.905 [ms] (mean, across all concurrent requests)
Transfer rate: 6.30 [Kbytes/sec] received
- mod_antispam enabled (100 lines each)
Each of the white/black/autowhite/autoblack lists had 100 lines, and the target URI was added at the bottom of the blacklist.
Concurrency Level: 10
Time taken for tests: 4.387564 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 272084 bytes
HTML transferred: 27108 bytes
Requests per second: 227.92 [#/sec] (mean)
Time per request: 43.876 [ms] (mean)
Time per request: 4.388 [ms] (mean, across all concurrent requests)
Transfer rate: 60.40 [Kbytes/sec] received
You should write rules as regular expressions rather than build large white/black lists. I also plan to support BerkeleyDB in the future to improve performance.
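The figures above can be compared with a little arithmetic. The Python sketch below uses only the mean "Time per request" values from the three runs; the constant and function names are made up for the example.

```python
# Mean time per request in milliseconds, taken from the three runs above.
BASELINE_MS = 2.674          # apache default
WITH_100_LINES_MS = 43.876   # 100 lines per list
WITH_1000_LINES_MS = 419.054 # 1000 lines per list


def slowdown(ms: float, baseline: float = BASELINE_MS) -> float:
    """Return how many times slower a run is than the baseline."""
    return ms / baseline


if __name__ == "__main__":
    print(round(slowdown(WITH_100_LINES_MS), 1))
    print(round(slowdown(WITH_1000_LINES_MS), 1))
    print(round(WITH_1000_LINES_MS / WITH_100_LINES_MS, 1))
```

In this measurement, 100-line lists make a request about 16 times slower than the default and 1000-line lists about 157 times slower. Ten times as many lines cost about 9.6 times as much per request, so in this test the cost grows roughly in proportion to list length, which is why a few regular expressions are preferable to many literal entries.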
DOWNLOAD
BLACKLISTS
Here is my current blacklist.
TODO
- SSL support
Does the Apache API not support SSL connections?
- BerkeleyDB support
Faster than text files, but regular expressions cannot be used.
- DNSBL support
To share spam databases. This will be supported in the next version.
