- changed status to open
CLF Probes blocks Googlebot
“CLF Probes” blocks Googlebot:
example.com-access_log:66.249.70.10 - - [16/Aug/2020:15:57:36 +0300] "GET /index.php?option=com_content&view=section&layout=blog&id=12&Itemid=81 HTTP/1.1" 404 794 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.92 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" (0.047)
Here is how we can verify an IP is from Googlebot:
https://support.google.com/webmasters/answer/80553
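The check described on that page is a reverse DNS lookup on the IP, followed by a forward lookup on the returned name to confirm it maps back to the same IP. A minimal sketch, assuming host(1) is installed; resolve_ptr, resolve_a and is_googlebot are illustrative names and not part of sshguard:

```shell
#!/bin/sh
# Sketch of a forward-confirmed reverse DNS check for a crawler IP.
# resolve_ptr and resolve_a are hypothetical wrappers around host(1).

resolve_ptr() {
    # Print the PTR name for IP $1 without its trailing dot (nothing on failure).
    host "$1" 2>/dev/null |
        awk '/domain name pointer/ { sub(/\.$/, "", $NF); print $NF; exit }'
}

resolve_a() {
    # Print every IPv4 address that hostname $1 resolves to, one per line.
    host -t A "$1" 2>/dev/null | awk '/has address/ { print $NF }'
}

is_googlebot() {
    # Return 0 only if the PTR name is under googlebot.com or google.com
    # and that name resolves back to the original IP.
    ip=$1
    name=$(resolve_ptr "$ip")
    case "$name" in
        *.googlebot.com|*.google.com) ;;
        *) return 1 ;;
    esac
    resolve_a "$name" | grep -qxF "$ip"
}
```

For example, `is_googlebot 66.249.70.10 && echo "verified"` would print "verified" only when both lookups agree.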
I can modify /usr/local/libexec/sshg-fw-ipfw so that it does not block Googlebot, but I will still see entries like this in my system log file:
Blocking "66.249.70.10/32" for 240 secs (3 attacks in 1712 secs, after 2 abuses over 2785 secs.)
Also this file will be replaced in the next sshguard upgrade.
I think the best solution is to add a feature to sshguard: an option in sshguard.conf ( dynamicwhitelist = /usr/local/etc/sshguard.whitelist.sh ) so that, before blocking an address, sshguard executes:
/usr/local/etc/sshguard.whitelist.sh IP
The shell script receives the IP and does its checks. If it returns 0, sshguard blocks the IP; if it returns 1, sshguard does not block it.
Comments (14)
-
-
reporter Can you tell me what file I have to edit? I use FreeBSD.
-
reporter I found the file in /usr/local/sbin/sshguard, but if I make changes there I will have to make them again after each sshguard upgrade. Do you think it would be useful for others too to add an option to sshguard.conf upstream, so that when the option is enabled /usr/local/sbin/sshguard uses the filter?
-
Yes, that’s the right file to edit. Yes, if nothing is done, you’ll have to make those changes every time you upgrade.
I understand the desire to keep local changes outside of /usr/local/sbin/sshguard and in the configuration file, but the sshguard script was also written in shell to make it easy to adapt or change if the needs required it; for example, you can even replace sshg-parser with your own custom parser.
One possibility would be to essentially make sshguard.conf executable, and put everything that’s currently in sshguard in sshguard.conf. That would break existing configurations but make pretty much every part of SSHGuard “configurable”.
Another possibility could be to add a configuration option, something along the lines of POST_PARSER_FILTER, that optionally adds that to the second to last line, so that it would look like:
eval $tailcmd | $libexec/sshg-parser | ${POST_PARSER_FILTER} | \
    $libexec/sshg-blocker $flags | $BACKEND &
What do you think?
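A filter in that position would read parser output on standard input and pass through only the lines that should still reach sshg-blocker. A minimal sketch, assuming the attacker address is the second whitespace-separated field of each line (check this against your sshg-parser output); ALLOWLIST and filter_allowlisted are hypothetical names:

```shell
#!/bin/sh
# Hypothetical filter: drop parser lines whose address is on a static allowlist.
# ALLOWLIST is an assumed variable holding space-separated addresses.
ALLOWLIST="${ALLOWLIST:-66.249.70.10}"

filter_allowlisted() {
    awk -v allow="$ALLOWLIST" '
        BEGIN {
            # Build a lookup table from the space-separated allowlist.
            n = split(allow, a, " ")
            for (i = 1; i <= n; i++) skip[a[i]] = 1
        }
        # Pass through every line whose second field is not allowlisted,
        # flushing each line so the next pipeline stage sees it promptly.
        !($2 in skip) { print; fflush() }
    '
}
```

Saved as an executable script with a final `filter_allowlisted` line, this could be the command named by the proposed option.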
-
reporter Thank you very much for the help.
-
You’re welcome. I appreciate your feedback on what you’d like to see happen.
-
reporter In the end I wrote a shell script that runs every minute from cron. It reads the previous minute’s entries from /var/log/auth.log, finds the IPs blocked by sshguard, checks (with the method Google recommends) whether those IPs belong to Googlebot or Bing, removes any that do from the firewall and from /var/db/sshguard/blacklist.db, and then sends an e-mail to the admin.
#!/usr/local/bin/bash
HOSTNAME=`/bin/hostname -s`
previousMinute=`date -v -1M '+%b %d %H:%M:'`
for ip in `grep "${previousMinute}" /var/log/auth.log | grep "sshguard" | grep ": Blocking " | awk '{ print $7 }' | sed 's|"||g' | awk -F '/' '{print $1}' | sort | uniq`
do
    if `ipfw table 22 list | awk '{print $1}' | awk -F '/' '{print $1}' | sort | uniq | grep -q "${ip}"`; then
        HOSTRESULT="$(host -W 1 ${ip})"
        HOSTRESULT="$(echo $HOSTRESULT | awk '{print $5}' | sed 's/\.$//')"
        REGEX='.*(googlebot\.com|google\.com|search\.msn\.com)$'
        if [[ "$HOSTRESULT" =~ $REGEX ]]; then
            IPRESULT="$(host -W 1 ${HOSTRESULT})"
            IPRESULT="$(echo $IPRESULT | awk '{print $4}')"
            if [[ $IPRESULT = "$ip" ]]; then
                ipfw -q table 22 delete ${ip}
                sed -i "" -e "/|4|${ip}$/d" /var/db/sshguard/blacklist.db
                printf "${IPRESULT} (${HOSTRESULT})" | /usr/bin/mail -s "[SSHGUARD - ${HOSTNAME}] ${IPRESULT}" root
            fi
        fi
    fi
done
-
That’s certainly a way to do it. I would imagine it would have been easier just to filter out the Googlebot IPs in the filter pipeline?
-
reporter Yes, but I want to avoid making custom changes to sshguard files so that I don’t forget to redo them after an upgrade. I am OK with a search engine IP being temporarily blocked for up to 1 minute, and it doesn’t happen frequently.
-
For a permanent solution, would you like something like a POST_PARSER_FILTER option?
-
reporter I am not sure I understand this solution.
I will add POST_PARSER_FILTER in sshguard.conf with the path of a shell script that does the check? And it will automatically run the following command:
eval $tailcmd | $libexec/sshg-parser | ${POST_PARSER_FILTER} | \
    $libexec/sshg-blocker $flags | $BACKEND &
instead of the following command?
eval $tailcmd | $libexec/sshg-parser | ${POST_PARSER_FILTER} | \
    $libexec/sshg-blocker $flags | $BACKEND &
-
Instead of the current pipeline, without ${POST_PARSER_FILTER}. This way, the filter is optional and configurable from sshguard.conf.
-
- changed status to resolved
Added POST_PARSER option to sshguard.conf in 0e66380.
-
reporter And the post parser script to exclude the search engines:
#!/usr/local/bin/bash
while read SSHGUARD; do
    HOSTNAME=`/bin/hostname -s`
    IP=`echo "${SSHGUARD}" | awk '{print $2}'`
    HOSTRESULT="$(host -W 1 ${IP})"
    HOSTRESULT="$(echo $HOSTRESULT | awk '{print $5}' | sed 's/\.$//')"
    REGEX='.*(googlebot\.com|google\.com|search\.msn\.com|yandex\.ru|yandex\.net|yandex\.com|crawl\.baidu\.com|crawl\.yahoo\.net)$'
    if [[ "$HOSTRESULT" =~ $REGEX ]]; then
        IPRESULT="$(host -W 1 ${HOSTRESULT})"
        IPRESULT="$(echo $IPRESULT | awk '{print $4}')"
        if [[ "$IPRESULT" != "$IP" ]]; then
            echo "${SSHGUARD}"
        else
            printf "We did not block: ${IPRESULT} (${HOSTRESULT})" | /usr/bin/mail -s "[SSHGUARD - ${HOSTNAME}] ${IPRESULT}" root
        fi
    else
        echo "${SSHGUARD}"
    fi
done
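To wire a script like this in, the POST_PARSER option mentioned in the resolution would presumably be set to the script’s location. A sketch of the sshguard.conf line, where the path is a hypothetical example and the script is assumed to be executable:

```shell
# Fragment for sshguard.conf. The path below is a hypothetical example;
# point it at wherever the executable filter script is installed.
POST_PARSER="/usr/local/etc/sshguard-post-parser.sh"
```

sshguard would need a restart after the change so that the pipeline is rebuilt with the filter in place.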
What about inserting your own filter on the second to last line of the sshguard driver script?