Google CAPTCHA handling is failing.

Issue #249 resolved
Steve Malfroidt
created an issue

In version 4.9.0, in the ghdb module, after answering some CAPTCHAs, it exits with "IndexError: list index out of range", File usr/share/recon-ng/recon/mixins/, line 72, in _solve_google_captcha.

Comments (33)

  1. Mark Muir

    I've been having a similar issue. The HTTP code I'm getting is 403 (forbidden). This might be due to some personal modifications I've made to that module (so our issues might have different causes), although it could also be that google are temp IP banning automated dorking when they detect it. Next time it happens, immediately try to run:

    use recon/domains-hosts/google_site_web

    If that also errors out, it's probably google wielding the ban-hammer. If that's the case, it becomes a question of whether they've specifically fingerprinted this tool, or are detecting automated dorking in general. If it's the latter, not much we can really do other than rotating IP addresses. If it's the former, two ways I can think of to reduce chances of detection:

    • initialise with a randomised user-agent from a list of valid UAs (though I haven't tested if things still work using a javascript capable UA, although maybe 'no javascript' can be communicated via GET headers or something)
    • shuffling the list of dorks before running through them, so they don't search through in the exact same order every time

    EDIT: the speed at which they ban would also vary, depending on how 'dirty' your IP address already is (e.g. if you're using a popular VPN service)
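
    The two ideas above (a randomised user-agent and a shuffled dork order) could be sketched roughly as follows. This is only an illustration: `prepare_run` and the `USER_AGENTS` list are hypothetical names and values, not part of recon-ng, and whether Google still returns parsable results for any given user-agent is untested here.

```python
import random

# Illustrative pool of User-Agent strings (an assumption, not taken from recon-ng).
USER_AGENTS = [
    "Lynx/2.8.8rel.2 libwww-FM/2.14 SSL-MM/1.4.1",
    "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
]


def prepare_run(dorks, user_agents=USER_AGENTS, rng=random):
    """Pick one random User-Agent and return the dorks in a shuffled order.

    The input list is copied so the caller's ordering is left untouched.
    """
    shuffled = list(dorks)
    rng.shuffle(shuffled)
    headers = {"User-Agent": rng.choice(user_agents)}
    return headers, shuffled


if __name__ == "__main__":
    headers, order = prepare_run(["filetype:wsdl wsdl", "filetype:lit lit", "ext:cgi"])
    print(headers["User-Agent"])
    print(order)
```

    Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing which orderings get blocked.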

  2. Tim Tomes repo owner

    I think it's more a case of Google starting to mix in their new CAPTCHA stuff with the old CAPTCHA implementation that the search interface has leveraged for quite some time. I've heard reports of this, but have yet to see it for myself. Kinda hard to fix something I can't replicate and is random in nature to begin with. I'm not sure when/if this will be resolvable. Any data we can collect on this the better.

  3. Mark Muir

    Next time I go a-dorking I'll record with a mitm proxy and see what turns up. Just on a random tangent, do you guys have a preferred mitm proxy? I've been using the free edition of burp for some time, but wouldn't mind using something a little more flexible/a little less java.

  4. Tim Tomes repo owner

    Burp is the standard bearer. There is nothing else even close right now that I would recommend. But as far as Burp goes, buy yourself a pro license. It's worth every penny and affordable for individuals.

  5. grid

    Seeing the same behavior described by Steve Malfroidt, in recon-ng 4.9.2. I tried the trick Mark Muir suggested, and immediately saw the same error as listed in the original post. I typically run the GHDB module in 2 parts: once with only advisories and vulnerabilities, and a second time with all other options set to true and advisories and vulnerabilities set to false. The first part runs fine; it's the second run that is problematic.

  6. francesco

    Hi Tim, I had the same issue a few times these days. Last time it happened (today), I was running the module "recon/domains-vulnerabilities/ghdb".

    [!] IndexError: list index out of range
    File "/usr/share/recon-ng/recon/mixins/", line 72, in _solve_google_captcha.


  7. Tim Tomes repo owner

    I did some analysis and... I was afraid this day would come. Google has caught up with the rest of the world and is now using their newer reCAPTCHA system. I have neither the time nor inclination to find a way around it. I'll let this issue sit open for now, but if no one else wants to take a crack at it, I'll probably strip out all of the Google web search stuff moving forward. Such a sad day... :-(

  8. Melissa

    Noooooooooooo plz don't strip out Google web search. So far I haven't had any of those issues with the ghdb module yet, but i was doing some research in case i did...

    IndexError: list index out of range

    "Always keep in mind when you want to overcome this error, the default value of indexing and range starts from 0, so if total items is 100 then 99 in the range will be the last element.

    The way Python indexing works is that it starts at [0]. So if your list has 100 elements, the first element is at index [0] and the last is at index [99]. Subtract 1 and you should be fine."
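
    For context on the traceback itself: Python raises this error whenever code indexes past the end of a list, and an empty list has no valid index at all. The sketch below assumes (it is not confirmed by the source of `_solve_google_captcha`, which isn't shown in this thread) that a parser takes element `[0]` of a search result that came back empty because the page no longer contains the expected markup. The helper name and the HTML snippet are hypothetical.

```python
import re


def first_match(pattern, text):
    """Return the first regex match in text, or None when nothing matches."""
    matches = re.findall(pattern, text)
    # matches[0] on an empty list raises IndexError, so check before indexing.
    return matches[0] if matches else None


if __name__ == "__main__":
    pattern = r'name="q" value="([^"]*)"'
    print(first_match(pattern, '<input name="q" value="abc">'))
    print(first_match(pattern, "<html></html>"))

    try:
        re.findall(pattern, "<html></html>")[0]  # unguarded indexing
    except IndexError as exc:
        print("IndexError:", exc)
```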

  9. Tim Tomes repo owner

    How long has it been since you checked? The CAPTCHA stuff should not be working anymore. It hasn't been for the rest of us. Google has changed their interface, and the Recon-ng parser is not built to handle the changes. The only option I see is integrating a CAPTCHA answering service, but even then, it may not work right, and not everyone will want to use the same service. Bottom line is I don't have the time to architect a new solution, and the broken stuff needs to be stripped out eventually. No reason for the dead code to be floating around. It saddens me too, as it was one of my favorite features. I spent a long time figuring out how to make it work.

  10. Melissa

    hmmm it seems that google hates me now lol. i just ran the ghdb module against my website using my own dorks_file on my Desktop and this is the result. Out of 62 dorks, about 31 went through before the error.

    [*] Searching Google for: site:www.* intitle:"Index of /" modified php.exe
    [*] Searching Google for: filetype:php inurl:"viewfile" -"index.php" -"idfil
    [*] Searching Google for: site:www.* filetype:cnf my.cnf -cvs -example
    [*] Searching Google for: filetype:wsdl wsdl
    [*] Searching Google for: site:www.* filetype:inc inc intext:setcookie
    [*] Searching Google for: ext:cgi inurl:ubb6_test.cgi
    [*] Searching Google for: site:www.* intitle:"PHP Explorer" ext:php (inurl:phpexplorer.php | inurl:list.php | inurl:browse.php)
    [*] Searching Google for: inurl:robpoll.cgi filetype:cgi
    [*] Searching Google for: site:www.* inurl:"plog/register.php"
    [*] Searching Google for: link:
    [*] Searching Google for: site:www.* inurl:"nph-proxy.cgi" "Start browsing through this CGI-based proxy"
    [*] Searching Google for: intitle:gallery inurl:setup "Gallery configuration"
    [*] Searching Google for: site:www.* "create the Super User" "now by clicking here"
    [*] Searching Google for: filetype:lit lit (books|ebooks)
    [*] Searching Google for: site:www.* inurl:cgi.asx?StoreID
    [*] Searching Google for: inurl:" WWWADMIN.PL" intitle:"wwwadmin"
    [*] Searching Google for: site:www.* inurl:changepassword.cgi -cvs
    [*] Searching Google for: intitle:"Directory Listing" "tree view"
    [*] Searching Google for: site:www.* intitle:mywebftp "Please enter your password"
    [*] Searching Google for: ezBOO "Administrator Panel" -cvs
    [*] Searching Google for: site:www.* intitle:"ASP FileMan" Resend
    [*] Searching Google for: intitle:"phpremoteview" filetype:php "Name, Size,
    [*] Searching Google for: site:www.* "File Upload Manager v1.3" "rename to"
    [*] Searching Google for: inurl:click.php intext:PHPClickLog
    [*] Searching Google for: site:www.* "powered by YellDL"
    [*] Searching Google for: filetype:cgi inurl:cachemgr.cgi
    [*] Searching Google for: site:www.* ext:asp inurl:DUgallery intitle:"3.0" -site:dugall
    [*] Searching Google for: ext:asp "powered by DUForum" inurl:(messages|details|login|default|register)
    [*] Searching Google for: site:www.* "Powered by Land Down Under 601"
    [*] Searching Google for: inurl:php.exe filetype:exe
    [*] Searching Google for: site:www.* filetype:mdb inurl:"news/news"
    [*] Searching Google for: filetype:pl -intext:"/usr/bin/perl" inurl:webcal (inurl:webcal | inurl:add | inurl:delete | inurl:config)
    [*] Searching Google for: site:www.* inurl:cgi-bin inurl:bigate.cgi
    [*] Searching Google for: site:www.* intitle:"SSHVnc Applet"OR intitle:"SSHTerm Applet"

    [!] IndexError: list index out of range
    File "/usr/share/recon-ng/recon/mixins/", line 72, in _solve_google_captcha.

    [*] 37 total (37 new) vulnerabilities found.
    [recon-ng][melissa][ghdb] >

  11. Tim Tomes repo owner

    Yes, that's because the CAPTCHA triggered at that point. Look at the last method call of the exception, _solve_google_captcha. Like I've been saying, it's broken. Thank you for confirming on your end though.

  12. Melissa

    Sure, no probs, and thank you for Recon-ng, one of the best and most powerful tools out there. Sad day coz this tool was the only one that was working with google and the ghdb until now.

  13. Tim Tomes repo owner

    Yeah. :-( The built-in GHDB is actually old too. Offensive Security cut my access to it over a year ago, so I haven't been able to update it. Hard to come by good information these days.

  14. Melissa

    Tim I just ran my dorks file on my Desktop again and all 62 dorks went through without any sort of Captcha from Google - it was successful! I get my dorks files from the ghdb but i don't output these files in note pad txt format but in the format I believe ghdb has them in if that makes any sense lol. I use phantomjs with a script to get these dorks files.

  15. Tim Tomes repo owner

    Yes, that makes sense. But it's only a matter of time until you hit a CAPTCHA. Then you're out of luck for a while. You should be able to run small batches every now and then, but any larger set of requests is going to trigger the CAPTCHA eventually and it will break. Perhaps rather than remove these modules altogether, I'll just present a CAPTCHA alert and stop execution of the module when this happens. That way, it's still there for folks like yourself to use in a limited capacity.
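
    A rough sketch of that "alert and stop" behaviour, for illustration only. Nothing here is recon-ng code: `looks_like_captcha`, `run_dorks`, and the `search` callable are hypothetical, and the detection heuristic (an HTTP 403, as reported earlier in this thread, or the word "captcha" in the response body) is an assumption about what a blocked response looks like.

```python
def looks_like_captcha(status_code, body):
    """Assumed heuristic: treat a 403 or a body mentioning 'captcha' as a block."""
    return status_code == 403 or "captcha" in body.lower()


def run_dorks(dorks, search):
    """Run each dork through search(dork) -> (status_code, body).

    Stops at the first response that looks like a CAPTCHA and returns
    whatever was collected before that point.
    """
    results = {}
    for dork in dorks:
        status_code, body = search(dork)
        if looks_like_captcha(status_code, body):
            print("[!] Google CAPTCHA encountered on %r. Stopping the module." % dork)
            break
        results[dork] = body
    return results


if __name__ == "__main__":
    # Stand-in for a real HTTP request: the third dork simulates a block.
    def fake_search(dork):
        return (403, "") if dork == "c" else (200, "results for " + dork)

    print(run_dorks(["a", "b", "c", "d"], fake_search))
```

    Returning the partial results rather than raising keeps the small-batch use case workable: whatever was gathered before the CAPTCHA is still usable.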

  16. Tim Tomes repo owner

    Won't work because the cookie that gets issued for the correct response would be in the browser, and we need it in the framework's cookie jar. I can't think of a way to intercept the response without interaction with tools outside of the framework itself.

  17. Tim Tomes repo owner

    They might have access to the old school Google API. Or, perhaps they spin up custom CSEs for each domain and do it that way. Interesting though. Are the requests going straight to Google from Acunetix, or to an Acunetix server? If straight to Google, sniff the traffic and check it out. Might be something to learn there.

  18. Melissa

    I think it might be str8 to Google, not sure tho, but i'm gonna do some sniffing to find out. The version i have, 10,5, has a Desktop graphical interface, but the latest version is launched online and they have hidden their scripts and tools, so basically with the latest version 11 you just hit the start-run button on a target website and it does everything for you. They prolly did this when they realized their software/program was being cracked, but that didn't help because there is now a cracked version 11 on the net done by a Chinese guy. I saw some Russians highly rating this tool in a Russian forum and that's how i found out about it.

  19. modifily

    [recon-ng][default][google_site_web] > set SOURCE
    SOURCE =>
    [recon-ng][default][google_site_web] > run


    [*] Searching Google for:
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] Searching Google for:
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] [host] (<blank>)
    [*] Searching Google for:
    [*] [host] (<blank>)
    [*] Searching Google for:
    [*] No New Subdomains Found on the Current Page. Jumping to Result 201.
    [*] Searching Google for:


    [*] 18 total (18 new) hosts found.

    HOW I DO It????)))

    shit, it's random.

    tried to do again, failed, why?!

    i added a key for google_api and it worked

    keys add google_api A454we*******

  20. Tim Tomes repo owner

    I apologize. I have no idea what you're asking. Also, I believe this is in the wrong issue. This is an issue for the Google Dork CAPTCHA handling.

    @Melissa Are you still using this module as is?

  21. Tim Tomes repo owner

    So, I did a bit of research into this today and it's all bad news. I have to use the Lynx CLI-based browser user agent in order to get a parsable response from Google searches. The problem is Google recognizes this and won't give me a CAPTCHA. Therefore, I have no CAPTCHA data to work with. I can either have parsable search results, or access to CAPTCHA, but it doesn't look like I can have both, which is required to handle CAPTCHAs from the command line. Sadly, it may be time to say goodbye to this functionality.
