Tim Tomes
Build a module which extracts the AdSense/Analytics ID from a given page and enumerates other pages on the internet with the same ID. Similar to

  Tim Tomes

    Is that the only way that a UA number is used in the page markup? Is there a standard for making the call to Google? Or can you choose however you want? The reason I ask is that if you only parse it one way, and there are many different ways to do it, then you end up with a high probability of false negatives.

  WebBreacher

    So, I've spent a little time tonight looking around the web about this (I'm no expert). What I'm finding is that the best way to grab all the sites that use a single UA- ID for Google Analytics is to use a site like webboar or ewhois, as they have already searched for and databased the UA- or pub- IDs within pages. I've looked at doing a Google search for pages containing the UA- or pub- ID, and searched for APIs and other things. The issue is that in order to know who uses which UA-, you need a DB (or a good page source code search engine). The UA- and pub- values are sent via JavaScript to the Google site to retrieve the appropriate ads.

    I think the best way to do this is for the user to enter a domain/host, we make a call to that domain on http/https and then scrape that page for an analytics account then leverage the ewhois or webboar (which one is more reputable/reliable?) databases.
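
    The scraping step described above could be sketched roughly as follows. This is only an illustration, not the module itself: the function name `extract_ids` is hypothetical, and the two regular expressions assume that Analytics IDs look like `UA-<digits>-<digits>` and that AdSense publisher IDs look like `pub-` followed by 16 digits. Matching the ID anywhere in the page source, rather than one specific JavaScript call, is meant to reduce the false negatives raised earlier. Fetching the page and querying ewhois or webboar are left out.

```python
import re

# Assumed ID formats (not confirmed in this thread):
#   Google Analytics: UA-<account digits>-<property digits>
#   AdSense:          pub-<16 digits>, usually written as ca-pub-...
UA_RE = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b", re.IGNORECASE)
PUB_RE = re.compile(r"\bpub-\d{16}\b", re.IGNORECASE)


def extract_ids(html):
    """Return the unique Analytics and AdSense IDs found anywhere in the page source."""
    # Normalize case so the same ID written differently is only reported once.
    analytics = sorted({match.upper() for match in UA_RE.findall(html)})
    adsense = sorted({match.lower() for match in PUB_RE.findall(html)})
    return {"analytics": analytics, "adsense": adsense}


if __name__ == "__main__":
    sample = (
        '<script>_gaq.push(["_setAccount", "UA-1234567-2"]);</script>'
        '<ins data-ad-client="ca-pub-1234567890123456"></ins>'
    )
    print(extract_ids(sample))
```

    The IDs returned here would then be the search terms handed to whichever lookup service is chosen.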

  Tim Tomes

    I looked into doing this as well. I concluded that there must be a way to do what ewhois and webboar are doing on our own. An API call or something must be available from Google to enumerate UA- and pub- IDs.

  WebBreacher

    I think you are right. webboar and ewhois most likely use an AdSense API key to make these calls. Since they are a single entity making these calls, it makes sense. Since we'll have 1000s of people using this module, instead of asking each of them to sign up for an AdSense API key to make this call (which most/many of them won't do, so they won't use the module), I think it'd be better to use the intermediary ewhois site to do these calls and just scrape the page. Thoughts?

    And this module is different than Robin Wood's as, with his, you scrape a group of pages and analyze if the UA- is similar. With this one, it'd be performing a general lookup to enum other, unknown sites that use that UA- code. His is more targeted. This is more, "tell me what I don't know".

  Tim Tomes

    This rocks! I made a few comments and will definitely merge as soon as they are addressed. Good stuff. I've been waiting for someone to come up with a good way to solve this one. You did it.

