Issue #5 resolved

AdSense/Analytics ID Module

Tim Tomes created an issue

Build a module that extracts the AdSense/Analytics ID from a given page and enumerates other pages on the internet that use the same ID, similar to http://www.ewhois.com/.
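The first half of this module, pulling the IDs out of a page, could be sketched as below. This is a minimal sketch, not the module's actual code. The ID shapes are assumptions: Analytics IDs as `UA-<account>-<property>` and AdSense IDs as `pub-<digits>` (often written `ca-pub-...`). The names `extract_ids`, `UA_RE` and `PUB_RE` are hypothetical.

```python
import re

# Assumed ID shapes (loose on purpose):
#   Analytics: "UA-<account digits>-<property digits>", e.g. UA-1234567-1
#   AdSense:   "pub-<digits>" (usually 16 digits, often written "ca-pub-...")
UA_RE = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")
PUB_RE = re.compile(r"\bpub-\d{10,20}\b")


def extract_ids(html: str) -> dict[str, list[str]]:
    """Return the unique UA- and pub- IDs in html, in order of first appearance."""
    return {
        # dict.fromkeys() drops duplicates while keeping first-seen order
        "analytics": list(dict.fromkeys(UA_RE.findall(html))),
        "adsense": list(dict.fromkeys(PUB_RE.findall(html))),
    }


page = (
    "<script>_gaq.push(['_setAccount', 'UA-1234567-1']);</script>"
    '<ins class="adsbygoogle" data-ad-client="ca-pub-1234567890123456"></ins>'
)
print(extract_ids(page))
# {'analytics': ['UA-1234567-1'], 'adsense': ['pub-1234567890123456']}
```

The second half, enumerating other sites with the same ID, needs an index of page sources, which is what the thread below discusses.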

Comments (7)

  1. Tim Tomes

    Is that the only way that a UA number is used in the page markup? Is there a standard for making the call to Google, or can you do it however you want? I ask because if you only parse it one way and there are many different ways to do it, you end up with a high probability of false negatives.
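To illustrate the false-negative concern, the sketch below compares a parser that only recognizes one call style with a scanner that looks for the ID shape anywhere in the markup. The snippets are made-up examples in the styles of ga.js, analytics.js and gtag.js plus one hypothetical custom attribute. The regexes and function names are assumptions for illustration only.

```python
import re

# Illustrative (made-up) snippets: the same kind of UA- ID embedded in different ways.
SNIPPETS = [
    "_gaq.push(['_setAccount', 'UA-1111111-1']);",  # ga.js async style
    "ga('create', 'UA-2222222-1', 'auto');",  # analytics.js style
    "gtag('config', 'UA-3333333-1');",  # gtag.js style
    '<div data-tracking-id="UA-4444444-2"></div>',  # hypothetical custom markup
]

# Narrow parser: only understands the _setAccount call.
NARROW_RE = re.compile(r"_setAccount['\"]\s*,\s*['\"](UA-\d+-\d+)['\"]")
# Broad scanner: looks for the ID shape anywhere in the markup.
BROAD_RE = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def narrow_ids(html: str) -> list[str]:
    return NARROW_RE.findall(html)


def broad_ids(html: str) -> list[str]:
    return BROAD_RE.findall(html)


for snippet in SNIPPETS:
    print(f"narrow={narrow_ids(snippet)} broad={broad_ids(snippet)}")
```

The narrow parser finds only the first ID, while the broad scan finds all four. The trade-off is that the broad scan can also report IDs that merely appear in page text or comments, i.e. some false positives.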

  2. WebBreacher

    So, I've spent a little time tonight looking around the web about this (I'm no expert). What I'm finding is that the best way to find all the sites using a single UA- ID for Google Analytics is to use a site like webboar or ewhois, since they have searched for and databased the UA- and pub- IDs found within pages. I've looked at running Google searches for pages containing a given UA- or pub- ID, searched for APIs, and tried other things. The issue is that to know who uses which UA- ID, you need a database (or a good page-source search engine). The UA- and pub- values are sent to Google via JavaScript to retrieve the appropriate ads.

    I think the best way to do this is for the user to enter a domain/host. We then request that domain over HTTP/HTTPS, scrape the page for an Analytics account ID, and leverage the ewhois or webboar database (which one is more reputable/reliable?) to find other sites using it.
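The fetch-and-scrape part of that flow might look roughly like this, using only the Python standard library. This is a sketch under stated assumptions: HTTPS is tried before HTTP, only the front page is fetched, and the body is decoded as UTF-8. The names `candidate_urls`, `scrape_ids` and `lookup_host` are hypothetical. The ewhois/webboar query is deliberately left out because its request format isn't specified here.

```python
import http.client
import re
import urllib.request

# Assumed ID shapes, as loose regexes.
UA_RE = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")
PUB_RE = re.compile(r"\bpub-\d{10,20}\b")


def candidate_urls(host: str) -> list[str]:
    """URLs to try for a user-supplied domain/host: HTTPS first, then HTTP."""
    host = host.strip().rstrip("/")
    return [f"https://{host}/", f"http://{host}/"]


def scrape_ids(html: str) -> list[str]:
    """Unique UA- and pub- IDs found in html, in first-seen order."""
    return list(dict.fromkeys(UA_RE.findall(html) + PUB_RE.findall(html)))


def lookup_host(host: str, timeout: float = 10.0) -> list[str]:
    """Fetch the host's front page and return the IDs found.

    Falls through to the next scheme if a request fails or the page has no IDs.
    The returned IDs would then be handed to ewhois/webboar (not shown).
    """
    for url in candidate_urls(host):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except (OSError, ValueError, http.client.HTTPException):
            continue  # network/HTTP error or bad URL: try the next scheme
        ids = scrape_ids(html)
        if ids:
            return ids
    return []
```

Calling `lookup_host("example.com")` performs live requests, so its result depends on the target site. The pure helpers can be checked offline.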

  3. Tim Tomes

    I looked into doing this as well. I concluded that there must be a way to do on our own what ewhois and webboar are doing. Some API call or something similar must be available from Google to enumerate UA- and pub- IDs.

  4. WebBreacher

    I think you are right. webboar and ewhois most likely use an AdSense API key to make these calls, which makes sense for a single entity. Since thousands of people may use this module, asking each of them to sign up for an AdSense API key (which many won't do, so they won't use the module) is impractical. I think it'd be better to let the ewhois site act as the intermediary for these calls and just scrape its results page. Thoughts?

    This module is also different from Robin Wood's: with his, you scrape a known group of pages and check whether their UA- IDs match. This one performs a general lookup to enumerate other, unknown sites that use the same UA- code. His is more targeted; this one is more "tell me what I don't know."
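For contrast, the "targeted" style of comparison described above can be sketched as grouping a known set of already-fetched pages by the account number inside their UA- IDs. This is an illustration only, not Robin Wood's code. It assumes the middle number of `UA-<account>-<property>` identifies the account, so sites under one account can differ only in the trailing property number. The function names and sample URLs are hypothetical.

```python
import re
from collections import defaultdict

# Capture the account and property parts of a UA- ID separately.
UA_RE = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")


def group_by_account(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each account number to the URLs whose markup references it.

    `pages` maps a URL to its already-fetched HTML.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for url, html in pages.items():
        for account, _property in UA_RE.findall(html):
            groups[account].add(url)
    return dict(groups)


def shared_accounts(pages: dict[str, str]) -> dict[str, set[str]]:
    """Keep only accounts that appear on more than one page."""
    return {acct: urls for acct, urls in group_by_account(pages).items() if len(urls) > 1}


pages = {
    "http://a.example/": "ga('create', 'UA-1234567-1', 'auto');",
    "http://b.example/": "ga('create', 'UA-1234567-3', 'auto');",
    "http://c.example/": "ga('create', 'UA-7654321-1', 'auto');",
}
print(shared_accounts(pages))
```

This only works over pages you already know about. Finding the unknown ones is what requires an external index such as ewhois.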

  5. Tim Tomes

    This rocks! I made a few comments and will definitely merge as soon as they are addressed. Good stuff. I've been waiting for someone to come up with a good way to solve this one. You did it.
