Is that the only way a UA number appears in page markup? Is there a standard for making the call to Google, or can you write it however you want? I ask because if we parse only one form and there are many ways to write it, we end up with a high probability of false negatives.
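One way to sidestep the "many ways to write it" problem is to ignore the surrounding JavaScript and match the ID itself wherever it appears. A minimal sketch, assuming the classic `UA-<account>-<property>` shape; the digit ranges and the name `find_ua_ids` are my assumptions, not something taken from an existing module:

```python
import re

# Assumed shape of a classic tracking ID: "UA-", account digits, "-", property digits.
# The digit-count ranges are a guess meant to be permissive, not an official spec.
UA_RE = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")

def find_ua_ids(html):
    """Return the unique UA- IDs found anywhere in the markup, sorted."""
    return sorted(set(UA_RE.findall(html)))

page = """
<script>_gaq.push(['_setAccount', 'UA-1234567-1']);</script>
<script>ga('create', "UA-7654321-2", 'auto');</script>
"""
print(find_ua_ids(page))
```

Because the pattern does not depend on `_gaq.push`, `ga('create', ...)`, or quoting style, both snippets above are picked up by the same expression. The trade-off is the occasional false positive from an ID that appears in a comment or in unrelated text.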
I think the best way to do this is for the user to enter a domain/host. We then request that host over http/https, scrape the page for an analytics account ID, and look that ID up in the ewhois or webboar database (which one is more reputable/reliable?).
I looked into doing this as well. I concluded that there must be a way to do what ewhois and webboar are doing on our own. Google must offer an API call or something similar to enumerate UA- and pub- IDs.
I think you are right. webboar and ewhois most likely use an AdSense API key to make these calls. That makes sense for them, since each is a single entity making the calls. We'll have 1000s of people using this module, and if we ask each of them to sign up for an AdSense API key, most of them won't, so they won't use the module. I think it'd be better to use the ewhois site as an intermediary for these lookups and just scrape the results page. Thoughts?
And this module is different from Robin Wood's: with his, you scrape a group of pages and analyze whether the UA- codes are similar. This one would perform a general lookup to enumerate other, unknown sites that use that UA- code. His is more targeted. This is more, "tell me what I don't know".
Heard nothing more about this, so I went ahead and made a module. It is different from Robin's. His looks at a bunch of sites and determines which are related. Mine takes one UA and finds related domains. How? It takes a URL (target), visits that page, scrapes a UA- ID off it, then looks that ID up on ewhois. Works really well.
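For anyone following along, the flow described above can be sketched roughly as follows. This is not the module's actual code. The names `get_page`, `scrape_ua`, and `related_domains` are hypothetical, the UA- pattern is an assumed shape, and the ewhois step is left as a caller-supplied `lookup` function because the thread does not spell out how that page is queried or parsed.

```python
import re
import urllib.request

# Assumed shape of a UA- ID; the digit ranges are a permissive guess.
UA_RE = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")

def get_page(url, timeout=10):
    """Return the decoded body of url using plain urllib (no retries)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def scrape_ua(host, fetcher=get_page):
    """Try https, then http; return UA- IDs from the first page that loads."""
    for scheme in ("https", "http"):
        try:
            body = fetcher(f"{scheme}://{host}/")
        except OSError:  # urllib's URLError is a subclass of OSError
            continue
        return sorted(set(UA_RE.findall(body)))
    return []

def related_domains(host, lookup, fetcher=get_page):
    """Map each UA- ID on the host's page to whatever `lookup` returns for it.

    `lookup` is a hypothetical stand-in for the ewhois scrape: it takes a
    UA- ID string and returns a list of domains.
    """
    return {ua: lookup(ua) for ua in scrape_ua(host, fetcher)}
```

Passing the fetcher and the lookup in as arguments keeps the scraping logic testable without touching the network, and would make it easy to swap ewhois for another source later.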