Domain Cage / On the current state of the art in the battle between online advertisement and ad blocking software

Over the last decade, advertising has increasingly flooded the Internet, and this has become a real problem for visitors of many sites. Ads occupy more and more space, pop up over and overlap the content you are interested in, slow down page loading, collect user data for better "targeting", and may open security holes on your device. This is why users turn to ad blocking programs such as DomainCage. Advertisers, on the other hand, want to get past the guards and are constantly inventing new methods for doing so.

From the very beginning, ad blocking programs have intercepted HTTP requests and blocked them based on specific parts of their URLs. DomainCage can also block an HTTP request if its target domain differs from that of the requesting page and neither the target nor the origin is white-listed. In addition, a specific URL fragment, such as the word "banner", can be black-listed. This technique works well in many cases, but some sites have started to serve ads from the same domain as all their meaningful content. In other words, their requests for ads do not differ in any way from requests for actual content. There is even a new term for this, "native ads": ads which appear and behave as genuine content.
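The blocking rules described above can be sketched in a few lines of Python. This is only an illustration, not DomainCage's actual code: the function names, parameters and example URLs are made up for the sketch, and comparing exact host names is a simplification of what a real blocker would consider "the same domain".

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host name of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def should_block(page_url, request_url, whitelist=frozenset(), blacklist_fragments=()):
    """Decide whether a request issued by a page should be blocked.

    Hypothetical helper: 'whitelist' is a set of host names and
    'blacklist_fragments' is a sequence of URL substrings.
    """
    origin = host_of(page_url)
    target = host_of(request_url)

    # Rule 1: a black-listed fragment anywhere in the URL blocks the request.
    lowered = request_url.lower()
    if any(fragment.lower() in lowered for fragment in blacklist_fragments):
        return True

    # Rule 2: a cross-domain request is blocked unless the target
    # or the origin is white-listed.
    if target != origin and target not in whitelist and origin not in whitelist:
        return True

    return False
```

Note that a request such as https://news.example/sponsored/item.jpg issued by a page on news.example passes both rules, which is exactly the weakness that ads served from the site's own domain exploit.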

Moreover, some sites can detect that requests for ads are being blocked and show an error message instead of the content. Usually the message advises the user to add the site to the white-list of the ad blocking software.

One possible alternative, which suppresses native ads and avoids ad-blocking detection, is to analyze HTML elements and their properties to find ad-related stuff and remove it afterwards. For example, in many cases div blocks with ads are marked by specific class names or have predefined sizes. This approach is implemented in DomainCage as well, in the feature named Dumping. This method does not prevent ads from loading, but removes them as soon as they are rendered in the page. They can be built into the page from the start or inserted dynamically later; this does not matter, because DomainCage tracks both cases. The drawbacks of this method are the wasted traffic (because the ads are still downloaded) and the possibility that ads briefly show up on the page before they get removed. Last but not least, it unfortunately does not work for all sites anymore.
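As a rough illustration of this kind of analysis, the sketch below scans HTML and flags elements whose class names or sizes look ad-related. It is not how Dumping is implemented: the class-name prefixes and the size list are assumptions chosen for the example, and the code only reports suspicious elements, whereas a blocker running in the browser would remove them from the live page.

```python
from html.parser import HTMLParser

# Hypothetical hints, chosen for illustration only.
AD_CLASS_PREFIXES = ("ad-", "banner", "sponsored")
AD_SIZES = {("728", "90"), ("300", "250")}  # two widely used banner sizes


def looks_like_ad(attrs: dict) -> bool:
    """Return True if the element's attributes match a class-name or size hint."""
    classes = (attrs.get("class") or "").lower().split()
    if any(name.startswith(prefix) for name in classes for prefix in AD_CLASS_PREFIXES):
        return True
    return (attrs.get("width"), attrs.get("height")) in AD_SIZES


class AdElementFinder(HTMLParser):
    """Collect (tag, id) pairs for every element that looks like an ad."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if looks_like_ad(attr_map):
            self.flagged.append((tag, attr_map.get("id")))


def find_ad_elements(html: str):
    """Return the list of flagged (tag, id) pairs found in the given HTML."""
    finder = AdElementFinder()
    finder.feed(html)
    finder.close()
    return finder.flagged
```

Such rules rely on class names and sizes being stable and readable, which is why they stop working once a site hides or disguises these properties.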

Recently, certain sites began to use a new method of webpage rendering. They generate HTML tags with non-standard random names, or standard tags with random attributes (class names). The word "random" means that the names look like strings of arbitrary letters, but in fact they are aliases for hidden meaningful names and attributes, known to the webpage scripts. The random strings change on every page reload. The class attribute can be replaced with something else, for example data-xyz, where xyz is a random string. Intrinsic image sizes can be enlarged to somewhat larger fake extents, with blank margins added around the original payload. Corresponding offsets are then applied to these images so that they visually fit into their proper places in the page. In this situation it is hard to distinguish ads by their properties: size, element type, placement in the page.

This is a new task which ad blocking software faces and needs to solve. DomainCage will address it in the future using a kind of "parametric" CSS selectors with placeholders that are resolved later, and fingerprinting based on subtree structures in the DOM. These are non-standard features, which have to be implemented from scratch. Nevertheless, the existing DomainCage is still very effective. Sophisticated HTML randomization is very rare at the moment. Most ads are discarded successfully, and the remaining ones are less obtrusive.
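The text does not say how the fingerprinting will work, so the following is only one possible reading of the idea, with every name in it invented for the example. A subtree is reduced to a string that keeps its shape and its standard tag names, while unknown (presumably randomized) names are replaced by a placeholder. Two ad blocks whose random names differ between page reloads then get the same fingerprint.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of tag names treated as standard for this sketch.
KNOWN_TAGS = {"div", "span", "a", "img", "p", "iframe"}


@dataclass
class Node:
    """Minimal stand-in for a DOM element: a tag name and child elements."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def fingerprint(node: Node) -> str:
    """Describe a subtree by its structure, ignoring non-standard tag names."""
    label = node.tag if node.tag in KNOWN_TAGS else "?"
    inner = ",".join(fingerprint(child) for child in node.children)
    return f"{label}({inner})"
```

For instance, a randomly named wrapper that contains a link with an image plus another randomly named element yields "?(a(img()),?())" whatever the random names are. A practical fingerprint would need more features than this to avoid matching genuine content of the same shape.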

Another problem with the new generation of ads is that they are moving more and more to video format. Indeed, on a site which hosts video clips (very popular nowadays), the developers can show ads before the clips in a single video element on the webpage, as one seamless stream. Interception of such ads is almost impossible. In principle, one could conceive of in-depth analysis of still images and video streams (via a hidden canvas, for example), running some recognition algorithm on them. Yet the ad segment cannot be excluded or skipped from the video, and the user will still need to wait until the ads end (even if they are hidden by an overlapping cover created by the ad blocker). It is also possible to track special controls of the player and somehow make them "think" that the "skip" function (which usually becomes available after several seconds of viewing ads) is already activated. But all of this is very tricky and makes ad blocking a very challenging task.

It is obvious that the battle will continue, with newer and newer techniques appearing on both sides.
