CEF ignores Content-Type charset option when using custom CefResourceHandler

Issue #1906 resolved
BrowserAutomationStudio created an issue

What steps will reproduce the problem?

Override CefResourceHandler and send a response with a non-default encoding. Even if the charset is specified through the Content-Type header, CEF doesn't recognize it and uses the default encoding. Here is a code example that illustrates the issue:

#include "include/cef_app.h"
#include "include/cef_client.h"
#include <cstring>
#include <string>


//ResourceHandler whose main task is to serve static content
class  MyResourceHandler : public CefResourceHandler
{
    //This works fine
    //std::string Responce = "Hello";

    //This works fine too, because the string is encoded in utf-8, and the default encoding is utf-8
    //std::string Responce = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

    //This doesn't work, because the string is encoded in windows-1251, even though the correct encoding is specified in the headers.
    std::string Responce = "\xCF\xF0\xE8\xE2\xE5\xF2";

    //This works, because encoding is set by meta
    //std::string Responce = "<meta charset='windows-1251'/>\xCF\xF0\xE8\xE2\xE5\xF2";

public:

    bool ProcessRequest(CefRefPtr<CefRequest> request, CefRefPtr<CefCallback> callback)
    {
        callback->Continue();
        return true;
    }

    void GetResponseHeaders(CefRefPtr<CefResponse> response, int64& response_length, CefString& redirectUrl)
    {
        CefResponse::HeaderMap HeaderMapData;
        // !!! Charset is ignored !!!
        HeaderMapData.insert(std::pair<CefString, CefString>("Content-Type","text/html;charset=windows-1251"));
        response->SetHeaderMap(HeaderMapData);

        response->SetMimeType("text/html");


        response->SetStatus(200);
        response_length = -1;
    }

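    bool ReadResponse(void* data_out,int bytes_to_read,int& bytes_read,CefRefPtr<CefCallback> callback)
    {
        if(Responce.empty())
            return false;

        //Copy no more than the buffer can hold, keep the rest for the next call
        int size = static_cast<int>(Responce.size());
        if(size > bytes_to_read)
            size = bytes_to_read;
        memcpy(data_out,Responce.data(),size);
        bytes_read = size;
        Responce.erase(0,size);
        return true;
    }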

    bool CanGetCookie(const CefCookie& cookie) { return true; }

    bool CanSetCookie(const CefCookie& cookie) { return true; }

    void Cancel(){}

private:
    IMPLEMENT_REFCOUNTING(MyResourceHandler);
};

//Standard application
class MyCefApp: public CefApp
{
private:
    IMPLEMENT_REFCOUNTING(MyCefApp);
};


//Cef client, which routes all requests to MyResourceHandler
class MyHandler : public CefClient, public CefRequestHandler
{
    CefRefPtr<CefRequestHandler> GetRequestHandler()
    {
        return this;
    }
    CefRefPtr<CefResourceHandler> GetResourceHandler(CefRefPtr<CefBrowser> browser, CefRefPtr<CefFrame> frame, CefRefPtr<CefRequest> request)
    {
        return new MyResourceHandler();
    }

private:
    IMPLEMENT_REFCOUNTING(MyHandler);
};


int main()
{

    //Initialize main classes
    CefMainArgs main_args;
    CefRefPtr<CefApp> App = new MyCefApp();
    CefRefPtr<MyHandler> Handler = new MyHandler();
    CefExecuteProcess(main_args, App, NULL);
    CefSettings GlobalSettings;
    CefInitialize(main_args, GlobalSettings, App, NULL);


    //Create browser
    CefWindowInfo window_info;
    window_info.SetAsPopup(0,"");
    CefBrowserSettings browser_settings;
    //Set utf-8 as default encoding
    std::wstring wencoding = L"utf-8";
    cef_string_utf16_set(wencoding.data(),wencoding.size(),&browser_settings.default_encoding,true);
    CefRefPtr<CefBrowser> Browser = CefBrowserHost::CreateBrowserSync(window_info, Handler, "google.com", browser_settings, 0);

    //Infinite message loop
    while(true)
    {
        CefDoMessageLoopWork();
    }

  return 0;
}

What is the expected output? What do you see instead?

I see a browser window with the following output:

WrongEncoding.png

The expected output is the following:

GoodEncoding.png

What version of the product are you using? On what operating system?

I use 3.2623.1397.gaf139d7_windows32 on Windows 7 x64

Does the problem reproduce with the cefclient or cefsimple sample application at the same version? How about with a newer or older version?

No, it doesn't. The problem reproduces only with a custom CefResourceHandler.

Does the problem reproduce with Google Chrome at the same version? How about with a newer or older version?

No, it doesn't. Chrome always handles Content-Type headers correctly.

Comments (19)

  1. Dmitry Azaraev

    @amaitland CEF currently lacks character set support completely. Resource handlers are (probably) missing some standard filters that handle this. The same applies to compressed responses (they will not be handled).

    @kdkdkd You can work around this by injecting a BOM into the response (for cases where you have UTF-8 or UTF-16LE/BE).
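
A minimal sketch of the BOM workaround described above, assuming the response body is already UTF-8 encoded. `WithUtf8Bom` is a hypothetical helper name, not a CEF API; the result would be served from the handler's ReadResponse in place of the raw body.

```cpp
#include <string>

// Hypothetical helper (not part of CEF): prepends the UTF-8 byte order mark
// to a response body unless the body already starts with one.
std::string WithUtf8Bom(const std::string& body) {
    static const std::string kBom = "\xEF\xBB\xBF";
    if (body.compare(0, kBom.size(), kBom) == 0) {
        return body;  // already marked as UTF-8
    }
    return kBom + body;
}
```

As noted in the comment, this only helps when the content is UTF-8 or UTF-16; it cannot signal a single-byte encoding such as windows-1251.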

  2. amaitland

    Alex Maitland CEF currently lacks character set support completely. Resource handlers are (probably) missing some standard filters that handle this. The same applies to compressed responses (they will not be handled).

    @dmitry-azaraev That's interesting to know, thanks. CefSharp uses the BOM approach.

  3. BrowserAutomationStudio reporter

    You can work around this by injecting a BOM into the response (for cases where you have UTF-8 or UTF-16LE/BE).

    It is not the best solution in my situation because, as you said, it helps only with UTF encodings. I want to handle every charset that Chrome does.

    For example, the Russian social network vk.com uses the windows-1251 encoding. It works fine for HTML content because of the meta tag <meta charset='windows-1251'/>; in that case CEF decodes the page successfully. But when an AJAX request is made and the server returns text that is not UTF-encoded, CEF fails to decode it. Thus vk.com is not usable with CefResourceHandler right now :(

    I'm thinking about handling the encoding myself with ICU, iconv, or something similar, but then I need to parse web pages and check for the meta tag to avoid double decoding :(
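
To illustrate what decoding by hand involves, here is a rough sketch that converts windows-1251 bytes to UTF-8 without ICU or iconv. `Windows1251ToUtf8` and `AppendUtf8` are hypothetical names, and the mapping is deliberately partial: it covers ASCII, the letters in 0xC0..0xFF, and Ё/ё, and replaces every other byte with '?'. A real implementation would use the full code page table or a library.

```cpp
#include <string>

// Appends the UTF-8 encoding of a code point below U+0800 to `out`.
static void AppendUtf8(unsigned int cp, std::string& out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts windows-1251 bytes to UTF-8.
// Assumption: only ASCII, 0xC0..0xFF (U+0410..U+044F), 0xA8 and 0xB8 are
// mapped; the rest of the 0x80..0xBF range is replaced with '?'.
std::string Windows1251ToUtf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            AppendUtf8(c, out);
        } else if (c >= 0xC0) {
            AppendUtf8(0x0410 + (c - 0xC0), out);
        } else if (c == 0xA8) {
            AppendUtf8(0x0401, out);  // Ё
        } else if (c == 0xB8) {
            AppendUtf8(0x0451, out);  // ё
        } else {
            out += '?';
        }
    }
    return out;
}
```

Applied to the windows-1251 bytes from the report ("\xCF\xF0\xE8\xE2\xE5\xF2"), this yields the same UTF-8 byte sequence as the utf-8 test string in the reproduction code.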

  4. amaitland
  5. BrowserAutomationStudio reporter

    You can change the default_encoding on a per browser basis.

    I tried to do that, but with no result. The following code

    //Get context
    CefRefPtr<CefRequestContext> Context = CefRequestContext::GetGlobalContext();
    
    //Create and populate dictionary
    CefRefPtr<CefValue> Value = CefValue::Create();
    CefRefPtr<CefDictionaryValue> Dictionary = CefDictionaryValue::Create();
    Dictionary->SetString("charset_default","windows-1251");
    Value->SetDictionary(Dictionary);
    
    //Modify context
    CefString Error;
    Context->SetPreference("intl",Value,Error);
    std::cout<<std::endl<<Error.ToString()<<std::endl<<std::endl;
    

    prints "Trying to modify an unregistered preference", while

    Context->GetAllPreferences(true)->GetDictionary("intl")->GetString("charset_default").ToString()
    

    returns a value.

    The same approach with proxy.mode does change the proxy settings.

  6. amaitland

    I've had similar problems when trying to set properties using dictionaries; the dot notation is more reliable in my experience.

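    It'll be something like this, with the string wrapped in a CefValue: CefRefPtr<CefValue> value = CefValue::Create(); value->SetString("windows-1251"); context->SetPreference("intl.charset_default", value, error);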

    I can change the preference in OnAfterCreated, though I haven't checked anything with the windows-1251 encoding.

    Just a reminder that you can only call SetPreference on the CEF UI thread.

  7. BrowserAutomationStudio reporter

    I checked it out and the dot notation works great. It even changes the default encoding without needing to restart the browser.

    Only one concern is left: whether SetPreference will work for different frames with different encodings, for example a non-UTF advertising iframe inside UTF main site content.

    I'm afraid that it will only work some of the time, or that there could be a race condition :(

  8. amaitland

    Unfortunately, I don't believe you can specify a preference at the frame level. It applies at the CefRequestContext level, which you can use to isolate CefBrowser instances.

  9. BrowserAutomationStudio reporter

    @amaitland Yes, and some sites use several charsets within the same frame. For example, qq.com uses GB2312, GBK, and UTF-8. Thanks for the help anyway.

    For now I ended up with a solution that detects the charset from the HTTP headers or the meta tag, decodes the page content to UTF-8, and modifies the charset in the meta tag if needed. In other words, it forces every page to use UTF-8.

    I've attached the source code in case somebody else needs it.

    But I'm still waiting for a native fix from the CEF team.
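
The attached source is not reproduced here. As a rough illustration of the header-detection step only, the sketch below pulls the charset parameter out of a Content-Type value. `CharsetFromContentType` is a hypothetical name, and the parsing is simplified (it is not a full HTTP header parser and does not look at meta tags).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the lower-cased charset parameter of a
// Content-Type value, or an empty string when no charset is present.
std::string CharsetFromContentType(const std::string& content_type) {
    std::string lower(content_type);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos) {
        return "";
    }
    pos += key.size();

    // The value ends at the next parameter separator or whitespace.
    std::string::size_type end = lower.find_first_of("; \t", pos);
    std::string value = lower.substr(
        pos, end == std::string::npos ? std::string::npos : end - pos);

    // Strip optional surrounding double quotes.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
        value = value.substr(1, value.size() - 2);
    }
    return value;
}
```

For the header used in the report, "text/html;charset=windows-1251", this returns "windows-1251", which could then select the decoder to run before the content is handed to the browser as UTF-8.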

  10. Marshall Greenblatt

    Does Google Chrome properly handle different frames with different character encodings? Do you have a URL that demonstrates this?

  11. BrowserAutomationStudio reporter

    Yes, Google Chrome handles that properly. I wrote a simple example in Node.js. It renders properly in Chrome and in CefClient, but it cannot be rendered properly with a custom CefResourceHandler. There are two frames: one in utf-8 and one in windows-1251. Neither frame has a meta tag, but each has a Content-Type header. There is no way to display both properly with a custom CefResourceHandler:

    If I set the encoding to utf-8, the second frame renders incorrectly.

    If I set the encoding to windows-1251, the first frame renders incorrectly.

    I don't have a direct URL to a real-world example, but if you log in to vk.com and try to obtain the group list, part of the data will be corrupted. Some AJAX requests return data in windows-1251 and others in utf-8, and the only thing that gives information about the encoding is the Content-Type header, which is not handled properly with a custom CefResourceHandler.
