Wiki

Clone wiki

agtools / Profiler

Built-in Sampling Profiler

Passing the config switch AGT_CONFIG_PROFILER from the Makefile will enable the built-in profiling tool. This is similar to 'raster timing' but provides average costs over time, in text form, sorted by cost.

Sometimes the information it provides is more useful than rasters alone - especially for complex projects where there are many different things going on, and CPU load is erratically spread between different game objects. Rasters can be very hard to read under these conditions, and even more so if the framerate drops. Still it pays to use all tools available as no single method tells the whole story.

Enabling the Profiler

Pass 'AGT_CONFIG_PROFILER' in the Makefile.

At least one text output device must be enabled to use this feature.

see the Debug Console page for more info on setting that up.

Defining Profiling Contexts

It is necessary to add some 'context' definitions to use the profiler. Contexts are uniquely identifiable regions of code, which you want to have shown in the profiling report.

At a bare minimum, you can add this to the top of your main project sourcefile...

#!c++

// context definitions (builtin contexts only)
prf_context_begin
prf_context_end(prf_USER_)

...which will be sufficient to profile just the engine code. No user-defined contexts.

To profile some user code, such as specific AI/tick functions, some user tokens must be defined here instead.

e.g. to define three new user-specified profiling contexts...

#!c++

// user-defined context IDs
enum
{
    //
    // start of user-defined profiling tokens
    //
    // *** begin enumerating user contexts where builtin contexts left off ***
    prf_tick_view = prf_USER_,   
    prf_tick_player,
    prf_tick_enemy,
    //
    // end of user-defined profiling tokens
    //
    prf_MAX_
};

// context definitions, including user-defined ones
prf_context_begin
//
prf_context_def("tick:view",prf_tick_view)
prf_context_def("tick:player",prf_tick_player)
prf_context_def("tick:enemy",prf_tick_enemy)
//
prf_context_end(prf_MAX_)

All prf_context_def instances must lie between prf_context_begin/prf_context_end. Do not end with semicolons on these lines - these macros won't expand properly if they are present.

The first argument is the string to be emitted in the report, for code profiled under that context. The second argument is the ID to use when activating that context in the code itself.

This is all that is required to define profiling contexts. You can add up to 20 of these currently.

There are 12 built-in entries which are always available. Only two of these should be referenced directly in the game code. Don't add definitions for them. The two builtins you can use are:

prf_mainloop can be used as a catch-all for code in the mainloop not falling into any other context.

prf_idle can be used to capture idling time, waiting for the VBL to elapse.

Using the Profiler

Before your mainloop, call AGT_Sys_ProfilerReset() to ensure the sample counts are reset before the interesting stuff happens, otherwise loading delays and other setup costs will offset the results.

To profile a function - for example a tick function, add the AUTOPRF(context) macro to the top of the function of interest e.g.

#!c++

void myAI_fntick(entity_t *_pself)
{
    AUTOPRF(tick_enemy);
    ...

Alternatively you can use direct calls to prf_push(ctx)/prf_pop() or prf_set(ctx) the current profiling context.

e.g. to record the time spent idling in the mainloop you may add these calls around the wait-vbl code:

#!c++
prf_push(prf_mainloop);
...
while (1)
{
    ...
    prf_push(prf_idle);
    {
        curr_vbl = g_vbl;
        while (g_vbl == curr_vbl) { /* spin until counter changes */ }
    }
    prf_pop();
}

...or...

#!c++
prf_push(prf_mainloop);
...
while (1)
{
    ...
    prf_set(prf_idle);
    {
        curr_vbl = g_vbl;
        while (g_vbl == curr_vbl) { /* spin until counter changes */ }
    }
    prf_set(prf_mainloop);
}

...or...

#!c++

prf_push(prf_mainloop);
...
while (1)
{
    ...
    {
        AUTOPRF(prf_idle);
        curr_vbl = g_vbl;
        while (g_vbl == curr_vbl) { /* spin until counter changes */ }
    }
}

Profile results can be reported to the AGT console (or Steem/Hatari logs) by calling AGT_Sys_ProfilerDoReport(numlines) at intervals. This can be an expensive call if the builtin console is enabled so avoid calling it every refresh, once in 50 or so refreshes is sufficient. Output to Steem or Hatari logs alone is cheaper.

Example code to emit occasional report:

#!c++

    // occasional profiling report
    {
        static s16 s_refresh = 0;
        if (--s_refresh < 0)
        {
            // report once every 64 refreshes (reporting is expensive!)
            s_refresh = 64;
            // report just 8 lines - the 8 most significant costs
            AGT_Sys_ProfilerDoReport(8);
        }
    }

Example, taken from Steem Boiler log using AGT_CONFIG_STEEM_DEBUG:

ST -- Profiler:
ST -- main:idle 48.40
ST -- ent:draw 21.30
ST -- arena:bgrestore 10.40
ST -- arena:fill 6.50
ST -- -profiling-  6.32
ST -- ent:tick 3.29
ST -- arena:misc 1.75
ST -- main:loop 1.02
ST -- ent:spatial 0.77
ST -- ent:links 0.28
ST -- ent:interact 0.00
ST -- undefined... 0.00

Some notes:

While the results are sorted, the sorting is incremental and takes time to settle. The sort is performed during the report. Calling the report less often means a longer settling time for the sort.

The AUTOPRF(ctx) macro effectively wraps prf_push/prf_pop calls in an object so they occur on function entry/exit paths. This ensures push/pop is always balanced so its the safest option to use.

Note that using prf_set spoils the last active context. It behaves more like raster colour changes - so take care you don't end up accidentally covering more code than you expected in the current profiling context, with dangling 'set' contexts.

Note that unbalanced push/pop will cause a crash.

There are equivalent macros for the asm code - pushctx/popctx etc. - should they be needed.

Updated