1. Allan Davis
  2. RubyLearning

Source

RubyLearning / ProjectTrak / vendor / rails / railties / doc / guides / source / benchmarking_and_profiling / rubyprof.txt

== Understanding Performance Tests Outputs ==

=== Our First Performance Test ===

So how do we profile a request. 

One of the things that is important to us is how long it takes to render the home page - so let's make a request to the home page. Once the request is complete, the results will be outputted in the terminal. 

In the terminal run

[source, ruby]
----------------------------------------------------------------------------
[User profiling_tester]$ gcruby tests/performance/homepage.rb
----------------------------------------------------------------------------

After the tests runs for a few seconds you should see something like this. 

----------------------------------------------------------------------------
HomepageTest#test_homepage (19 ms warmup)
        process_time: 26 ms
              memory: 298.79 KB
             objects: 1917

Finished in 2.207428 seconds.
----------------------------------------------------------------------------

Simple but efficient. 

* Process Time refers to amount of time necessary to complete the action. 
* memory is the amount of information loaded into memory
* object ??? #TODO find a good definition. Is it the amount of objects put into a ruby heap for this process?

In addition we also gain three types of itemized log files for each of these outputs. They can be found in your tmp directory of your application.

*The Three types are*

* Flat File - A simple text file with the data laid out in a grid
* Graphical File - A html colored coded version of the simple text file with hyperlinks between the various methods. Most useful is the bolding of the main processes for each portion of the action. 
* Tree File - A file output that can be use in conjunction with KCachegrind to visualize the process

NOTE: KCachegrind is Linux only. For Mac this means you have to do a full KDE install to have it working in your OS. Which is over 3 gigs in size. For windows there is clone called wincachegrind but it is no longer actively being developed.  

Below are examples for Flat Files and Graphical Files

=== Flat Files ===

.Flat File Output Processing Time
============================================================================
Thread ID: 2279160
Total: 0.026097

 %self     total     self     wait    child    calls  name
  6.41      0.06     0.04     0.00     0.02      571  Kernel#===
  3.17      0.00     0.00     0.00     0.00      172  Hash#[]
  2.42      0.00     0.00     0.00     0.00       13  MonitorMixin#mon_exit
  2.05      0.00     0.00     0.00     0.00       15  Array#each
  1.56      0.00     0.00     0.00     0.00        6  Logger#add
  1.55      0.00     0.00     0.00     0.00       13  MonitorMixin#mon_enter
  1.36      0.03     0.00     0.00     0.03        1  ActionController::Integration::Session#process
  1.31      0.00     0.00     0.00     0.00       13  MonitorMixin#mon_release
  1.15      0.00     0.00     0.00     0.00        8  MonitorMixin#synchronize-1
  1.09      0.00     0.00     0.00     0.00       23  Class#new
  1.03      0.01     0.00     0.00     0.01        5  MonitorMixin#synchronize
  0.89      0.00     0.00     0.00     0.00       74  Hash#default
  0.89      0.00     0.00     0.00     0.00        6  Hodel3000CompliantLogger#format_message
  0.80      0.00     0.00     0.00     0.00        9  c
  0.80      0.00     0.00     0.00     0.00       11  ActiveRecord::ConnectionAdapters::ConnectionHandler#retrieve_connection_pool
  0.79      0.01     0.00     0.00     0.01        1  ActionController::Benchmarking#perform_action_without_rescue
  0.18      0.00     0.00     0.00     0.00       17  <Class::Object>#allocate
============================================================================  

So what do these columns tell us:

	* %self - The percentage of time spent processing the method. This is derived from self_time/total_time
	* total - The time spent in this method and its children.
	* self - The time spent in this method.
	* wait - Time processed was queued
	* child - The time spent in this method's children.
	* calls - The number of times this method was called.
    *  name - The name of the method.

Name can be displayed three seperate ways:
	* #toplevel - The root method that calls all other methods
	* MyObject#method - Example Hash#each, The class Hash is calling the method each
	* <Object:MyObject>#test - The <> characters indicate a singleton method on a singleton class. Example <Class::Object>#allocate
	
Methods are sorted based on %self. Hence the ones taking the most time and resources will be at the top.

So for Array#each which is calling each on the class array. We find that it processing time is 2% of the total and was called 15 times. The rest of the information is 0.00 because the process is so fast it isn't recording times less then 100 ms. 


.Flat File Memory Output
============================================================================
Thread ID: 2279160
Total: 509.724609

 %self     total     self     wait    child    calls  name
  4.62     23.57    23.57     0.00     0.00       34  String#split
  3.95     57.66    20.13     0.00    37.53        3  <Module::YAML>#quick_emit
  2.82     23.70    14.35     0.00     9.34        2  <Module::YAML>#quick_emit-1
  1.37     35.87     6.96     0.00    28.91        1  ActionView::Helpers::FormTagHelper#form_tag
  1.35      7.69     6.88     0.00     0.81        1  ActionController::HttpAuthentication::Basic::ControllerMethods#authenticate_with_http_basic
  1.06      6.09     5.42     0.00     0.67       90  String#gsub
  1.01      5.13     5.13     0.00     0.00       27  Array#-
============================================================================

Very similar to the processing time format. The main difference here is that instead of calculating time we are now concerned with the amount of KB put into memory *(or is it strictly into the heap) can I get clarification on this minor point?* 

So for <Module::YAML>#quick_emit which is  singleton method on the class YAML it uses 57.66 KB in total, 23.57 through its own actions, 6.69 from actions it calls itself and that it was called twice.

.Flat File Objects 
============================================================================
Thread ID: 2279160
Total: 6537.000000

 %self     total     self     wait    child    calls  name
 15.16   1096.00   991.00     0.00   105.00       66  Hash#each
  5.25    343.00   343.00     0.00     0.00        4  Mysql::Result#each_hash
  4.74   2203.00   310.00     0.00  1893.00       42  Array#each
  3.75   4529.00   245.00     0.00  4284.00        1  ActionView::Base::CompiledTemplates#_run_erb_47app47views47layouts47application46html46erb
  2.00    136.00   131.00     0.00     5.00       90  String#gsub
  1.73    113.00   113.00     0.00     0.00       34  String#split
  1.44    111.00    94.00     0.00    17.00       31  Array#each-1
============================================================================


 #TODO Find correct terminology for how to describe what this is exactly profiling as in are there really 2203 array objects or 2203 pointers to array objects?. 

=== Graph Files ===

While the information gleamed from flat files is very useful we still don't know which processes each method is calling. We only know how many. This is not true for a graph file. Below is a text representation of a graph file. The actual graph file is an html entity and an example of which can be found link:examples/graph.html[Here] 

#TODO (Handily the graph file has links both between it many processes and to the files that actually contain them for debugging. 
 )

.Graph File
============================================================================
Thread ID: 21277412

  %total   %self     total      self    children               calls   Name
/____________________________________________________________________________/
100.00%   0.00%      8.77      0.00      8.77                   1     #toplevel*
                     8.77      0.00      8.77                 1/1     Object#run_primes
/____________________________________________________________________________/
                     8.77      0.00      8.77                 1/1     #toplevel
100.00%   0.00%      8.77      0.00      8.77                   1     Object#run_primes*
                     0.02      0.00      0.02                 1/1     Object#make_random_array
                     2.09      0.00      2.09                 1/1     Object#find_largest
                     6.66      0.00      6.66                 1/1     Object#find_primes
/____________________________________________________________________________/
                     0.02      0.02      0.00                 1/1     Object#make_random_array
0.18%     0.18%      0.02      0.02      0.00                   1     Array#each_index
                     0.00      0.00      0.00             500/500     Kernel.rand
                     0.00      0.00      0.00             500/501     Array#[]=
/____________________________________________________________________________/
============================================================================

As you can see the calls have been separated into slices, no longer is the order determined by process time but instead from hierarchy. Each slice profiles a primary entry, with the primary entry's parents being shown above itself and it's children found below. A primary entry can be ascertained by it having values in the %total and %self columns. Here the main entry here have been bolded for connivence. 

So if we look at the last slice. The primary entry would be Array#each_index. It takes 0.18% of the total process time and it is only called once. It is called from Object#make_random_array which is only called once. It's children are Kernal.rand which is called by it all 500 its times that it was call in this action and Arry#[]= which was called 500 times by Array#each_index and once by some other entry. 

=== Tree Files ===

It's pointless trying to represent a tree file textually so here's a few pretty pictures of it's usefulness 

.KCachegrind Graph
[caption="KCachegrind graph"] 
image:images/kgraph.png[Graph created by KCachegrind] 

.KCachegrind List
[caption="KCachegrind List"] 
image:images/klist.png[List created by KCachegrind]

#TODO Add a bit more information to this. 

== Getting to the Point of all of this ==

Now I know all of this is a bit dry and academic. But it's a very powerful tool when you know how to leverage it properly. Which we are going to take a look at in our next section