Issue #17 new

Monitoring of Simulation

Keith Smith
created an issue

I recently conducted an engineering review of a SimPy system simulation and received a lot of feedback and requests.

One major request was that everyone wanted much more analysis of how resources were being used, and of exactly what the system state was when a process failed to acquire a resource. They wanted to know what was using, or taking away, the resource that a resource-starved process needed. They felt they needed this information to 'fix' the system so that it works better.

For a complex system simulation with lots of processes and resources that runs for long simulation durations, this could amount to a huge amount of data.

I'm not sure where to go with this, but I at least need some mechanism for extracting the state of a resource in order to conduct this analysis. A tally type of monitoring could reduce the data required. I believe the user needs a simple method of defining what information to tally and/or store during a simulation run.
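
For example, something as small as the following sketch (all names made up) might be the kind of mechanism I mean, provided the user decides where to call it:

    class Tally:
        """Collects (time, value) samples only at user-chosen points."""
        def __init__(self, env, name):
            self.env = env
            self.name = name
            self.samples = []

        def observe(self, value):
            # called from a process wherever the user wants a data point
            self.samples.append((self.env.now, value))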

Any ideas?

Comments (28)

  1. Keith Smith reporter

    Here's an idea I had.

    It's a Probe class which allows the user to capture the state of the simulation through probe points. Each process updates its status through the probe points. When some interesting event occurs, the set of probe points is recorded. I believe this, or a similar, approach has the following pros and cons:

    Pros:

    • Simple design
    • Allows the user to select exactly what state information they want to record
    • Allows the user to select when the state information is to be recorded.
    • Reduces the amount of information recorded by focusing on just what the user is interested in.

    Cons:

    • Requires that the user code all of the state changes in all of the simulation processes of interest.
    • The deepcopy command could impact performance (is there a better way?)

    The attached file attempts to demonstrate the Probe class concept.
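
    In outline, the concept looks something like this (a stripped-down sketch, not the attached code):

        import copy

        class Probe:
            """Holds user-defined probe points; a snapshot is recorded on demand."""
            def __init__(self, env):
                self.env = env
                self.points = {}    # name -> latest state reported by a process
                self.records = []   # (time, snapshot) pairs

            def update(self, name, state):
                # each process of interest reports its state here
                self.points[name] = state

            def record(self):
                # call when something interesting happens (e.g. a failed acquire)
                self.records.append((self.env.now, copy.deepcopy(self.points)))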

    Unfortunately, I'm getting an Exception thrown outside the intended try/except scope. When a process releases a resource at the same time that a higher-priority process requests one, the lower-priority process is still interrupted by the higher-priority process even though it has already released the resource (see the code output). Perhaps this should be flagged as a separate bug?

  2. Keith Smith reporter

    Stefan:

    I started taking a look at what you are doing with monitoring. It's a bit complicated, but I think I understand where you are going, and it's interesting. Here are a few comments:

    • It currently appears that the monitoring scheme attaches itself (via Patch) to a function call and collects data before and/or after it. Often I don't want to collect data every time a function is called, but only when some condition holds or an exception is caught (a rough sketch of what I mean follows this list). Have you considered that type of data collection and how it might be implemented?

    • In its current state I think a typical user will have difficulty figuring out how to use monitoring.py. Although the way you monkey-patch in the collection process is clever, and it will be simple for capturing the dynamic state of resources, it may be difficult to apply to other problems like tallies or conditional/exception-type collection schemes. I'd recommend many simple examples for users to access.
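
    Here is a throwaway sketch of the exception-triggered collection I have in mind (plain SimPy, not using monitoring.py; all names made up):

        import simpy

        def worker(env, machine, monitor):
            with machine.request() as req:
                try:
                    yield req
                    yield env.timeout(5)          # hold the machine for a while
                except simpy.Interrupt:
                    # collect data only when the exceptional case actually happens
                    monitor.append((env.now, 'worker interrupted'))

        def interrupter(env, victim):
            yield env.timeout(2)
            victim.interrupt()

        env = simpy.Environment()
        machine = simpy.Resource(env, capacity=1)
        monitor = []                              # filled only on the exception
        victim = env.process(worker(env, machine, monitor))
        env.process(interrupter(env, victim))
        env.run()
        print(monitor)                            # [(2, 'worker interrupted')]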

    I'll try to incorporate what you've done into my code. Currently I'd like to replace or modify my Probe code, but it only collects data when an exception is thrown, and I'm not yet clear on how to use monitoring.py to do that.

    Thanks,

  3. Stefan Scherfke

    Keith Smith The monkey patching is only done for "external" code (relative to your own) that you cannot (or don’t want to) modify yourself. If you want to monitor your own processes, it becomes much easier and more explicit. I haven’t done an example for this yet, but you can take a look at the test cases in test_monitoring.py.
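
    In essence, monitoring your own process can be as plain as appending to a list (a generic sketch, nothing specific to monitoring.py):

        import simpy

        def car(env, data):
            # a process that monitors itself explicitly
            while True:
                yield env.timeout(1)
                data.append((env.now, 'tick'))    # record whatever state you care about

        env = simpy.Environment()
        data = []
        env.process(car(env, data))
        env.run(until=3)
        print(data)   # [(1, 'tick'), (2, 'tick')]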

  4. luensdorf

    If this is going to happen, I don't think it will be for 3.1. Anyhow, I'm strongly in favor of not providing monitoring in SimPy itself, but in an add-on.

  5. pothier

    On Sept 23 I submitted code for a pull request that monkey-patches Environment.step() for a kind of monitoring (tracing). I do something similar against my instances derived from Resource() that produces output suitable for Graphite.
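
    The gist of that patch is roughly this (a simplified sketch, not the actual pull-request code):

        import simpy

        orig_step = simpy.Environment.step

        def traced_step(self):
            # peek at the next scheduled event before it is processed
            if self._queue:
                t, prio, eid, event = self._queue[0]
                print('%s: processing %r' % (t, event))
            return orig_step(self)

        simpy.Environment.step = traced_step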

  6. pothier

    For instance, here's some of my output:

            Queue    Put   Max 
             Name  Count  Used Comment
        dsan3.NSA:    20    20 
      dsan3.q1535:     7     3 
      dsan3.q1635:     7     1 
      dsan3.q1735:     7     1 
      dsan3.q1934:     5     5 
      dsan3.q8335:    20     4 
      dsan3.q8336:    20    20 
      dsan3.q9435:     8     1
    

        dsas3.NOWHERE1:     7     7 
           dsas3.q2535:    10     1 
           dsas3.q2635:    10     5 
           dsas3.q2735:    10     2 
           dsas3.q9635:     7     1

    The rows are Resources. I also pump the results to a Google spreadsheet, which will do a cool time animation.

  7. luensdorf

    Hi Steve,

    yes, we dropped the developer mailing list because there wasn't any traffic.

    Nice to hear about your monitoring approach. Maybe you could write an article that we can include in the SimPy documentation?

    And about the pull request: I'm sorry I didn't report back yet. Hopefully there's time to do that on the weekend.

    Cheers, Ontje

  8. pothier

    Ontje, I've taken a crack at such an article. My source for this is "org-mode", so I can export to various formats. Let me know how you want it. I'm not sure what precise format RTD allows. Below it is exported to Markdown and just pasted; the preview shows glitches, but it should be readable enough for you to comment.


    <div id="table-of-contents"> <h2>Table of Contents</h2> <div id="text-table-of-contents"> <ul> <li><a href="#sec-1">1. About this document</a></li> <li><a href="#sec-2">2. Monitoring a simulation</a> <ul> <li><a href="#sec-2-1">2.1. Using the simpy.util.trace() function</a></li> <li><a href="#sec-2-2">2.2. Creating your own monitoring/profiling feature</a></li> </ul> </li> </ul> </div> </div>

    About this document

    Intended for inclusion in SimPy documentation per Ontje's (https://bitbucket.org/luensdorf) request in issues/Monitoring

    I don't know the best way/format to provide this info so I'll write it in the way that's easiest and go from there. For me, "easiest" is org-mode with export to some other format(s).

    Monitoring a simulation

    There are several possibilities:

    Using the simpy.util.trace() function

    The trace() function [1] included in simpy.util provides a simple way to get a trace of the events being processed by the simulator. To use it, include something like:

    import simpy
    import simpy.util

    def mySim():
        env = simpy.Environment()
        simpy.util.trace(env)     # turn on tracing for this environment
        mySimulationSetup(env)    # your own model setup
        env.run(until=1e3)
    

    This causes a simple message to be displayed at the beginning of each event evaluation (as part of env.step()). To use a custom display function instead of the default, pass an additional keyword argument to trace() like this:

    import inspect

    def stepTraceFunc(event):
        if isinstance(event, simpy.Process):
            xtra = 'Waiting for %s' % (event.target,)
            generatorName = event._generator.__name__
            genstate = inspect.getgeneratorstate(event._generator)
            print('TRACE %s: event.Process: gen=%s (state=%s) (value=%s) %s'
                  % (event.env.now, generatorName, genstate, event.value, xtra))


    def mySim():
        env = simpy.Environment()
        simpy.util.trace(env, stepTrace=stepTraceFunc)
        mySimulationSetup(env)    # your own model setup
        env.run(until=1e3)
    

    You can, of course, put logging in the custom function so that output goes to a file or is configured via the Python logging module.
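
    For example, a minimal sketch (handler and format choices are up to you):

        import logging

        logging.basicConfig(filename='sim_trace.log', level=logging.INFO)

        def loggingTraceFunc(event):
            # same hook as stepTraceFunc above, but routed through the logging module
            logging.info('TRACE %s: %r', event.env.now, event)

    and pass it in the same way: simpy.util.trace(env, stepTrace=loggingTraceFunc).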

    Creating your own monitoring/profiling feature

    It's pretty easy to add your own custom profiling to a SimPy simulation. The advantage is that, because it's custom, you can tailor it to your application and get exactly what you want. Such a technique is used in dfsim [2] and will be discussed here.

    To avoid a performance hit when profiling is turned off, dfsim uses monkey patching. In our case, when a command line argument specifies that profiling should be used, our addProfiling() function is called after the simulation has been sufficiently set up, but before the simulation starts. As part of our initial setup, we create a graph (network) that holds the simulation-related instances and connectivity. For instance, the graph has simpy.Store instances attached to some nodes.

    To add profiling to all of our simpy.Store instances, we do something like this (in addProfiling()):

    1  if not hasattr(simpy.Store,'monkey'):
    2      origStorePut = simpy.Store.put
    3      def putStoreWithCount(self,data):
    4          instance = self
    5          setattr(instance,'putcount', 1 + getattr(instance,'putcount',0))
    6          return origStorePut(self,data)
    7      simpy.Store.put = putStoreWithCount
    8      setattr(simpy.Store,'monkey',True)
    

    We add a monkey attribute to the class we are modifying to prevent doing duplicate modifications [line 8]. This code snippet just creates a new function to use instead of the original simpy.Store.put(). The new function tracks how many put() calls were made, then does the original put().
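
    A similar wrapper can track other things as well; for example, a hypothetical extension (not part of dfsim) that records the high-water mark of items held in each Store:

        if not hasattr(simpy.Store, 'monkey_hiwater'):
            origStorePutHW = simpy.Store.put
            def putStoreWithHiwater(self, data):
                event = origStorePutHW(self, data)
                # remember the largest number of items ever held in this Store
                setattr(self, 'hiwater',
                        max(getattr(self, 'hiwater', 0), len(self.items)))
                return event
            simpy.Store.put = putStoreWithHiwater
            setattr(simpy.Store, 'monkey_hiwater', True)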

    Also in addProfiling(), we do the following to keep track of all of the instances we profiled:

    # collect the profiled instances into class-level lists for the summary below
    for n, d in G.nodes_iter(data=True):
        if 'sim' not in d:
            continue
        si = d['sim']
        if isinstance(si, Dataq):
            Dataq.instances.append(si)
        elif isinstance(si, simpy.Store):
            simpy.Store.instances.append(si)
    

    After our simulation is done, we retrieve the attribute values we collected during simulation and display them:

    def print_summary(env, G, summarizeNodes=[]):
        print('Simulation done at time: %d.'%(env.now))
        print('Next event starts at: %s'%(env.peek()))
    
        # print profile if we collected data
        if G.graph.get('profileCollected',False):
            qmap = dict() # qmap[name] = Dataq
            for n,d in G.nodes_iter(data=True):
                if ('sim' in d) and isinstance(d['sim'],Dataq):
                    instance = d['sim']
                    if not hasattr(instance,'putcount'):
                        setattr(instance,'putcount',0)
                    if not hasattr(instance,'hiwater'):
                        setattr(instance,'hiwater',0)
    
                    qmap[d['sim'].name] = d['sim']
    
            print('Dataq use summary:')
            print('  %15s  %5s %5s %s'%('Queue', 'Put',   'Max',  ''))
            print('  %15s  %5s %5s %s'%('Name' , 'Count', 'Used', 'Comment'))
            for name in sorted(qmap.keys()):
                print('  %15s: %5d %5d %s'
                      %(name,
                        qmap[name].putcount,
                        qmap[name].hiwater,
                        'WARNING: unused' if qmap[name].hiwater == 0 else ''
                    ))
            print()
    
            siList = simpy.Store.instances
            siList.sort(key=lambda x: x.edge)
            if len(siList) > 0:
                print('Store use summary (%d):'%len(siList))
                for si in siList:
                    print('\t Edge %s: putcount=%d'
                          %(si.edge,
                            getattr(si, 'putcount',-1)
                        ))
    

    <div id="footnotes"> <h2 class="footnotes">Footnotes: </h2> <div id="text-footnotes">

    <div class="footdef"><sup><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">1</a></sup> Submitted as pull-request on Sept 23, 2014. Might not be in current release yet.</div>

    <div class="footdef"><sup><a id="fn.2" name="fn.2" class="footnum" href="#fnr.2">2</a></sup> A somewhat general simulator for a Data-Flow diagram. It reads a diagram into a graph and uses the graph as the basis for setting up the simulation.</div>

    </div> </div>

  9. luensdorf

    Thanks for the article, Steve. We use Sphinx for documentation. If you reformat your article to reST (that's the format used by Sphinx), we could add it to the topical guides section of our documentation. See here for an example: https://bitbucket.org/simpy/simpy/raw/22c6ac05096c485033927856ac486bd848cf957f/docs/topical_guides/events.rst

    If you integrate your article in the documentation, you can create a new pull request and see what Stefan thinks about your article (he has written all of the topical guides).

    Cheers, Ontje
