Hi! I appreciate your great work on yappi. I want to make yappi useful for applications that use greenlets, like Gevent.
The problem using yappi with greenlets is that yappi doesn't know about switches between greenlets. Consider:
importtimeimportgreenletdefmyfunction():main_greenlet.switch()print'done'main_greenlet=greenlet.getcurrent()g=greenlet.greenlet(myfunction)g.switch()time.sleep(1)g.switch()# Prints 'done'.
yappi thinks myfunction took one second of wall-clock time, but in fact it was paused while the main greenlet was sleeping for one second.
My solution is to add a function, yappi.set_context_id_callback(). If you set a callback, yappi uses it instead of the thread id to find the current context.
Then, in a separate project, I'll use the new set_context_id_callback function to make a greenlet profiler. Here's a test that demonstrates the greenlet-aware profiler. I like having the greenlet-aware profiler in a separate project. I don't think yappi should include code specific to greenlets. I prefer adding a more general-purpose function to yappi, like set_context_id_callback.
The new function doesn't add much overhead. When there is no context_id_callback, yappi has no extra overhead compared to the previous version. When there is a context_id_callback, yappi's overhead increases about 13%, in my tests.
This pull request is incomplete. I'm showing it to you now to see what you think. If you approve of my basic idea, then I'll do some additional testing before I make a final pull request.
I am OK with the way you implemented this. I also agree with having another project for greenlet profiling on top of yappi. However, let me couple of days(I need some time because of my real job) to suck in some overview details about the greenlets/gevent. Maybe it can be changed a little bit...
In the meanwhile, as a beginner to this subject, let me ask you very trivial questions:
Are greenlets application level threads? Meaning there is actually only one thread, but you control it via switch statements basically, right?
Is this gevent library the most accepted lib on the subject, are there any others, if so, can they also be integrated with the hook you implemented? (No problem if you don't know others, just your initial thoughts on this is enough)
Also, I am just asking to verify: the only function yappi will implement is set_context_id_callback() right? Because in your code there are new variables added to _ctx struct? I am assuming these will be moved to your greenlet-profiler project, right?
Hmm I think I understood your implementation and the basic problem, you also traverse the callstack to advance the timing values. So, just disregard my last question. Let me some time to review your changes .
"Are greenlets application level threads?" Right, the "greenlet" package is a C extension for CPython. There is typically one actual kernel thread, but it can switch the running greenlet at any time by calling greenlet.switch() explicitly. However, in some applications there are multiple kernel threads, and each kernel thread can run multiple greenlets. (Greenlets can't cross threads.) I need to think and test to make sure this pull request handles multiple threads correctly.
"Is this gevent library the most accepted lib on the subject, are there any others." Yes and yes. Besides Gevent there's also eventlet and my own library, Motor. Profiling Motor is the motivation for this project. All of them are built on top of the same greenlet library. I believe this pull request supports all possible libraries that use greenlets. But I haven't actually tested Gevent or Motor yet, that's next on my agenda.
I'm changing my mind about something. In my patch, if a context_id_callback is set, yappi pauses the old context and resumes the new context whenever the context_id changes. But now, I think yappi should only do this if clock_type is "cpu".
Here's why: if clock_type is "wall", we want to know how long each function took in wall-clock time. It doesn't matter if the function took a long time because its greenlet was paused: the fact is, it took a long time. This is how yappi already works: if a function is paused waiting on a mutex, for example, then yappi will record that it took a long time in wall-clock time.
So I should only pause / resume contexts if a context_id_callback is set and clock_type is "cpu". It's only when clock_type is "cpu" that yappi needs to be changed, because it currently gets CPU time for the whole thread, not for each greenlet.
I have prepared a long mail explaining this to you. Happy to hear that from you:)
Bu again, there seems to be some problems regarding this algorithm of "pausing./resuming" context when a context switch occurs. Because we only detect context switch whenever a function is called. I will try to demonstrate this problem by an example in the following days. Just FYI: I have thought this kind of approach in the past and cannot come up with a reasonable/general solution.
The problem you are trying to solve is another beast not only detecting context switch of greenlets. Subtracting the elapsed time from all of the call stack items seems to solve the problem , but you also need to think about some corner cases, too. I have thought about this problem for “normal thread wall-time profiling” in past, and discard the idea as there seems not to be a good overall solution. For greenlets, on the other hand, there may be a possibility.
Here is an example:
defa():switch()# cpu intensive task that generates no CALL, CCALL, RETURN etc. eventTASK()switch()# cpu intensive task that generates no CALL, CCALL, RETURN etc. eventTASK()defb():switch()# cpu intensive task that generates no CALL, CCALL, RETURN etc. eventTASK()switch()# cpu intensive task that generates no CALL, CCALL, RETURN etc. eventTASK()
So in this example, if you use normal threads, it is impossible to detect context switches as no profile event is generated for context switches. However, for greenlets you may have that opportunity as switch() is actually a function which again generates a profile event. It seems to be OK but again need to tested carefully and you need to think about it a lot. Example: How will you handle recursive/mutual recursive functions when substracting paused time? Will there be any defects?
So, in general, I don't like the idea of subtracting time information from call stack
when a thread is paused. I don't like the idea because it seems not applicable to normal
threads which means the code you suggests contains greenlet specific code.
What I am suggesting is that following:
If you look at the update rate of yappi, you will see that I update it once or twice in a year.
I hopefully, want to be backward compatible starting from v0.82, so my suggestion is to
start "Gappi" on top of yappi 0.82 with the changes you shared, and it seems not so hard to
pull back the changes to yappi along the year. Because the problems you are trying to solve are complex and seem not to be in domain of yappi.
I see what you mean: this is a tough problem, and immediately accepting my patch might not be wise. On the other hand, I hope yappi can eventually include hooks to support greenlets, since I don't want to have a separate fork of yappi forever.
What do you suggest for now? Should I just pull yappi's full source (plus my patches) into greenlet-profiler and release it as an alpha to see if it works for people in the real world?
Yes. At first, I think you should release the greenlet-profiler by including yappi's full source. I agree with you that yappi should include some hooks to support different threading models eventually. Let's see the profiler in the wild a while, and in the meantime I will find some chance to think about the problem more, too. I would specifically try to focus to somehow eliminate/move the "substracting pause time" part of your code. If we can find a way to do that kind of interaction externally, (via some other hooks), I think it would be better.
Because, as you indicated before: not all greenlet libraries use N:1 thread model: meaning one kernel thread for all. Maybe some of them uses N:M model which makes the pause/resume implementation useless, right? We should somehow find a better way to support better integration for different threading models.
There's only one greenlet library. Although the greenlet library is used by several libraries (Eventlet, Gevent, Motor), they all use the same greenlet library at the core. And greenlets can never cross threads, so the pause / resume implementation for CPU time will work for all these libraries.
But I agree, let's see how this works in the real world before we change yappi. I've copied yappi into greenlet-profiler. I'll announce greenlet-profiler next week and see what happens.
The only thing that disturbs me about the current implementation is the pause/resume functionality when a context_id_callback is set. This seems not right as when you set context_id_callback, it only should set context_id_callback nothing more.
Well, anyway, I think I have figured out a way to export that functionality to Python code.
Here's the draft:
set_id_context_callback() will be same as you implemented.
set_context_switch_callback() function will be defined.
when yappi detects a context switch, it will call the callback if configured. (this may also be useful for other use cases, too)
advance_callstack_() function(I cannot decide on a proper name yet) which will advance the initial timing values with a provided value. As, you will provide the value, you will have to bookkeep previous context paused time and maybe other info you want.
With the context_name_callback set, get_thread_stats().print_all() looks like this in a Gevent application:
name tid ttot scnt
Greenlet 4386064816 1.031498 11
Hub 4386064976 0.000719 32
Otherwise it's just:
name tid ttot scnt
_MainThread 4388202608 1.031153 11
_MainThread 4388202928 0.000668 32
I love the idea of a context_switch_callback. In the case of greenlets there's actually a greenlet.settrace function, so I could pass the context_switch_callback to greenlet.settrace instead of yappi.set_context_switch_callback. However, there may be other circumstances, besides greenlets, in which we want yappi to be responsible for calling the context_switch_callback.
How about we name the final function adjust_time(long context_id, long amount)? amount can be positive or negative, it's added to the t0 for all items on the context's current call stack.
Whoever calls adjust_time needs access to the tickcount() when the context is paused, and when the context is resumed. Should Yappi make tickcount() public too?
adjust_time() function name creates an assumption that time is broke if we not call it. Let's name it pause_context() or other name you like and maybe we will also change it.
It would be nice to have a unittest for greenlets that have chained-recursive functions, like a->b->c->b->c. There are lots of examples for this scenario in yappi's original test_functionality module.
set_context_name_callback() would be great. Please also write a unittest for that,
But warning: Everytime you define a callback and call a function from yappi, you have the risk of creating a deadlock: as the profiled code may be in the process of an import and if you also import a module in those callback's, then you will wait forever on the global import lock of Python. Believe me that happens quite a while: that is the reason I am using PyImport_ImportModuleNoBlock() in _get_current_thread_class_name(). Anyway, keep that in mind and maybe in your documentation for GreenletProfiler.