#599 Declined
Repository
Deleted repository
Branch
badmerge2 (330bc7ad618c)
Repository
ogre
Branch
default
Author
  1. Pavel
Reviewers
Description

merge was premature. The PR was still in discussion with necessary changes outstanding.

Comments (50)

    1. Pavel author

      you should really completely back out this merge instead of fixing the builds as it also contains changes harming performance like the virtual Plane destructor or moving getters to cpp.

      1. GOD

        The vast majority of the commits are good, cherry picking controversial commits may be more efficient.

        "harming performance" "moving getters to cpp"

        you can't even measure that

        virtual Plane destructor

        it's negligible

        Regarding the modulation pass for shadows, your last commit broke shadow volumes for OpenGL (glsl).

        1. Eugene Golushkov

          I`ll reiterate my objections from PR 555 about Plane with vtbl:

          Adding virtual destructor to Ogre::Plane is like adding virtual destructor to Ogre::Vector3. It is primitive value type, used primarily for intersection tests, there is nothing to destruct, it should not have vtbl.

          Combining Frustum::enableReflection(const Plane& p) and enableReflection(const MovablePlane * p) into the one setReflectionPlane(const Plane * p) only to immediately split code based on RTTI check dynamic_cast<const MovablePlane*>(p) - IMHO it`s not good. BTW it additionally requires Plane to be polymorphic, that is especially bad.

        2. Pavel author

          The vast majority of the commits are good, cherry picking controversial commits may be more efficient.

          thats your POV. I would rather say most of them are pointless changes and some of them are even harmful.

          Regarding the modulation pass for shadows, your last commit broke shadow volumes for OpenGL (glsl).

          I tested it with Gl3Plus and GLES2 RS. What was the error you got? If you look at the code the shader has to work with both. Your changes are definitely incompatible with GLES2.

          the Plane destructor is not only bad for performance wise, but also bad design - you could achieve the same in a more clear way at compile time with function overloading.

          next there is the abuse of shared_ptr..

          Anyway the point is - this PR was not ready, so revert it and wait until it has matured. Fixing the build and leaving the other changes as landmines for other to step on is definitely the wrong thing.

          And yes I know what I am talking about:

          1. Murat Sari

            Cool down. What you do not know is that most changes are done by company which are using OGRE in a productive env and this company need performance. So most changes are done and tested with a real world app, sure some changes are done so that this app run good and we have to question which changes are not for the mainstream, Downside is yesterday I tested Eugene changes and wanted to merge...so its postponed until we sort the problems with this one. Also Assaf merged the changes because they are tested in real life app (windows and dx only).

            1. Pavel author

              What you do not know is that most changes are done by company (...) tested in real life app (windows and dx only).

              actually I read about the company along the lines. However the last point is why one should be suspicious to changes on OgreMain: They are only tested on windows and DX, but affect all platforms.

              Also having a performance oriented product does not necessarily translate into performance PRs as this one proves. Then you must balance the contributions of this particular company against other contributions. As far as I know Eugene is working for a company having an Ogre based product as well..

              This is also what is really bugging me: there was no such balancing here. While other, more polished, PRs are sitting in the queue for months without comments, this one was merged despite ongoing discussions in 3 weeks which shows bias. It was even marked controversial in the first place.

              Also note that this PR resulted from the not reviewable humongous pull request #555 which others and me objected against and which eventually got split up into this and two, good, DX related PRs. So insisting on good contributions indeed results in good contributions. This is also why this PR should be reverted and not cleaned up.

  1. Assaf Raman

    lets start by fixing the build (I did check the build after the merge but I guess I missed something ) - regarding the rest - I will have a look.

  2. Assaf Raman

    Pavel – great comments – keep them coming, you may want to know that I personally going to go over each of them and resolve any issues you have. It is true that Lior’s code comes from a real company, and it is also true that it was mostly checked on DX and windows, so some of it may need to change to handle more platforms\cases. I will talk directly to Lior tomorrow(Sunday), and I going to work all of Tuesday on this. For now I want to try to resolve everything, if the team will feel in the end that the result is not good – we will revert back to the merge, but this is a last resort. The default branch can be broken from time to time, this is just how it is when you do large changes, it happened more then once in the past, in this case it is worth the effort as Lior has many good fixes\features. If some of it does harm– I will remove it or will add an option to compile without it to cmake with the default set to the more common case, perhaps add it as a platform dependent change. So – no flames here, just good people trying to make OGRE the best project, we all share a common goal. I am sorry if this merge results in some of your time being wasted and I will do my best to make sure that in the end you will be satisfied. If not – we can talk about it more.
    I am a little sick today, so I hope I will be good by Tuesday, but in any case (even if I am sick) I will talk tomorrow to Lior. Cheers, keep up the good work and comments!

  3. Pavel author

    Maybe this more about how you manage PRs than the concrete technical issues. Lets begin and see:

    The default branch can be broken from time to time, this is just how it is when you do large changes

    I think that this is the first misconception. The default branch is where all other contributors branch off for their changes. If it breaks it breaks productivity for everybody. It is true that you cannot completely prevent this, but I think you should try to avoid it as far as possible.

    In our case I think you should have created a new branch where you integrate Liors changes and ask others for tests/ comments before merging them into default.

    you may want to know that I personally going to go over each of them and resolve any issues you have.

    this was indeed an misconception on my side. Typically it is the responsibility of the creator of the PR to resolve any issues with the PR. But as you put high value in this changes I now understand your dedication.

    in this case it is worth the effort as Lior has many good fixes /features

    this is true. However you have to keep in mind that there are also many bad changes and poor implementation of essentially good features. Lior, this is not personal, but just judging the code from my POV. The changes are also only tested on one platform for one specific use-case. Therefore I am strongly opposed to merging the PRs as is.

    For now I want to try to resolve everything, if the team will feel in the end that the result is not good – we will revert back to the merge, but this is a last resort.

    I would rather like to review diff(default, lior_changes + assaf_changes) than diff(default, lior_changes) and then diff(default + lior_changes, assaf_changes). It is by far more likely to miss something in the latter case.

    Technical comments

    In general I think the PRs should be focused ond only introduce as much change as strictly necessary so they are easier to revert and easier to review. This is something that should not be part of a PR ever.

    Other PRs adhere to this, but as you are willing to do a fixup, lets move on.

    Our primary goal should be to keep OgreMain clean and focused on common functionality. If you put RenderSystem and usecase specific parts there, it will become unmaintainable as you will not be able to do changes without breaking something.

    HBU_ONLY_ACTIVE_DEVICE

    the comment for HBU_ON_DEMAND says the same. There is no comment telling the difference. Both of the enums are D3D9 specific. Mesh respects getFreqUpdatedBuffersUploadOption while Entity and ShadowCaster do not. In general it seems like a badly integrated fix for a case where HBU_ON_DEMAND did not work as expected.

    CRC32

    A whole new Hashing algorithm was added to OgreMain that is only used in one function (in Main). There are already FastHash and Murmur3 which already seem 1 too much. There must be a good justification to add and maintain yet another one. If it is the best of the 3 we should rather remove the other two instead of just adding CRC32 here.

    BindingTypeBitShift

    duplicates BindingType without adding any value. Only used at one location. See my comment there.

    RenderSystemDataPtr

    makes Renderable own RenderSystemData without a valid reason. The current Ogre Memory Management concept is that some *Manager object owns (i.e. deletes) objects of a type. This runs against it. For RenderSystemData I would say RenderSystem is/ should be the owner.

    SharedPtr::null_ptr

    this is conceptually nonsense. You only ever test if(ptr) never if(ptr == SharedPtr::null_ptr). Think about it. It only emerged because SharedPtr was used wrong in other code parts.

    The proper fix would be to declare Pass::getProgram(GpuProgramUsage&). And use that. There was neither a null pointer check before because the API requires to check before use with Pass::has*Program.

    Reflection Plane changes

    already discussed. The vtable for Plane is pointless as is the whole change. You merge code paths only to split them using a dynamic_cast later. The code was in better shape before.

    ShadowVolumeExtrudeProgram::mModulate*glsl

    this should work with GL1/3/ES2. The changes break GL1 and ES2 for sure. Also it is not clear why it did not work with GL3 for Lior.

    moving getters to CPP

    just introduce reviewing noise and prevents inlining. just drop the commit.

    const primitive_type

    const bool foo(); gives you rightfully tons of warnings.

    Other

    • ShadowCaster does not set op.srcRenderable, while everybody else does.
    • ArrayParams: remove any member functions and initialize as ArrayParams a = {0};. Then learn C.

    Get well soon and have fun fixing.

    1. Assaf Raman

      thank. I will do my best when I will be well. I did talk to Lior over the phone and we have gone over all of your previous comments. I will do the same with some new good topics you raise here.

    2. GOD

      All of These are not neccesarily for fixing.

      I count 11 issues, I'll number them for future reference, There are issues we simply disagree, no need to elaborate on that, these already discussed. Assaf and the team will see fit.

      1. HBU_ONLY_ACTIVE_DEVICE - From what I remember this is a Direct3D9/Direct3D11 and in the future Direct3D12 specific, since there's no implementation for multi device in OpenGL. shadow casters are updated only for their respective device, I'll revisit it again to be sure.

      2. CRC32 - the same speed of fast hash with less collisions, could be much faster than fast hash with hardware acceleration on newer intel processors.

      3. BindingTypeBitShift- yes, it's a duplicate, I think it's more convenient this way, I know you disagree.

      4. RenderSystemDataPtr - already discussed, we disagree on that too.

      5. SharedPtr::null_ptr - Allows to return null shared pointer by reference. is it wrong to do so ?.

      6. Reflection Plane changes - your claims and Eugene's are spot on, nevertheless I would still leave the code as it is as I said before, perhaps it's wrong.

      7. volume shadow modulation pass - I had visual artifacts, shadows were gradient of black and yellow, doesn't matter, it still broken. Today I was working on getting an OpenGL ES emulation to work for fixing the current issue, but it seems like everything works except the stencil buffer, I'm looking for alternatives.

      8. moving getters to CPP - There are issues with MSVC where 'this' pointer is out of scope in a single line methods where breakpoints are triggered. Changes made to the header file require more dependencies to be rebuilt, and in many cases (I'm not sure if it's an issue with MSVC) require rebuilding the project.

      9. const primitive return type - Indeed it's not necessary, I think it tells the story of the code better. what compiler gave you errors ? I don't see them on VS2015 (VC14).

      10. ShadowCaster - op.srcRenderable intentionally not set for shadow casters, shadow casters are rendered differently from standard entities and there's no reliable way to track their state, Not setting srcRenderable is telling the Direct3D11 render system: "Don't track state". As far as i know, no other render system uses this member.

      11. ArrayParams - the current implementation is safer than your suggestion.

      1. Pavel author

        2 CRC32 - the same speed of fast hash with less collisions, could be much faster than fast hash with hardware acceleration on newer intel processors.

        According to this FNV hash (aka. std::tr1::hash) is even faster. And Murmur3 is again even faster. Why not use those? Also the used implementation will not magically get faster on newer intel processors by itself. Even if it would - Ogre must also perform well on ARM devices. Then we are talking about 544 bytes that you hash there, so there is no sane justification on adding a whole new hash function there.

        5 SharedPtr::null_ptr - Allows to return null shared pointer by reference. is it wrong to do so ?.

        It tells you that your code trying to do so is wrong. See my previous explanation.

        8 moving getters to CPP - There are issues with MSVC where 'this' pointer is out of scope in a single line methods where breakpoints are triggered. Changes made to the header file require more dependencies to be rebuilt, and in many cases (I'm not sure if it's an issue with MSVC) require rebuilding the project.

        what you are describing is the effect of inlining. You worked around it by making breaking the code s.t. it can not be inlined any more. Just disable optimizations when you need to debug these functions. The general Ogre Coding Style is to put trivial getters/ setters in the header.

        9 const primitive return type - Indeed it's not necessary, I think it tells the story of the code better. what compiler gave you errors ? I don't see them on VS2015 (VC14).

        GCC see here. Also you should figure out while the Tests with the GL RS are asserting.

        11 ArrayParams - the current implementation is safer than your suggestion.

        lets add it do the list where we do not agree..

        1. GOD

          2. CRC32 - Adding hardware support is in the todo list, Intel added it, perhaps other manufactures will follow, once you have hardware support it is the status quo.
          So should we begin the migration process to CRC32 right now ?
          If not, Murmur would probably be a good solution.

          8. moving getters to CPP - this happens with full debug information without optimzations.

          methods may stay in the header file, but the open/close curly brackers should have their own line.
          header.h

          int foo()
          {
           return 3;
          }
          

          5 SharedPtr::null_ptr - the solution is not ideal.
          the intention of my question was:
          "Is there a legitimate case where shared pointer is returned by reference and may be null ?"
          If so, null_ptr protects from memory corruption, if not, null_ptr should be removed.

          1. Pavel author

            5 SharedPtr::null_ptr

            no, to require a static "null" shared_ptr the question must be:

            Is there a legitimate case where shared pointer must be returned by reference and may be not existing

            and the answer to this is no.

            1. GOD

              It's not a must but may be beneficial, shared pointer is an object like any other.

              UserObjectBindings::getUserAny returns by reference, if there's nothing to return it returns the static Any::msEmptyAny

              Similar behavior with SharedPtr::null_ptr.
              Perhaps null_ptr is a wrong name, and should be changed to SharedPtr::emptyPtr,
              that way the user won't be lead to expressions such as if (ptr == SharedPtr::null_ptr).

              Regarding our specific issue, I didn't quite get your solution: "the proper fix would be to declare Pass::getProgram(GpuProgramUsage&)"
              Would you please elaborate ?

              1. Pavel author

                You allow NULL in Pass::getProgram for no reason: the NULL check has to happen at a higher level as required by the Pass API: you must call Pass::has*Program before you call Pass::get*Program. Therefore the correct "fix" would be to add an assert/ exception like in Pass::get*ProgramParameters.

                Maybe you still could allow NULL in Pass::getProgram to move that exception to a common place.

                I admit though that the Pass implementation is quite bad by forcing one through the GpuProgramUsage* indirection all over the place. The correct fix would be adding a GPT_NULL type and initializing all GpuProgramUsage with it.

                But this is what I meant: if think you need SharedPtr::null_ptr, it just means you have a bigger problem at a larger scale..

                1. GOD

                  I can live without the SharedPtr::null_ptr, hence forth, SharedPtr::emptyPtr
                  Return value initially returned by reference, probably for the benefit of performance: http://sourceforge.net/p/ogre/code/2947/?page=1#diff-4 (scroll to OgrePass.cpp)
                  which implies for an empty static, as in user bindings.
                  Now the usage is simpler, no need to call Pass::has*Program
                  Moreover, I already have several cases where I use SharedPtr::emptyPtr in my use case, simply because it's a faster.

                  I understand now that the debate is whether a variable of the kind of SharedPtr::emptyPtr is needed / beneficial in general.
                  Since there's no settlement on that between us, I would suggest for others to step in, and we'll see what the majority thinks.

            2. Matias Goldberg

              I am catching up with the discussion. I wasn't planning on saying anything. But it appears I do.

              5 SharedPtr::null_ptr

              I am strongly against this one. It will encourage the use of:

              SharedPtr myPtrExample01 = SharedPtr::null_ptr;
              if( myPtr == SharedPtr::null_ptr )
              // ...
              return SharedPtr( SharedPtr::null_ptr );
              

              When we already have:

              SharedPtr myPtr;
              if( myPtr.isNull() )
              // ...
              return SharedPtr();
              

              I don't want two ways of doing the same spreading throghout the entire codebase as time unfolds.

              Specially when the reason for this addition is because a rarely used feature (UserObjectBindings) wanted this, when it was already working just fine by using a static member variable Any::msEmptyAny

              CRC32 vs Murmur vs etc

              I chose Murmur because it has a 32-bit variant (when size matters or speed of comparing already hashed keys matters) and a 128-bit one (when collisions matter; such as the case of when comparing hashed shader source code)

              Optimized, well written CRC variants usually come in 32-bit & 64-bit flavours, but not 128-bit.

              I'm also wary of HW implementations. If we were to hash GB/s worth of data, then going to great lengths (like using HW acceleration) is justified. Otherwise we have to deal with CPUs that support the CRC extensions and the ones that don't for marginal gains.

              Which means either having multiple EXEs to support all the extensions, or do if(..) {} else {} which leads to branching, cache misses, etc; beating the point of using the HW acceleration.

              Note: having multiple EXEs is a valid strategy (i.e. this is what games do with SSE... they have a launcher that selects the proper exe); but considering hashing is secondary for us and not time critical; it's totally not worth it.

              Getters in headers vs in cpp

              Advising to use LTO is NOT a reasonable solution. LTO is a band-aid for projects that have grown too large, are out of control, and can't afford big refactors.

              Yes, LTO may improve performance even for well developed, well written projects. But you do that on the final build before shipping to the customer. LTO increases build time by many times which makes it unacceptable for production where iteration times must be kept low.

              Having this said:

              Getters in the header improve runtime performance because the compiler will automatically inline them.

              However it also increases compilation time because the compiler must evaluate the function's declaration as a temporary function (because it's inlined) for every compilation unit where it's included; whereas the opposite (placing getters in the cpp) sacrifices runtime performance for for lower compilation times.

              There is no definitive answer and good judgement is needed here. If the getter will be a hot one (i.e. called often, called in a performance sensitive section of the code) then place the getter in the header.

              If the getter isn't hot or pollutes readability of the header, then considering placing it in the cpp file is acceptable.

              1. GOD

                2. CRC32 vs Murmur vs etc

                CRC32 has been removed completely, changes has been entered to our internal PR queue.
                As discussed I went for option 1.

                Case closed.

                5 SharedPtr::null_ptr

                The reason for it is not as you wrote.
                First, null_ptr is a bad name, it is now SharedPtr::emptyPtr. It depicts its purpose better, and immediately relief your concern for duality.

                SharedPtr::emptyPtr is used to be able to return safely shared pointers by reference when there is no instance to return.
                The code before was unsafe, and relied on the user to perform prerequisite query(s) on the API as Pavel wrote.

                I referred toUserObjectBindings::msEmptyAny only as a precedence for adding SharedPtr::emptyPtr to strengthen my claims.

                8. Getters in headers vs in cpp

                Due to the evidences came forth and the consideration for mobile devices, I've made my peace with that.
                I would rather define a guiding rules than to trust one's judgment.

                Suggestion 1:

                All "hot" functions should be treated this way, since in essence they are not different from getter/setters.

                A way to define a "hot" function may be by its average count calls per second.
                Degree of code pollution may be its number of lines of code.
                Whether this method should be inlined could be a factor of the two.

                Suggestion 2:

                As a thumb rule, I suggest placing all functions in the implementation file, and when a user comes forward with a "hot" function and the evidences supporting it, not resulting due to a bad usage, then the function should be added to the "hot functions list".

        2. GOD

          2. CRC32

          Did some hash function comparison, results as expected.

          http://pastebin.com/cidNB7FA

          Murmur3 is the fastest software solution, though it suffers more from collisions compared to CRC32.

          So I see three options:

          1. Test Murmur3, if works good, remove CRC32.
          2. Leave things as they are, later on add hardware support for CRC32-C.
          3. Add support to H/W CRC32-C, and change the software solution from CRC32 to CRC32-C I did find a BSD license implementation "Compiles with GCC on Linux and Mac OS X:" with the same Slicing-by-8 algorithm used in the current CRC32 implementation. http://www.evanjones.ca/crc32c.html

          Note: the new atom mobile phones (zenfone 2) do have CRC32-C hardware support.

          1. Pavel author

            for now I think we should go with 1. - as I said we are just hashing 544 bytes. Once you have the CRC32-C w/ hardware support and software fallback ready we might consider replacing Murmur3 for hashing the microcode. But this should be a separate PR. Until then everything is premature optimization.

          2. Murat Sari

            We should really use one hash function here - also I can not remember which hashing the RTSS is using right now. One hash should rule them all.

            1. GOD

              The RTSS uses it's own hash function.

              To generate a unique value this component used to use _StringHash class.
              However when this generates a hash value it selects a maximum of 10 places within the string
              and bases the hash value on those position. This is not good enough for this situation.
              
              Different programs must have unique hash values. Some programs only differ in the size of array parameters.
              This means that only 1 or 2 letters will be changed. Using the _StringHash class in these case will, in all 
              likelihood, produce the same values.
              

              "One hash should rule them all."

              which one ?

                1. GOD

                  CRC32-C has less collisions and is much faster on Intel platforms

                  Edit: Also on all newer AMD CPUs (Bulldozer,Jaguar and Piledriver) , so basically it is now a standard for virtually all desktops and laptops.

  4. Assaf Raman

    Hi Pavel, One of your points is that getters should be in the h file and not in the cpp, as they are not inline if they are in the cpp. I checked.

    1. First everybody here seem to think this is not true, see the point there - I don't need to repeat.

    2. As I respect you I did my own test, opened a new c++ project, had two getter member functions for a class - one in the h and one in the cpp. The assembly (I used the dissembler) in release is the same - see for yourself - here is the project -
      In the header:

    class Test
    {
        int a;
        int b;
    public:
        Test() : a(0), b(0)
        {
        }
    
        void setA(const int val)
        {
            a = val;
        }
        void setB(const int val);
    
        int getA() const
        {
            return a;
        }   
        int getB() const;
    };
    

    In the cpp:

    void Test::setB(const int val)
    {
        b = val;
    }
    
    int Test::getB() const 
    {
        return b;
    }   
    

    main:

        Test t;
        int result = 0;
        t.setA(rand());
        t.setB(rand());
    
        result += t.getA();
        result += t.getB();
        return result;
    

    The result release assembly:

    asm.png

    As you can all is inline (the call is to "rand")

    Do you have a platform that you need for OGRE that has it as a function and not inline? (were you see the function call to getB in the assembly).

    1. Pavel author

      intresting. As you might have noticed MSVC is not my primary platform.

      That kind of optimization is possible, hower one needs Link Time Optimizations for that. And indeed with MSVC 2010+ they are enabled by default: https://msdn.microsoft.com/en-us/library/xbf3tbeh.aspx

      However neither GCC nor clang enable this by default. See:

      As I recall there LTO significantly increases link time on these platforms and in some cases degrades performance. Furthermore the prevalent style in OGRE is to put them in the header.

      1. GOD

        "the prevalent style in OGRE is to put them in the header"

        That's not necessarily right, you may also refer to OGRE coding standards, 'Top-level organisation issues', section 3.
        If you think small getters/setters should be an exception, you are welcome to raise this suggestion.

      2. Assaf Raman

        Pavel you are right, I tested the same program using gcc as the compiler (with -O3), and then used the free version of IDA to see the result assembly.

        Here is the assembly when everything is in the header in the class:

        mainInClass.png

        And here is the code from my post above without a change (when the getters\setters are outside the class in the cpp)

        mainNotInClass.png

        You can see the calls to get getters\setters and you can see that there are more assembly commands.

        Lior - it seems to me that we should revert the getters\setters back to the header. This is also true for the DirectX render systems - as someone may want to compile them using gcc on windows for windows.

      3. Assaf Raman

        Pavel, I discussed this issue with other OGRE team members and they have asked me to test the LTO. So I added the "-flto" flag to the gcc compile and the result was as in MSVC (all getters\setter were inline automatically):

        mainflto.png

        So if that is the case - we feel that we need more information regarding your claim that LTO "in some cases degrades performance".

            1. Pavel author

              the Bullet Benchmark is slower at runtime with LTO as is the example in the GCC bugtracker. Or have I misunderstood your question?

              1. Assaf Raman

                No, my bad, I didn't read it all - just your comment regarding it.

                So - LTO seems not to be a valid solution I guess.

                I guess we will have to put the getters\setters in the h-file. I don't really like it, but as gcc is commonly used - I guess we have to, right?

          1. Assaf Raman

            Pavel - what about gcc 5.0, I read here:

            "Command-line optimization and target options are now streamed on a per-function basis and honored by the link-time optimizer. This change makes link-time optimization a more transparent replacement of per-file optimizations. It is now possible to build projects that require different optimization settings for different translation units (such as -ffast-math, -mavx, or -finline). Contrary to earlier GCC releases, the optimization and target options passed on the link command line are ignored."

            Doesn't it mean that from gcc 5.0 the getters can be in the cpp?

            1. Pavel author

              I dont quite read where you see a a change in this regard there..

              Also I do not really understand the importance of putting the getters in cpp for you. When it gets to debugging in MSVC it is enough if the return statement of the getter is on a separate line. (as Lior wrote somewhere)

  5. Assaf Raman

    Regarding HBU_ONLY_ACTIVE_DEVICE

    It should be documented, you are right. I will make sure a comment will be added. I guess it should be renamed to "HBU_ONLY_ACTIVE_GPU" as DEVICE is a DirectX term. But - there is a need for it, it is for the case were you render on more then one GPU, in many cases you want to control if textures and other resources are uploaded to all device or only the active one, and you don't want it "on demand", for example - an RTT texture - sometimes you create it for use on only one monitor\GPU (compositor, shadows) and save the memory of the other GPU, and sometimes you want it on all monitors (a map of the room that is part of the overlay). You can't copy it on demand, you want to render it ahead of the scene and monitors. On demand doesn't work for dynamic textures\vertexes in some case. This is not a DirectX issue, but a general multi GPU issue, it only happens that DirectX has a much better multi monitor support then GL - as none of us in the community ever needed that support on OpenGL and multi GPU is supported in GL only on the expensive GPUs that most people don't have, but I will be happy if anyone will add it, so this enum value should be available.

    1. Pavel author

      Does OGRE really allow to control which GPU renders which RenderTarget? I think the scenario you are describing is: multiple GPUs rendering the same scene with some resources only needed on one GPU. However is this really possible today? All of the participating GPUs should active then, right?

      The scenario I had in mind was: you have an integrated and dedicated GPU with only one of them rendering, but windows duplicating the resources in case it wants to switch GPUs (e.g. for energy saving) which degrades performance. But then what you really want is "on demand", namely in the case that windows indeed decides that it is power-saving time.

      If only the latter scenario is possible today, I think we should stick with the HBU_ON_DEMAND semantics. If we really need GPU affinity, we should be able to tell which GPU we want the resource on instead of just "active GPU".

      multi GPU is supported in GL only on the expensive GPUs that most people don't have, but I will be happy if anyone will add it, so this enum value should be available.

      this is wrong: http://www.equalizergraphics.com/documentation/parallelOpenGLFAQ.html. Yet I agree that OpenGL support for that is essentially non-existing.

      1. GOD

        "multiple GPUs rendering the same scene with some resources only needed on one GPU"

        It's exactly that:
        All the GPUs are active and independent of each other.
        So for maximum performance the current multidevice implementation ensures that the render target is being rendered by the GPU which the monitor where the render target is displayed on is connected to.

  6. Assaf Raman

    RenderSystemDataPtr

    Lior wrote here:

    I was about to revert the shared pointer for the render system data, but the current usage imply for a shared ownership or a potential for it. Renderable::setRenderSystemData is a public method, deleting the old pointer in Renderable::setRenderSystemData may cause memory corruptions since it's unclear who uses it, and not deleting may lead to memory leaks if used improperly.

    So instead of leaving the responsibility to the user, I moved it to the Shared pointer. It may reduce further complications

    I think he is right - as Renderable::setRenderSystemData is a public method - I think his change is valid.

    But I also agree with what Pavel wrote:

    the Field is only read/ written to by the rendersystem, so the rendersystem should also manage the RenderSystemData Objects. For this it does not need to know whether the renderable is still alive or not. If anything this field should be changed to Ogre::Any to match Ogre::GpuSharedParameters.

    Ogre::Any was created for such cases.

    So Lior - can you change to Ogre::Any and rename to setRenderSystemAny\getRenderSystemAny? This will make the member functions to be with the same naming as setUserAny\getUserAny. Also remove RenderSystemData and RenderSystemDataPtr from Renderable.

    Eugene wrote

    RenderSystemData is potentially object with high-frequency access

    I don't agree, there aren't that many renderables for this to be a performance issue for the render system.

    Thanks.