Handle future vote versions properly

Issue #184 resolved
Michael Witrant
created an issue

This problem was identified by @John L. Jegutanis here: https://bitbucket.org/JordanLeePeershares/nubit/pull-request/209/added-a-framework-for-protocol-version/diff#comment-4851998

If someone sends a vote with a version number above the current protocol, all the clients will accept it and parse it like any other vote in the current protocol version. But when we later introduce a change the new client could think this old block already support the changes we just introduced.

So someone downloading the blockchain from the beginning with the latest client would probably fail to parse this special block.

We can wait for such an event to occur before we fix that, because it would only affect future releases. But if this happens while we're waiting for a protocol change (i.e. we released a protocol change and we are waiting for the new protocol to be effective), we would have to release a urgent and mandatory fix.

I proposed a solution here: https://bitbucket.org/JordanLeePeershares/nubit/pull-request/209/added-a-framework-for-protocol-version/diff#comment-4854770:

The cleanest solution I've found so far is to provide a maximum allowed version to the vote unserialization. If the serialized version is above this maximum, the vote is unserialized like if it was serialized with the maximum version.

The maximum version could be set in pindexNew->vote.nVersion before the vote is extracted here. The maximum version would be 50000 before protocol V06 is effective, and 60000 after that.

In the vote unserialization code we would not change this->nVersion if the serialized version is greater. If using this->nVersion for the maximum version leads to ugly code it may be better to add a CVote::nMaximumVersion or something. It's easier but it takes some memory. It may also be possible to use the serialization nVersion passed here. I think it defines the initial value of the local variable nVersion (as opposed to this->nVersion). It may be cleaner because we could just add a nMaximumVersion parameter to ExtractVote but I'm not certain there would not be side effects (the serialization code is quite complex).

Comments (22)

  1. John L. Jegutanis

    Only had time to look at it today.

    The solution that we initially discussed works only when are accepting new blocks and not when there are already serialized blocks like the version 60000 blocks that you discover on testnet (I am to blame about them).

    The problem is that in the CVote's IMPLEMENT_SERIALIZE we cannot trust the nVersion value as it can have arbitrary value and we cannot make conditional READWRITEs.

    One solution could be to ignore the CVote:nVersion from V06 and instead rely on the CBlockIndex::nTime and CBlockIndex::nEffectiveProtocol in the next versions. This means adding a CBlockIndex* field to CVote class...

    edit: some rephrasing

  2. Michael Witrant reporter

    I suggest you start by adding a nMaximumAllowedVersion or something to CVote and fill it before votes are (un)serialized. Then we'll see whether it's feasible (and useful) to use nVersion instead.

    For already serialized votes you can set the maximum version before this line. But unfortunately we don't have nTime yet. So I suggest we serialize nEffectiveProtocol from pull request #209 before all that (for example just after nVersion) and use it. It would be initialized to 0 instead of being serialized if nVersion is < 60000. When the effective protocol is 0 then the maximum vote version would be set to 50000.

  3. John L. Jegutanis

    Created a commit to the #209 pull request that deals with this issue.

    Instead of a maximum version it uses the nEffectiveProtocol for the conditional serialization. I did some tests and it works, except for the time that that it switches to the V06 protocol as I haven't setup the cucumber testing environment.

    I still would like to do some more testing as the part where the CVote::nProtocolVersionVote starts to be serialized is critical.

    Also the part that I don't like is that we have to be careful and set the CVote::nEffectiveProtocol before (un)serializing and this creates a potential for bugs. We could add this parameter to the constructors but I am not sure it is the best way.

  4. Michael Witrant reporter

    The current solution in pull request #209 is not very satisfactory.

    This whole version problem is a mess, and may also affect other serialized structures.

    What about just assuming future versions are invalid?

    We would start 2.0 at protocol 60001 or something and just solve this problem just by rejecting any vote whose version is > PROTOCOL_VERSION.

    When a block is received we would additionally make sure the version of a vote is not above the effective protocol version.

    What do you think @John L. Jegutanis?

  5. John L. Jegutanis

    The rejection of a CVote::nVersion that the client does not support is good but solves part of the problem.

    The current situation is that we have 3 values for the version:

    • nVersion is effectively deprecated but still used in serialization
    • nEffectiveProtocol is never serialized but computed on runtime and has to be set before the CVote usage
    • nProtocolVersionVote is the vote for the upcoming version

    We could use the nVersion to store the effective protocol and remove the nEffectiveProtocol completely. Like this a client not only rejects versions that it does not support but also versions that are still not voted for or version downgrades. The nVersion value will reflect the global consensus and not just the individual version of a client.

    We need to be careful to not serialize nVersion with PROTOCOL_VERSION as it is right now but always using and checking the calculated effective protocol version. The most critical places for this are in CWallet::CreateCoinStake() and CBlock::AcceptBlock(). Any other places?

    By using the CVote::nVersion as the effective protocol version will only simplify the current mess. Another side effect is that nVersion is used in CCustodianVote serialization, this simplifies the conditional check in the CCustodianVote::IsValid() to enable the NSR grants.

    What do you think @Michael Witrant and most importantly what are the downsides?

  6. Michael Witrant reporter

    The rejection of a CVote::nVersion that the client does not support is good but solves part of the problem.

    Yes, the other part of the problem is we must reject versions that are not yet effective.

    The good thing with this solution is we can easily do that after unserialization (compared to having to inform the vote structure what's the current protocol prior to unserialization).

    We could use the nVersion to store the effective protocol and remove the nEffectiveProtocol completely.

    Using nVersion as the effective version would probably work but I'm afraid it would be a mess to implement. You'd still have to use this kind of code: https://bitbucket.org/JordanLeePeershares/nubit/pull-request/209/added-a-framework-for-protocol-version/diff#Lsrc/main.hT1596

    If you want to go this way, then it may be better to completely remove nVersion from the CVote structure, and provide the effective version to the unserialization code.

    That would mean passing the effective version instead of PROTOCOL_VERSION in ExtractVote. And in the block index serialization we would have to use a custom READWRITE macro to which we can pass an explicit nVersion. There are certainly other changes, and maybe some difficult ones, especially to handle old versions (we must be able to calculate the version of each vote serialized in the blockchain).

    That may be how I should have done it from the beginning, but I'm not sure it's the best solution now.

  7. John L. Jegutanis

    Sorry for the late response, I wanted to think about it for a bit and make some tests.

    You were right about the READWRITE it complicates things a lot.

    Additionally I discovered a place that creates a problem: CBlock::CheckBlock(). This checks the block validity in a contextless way, meaning we cannot count the votes and calculate the protocol version for disconnected or orphan blocks. When processing new blocks, it will catch any errors in CBlock::AcceptBlock() but this is not good enough IMO.

    In the CVote's IMPLEMENT_SERIALIZE block I noticed this:

    ReadWriteSingleMotion(s, nType, nVersion, ser_action);
    

    Why it does not increase the nSerSize like READWRITE? For example like:

    nSerSize += ReadWriteSingleMotion(s, nType, nVersion, ser_action);
    
  8. Michael Witrant reporter

    Additionally I discovered a place that creates a problem: CBlock::CheckBlock(). This checks the block validity in a contextless way, meaning we cannot count the votes and calculate the protocol version for disconnected or orphan blocks. When processing new blocks, it will catch any errors in CBlock::AcceptBlock() but this is not good enough IMO.

    It's not ideal but there are things we are forced to check later when we know the previous block. If we can't do that we will have to reject some motions.

    The proof of stake check still works on disconnected blocks, so I think it's not a big problem to accept potentially invalid blocks because it cannot be abused (you need a valid and unused kernel to propagate a block, so it's something rare).

    The vote is already checked twice (in CheckBlock and AddToBlockIndex). It's probably redundant. We could have 2 levels of check: one for not-yet-connected blocks in CheckBlock, and one for connected blocks in AddToBlockIndex.

    Why it does not increase the nSerSize like READWRITE?

    It looks like I forgot to do that. I'm surprised it didn't have consequences. nSerSize is probably used only to calculate the serialized size and we never do that on votes. It'll need a fix though. This part could probably be rewritten at the same time, with a const cast instead of all the methods.

  9. John L. Jegutanis

    I have updated the pull request 209 to simplify a bit the serialization of the protocol voting.

    It changes the CVote::nVersion to nProtocolVersionVote so the new clients can vote immediately and the V06 switch will happen after a vote count.

    Also when this value is over 60000 it does not change the nVersion of the IMPLEMENT_SERIALIZE so in effect the CVote is serialized on disk with the client's version (instead of the version that is on a block's CVote). The network (or better the blockchain) serialization of the CVote is always CBlockIndex::nEffectiveProtocol.

    After the switch the client will reject any CVote::nProtocolVersionVote that is lower than the effective protocol, this means that any V05 blocks after the fork will be rejected (by the minting majority).


    One part that I find my understanding lacking is why an 0.5.2 client does not accept a longer chain:

    This was part of a test in protocol_votes.feature. All nodes switched to "V06" block, even the Old 0.5.2 node accepted it

    When node "Old" finds a block "V06_rejected"
    And node "A" finds a block "V06_accepted"
    And all nodes reach the same height
    Then nodes "A", "B" should be at block "V06_accepted"
    

    Now the A and B are on the V06_accepted block while rejecting Old's V06_rejected block

    When node "B" finds a block "Newest"
    Then all nodes should be at block "Newest"
    

    Here it fails as the 0.5.2 node does not accept the "Newest" block because of this error:

    ProcessBlock() : block with too little proof-of-stake

    Any idea?


    Another issue that I encountered when testing is the time it takes to generate a stake and simulate a vote. Generating 1800 PoS blocks takes around 30min, so I set the required/total votes to 9/10 for testnet but it should be changed to another value.

  10. Michael Witrant reporter

    Questions specific to a solution should probably be discussed on the pull request itself.

    I find replacing nVersion by nProtocolVersionVote very confusing. Before V06 it contains the serialized version and after V06 it contains the protocol vote.

    Votes are serialized in multiple situations and I'm not sure this solution will work every time. It's not obvious by reading the code anyway.

    The current code fails to validate our current blockchain (it may be only because of this but there may be more).

    In the vote serialization I'd rather isolate the old nVersion processing and the new nProtocolVersionVote processing.

    It looks like the removal of nVersion was successful though. It's cleaner this way but it led to complications elsewhere so I'm not convinced it's positive at the end.

    If it was not removed you could ignore old protocols and wouldn't need these block times everywhere. You could probably have just one CalculateProtocolForNextBlock(const CBlockIndex* pindexprev) instead of these few related methods (there's room for improvement even if nVersion is removed though).

    The fact it already enables voting for protocol V06 is a nice addition though.

    ProcessBlock() : block with too little proof-of-stake

    Any idea?

    No, I'll try to look at that.

    Generating 1800 PoS blocks takes around 30min, so I set the required/total votes to 9/10 for testnet but it should be changed to another value.

    You can use #ifdef TESTING to set lower values only active during cucumber testing.

  11. John L. Jegutanis

    The CVote::nVersion is basically renamed to nProtocolVersionVote so we can revert it, in any case the value is always PROTOCOL_VERSION when the client creates a block.

    The difference is this line, where we say that after V06 we don't serialize the CVote with a value that a peer decided because of abuse concerns.

    Even if we kept nVersion and added a new nProtocolVersionVote, the issue will remain: we are serializing the CVote with a value that we cannot fully trust.


    Now for the blockchain loading issue, the root cause is that when CDiskBlockIndex is loaded from disk there are blocks that have serialization version of 1000100 (the current CLIENT_VERSION) so the part where it was loading the CDiskBlockIndex::nEffectiveProtocol failed because the test was for nVersion >= 60000.

    By renaming the PROTOCOL_V06 to PROTOCOL_V2_0 with a value of 2000000 solved the problem (tried also version 1010000 and it also worked).


    I merged and with the latest 2.0.0 branch and all tests pass except:

    • import_keys_in_wrong_wallet.feature:2 # Scenario:
    • votenotify.feature:27 # Scenario: Votenotify is used as data feed source
  12. Michael Witrant reporter

    The difference is this line, where we say that after V06 we don't serialize the CVote with a value that a peer decided because of abuse concerns.

    I know and it's this particular line (and the one above) I find very confusing. It may be just the fact that on old votes the old nVersion is loaded to the voted version, whereas there was no voted version at that time. It makes me think this voted version is used like the old nVersion at some other places. I'd really like to separate the two concepts.

    But reusing this field is actually very clever. If someone sends a vote with a large version before 2.0 becomes effective it will just be counted as a vote for a higher version by future clients. And old clients will just treat it like a V05 vote.

    So maybe just resetting nVersionVote to zero when nVersionVote < PROTOCOL_V2_0 would solve my confusion.

    Even if we kept nVersion and added a new nProtocolVersionVote, the issue will remain: we are serializing the CVote with a value that we cannot fully trust.

    We could trust it, but only if it's a known version.

    We have 2 ways to handle versions:

    (1) The version is encoded at the beginning so that any client can know how to decode a particular vote. In this case we should reject unknown versions because future code may treat them differently. I didn't do that, but it can be fixed.

    (2) The version is determined by the currently effective protocol. Each vote must be serialized in the version that was effective on the block they're serialized into. It means a vote cannot be encoded or decoded without a known effective protocol.

    Nu worked with (1) since the beginning and this pull request makes us switch to (2).

    (2) is cleaner but I think we could also make (1) work. The switch from (1) to (2) is hazardous but it looks like you managed to make it work. So it looks like a better direction.

    Can we have a single GetEffectiveProtocolForNextBlock(const CBlockIndex* pindexprev) that would do all the protocol calculations? i.e. remove CalculateEffectiveProtocol. AddToBlockIndex would just assign the result of this function to nEffectiveProtocol.

    I think we can remove the need for nTime of the next block in this function. That would solve some remaining TODOs. You could for example consider an effective protocol of 0 for all blocks before 2.0 is effective. I think the only consequence is you'd have to set nVersion in the vote serialization when nVersionVote >= PROTOCOL_V2_0 and nVersion < PROTOCOL_V2_0 (which means we have a vote without a serialized version but we have no effective protocol yet, so we set nVersion to PROTOCOL_V0_5).

    The current code fails to load the current blockchain because it tries to extract a vote on Proof of Work blocks. After this small fix it seems to load the chain successfully.

    I think you can remove the vote validation in GenerateCurrencyCoinBases and CalculateParkRateResults. The votes have been checked already so it's certainly superfluous. You wouldn't need to add the nEffectiveProtocol parameter anymore.

    But even in CVote::IsValid the new nVersion parameter is not used. Do you think it will be necessary later?

  13. John L. Jegutanis

    I know and it's this particular line (and the one above) I find very confusing. It may be just the fact that on old votes the old nVersion is loaded to the voted version, whereas there was no voted version at that time. It makes me think this voted version is used like the old nVersion at some other places. I'd really like to separate the two concepts.

    I did some refactoring, taking your considerations in to account. Now the CVote has only nVersion that is behaving in the same way as before with the difference that when a block has a nVersion >= PROTOCOL_V2_0 it doesn't use that value to serialize the object:

    READWRITE(this->nVersion);
    if (this->nVersion < PROTOCOL_V2_0)
        nVersion = this->nVersion;
    

    The reason that we disable this, is to delegate the nVersion mutation to CVote::ToScript(int nProtocolVersion) and CVote::ExtractVote by using the parameter nProtocolVersion (renamed from nEffectiveProtocol)

    This enables some behaviors:

    • a v2.0 client receives a v0.5 block (before the switch) so the READWRITE will yield a CVote::nVersion == 50000 so it will override the serialization nVersion in a backwards compatible fashion.
    • a v2.0 client receives a v2.0 block (before the switch), then this->nVersion will be 2000000 but the nVersion will remain 50000 (as dictated by nProtocolVersion)
    • after the switch, the serialization nVersion becomes 2000000
    • a future v2.1 client implements a new field that must be activated in protocol 2010000. Before the switch it receives a block from a v2.1 minter. It will READWRITE this->nVersion as 2010000 and nVersion as 2000000, thus not activate the new field prematurely.

    (2) is cleaner but I think we could also make (1) work. The switch from (1) to (2) is hazardous but it looks like you managed to make it work. So it looks like a better direction.

    Agree. We have to carefully review and test everything. Especially a possible chain reorg near the switch period i.e. the protocol updates, then downgrades due to a fork and upgrades on another height.

    Your feedback is most valuable.


    I think we can remove the need for nTime of the next block in this function.

    The nTime is needed to check if the time has arrived for the protocol switch and vote counting.

    The current code fails to load the current blockchain because it tries to extract a vote on Proof of Work blocks. After this small fix it seems to load the chain successfully.

    Fixed that.

    I think you can remove the vote validation in GenerateCurrencyCoinBases and CalculateParkRateResults. The votes have been checked already so it's certainly superfluous. You wouldn't need to add the nEffectiveProtocol parameter anymore.

    Can confirm, the checks are redundant. But when I removed them some tests in test_vote.cpp fail. Think we should revert or just change the tests?

    But even in CVote::IsValid the new nVersion parameter is not used. Do you think it will be necessary later?

    It is needed for the CCustodianVote::IsValid() where it will check the NSR custodian grants.

  14. Michael Witrant reporter

    Now the CVote has only nVersion that is behaving in the same way as before with the difference that when a block has a nVersion >= PROTOCOL_V2_0 it doesn't use that value to serialize the object

    Actually I prefer the way it was done before. The variable name doesn't suggest it's a vote anymore, and it has the same name as in other classes where it is used as the serialized version. I find it more confusing this way. It's better to make the variable match its current role, and apply a clear exception on old votes. I only suggested to reset the version vote on old votes. Something like this:

    // In votes prior to 2.0 the serialization version was encoded in the data.
    // After 2.0 it's determined by the current protocol version (which is passed in the nVersion parameter).
    // And the encoded value is used as the protocol version vote instead.
    READWRITE(nVersionVote);
    if (nVersionVote < PROTOCOL_V2_0)
    {
        nVersion = nVersionVote;
        nVersionVote = 0;
    }
    

    The nTime is needed to check if the time has arrived for the protocol switch and vote counting.

    But it's only required to check whether we're at protocol V05, because we could make protocol V20 switch after the prev block nTime is after the switch time. I think it would make things simpler. You wouldn't have to get the nTime from the CoinStake everytime.

    when I removed them some tests in test_vote.cpp fail. Think we should revert or just change the tests?

    Yes these tests actually checked the validity of the vote, and there's now a dedicated test case for that. It seems to cover these cases.

  15. Log in to comment