Handle future vote versions properly
This problem was identified by @John L. Jegutanis here: https://bitbucket.org/JordanLeePeershares/nubit/pull-request/209/added-a-framework-for-protocol-version/diff#comment-4851998
If someone sends a vote with a version number above the current protocol, all the clients will accept it and parse it like any other vote in the current protocol version. But when we later introduce a change the new client could think this old block already support the changes we just introduced.
So someone downloading the blockchain from the beginning with the latest client would probably fail to parse this special block.
We can wait for such an event to occur before we fix that, because it would only affect future releases. But if this happens while we're waiting for a protocol change (i.e. we released a protocol change and we are waiting for the new protocol to be effective), we would have to release a urgent and mandatory fix.
I proposed a solution here: https://bitbucket.org/JordanLeePeershares/nubit/pull-request/209/added-a-framework-for-protocol-version/diff#comment-4854770:
The cleanest solution I've found so far is to provide a maximum allowed version to the vote unserialization. If the serialized version is above this maximum, the vote is unserialized like if it was serialized with the maximum version.
The maximum version could be set in
pindexNew->vote.nVersionbefore the vote is extracted here. The maximum version would be 50000 before protocol V06 is effective, and 60000 after that.In the vote unserialization code we would not change
this->nVersionif the serialized version is greater. If usingthis->nVersionfor the maximum version leads to ugly code it may be better to add aCVote::nMaximumVersionor something. It's easier but it takes some memory. It may also be possible to use the serializationnVersionpassed here. I think it defines the initial value of the local variablenVersion(as opposed tothis->nVersion). It may be cleaner because we could just add anMaximumVersionparameter toExtractVotebut I'm not certain there would not be side effects (the serialization code is quite complex).
Comments (22)
-
Michael Witrant reporter
-
Michael Witrant reporter
We can wait for such an event to occur before we fix that
Actually we can't wait. Testnet already has votes with version 60000 (the next protocol version).
-
Michael Witrant reporter
-
assigned issue to
John L. Jegutanis
-
assigned issue to
-
John L. Jegutanis
Only had time to look at it today.
The solution that we initially discussed works only when are accepting new blocks and not when there are already serialized blocks like the version 60000 blocks that you discover on testnet (I am to blame about them).
The problem is that in the CVote's IMPLEMENT_SERIALIZE we cannot trust the nVersion value as it can have arbitrary value and we cannot make conditional READWRITEs.
One solution could be to ignore the CVote:nVersion from V06 and instead rely on the CBlockIndex::nTime and CBlockIndex::nEffectiveProtocol in the next versions. This means adding a CBlockIndex* field to CVote class...
edit: some rephrasing
-
John L. Jegutanis
The problem however is that CVote is used before CBlockIndex is created like here so it's not a clean solution either...
-
Michael Witrant reporter
I suggest you start by adding a
nMaximumAllowedVersionor something toCVoteand fill it before votes are (un)serialized. Then we'll see whether it's feasible (and useful) to usenVersioninstead.For already serialized votes you can set the maximum version before this line. But unfortunately we don't have
nTimeyet. So I suggest we serializenEffectiveProtocolfrom pull request #209 before all that (for example just afternVersion) and use it. It would be initialized to 0 instead of being serialized if nVersion is < 60000. When the effective protocol is 0 then the maximum vote version would be set to 50000. -
John L. Jegutanis
Created a commit to the #209 pull request that deals with this issue.
Instead of a maximum version it uses the nEffectiveProtocol for the conditional serialization. I did some tests and it works, except for the time that that it switches to the V06 protocol as I haven't setup the cucumber testing environment.
I still would like to do some more testing as the part where the CVote::nProtocolVersionVote starts to be serialized is critical.
Also the part that I don't like is that we have to be careful and set the CVote::nEffectiveProtocol before (un)serializing and this creates a potential for bugs. We could add this parameter to the constructors but I am not sure it is the best way.
-
Michael Witrant reporter
- changed milestone to 2.0.0
-
Michael Witrant reporter
The current solution in pull request #209 is not very satisfactory.
This whole version problem is a mess, and may also affect other serialized structures.
What about just assuming future versions are invalid?
We would start 2.0 at protocol 60001 or something and just solve this problem just by rejecting any vote whose version is
> PROTOCOL_VERSION.When a block is received we would additionally make sure the version of a vote is not above the effective protocol version.
What do you think @John L. Jegutanis?
-
John L. Jegutanis
The rejection of a CVote::nVersion that the client does not support is good but solves part of the problem.
The current situation is that we have 3 values for the version:
- nVersion is effectively deprecated but still used in serialization
- nEffectiveProtocol is never serialized but computed on runtime and has to be set before the CVote usage
- nProtocolVersionVote is the vote for the upcoming version
We could use the nVersion to store the effective protocol and remove the nEffectiveProtocol completely. Like this a client not only rejects versions that it does not support but also versions that are still not voted for or version downgrades. The nVersion value will reflect the global consensus and not just the individual version of a client.
We need to be careful to not serialize nVersion with PROTOCOL_VERSION as it is right now but always using and checking the calculated effective protocol version. The most critical places for this are in CWallet::CreateCoinStake() and CBlock::AcceptBlock(). Any other places?
By using the CVote::nVersion as the effective protocol version will only simplify the current mess. Another side effect is that nVersion is used in CCustodianVote serialization, this simplifies the conditional check in the CCustodianVote::IsValid() to enable the NSR grants.
What do you think @Michael Witrant and most importantly what are the downsides?
-
Michael Witrant reporter
The rejection of a CVote::nVersion that the client does not support is good but solves part of the problem.
Yes, the other part of the problem is we must reject versions that are not yet effective.
The good thing with this solution is we can easily do that after unserialization (compared to having to inform the vote structure what's the current protocol prior to unserialization).
We could use the nVersion to store the effective protocol and remove the nEffectiveProtocol completely.
Using
nVersionas the effective version would probably work but I'm afraid it would be a mess to implement. You'd still have to use this kind of code: https://bitbucket.org/JordanLeePeershares/nubit/pull-request/209/added-a-framework-for-protocol-version/diff#Lsrc/main.hT1596If you want to go this way, then it may be better to completely remove
nVersionfrom theCVotestructure, and provide the effective version to the unserialization code.That would mean passing the effective version instead of
PROTOCOL_VERSIONinExtractVote. And in the block index serialization we would have to use a customREADWRITEmacro to which we can pass an explicitnVersion. There are certainly other changes, and maybe some difficult ones, especially to handle old versions (we must be able to calculate the version of each vote serialized in the blockchain).That may be how I should have done it from the beginning, but I'm not sure it's the best solution now.
-
John L. Jegutanis
Sorry for the late response, I wanted to think about it for a bit and make some tests.
You were right about the READWRITE it complicates things a lot.
Additionally I discovered a place that creates a problem: CBlock::CheckBlock(). This checks the block validity in a contextless way, meaning we cannot count the votes and calculate the protocol version for disconnected or orphan blocks. When processing new blocks, it will catch any errors in CBlock::AcceptBlock() but this is not good enough IMO.
In the CVote's IMPLEMENT_SERIALIZE block I noticed this:
ReadWriteSingleMotion(s, nType, nVersion, ser_action);
Why it does not increase the nSerSize like READWRITE? For example like:
nSerSize += ReadWriteSingleMotion(s, nType, nVersion, ser_action);
-
Michael Witrant reporter
Additionally I discovered a place that creates a problem: CBlock::CheckBlock(). This checks the block validity in a contextless way, meaning we cannot count the votes and calculate the protocol version for disconnected or orphan blocks. When processing new blocks, it will catch any errors in CBlock::AcceptBlock() but this is not good enough IMO.
It's not ideal but there are things we are forced to check later when we know the previous block. If we can't do that we will have to reject some motions.
The proof of stake check still works on disconnected blocks, so I think it's not a big problem to accept potentially invalid blocks because it cannot be abused (you need a valid and unused kernel to propagate a block, so it's something rare).
The vote is already checked twice (in
CheckBlockandAddToBlockIndex). It's probably redundant. We could have 2 levels of check: one for not-yet-connected blocks inCheckBlock, and one for connected blocks inAddToBlockIndex.Why it does not increase the nSerSize like READWRITE?
It looks like I forgot to do that. I'm surprised it didn't have consequences.
nSerSizeis probably used only to calculate the serialized size and we never do that on votes. It'll need a fix though. This part could probably be rewritten at the same time, with a const cast instead of all the methods. -
John L. Jegutanis
I have updated the pull request 209 to simplify a bit the serialization of the protocol voting.
It changes the CVote::nVersion to nProtocolVersionVote so the new clients can vote immediately and the V06 switch will happen after a vote count.
Also when this value is over 60000 it does not change the nVersion of the IMPLEMENT_SERIALIZE so in effect the CVote is serialized on disk with the client's version (instead of the version that is on a block's CVote). The network (or better the blockchain) serialization of the CVote is always CBlockIndex::nEffectiveProtocol.
After the switch the client will reject any CVote::nProtocolVersionVote that is lower than the effective protocol, this means that any V05 blocks after the fork will be rejected (by the minting majority).
One part that I find my understanding lacking is why an 0.5.2 client does not accept a longer chain:
This was part of a test in
protocol_votes.feature. All nodes switched to "V06" block, even the Old 0.5.2 node accepted itWhen node "Old" finds a block "V06_rejected" And node "A" finds a block "V06_accepted" And all nodes reach the same height Then nodes "A", "B" should be at block "V06_accepted"
Now the A and B are on the
V06_acceptedblock while rejecting Old'sV06_rejectedblockWhen node "B" finds a block "Newest" Then all nodes should be at block "Newest"
Here it fails as the 0.5.2 node does not accept the "Newest" block because of this error:
ProcessBlock() : block with too little proof-of-stakeAny idea?
Another issue that I encountered when testing is the time it takes to generate a stake and simulate a vote. Generating 1800 PoS blocks takes around 30min, so I set the required/total votes to 9/10 for testnet but it should be changed to another value.
-
Michael Witrant reporter
Questions specific to a solution should probably be discussed on the pull request itself.
I find replacing
nVersionbynProtocolVersionVotevery confusing. Before V06 it contains the serialized version and after V06 it contains the protocol vote.Votes are serialized in multiple situations and I'm not sure this solution will work every time. It's not obvious by reading the code anyway.
The current code fails to validate our current blockchain (it may be only because of this but there may be more).
In the vote serialization I'd rather isolate the old
nVersionprocessing and the newnProtocolVersionVoteprocessing.It looks like the removal of
nVersionwas successful though. It's cleaner this way but it led to complications elsewhere so I'm not convinced it's positive at the end.If it was not removed you could ignore old protocols and wouldn't need these block times everywhere. You could probably have just one
CalculateProtocolForNextBlock(const CBlockIndex* pindexprev)instead of these few related methods (there's room for improvement even ifnVersionis removed though).The fact it already enables voting for protocol V06 is a nice addition though.
ProcessBlock() : block with too little proof-of-stakeAny idea?
No, I'll try to look at that.
Generating 1800 PoS blocks takes around 30min, so I set the required/total votes to 9/10 for testnet but it should be changed to another value.
You can use
#ifdef TESTINGto set lower values only active during cucumber testing. -
John L. Jegutanis
The CVote::nVersion is basically renamed to nProtocolVersionVote so we can revert it, in any case the value is always PROTOCOL_VERSION when the client creates a block.
The difference is this line, where we say that after V06 we don't serialize the CVote with a value that a peer decided because of abuse concerns.
Even if we kept nVersion and added a new nProtocolVersionVote, the issue will remain: we are serializing the CVote with a value that we cannot fully trust.
Now for the blockchain loading issue, the root cause is that when CDiskBlockIndex is loaded from disk there are blocks that have serialization version of
1000100(the currentCLIENT_VERSION) so the part where it was loading theCDiskBlockIndex::nEffectiveProtocolfailed because the test was fornVersion >= 60000.By renaming the
PROTOCOL_V06toPROTOCOL_V2_0with a value of2000000solved the problem (tried also version1010000and it also worked).
I merged and with the latest 2.0.0 branch and all tests pass except:
- import_keys_in_wrong_wallet.feature:2 # Scenario:
- votenotify.feature:27 # Scenario: Votenotify is used as data feed source
-
Michael Witrant reporter
The difference is this line, where we say that after V06 we don't serialize the CVote with a value that a peer decided because of abuse concerns.
I know and it's this particular line (and the one above) I find very confusing. It may be just the fact that on old votes the old
nVersionis loaded to the voted version, whereas there was no voted version at that time. It makes me think this voted version is used like the oldnVersionat some other places. I'd really like to separate the two concepts.But reusing this field is actually very clever. If someone sends a vote with a large version before 2.0 becomes effective it will just be counted as a vote for a higher version by future clients. And old clients will just treat it like a V05 vote.
So maybe just resetting
nVersionVoteto zero whennVersionVote < PROTOCOL_V2_0would solve my confusion.Even if we kept nVersion and added a new nProtocolVersionVote, the issue will remain: we are serializing the CVote with a value that we cannot fully trust.
We could trust it, but only if it's a known version.
We have 2 ways to handle versions:
(1) The version is encoded at the beginning so that any client can know how to decode a particular vote. In this case we should reject unknown versions because future code may treat them differently. I didn't do that, but it can be fixed.
(2) The version is determined by the currently effective protocol. Each vote must be serialized in the version that was effective on the block they're serialized into. It means a vote cannot be encoded or decoded without a known effective protocol.
Nu worked with (1) since the beginning and this pull request makes us switch to (2).
(2) is cleaner but I think we could also make (1) work. The switch from (1) to (2) is hazardous but it looks like you managed to make it work. So it looks like a better direction.
Can we have a single
GetEffectiveProtocolForNextBlock(const CBlockIndex* pindexprev)that would do all the protocol calculations? i.e. removeCalculateEffectiveProtocol.AddToBlockIndexwould just assign the result of this function tonEffectiveProtocol.I think we can remove the need for
nTimeof the next block in this function. That would solve some remaining TODOs. You could for example consider an effective protocol of 0 for all blocks before 2.0 is effective. I think the only consequence is you'd have to setnVersionin the vote serialization whennVersionVote >= PROTOCOL_V2_0andnVersion < PROTOCOL_V2_0(which means we have a vote without a serialized version but we have no effective protocol yet, so we setnVersiontoPROTOCOL_V0_5).The current code fails to load the current blockchain because it tries to extract a vote on Proof of Work blocks. After this small fix it seems to load the chain successfully.
I think you can remove the vote validation in
GenerateCurrencyCoinBasesandCalculateParkRateResults. The votes have been checked already so it's certainly superfluous. You wouldn't need to add thenEffectiveProtocolparameter anymore.But even in
CVote::IsValidthe newnVersionparameter is not used. Do you think it will be necessary later? -
John L. Jegutanis
I know and it's this particular line (and the one above) I find very confusing. It may be just the fact that on old votes the old nVersion is loaded to the voted version, whereas there was no voted version at that time. It makes me think this voted version is used like the old nVersion at some other places. I'd really like to separate the two concepts.
I did some refactoring, taking your considerations in to account. Now the CVote has only nVersion that is behaving in the same way as before with the difference that when a block has a nVersion >= PROTOCOL_V2_0 it doesn't use that value to serialize the object:
READWRITE(this->nVersion); if (this->nVersion < PROTOCOL_V2_0) nVersion = this->nVersion;The reason that we disable this, is to delegate the nVersion mutation to
CVote::ToScript(int nProtocolVersion)andCVote::ExtractVoteby using the parameter nProtocolVersion (renamed from nEffectiveProtocol)This enables some behaviors:
- a v2.0 client receives a v0.5 block (before the switch) so the READWRITE will yield a CVote::nVersion == 50000 so it will override the serialization nVersion in a backwards compatible fashion.
- a v2.0 client receives a v2.0 block (before the switch), then this->nVersion will be
2000000but the nVersion will remain 50000 (as dictated by nProtocolVersion) - after the switch, the serialization nVersion becomes
2000000 - a future v2.1 client implements a new field that must be activated in protocol
2010000. Before the switch it receives a block from a v2.1 minter. It will READWRITE this->nVersion as2010000and nVersion as2000000, thus not activate the new field prematurely.
(2) is cleaner but I think we could also make (1) work. The switch from (1) to (2) is hazardous but it looks like you managed to make it work. So it looks like a better direction.
Agree. We have to carefully review and test everything. Especially a possible chain reorg near the switch period i.e. the protocol updates, then downgrades due to a fork and upgrades on another height.
Your feedback is most valuable.
I think we can remove the need for nTime of the next block in this function.
The nTime is needed to check if the time has arrived for the protocol switch and vote counting.
The current code fails to load the current blockchain because it tries to extract a vote on Proof of Work blocks. After this small fix it seems to load the chain successfully.
Fixed that.
I think you can remove the vote validation in GenerateCurrencyCoinBases and CalculateParkRateResults. The votes have been checked already so it's certainly superfluous. You wouldn't need to add the nEffectiveProtocol parameter anymore.
Can confirm, the checks are redundant. But when I removed them some tests in test_vote.cpp fail. Think we should revert or just change the tests?
But even in CVote::IsValid the new nVersion parameter is not used. Do you think it will be necessary later?
It is needed for the CCustodianVote::IsValid() where it will check the NSR custodian grants.
-
Michael Witrant reporter
Now the CVote has only nVersion that is behaving in the same way as before with the difference that when a block has a nVersion >= PROTOCOL_V2_0 it doesn't use that value to serialize the object
Actually I prefer the way it was done before. The variable name doesn't suggest it's a vote anymore, and it has the same name as in other classes where it is used as the serialized version. I find it more confusing this way. It's better to make the variable match its current role, and apply a clear exception on old votes. I only suggested to reset the version vote on old votes. Something like this:
// In votes prior to 2.0 the serialization version was encoded in the data. // After 2.0 it's determined by the current protocol version (which is passed in the nVersion parameter). // And the encoded value is used as the protocol version vote instead. READWRITE(nVersionVote); if (nVersionVote < PROTOCOL_V2_0) { nVersion = nVersionVote; nVersionVote = 0; }The nTime is needed to check if the time has arrived for the protocol switch and vote counting.
But it's only required to check whether we're at protocol V05, because we could make protocol V20 switch after the prev block
nTimeis after the switch time. I think it would make things simpler. You wouldn't have to get thenTimefrom the CoinStake everytime.when I removed them some tests in test_vote.cpp fail. Think we should revert or just change the tests?
Yes these tests actually checked the validity of the vote, and there's now a dedicated test case for that. It seems to cover these cases.
-
John L. Jegutanis
Pushed the version with the latest refactorings and fixes
-
Michael Witrant reporter
I think it's good like that. It just needs some polishing.
-
John L. Jegutanis
- changed status to resolved
Merged pull request #209
- Log in to comment
Other serialized structures may be affected by this problem too.