Slow initial download in 2.1
Initial download in 2.1 is much slower than in 2.0. My measurements showed that the blocks per second were halved when downloading from a local node. Some users reported much slower rates on the real network. This may be caused by bad nodes though, since the client initially downloads from a single node.
This slow initial download can be bypassed by downloading a bootstrap file. But even keeping up after the bootstrap file has been processed seems to also take forever for some users.
The initial download is a complex process that didn't work very well even before 2.1. It's not Proof of Stake friendly: Peercoin still relies on Proof of Work during the initial download. So it may be worth investigating a general improvement, but it's not an easy task.
First we should probably figure out what exactly is slower between 2.1 and 2.0. It may be the more frequent I/O due to the BlockMap but I'm not sure.
We should also investigate whether the very slow downloads reported on the real network are because of bad nodes. If it's just that then removing peers.dat and connecting to a good node could be a good enough solution. Cleaning up the seed node list may also help.
One of the things that don't work well with Proof of Stake is the fact that when a node is downloading, the other nodes frequently send their best known hash but the receiving node cannot validate it yet. That happens on 2 occasions:
a) because a new block arrived on the chain. That's the normal propagation process, but it may disturb the initial download. It may for example make the node start downloading from another peer, which may sound good but may actually make the node receive many blocks twice. This problem seems to be prevented already, but I saw it happen when I was experimenting with improvements to the initial download.
b) because the sending node sent the last block of the batch (here). I've experimented with sending, instead, a block hash that we know the other node can validate, and it seemed to work well. So maybe we should implement that, but there may be side effects.
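To make the idea in b) a bit more concrete, here is a minimal sketch of one possible way to pick a hash the receiver can validate: advertise the block right after the batch (whose parent the receiver now has) instead of the tip. This is only my reading of the idea, not the code that was actually tested. `HashToAdvertise`, the vector-of-strings chain and the height-based indexing are all hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not taken from the Nu code base.
// chain:          the sender's best chain, one hash per height (simplified)
// lastSentHeight: height of the last block of the batch just sent
// Returns the hash to advertise: the block directly after the batch, because
// the receiver already holds its parent and can therefore validate it.
// If the batch already reached the tip, the tip itself is advertised.
const std::string& HashToAdvertise(const std::vector<std::string>& chain,
                                   std::size_t lastSentHeight) {
    assert(!chain.empty() && lastSentHeight < chain.size());
    const std::size_t tip = chain.size() - 1;
    const std::size_t next = lastSentHeight < tip ? lastSentHeight + 1 : tip;
    return chain[next];
}
```

Whether advertising the next block triggers the follow-up request as reliably as advertising the tip does is exactly the kind of side effect that would need testing.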
The ultimate fix would be to import the changes from https://github.com/bitcoin/bitcoin/pull/4468 (headers-first synchronization). But that's certainly a lot of work because, unlike Bitcoin, we cannot validate these headers without some transactions. Finding a solution to this problem would also enable SPV clients, which would be a good thing.
Comments (5)
-
Jordan Lee repo owner
-
Jordan Lee repo owner
-
assigned issue to
Woodstock Merkle
-
assigned issue to
-
Woodstock Merkle
I've been doing some tests and coming more and more up to speed on all things Nu, Bitcoin, Peershares, mingw64, boost, oh my. Much effort has gone in, but I do know I have not had much output ... I ask for everyone's patience as I proceed on the learning curve.
So I think I'll be splitting this up into at least 2 different issues:
1- the "client misbehaving" impact from receiving block messages during download
2- thrashing caused by BlockMap::cleanup
In regards to #1, "client misbehaving", some findings:
- As peers find blocks, they relay them, causing a +1 to the misbehaving score.
- At an average of 1 block/minute, this would cause the client to ban the nodes it's attempting to download from in a little under 2 hours.
- This would presumably be a ban of all peers connected at that moment, meaning every 2 hours or so, another set of peers would be banned.
- Since Nu does not run on a million nodes, a peer may eventually ban itself away from most of the network.
- In the code, main.cpp / ProcessMessage, around line 4627, fImporting and fReindex are both false, causing the misbehaving score to have an impact when it shouldn't. Any immediate insight into whether we have a flag around "initial download" would be great; I haven't looked but will do so soon.

In regards to #2, thrashing:
- 237 times during the download there was a removal of 1000+ blocks.
- 176 of these times, 10k+ blocks were removed.
- 69 times, 100k+ blocks were removed.
- Presumably these blocks were soon re-loaded.
- I think the BlockMap::cleanup function needs to be less aggressive, or perhaps more tunable.
- I have tried builds where there is no cleanup, and memory usage goes to 1.5 GB. However, it is fast.

The good news: 2.1.1 should not be substantially slower than 2.0.3, though testing data says that it is at least a little bit slower (but perhaps there are ways to optimize it).
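The ban-time estimate in finding #1 can be checked with a small model. This is a sketch under assumptions, not the actual Nu code: `PeerScore`, this `Misbehaving` signature and `MinutesUntilBan` are hypothetical, and the threshold of 100 used in the usage note is the Bitcoin default for banscore, which I'm assuming Nu inherited unchanged. The `inInitialDownload` parameter stands in for whatever "initial download" flag the real code may or may not expose.

```cpp
#include <cassert>

// Hypothetical per-peer misbehavior state (not the real CNode).
struct PeerScore {
    int score = 0;
    bool banned = false;
};

// Adds howMuch to the peer's score, unless we are still in initial download
// or the peer is already banned. Returns true if this call caused the ban.
bool Misbehaving(PeerScore& peer, int howMuch, int banThreshold,
                 bool inInitialDownload) {
    assert(banThreshold > 0);
    if (inInitialDownload || peer.banned) return false;
    peer.score += howMuch;
    if (peer.score >= banThreshold) {
        peer.banned = true;
        return true;
    }
    return false;
}

// Simulates one relayed block per minute, each penalized by +1.
// Returns the minute at which the peer gets banned, or -1 if it is
// never banned within maxMinutes.
int MinutesUntilBan(int banThreshold, bool inInitialDownload, int maxMinutes) {
    PeerScore peer;
    for (int minute = 1; minute <= maxMinutes; ++minute) {
        if (Misbehaving(peer, 1, banThreshold, inInitialDownload)) return minute;
    }
    return -1;
}
```

With a threshold of 100, `MinutesUntilBan(100, false, 600)` gives 100 minutes, which matches "a little under 2 hours". With the initial-download guard on, the same call returns -1, i.e. the peer is never banned for relaying new blocks.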
I managed to get similar performance by:
- using a machine with an SSD
- setting banscore=999999999 to prevent premature disconnect due to the "misbehaving" issue
- using only 1 peer, a 2.0.3 "parent" that was local.
2.0.3 loaded the blockchain in 11.1 hours, using 11.9h of CPU. 2.1.1 loaded the blockchain in 12.86 hours, using 13.75h of CPU.
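To put a number on "a little bit slower", the relative slowdown can be computed from the figures above. `SlowdownPercent` is just a hypothetical helper for this arithmetic.

```cpp
#include <cassert>

// Relative slowdown of `newHours` versus `oldHours`, in percent.
// Example: 10h -> 11h is a 10% slowdown.
double SlowdownPercent(double oldHours, double newHours) {
    assert(oldHours > 0.0);
    return (newHours / oldHours - 1.0) * 100.0;
}
```

Plugging in the measurements, `SlowdownPercent(11.1, 12.86)` is a bit under 16% for wall-clock time and `SlowdownPercent(11.9, 13.75)` is about 15.5% for CPU time. That is from a single run per version, so it should be treated as indicative only.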
I've uploaded an image that shows the plot of x: block #, and y: # of seconds to load 1000 blocks. There is a lot of similarity between the two.
I am testing a load again, with a return; at the start of the BlockMap::cleanup function.
"We can't improve what we don't measure".
I am thinking of posting on Discourse to get some configuration information from any key testers, and see if the banscore workaround helps.
-
Michael Witrant reporter
In regards to #2, thrashing:
- 237 times during the download was there a removal of 1000+ blocks
- 176 of these times, 10k+ blocks were removed
- 69 times, 100k+ blocks were removed.
- Presumably these blocks were soon re-loaded.
- I think the BlockMap::cleanup function needs to be less aggressive, or perhaps more tunable.
The cleanup function does indeed have a lot of room for improvement. However, when 100k+ blocks are removed it means there's a process that scans all the blocks, and that code needs an update. Very few processes should need a full scan, and when they do they should clean up regularly during the process. I hunted for such processes while I implemented the BlockMap, but it looks like I missed some. That's the reason why some people still see occasional memory growth. So there's probably something to fix there too.
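As an illustration of what "less aggressive, more tunable" could look like, here is a minimal sketch of a cache that only evicts once it exceeds a high-water mark, and then only drops the least recently used entries until it is back at a low-water mark. This is not how BlockMap is implemented today. `BlockCache`, `Touch`, the two watermarks and keying by height are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical cache of loaded block entries, keyed by height for simplicity.
class BlockCache {
public:
    BlockCache(std::size_t highWater, std::size_t lowWater)
        : highWater_(highWater), lowWater_(lowWater) {
        assert(lowWater_ <= highWater_);
    }

    // Marks `height` as most recently used, "loading" it if it is absent.
    // Returns true if the entry was already cached.
    bool Touch(int height) {
        auto it = pos_.find(height);
        const bool hit = (it != pos_.end());
        if (hit) {
            // Move the existing node to the front; iterators stay valid.
            lru_.splice(lru_.begin(), lru_, it->second);
        } else {
            lru_.push_front(height);
            pos_[height] = lru_.begin();
        }
        return hit;
    }

    // Does nothing until the cache exceeds the high-water mark, then evicts
    // least recently used entries down to the low-water mark.
    // Returns the number of entries evicted.
    std::size_t Cleanup() {
        if (lru_.size() <= highWater_) return 0;
        std::size_t evicted = 0;
        while (lru_.size() > lowWater_) {
            pos_.erase(lru_.back());
            lru_.pop_back();
            ++evicted;
        }
        return evicted;
    }

    std::size_t Size() const { return lru_.size(); }

private:
    std::size_t highWater_;
    std::size_t lowWater_;
    std::list<int> lru_;  // front = most recently used
    std::unordered_map<int, std::list<int>::iterator> pos_;
};
```

A full-scan process could call `Cleanup()` every N blocks so memory stays bounded during the scan, while normal operation keeps the recently used blocks in memory instead of reloading them. The two watermarks are the tunable part: a wide gap means fewer but larger cleanups.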
-
Michael Witrant reporter
- changed milestone to 2.1.1
SPV clients are a major security risk. The fact that they are practical in proof of work and not proof of stake is a major security advantage for proof of stake. Bitcoin, in particular, has had problems with consecutive orphan blocks due to the use of SPV clients.
I envision a future where we have light clients that do not have a complete copy of the blockchain. However, having minting/mining clients that don't verify transactions (my understanding of the definition of an SPV client), or even facilitating that, isn't a good thing.
However, the approach Mike has suggested in Bitcoin PR 4468 is still worth additional consideration.
While the facts may not accommodate us, let's try to find as simple a solution as possible by understanding what changed between 2.0 and 2.1.