Chut Benchmarks
Three kinds of benchmarks need to be run.
- The first kind is a brutish stress test to see how well a loaded node, without a web server in front, can fare when spammed with messages.
- The second kind is a more realistic benchmark, still without a server. This benchmark should instead focus on emulating real conversations: each user listens for longer periods of time and sends messages less frequently.
- The third kind of benchmark should be a stress test with the server up, to see how it impacts Chut and to help configure it better.
Stress Benchmark
The first benchmark is all about stress tests.
Rather than using a regular benchmark where a sea of clients are connected and messaging anyone at random, this test intends to simulate what could be considered a 'social group': the whole user base is divided into x groups containing y users each. These users message each other for a given period of time.
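The benchmark code itself isn't reproduced here, but to make the setup concrete, here's a minimal self-contained sketch of the same shape: plain Erlang processes stand in for Chut clients, and every client floods a random peer from its own group until told to stop. The module name and message formats are made up for illustration; this is not the code that produced the numbers below.

```erlang
%% Illustration only: NOT the Chut benchmark code, just a bare-bones sketch
%% of the same shape, using plain process mailboxes.
-module(stress_sketch).
-export([run/3]).

%% Spawn Groups groups of PerGroup clients, let them spam each other for
%% Seconds seconds, then report {TotalSent, TotalReceived}.
run(Groups, PerGroup, Seconds) ->
    Self = self(),
    GroupLists = [[spawn_link(fun() -> client(Self) end)
                   || _ <- lists:seq(1, PerGroup)]
                  || _ <- lists:seq(1, Groups)],
    %% Tell every client who its peers are (everyone in its group but itself).
    [Pid ! {peers, Group -- [Pid]} || Group <- GroupLists, Pid <- Group],
    timer:sleep(Seconds * 1000),
    [Pid ! stop || Group <- GroupLists, Pid <- Group],
    collect(Groups * PerGroup, 0, 0).

client(Parent) ->
    receive {peers, Peers} -> loop(Parent, Peers, 0, 0) end.

%% Send to a random peer, then check the mailbox; 'after 0' means the client
%% never waits, it just keeps flooding.
loop(Parent, Peers, Sent, Recvd) ->
    Peer = lists:nth(rand:uniform(length(Peers)), Peers),
    Peer ! {msg, <<"I've been mad for fucking years">>},
    receive
        stop     -> Parent ! {done, Sent + 1, Recvd};
        {msg, _} -> loop(Parent, Peers, Sent + 1, Recvd + 1)
    after 0 ->
        loop(Parent, Peers, Sent + 1, Recvd)
    end.

collect(0, Sent, Recvd) -> {Sent, Recvd};
collect(N, Sent, Recvd) ->
    receive {done, S, R} -> collect(N - 1, Sent + S, Recvd + R) end.
```

Calling stress_sketch:run(10, 5, 10) would roughly correspond to the first G:10 CPG:5 T:10 run reported below.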
Results
Ran on:
- HP dv9625ca
- Ubuntu 8.04 (Hardy), 32-bit
- 2 GB RAM
- AMD Turion 64 X2 Mobile Technology TL-60 (2GHz)
Legend:
- G: number of groups.
- CPG: number of clients per group
- T: time allowed to message and receive in seconds.
The first few rounds are done using text messages. The text message is "I've been mad for fucking years—absolutely years"
G:10 CPG:5 T:10
Sent: 528704
Received: 506096
Average sent per second: 52870.4
Average received per second: 50609.6
Average sent per group: 52870.4
Average received per group: 50609.6
Average sent per client: 10574.08
Average received per client: 10121.92
Average sent per client per second: 1057.408
Average received per client per second: 1012.192
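For reference, all the derived averages are just the raw totals divided by G, CPG and T; as a quick check against the run above:

```erlang
%% Worked example using the G:10 CPG:5 T:10 totals above:
Sent = 528704, G = 10, CPG = 5, T = 10,
Sent / T,              %% average sent per second            -> 52870.4
Sent / G,              %% average sent per group             -> 52870.4
Sent / (G * CPG),      %% average sent per client            -> 10574.08
Sent / (G * CPG * T).  %% average sent per client per second -> 1057.408
```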
G:10 CPG:10 T:10
Sent: 457730
Received: 414082
Average sent per second: 45773.0
Average received per second: 41408.2
Average sent per group: 45773.0
Average received per group: 41408.2
Average sent per client: 4577.3
Average received per client: 4140.82
Average sent per client per second: 457.73
Average received per client per second: 414.08
G:100 CPG:10 T:10
Sent: 564092
Received: 420320
Average sent per second: 56409.2
Average received per second: 42032.0
Average sent per group: 5640.92
Average received per group: 4203.2
Average sent per client: 564.092
Average received per client: 420.32
Average sent per client per second: 56.4092
Average received per client per second: 42.032
G:100 CPG:15 T:10
Sent: 611595
Received: 440836
Average sent per second: 61159.5
Average received per second: 44083.6
Average sent per group: 6115.95
Average received per group: 4408.36
Average sent per client: 407.73
Average received per client: 293.8906666666667
Average sent per client per second: 40.773
Average received per client per second: 29.38906666666667
Here the laptop starts to swap with 7500 users:
G:750 CPG:10 T:10
Sent: 990737
Received: 512803
Average sent per second: 99073.7
Average received per second: 51280.3
Average sent per group: 1320.9826666666668
Average received per group: 683.7373333333334
Average sent per client: 132.09826666666666
Average received per client: 68.37373333333333
Average sent per client per second: 13.209826666666666
Average received per client per second: 6.837373333333334
So I'll try it with a binary message instead... term_to_binary({<<"I've been mad for fucking years—absolutely years">>,os:timestamp()}) (the timestamp avoids having the same binary referenced by every message; a fresh one is built and copied each time).
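A quick way to convince yourself of that property (illustration only, with a shorter string):

```erlang
%% Because the timestamp differs from one call to the next, no two payloads
%% are the same term, so the same binary never gets referenced by every message.
B1 = term_to_binary({<<"hello">>, os:timestamp()}),
B2 = term_to_binary({<<"hello">>, os:timestamp()}),
B1 =:= B2.   %% -> false
```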
No swapping this time.
G:750 CPG:10 T:10
Sent: 1184530
Received: 697726
Average sent per second: 118453.0
Average received per second: 69772.6
Average sent per group: 1579.3733333333332
Average received per group: 930.3013333333333
Average sent per client: 157.93733333333333
Average received per client: 93.03013333333334
Average sent per client per second: 15.793733333333332
Average received per client per second: 9.303013333333334
Analysis of the code shows that the benchmark focuses on the code that needs to be tested, except for the random selection of a peer. There's also a discrepancy between sent and received messages, which is a bit odd because it grows as the number of users does. Testing showed it was simply a matter of changing the listener's delay from 0ms to 1ms. Running such a test on another computer showed the sent messages going down and the received messages going up:
With a 0ms delay (same as the benchmarks above)
G:100 CPG:10 T:10
Sent: 916084
Received: 759593
Average sent per second: 91608.4
Average received per second: 75959.3
Average sent per group: 9160.84
Average received per group: 7595.93
Average sent per client: 916.084
Average received per client: 759.593
Average sent per client per second: 91.60839999999999
Average received per client per second: 75.9593
With 1ms delay:
G:100 CPG:10 T:10
Sent: 849655
Received: 849139
Average sent per second: 84965.5
Average received per second: 84913.9
Average sent per group: 8496.55
Average received per group: 8491.39
Average sent per client: 849.655
Average received per client: 849.139
Average sent per client per second: 84.96549999999999
Average received per client per second: 84.9139
So the send:receive ratios improved from 1.2:1 to roughly 1:1. No messages were dropped in the previous benchmarks, only delayed. This is a good thing, obviously. The benchmark results should be updated accordingly. The small differences remaining probably come from processes shutting down at the end of the benchmark without going through their mailbox.
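The listener code itself isn't shown here, but in terms of a loop like the sketch earlier, the whole change boils down to the receive timeout. A hypothetical version with a configurable Delay:

```erlang
%% Hypothetical: Delay is the listener's delay discussed above (0 or 1 ms).
%% With 0, a client whose mailbox is empty fires its next send immediately;
%% with 1, it pauses briefly, which throttles the flood enough for receives
%% to catch up with sends.
loop(Parent, Peers, Delay, Sent, Recvd) ->
    Peer = lists:nth(rand:uniform(length(Peers)), Peers),
    Peer ! {msg, <<"hello">>},
    receive
        stop     -> Parent ! {done, Sent + 1, Recvd};
        {msg, _} -> loop(Parent, Peers, Delay, Sent + 1, Recvd + 1)
    after Delay ->
        loop(Parent, Peers, Delay, Sent + 1, Recvd)
    end.
```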
Otherwise, the benchmark shows a fairly good load is possible. At 7500 users, it's still possible to send tens of thousands of messages per second. The bottleneck will probably be the web server, which means the code is fast enough, especially given real users aren't likely to spam that much. Most of the memory consumption (not shown here) appears to come from the messages sent and the history kept by each user (10 messages).
Realistic benchmark
The second benchmark is the Realistic benchmark.
The realistic benchmarks operate in a manner similar to the stress benchmarks, but with a 5-second delay between each message sent by a given user. The idea is to see whether the number of messages sent/received stays constant on average as the number of users in the system grows. This should be a decent test of latency and of how many users can be supported in a more realistic scenario.
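In terms of the earlier sketch, the pacing change could look like the hypothetical loop below (not the real client code, and shutdown handling is omitted): one send, then roughly five seconds spent only listening, which is why each client ends up around 11-12 messages over a 60-second run.

```erlang
%% Hypothetical pacing loop: send one message, wait 5 seconds, count whatever
%% arrived in the meantime, repeat.
realistic(Peer, Sent, Recvd) ->
    Peer ! {msg, <<"hello">>},
    timer:sleep(5000),
    realistic(Peer, Sent + 1, drain(Recvd)).

%% Count every message that showed up during the pause, without blocking.
drain(Recvd) ->
    receive {msg, _} -> drain(Recvd + 1)
    after 0 -> Recvd
    end.
```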
G:10 CPG:5 T:60
Sent: 550
Received: 550
Average sent per group: 55.0
Average received per group: 55.0
Average sent per client: 11.0
Average received per client: 11.0
G:10 CPG:10 T:60
Sent: 1100
Received: 1141
Average sent per group: 110.0
Average received per group: 114.1
Average sent per client: 11.0
Average received per client: 11.41
G:100 CPG:10 T:60
Sent: 11000
Received: 11393
Average sent per group: 110.0
Average received per group: 113.93
Average sent per client: 11.0
Average received per client: 11.393
My guess about why I get more messages received than sent here is that a sent message generates one 'sent' event, which is counted as a received message. Because the 'sent' event shows up faster than the 'receive' event (no lookup needed), the counts of what was sent and what was received are always slightly out of sync by the time I send the termination signal.
The proof of this rests in how little the sent/received ratio varies depending on how long the test runs:
G:100 CPG:10 T:10
Sent: 1996
Received: 2107
...
G:100 CPG:10 T:30
Sent: 5182
Received: 5480
...
G:150 CPG:10 T:30
Sent: 7500
Received: 7771
...
The ratios are 0.947, 0.946 and 0.965, respectively. The difference between what's sent and received doesn't seem to depend on time, groups or clients. The real question is thus whether the discrepancies are due to the delays in messaging every actor to stop its work (if it takes 4-5 seconds to propagate all the messages and act on them) or whether there really is high latency in getting there. Testing with a real user would be needed for that. Now let's carry on with the benchmarks...
G:100 CPG:15 T:60
Sent: 16516
Received: 17154
Average sent per group: 165.16
Average received per group: 171.54
Average sent per client: 11.010666666666667
Average received per client: 11.436
And for 7500 users:
G:750 CPG:10 T:60
Sent: 84110
Received: 86456
Average sent per group: 112.14666666666666
Average received per group: 115.27466666666666
Average sent per client: 11.214666666666666
Average received per client: 11.527466666666667
Still no change in the stats. Let's push it to 15 000 users. For that, you need to raise the default process limit: we have 15000*3 + 750 processes (users * supervisors + groups + ...) = 45 750 + ... Given the default limit is 32 768, we've got to go higher. Restart the VM with 'erl +P 75000' and then it can be run (a quick sanity check for the limit is sketched after the results below):
G:750 CPG:20 T:60
Sent: 179621
Received: 183794
Average sent per group: 239.49466666666666
Average received per group: 245.05866666666665
Average sent per client: 11.974733333333333
Average received per client: 12.252933333333333
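For a run of this size, the headroom can be sanity-checked from the shell with the standard system_info/1 queries, plugging in the process count worked out above:

```erlang
%% Needed processes, as computed above: 15000 users * 3 processes each, plus
%% 750 groups (supervisors and the rest add a bit more on top).
Needed = 15000 * 3 + 750,
Free = erlang:system_info(process_limit) - erlang:system_info(process_count),
Free > Needed.   %% should be true after restarting with 'erl +P 75000'
```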
The sent/received ratios remain similar, and so does the average per client. The core of the chat server can thus theoretically handle over 15 000 users sending each other messages every 5 seconds without too much degradation in response time per user. Note that at that point, the shutdown of all the processes was a bit hard on my laptop and user timeouts started appearing (after the results were in and the clients disconnected).
Here are the memory statistics, done with erlang-statistics on a two-hour run with 15 000 users.
On this one, the huge IO peak is the benchmark code announcing the creation of each user process. It drops back down afterwards, once the real messaging begins.
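As a side note, a rough point-in-time view of the same kind of numbers can be had without any external tool, through the standard erlang:memory/0 BIF:

```erlang
%% Returns a proplist of {Category, Bytes}: total, processes, binary, ets,
%% atom, code, system, ... Handy for spot checks between graphing runs.
erlang:memory().
```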
Real-world trials would be needed to further demonstrate the reliability of Chut's core, but so far I'm pretty satisfied with the results.
Server benchmark
Not done yet.