Chut Benchmarks

by Fred T-H

Three kinds of benchmarks need to be run.

  1. The first kind is a brute-force stress test measuring how well a loaded node, with no web server in front, fares when spammed with messages.
  2. The second kind is a more realistic benchmark, still without a server. This benchmark should instead focus on emulating real conversations - each user listening for longer periods of time and sending messages less frequently.
  3. The third kind of benchmark should be a stress test with the server up to see how it impacts Chut, and to help configure it better.

Stress Benchmark

The first benchmark is all about stress tests.

Rather than using a regular benchmark where a sea of clients are connected and messaging anyone at random, this test intends to simulate what could be considered a 'social group': the whole user base is divided into x groups containing y users each. These users message each other for a given period of time.
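The shape of that setup can be sketched like this (a rough illustration of the description above; the module, function and message names are hypothetical, not Chut's actual code, and it uses the `rand` module from newer Erlang releases):

```erlang
%% Hypothetical sketch of the stress setup: G groups of CPG clients
%% each (CPG >= 2). Every client learns the pids of its group mates
%% and messages them for TMs milliseconds. Not Chut's actual API.
-module(bench_setup).
-export([run/3]).

run(G, CPG, TMs) ->
    Groups = [spawn_group(CPG) || _ <- lists:seq(1, G)],
    %% Tell each client who its peers are (everyone in its group but itself).
    [Pid ! {peers, Group -- [Pid]} || Group <- Groups, Pid <- Group],
    timer:sleep(TMs),
    [Pid ! stop || Group <- Groups, Pid <- Group],
    ok.

spawn_group(CPG) ->
    [spawn(fun client/0) || _ <- lists:seq(1, CPG)].

client() ->
    receive {peers, Peers} -> client_loop(Peers) end.

client_loop(Peers) ->
    %% Message a randomly chosen peer in the same group, then loop.
    Peer = lists:nth(rand:uniform(length(Peers)), Peers),
    Peer ! {msg, self(), <<"hi">>},
    receive
        stop -> ok;
        {msg, _From, _Text} -> client_loop(Peers)
    after 0 ->
        client_loop(Peers)
    end.
```

Each client busy-loops, messaging random peers in its own group until told to stop; the real benchmark additionally counts the messages sent and received.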

Results

Ran on:

  • HP dv9625ca
  • Ubuntu 8.04 (hardy) 32 bits
  • 2GB Ram
  • AMD Turion 64 X2 Mobile Technology TL-60 (2GHz)

Legend:

  • G: number of groups.
  • CPG: number of clients per group
  • T: time allowed to message and receive in seconds.
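The averages in the dumps below follow directly from the raw totals; as a sanity check, the arithmetic looks like this (assumed formulas that reproduce the reported numbers, not necessarily Chut's reporting code, shown here for the Sent side; Received works the same way):

```erlang
%% How the per-second / per-group / per-client averages relate to the
%% raw Sent total, given G groups, CPG clients per group and T seconds.
-module(bench_stats).
-export([averages/4]).

averages(Sent, G, CPG, T) ->
    PerClient = Sent / (G * CPG),
    #{per_second => Sent / T,
      per_group => Sent / G,
      per_client => PerClient,
      per_client_per_second => PerClient / T}.
```

For the first run below (G=10, CPG=5, T=10, Sent=528704) this gives 52870.4 per second and per group, 10574.08 per client and 1057.408 per client per second, matching the dump.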

The first few rounds are done using text messages. The text message is "I've been mad for fucking years—absolutely years".

G:10    CPG:5   T:10
Sent: 528704
Received: 506096
Average sent per second: 52870.4
Average received per second: 50609.6
Average sent per group: 52870.4
Average received per group: 50609.6
Average sent per client: 10574.08
Average received per client: 10121.92
Average sent per client per second: 1057.408
Average received per client per second: 1012.192
G:10    CPG:10  T:10
Sent: 457730
Received: 414082
Average sent per second: 45773.0
Average received per second: 41408.2
Average sent per group: 45773.0
Average received per group: 41408.2
Average sent per client: 4577.3
Average received per client: 4140.82
Average sent per client per second: 457.73
Average received per client per second: 414.08
G:100   CPG:10  T:10
Sent: 564092
Received: 420320
Average sent per second: 56409.2
Average received per second: 42032.0
Average sent per group: 5640.92
Average received per group: 4203.2
Average sent per client: 564.092
Average received per client: 420.32
Average sent per client per second: 56.4092
Average received per client per second: 42.032
G:100   CPG:15  T:10
Sent: 611595
Received: 440836
Average sent per second: 61159.5
Average received per second: 44083.6
Average sent per group: 6115.95
Average received per group: 4408.36
Average sent per client: 407.73
Average received per client: 293.8906666666667
Average sent per client per second: 40.773
Average received per client per second: 29.38906666666667

Here the laptop starts to swap with 7500 users:

G:750   CPG:10  T:10
Sent: 990737
Received: 512803
Average sent per second: 99073.7
Average received per second: 51280.3
Average sent per group: 1320.9826666666668
Average received per group: 683.7373333333334
Average sent per client: 132.09826666666666
Average received per client: 68.37373333333333
Average sent per client per second: 13.209826666666666
Average received per client per second: 6.837373333333334

So I'll try it with a binary message instead: term_to_binary({<<"I've been mad for fucking years—absolutely years">>,os:timestamp()}) (the timestamp makes each binary distinct, so every message gets copied instead of all of them referencing the same shared binary).
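The effect of the timestamp can be seen directly: two messages built this way serialize to different binaries, so nothing ends up shared (a small illustration; the module name is made up, and the `/utf8` suffix is only there so the em dash in the quote compiles as a binary literal):

```erlang
%% Each call pairs the text with a fresh os:timestamp(), so
%% consecutive messages serialize to different, unshared binaries.
-module(bin_msg).
-export([make/0]).

make() ->
    term_to_binary({<<"I've been mad for fucking years—absolutely years"/utf8>>,
                    os:timestamp()}).
```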

No swapping this time.

G:750   CPG:10  T:10
Sent: 1184530
Received: 697726
Average sent per second: 118453.0
Average received per second: 69772.6
Average sent per group: 1579.3733333333332
Average received per group: 930.3013333333333
Average sent per client: 157.93733333333333
Average received per client: 93.03013333333334
Average sent per client per second: 15.793733333333332
Average received per client per second: 9.303013333333334

Analysis of the code shows the benchmark really does exercise the code that needs to be tested, apart from the random selection of a peer. There is also a discrepancy between sent and received messages, which is a bit odd because it grows with the number of users. Testing showed it was simply a matter of changing the listener's delay from 0ms to 1ms. Running that test on another computer, the sent count goes down and the received count goes up:

With a 0ms delay (same as the benchmarks above)

G:100   CPG:10  T:10
Sent: 916084
Received: 759593
Average sent per second: 91608.4
Average received per second: 75959.3
Average sent per group: 9160.84
Average received per group: 7595.93
Average sent per client: 916.084
Average received per client: 759.593
Average sent per client per second: 91.60839999999999
Average received per client per second: 75.9593

With 1ms delay:

G:100   CPG:10  T:10
Sent: 849655
Received: 849139
Average sent per second: 84965.5
Average received per second: 84913.9
Average sent per group: 8496.55
Average received per group: 8491.39
Average sent per client: 849.655
Average received per client: 849.139
Average sent per client per second: 84.96549999999999
Average received per client per second: 84.9139

So the sent:received ratio improved from about 1.2:1 to roughly 1:1. No messages were dropped in the previous benchmarks, only delayed, which is obviously a good thing. The benchmark results should be updated accordingly. The small remaining differences probably come from processes shutting down at the end of the benchmark without draining their mailboxes.
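The listener change amounts to something like the following (a hedged sketch, not Chut's actual listener code; the `after` value is the whole point):

```erlang
%% Listener loop counting received messages. With 'after 0' the loop
%% polls without ever blocking; bumping the timeout to 1 ms makes the
%% process yield when its mailbox is empty, which evened out the
%% sent/received rates in the runs above.
-module(listener).
-export([listen/1]).

listen(Count) ->
    receive
        {msg, _From, _Payload} ->
            listen(Count + 1);
        stop ->
            Count
    after 1 ->              %% was 'after 0' in the earlier runs
        listen(Count)
    end.
```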

Otherwise, the benchmark shows the node can take a fairly heavy load. At 7500 users, it can still push several tens of thousands of messages per second. The bottleneck will probably be the web server, which means the core is fast enough, especially since real users are unlikely to spam that much. Most of the memory consumption (not shown here) appears to come from the messages sent and the history each user keeps (10 messages).

Realistic benchmark

The second benchmark is the Realistic benchmark.

The realistic benchmarks operate like the stress benchmarks, but with a 5-second delay between each message sent by any user. The idea is to see whether the average number of messages sent and received per user stays constant as the number of users in the system grows. This should be a decent test of latency and of how many users can be supported in a more realistic scenario.
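A minimal sketch of such a client's send loop (hypothetical names, not Chut's code; the delay is parameterized here, 5000 ms in the benchmark itself):

```erlang
%% Realistic-mode client: one message to a random group peer every
%% DelayMs milliseconds instead of sending as fast as possible.
%% Returns the number of messages it sent when told to stop.
-module(realistic).
-export([loop/3]).

loop(Peers, DelayMs, Sent) ->
    timer:sleep(DelayMs),
    Peer = lists:nth(rand:uniform(length(Peers)), Peers),
    Peer ! {msg, self(), <<"hello">>},
    receive
        stop -> Sent + 1
    after 0 ->
        loop(Peers, DelayMs, Sent + 1)
    end.
```

Sleeping before each send gives roughly 60/5, i.e. 11-12 messages per client over a one-minute run, consistent with the per-client averages below.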

G:10    CPG:5   T:60
Sent: 550
Received: 550
Average sent per group: 55.0
Average received per group: 55.0
Average sent per client: 11.0
Average received per client: 11.0
G:10    CPG:10  T:60
Sent: 1100
Received: 1141
Average sent per group: 110.0
Average received per group: 114.1
Average sent per client: 11.0
Average received per client: 11.41
G:100   CPG:10  T:60
Sent: 11000
Received: 11393
Average sent per group: 110.0
Average received per group: 113.93
Average sent per client: 11.0
Average received per client: 11.393

My guess about why more messages are received than sent is that a sent message generates one 'sent' event, which gets counted as a received message. Because the 'sent' event shows up faster than the 'receive' event (no lookup needed), by the time I send the termination signal, the sent and received counts are always slightly out of sync.

Supporting this, the sent/received ratio varies little with how long the test runs:

G:100   CPG:10  T:10
Sent: 1996
Received: 2107
...

G:100   CPG:10  T:30
Sent: 5182
Received: 5480
...

G:150   CPG:10  T:30
Sent: 7500
Received: 7771
...

The ratios are 0.947, 0.946 and 0.965, respectively. The difference between sent and received doesn't seem to depend on time, groups or clients. The real question is thus whether the discrepancy comes from the delay in messaging every actor to stop its work (if it takes 4-5 seconds to propagate all the stop messages and act on them) or from genuinely high latency. Testing with a real user would be needed to settle that. Now let's carry on with the benchmarks...
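As a quick check, those ratios follow directly from the totals quoted above:

```erlang
%% Sent/received ratios for the three runs quoted above.
Ratios = [Sent / Recv || {Sent, Recv} <- [{1996, 2107}, {5182, 5480}, {7500, 7771}]].
```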

G:100   CPG:15  T:60
Sent: 16516
Received: 17154
Average sent per group: 165.16
Average received per group: 171.54
Average sent per client: 11.010666666666667
Average received per client: 11.436

And for 7500 users:

G:750   CPG:10  T:60
Sent: 84110
Received: 86456
Average sent per group: 112.14666666666666
Average received per group: 115.27466666666666
Average sent per client: 11.214666666666666
Average received per client: 11.527466666666667

Still no change in the stats. Let's push it to 15 000 users. The default process limit needs raising for that: we have 15000*3 + 750 processes (users × supervisors + groups + ...), i.e. 45 750 and then some. Given the default limit is 32 768, we've gotta go higher. Restart the VM with 'erl +P 75000' and then it can be run:

G:750   CPG:20  T:60
Sent: 179621
Received: 183794
Average sent per group: 239.49466666666666
Average received per group: 245.05866666666665
Average sent per client: 11.974733333333333
Average received per client: 12.252933333333333

The sent/received ratios remain similar, and so does the average per client. The core of the chat server can thus theoretically handle over 15 000 users sending each other messages every 5 seconds without much degradation in response time per user. Note that at that point, the shutdown of processes became a bit hard on my laptop and user timeouts started appearing (after the results were in and clients disconnected).

Here are the memory statistics, gathered with erlang-statistics over a 2-hour run with 15 000 users.

Memory usage

Modules loaded

IO

On this one, the huge IO peak is the benchmark code announcing the creation of each user process. It falls down afterwards, once the real messaging begins.

Resources

Garbage Collection

Real-world trials would be needed to further demonstrate the reliability of Chut's core, but so far I'm pretty satisfied with the results.

Server benchmark

Not done yet.
