Date

TL;DR:

A minimal webserver achieved less than 5k concurrent requests / second, much below expectations. The reason: the logging configuration was invisibly consuming cycles.

Context

Last night I stumbled upon this post which outlined a performance issue related to JVM webservers. The post you're reading contributes nothing, it just serves to document my thoughts.

Reproducing

I cloned the repo and reproduced the results.

  1. I found running the command twice gave different results. So I let the first run warmup the server, then on the next run I took the measurement.
  2. The initial run of -t 12 -c 200 showed no difference in requests / second before/after the fix. That happened only after I bumped the concurrency up to 2000.
  3. The configuration with higher throughput had lower p50 latency,
    higher p75 latency, and a lower p99 latency. I'd like to better understand this relationship.

Before the fix:

 wrk --latency -t 12 -c 2000 -d30s http://localhost:3003
Running 30s test @ http://localhost:3003
  12 threads and 2000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    94.94ms  196.02ms   1.21s    92.88%
    Req/Sec   748.27    455.32     2.02k    64.31%
  Latency Distribution
     50%   42.66ms
     75%   47.92ms
     90%   64.82ms
     99%    1.06s
  170610 requests in 30.09s, 26.52MB read
  Socket errors: connect 0, read 300559, write 0, timeout 0
Requests/sec:   5669.51
Transfer/sec:      0.88MB

After the fix

 wrk --latency -t 12 -c 2000 -d30s http://localhost:3003
Running 30s test @ http://localhost:3003
  12 threads and 2000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    99.53ms  124.28ms   1.05s    79.77%
    Req/Sec     2.95k     1.08k    9.13k    75.04%
  Latency Distribution
     50%   31.52ms
     75%  190.35ms
     90%  292.67ms
     99%  409.98ms
  992290 requests in 30.09s, 154.25MB read
  Socket errors: connect 0, read 66686, write 252, timeout 0
Requests/sec:  32972.35
Transfer/sec:      5.13MB

Future Questions/Reading

  • Are there any other java logging pitfalls?
  • I should gain a better understanding of the logback configuration.
  • Why is JMX referenced in that post? [This article] baeldung-flight-recorder implies it's closely related to flight recorder.
  • How is wrk implemented?
  • Tested limits are per-codebase and per-machine. In order to reproduce the issue I had to search for the issue. How should I keep this in mind when using load-testing tools? Is there an identifiable sweet spot?
  • How useful is local profiling? My local machine will never exactly reproduce prod.
  • Learn how to use wrk Lua scripting to implement fancy load-testing
  • Can I enable profiling/clustered on production servers?
  • How can I get an intuition for server speed?