A minimal webserver handled fewer than 5k requests/second under concurrent load, well below expectations. The reason: the logging configuration was invisibly consuming cycles.
Last night I stumbled upon this post, which outlined a performance issue affecting JVM webservers. The post you're reading contributes nothing new; it just serves to document my thoughts.
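For context, the kind of logback change involved might look like wrapping a blocking appender in an `AsyncAppender`, so request threads enqueue log events instead of writing synchronously. This is a hypothetical sketch of that pattern, not necessarily the exact fix from the original post:

```xml
<!-- logback.xml: illustrative before/after sketch, not the original post's exact fix -->
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Wrap the blocking appender: request threads hand events to a
       background queue instead of paying for the write themselves -->
  <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="CONSOLE"/>
    <queueSize>8192</queueSize>
    <neverBlock>true</neverBlock>
  </appender>

  <root level="INFO">
    <appender-ref ref="ASYNC"/>
  </root>
</configuration>
```

With `neverBlock` set, events are dropped rather than stalling request threads when the queue fills, which trades log completeness for latency.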
I cloned the repo and reproduced the results.
- Running the command twice gave different results, so I let the first run warm up the server and took the measurement on the second run.
- The initial run with `-t 12 -c 200` showed no difference in requests/second before and after the fix. The difference appeared only after I bumped the concurrency up to 2000.
- The configuration with higher throughput had a lower p50 latency, a higher p75 latency, and a lower p99 latency. I'd like to better understand this relationship.
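One way that pattern can arise is from a mixture of two latency populations: if the slow fix-removed tail disappears but higher sustained concurrency adds queueing delay in the middle of the distribution, p50 and p99 can both drop while p75 rises. A toy sketch with made-up numbers (not the measurements below):

```java
import java.util.Arrays;

// Illustrative only: synthetic latencies, not data from the benchmark runs.
public class PercentileDemo {
    // Nearest-rank percentile: smallest value with at least p% of samples at or below it.
    static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // "Before": most requests fast, but a small tail stuck behind
        // blocking log writes for over a second (flat p75, terrible p99).
        double[] before = new double[1000];
        for (int i = 0; i < 1000; i++) before[i] = (i < 980) ? 45.0 : 1100.0;

        // "After": queueing under higher concurrency spreads latency out in
        // the middle of the distribution, but the extreme tail is gone.
        double[] after = new double[1000];
        for (int i = 0; i < 1000; i++) after[i] = (i < 500) ? 30.0 : 30.0 + (i - 500) * 0.75;

        Arrays.sort(before);
        Arrays.sort(after);
        for (double p : new double[] {50, 75, 99}) {
            System.out.printf("p%.0f before=%.1fms after=%.1fms%n",
                    p, percentile(before, p), percentile(after, p));
        }
        // The "after" mixture has lower p50, higher p75, lower p99.
    }
}
```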
Before the fix:

```
❯ wrk --latency -t 12 -c 2000 -d30s http://localhost:3003
Running 30s test @ http://localhost:3003
  12 threads and 2000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    94.94ms  196.02ms   1.21s    92.88%
    Req/Sec    748.27    455.32     2.02k    64.31%
  Latency Distribution
     50%   42.66ms
     75%   47.92ms
     90%   64.82ms
     99%    1.06s
  170610 requests in 30.09s, 26.52MB read
  Socket errors: connect 0, read 300559, write 0, timeout 0
Requests/sec:   5669.51
Transfer/sec:      0.88MB
```
After the fix:

```
❯ wrk --latency -t 12 -c 2000 -d30s http://localhost:3003
Running 30s test @ http://localhost:3003
  12 threads and 2000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    99.53ms  124.28ms   1.05s    79.77%
    Req/Sec      2.95k     1.08k    9.13k    75.04%
  Latency Distribution
     50%   31.52ms
     75%  190.35ms
     90%  292.67ms
     99%  409.98ms
  992290 requests in 30.09s, 154.25MB read
  Socket errors: connect 0, read 66686, write 252, timeout 0
Requests/sec:  32972.35
Transfer/sec:      5.13MB
```
- Are there any other Java logging pitfalls?
- I should gain a better understanding of the logback configuration.
- Why is JMX referenced in that post? [This article][baeldung-flight-recorder] implies it's closely related to Flight Recorder.
- How is
- Tested limits are per-codebase and per-machine; to reproduce this issue I had to search for the concurrency level that triggered it. How should I keep this in mind when using load-testing tools? Is there an identifiable sweet spot?
- How useful is local profiling? My local machine will never exactly reproduce prod.
- Learn how to use `wrk`'s Lua scripting to implement fancy load-testing.
- Can I enable profiling on clustered production servers?
- How can I get an intuition for server speed?
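On the first question: one well-known pitfall (independent of whatever this post's specific fix was) is building log messages eagerly, so the cost is paid even when the log level is disabled. A self-contained sketch using a stand-in logger rather than real slf4j/logback:

```java
import java.util.function.Supplier;

// A stand-in logger to illustrate the pitfall without pulling in slf4j/logback.
public class LoggingPitfall {
    static boolean debugEnabled = false;   // typical production setting
    static int expensiveCalls = 0;

    static String expensive() {            // stands in for e.g. serializing a large object
        expensiveCalls++;
        return "huge-payload";
    }

    static void debugEager(String message) {
        if (debugEnabled) System.out.println(message);
    }

    static void debugLazy(Supplier<String> message) {
        if (debugEnabled) System.out.println(message.get());
    }

    public static void main(String[] args) {
        // Pitfall: the argument is built before the level check runs,
        // so expensive() executes even though nothing is logged.
        debugEager("state: " + expensive());

        // Fix: defer construction until the level check has passed.
        debugLazy(() -> "state: " + expensive());

        System.out.println("expensive() calls: " + expensiveCalls); // prints 1: only the eager call paid
    }
}
```

Real slf4j addresses this with parameterized messages (`log.debug("state: {}", x)`), which defer string formatting, and with supplier-based APIs that defer argument construction entirely.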