As of .NET 4.5, the thread pool injects one thread every 0.5 seconds if it thinks more threads are required. This is a problem if the number of required threads suddenly increases. Example: a corporate ASP.NET website is idle at night. At 8 AM, 1000 people log on and start working. If the app needs, say, 100 threads from 8 AM on, it will take about 50 seconds to create all of them. Until then there will be serious delays and timeouts. It is possible to construct arbitrarily bad scenarios.
Problem statement: If thread pool load suddenly increases in IO-bound workloads, the pool is too slow to respond. This disrupts throughput and availability. IO-bound workloads relying on synchronous IO are common. Sudden workload changes are also common. Sometimes the workload changes because of a problem outside the developer's control: a web service timing out or a database becoming slow.
Let me stress that this causes service interruption.
You can easily repro this yourself. Run a load test against an ASP.NET site whose request handler calls Thread.Sleep(10000);. The thread count goes up by 2 each second.
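For anyone without an ASP.NET load test at hand, here is a rough console approximation of the repro. It is only a sketch: it queues blocking work items directly on the thread pool instead of going through ASP.NET, and the `InjectionRepro` class and `Observe` method are names made up for this example. Busy workers are estimated as the difference between `ThreadPool.GetMaxThreads` and `ThreadPool.GetAvailableThreads`.

```csharp
using System;
using System.Threading;

static class InjectionRepro
{
    // Queues `workItems` work items that each block for `blockMs` milliseconds,
    // then samples the number of busy worker threads once per second.
    public static int[] Observe(int workItems, int blockMs, int seconds)
    {
        for (int i = 0; i < workItems; i++)
        {
            // Stand-in for a request doing synchronous IO.
            ThreadPool.QueueUserWorkItem(_ => Thread.Sleep(blockMs));
        }

        var samples = new int[seconds];
        for (int s = 0; s < seconds; s++)
        {
            Thread.Sleep(1000);

            int maxWorkers, maxIo, freeWorkers, freeIo;
            ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
            ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);

            // Busy workers = maximum allowed minus currently available.
            samples[s] = maxWorkers - freeWorkers;
            Console.WriteLine("t={0}s busy worker threads: {1}", s + 1, samples[s]);
        }
        return samples;
    }
}
```

Calling `InjectionRepro.Observe(1000, 10000, 30)` from a console `Main` should show the busy count starting near the pool's minimum thread count and then climbing slowly, in line with the rate described above. Exact numbers will depend on the machine and runtime version.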
I benchmarked starting and shutting down a thread at around 1ms in total. Threads are not really an expensive resource. The thread pool should be a lot more eager to create and destroy threads. A 500ms delay to potentially save 1ms is not a good trade-off.
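A measurement along these lines can be sketched as follows. This is not the benchmark behind the 1ms figure, just one simple way to get a comparable number; `ThreadCostBenchmark` and `AverageStartJoinMs` are hypothetical names, and the result will vary with hardware, OS and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class ThreadCostBenchmark
{
    // Average wall-clock time in milliseconds to create, start and join
    // one thread that does no work.
    public static double AverageStartJoinMs(int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var t = new Thread(() => { });
            t.Start();
            t.Join(); // wait for the thread to exit before starting the next one
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

For example, `Console.WriteLine(ThreadCostBenchmark.AverageStartJoinMs(1000));` prints the average cost per thread over 1000 iterations.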
Easy fix: I propose lowering the injection delay to 100ms. This reduces the problem given above by 5x. Ideally, the rate would be configurable. The shutdown delay could be lowered from 30s as well. Keeping an idle thread for 30000ms to save 1ms seems excessive. In this ticket I'm not that concerned with retiring threads, though.
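To make the 5x figure concrete, the ramp-up time is simply the number of threads to inject multiplied by the injection delay. A minimal sketch, assuming the pool injects exactly one thread per delay interval and nothing else limits growth (`RampUp` and `TimeToInject` are names invented here):

```csharp
using System;

static class RampUp
{
    // Time needed to grow the pool by `extraThreads` when one thread
    // is injected every `injectionDelay`.
    public static TimeSpan TimeToInject(int extraThreads, TimeSpan injectionDelay)
    {
        if (extraThreads < 0) throw new ArgumentOutOfRangeException("extraThreads");
        return TimeSpan.FromTicks(injectionDelay.Ticks * extraThreads);
    }
}
```

With the 100 threads from the example above, `RampUp.TimeToInject(100, TimeSpan.FromMilliseconds(500))` gives 50 seconds, while `RampUp.TimeToInject(100, TimeSpan.FromMilliseconds(100))` gives 10 seconds.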
Smarter, riskier fix: The delay could depend on the number of threads in existence, the core count and the perceived pressure on the thread-pool. The injection rate could be:
- 0ms delay for up to (ProcessorCount * 1) threads
- 50ms delay for up to (ProcessorCount * 4) threads
- Beyond that, a delay of (100ms * (ThreadCount / ProcessorCount * someFloatFactor)). Reasoning: the more the CPU is oversubscribed, the slower we want to inject. Maybe we need a maximum delay of 1sec. Or the delay must rise sub-linearly (e.g. sqrt).
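The tiers above could be sketched roughly like this. It is only an illustration of the proposal, not how the runtime works: `InjectionPolicy`, `NextInjectionDelay`, and the `factor` and `sublinear` parameters are hypothetical. It also assumes that the tier boundaries are exclusive, that the 1sec cap always applies, and that the sqrt variant takes the square root of the oversubscription ratio only.

```csharp
using System;

static class InjectionPolicy
{
    // Hypothetical policy: how long to wait before injecting the next thread.
    public static TimeSpan NextInjectionDelay(
        int threadCount, int processorCount,
        double factor = 1.0, bool sublinear = false)
    {
        if (processorCount <= 0) throw new ArgumentOutOfRangeException("processorCount");
        if (threadCount < 0) throw new ArgumentOutOfRangeException("threadCount");

        // Tier 1: fewer threads than cores, inject immediately.
        if (threadCount < processorCount) return TimeSpan.Zero;

        // Tier 2: up to 4 threads per core, short fixed delay.
        if (threadCount < processorCount * 4) return TimeSpan.FromMilliseconds(50);

        // Tier 3: delay grows with oversubscription, capped at 1 second.
        double oversubscription = (double)threadCount / processorCount;
        double scale = sublinear ? Math.Sqrt(oversubscription) : oversubscription;
        double ms = Math.Min(100.0 * scale * factor, 1000.0);
        return TimeSpan.FromMilliseconds(ms);
    }
}
```

On an 8-core machine with `factor` left at 1.0, this gives no delay at 4 threads, 50ms at 16 threads, 400ms at 32 threads (200ms with `sublinear: true`), and the 1sec cap at 128 threads.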