ULE Points For Investigation And Potential Improvement
- Take advantage of "knowing the future" when threads are shuffled right before an mi_switch call. For example, priority lending in turnstile_wait, which happens before mi_switch, may move the turnstile-owning thread from its tdq to another one. Which tdq/CPU gets picked depends on tdq_load and tdq_lowpri, which should be adjusted for the current tdq to discount the current thread, since that thread is about to switch out.
- Identify other areas following the same or similar pattern.
- The search for the lowest-loaded tdq seems to overemphasize tdq_lowpri. A tdq where the target thread could start running immediately is always preferred, whatever its tdq_load. This may lead to threads piling up on one tdq while a very high priority thread runs on the other. Potentially we could consider a tdq good enough if td_priority and tdq_lowpri are in the same priority class, provided that the tdq has the lowest tdq_load.
- A child thread starts with the same (or worse?) priority as the parent thread. So the tdq search algorithm (sched_pickcpu and its utility functions) would often prefer to place it on a different CPU. Is this good? Consider especially the vfork case, where the parent goes to sleep waiting for the child to terminate.
- The cost of forking and the CPU utilization of children are accounted for only in the interactivity score: only ts_slptime and ts_runtime get adjusted. ts_ticks, on the other hand, is not updated. So a long-running batch thread is disadvantaged compared to a thread that forks many short-lived CPU-bound batch threads.
- If a thread spends a lot of time on a runq (due to high total system load), where does that time get charged: ts_slptime, ts_runtime or ts_ticks?
- The normal priority range for batch threads is quite narrow because a big chunk is taken out for nice values. This translates into a narrow range of runq slots with a noticeable offset from the start, which in turn means a not-so-great ratio between the CPU times potentially given to normal batch threads with the highest and lowest priority. Thus, threads that systematically under-utilize their slices by more than that ratio may not be getting their fair share of CPU time.
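
Two of the ideas above, discounting the current thread when it is about to switch out and treating a tdq as good enough when priorities fall in the same class, can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct model_tdq, pri_class_of, effective_load, acceptable and pick_tdq are hypothetical names, the priority class boundaries are made-up numbers, and only the tdq_load part of the discount is modelled (recomputing tdq_lowpri without the current thread is left out).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-CPU struct tdq. */
struct model_tdq {
	int load;	/* models tdq_load: number of runnable threads */
	int lowpri;	/* models tdq_lowpri: numerically lowest priority queued */
};

/* Hypothetical priority classes; the boundaries are illustrative only. */
enum pri_class { CLASS_REALTIME, CLASS_TIMESHARE, CLASS_IDLE };

static enum pri_class
pri_class_of(int pri)
{
	if (pri < 120)
		return (CLASS_REALTIME);
	if (pri < 224)
		return (CLASS_TIMESHARE);
	return (CLASS_IDLE);
}

/* Load of queue i, minus the current thread if it is about to leave 'self'. */
static int
effective_load(const struct model_tdq *q, size_t i, size_t self, int leaving)
{
	int load = q[i].load;

	if (leaving && i == self && load > 0)
		load--;
	return (load);
}

/* Acceptable: the thread would run at once, or shares lowpri's class. */
static int
acceptable(const struct model_tdq *q, int pri)
{
	return (pri < q->lowpri ||
	    pri_class_of(pri) == pri_class_of(q->lowpri));
}

/*
 * Pick the least loaded acceptable queue; if no queue is acceptable,
 * fall back to the least loaded queue overall. Ties go to the lower index.
 */
static size_t
pick_tdq(const struct model_tdq *q, size_t n, int pri, size_t self,
    int leaving)
{
	size_t best = n, fallback = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		int load = effective_load(q, i, self, leaving);

		if (load < effective_load(q, fallback, self, leaving))
			fallback = i;
		if (acceptable(&q[i], pri) && (best == n ||
		    load < effective_load(q, best, self, leaving)))
			best = i;
	}
	return (best != n ? best : fallback);
}
```

In this model, a priority 150 thread choosing between a queue with load 5 and lowpri 200 and a queue with load 1 and lowpri 140 goes to the second queue, even though it could start running immediately on the first. Whether that trade-off is actually desirable in ULE is exactly the open question raised above.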