Hey Nils,
Yes - from what we have seen working on a number of different scheduler issues over the years, I believe there is still a set of edge cases where multiple scheduler worker pools can be started on the same system, resulting in the sort of thing you are currently seeing.
It's been a while since I dug around in the code, but this thread got me thinking about it again, and I believe I may now know where the issue is coming from.
Specifically, this occurs when a system is set up to use the REQUEST_METHOD scheduler mode as opposed to TIMER_METHOD, and most often on test servers, which may see more load or more application restarts.
In REQUEST_METHOD mode a test is performed every time any sort of HTTP request is made to the server. This test checks to see if the scheduler is up and running - well, actually it tests whether the scheduler is ready to receive requests, i.e. whether SchedulingProvider.ReadyForPoll is true. If it is ready, a new background worker thread is created, bound to the scheduler provider and started, and the scheduler begins executing.
BUT the problem is that there is a small window of time while that new worker thread is being started, after that first test of SchedulingProvider.ReadyForPoll, in which a second HTTP request can also find SchedulingProvider.ReadyForPoll still flagged as ready, and also attempt to start a separate background scheduler thread.
The code that is triggered by the HTTP request handler looks like this:
////////////////////////////////////////////////////////
if (SchedulingProvider.SchedulerMode == SchedulerMode.REQUEST_METHOD && SchedulingProvider.ReadyForPoll)
{
    Logger.Trace("Running Schedule " + (SchedulingProvider.SchedulerMode));
    var scheduler = SchedulingProvider.Instance();
    var requestScheduleThread = new Thread(scheduler.ExecuteTasks) {IsBackground = true};
    requestScheduleThread.Start();
    SchedulingProvider.ScheduleLastPolled = DateTime.Now;
}
////////////////////////////////////////////////////
The issue is that SchedulingProvider.ReadyForPoll is not protected by a lock or any other kind of synchronization. So between the time when SchedulingProvider.ReadyForPoll is tested and the time when SchedulingProvider.ScheduleLastPolled is set (which is how the ReadyForPoll value is actually turned to false) there is some significant potential lag time. Creating a worker thread in itself takes some time, potentially many thousands of clock cycles, and in addition the actual values behind ReadyForPoll and ScheduleLastPolled are stored in and read back from the DNN DataCache, adding even more clock cycles.
It seems to me that what is needed here is for that block of code to be wrapped in a lock() statement on a static object so that there is no possible chance of two separate http requests getting into that same code block at the same time.
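To make the idea concrete, here is a rough, self-contained sketch of the locking pattern. It is not the actual DNN code: RequestSchedulerGate, StartLock, readyForPoll and WorkersStarted are names I made up for illustration, and the plain bool only stands in for ReadyForPoll / ScheduleLastPolled. The point it tries to show is that both the test and the update of the flag have to happen inside the same lock.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the request handler logic; not DNN code.
public static class RequestSchedulerGate
{
    // One static lock object shared by every request in the process.
    private static readonly object StartLock = new object();

    // Stand-in for SchedulingProvider.ReadyForPoll (assumption: a plain flag).
    private static bool readyForPoll = true;

    // Counts how many worker threads were started, for the demo only.
    public static int WorkersStarted;

    public static void OnRequest()
    {
        lock (StartLock)
        {
            // The check happens inside the lock, so a second request
            // cannot see a stale "ready" value while the first one is
            // still busy starting the worker.
            if (!readyForPoll) return;

            // Stand-in for setting ScheduleLastPolled, which is what
            // turns ReadyForPoll to false in the real code.
            readyForPoll = false;

            WorkersStarted++;
            var worker = new Thread(() => Thread.Sleep(10)) { IsBackground = true };
            worker.Start();
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Simulate 20 HTTP requests arriving at roughly the same time.
        var requests = new Thread[20];
        for (int i = 0; i < requests.Length; i++)
            requests[i] = new Thread(RequestSchedulerGate.OnRequest);
        foreach (var t in requests) t.Start();
        foreach (var t in requests) t.Join();

        Console.WriteLine(RequestSchedulerGate.WorkersStarted); // prints 1
    }
}
```

Note that a lock() like this only serializes requests inside one process, so it would cover the case described above but not, say, two separate servers in a web farm.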
Would be interested in hearing other people's thoughts on this one.
Westa