I have issues with the scheduler myself. I believe one of the root causes is a bug in the stored procedure GetScheduleNextTask this appears to have changed in 4.8 it no longer looks at just enabled tasks and also a field NextStart is being selected as MAX but not assigned a column name and hence looking at the way the CBO class loads entities (in this case ScheduledItem) it won't load a NextStart date in the to entity. I also believe there is a further bug in the case of web farms where the server the task last ran on the the history table is ignored. Below is what I believe the proc should look like which is what I have updated my version to be. Of course I take no repsonsibility for anyone who chooses to make the same change! :) I'd also suggest that an index on ScheduleHistory (IX_ScheduleHistory which is on NextStart Desc, SheduleHistoryID, ScheduleID) is quite pointless. It appears to have been added to help with the GetScheduleNextTask but it won't get used. ScheduleID should be the first part of the index key and ScheduleHistoryID is of no use. I changed mine to ScheduleID, NextStart (and I only have NextStart to avoid a bookmark lookup). This greatly improved the performance which was important as in a web farm (5 servers) this proc was hammering our database server especially given that the next run has to be determined by scanning the history entries so if you have a lot of history the performance gets worse and worse. The purge job for ScheduleHistory also appears to be a bit of an issue as it seems that it is only allowed 30 seconds to run and as this is quite intensive it often timesout which means you can get in to the situation where the purge job continues to run doing nothing whilst the ScheduleHistory table gets bigger and bigger lowering the performance of the GetScheduleNextTask calls. Of course all this is really exagerated in a web farm environment as the history table grows really quickly which is why I have had to sort these issues out. I've also adjusted the purge proc as it had an issue with a no join predicate due to loaction of the join to Schedule. My changed version is below also and I added an index on ScheduleHistory on (ScheduleID, StartDate) to try and help with the calculation of rows older than current'y checked row. This also seemed to help. Anyway these are my thoughts and actions, I had to take. I'd appreciate if someone from the development team could validate them.
ALTER procedure [dbo].[GetScheduleNextTask]
@Server varchar(150)
AS
SELECT TOP 1
S.ScheduleID,
S.TypeFullName,
S.TimeLapse,
S.TimeLapseMeasurement,
S.RetryTimeLapse,
S.RetryTimeLapseMeasurement,
S.ObjectDependencies,
S.AttachToEvent,
S.RetainHistoryNum,
S.CatchUpEnabled,
S.Enabled,
MAX(SH.NextStart) AS NextStart
FROM dbo.Schedule S
LEFT JOIN dbo.ScheduleHistory SH ON S.ScheduleID = SH.ScheduleID
WHERE
(S.Servers LIKE '%,' + @Server + ',%' or S.Servers IS NULL)
AND (S.Enabled = 1)
AND ((SH.Server = @Server) OR (SH.ScheduleHistoryID IS NULL))
GROUP BY
S.ScheduleID, S.TypeFullName, S.TimeLapse, S.TimeLapseMeasurement,
S.RetryTimeLapse, S.RetryTimeLapseMeasurement, S.ObjectDependencies, S.AttachToEvent, S.RetainHistoryNum, S.CatchUpEnabled, S.Enabled
ORDER BY NextStart ASC
ALTER PROCEDURE [dbo].[PurgeScheduleHistory]
AS
DELETE FROM dbo.ScheduleHistory
FROM dbo.Schedule s
WHERE
(
SELECT COUNT(*)
FROM dbo.ScheduleHistory sh
WHERE
sh.ScheduleID = dbo.ScheduleHistory.ScheduleID AND
sh.StartDate >= dbo.ScheduleHistory.StartDate
) > s.RetainHistoryNum
AND s.RetainHistoryNum<>-1
AND s.ScheduleID = dbo.ScheduleHistory.ScheduleID