Best practice for SQL Server Agent configuration and monitoring

Published on 6 December 2023 at 15:53

SQL Server Agent is a great tool for automation, however there are a several points I would recommend as a best practice, obviously best practice is very much a subjective opinion!

Creating operators and configuring notifications against each job

Unreported job failures can result in SSIS packages not running, data not being updated and issues occurring on the server without anyone being aware.  Often meaning that these generally only get picked up when a user reports out of date data or other performance/functionality issues.  This is especially problematic when the job that fails is the backup job.

The solution is to create operators (those that will receive email alerts of job failures) and configure notifications against jobs, to ensure that notifications (emails) are sent out when jobs fail.  This can be completed by carrying out the following steps;

Creating Operators

This step only needs to be carried out once per group that you intend to send notifications to.

  • Expand the SQL Server Agent menu in SSMS and right click on the operators folder and select "New Operator"
  • Enter a brief and obvious name for the group of people that the operator refers to (e.g. Data Engineering Team).
  • Add their email distribution group to the email box.
  • Click Ok

Ensure you have checked that the mail profile is configure prior to undertaking this step, also ensure that the email distribution group is capable of receiving external emails (you might need some assistance from your digital infrastructure team on this).  Its worthwhile sending a test email from the server prior to anymore indepth fault finding if the operators don't initially work.

Setting a job to send notifications

  • Navigate to the job you want to send notifications from in the event of a failure in the SQL agent Jobs menu.
  • Right click on the job and select properties.
  • Click on the notifications page.
  • Click the button next to the email notificaiton type and select the operator from the drop down list.

The job will now send failiure notifications, although see my point on individual step failiures. 

Monitoring jobs for individual step failures

When configuring SQL server jobs its pretty common to configure steps to jump to the next step even on failure, although this means that even with the above configured, it won't report an individual step failure (unless its the final step).

The solution I have found is to produce a simple stored procedure, that checks for failed steps and sends an email to notify data engineers of these.  This requires a few components and implementing this can be created using the steps below.

 

Monitoring overrunning jobs

 

Categorising jobs

SQL Server jobs have the option under the general tab (see below) to categorise the job, this is useful to do as it can allow jobs to be classified by their broad purpose, the team that looks after them and/or the area they serve.  This can then be used to separate out performance reporting of jobs and more sophisticated distribution of job errors/failures.  To support this a control to identify uncategorised jobs would also be useful, so that if they are relied upon that they are there.

Consider analysis to look at job performance

SQL agent writes all of its activity to system tables, this can then be drawn out using SQL and then analysed in Excel, SSRS or Power BI.  With Power BI being the more elegant solution.  This can allow analysis of overall job performance, propensity of job failures and breakdown of job performance by its component steps.

 

The only caveat to this is that by default SQL agent only holds logs for a certain period of time, the default history is 1,000 records the length of which that covers being based on how often and how many jobs are ran.  The best solution I have found to this is to create a job that archives off job history, so that a longer history is present.

Add comment

Comments

There are no comments yet.