More Than An Output

No more than an output to me, but it sure was to SQL Server!  This issue kept me busy over the weekend, and most definitely gave me a case of Saturday Night SQL Fever (as opposed to "SQL Saturday Night Fever", for which the only cure is more karaoke).

When executing a stored procedure with an OUTPUT parameter, I always believed that the pre-execution state of a variable being written to by the output parameter didn’t matter, as it would be overwritten. A quick poll on Twitter showed me that I’m not alone, however I recently discovered this isn’t always the case.

Business requirements at my workplace are such that when a SQL Server Agent job needs to be modified, a script must be generated that does not update the job, but rather drops it and re-creates a new job. I was working on such a script (simplified and shown below) which uses the sp_delete_job and sp_add_job stored procedures to drop and add the job respectively.  Here's what I came up with:

 1SET NOCOUNT ON;
 2DECLARE @jobId uniqueidentifier;
 3DECLARE @JobName NVARCHAR(128);
 4DECLARE @ReturnCode INT;
 5DECLARE @errCode INT;
 6SET @JobName = 'AJob';
 7
 8BEGIN TRANSACTION;
 9SET @jobId = (SELECT job_id FROM msdb.dbo.sysjobs WHERE name = @JobName);
10
11-- delete the job if it exists
12IF(@jobId IS NOT NULL)
13BEGIN
14   EXEC @ReturnCode = msdb.dbo.sp_delete_job @job_id=@jobId
15   IF(@@ERROR != 0 OR @ReturnCode != 0)
16   BEGIN
17      SET @errCode = @@ERROR;
18      GOTO QuitWithRollback;
19   END
20   PRINT 'deleted job';
21   --SET @jobId = NULL;
22END
23
24-- create the job
25EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=@JobName,
26      @enabled=0,
27      @job_id = @jobId OUTPUT
28IF(@@ERROR != 0 OR @ReturnCode != 0)
29BEGIN
30   SET @errCode = @@ERROR;
31   GOTO QuitWithRollback;
32END
33PRINT 'added job';
34
35COMMIT TRANSACTION;
36RETURN;
37
38QuitWithRollback:
39IF(@@TRANCOUNT > 0)
40   ROLLBACK TRANSACTION;
41
42PRINT 'Err: ' + CAST(@errCode AS varchar(10)) + ' ret: ' + cast(@ReturnCode as varchar(10));

If you run this script on SQL Server 2008 SP1, you’ll see that the first time it’s run the job is added successfully. Running it again shows that the delete succeeds but the add fails and returns error 14274: “Cannot add, update, or delete a job (or its steps or schedules) that originated from an MSX server.”

Outputs
Multiple Outputs

Scouring the web for information on error 14274, I found plenty of articles & blog posts with regard to SQL 2000 talking about how this error occurs when you change the name of a machine, and that you need to fix it by altering values in the originating_server column of the msdb.dbo.sysjobs table. The problem is that the machine definitely didn’t change names since the job was created, and in SQL 2008 there is no originating_server column, as it has apparently been replaced by originating_server_id instead.

After much head scratching, I found that the cause of this error was the state of the @jobId variable prior to the execution of sp_add_job. If you uncomment the SET @jobId = NULL line in the above example and run again, you’ll see that it completes successfully the first time and every time thereafter.

Checking out the T-SQL code that’s under the hood of sp_add_jobs (which admittedly I should have done sooner), I found this:

 1IF (@job_id IS NULL)
 2BEGIN
 3   -- Assign the GUID
 4   SELECT @job_id = NEWID()
 5END
 6ELSE
 7BEGIN
 8   -- A job ID has been provided, so check that the caller is SQLServerAgent (inserting an MSX job)
 9   IF (PROGRAM_NAME() NOT LIKE N'SQLAgent%')
10   BEGIN
11      RAISERROR(14274, -1, -1)
12      RETURN(1) -- Failure
13   END
14END

There are 2 cases for @job_id: either it’s NULL and a GUID needs to be assigned, or it’s not NULL and an error will be thrown if it wasn’t called by the SQL Server Agent.

Looking at the sp_add_job entry in Books Online, it shows that the @job_id parameter is an OUTPUT parameter with no mention of an input value being used, hence why I assumed any value stored in the @jobId variable would simply be overwritten. BOL does mention “with a default of NULL” but that wording didn't get the point across (to me at least). If one creates a variable with a default of NULL and assigns a value later the default is still NULL, how does that violate the documentation's wishes?

In my opinion, there's 2 ways to reduce confusion on this issue:

  1. Modify BOL to mention a little more forcefully that @job_id must be NULL when adding a new job. Something like “passing a non-null variable to @job_id is for internal use only” would be a great improvement.

  2. Return a more descriptive error code when @job_id is not NULL.  While I'm sure the error code makes sense if you have complete understanding of what's happening inside sp_add_job, it's not very helpful if all you've done is read BOL.