A Tale of a Trigger

Sep 11, 2018 · T-SQL TSQL Tuesday ·

This month's T-SQL Tuesday is being hosted by Steve Jones, who has asked for blog posts today on experiences we have had with triggers, good or bad.

I am not a fan of triggers. While they represent a very simplistic construct of "if something happens, do something else", I've yet to encounter a situation where they weren't more trouble than they were worth. In my opinion, their biggest flaw is that they're very easy to forget about, and then tend to pop up and create problems when least expected. I avoid them like the plague, and happily tell my clients to do the same. Unfortunately I still have to deal with them frequently because they are commonly used in vendor applications. This scenario is exactly what brought about this story.

The Good

I was helping an organization add high availability to their environment by deploying Availability Groups. This particular server hosted several different third-party applications, and the decision was made to give each application its own AG. The AG deployment was successful and everything seemed fine until we started testing failovers. At this point we noticed that one of the applications was using a server-level DDL trigger, which was written roughly like this:

 1CREATE TRIGGER [BadDDLTrigger]
 2    ON ALL SERVER
 3    WITH EXECUTE AS 'sa'
 4    AFTER DDL_EVENTS
 5AS
 6BEGIN TRY
 7	DECLARE @eventdata XML = EVENTDATA();
 8	EXECUTE AppDB.dbo.AddEvents @eventdata
 9END TRY
10BEGIN CATCH
11	ROLLBACK
12END CATCH

Very simply, this trigger collects information about DDL events via the EVENTDATA() function and saves them to the application's database, AppDB, with a stored procedure called dbo.AddEvents. This might not seem to bad at first, but there's a few problems with it which can cause major issues.

The Bad

First, the scope. While the application that deployed this trigger has its own database, AppDB, this trigger is firing for events on the entire server, which is what the ON ALL SERVER line means. Any qualifying event on this server, even if it pertains to another application with a separate database, will be written into this database. And what is a "qualifying event"? Literally any DDL statement. The line AFTER DDL_EVENTS specifies the very top of the event hierarchy used by DDL triggers.

So to recap on scope, this application is capturing all DDL statements on the entire server and saving a copy for itself. This application is seeing (and recording) plenty of events that it has no need to see. If this were a healthcare application or a system that dealt with PII it would be a legal nightmare, but fortunately it isn't.

However, scope isn't the only issue.

Let's look at error handling. This trigger is using TRY/CATCH syntax for error handling. If an error occurs and the CATCH block is entered, it rolls back. This may be a good practice, depending on the intent. Don't forget, triggers execute in the context of the transaction that triggered them. If this trigger is fired by an event, such as a table being created, and the trigger is unable to capture the event, it will issue a ROLLBACK, undoing the table creation. If the goal is to make sure that every operation is captured with nothing at all sneaking by, this is a reasonable choice. But what happens if the rollback is unintended?

The Ugly

Remember, the original intent here was to implement Availability Groups, and there were multiple AGs, one for each application. Consider a scenario where two nodes exist, which we will call Node 1 and Node 2.

Assume all Availability Groups are running with Node 1 as primary. Then the AG containing this application's database "AppDB" fails over to Node 2, but all other AGs stay behind on Node 1. That DDL trigger still exists on Node 1, but the stored procedure in AppDB cannot execute there since writes can only occur on the primary node for AppDB, which is Node 2. At this point, the trigger causes an error, and rolls back.

Any DDL operation taking place on Node 1 for any database will fail in this case. And that's just not acceptable. Instead of not playing nicely in the sandbox, this application is taking the entire sandbox with it to another server, and nobody else gets to play at all.

The Fix

There are several possible fixes for this situation:

Rewrite the DDL trigger to detect if it is executing on the primary replica for AppDB or not, possibly by using the sys.fn_hadr_is_primary_replica() function.
Change the scope of the DDL trigger to only affect the application's database and not the entire server.
Remove the DDL trigger entirely and replace with a different technology, such as SQL Server Audit.

In the end, I contacted the vendor and explained the issue. They stated that their product does support Availability Groups, and agreed that this trigger was inherently incompatible with them. Their solution was "Go ahead and remove that trigger. We really only use it for internal testing anyway, it's not necessary in customer environments."

And with that, the problem went away. If only all my issues could be solved that easily!

A big thank you to Steve Jones for hosting this month's T-SQL Tuesday! If you'd like to help host a T-SQL Tuesday, contact Steve.