Sunday, September 14, 2014

Doing an UPSERT (update or insert if new) in MS SQL Server and MySQL

The UPSERT is an incredibly useful SQL instruction for databases, especially for data synchronization and key-value data structures. Unfortunately it is ignored by some databases.

An UPSERT is an SQL instruction which attempts to update a record (with a particular primary key) and will automatically create a new record with the updated data if it does not exist. This would otherwise be accomplished by first doing a SELECT and then doing an INSERT or UPDATE depending on whether the SELECT found anything. This would require two separate queries plus some programmed logic. The UPSERT allows you to do these two queries in one.

Here's how to do it for MySQL and MS SQL Server. The examples below are based on a table called "tbl" which contains 3 fields: "pk" (primary key), "f1" (field 1), and "f2" (field 2). In both examples the value being inserted is "key" for pk, "data1" for f1, and "data2" for f2.

Notice: Be sure to avoid using autoincrement primary keys for tables in which you want to upsert or it wouldn't make sense to insert a specific key. It is also advisable to lock the tables prior to upserting if the data is busy with multiple upserts being performed concurrently.


The ON DUPLICATE KEY UPDATE statement allows you to perform an update if an insert is not possible due to an attempt to insert a duplicate value in a field with a UNIQUE constraint.

INSERT INTO tbl (pk, f1, f2) 
  VALUES ('key', 'data1', 'data2')
    pk = VALUES(pk),
    f1 = VALUES(f1),
    f2 = VALUES(f2)

In order to avoid rewriting the values twice, we can refer to values already specified in an INSERT by using the VALUES function.

MS SQL Server

In MS SQL, an UPSERT can be achieved using a MERGE. This is originally meant to be used to insert many rows together into a table and then perform some actions (delete, update, insert, etc.) depending on whether or not a particular field already exists.

    (SELECT pk='key', f1='data1', f2='data2') AS s
  ON =
    UPDATE SET, f1=s.f1, f2=s.f2
    INSERT (pk, f1, f2)
    VALUES (, s.f1, s.f2); 

In order to avoid rewriting the values three times, the values were placed in a named nested SELECT. "s" refers to the source of the values, that is the values to add, whilst "t" refers to the target of the values, that is the table in which to add the values.