Home > HowTo > Update or insert rows in SQL

Update or insert rows in SQL

Some operations are not as easy in SQL as in procedural or imperative languages. For example while it is very intuitive to make an update or insert (also known as “upsert”) operation on a std::vector in C++, it is tricky in SQL. The upsert operation is: update data if already in the set and insert it otherwise.

As for recursion in SQL, the way to perform an upsert depends on the DBMS and its version.

This article introduces different methods to upsert data in a table. Many solutions are introduced because not all DBMS are compatible with SQL:2008 and the MERGE statement. As of today, it is compatible from these versions:

MySQL does not provide this statement but its own SQL extentions:

Microsoft SQL Server 2000 and 2005 don’t provide this feature but upsert is possible using the output clause for the UPDATE statement.

Finally, the only solution I am aware of for PostgreSQL is to use a cursor and test each line: rows existing in the target table are updated and the others are inserted.

Beware: all examples were written under Microsoft SQL Server 2008 (except the MySQL part). They may use some specific code but should remain fully understandable by everyone used to SQL.

Dataset

Before introducing all methods, let’s set up the sample data.

@tUsers is the working table. It contains a list of users with one particular field: active. When inserting data, this field is not set.

declare	@tUsers		table
(
	id		int		NOT NULL,
	firstName	varchar (30)	NOT NULL,
	lastName	varchar (30)	NOT NULL,
	active		tinyint		NULL
)

insert into @tUsers	(id, firstname,	lastname)
values	(1, 'Clément', 'Perrod')

insert into @tUsers	(id, firstname,	lastname)
values	(2, 'Thomas', 'Dupont')

insert into @tUsers	(id, firstname,	lastname)
values	(3, 'John', 'Doe')

The table contains the following rows:

id	firstName	lastName	active
1	Clément	        Perrod	        NULL
2	Thomas	        Dupont	        NULL
3	John	        Doe	        NULL

@tStaging will be the source table for upsert in @tUsers. Some rows are new and others are just an update for the active field.

declare	@tStaging	table
(
	id		int		NOT NULL,
	firstName	varchar (30)	NOT NULL,
	lastName	varchar (30)	NOT NULL,
	active		tinyint		NULL
)

insert into @tStaging	(id, firstname,	lastname, active)
values	(1, 'Clément', 'Perrod', 1)

insert into @tStaging	(id, firstname,	lastname, active)
values	(3, 'John', 'Doe', 0)

insert into @tStaging	(id, firstname,	lastname)
values	(4, 'Elise', 'Duffin')

insert into @tStaging	(id, firstname,	lastname)
values	(5, 'Anonie', 'Mousse')

The table contains following rows:

id	firstName	lastName	active
1	Clément		Perrod		1
3	John		Doe		0
4	Elise		Duffin		NULL
5	Anonie		Mousse		NULL

Methods

MERGE statement

The easiest way to perform upsert is to use the MERGE statement introduced by SQL:2008. Its syntax is pretty intuitive: different treatments are applied on the target table using data from the source table depending on a merging clause. In this article it is used for upsert, but it is also possible to use it to delete data in target table.

MERGE 
    [ TOP ( expression ) [ PERCENT ] ] 
    [ INTO ] <target_table> [ WITH ( <merge_hint> ) ] [ [ AS ] table_alias ]
    USING <table_source> 
    ON <merge_search_condition>
    [ WHEN MATCHED [ AND <clause_search_condition> ]
        THEN <merge_matched> ] [ ...n ]
    [ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ]
        THEN <merge_not_matched> ]
    [ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ]
        THEN <merge_matched> ] [ ...n ]
    [ <output_clause> ]
    [ OPTION ( <query_hint> [ ,...n ] ) ]    
;

The upsert is as easy as this:

merge	@tUsers		as us
using	@tStaging	as st
on	us.id = st.id
/*	Update	*/
when matched then
	update
	set		us.active = st.active
/*	Insert	*/
when not matched then
	insert (id, firstName, lastName, active)
	values (st.id, st.firstName, st.lastName, st.active)
/*	Mandatory semi colonn	*/
;

ON DUPLICATE KEY clause of the INSERT statement

This upsert method is based on the ON DUPLICATE KEY clause for the INSERT statement, which is available in MySQL 5 and above.

This solution only works if the table @tUsers has a primary key. Thus, for this example to work, the table declaration is:

CREATE TABLE `tUsers` (
  `id` int(11) NOT NULL,
  `firstName` varchar(30) NOT NULL,
  `lastName` varchar(30) NOT NULL,
  `active` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`id`,`firstName`,`lastName`)
)
;
INSERT INTO tUsers	(id, firstname,	lastname)
VALUES	(1, 'Clément', 'Perrod')
;
INSERT INTO tUsers	(id, firstname,	lastname)
VALUES	(2, 'Thomas', 'Dupont')
;
INSERT INTO tUsers	(id, firstname,	lastname)
VALUES	(3, 'John', 'Doe')

CREATE TABLE `tStaging` (
  `id` int(11) NOT NULL,
  `firstName` varchar(30) NOT NULL,
  `lastName` varchar(30) NOT NULL,
  `active` tinyint(4) DEFAULT NULL
  )
;
INSERT INTO tStaging	(id, firstname,	lastname, active)
VALUES	(1, 'Clément', 'Perrod', 1)
;
INSERT INTO tStaging	(id, firstname,	lastname, active)
VALUES	(3, 'John', 'Doe', 0)
;
INSERT INTO tStaging	(id, firstname,	lastname)
VALUES	(4, 'Elise', 'Duffin')
;
INSERT INTO tStaging	(id, firstname,	lastname)
VALUES	(5, 'Anonie', 'Mousse')

Using the clause is as simple as follows:

insert into	
	tUsers
select	st.id,
	st.firstName,
	st.lastName,
	st.active
from	tStaging	st
ON DUPLICATE KEY 
UPDATE	active = st.active

OUTPUT clause of the UPDATE statement

This upsert method is based on the OUTPUT clause of the UPDATE statement, which is available in Microsoft SQL Server 2000 and 2005 and above.

First, an update is performed on @tUsers, using the rows of @tStaging. The OUTPUT clause keeps all updated rows in the @tUpdated table. We are now able to see which rows of @tStaging have been used and which have to be inserted.

declare	@tUpdated	table
(
	id		int
)

update	@tUsers
set	active = st.active
output	inserted.id
into	@tUpdated
from	@tStaging	st,
	@tUsers		us
where	us.id = st.id

The second step is a simple INSERT statement, in which rows that are not in @tUpdated are selected from @tStaging using a LEFT OUTER JOIN (or a NOT EXISTS statement).

insert into	
	@tUsers
select	st.id,
	st.firstName,
	st.lastName,
	st.active
from	@tStaging	st
left join
	@tUpdated	up
on	up.id = st.id
WHERE	up.id is null

Final points

In this article, I have been through the most famous solutions to update or insert rows in a table (except the cursor one). Even if some methods are easier to use, it all depends on the installed DBMS.

MERGE statement ON DUPLICATE KEY clause for INSERT statement output clause for UPDATE statement cursors
  • IBM DB2
  • Microsoft SQL Server 2008
  • Oracle 9i and later versions
  • Microsoft SQL Server 2000
  • Microsoft SQL Server 2005
  • MySQL 5.5 and later version
  • PostgreSQL

Stay tuned: we will discuss later why there are so many differences between DBMS.

Read more about merge

About these ads
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: