Checking if your XML is actually well-formed

Sometimes you’ll find that you will have XML in your database. This could be for various reasons – from storing the XML after receiving an API response to keeping it in a table because a web developer couldn’t figure out another way to store their data. Sometimes – no matter how much you trust your source – you should question if the XML is well-formed. Let’s work out a few ways you can do that in a database.

First, let’s look at SQL Server.

One of the easiest ways is to attempt to explicitly attempt to convert it to an XML data type by using the TRY_CONVERT function.

DECLARE @XMLData VARCHAR(500);

SELECT @XMLData = '<book>
    <title lang="english">All about dogs</title>
    <author>Someone</author>
    <isin>true</isin>
    <daysuntilreturn>0</daysuntilreturn>
    <price/>
</book>'

SELECT TRY_CONVERT(XML, @XMLData);

This works fairly well. All we have to do is check to see if the result is not NULL and we will know that we have “real” XML that we can do something with.

Good XML
Bad XML – Notice the author tag is not closed

But, what if someone really screws up and puts in a number instead of real XML. Let’s see what happens then in SQL Server…

Problem…

Well, that’s just not good at all. What’s happening here is that SQL Server does not allow a number to be converted to an XML Data Type as seen in this handy dandy chart.

You can also use the following method which does go fairly deep into XML Queries…

DECLARE @TestTable TABLE (ID INT IDENTITY (1, 1) PRIMARY KEY, XmlColumn XML NOT NULL);
INSERT INTO @TestTable (XmlColumn) VALUES 
(N'<root><child /></root>'),
(N'<city>Miami</city><city>Orlando</city>'),
(N'Foo, this is not XML'),
(N'<root><child /></root>Foo'),
(N'<!-- -->'),
(N'<root><parent><child /></parent></root>');

SELECT * 
    , XmlColumn.query('<root>{
        for $x in /*
        return
        if ($x instance of element()) then <r/> else ()
        }</root>').value('count(/root/r)', 'INT') AS Result
    , XmlColumn.query('for $x in .
        return if ($x eq /*[1]) then "well-formed" else "not well-formed"').value('.','VARCHAR(20)') AS [well-formed]
FROM @TestTable;

This one does take a few seconds to run for even just these 6 rows of data, but it gives back the answers as expected. Notice, though, the setup that has to be done to complete this though…

Now, I know you’re asking yourself – “Self, ” – for that’s what you call yourself – “Self, how do I do this in Snowflake?” Well, that my friends, is actually pretty simple. In Snowflake, we just need to use the CHECK_XML function.

Now, notice that this is actually the opposite of SQL Server. If the result is NULL, then it’s a good thing!

For this one, I changed the closing tag to “badname” instead of “name” making it poorly formed. Also, notice that Snowflake tells you what it sees as wrong with the data.

And back to our check earlier that failed spectacularly on SQL Server. Notice that on Snowflake it runs with no issues, just tells us what is wrong.

Now, if you have progressed as an organization and think that XML is awful, but JSON is where it’s at – Snowflake can do the same for you with the CHECK_JSON function. Works the same way as CHECK_XML just with JSON data.

And that, my friends, is all for this post. Hopefully, if/when you find XML or JSON in your database, please remember these great functions! And now, one of my favorite videos of all time about XML!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.