Xml Functions

This page lists all xml functions available in Spark SQL.


from_xml

from_xml(xmlStr, schema[, options]) - Returns a struct value with the given xmlStr and schema.

Examples:

> SELECT from_xml('<p><a>1</a><b>0.8</b></p>', 'a INT, b DOUBLE');
 {"a":1,"b":0.8}
> SELECT from_xml('<p><time>26/08/2015</time></p>', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
 {"time":2015-08-26 00:00:00}
> SELECT from_xml('<p><teacher>Alice</teacher><student><name>Bob</name><rank>1</rank></student><student><name>Charlie</name><rank>2</rank></student></p>', 'STRUCT<teacher: STRING, student: ARRAY<STRUCT<name: STRING, rank: INT>>>');
 {"teacher":"Alice","student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}

Since: 4.0.0


schema_of_xml

schema_of_xml(xml[, options]) - Returns schema in the DDL format of XML string.

Examples:

> SELECT schema_of_xml('<p><a>1</a></p>');
 STRUCT<a: BIGINT>
> SELECT schema_of_xml('<p><a attr="2">1</a><a>3</a></p>', map('excludeAttribute', 'true'));
 STRUCT<a: ARRAY<BIGINT>>

Since: 4.0.0


to_xml

to_xml(expr[, options]) - Returns a XML string with a given struct value

Examples:

> SELECT to_xml(named_struct('a', 1, 'b', 2));
 <ROW>
     <a>1</a>
     <b>2</b>
 </ROW>
> SELECT to_xml(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
 <ROW>
     <time>26/08/2015</time>
 </ROW>

Since: 4.0.0


xpath

xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression.

Examples:

> SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
 ["b1","b2","b3"]
> SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b');
 [null,null,null]

Since: 2.0.0


xpath_boolean

xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found.

Examples:

> SELECT xpath_boolean('<a><b>1</b></a>','a/b');
 true

Since: 2.0.0


xpath_double

xpath_double(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

Since: 2.0.0


xpath_float

xpath_float(xml, xpath) - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

Since: 2.0.0


xpath_int

xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

Since: 2.0.0


xpath_long

xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

Since: 2.0.0


xpath_number

xpath_number(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

Since: 2.0.0


xpath_short

xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

Since: 2.0.0


xpath_string

xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression.

Examples:

> SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c');
 cc

Since: 2.0.0