# XML
Encode and decode to and from XML. Whitespace is not preserved for round trips, but the order of the fields is.

Consecutive XML nodes with the same name are assumed to be arrays.

XML content data, attributes, processing instructions and directives are all created as plain fields.
This can be controlled by:
| Flag | Default | Sample XML |
| -- | -- | -- |
| `--xml-attribute-prefix` | `+` (changing to `+@` soon) | Legs in `<cat legs="4"/>` |
| `--xml-content-name` | `+content` | Meow in `<cat>Meow <fur>true</fur></cat>` |
| `--xml-directive-name` | `+directive` | `<!DOCTYPE config system "blah">` |
| `--xml-proc-inst-prefix` | `+p_` | `<?xml version="1"?>` |
{% hint style="warning" %}
Default Attribute Prefix will be changing in v4.30!
In order to avoid name conflicts (e.g. having an attribute named "content" will create a field that clashes with the default content name of "+content") the attribute prefix will be changing to "+@".
This will affect users that have not set their own prefix and are not roundtripping XML changes.
{% endhint %}
## Encoder / Decoder flag options
In addition to the above flags, there are the following xml encoder/decoder options controlled by flags:
| Flag | Default | Description |
| -- | -- | -- |
| `--xml-strict-mode` | false | Strict mode enforces the requirements of the XML specification. When switched off the parser allows input containing common mistakes. See [the Golang xml decoder](https://pkg.go.dev/encoding/xml#Decoder) for more details. |
| `--xml-keep-namespace` | true | Keeps the namespace of attributes. |
| `--xml-raw-token` | true | Does not verify that start and end elements match and does not translate name space prefixes to their corresponding URLs. |
| `--xml-skip-proc-inst` | false | Skips over processing instructions, e.g. `<?xml version="1"?>` |
| `--xml-skip-directives` | false | Skips over directives, e.g. `<!DOCTYPE config system "blah">` |
See below for examples
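
For instance, a minimal sketch of `--xml-skip-proc-inst`, assuming a hypothetical file `decl-sample.xml` and an explicit `-p=xml` in case the input format is not auto-detected:

```bash
# Hypothetical sample file with an XML declaration.
cat > decl-sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<cat>
  <says>meow</says>
</cat>
EOF

# The declaration is a processing instruction, so it is skipped
# and no +p_xml field is created.
out=$(yq -p=xml -oy --xml-skip-proc-inst '.' decl-sample.xml)
printf '%s\n' "$out"
```

The result should contain only the `cat` map, without the `+p_xml` field that the examples below show by default.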
## Parse xml: simple
Notice how all the values are strings; see the next example for how to fix that.
Given a sample.xml file of:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<cat>
  <says>meow</says>
  <legs>4</legs>
  <cute>true</cute>
</cat>
```
then
```bash
yq -oy '.' sample.xml
```
will output
```yaml
+p_xml: version="1.0" encoding="UTF-8"
cat:
  says: meow
  legs: "4"
  cute: "true"
```
## Parse xml: number
All values are assumed to be strings when parsing XML, but you can use the `from_yaml` operator on all the string values to autoparse them into the correct type.
Given a sample.xml file of:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<cat>
  <says>meow</says>
  <legs>4</legs>
  <cute>true</cute>
</cat>
```
then
```bash
yq -oy '(.. | select(tag == "!!str")) |= from_yaml' sample.xml
```
will output
```yaml
+p_xml: version="1.0" encoding="UTF-8"
cat:
  says: meow
  legs: 4
  cute: true
```
## Parse xml: array
Consecutive nodes with identical xml names are assumed to be arrays.
Given a sample.xml file of:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<animal>cat</animal>
<animal>goat</animal>
```
then
```bash
yq -oy '.' sample.xml
```
will output
```yaml
+p_xml: version="1.0" encoding="UTF-8"
animal:
  - cat
  - goat
```
## Parse xml: force as an array
In XML, if your array has a single item, then yq doesn't know it's an array. This is how you can consistently force it to be an array. This handles the three scenarios of having nothing in the array, having a single item, and having multiple items.
Given a sample.xml file of:
```xml
<zoo><animal>cat</animal></zoo>
```
then
```bash
yq -oy '.zoo.animal |= ([] + .)' sample.xml
```
will output
```yaml
zoo:
  animal:
    - cat
```
## Parse xml: force all as an array
Because of the way yq works, when updating everything you need to update the children before the parents. By default `..` will match parents first, so we reverse that before updating.
Given a sample.xml file of:
```xml
<zoo><thing><frog>boing</frog></thing></zoo>
```
then
```bash
yq -oy '([..] | reverse | .[]) |= [] + .' sample.xml
```
will output
```yaml
- zoo:
    - thing:
        - frog:
            - boing
```
## Parse xml: attributes
Attributes are converted to fields using the attribute prefix, which is `+@` in the output below. Use `--xml-attribute-prefix` to set your own.
Given a sample.xml file of:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<cat legs="4">
  <legs>7</legs>
</cat>
```
then
```bash
yq -oy '.' sample.xml
```
will output
```yaml
+p_xml: version="1.0" encoding="UTF-8"
cat:
  +@legs: "4"
  legs: "7"
```
## Parse xml: attributes with content
Content is added as a field, using the default content name of `+content`. Use `--xml-content-name` to set your own.
Given a sample.xml file of:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<cat legs="4">meow</cat>
```
then
```bash
yq -oy '.' sample.xml
```
will output
```yaml
+p_xml: version="1.0" encoding="UTF-8"
cat:
  +content: meow
  +@legs: "4"
```
## Parse xml: content split between comments/children
Multiple content texts are collected into a sequence.
Given a sample.xml file of:
```xml
<root> value <!-- comment --> anotherValue <a>frog</a> cool!</root>
```
then
```bash
yq -oy '.' sample.xml
```
will output
```yaml
root:
  +content: # comment
    - value
    - anotherValue
    - cool!
  a: frog
```
## Parse xml: custom dtd
DTD entities are processed as directives.
Given a sample.xml file of:
```xml
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY writer "Blah.">
<!ENTITY copyright "Blah">
]>
<root>
  <item>&writer;&copyright;</item>
</root>
```
then
```bash
yq '.' sample.xml
```
will output
```xml
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY writer "Blah.">
<!ENTITY copyright "Blah">
]>
<root>
  <item>&amp;writer;&amp;copyright;</item>
</root>
```
## Parse xml: skip custom dtd
DTDs are directives, so skipping over directives also skips DTDs.
Given a sample.xml file of:
```xml
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY writer "Blah.">
<!ENTITY copyright "Blah">
]>
<root>
  <item>&writer;&copyright;</item>
</root>
```
then
```bash
yq --xml-skip-directives '.' sample.xml
```
will output
```xml
<?xml version="1.0"?>
<root>
  <item>&amp;writer;&amp;copyright;</item>
</root>
```
## Parse xml: with comments
A best attempt is made to preserve comments.
Given a sample.xml file of:
```xml
<!-- before cat -->
<cat>
  <!-- in cat before -->
  <x>3<!-- multi
line comment
for x --></x>
  <!-- before y -->
  <y>
    <!-- in y before -->
    <d><!-- in d before -->z<!-- in d after --></d>
    <!-- in y after -->
  </y>
  <!-- in_cat_after -->
</cat>
<!-- after cat -->
```
then
```bash
yq -oy '.' sample.xml
```
will output
```yaml
# before cat
cat:
  # in cat before
  x: "3" # multi
  # line comment
  # for x
  # before y
  y:
    # in y before
    # in d before
    d: z # in d after
    # in y after
  # in_cat_after
# after cat
```
## Parse xml: keep attribute namespace
`--xml-keep-namespace` defaults to true.
Given a sample.xml file of:
```xml
<?xml version="1.0"?>
<map xmlns="some-namespace" xmlns:xsi="some-instance" xsi:schemaLocation="some-url"></map>
```
then
```bash
yq --xml-keep-namespace=false '.' sample.xml
```
will output
```xml
<?xml version="1.0"?>
<map xmlns="some-namespace" xsi="some-instance" schemaLocation="some-url"></map>
```
instead of
```xml
<?xml version="1.0"?>
<map xmlns="some-namespace" xmlns:xsi="some-instance" xsi:schemaLocation="some-url"></map>
```
## Parse xml: keep raw attribute namespace
`--xml-raw-token` defaults to true.
Given a sample.xml file of:
```xml
<?xml version="1.0"?>
<map xmlns="some-namespace" xmlns:xsi="some-instance" xsi:schemaLocation="some-url"></map>
```
then
```bash
yq --xml-raw-token=false '.' sample.xml
```
will output
```xml
<?xml version="1.0"?>
<map xmlns="some-namespace" xmlns:xsi="some-instance" some-instance:schemaLocation="some-url"></map>
```
instead of
```xml
<?xml version="1.0"?>
<map xmlns="some-namespace" xmlns:xsi="some-instance" xsi:schemaLocation="some-url"></map>
```
## Encode xml: simple
Given a sample.yml file of:
```yaml
cat: purrs
```
then
```bash
yq -o=xml '.' sample.yml
```
will output
```xml
<cat>purrs</cat>
```
## Encode xml: array
Given a sample.yml file of:
```yaml
pets:
  cat:
    - purrs
    - meows
```
then
```bash
yq -o=xml '.' sample.yml
```
will output
```xml
<pets>
  <cat>purrs</cat>
  <cat>meows</cat>
</pets>
```
## Encode xml: attributes
Fields with the matching xml-attribute-prefix are assumed to be attributes.
Given a sample.yml file of:
```yaml
cat:
  +@name: tiger
  meows: true
```
then
```bash
yq -o=xml '.' sample.yml
```
will output
```xml
<cat name="tiger">
  <meows>true</meows>
</cat>
```
## Encode xml: attributes with content
Fields with the matching xml-content-name are assumed to be content.
Given a sample.yml file of:
```yaml
cat:
  +@name: tiger
  +content: cool
```
then
```bash
yq -o=xml '.' sample.yml
```
will output
```xml
<cat name="tiger">cool</cat>
```
## Encode xml: comments
A best attempt is made to copy comments to xml.
Given a sample.yml file of:
```yaml
#
# header comment
# above_cat
#
cat: # inline_cat
  # above_array
  array: # inline_array
    - val1 # inline_val1
    # above_val2
    - val2 # inline_val2
# below_cat
```
then
```bash
yq -o=xml '.' sample.yml
```
will output
```xml
<!--
header comment
above_cat
-->
<!-- inline_cat -->
<cat><!-- above_array inline_array -->
  <array>val1<!-- inline_val1 --></array>
  <array><!-- above_val2 -->val2<!-- inline_val2 --></array>
</cat><!-- below_cat -->
```
## Encode: doctype and xml declaration
Use the special xml names to add/modify proc instructions and directives.
Given a sample.yml file of:
```yaml
+p_xml: version="1.0"
+directive: 'DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" '
apple:
  +p_coolioo: version="1.0"
  +directive: 'CATYPE meow purr puss '
  b: things
```
then
```bash
yq -o=xml '.' sample.yml
```
will output
```xml
<?xml version="1.0"?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >
<apple><?coolioo version="1.0"?><!CATYPE meow purr puss >
  <b>things</b>
</apple>
```
## Round trip: with comments
A best effort is made, but comment positions and white space are not preserved perfectly.
Given a sample.xml file of:
```xml
<!-- before cat -->
<cat>
  <!-- in cat before -->
  <x>3<!-- multi
line comment
for x --></x>
  <!-- before y -->
  <y>
    <!-- in y before -->
    <d><!-- in d before -->z<!-- in d after --></d>
    <!-- in y after -->
  </y>
  <!-- in_cat_after -->
</cat>
<!-- after cat -->
```
then
```bash
yq '.' sample.xml
```
will output
```xml
<!-- before cat -->
<cat><!-- in cat before -->
  <x>3<!-- multi
line comment
for x --></x><!-- before y -->
  <y><!-- in y before
in d before -->
    <d>z<!-- in d after --></d><!-- in y after -->
  </y><!-- in_cat_after -->
</cat><!-- after cat -->
```
## Roundtrip: with doctype and declaration
yq parses XML proc instructions and directives into nodes.
Unfortunately the underlying XML parser loses whitespace information.
Given a sample.xml file of:
```xml
<?xml version="1.0"?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >
<apple>
  <?coolioo version="1.0"?>
  <!CATYPE meow purr puss >
  <b>things</b>
</apple>
```
then
```bash
yq '.' sample.xml
```
will output
```xml
<?xml version="1.0"?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >
<apple><?coolioo version="1.0"?><!CATYPE meow purr puss >
  <b>things</b>
</apple>
```