add docs for entities
amitguptagwl committed Nov 30, 2021
1 parent c3fb07f commit ec640f6
Showing 5 changed files with 147 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -98,6 +98,7 @@ In a HTML page
2. [XML Parser](./docs/v4/2.XMLparseOptions.md)
3. [XML Builder](./docs/v4/3.XMLBuilder.md)
4. [XML Validator](./docs/v4/4.XMLValidator.md)
5. [Entities](./docs/v4/5.Entities.md)

## Performance

5 changes: 5 additions & 0 deletions docs/v4/2.XMLparseOptions.md
@@ -213,6 +213,8 @@ Output
}
```

## htmlEntities
By default, FXP parses XML entities when `processEntities: true`. Set `htmlEntities: true` to parse HTML entities as well. See the [Entities](./5.Entities.md) section for more information.
## ignoreAttributes

By default, `ignoreAttributes` is set to `true`, which means the parser ignores attributes. Any attribute-related option has no effect unless you also set `ignoreAttributes: false`.
@@ -502,6 +504,9 @@ const XMLdata = `
}
]
```

## processEntities
Set it to `true` (default) to process default and DOCTYPE entities. See the [Entities](./5.Entities.md) section for more detail. If your XML document contains no entities, it is recommended to disable this option with `processEntities: false` for better performance.
## removeNSPrefix

Remove namespace string from tag and attribute names.
3 changes: 3 additions & 0 deletions docs/v4/3.XMLBuilder.md
@@ -76,6 +76,9 @@ Applicable only if `format:true` is set.
## preserveOrder
When you parse XML using XMLParser with `preserveOrder: true`, the resulting JS object has a different structure. To convert that structure back into the original XML, set the same option when building the XML from that JS object.

## processEntities
Set it to `true` (default) to process XML entities. See the [Entities](./5.Entities.md) section for more detail. If you don't have entities in your XML document, it is recommended to disable this option with `processEntities: false` for better performance.

## suppressEmptyNode
Tags with no text value would be parsed as empty tags.
Input
4 changes: 3 additions & 1 deletion docs/v4/4.XMLValidator.md
@@ -47,4 +47,6 @@ const xmlData = `<parent><extra></parent>`;
const result = XMLValidator.validate( xmlData, {
unpairedTags: ["extra"]
});
```

[> Next: Entities](./5.Entities.md)
135 changes: 135 additions & 0 deletions docs/v4/5.Entities.md
@@ -0,0 +1,135 @@

Entities are variables that can be used in XML content to maintain consistency. For example:

```xml
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE note [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
]>

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body attr="&writer;">Don't forget me this weekend!</body>
<footer>&writer;&nbsp;&copyright;</footer>
</note>
```

You can define your own entities using DOCTYPE. By default, FXP supports the following XML entities:

| Entity name | Character | Decimal reference | Hexadecimal reference |
| :---------- | :-------- | :---------------- | :-------------------- |
| quot | " | &#34; | &#x22; |
| amp | & | &#38; | &#x26; |
| apos | ' | &#39; | &#x27; |
| lt | < | &#60; | &#x3C; |
| gt | > | &#62; | &#x3E; |

However, since entity processing can drastically impact the parser's performance, you can disable it with `processEntities: false`.
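
To make the table above concrete, here is a minimal, self-contained sketch of decoding the five default entities. It is not FXP's actual implementation; the names `DEFAULT_XML_ENTITIES` and `decodeDefaultEntities` are hypothetical and exist only for illustration.

```js
// Illustrative sketch only; not FXP's actual implementation.
// Maps the five default XML entity names to their characters.
const DEFAULT_XML_ENTITIES = {
    quot: '"',
    amp: "&",
    apos: "'",
    lt: "<",
    gt: ">",
};

// Decodes the default entities in a single pass, so text produced by one
// replacement (for example the "&" from "&amp;") is never decoded again.
function decodeDefaultEntities(text) {
    return text.replace(/&(quot|amp|apos|lt|gt);/g, (match, name) => DEFAULT_XML_ENTITIES[name]);
}

console.log(decodeDefaultEntities("3 &lt; 4 &amp;&amp; 5 &gt; 4")); // 3 < 4 && 5 > 4
```

Every text value has to be scanned like this when entity processing is enabled, which is where the extra cost comes from.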

XML Builder encodes the default entities in values. For example:
```js
const { XMLBuilder } = require("fast-xml-parser");

const jsObj = {
    "note": {
        "@heading": "Reminder > \"Alert",
        "body": {
            "#text": " 3 < 4",
            "attr": "Writer: Donald Duck."
        },
    }
};

const options = {
    attributeNamePrefix: "@",
    ignoreAttributes: false,
    // processEntities: false // uncomment to disable entity processing
};
const builder = new XMLBuilder(options);
const output = builder.build(jsObj);
```
Output:
```xml
<note heading="Reminder &gt; &quot;Alert">
<body>
3 &lt; 4
<attr>Writer: Donald Duck.</attr>
</body>
</note>
```
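
The replacement seen in the output above can be sketched in plain JavaScript. This is only an approximation under the assumption that each reserved character is mapped to its entity in one pass; `CHAR_TO_ENTITY` and `encodeDefaultEntities` are hypothetical names, not part of FXP.

```js
// Illustrative sketch only; not FXP's actual implementation.
// Maps each reserved character to its default XML entity.
const CHAR_TO_ENTITY = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
};

// Encodes reserved characters in a single pass, so the "&" that belongs to
// a freshly written entity is never encoded a second time.
function encodeDefaultEntities(value) {
    return value.replace(/[&<>"']/g, (ch) => CHAR_TO_ENTITY[ch]);
}

console.log(encodeDefaultEntities('Reminder > "Alert')); // Reminder &gt; &quot;Alert
console.log(encodeDefaultEntities(" 3 < 4"));            //  3 &lt; 4
```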

## Side effects

Although FXP does not support entities with `&` in their values, the following side effects are possible:

```xml
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE note [
<!ENTITY nbsp "writer;">
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
]>

<note>
<heading>Reminder</heading>
<body attr="&writer;">Don't forget me this weekend!</body>
<footer>&writer;&&nbsp;&copyright;</footer>
</note>
```

Output

```js
{
"note": {
"heading": "Reminder",
"body": {
"#text": "Don't forget me this weekend!",
"attr": "Writer: Donald Duck."
},
"footer": "Writer: Donald Duck.Writer: Donald Duck.Copyright: W3Schools."
}
}
```

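In this example, `&nbsp;` expands to `writer;`, which joins the preceding bare `&` to form `&writer;`, and that is then expanded as well. To avoid this, use `&amp;` instead of a bare `&` in the XML document.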
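
The side effect can be reproduced with a small sketch that expands entities one after another in declaration order. That ordering is an assumption made for illustration, and `expandEntities` is a hypothetical helper, not FXP's code.

```js
// Illustrative sketch only; not FXP's actual implementation.
// Entities are expanded one after another, in declaration order (assumed).
function expandEntities(text, entities) {
    let result = text;
    for (const [name, value] of entities) {
        result = result.split(`&${name};`).join(value);
    }
    return result;
}

const entities = [
    ["nbsp", "writer;"],
    ["writer", "Writer: Donald Duck."],
    ["copyright", "Copyright: W3Schools."],
];

// The bare "&" followed by "&nbsp;" becomes "&writer;" after the first pass.
console.log(expandEntities("&writer;&&nbsp;&copyright;", entities));
// Writer: Donald Duck.Writer: Donald Duck.Copyright: W3Schools.

// With "&amp;" the ampersand no longer forms a new entity reference.
console.log(expandEntities("&writer;&amp;&nbsp;&copyright;", entities));
// Writer: Donald Duck.&amp;writer;Copyright: W3Schools.
```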

## Attacks

The following attacks are possible due to entity processing:

* Denial-of-Service Attacks
* Classic XXE
* Advanced XXE
* Server-Side Request Forgery (SSRF)
* XInclude
* XSLT

Since FXP doesn't allow entities with `&` in their values, the above attacks should not work.
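
As a rough illustration of why rejecting `&` in entity values helps, the sketch below collects entity declarations and skips any whose value contains `&`, so no entity can reference another one. The function `readSafeEntities` and its regular expression are assumptions made for this example; they do not reflect how FXP reads the DOCTYPE.

```js
// Illustrative sketch only; not FXP's actual implementation.
// Collects <!ENTITY name "value"> declarations, skipping any whose value
// contains "&", so one entity can never reference another.
function readSafeEntities(doctype) {
    const entities = {};
    const declaration = /<!ENTITY\s+([\w.-]+)\s+"([^"]*)"\s*>/g;
    let match;
    while ((match = declaration.exec(doctype)) !== null) {
        const [, name, value] = match;
        if (!value.includes("&")) {
            entities[name] = value;
        }
    }
    return entities;
}

// Nested declarations in the style of an entity-expansion attack.
const doctype = `<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;">
]>`;

console.log(readSafeEntities(doctype)); // { lol: 'lol' }
```

In this sketch, declarations that use `SYSTEM` do not match the pattern at all, so external entities are never registered either.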

## HTML Entities

The parser supports the following HTML entities when `htmlEntities: true` is set.

| Result | Description | Entity Name | Entity Number |
| :----- | :--------------------------------- | :---------- | :------------ |
| | non-breaking space | &nbsp; | &#160; |
| < | less than | &lt; | &#60; |
| > | greater than | &gt; | &#62; |
| & | ampersand | &amp; | &#38; |
| " | double quotation mark | &quot; | &#34; |
| ' | single quotation mark (apostrophe) | &apos; | &#39; |
| ¢ | cent | &cent; | &#162; |
| £ | pound | &pound; | &#163; |
| ¥ | yen | &yen; | &#165; |
| € | euro | &euro; | &#8364; |
| © | copyright | &copy; | &#169; |
| ® | registered trademark | &reg; | &#174; |
| ₹ | Indian Rupee | &inr; | &#8377; |
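
The HTML-specific rows of the table can be sketched as a lookup, again outside of FXP. The names `HTML_ENTITIES` and `decodeHtmlEntities` are hypothetical, and the handling of decimal references here is an assumption based on the "Entity Number" column rather than confirmed parser behaviour.

```js
// Illustrative sketch only; not FXP's actual implementation.
// Named HTML entities taken from the table above.
const HTML_ENTITIES = {
    nbsp: "\u00A0",
    cent: "¢",
    pound: "£",
    yen: "¥",
    euro: "€",
    copy: "©",
    reg: "®",
    inr: "₹",
};

function decodeHtmlEntities(text) {
    return text
        // Named entities from the map; unknown names are left untouched.
        .replace(/&([a-z]+);/g, (match, name) =>
            Object.prototype.hasOwnProperty.call(HTML_ENTITIES, name) ? HTML_ENTITIES[name] : match)
        // Decimal references such as &#8364; (assumed behaviour).
        .replace(/&#(\d+);/g, (match, code) => String.fromCodePoint(Number(code)));
}

console.log(decodeHtmlEntities("&pound;10 &copy; 2021 &#8377;500")); // £10 © 2021 ₹500
```
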
---

In a future version of FXP, we plan to support more DOCTYPE features, such as `ELEMENT` and reading the content of an entity from a file.
