Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: initial KMP XML reader implementation #601

Merged
merged 2 commits into from
Mar 14, 2022

Conversation

ianbotsf
Copy link
Contributor

@ianbotsf ianbotsf commented Mar 11, 2022

Issue #

Addresses aws-sdk-kotlin#538. Closes #519.

Description of changes

This is the initial implementation of a KMP XML reader conforming to the XmlStreamReader interface. There are a few TODOs remaining and no performance tests have been completed. XML writing is also not yet updated.

The new XML reader consists of these classes:

  • StringTextStream: a character stream used for peeking and reading data from an underlying stream (usually decoded from bytes)
  • XmlLexer: produces XML tokens from a StringTextStream
  • LexingXmlStreamReader: the top-level implementation of XmlStreamReader which layers support for peeking, subtrees, etc., on top of an XmlLexer

The new whitespace handling allows a few disabled protocol tests to succeed, see companion PR aws-sdk-kotlin#548.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ianbotsf ianbotsf requested a review from a team as a code owner March 11, 2022 22:18
@ianbotsf ianbotsf requested a review from aajtodd March 11, 2022 22:19
Copy link
Contributor

@aajtodd aajtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very clean!

* A stream of text characters that can be processed sequentially. This stream maintains a current position (i.e.,
* offset in the string) from which all reading operations begin. The stream is advanced by `read` operations. The
* stream is **not** advanced by `peek` operations.
* @param bytes The source bytes for this stream (which will be decoded to a string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated comment

/**
* A stateful scanner that scans a [StringTextStream] and reads [XmlToken] elements.
*/
class XmlScanner(internal val source: StringTextStream) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

This seems like it is should be the one named XmlLexer since it doing the actual lexical analysis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, originally this was all inside what's now called XmlLexer. Will rename this class to XmlLexer and rename the other class to LexingXmlStreamReader (since it's primarily an adapter for the XmlStreamReader interface and uses a lexer as its primary delegate).

* @param length The number of characters to read.
* @param errMessage A provider of an error message to include in the exception.
*/
fun readOrThrow(errMessage: () -> String): Char =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is all internal we could probably inline all of this to avoid function call per char. I'd wait for the benchmarks of course but something to look into

val start = source.readOrThrow { "Unexpected end-of-doc while trying to read name" }
if (!start.isValidForNameStart()) error("Invalid start character for name '$start'")
val subsequent = source.readWhile { it.isValidForName() }
return "$start$subsequent".qualify()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't this just do

val start = source.peekOrThrow { "Unexepcted end-of-doc while trying to read name" }
if (!start.isValidForNameStart()) error(...)

val name = source.readWhile { it.isValidForName() }
return name.qualify()

@@ -14,7 +14,7 @@ import org.xmlpull.v1.XmlPullParserFactory
import java.io.ByteArrayInputStream
import java.nio.charset.Charset

actual fun xmlStreamReader(payload: ByteArray): XmlStreamReader =
actual fun xmlPull(payload: ByteArray): XmlStreamReader =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this not be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, I'm leaving temporarily it so I can include it in my benchmarking. It'll be removed in a future PR that cleans up all of XmlPull.

/**
* Describes an internal state of a [XmlScanner].
*/
sealed class ScannerState {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal?

@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 8 Code Smells

No Coverage information No Coverage information
3.5% 3.5% Duplication

@ianbotsf ianbotsf merged commit 476a9de into feat-kmp-xml Mar 14, 2022
@ianbotsf ianbotsf deleted the initial-kmp-xml-reader branch March 14, 2022 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants