-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ChunkResolver to partially resolve expressions #525
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very specific to HubSpot. Should it live in the public project?
I'm also wondering why we need another parser when jinjava already has a parser that we could configure.
); | ||
|
||
private static final Set<String> RESERVED_KEYWORDS = ImmutableSet.of( | ||
"and", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you reference constants from the tags themselves here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got a list of reserved keywords from here https://svn.python.org/projects/external/Jinja-1.1/docs/build/designerdoc.html#reserved-keywords though I think it makes sense to not hardcode as many of these as possible. Just don't know where to get a more complete list, but I'll use the constants from tags to remove some of these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this some more, this doesn't need to include any of the tag names as we only need to look for words that appear within expressions like in
and or
, etc.
throws JsonProcessingException { | ||
return OBJECT_MAPPER | ||
.writeValueAsString(val) | ||
.replaceAll("(?<!\\\\)(?:\\\\\\\\)*(\\\\n)", "\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do these do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These swap out a couple of characters from the JSON string. Double quotes are replaced with single quotes, and an escaped n
is replaced with the newline character, and an escaped "
is replaced with a double-quote character.
The n
and "
replacements are because if they don't get replaced, when the JSON string is output, it would have literal \n
s and \"
s in it where we actually would want newlines and "
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, adding a comment here would be helpful for non-obvious code like this.
@boulter It would be used alongside DeferredValue functionality so it can be useful outside of HubSpot. I'm not sure right now how we could handle and flag the deferred values while evaluating AstNodes, but it could probably work. Going with that approach would be a modification of the low-level parsing rather than decorating the parsing and adding functionality on top of it. Modifying the Ast parsing would make the ChunkResolver faster as it would only need to process each piece of an expression a single time, but I'm not too concerned about speed as eager execution wouldn't ever be happening live. With this project, I am keeping the eager execution functionality as decoupled from the rest of Jinjava as possible. I will think about modifying the Edit: The ChunkResolver is similar to the HelperStringTokenizer, but it allows for expressions with deferred values to be partially evaluated. I think that modifying the Parser would be more complicated as it would require all types of AstNode to have a way to handle DeferredValues (currently if a deferred value is hit when evaluating them, an exception is thrown and the evaluation is cancelled. This is what the ChunkResolver decorates, but it resolves/creates ASTs from substrings of the expression rather than creating an AST of the whole expression. |
@@ -526,4 +527,12 @@ public void addDependencies(SetMultimap<String, String> dependencies) { | |||
public SetMultimap<String, String> getDependencies() { | |||
return this.dependencies; | |||
} | |||
|
|||
public boolean isHideInterpreterErrors() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even though this is a boolean, it would probably be more readable with a get
instead of is
Part of #532
This PR introduces what I have called the
ChunkResolver
as a part of my Eager Execution project (draft PR for context).Purpose
The purpose of the ChunkResolver is to evaluate as much of an expression as possible, generally when there is something deferred within that expression. This is to minimize the number of values, macros, etc. that actually need to be deferred for a second rendering pass. The ChunkResolver will only be used when pre-rendering and will not be enabled by default. It will only be used when a future
eagerExecutionEnabled
flag is set to true.When a value is deferred, or something cannot be resolved because it directly depends on a deferred value (e.g. as
range(0, deferred)
), all the non-reserved words get added to the ChunkResolver'sdeferredWords
. In that former example,getDeferredWords()
would then return a list:["range", "deferred"]
. Clearly, for a second pass this expression would be resolvable once thedeferred
value gets resolved because therange
function would likely exist in the second-pass-jinjava instance, but if it were a function such ashubdb_table_rows
, that may prevent running a second pass if the second jinjava instance can't evaluate that function.Say, however, we're resolving an expression like:
and
contact
is deferred (table_id
,row_id
are known).Without the ChunkResolver, the expression within the if tag would be simply reconstructed in the output, and a dependency on
hubdb_table_row
would remain (but also there wouldn't be any indication that it does remain). Output:Using the ChunkResolver, the expression could be resolved in chunks. The
table_id
androw_id
get resolved to their values and thenhubdb_table_row(1234, 56789).company
can get resolved to'HubSpot'
, for example.contact.company
is then deferred, and the output is:And
chunkResolver.getDeferredWords()
is:["contact.company"]
.In the case that a value gets resolved as something like a Map, then it gets output in JSON, which can be read in by a Jinjava interpreter on the second pass. (Dates need their own serialization override so that they are formatted correctly and also don't use the thread-unsafe
SimpleDateFormat
that's built into Jackson.)For example, if
foobar
is a map wherefoobar.foo='FooSpot'
,foobar.bar='HubBar'
then the ChunkResolver would resolvedeferred == foobar
asdeferred == {'foo':'FooSpot','bar':'HubBar'}
Logic
The ChunkResolver gets its name by evaluating expressions in "chunks". It also uses "mini-chunks" and "tokens".
,
Tokens are words that are continuous strings made of letters, digits, underscores, and periods. (
foo
,foo_bar
,baz99.zap
, etc).Mini-chunks are multiple tokens concatenated by their splitters (
|
,:
, etc.). Mini-chunks are broken up by commas. Some examples of mini-chunks are (0
,19 || deferred
,1 + 2 + 3
,'foo' + 'bar'
)Chunks are grouped by
()
,[]
, or{}
so inrange(0, deferred + 1)
,0, deferred + 1
is a chunk.Chunks are resolved recursively because there can be nested chunks with many layers of brackets/parentheses. Tokens are either retraced to variables or resolved as expressions, and chunks/mini-chunks are resolved as expressions. If they cannot be resolved, then they get deferred.
Note that because the ChunkResolver resolves expressions in small pieces, interpreter errors are often encountered as, for example
=
by itself is not a valid expression. A flag is added to the context when the chunk resolver is resolving the expression to prevent errors from being unnecessarily added to the interpreter.Feel free to make any suggestions for this class or new test cases to add.