Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Placeholder and replace_grammar_node in favor of pointer-like container object #1007

Merged
merged 8 commits into from
Sep 6, 2024

Conversation

hudson-ai
Copy link
Collaborator

Essentially reopening #638 with some major simplifications allowed by the rust re-write (the serialized grammar now directly supports "references").

Reopening because:
#995 still takes far too long to process the large JSON schema. Line-profiling revealed that ~80% of the time spent constructing the GrammarFunction is due to replace_grammar_node. In particular, this function has to traverse the entire grammar tree, and we potentially have to call it many times if there is a lot of mutual recursion.

Simply adding a node type that acts as a container (which we can fill after we decide what its contents should be) side-steps this problem entirely.

@hudson-ai
Copy link
Collaborator Author

ping @mmoskal in case you have a better idea than serializing this as a singleton Join.

Side note: I suppose we don't even need a special type for this, as we could just use an empty Join and append a value. Anyone have any preferences?

@@ -243,6 +243,28 @@ def match_byte(self, byte):
def max_tokens(self):
return 1000000000000

class Box(Terminal):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably wants a different name. Could re-use the old Placeholder name?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd only use Placeholder for something which is actually going to be replaced (and not via that fun Python thing of resetting self.__class__ which Guidance used to do). I'd say Box is a bit nondescript, though. DeferredGrammar perhaps? Or DeferredReference


@value.setter
def value(self, value):
if self._resolved:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure we know we're shooting ourselves in the foot if we try to set the same Box's value more than once. Not strictly necessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Not strictly necessary" only applies to now, when you're writing this. In six months' time when you're hunting some obscure bug, this could make it much less obscure :-)

if self._resolved:
return self._value
else:
raise ValueError("Box is not yet resolved")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: better error messages here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, but regular guidance users shouldn't be going here

@@ -1133,6 +1125,14 @@ def process(self, node: GrammarFunction):
"literal": "",
}
}
elif isinstance(node, Box):
if node.value is None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually unnecessary as accessing .value will raise an exception if it's not set...

@mmoskal
Copy link
Collaborator

mmoskal commented Sep 3, 2024

ping @mmoskal in case you have a better idea than serializing this as a singleton Join.

Singleton Join is good. They are removed in Rust grammar optimization stage.

@codecov-commenter
Copy link

codecov-commenter commented Sep 3, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 89.28571% with 3 lines in your changes missing coverage. Please review.

Project coverage is 61.25%. Comparing base (5eaf240) to head (440d8b2).

Files with missing lines Patch % Lines
guidance/_grammar.py 85.71% 3 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1007      +/-   ##
==========================================
- Coverage   70.21%   61.25%   -8.97%     
==========================================
  Files          62       62              
  Lines        4422     4426       +4     
==========================================
- Hits         3105     2711     -394     
- Misses       1317     1715     +398     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@riedgar-ms riedgar-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not certain on the name, but this isn't in the public API yet.

if self._resolved:
return self._value
else:
raise ValueError("Box is not yet resolved")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, but regular guidance users shouldn't be going here

@@ -243,6 +243,28 @@ def match_byte(self, byte):
def max_tokens(self):
return 1000000000000

class Box(Terminal):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd only use Placeholder for something which is actually going to be replaced (and not via that fun Python thing of resetting self.__class__ which Guidance used to do). I'd say Box is a bit nondescript, though. DeferredGrammar perhaps? Or DeferredReference

def __init__(self):
super().__init__(temperature=-1, capture_name=None)
self._resolved = False
self._value = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing _value is a GrammarFunction ?


@value.setter
def value(self, value):
if self._resolved:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Not strictly necessary" only applies to now, when you're writing this. In six months' time when you're hunting some obscure bug, this could make it much less obscure :-)

@hudson-ai hudson-ai merged commit 2b252fc into guidance-ai:main Sep 6, 2024
100 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants