add support for using unimplemented nodes for array assignment #89

tomasohara · 2023-06-15T09:09:17Z

It would be good for array assignments to flagged as unimplemented when the new proceedonerror flag is enabled. This way, a complete AST can still be generated.

Currently, array assignment leads to a parsing error:

$ snippet='num=2 arr=(1 2 3)'

$ python -c "import bashlex; print(''.join(p.dump() for p in bashlex.parse('$snippet', proceedonerror=0)))"
Traceback (most recent call last):
...
  File "/usr/local/misc/programs/python/bashlex/bashlex/parser.py", line 587, in p_error
    raise errors.ParsingError('unexpected token %r' % p.value,
bashlex.errors.ParsingError: unexpected token '(' (position 10)

It would be better to add an unimplemented node to the AST:

$ python -c "import bashlex; print(''.join(p.dump() for p in bashlex.parse('$snippet', proceedonerror=1)))"
CommandNode(pos=(0, 17), parts=[
  AssignmentNode(pos=(0, 5), word='num=2'),
  UnimplementedNode(pos=(6, 17), word='arr=(1 2 3)'),
])

This can be implemented as follows (see attachment for complete diff):

--- a/bashlex/flags.py
+++ b/bashlex/flags.py
@@ -52,4 +52,5 @@ word = enum.Enum('wordflags', [
+    'UNIMPLEMENTED', # word uses unimplemented feature (e.g., array)

--- a/bashlex/parser.py
+++ b/bashlex/parser.py
@@ -173,6 +173,8 @@ def p_simple_command_element(p):
+        if (p.slice[1].flags & flags.word.UNIMPLEMENTED):
+            p[0][0].kind = 'unimplemented'
@@ -720,6 +722,7 @@ class _parser(object):
+                                       proceedonerror=proceedonerror,

--- a/bashlex/tokenizer.py
+++ b/bashlex/tokenizer.py
@@ -199,7 +199,8 @@ eoftoken = token(tokentype.EOF, None)
-                 lastreadtoken=None, tokenbeforethat=None, twotokensago=None):
+                 lastreadtoken=None, tokenbeforethat=None, twotokensago=None,
+                 proceedonerror=None):
@@ -232,6 +233,7 @@ class tokenizer(object):
+        self._proceedonerror = proceedonerror
@@ -391,7 +393,7 @@ class tokenizer(object):
-        d['dollar_present'] = d['quoted'] = d['pass_next_character'] = d['compound_assignment'] = False
+        d['dollar_present'] = d['quoted'] = d['pass_next_character'] = d['compound_assignment'] = d['unimplemented'] = False
@@ -467,6 +469,19 @@ class tokenizer(object):
+        def handlecompoundassignment():
+            # note: only finds matching parenthesis, so parsing can proceed
+            handled = False
+            if self._proceedonerror:
+                ttok = self._parse_matched_pair(None, '(', ')')
+                if ttok:
+                    tokenword.append(c)
+                    tokenword.extend(ttok)            
+                    d['compound_assignment'] = True
+                    d['unimplemented'] = True
+                    handled = True
+            return handled
+
@@ -512,6 +527,8 @@ class tokenizer(object):
+                elif c == '(' and handlecompoundassignment():
+                    gotonext = True
@@ -573,7 +590,7 @@ class tokenizer(object):
-        if d['compound_assignment'] and tokenword[-1] == ')':
+        if d['compound_assignment'] and tokenword.value[-1] == ')':
@@ -581,6 +598,10 @@ class tokenizer(object):
+        if d['compound_assignment']:
+            tokenword.flags.add(wordflags.ASSIGNARRAY)
+        if d['unimplemented']:
+            tokenword.flags.add(wordflags.UNIMPLEMENTED)

unimplemented-array-node-diff.txt

I can work this into a pull request if desired. I wasn't quite sure of the best way to handle the flags, so suggestions would be welcome. For example, I was going to use parser flags, but they seemed more related to internal state than final attribute.

The text was updated successfully, but these errors were encountered:

idank · 2023-06-17T07:17:42Z

Would it not be a better use of your time to implement parsing of array assignment?

tomasohara · 2023-06-18T08:09:41Z

That's something I wouldn't feel comfortable doing. Adding unimplemented nodes is straightforward as the internals of the span are ignored. Moreover, the bar for success is lower so to speak, because the alternative is an exception.

My current testing over the bash test suite shows that the following results:

   Configuration     # Exceptions   Percent
   default           110            41%
   unimplemented     99             37%
   "" w/ arrays      78             29%

This uses the 268 .test and .sub scripts in the bash tests directory
for the bashlex version: 4.3.22(2).

I'm hoping to get the exceptions down to less than 10% because I'm using bashlex in a bash-to-python translator, although currently just for sanity checks.

idank · 2023-06-18T08:19:20Z

Fair enough.

My concern is deviation from the bash C implementation. Adding custom code to bashlex would make adding the missing functionality from bash harder.

But I'd like to unblock you, so please send a pull request.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add support for using unimplemented nodes for array assignment #89

add support for using unimplemented nodes for array assignment #89

tomasohara commented Jun 15, 2023

idank commented Jun 17, 2023

tomasohara commented Jun 18, 2023

idank commented Jun 18, 2023

add support for using unimplemented nodes for array assignment #89

add support for using unimplemented nodes for array assignment #89

Comments

tomasohara commented Jun 15, 2023

idank commented Jun 17, 2023

tomasohara commented Jun 18, 2023

idank commented Jun 18, 2023