RFC: Pattern-Based Code Fixing System Using Parser Combinator #111

Open
notJoon opened this issue Jan 17, 2025 · 1 comment
Labels: RFC (RFC document), T-fixer (Type: Auto fix)

Comments

notJoon commented Jan 17, 2025

Description

The current system modifies code based on line numbers and offsets. This approach is simple, but it has clear limitations:

  1. Simple text substitution that ignores syntactic context
  2. Difficulty handling complex cases
  3. Limited support for multi-line pattern matching
  4. Difficulty defining reusable rules

We aim to address these limitations by introducing a Comby-style pattern matching system built with parser combinators.

Implementation Strategy

Phase 1: Basic Pattern Engine

  1. Implement pattern parsing engine
  2. Support basic metavariables
  3. Simple substitution functionality

Phase 2: Advanced Features

  1. Gno-specific context support
  2. Conditional matching
  3. Nested pattern support

Existing functionality should be migrated incrementally as each of these tasks proceeds.

Design

This design is a conceptual sketch, so it can change at any time.

1. Pattern Matching Engine

type Pattern struct {
    Match    string          // Matching pattern including metavariables in :[[name]] format
    Rewrite  string          // Transformation pattern
    Where    PatternContext  // Additional constraints
}

type PatternContext struct {
    Language    string                 // Language-specific rule application
    Predicates  []func(Match) bool     // Match validation functions
}

type Match struct {
    Variables map[string]string  // Metavariable capture results
    Range     Range             // Range of matched code
}
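
To make the role of `Match` concrete, here is a minimal sketch of how captures could be produced. It is a regexp-based stand-in, not the parser-combinator engine this RFC proposes: each long-form hole becomes a lazy named group, so nested blocks are not handled. `Range` as a pair of byte offsets and the helper names `quoteLiteral`, `compilePattern`, and `findMatch` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Range is assumed to be a pair of byte offsets; the RFC leaves it undefined.
type Range struct{ Start, End int }

// Match mirrors the struct sketched above.
type Match struct {
	Variables map[string]string
	Range     Range
}

var (
	holeRe  = regexp.MustCompile(`:\[\[([A-Za-z_][A-Za-z0-9_]*)\]\]`)
	spaceRe = regexp.MustCompile(`\s+`)
)

// quoteLiteral escapes literal pattern text and lets any whitespace run in
// the pattern match any whitespace run in the source.
func quoteLiteral(s string) string {
	parts := spaceRe.Split(s, -1)
	for i, p := range parts {
		parts[i] = regexp.QuoteMeta(p)
	}
	return strings.Join(parts, `\s+`)
}

// compilePattern (hypothetical) turns each :[[name]] hole into a lazy named group.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString(`(?s)`)
	last := 0
	for _, loc := range holeRe.FindAllStringSubmatchIndex(pattern, -1) {
		b.WriteString(quoteLiteral(pattern[last:loc[0]]))
		fmt.Fprintf(&b, `(?P<%s>.+?)`, pattern[loc[2]:loc[3]])
		last = loc[1]
	}
	b.WriteString(quoteLiteral(pattern[last:]))
	return regexp.Compile(b.String())
}

// findMatch returns the first match of pattern in src, if any.
func findMatch(pattern, src string) (Match, bool, error) {
	re, err := compilePattern(pattern)
	if err != nil {
		return Match{}, false, err
	}
	loc := re.FindStringSubmatchIndex(src)
	if loc == nil {
		return Match{}, false, nil
	}
	vars := map[string]string{}
	for i, name := range re.SubexpNames() {
		if name != "" && loc[2*i] >= 0 {
			vars[name] = src[loc[2*i]:loc[2*i+1]]
		}
	}
	return Match{Variables: vars, Range: Range{Start: loc[0], End: loc[1]}}, true, nil
}

func main() {
	src := "if x > 0 { return true } else { return false }"
	m, ok, err := findMatch("if :[[cond]] { return true } else { return false }", src)
	fmt.Println(m.Variables["cond"], m.Range, ok, err)
}
```

A real engine would need balanced-delimiter awareness so that a hole can span `{...}` or `(...)` safely, which is where the parser-combinator approach comes in.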

2. New Fixer Interface

type Fixer struct {
    patterns []Pattern
    config   FixerConfig
}

type FixerConfig struct {
    MinConfidence float64
    DryRun       bool
    Language     string
}

func (f *Fixer) AddPattern(pattern Pattern) error
func (f *Fixer) Fix(filename string, issues []tt.Issue) error
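
Since `AddPattern` returns an `error`, one plausible use is validating a pattern before it is registered. The sketch below assumes a rule the RFC does not state: every metavariable used in `Rewrite` must be bound in `Match`. The `holeNames` helper and the trimmed-down `Pattern` and `Fixer` types are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Trimmed-down versions of the types sketched above.
type Pattern struct {
	Match   string
	Rewrite string
}

type Fixer struct {
	patterns []Pattern
}

// holeName matches :[name], :[[name]] and :[[name:type]] and captures the name.
var holeName = regexp.MustCompile(`:\[\[?([A-Za-z_][A-Za-z0-9_]*)(?::[a-z]+)?\]\]?`)

// holeNames (hypothetical helper) collects the metavariable names used in s.
func holeNames(s string) map[string]bool {
	names := map[string]bool{}
	for _, m := range holeName.FindAllStringSubmatch(s, -1) {
		names[m[1]] = true
	}
	return names
}

// AddPattern rejects empty matches and rewrites that use unbound metavariables.
// This validation policy is an assumption, not part of the RFC.
func (f *Fixer) AddPattern(p Pattern) error {
	if p.Match == "" {
		return errors.New("pattern has an empty match")
	}
	bound := holeNames(p.Match)
	for name := range holeNames(p.Rewrite) {
		if !bound[name] {
			return fmt.Errorf("rewrite uses unbound metavariable %q", name)
		}
	}
	f.patterns = append(f.patterns, p)
	return nil
}

func main() {
	f := &Fixer{}
	err := f.AddPattern(Pattern{
		Match:   "if :[[cond]] { return true } else { return false }",
		Rewrite: "return :[[cond]]",
	})
	fmt.Println(err, len(f.patterns))
}
```

Under this assumption, a rewrite that introduces a new name would need that value supplied some other way, for example by a predicate or a lookup table.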

3. Pattern Definition Format

Users should be able to add custom patterns via the config file.

patterns:
  - name: "remove-redundant-if"
    match: "if :[[cond]] { return true } else { return false }"
    rewrite: "return :[[cond]]"
    where:
      language: "go"
      
  - name: "deprecated-function"
    match: ":[[old]](:[[args]])"
    rewrite: ":[[new]](:[[args]])"
    where:
      predicates:
        - IsDeprecated
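
Because the config refers to predicates by name (such as `IsDeprecated`), something has to map those names to `func(Match) bool` values. A minimal sketch of such a registry follows; the `registry` variable, `resolvePredicates`, `allHold`, and the placeholder body of `IsDeprecated` are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Match is reduced to the field the predicates need.
type Match struct {
	Variables map[string]string
}

type Predicate func(Match) bool

// registry maps predicate names from the config file to Go functions.
var registry = map[string]Predicate{
	"IsDeprecated": func(m Match) bool {
		// Placeholder rule: treat names starting with "Old" as deprecated.
		return strings.HasPrefix(m.Variables["old"], "Old")
	},
}

// resolvePredicates looks up each configured name and fails on unknown ones.
func resolvePredicates(names []string) ([]Predicate, error) {
	preds := make([]Predicate, 0, len(names))
	for _, n := range names {
		p, ok := registry[n]
		if !ok {
			return nil, fmt.Errorf("unknown predicate %q", n)
		}
		preds = append(preds, p)
	}
	return preds, nil
}

// allHold reports whether every predicate accepts the match.
func allHold(preds []Predicate, m Match) bool {
	for _, p := range preds {
		if !p(m) {
			return false
		}
	}
	return true
}

func main() {
	preds, err := resolvePredicates([]string{"IsDeprecated"})
	if err != nil {
		fmt.Println(err)
		return
	}
	m := Match{Variables: map[string]string{"old": "OldPrint", "args": "x"}}
	fmt.Println(allHold(preds, m))
}
```

Failing at load time on an unknown name keeps typos in the config from silently disabling a constraint.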

Example Usage

fixer := New(FixerConfig{
    MinConfidence: 0.8,
    Language: "go",
})

fixer.AddPattern(Pattern{
    Match: "if :[[cond]] { return true } else { return false }",
    Rewrite: "return :[[cond]]",
})

err := fixer.Fix("example.go", issues)
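
The usage above implies a substitution step: once a match has captured its metavariables, the `Rewrite` template is expanded and spliced over the matched range. A minimal sketch of that step, assuming long-form holes only and byte-offset ranges; `expandRewrite` and `applyAt` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

var holeRe = regexp.MustCompile(`:\[\[([A-Za-z_][A-Za-z0-9_]*)\]\]`)

// expandRewrite replaces each :[[name]] in the template with its captured value.
// It returns an error if the template mentions a name that was not captured.
func expandRewrite(rewrite string, vars map[string]string) (string, error) {
	missing := ""
	out := holeRe.ReplaceAllStringFunc(rewrite, func(tok string) string {
		name := holeRe.FindStringSubmatch(tok)[1]
		val, ok := vars[name]
		if !ok {
			if missing == "" {
				missing = name
			}
			return tok
		}
		return val
	})
	if missing != "" {
		return "", fmt.Errorf("no capture for metavariable %q", missing)
	}
	return out, nil
}

// applyAt splices the replacement over src[start:end].
func applyAt(src string, start, end int, replacement string) string {
	return src[:start] + replacement + src[end:]
}

func main() {
	src := "if x > 0 { return true } else { return false }"
	out, err := expandRewrite("return :[[cond]]", map[string]string{"cond": "x > 0"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(applyAt(src, 0, len(src), out))
}
```

With `DryRun` set, a fixer could stop after `expandRewrite` and report the proposed text instead of calling `applyAt`.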

Metavariable Syntax

This section defines the metavariable syntax for Comby-style pattern matching.

1. Basic Metavariables

:[[identifier]]      # Basic metavariable, matches single token/expression
:[identifier]        # Short syntax

2. Special Metavariables

# Whitespace handling
:[[spacing]]         # Match whitespace characters (space, tab, newline)
:[s]                # Short syntax
# Nested structure matching
:[[block]]          # Match nested blocks (e.g., {...}, (...))
:[b]               # Short syntax
# Repetition matching
:[[...]]*          # Match 0 or more times
:[[...]]+          # Match 1 or more times
:[[...]]?          # Match 0 or 1 time

3. Constraints

where:
  # Metavariable value validation
  :[[expr]] != "nil"
  :[[type]] matches "^(int|float)"
  
  # Structural constraints
  :[[block]] contains "return"
  :[[stmt]] ends_with ";"
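
The four operators above could be evaluated against captured values roughly as follows. This is a sketch under assumptions: the `Constraint` struct and its `Holds` method are hypothetical, and `matches` is taken to mean an unanchored Go regular expression search.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Constraint is a hypothetical parsed form of one `where` line,
// e.g. {Var: "type", Op: "matches", Arg: "^(int|float)"}.
type Constraint struct {
	Var string // metavariable name without the :[[ ]] wrapper
	Op  string // one of "!=", "matches", "contains", "ends_with"
	Arg string // right-hand side literal
}

// Holds evaluates the constraint against the captured metavariables.
func (c Constraint) Holds(vars map[string]string) (bool, error) {
	val, ok := vars[c.Var]
	if !ok {
		return false, fmt.Errorf("unbound metavariable %q", c.Var)
	}
	switch c.Op {
	case "!=":
		return val != c.Arg, nil
	case "matches":
		return regexp.MatchString(c.Arg, val)
	case "contains":
		return strings.Contains(val, c.Arg), nil
	case "ends_with":
		return strings.HasSuffix(val, c.Arg), nil
	default:
		return false, fmt.Errorf("unknown operator %q", c.Op)
	}
}

func main() {
	vars := map[string]string{"expr": "x", "type": "int64", "block": "{ return x }"}
	for _, c := range []Constraint{
		{Var: "expr", Op: "!=", Arg: "nil"},
		{Var: "type", Op: "matches", Arg: "^(int|float)"},
		{Var: "block", Op: "contains", Arg: "return"},
	} {
		ok, err := c.Holds(vars)
		fmt.Println(c.Var, c.Op, ok, err)
	}
}
```

Note that the alternation needs parentheses for the anchor to apply to both branches: `^int|float` would also accept any value that merely contains `float`.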

4. Example

patterns:
  # Remove unnecessary if-else
  - match: |
      if :[[cond]] {
        return :[[value]]
      } else {
        :[[rest:block]]
      }
    rewrite: |
      if :[[cond]] {
        return :[[value]]
      }
      :[[rest]]
  # Simplify nil check
  - match: |
      if :[[x]] != nil {
        :[[body:block]]
      }
    rewrite: |
      if :[[x]] {
        :[[body]]
      }
    where:
      :[[x]] matches "^[A-Za-z]"
@notJoon notJoon added RFC RFC document T-fixer Type: Auto fix labels Jan 17, 2025
@notJoon notJoon self-assigned this Jan 17, 2025
notJoon added a commit that referenced this issue Jan 17, 2025
# Description

Initial feature for auto-fixer v2.

Implements a lexer and parser for processing Comby-style metavariable
expressions (e.g., `:[var]` and `:[[function]]`). This package serves as
the first parsing phase before main syntax parsing.

Key features:
- Lexer that tokenizes metavariable patterns and surrounding text
- Parser that generates AST with pattern, hole, text, and block nodes
- Support for both short (`:[name]`) and long (`:[[name]]`) metavariable
hole-expression forms
- Proper handling of nested block structures and whitespace

This feature will replace the existing AST-based auto fix functionality.

## Related Issue

#111

notJoon commented Jan 17, 2025

TODO: Update the metavariable parser to use a state machine and SIMD once all expressions are supported.

Resolved: #113

notJoon added a commit that referenced this issue Jan 18, 2025
# Description

A hole pattern is a special syntax used in pattern matching that acts as
a placeholder or "wildcard" in a pattern. Think of it as a variable that
can match and capture different parts of code while searching through
source code.

```go
// Pattern with holes
if :[[condition]] {
    :[[body]]
}

// Can match code like:
if x > 0 {
    return true
}

// Or:
if isValid(user) {
    doSomething()
}
```

## Key Changes

1. Added support for typed hole patterns with format `:[[name:type]]`
2. Implemented quantifier support for hole patterns (`*`, `+`, `?`)
3. Enhanced lexer to properly parse and tokenize new pattern syntax
4. Added hole configuration system to manage pattern types and
quantifiers
5. Updated parser to handle the new hole types and configurations
6. Added comprehensive test coverage for new features

## New Pattern Syntax

| Syntax | Description | Example | Notes |
|--------|-------------|---------|-------|
| `:[name]` | Basic hole pattern | `:[var]` | Matches any content |
| `:[[name]]` | Long-form hole pattern | `:[[expr]]` | Same as basic, but with double brackets |
| `:[[name:identifier]]` | Identifier-typed hole | `:[[var:identifier]]` | Matches only valid identifiers |
| `:[[name:block]]` | Block-typed hole | `:[[body:block]]` | Matches code blocks |
| `:[[name:whitespace]]` | Whitespace-typed hole | `:[[ws:whitespace]]` | Matches whitespace |
| `:[[name:expression]]` | Expression-typed hole | `:[[expr:expression]]` | Matches expressions |

### Quantifiers

| Quantifier | Description | Example |
|------------|-------------|---------|
| `*` | Zero or more | `:[[stmt:block]]*` |
| `+` | One or more | `:[[expr:expression]]+` |
| `?` | Zero or one | `:[[ws:whitespace]]?` |
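
To illustrate how the syntax in the two tables decomposes, here is a small sketch that parses a single hole token into name, type, and quantifier. It is not the lexer from this commit: it uses regular expressions for brevity, the `Hole` struct and `parseHole` are hypothetical names, and it assumes quantifiers apply only to the long form.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hole is a hypothetical parsed representation of one hole token.
type Hole struct {
	Name       string
	Type       string // empty when the hole is untyped
	Quantifier string // "", "*", "+" or "?"
}

var (
	shortRe = regexp.MustCompile(`^:\[([A-Za-z_][A-Za-z0-9_]*)\]$`)
	longRe  = regexp.MustCompile(`^:\[\[([A-Za-z_][A-Za-z0-9_]*)(?::([a-z]+))?\]\]([*+?])?$`)
)

// knownTypes lists the hole types from the syntax table.
var knownTypes = map[string]bool{
	"identifier": true,
	"block":      true,
	"whitespace": true,
	"expression": true,
}

// parseHole accepts :[name], :[[name]], :[[name:type]] and an optional
// trailing quantifier on the long form.
func parseHole(tok string) (Hole, error) {
	if m := shortRe.FindStringSubmatch(tok); m != nil {
		return Hole{Name: m[1]}, nil
	}
	m := longRe.FindStringSubmatch(tok)
	if m == nil {
		return Hole{}, fmt.Errorf("not a hole pattern: %q", tok)
	}
	if m[2] != "" && !knownTypes[m[2]] {
		return Hole{}, fmt.Errorf("unknown hole type %q", m[2])
	}
	return Hole{Name: m[1], Type: m[2], Quantifier: m[3]}, nil
}

func main() {
	for _, tok := range []string{":[var]", ":[[expr]]", ":[[stmt:block]]*", ":[[x:number]]"} {
		h, err := parseHole(tok)
		fmt.Printf("%q -> %+v, err=%v\n", tok, h, err)
	}
}
```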

## Example Usage

```go
// Before
if :[condition] { :[body] }

// After - with types and quantifiers
if :[[cond:expression]] {
    :[[stmt:block]]*
}
```

## Next Steps
Future improvements could include:

- Implementing actual pattern matching logic for each hole type
- Adding pattern validation based on types
- Enhancing error reporting for invalid patterns
- Adding more specialized hole types for specific use cases

## Related Issue

#111