Add full Rust language grammar definition to docs #19353

Merged
merged 2 commits into from
Jan 20, 2015

Conversation

icorderi
Contributor

Original issue that inspired this patch.

The reference.md has evolved past simple grammatical constructs, and it serves a different purpose.
The intent for the proposed grammar.md is to hold only the official reference for the language grammar. This document would keep track of grammatical changes to the language over time, facilitate discussions over proposed changes to the existing grammar, and serve as a basis for third parties building parsers (IDEs, GitHub linguist, CodeMirror, etc.).

The current state of the PR contains all the grammars that were available in reference.md and nothing else.
There are still a lot of missing pieces that weren't available. The following are just a few of the definitions missing:

  • Functions
  • Structures
  • Traits
  • Implementations
  • Operators
  • Statements
  • Expressions

We need help from people familiar with those grammatical constructs to fill in the missing pieces.

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.


### Functions

**FIXME:** grammar?
Contributor

I think something like this might work:

fn_decl: "pub"? "unsafe"? ("extern" string_lit?)? "fn" ident ('<' generics '>')? '(' arg-list ')' ("->" (type | '!'))? where_clause? '{' stmt* expr? '}';
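
As a rough illustration of what fn_decl is meant to cover, here are a few ordinary function declarations that exercise its optional pieces (visibility, unsafe, extern with an ABI string, generics, a where clause, and a diverging return). The function names are made up for the example, and this is only a sketch against present-day Rust syntax, where the where clause follows the return type.

```rust
use std::fmt::Display;

// "pub" "fn" ident '<' generics '>' '(' args ')' "->" type, then a where clause.
pub fn describe<T>(value: T) -> String
where
    T: Display,
{
    format!("<{}>", value)
}

// "pub" "unsafe" "fn": the caller must guarantee that `ptr` is valid to read.
pub unsafe fn read_byte(ptr: *const u8) -> u8 {
    unsafe { *ptr }
}

// "pub" "extern" string_lit "fn": a function using the C ABI.
pub extern "C" fn add_one(x: i32) -> i32 {
    x + 1
}

// No "->" part at all: the return type is optional in the production.
pub fn bump(counter: &mut u32) {
    *counter += 1;
}

// "->" '!': a diverging function that never returns.
pub fn fail(msg: &str) -> ! {
    panic!("{}", msg)
}
```

Each declaration above ends in a braced body, which is the one part fn_decl does not make optional.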

(arg-list is defined later in the Closure types section.) Note that functions inside extern blocks have a slightly different grammar:

extern_fn_decl: "pub"? "fn" ident '(' extern_arg_list ')' ("->" (type | '!'))? ';';
extern_arg_list: ident ':' type | ident ':' type ',' "..." | ident ':' type ',' extern_arg_list;

And, furthermore, functions inside trait definitions can have their argument names and/or body omitted, cannot be declared pub, can have self parameters, and have the order of extern and unsafe switched around (#19398):

trait_fn_decl: ("extern" string_lit?)? "unsafe"? "fn" ident ('<' generics '>')? '(' trait_arg_list ')' ("->" (type | '!'))? where_clause? ('{' stmt* expr? '}' | ';');
trait_arg_list: (ident ':')? type | ('&' lifetime?)? "mut"? "self" (',' trait_arg_list)? | (ident ':')? type ',' trait_arg_list;

Trait implementations are similar, but cannot have their body or parameter names omitted:

impl_fn_decl: ("extern" string_lit?)? "unsafe"? "fn" ident ('<' generics '>')? '(' impl_arg_list ')' ("->" (type | '!'))? where_clause? '{' stmt* expr? '}';
impl_arg_list: ident ':' type | ('&' lifetime?)? "mut"? "self" (',' impl_arg_list)? | ident ':' type ',' impl_arg_list;
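
To make the trait/impl distinction concrete, here is a small sketch in present-day Rust. `Shape` and `Square` are hypothetical names invented for the example. The trait shows a method with its body omitted, a `&mut self` method with a further argument, and a method with a default body; the impl has to supply bodies and cannot mark its methods `pub`.

```rust
pub trait Shape {
    // Body omitted: the declaration ends in ';'.
    fn area(&self) -> f64;

    // A `&mut self` parameter followed by an ordinary named argument.
    fn scale(&mut self, factor: f64);

    // A trait method may also carry a default body.
    fn describe(&self) -> String {
        format!("shape with area {:.1}", self.area())
    }
}

pub struct Square {
    pub side: f64,
}

impl Shape for Square {
    // Inside a trait impl the body is required and `pub` is not allowed.
    fn area(&self) -> f64 {
        self.side * self.side
    }

    fn scale(&mut self, factor: f64) {
        self.side *= factor;
    }
}
```

For instance, a `Square { side: 2.0 }` scaled by 1.5 reports an area of 9.0 through both `area` and the default `describe`.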

I think I’ve handled most things here, but I’m sure I’ve forgotten something. Functions are pretty complicated!

@ftxqxd
Contributor

ftxqxd commented Nov 29, 2014

There are a few inconsistencies here:

  • Sometimes [ foo ] is used for grouping, sometimes ( foo ) is, and sometimes [ foo ] is used to denote foo being optional;
  • Some names use underscores, and some use hyphens;
  • Sometimes := is used to define rules, and sometimes : is.

@icorderi
Contributor Author

@P1start I think the styling issues might just be due to different people writing pieces of it. They are all valid in some grammar notation or another. The entire document was copied from the reference.md.
For what it's worth, we do need consistency in the rules, and the document starts with a grammar for the grammars, which appears to be a slight modification of EBNF syntax.

grammar : rule + ;
rule    : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;

Where:

  • Whitespace in the grammar is ignored.
  • Square brackets are used to group rules.
  • LITERAL is a single printable ASCII character, or an escaped hexadecimal ASCII code of the form \xQQ, in single quotes, denoting the corresponding Unicode codepoint U+00QQ.
  • IDENTIFIER is a nonempty string of ASCII letters and underscores.
  • The repeat forms apply to the adjacent element, and are as follows:
    • ? means zero or one repetition
    • * means zero or more repetitions
    • + means one or more repetitions
  • NUMBER trailing a repeat symbol gives a maximum repetition count
  • NUMBER on its own gives an exact repetition count

This EBNF dialect should hopefully be familiar to many readers.

Based on that, ( foo ) doesn't seem to be part of the grammar.
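
One way to read the repeat forms listed above is as a pair of lower and upper bounds on how often the adjacent element may occur. The sketch below encodes that reading. `Bounds`, `parse_number` and `repeat_bounds` are hypothetical names, and treating an empty suffix as "exactly once" is an assumption drawn from the `NUMBER ?` alternative in the repeats rule rather than something the text states outright.

```rust
/// Inclusive repetition bounds; `max == None` means "no upper limit".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Bounds {
    pub min: u32,
    pub max: Option<u32>,
}

/// Parses a NUMBER: a non-empty run of ASCII digits (no sign allowed).
fn parse_number(s: &str) -> Option<u32> {
    if s.is_empty() || !s.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Maps a repeat suffix such as "?", "*", "+ 3" or "4" to its bounds.
/// Returns `None` for anything the repeats rule does not describe.
pub fn repeat_bounds(suffix: &str) -> Option<Bounds> {
    let s = suffix.trim();
    match s {
        // Assumption: no suffix at all means exactly one occurrence.
        "" => return Some(Bounds { min: 1, max: Some(1) }),
        // '?' means zero or one repetition.
        "?" => return Some(Bounds { min: 0, max: Some(1) }),
        _ => {}
    }
    let (min, rest) = if let Some(rest) = s.strip_prefix('*') {
        (0, rest.trim()) // '*' means zero or more
    } else if let Some(rest) = s.strip_prefix('+') {
        (1, rest.trim()) // '+' means one or more
    } else {
        // NUMBER on its own gives an exact repetition count.
        let n = parse_number(s)?;
        return Some(Bounds { min: n, max: Some(n) });
    };
    if rest.is_empty() {
        Some(Bounds { min, max: None })
    } else {
        // NUMBER trailing a repeat symbol gives a maximum count.
        Some(Bounds { min, max: Some(parse_number(rest)?) })
    }
}
```

Under this reading, `repeat_bounds("+ 3")` yields a minimum of 1 and a maximum of 3, while `repeat_bounds("4")` pins both bounds to 4.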

Personally I prefer defining the grammar with ::= or := for productions. It's an extra keystroke, but it makes the rule easier to read, at least to me.

In any case it seems we have to:

  0. (Pick a grammar...)
  1. Clean up the existing productions to make them consistent with the grammar we will be using to define them.
  2. Complete the missing sections.

The question now is whether we do this collaboration against my fork, or we merge this and do it against rust-lang. I don't know how you guys generally handle WIP from multiple sources.

@nodakai
Contributor

nodakai commented Dec 2, 2014

This is a great step towards a mature programming language! Let me see how I can contribute to it.

I made some remarks on string/char literals with newline characters at #19399 . Hope this helps.

@steveklabnik
Member

I wonder at what point this is good to just land, and fill them in as we go, rather than let this sit here.

@shepmaster
Member

Should the grammar sections in the Reference be removed, so they don't go (more) stale?

@icorderi
Contributor Author

@steveklabnik agreed. We should not wait for this to be complete to merge.

@shepmaster, I'm not sure how much having the formal grammar in the reference helps the reader. If this document makes it in and becomes the go-to place for the Rust grammar, then the reference should not have grammar definitions. We can simply leave links in each section of reference.md to the corresponding grammar, in case the reader wants to take a look.

@steveklabnik
Member

Well, the idea is that the reference is a borderline-language spec, so having the grammar in it is useful. But until we have a full grammar, it doesn't make sense to have part of it there and the rest in this document. We should pull the grammar out of the reference and then link to this instead.

@Gankra
Contributor

Gankra commented Jan 2, 2015

Triage bump

@emberian
Member

emberian commented Jan 3, 2015

@emberian
Member

emberian commented Jan 5, 2015

My basic feeling is that none of the old grammar is worth keeping and we should instead, if we want, distribute a grammar as a separate appendix.

@steveklabnik
Member

I'm going to merge this as-is, and we can keep improving it, including the part about adding links from the reference. Thank you!

bors added a commit that referenced this pull request Jan 14, 2015
Add full Rust language grammar definition to docs

Reviewed-by: steveklabnik
bors added a commit that referenced this pull request Jan 14, 2015
Add full Rust language grammar definition to docs

Reviewed-by: steveklabnik
@nikomatsakis
Contributor

I think if we're going to distribute an official grammar, we really ought to have an automated testing mechanism. Not to say r- on this patch, just that this should be a priority. I think it'd be very helpful, personally, even if it's not what the actual rustc uses (and even if we don't check for correspondence between rustc and the official grammar, which of course would also be good).

@steveklabnik
Member

@nikomatsakis we had some of this in src/grammar, but yes, such a thing would be nice.

@steveklabnik
Member

Oh also, this isn't currently linked from anywhere yet, so while it's kind of official, people may not actually know about it 😉 Maybe that's a good criterion for when to add the link.

@emberian
Member

I also really think that this particular partial grammar is worthless, but we can replace it later with https://github.com/bleibig/rust-grammar or https://github.com/ptgreen/rfront as the need arises.

bors added a commit that referenced this pull request Jan 15, 2015
Add full Rust language grammar definition to docs

Reviewed-by: steveklabnik
@steveklabnik steveklabnik mentioned this pull request Jan 16, 2015
bors added a commit that referenced this pull request Jan 17, 2015
bors added a commit that referenced this pull request Jan 18, 2015
Add full Rust language grammar definition to docs

Reviewed-by: steveklabnik
@steveklabnik
Member

@bors: r+ ab24ffe

@bors
Contributor

bors commented Jan 20, 2015

⌛ Testing commit ab24ffe with merge a0f86de...

bors added a commit that referenced this pull request Jan 20, 2015
Original [issue](#19278) that inspired this patch.

The [reference.md] has evolved past simple grammatical constructs, and it serves a different purpose. 
The intent for the proposed _grammar.md_ is to hold **only** the official reference for the language grammar. This document would keep track of grammatical changes to the language over time, facilitate discussions over proposed changes to the existing grammar, and serve as a basis for third parties building parsers (IDEs, GitHub linguist, CodeMirror, etc.). 

The current state of the PR contains all the grammars that were available in [reference.md] and nothing else. 
There are still a lot of missing pieces that weren't available. The following are just a few of the definitions missing:
- [Functions](https://github.com/icorderi/rust/blob/docs/grammar/src/doc/grammar.md#functions)
- [Structures](https://github.com/icorderi/rust/blob/docs/grammar/src/doc/grammar.md#structures)
- [Traits](https://github.com/icorderi/rust/blob/docs/grammar/src/doc/grammar.md#traits)
- [Implementations](https://github.com/icorderi/rust/blob/docs/grammar/src/doc/grammar.md#implementations)
- [Operators](https://github.com/icorderi/rust/blob/docs/grammar/src/doc/grammar.md#unary-operator-expressions)
- [Statements](https://github.com/icorderi/rust/blob/docs/grammar/src/doc/grammar.md#statements)
- [Expressions](https://github.com/icorderi/rust/blob/docs/grammar/src/doc/grammar.md#expressions)

[reference.md]: https://github.com/rust-lang/rust/blob/master/src/doc/reference.md

We need help from people familiar with those grammatical constructs to fill in the missing pieces.
@bors bors merged commit ab24ffe into rust-lang:master Jan 20, 2015
@brson
Contributor

brson commented Jan 27, 2015

Epic PR. Thanks for sticking with it.

@SimonSapin
Contributor

Does this new document show up somewhere on http://doc.rust-lang.org? I couldn’t find it.

@steveklabnik
Member

@SimonSapin it does not yet, as it's not finished.
