mirror of
https://github.com/opencloud-eu/opencloud.git
synced 2026-01-24 13:58:12 -05:00
Bumps [github.com/mna/pigeon](https://github.com/mna/pigeon) from 1.1.0 to 1.2.1. - [Release notes](https://github.com/mna/pigeon/releases) - [Changelog](https://github.com/mna/pigeon/blob/master/.goreleaser.yml) - [Commits](https://github.com/mna/pigeon/compare/v1.1.0...v1.2.1) --- updated-dependencies: - dependency-name: github.com/mna/pigeon dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
619 lines
26 KiB
Go
619 lines
26 KiB
Go
/*
|
|
Command pigeon generates parsers in Go from a PEG grammar.
|
|
|
|
From Wikipedia [0]:
|
|
|
|
A parsing expression grammar is a type of analytic formal grammar, i.e.
|
|
it describes a formal language in terms of a set of rules for recognizing
|
|
strings in the language.
|
|
|
|
Its features and syntax are inspired by the PEG.js project [1], while
|
|
the implementation is loosely based on [2]. Formal presentation of the
|
|
PEG theory by Bryan Ford is also an important reference [3]. An introductory
|
|
blog post can be found at [4].
|
|
|
|
[0]: http://en.wikipedia.org/wiki/Parsing_expression_grammar
|
|
[1]: http://pegjs.org/
|
|
[2]: http://www.codeproject.com/Articles/29713/Parsing-Expression-Grammar-Support-for-C-Part
|
|
[3]: http://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
|
|
[4]: http://0value.com/A-PEG-parser-generator-for-Go
|
|
|
|
Command-line usage
|
|
|
|
The pigeon tool must be called with PEG input as defined
|
|
by the accepted PEG syntax below. The grammar may be provided by a
|
|
file or read from stdin. The generated parser is written to stdout
|
|
by default.
|
|
|
|
pigeon [options] [GRAMMAR_FILE]
|
|
|
|
The following options can be specified:
|
|
|
|
-cache : cache parser results to avoid exponential parsing time in
|
|
pathological cases. Can make the parsing slower for typical
|
|
cases and uses more memory (default: false).
|
|
|
|
-debug : boolean, print debugging info to stdout (default: false).
|
|
|
|
-nolint: add '// nolint: ...' comments for generated parser to suppress
|
|
warnings by gometalinter (https://github.com/alecthomas/gometalinter) or
|
|
golangci-lint (https://golangci-lint.run/).
|
|
|
|
-no-recover : boolean, if set, do not recover from a panic. Useful
|
|
to access the panic stack when debugging, otherwise the panic
|
|
is converted to an error (default: false).
|
|
|
|
-o=FILE : string, output file where the generated parser will be
|
|
written (default: stdout).
|
|
|
|
-optimize-basic-latin : boolean, if set, a lookup table for the first 128
|
|
characters of the Unicode table (Basic Latin) is generated for each character
|
|
class matcher. This speeds up the parsing, if parsed data mainly consists
|
|
of characters from this range (default: false).
|
|
|
|
-optimize-grammar : boolean, (EXPERIMENTAL FEATURE) if set, several performance
|
|
optimizations on the grammar are performed, with focus to the reduction of the
|
|
grammar depth.
|
|
Optimization:
|
|
* removal of unreferenced rules
|
|
* replace rule references with a copy of the referenced Rule, if the
|
|
referenced rule it self has no references.
|
|
* resolve nested choice expressions
|
|
* resolve choice expressions with only one alternative
|
|
* resolve nested sequences expression
|
|
* resolve sequence expressions with only one element
|
|
* combine character class matcher and literal matcher, where possible
|
|
The resulting grammar is usually more memory consuming, but faster for parsing.
|
|
The optimization of the grammar is done in multiple rounds (optimize until no
|
|
more optimizations have applied). This process takes some time, depending on the
|
|
optimization potential of the grammar.
|
|
|
|
-optimize-parser : boolean, if set, the options Debug, Memoize and Statistics are
|
|
removed from the resulting parser. The global "state" is optimized as well by
|
|
either removing all related code if no state change expression is present in the
|
|
grammar or by removing the restoration of the global "state" store after action
|
|
and predicate code blocks. This saves a few cpu cycles, when using the generated
|
|
parser (default: false).
|
|
|
|
-x : boolean, if set, do not build the parser, just parse the input grammar
|
|
(default: false).
|
|
|
|
-receiver-name=NAME : string, name of the receiver variable for the generated
|
|
code blocks. Non-initializer code blocks in the grammar end up as methods on the
|
|
*current type, and this option sets the name of the receiver (default: c).
|
|
|
|
-alternate-entrypoints=RULE[,RULE...] : string, comma-separated list of rule names
|
|
that may be used as alternate entrypoints for the parser, in addition to the
|
|
default entrypoint (the first rule in the grammar) (default: none).
|
|
Such entrypoints can be specified in the call to Parse by passing an
|
|
Entrypoint option that specifies the alternate rule name to use. This is only
|
|
necessary if the -optimize-parser flag is set, as some rules may be optimized
|
|
out of the resulting parser.
|
|
|
|
-support-left-recursion : boolean, (EXPERIMENTAL FEATURE) if set, add support
|
|
for left recursion rules, including those with indirect recursion
|
|
(default: false).
|
|
E.g.:
|
|
expr = expr '*' term / expr '+' term
|
|
|
|
If the code blocks in the grammar (see below, section "Code block") are golint-
|
|
and go vet-compliant, then the resulting generated code will also be golint-
|
|
and go vet-compliant.
|
|
|
|
The generated code doesn't use any third-party dependency unless code blocks
|
|
in the grammar require such a dependency.
|
|
|
|
PEG syntax
|
|
|
|
The accepted syntax for the grammar is formally defined in the
|
|
grammar/pigeon.peg file, using the PEG syntax. What follows is an informal
|
|
description of this syntax.
|
|
|
|
Identifiers, whitespace, comments and literals follow the same
|
|
notation as the Go language, as defined in the language specification
|
|
(http://golang.org/ref/spec#Source_code_representation):
|
|
|
|
// single line comment*/
|
|
// /* multi-line comment */
|
|
/* 'x' (single quotes for single char literal)
|
|
"double quotes for string literal"
|
|
`backtick quotes for raw string literal`
|
|
RuleName (a valid identifier)
|
|
|
|
The grammar must be Unicode text encoded in UTF-8. New lines are identified
|
|
by the \n character (U+000A). Space (U+0020), horizontal tabs (U+0009) and
|
|
carriage returns (U+000D) are considered whitespace and are ignored except
|
|
to separate tokens.
|
|
|
|
Rules
|
|
|
|
A PEG grammar consists of a set of rules. A rule is an identifier followed
|
|
by a rule definition operator and an expression. An optional display name -
|
|
a string literal used in error messages instead of the rule identifier - can
|
|
be specified after the rule identifier. E.g.:
|
|
RuleA "friendly name" = 'a'+ // RuleA is one or more lowercase 'a's
|
|
|
|
The rule definition operator can be any one of those:
|
|
=, <-, ← (U+2190), ⟵ (U+27F5)
|
|
|
|
Expressions
|
|
|
|
A rule is defined by an expression. The following sections describe the
|
|
various expression types. Expressions can be grouped by using parentheses,
|
|
and a rule can be referenced by its identifier in place of an expression.
|
|
|
|
Choice expression
|
|
|
|
The choice expression is a list of expressions that will be tested in the
|
|
order they are defined. The first one that matches will be used. Expressions
|
|
are separated by the forward slash character "/". E.g.:
|
|
ChoiceExpr = A / B / C // A, B and C should be rules declared in the grammar
|
|
|
|
Because the first match is used, it is important to think about the order
|
|
of expressions. For example, in this rule, "<=" would never be used because
|
|
the "<" expression comes first:
|
|
BadChoiceExpr = "<" / "<="
|
|
|
|
Sequence expression
|
|
|
|
The sequence expression is a list of expressions that must all match in
|
|
that same order for the sequence expression to be considered a match.
|
|
Expressions are separated by whitespace. E.g.:
|
|
SeqExpr = "A" "b" "c" // matches "Abc", but not "Acb"
|
|
|
|
Labeled expression
|
|
|
|
A labeled expression consists of an identifier followed by a colon ":"
|
|
and an expression. A labeled expression introduces a variable named with
|
|
the label that can be referenced in the code blocks in the same scope.
|
|
The variable will have the value of the expression that follows the colon.
|
|
E.g.:
|
|
LabeledExpr = value:[a-z]+ {
|
|
fmt.Println(value)
|
|
return value, nil
|
|
}
|
|
|
|
The variable is typed as an empty interface, and the underlying type depends
|
|
on the following:
|
|
|
|
For terminals (character and string literals, character classes and
|
|
the any matcher), the value is []byte. E.g.:
|
|
Rule = label:'a' { // label is []byte }
|
|
|
|
For predicates (& and !), the value is always nil. E.g.:
|
|
Rule = label:&'a' { // label is nil }
|
|
|
|
For a sequence, the value is a slice of empty interfaces, one for each
|
|
expression value in the sequence. The underlying types of each value
|
|
in the slice follow the same rules described here, recursively. E.g.:
|
|
Rule = label:('a' 'b') { // label is []any }
|
|
|
|
For a repetition (+ and *), the value is a slice of empty interfaces, one for
|
|
each repetition. The underlying types of each value in the slice follow
|
|
the same rules described here, recursively. E.g.:
|
|
Rule = label:[a-z]+ { // label is []any }
|
|
|
|
For a choice expression, the value is that of the matching choice. E.g.:
|
|
Rule = label:('a' / 'b') { // label is []byte }
|
|
|
|
For the optional expression (?), the value is nil or the value of the
|
|
expression. E.g.:
|
|
Rule = label:'a'? { // label is nil or []byte }
|
|
|
|
Of course, the type of the value can be anything once an action code block
|
|
is used. E.g.:
|
|
RuleA = label:'3' {
|
|
return 3, nil
|
|
}
|
|
RuleB = label:RuleA { // label is int }
|
|
|
|
And and not expressions
|
|
|
|
An expression prefixed with the ampersand "&" is the "and" predicate
|
|
expression: it is considered a match if the following expression is a match,
|
|
but it does not consume any input.
|
|
|
|
An expression prefixed with the exclamation point "!" is the "not" predicate
|
|
expression: it is considered a match if the following expression is not
|
|
a match, but it does not consume any input. E.g.:
|
|
AndExpr = "A" &"B" // matches "A" if followed by a "B" (does not consume "B")
|
|
NotExpr = "A" !"B" // matches "A" if not followed by a "B" (does not consume "B")
|
|
|
|
The expression following the & and ! operators can be a code block. In that
|
|
case, the code block must return a bool and an error. The operator's semantic
|
|
is the same, & is a match if the code block returns true, ! is a match if the
|
|
code block returns false. The code block has access to any labeled value
|
|
defined in its scope. E.g.:
|
|
CodeAndExpr = value:[a-z] &{
|
|
// can access the value local variable...
|
|
return true, nil
|
|
}
|
|
|
|
Repeating expressions
|
|
|
|
An expression followed by "*", "?" or "+" is a match if the expression
|
|
occurs zero or more times ("*"), zero or one time "?" or one or more times
|
|
("+") respectively. The match is greedy, it will match as many times as
|
|
possible. E.g.
|
|
ZeroOrMoreAs = "A"*
|
|
|
|
Literal matcher
|
|
|
|
A literal matcher tries to match the input against a single character or a
|
|
string literal. The literal may be a single-quoted single character, a
|
|
double-quoted string or a backtick-quoted raw string. The same rules as in Go
|
|
apply regarding the allowed characters and escapes.
|
|
|
|
The literal may be followed by a lowercase "i" (outside the ending quote)
|
|
to indicate that the match is case-insensitive. E.g.:
|
|
LiteralMatch = "Awesome\n"i // matches "awesome" followed by a newline
|
|
|
|
Character class matcher
|
|
|
|
A character class matcher tries to match the input against a class of characters
|
|
inside square brackets "[...]". Inside the brackets, characters represent
|
|
themselves and the same escapes as in string literals are available, except
|
|
that the single- and double-quote escape is not valid, instead the closing
|
|
square bracket "]" must be escaped to be used.
|
|
|
|
Character ranges can be specified using the "[a-z]" notation. Unicode
|
|
classes can be specified using the "[\pL]" notation, where L is a
|
|
single-letter Unicode class of characters, or using the "[\p{Class}]"
|
|
notation where Class is a valid Unicode class (e.g. "Latin").
|
|
|
|
As for string literals, a lowercase "i" may follow the matcher (outside
|
|
the ending square bracket) to indicate that the match is case-insensitive.
|
|
A "^" as first character inside the square brackets indicates that the match
|
|
is inverted (it is a match if the input does not match the character class
|
|
matcher). E.g.:
|
|
NotAZ = [^a-z]i
|
|
|
|
Any matcher
|
|
|
|
The any matcher is represented by the dot ".". It matches any character
|
|
except the end of file, thus the "!." expression is used to indicate "match
|
|
the end of file". E.g.:
|
|
AnyChar = . // match a single character
|
|
EOF = !.
|
|
|
|
Code block
|
|
|
|
Code blocks can be added to generate custom Go code. There are three kinds
|
|
of code blocks: the initializer, the action and the predicate. All code blocks
|
|
appear inside curly braces "{...}".
|
|
|
|
The initializer must appear first in the grammar, before any rule. It is
|
|
copied as-is (minus the wrapping curly braces) at the top of the generated
|
|
parser. It may contain function declarations, types, variables, etc. just
|
|
like any Go file. Every symbol declared here will be available to all other
|
|
code blocks. Although the initializer is optional in a valid grammar, it is
|
|
usually required to generate a valid Go source code file (for the package
|
|
clause). E.g.:
|
|
{
|
|
package main
|
|
|
|
func someHelper() {
|
|
// ...
|
|
}
|
|
}
|
|
|
|
Action code blocks are code blocks declared after an expression in a rule.
|
|
Those code blocks are turned into a method on the "*current" type in the
|
|
generated source code. The method receives any labeled expression's value
|
|
as argument (as any) and must return two values, the first being
|
|
the value of the expression (an any), and the second an error.
|
|
If a non-nil error is returned, it is added to the list of errors that the
|
|
parser will return. E.g.:
|
|
RuleA = "A"+ {
|
|
// return the matched string, "c" is the default name for
|
|
// the *current receiver variable.
|
|
return string(c.text), nil
|
|
}
|
|
|
|
Predicate code blocks are code blocks declared immediately after the and "&"
|
|
or the not "!" operators. Like action code blocks, predicate code blocks
|
|
are turned into a method on the "*current" type in the generated source code.
|
|
The method receives any labeled expression's value as argument (as any)
|
|
and must return two opt, the first being a bool and the second an error.
|
|
If a non-nil error is returned, it is added to the list of errors that the
|
|
parser will return. E.g.:
|
|
RuleAB = [ab]i+ &{
|
|
return true, nil
|
|
}
|
|
|
|
State change code blocks are code blocks starting with "#". In contrast to
|
|
action and predicate code blocks, state change code blocks are allowed to
|
|
modify values in the global "state" store (see below).
|
|
State change code blocks are turned into a method on the "*current" type
|
|
in the generated source code.
|
|
The method is passed any labeled expression's value as an argument (of type
|
|
any) and must return a value of type error.
|
|
If a non-nil error is returned, it is added to the list of errors that the
|
|
parser will return, note that the parser does NOT backtrack if a non-nil
|
|
error is returned.
|
|
E.g:
|
|
Rule = [a] #{
|
|
c.state["a"]++
|
|
if c.state["a"] > 5 {
|
|
return fmt.Errorf("we have seen more than 5 a's") // parser will not backtrack
|
|
}
|
|
return nil
|
|
}
|
|
The "*current" type is a struct that provides four useful fields that can be
|
|
accessed in action, state change, and predicate code blocks: "pos", "text",
|
|
"state" and "globalStore".
|
|
|
|
The "pos" field indicates the current position of the parser in the source
|
|
input. It is itself a struct with three fields: "line", "col" and "offset".
|
|
Line is a 1-based line number, col is a 1-based column number that counts
|
|
runes from the start of the line, and offset is a 0-based byte offset.
|
|
|
|
The "text" field is the slice of bytes of the current match. It is empty
|
|
in a predicate code block.
|
|
|
|
The "state" field is a global store, with backtrack support, of type
|
|
"map[string]any". The values in the store are tied to the parser's
|
|
backtracking, in particular if a rule fails to match then all updates to the
|
|
state that occurred in the process of matching the rule are rolled back. For a
|
|
key-value store that is not tied to the parser's backtracking, see the
|
|
"globalStore".
|
|
The values in the "state" store are available for read access in action and
|
|
predicate code blocks, any changes made to the "state" store will be reverted
|
|
once the action or predicate code block is finished running. To update values
|
|
in the "state" use state change code blocks ("#{}").
|
|
|
|
IMPORTANT:
|
|
- In order to properly roll back the state if a rule fails to match the
|
|
parser must clone the state before trying to match a rule.
|
|
- The default clone mechanism makes a "shallow" copy of each value in the
|
|
"state", this implies that pointers, maps, slices, channels, and structs
|
|
containing any of the previous types are not properly copied.
|
|
- To support theses cases pigeon offers the "Cloner" interface which
|
|
consists of a single method "Clone". If a value stored in the "state"
|
|
store implements this interface, the "Clone" method is used to obtain a
|
|
proper copy.
|
|
- If a general solution is needed, external libraries which provide deep
|
|
copy functionality may be used in the "Clone" method
|
|
(e.g. https://github.com/mitchellh/copystructure).
|
|
|
|
The "globalStore" field is a global store of type "map[string]any",
|
|
which allows to store arbitrary values, which are available in action and
|
|
predicate code blocks for read as well as write access.
|
|
It is important to notice, that the global store is completely independent from
|
|
the backtrack mechanism of PEG and is therefore not set back to its old state
|
|
during backtrack.
|
|
The initialization of the global store may be achieved by using the GlobalStore
|
|
function (http://godoc.org/github.com/mna/pigeon/test/predicates#GlobalStore).
|
|
Be aware, that all keys starting with "_pigeon" are reserved for internal use
|
|
of pigeon and should not be used nor modified. Those keys are treated as
|
|
internal implementation details and therefore there are no guarantees given in
|
|
regards of API stability.
|
|
|
|
Left recursion
|
|
|
|
With options -support-left-recursion pigeon supports left recursion. E.g.:
|
|
expr = expr '*' term
|
|
Supports indirect recursion:
|
|
A = B / D
|
|
B = A / C
|
|
The implementation is based on the [Left-recursive PEG Grammars][9] article that
|
|
links to [Left Recursion in Parsing Expression Grammars][10] and
|
|
[Packrat Parsers Can Support Left Recursion][11] papers.
|
|
|
|
References:
|
|
|
|
[9]: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1
|
|
[10]: https://arxiv.org/pdf/1207.0443.pdf
|
|
[11]: http://web.cs.ucla.edu/~todd/research/pepm08.pdf
|
|
|
|
Failure labels, throw and recover
|
|
|
|
pigeon supports an extension of the classical PEG syntax called failure labels,
|
|
proposed by Maidl et al. in their paper "Error Reporting in Parsing Expression Grammars" [7].
|
|
The used syntax for the introduced expressions is borrowed from their lpeglabel [8]
|
|
implementation.
|
|
|
|
This extension allows to signal different kinds of errors and to specify, which
|
|
recovery pattern should handle a given label.
|
|
|
|
With labeled failures it is possible to distinguish between an ordinary failure
|
|
and an error. Usually, an ordinary failure is produced when the matching of a
|
|
character fails, and this failure is caught by ordered choice. An error
|
|
(a non-ordinary failure), by its turn, is produced by the throw operator and
|
|
may be caught by the recovery operator.
|
|
|
|
In pigeon, the recovery expression consists of the regular expression, the recovery
|
|
expression and a set of labels to be matched. First, the regular expression is tried.
|
|
If this fails with one of the provided labels, the recovery expression is tried. If
|
|
this fails as well, the error is propagated. E.g.:
|
|
FailureRecoveryExpr = RegularExpr //{FailureLabel1, FailureLabel2} RecoveryExpr
|
|
|
|
To signal a failure condition, the throw expression is used. E.g.:
|
|
ThrowExpr = %{FailureLabel1}
|
|
|
|
For concrete examples, how to use throw and recover, have a look at the examples
|
|
"labeled_failures" and "thrownrecover" in the "test" folder.
|
|
|
|
The implementation of the throw and recover operators work as follows:
|
|
The failure recover expression adds the recover expression for every failure label
|
|
to the recovery stack and runs the regular expression.
|
|
The throw expression checks the recovery stack in reversed order for the provided
|
|
failure label. If the label is found, the respective recovery expression is run. If
|
|
this expression is successful, the parser continues the processing of the input. If
|
|
the recovery expression is not successful, the parsing fails and the parser starts
|
|
to backtrack.
|
|
|
|
If throw and recover expressions are used together with global state, it is the
|
|
responsibility of the author of the grammar to reset the global state to a valid
|
|
state during the recovery operation.
|
|
|
|
[7]: https://arxiv.org/pdf/1405.6646v3.pdf
|
|
[8]: https://github.com/sqmedeiros/lpeglabel
|
|
|
|
Using the generated parser
|
|
|
|
The parser generated by pigeon exports a few symbols so that it can be used
|
|
as a package with public functions to parse input text. The exported API is:
|
|
- Parse(string, []byte, ...Option) (any, error)
|
|
- ParseFile(string, ...Option) (any, error)
|
|
- ParseReader(string, io.Reader, ...Option) (any, error)
|
|
- AllowInvalidUTF8(bool) Option
|
|
- Debug(bool) Option
|
|
- Entrypoint(string) Option
|
|
- GlobalStore(string, any) Option
|
|
- MaxExpressions(uint64) Option
|
|
- Memoize(bool) Option
|
|
- Recover(bool) Option
|
|
- Statistics(*Stats) Option
|
|
|
|
See the godoc page of the generated parser for the test/predicates grammar
|
|
for an example documentation page of the exported API:
|
|
http://godoc.org/github.com/mna/pigeon/test/predicates.
|
|
|
|
Like the grammar used to generate the parser, the input text must be
|
|
UTF-8-encoded Unicode.
|
|
|
|
The start rule of the parser is the first rule in the PEG grammar used
|
|
to generate the parser. A call to any of the Parse* functions returns
|
|
the value generated by executing the grammar on the provided input text,
|
|
and an optional error.
|
|
|
|
Typically, the grammar should generate some kind of abstract syntax tree (AST),
|
|
but for simple grammars it may evaluate the result immediately, such as in
|
|
the examples/calculator example. There are no constraints imposed on the
|
|
author of the grammar, it can return whatever is needed.
|
|
|
|
Error reporting
|
|
|
|
When the parser returns a non-nil error, the error is always of type errList,
|
|
which is defined as a slice of errors ([]error). Each error in the list is
|
|
of type *parserError. This is a struct that has an "Inner" field that can be
|
|
used to access the original error.
|
|
|
|
So if a code block returns some well-known error like:
|
|
{
|
|
return nil, io.EOF
|
|
}
|
|
|
|
The original error can be accessed this way:
|
|
_, err := ParseFile("some_file")
|
|
if err != nil {
|
|
list := err.(errList)
|
|
for _, err := range list {
|
|
pe := err.(*parserError)
|
|
if pe.Inner == io.EOF {
|
|
// ...
|
|
}
|
|
}
|
|
}
|
|
|
|
By defaut the parser will continue after an error is returned and will
|
|
cumulate all errors found during parsing. If the grammar reaches a point
|
|
where it shouldn't continue, a panic statement can be used to terminate
|
|
parsing. The panic will be caught at the top-level of the Parse* call
|
|
and will be converted into a *parserError like any error, and an errList
|
|
will still be returned to the caller.
|
|
|
|
The divide by zero error in the examples/calculator grammar leverages this
|
|
feature (no special code is needed to handle division by zero, if it
|
|
happens, the runtime panics and it is recovered and returned as a parsing
|
|
error).
|
|
|
|
Providing good error reporting in a parser is not a trivial task. Part
|
|
of it is provided by the pigeon tool, by offering features such as
|
|
filename, position, expected literals and rule name in the error message,
|
|
but an important part of good error reporting needs to be done by the grammar
|
|
author.
|
|
|
|
For example, many programming languages use double-quotes for string literals.
|
|
Usually, if the opening quote is found, the closing quote is expected, and if
|
|
none is found, there won't be any other rule that will match, there's no need
|
|
to backtrack and try other choices, an error should be added to the list
|
|
and the match should be consumed.
|
|
|
|
In order to do this, the grammar can look something like this:
|
|
|
|
StringLiteral = '"' ValidStringChar* '"' {
|
|
// this is the valid case, build string literal node
|
|
// node = ...
|
|
return node, nil
|
|
} / '"' ValidStringChar* !'"' {
|
|
// invalid case, build a replacement string literal node or build a BadNode
|
|
// node = ...
|
|
return node, errors.New("string literal not terminated")
|
|
}
|
|
|
|
This is just one example, but it illustrates the idea that error reporting
|
|
needs to be thought out when designing the grammar.
|
|
|
|
Because the above mentioned error types (errList and parserError) are not
|
|
exported, additional steps have to be taken, ff the generated parser is used as
|
|
library package in other packages (e.g. if the same parser is used in multiple
|
|
command line tools).
|
|
One possible implementation for exported errors (based on interfaces) and
|
|
customized error reporting (caret style formatting of the position, where
|
|
the parsing failed) is available in the json example and its command line tool:
|
|
http://godoc.org/github.com/mna/pigeon/examples/json
|
|
|
|
API stability
|
|
|
|
Generated parsers have user-provided code mixed with pigeon code
|
|
in the same package, so there is no package
|
|
boundary in the resulting code to prevent access to unexported symbols.
|
|
What is meant to be implementation
|
|
details in pigeon is also available to user code - which doesn't mean
|
|
it should be used.
|
|
|
|
For this reason, it is important to precisely define what is intended to be
|
|
the supported API of pigeon, the parts that will be stable
|
|
in future versions.
|
|
|
|
The "stability" of the version 1.0 API attempts to make a similar guarantee
|
|
as the Go 1 compatibility [5]. The following lists what part of the
|
|
current pigeon code falls under that guarantee (features may be added in
|
|
the future):
|
|
|
|
- The pigeon command-line flags and arguments: those will not be removed
|
|
and will maintain the same semantics.
|
|
|
|
- The explicitly exported API generated by pigeon. See [6] for the
|
|
documentation of this API on a generated parser.
|
|
|
|
- The PEG syntax, as documented above.
|
|
|
|
- The code blocks (except the initializer) will always be generated as
|
|
methods on the *current type, and this type is guaranteed to have
|
|
the fields pos (type position) and text (type []byte). There are no
|
|
guarantees on other fields and methods of this type.
|
|
|
|
- The position type will always have the fields line, col and offset,
|
|
all defined as int. There are no guarantees on other fields and methods
|
|
of this type.
|
|
|
|
- The type of the error value returned by the Parse* functions, when
|
|
not nil, will always be errList defined as a []error. There are no
|
|
guarantees on methods of this type, other than the fact it implements the
|
|
error interface.
|
|
|
|
- Individual errors in the errList will always be of type *parserError,
|
|
and this type is guaranteed to have an Inner field that contains the
|
|
original error value. There are no guarantees on other fields and methods
|
|
of this type.
|
|
|
|
The above guarantee is given to the version 1.0 (https://github.com/mna/pigeon/releases/tag/v1.0.0)
|
|
of pigeon, which has entered maintenance mode (bug fixes only). The current
|
|
master branch includes the development toward a future version 2.0, which
|
|
intends to further improve pigeon.
|
|
While the given API stability should be maintained as far as it makes sense,
|
|
breaking changes may be necessary to be able to improve pigeon.
|
|
The new version 2.0 API has not yet stabilized and therefore changes to the API
|
|
may occur at any time.
|
|
|
|
References:
|
|
|
|
[5]: https://golang.org/doc/go1compat
|
|
[6]: http://godoc.org/github.com/mna/pigeon/test/predicates
|
|
|
|
*/
|
|
package main
|