A regular expression library for Nim using PCRE to do the hard work. The top priorities are ergonomics & ease of use.
For documentation on how to write patterns, see the official PCRE pattern documentation. A wide variety of third-party documentation and tools is also available online.
If you love sequtils.toSeq, we have bad news for you: this library doesn't work with it due to documented compiler limitations. As a workaround, use this:
import nre except toSeq
PCRE has some additional terms that you must agree to in order to use this module.
This library handles splitting with an empty string, i.e. if the splitting regex is empty (""), the same way as Perl, Javascript, and Java. This means that "123".split(re"") == @["1", "2", "3"], as opposed to the Nim stdlib's @["123"].
Represents the pattern that things are matched against, constructed with re(string). Examples: re"foo", re(r"(*ANYCRLF)(?x)foo # comment").
pattern: string - the string that was used to create the pattern. For details on how to write a pattern, please see the official PCRE pattern documentation.
captureCount: int - the number of captures that the pattern has.
captureNameId: Table[string, int] - a table from the capture names to their numeric id.
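As a minimal sketch of how these fields can be inspected, the snippet below builds a pattern and reads pattern and captureCount back. The pattern itself and the name pat are made up for illustration, and the snippet assumes a PCRE shared library is available at run time.

```nim
import nre

# Hypothetical pattern: one named group (key) and one unnamed group.
let pat = re"(?<key>\w+)=(\d+)"

# pattern holds the string the Regex was built from.
doAssert pat.pattern == r"(?<key>\w+)=(\d+)"

# Both groups count as captures, named or not.
doAssert pat.captureCount == 2

# pat.captureNameId maps "key" to its numeric id.
```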
The following options may appear anywhere in the pattern, and they affect the rest of it.
(?i) - case insensitive
(?m) - multi-line: ^ and $ match the beginning and end of lines, not of the subject string
(?s) - . also matches newline (dotall)
(?U) - expressions are not greedy by default. ? can be added to a qualifier to make it greedy
(?x) - whitespace and comments (#) are ignored (extended)
(?X) - character escapes without special meaning (\w vs. \a) are errors (extra)
One or a combination of these options may appear only at the beginning of the pattern:
(*UTF8) - treat both the pattern and subject as UTF-8
(*UCP) - Unicode character properties; \w matches я
(*U) - a combination of the two options above
(*FIRSTLINE) - fails if there is not a match on the first line
(*NO_AUTO_CAPTURE) - turn off auto-capture for groups; (?<name>...) can be used to capture
(*CR) - newlines are separated by \r
(*LF) - newlines are separated by \n (UNIX default)
(*CRLF) - newlines are separated by \r\n (Windows default)
(*ANYCRLF) - newlines are separated by any of the above
(*ANY) - newlines are separated by any of the above and Unicode newlines: "single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit library, the last two are recognized only in UTF-8 mode." (man pcre)
(*JAVASCRIPT_COMPAT) - JavaScript compatibility
(*NO_STUDY) - turn off studying; study is enabled by default
For more details on the leading option groups, see the Option Setting and the Newline Convention sections of the PCRE syntax manual.
Some of these options are not part of PCRE and are converted by nre into PCRE flags. These include NEVER_UTF, ANCHORED, DOLLAR_ENDONLY, FIRSTLINE, NO_AUTO_CAPTURE, JAVASCRIPT_COMPAT, U, NO_STUDY. In other PCRE wrappers, you will need to pass these as separate flags to PCRE.
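A short sketch of how options embedded in the pattern change matching. The subject strings and variable names are invented for illustration; the last pattern reuses the (*ANYCRLF)(?x) example from the Regex description above. It assumes a PCRE shared library is available at run time.

```nim
import nre, options

# Without (?i) the match is case sensitive.
let plain = "FOO".match(re"foo")
let caseless = "FOO".match(re"(?i)foo")
doAssert plain.isNone
doAssert caseless.isSome

# (?s) lets . match the newline between a and b.
let dotall = "a\nb".match(re"(?s)a.b")
doAssert dotall.isSome

# A leading (*...) option combined with (?x): whitespace and the
# trailing comment in the pattern are ignored.
let extended = "foo".match(re"(*ANYCRLF)(?x)foo # comment")
doAssert extended.isSome
```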
Usually seen as Option[RegexMatch], it represents the result of an execution. On failure it is none; on success it is some.
pattern: Regex - the pattern that is being matched
str: string - the string that was matched against
captures[]: string - the string value of whatever was captured at that id. If the value is invalid, then behavior is undefined. If the id is -1, then the whole match is returned. If the given capture was not matched, nil is returned.
"abc".match(re"(\w)").get.captures[0] == "a"
"abc".match(re"(?<letter>\w)").get.captures["letter"] == "a"
"abc".match(re"(\w)\w").get.captures[-1] == "ab"
captureBounds[]: HSlice[int, int] - gets the bounds of the given capture according to the same rules as the above. If the capture is not filled, then None is returned. The bounds are both inclusive.
"abc".match(re"(\w)").get.captureBounds[0] == 0 .. 0
0 in "abc".match(re"(\w)").get.captureBounds == true
"abc".match(re"").get.captureBounds[-1] == 0 .. -1
"abc".match(re"abc").get.captureBounds[-1] == 0 .. 2
match: string - the full text of the match.
matchBounds: HSlice[int, int] - the bounds of the match, as in captureBounds[]
(captureBounds|captures).toTable - returns a table with each named capture as a key.
(captureBounds|captures).toSeq - returns all the captures by their number.
$: string - same as match
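The accessors listed above can be exercised together on a single match. This is a sketch only: the pattern, the group name first, and the variable m are made up for illustration, and it assumes a PCRE shared library is available at run time.

```nim
import nre, options

# Hypothetical pattern: a named group followed by an unnamed one.
let m = "abc".match(re"(?<first>\w)(\w)").get

# Captures can be read by number or by name.
doAssert m.captures[0] == "a"
doAssert m.captures["first"] == "a"
doAssert m.captures[1] == "b"

# The full text of the match and its inclusive bounds.
doAssert m.match == "ab"
doAssert m.matchBounds == 0 .. 1

# $ gives the same text as match.
doAssert $m == "ab"
```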
Internal error in the module; this probably means that there is a bug.
Thrown when matching fails due to invalid Unicode in strings.
Thrown when there is a syntax error in the regular expression string passed in.
Thrown when studying the regular expression fails for whatever reason. The message contains the error code.
Like find(...), but anchored to the start of the string. This means that "foo".match(re"f").isSome == true, but "foo".match(re"o").isSome == false.
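To make the difference between match and find concrete, here is a small sketch that runs the same patterns through both. The variable names are made up for illustration, and the snippet assumes a PCRE shared library is available at run time.

```nim
import nre, options

# match only succeeds if the pattern matches at the start of the string.
let anchoredHit = "foo".match(re"f")
let anchoredMiss = "foo".match(re"o")
doAssert anchoredHit.isSome
doAssert anchoredMiss.isNone

# find scans forward, so it locates the first "o" at index 1.
let scanned = "foo".find(re"o")
doAssert scanned.isSome
doAssert scanned.get.matchBounds == 1 .. 1
```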
Works the same as find(...), but finds every non-overlapping match. "2222".findIter(re"22") is "22", "22", not "22", "22", "22".
Arguments are the same as find(...).
Variants:
proc findAll(...) returns a seq[string]
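A brief sketch of the non-overlapping behavior, collecting the bounds of each match from findIter and comparing with findAll. The variable name bounds is made up for illustration, and the snippet assumes a PCRE shared library is available at run time.

```nim
import nre

# Collect the inclusive bounds of every match yielded by findIter.
var bounds: seq[HSlice[int, int]] = @[]
for m in "2222".findIter(re"22"):
  bounds.add(m.matchBounds)

# Two matches, neither sharing a character with the other.
doAssert bounds == @[0 .. 1, 2 .. 3]

# findAll returns just the matched text.
doAssert "2222".findAll(re"22") == @["22", "22"]
```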
Finds the given pattern in the string between the start and end positions.
start - The start point at which to start matching. |abc is 0; a|bc is 1
endpos - The maximum index for a match; int.high means the end of the string, otherwise it's an inclusive upper bound.
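The effect of start and endpos on find can be sketched as follows. The subject strings and variable names are made up for illustration, and the snippet assumes a PCRE shared library is available at run time.

```nim
import nre, options

# With the default start of 0, the first occurrence is found.
let first = "abc abc".find(re"abc")
doAssert first.get.matchBounds == 0 .. 2

# Starting at index 1 skips the first occurrence.
let second = "abc abc".find(re"abc", start = 1)
doAssert second.get.matchBounds == 4 .. 6

# endpos is an inclusive upper bound: with endpos = 3 only "abcd" is
# visible, so "def" cannot match; with endpos = 5 it can.
let cut = "abcdef".find(re"def", endpos = 3)
let whole = "abcdef".find(re"def", endpos = 5)
doAssert cut.isNone
doAssert whole.isSome
```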
Splits the string with the given regex. This works according to the rules that Perl and Javascript use:
- If the match is zero-width, then the string is still split: "123".split(re"") == @["1", "2", "3"].
- If the pattern has a capture in it, it is added after the string split: "12".split(re"(\d)") == @["", "1", "", "2", ""].
- If maxsplit != -1, then the string will only be split maxsplit - 1 times. This means that there will be maxsplit strings in the output seq. "1.2.3".split(re"\.", maxsplit = 2) == @["1", "2.3"]
start behaves the same as in find(...).
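The three split rules above can be checked side by side. This sketch reuses the examples from the rules and adds one ordinary delimiter case; the variable names are made up for illustration, and it assumes a PCRE shared library is available at run time.

```nim
import nre

# An ordinary delimiter pattern.
let fields = "a, b,c".split(re",\s*")
doAssert fields == @["a", "b", "c"]

# Zero-width matches still split the string.
let chars = "123".split(re"")
doAssert chars == @["1", "2", "3"]

# Captured text is inserted after each split piece.
let withCaptures = "12".split(re"(\d)")
doAssert withCaptures == @["", "1", "", "2", ""]

# maxsplit caps the number of output strings.
let limited = "1.2.3".split(re"\.", maxsplit = 2)
doAssert limited == @["1", "2.3"]
```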
Replaces each match of Regex in the string with subproc, which should never be or return nil.
If subproc is a proc (RegexMatch): string, then it is executed with each match and the return value is the replacement value.
If subproc is a proc (string): string, then it is executed with the full text of the match and the return value is the replacement value.
If subproc is a string, the syntax is as follows:
$$ - literal $
$123 - capture number 123
$foo - named capture foo
${foo} - same as above
$1$# - first and second captures
$# - first capture
$0 - full match
If a given capture is missing, IndexError is thrown for un-named captures and KeyError for named captures.
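A sketch of the three forms of replace described above: a replacement string, a proc taking the matched text, and a proc taking the RegexMatch. The helper procs doubleIt and bracket, the group name ch, and the subject strings are made up for illustration; the snippet assumes a PCRE shared library is available at run time.

```nim
import nre

# String form: $1 is the first capture, $0 the full match,
# ${ch} a named capture.
let byNumber = "123".replace(re"(\d)2", "$1")
let fullMatch = "abc".replace(re"b", "<$0>")
let byName = "abc".replace(re"(?<ch>b)", "[${ch}]")
doAssert byNumber == "13"
doAssert fullMatch == "a<b>c"
doAssert byName == "a[b]c"

# Hypothetical helper: receives the full text of each match.
proc doubleIt(m: string): string = m & m

# Hypothetical helper: receives the RegexMatch, so captures are available.
proc bracket(m: RegexMatch): string = "<" & m.captures[0] & ">"

let doubled = "abc".replace(re"b", doubleIt)
let bracketed = "abc".replace(re"(b)", bracket)
doAssert doubled == "abbc"
doAssert bracketed == "a<b>c"
```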
Escapes the string so it doesn't match any special characters.
Incompatible with the Extra flag (X).
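A small sketch of escaping user-supplied text before building a pattern from it, assuming the escaping proc is exposed as escapeRe. The input string and variable names are made up for illustration, and the snippet assumes a PCRE shared library is available at run time.

```nim
import nre, options

# Hypothetical user input containing a regex metacharacter.
let raw = "1+1"

# Unescaped, + is a quantifier, so the literal text does not match.
let unescaped = "1+1".match(re(raw))
doAssert unescaped.isNone

# Escaped, the + is matched literally.
let escaped = escapeRe(raw)
doAssert escaped == "1\\+1"
let literal = "1+1".match(re(escaped))
doAssert literal.isSome
```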