pcre2test.1   pcre2test.1 
skipping to change at line 57 skipping to change at line 57
The input is processed using C's string functions, so must not contain binary zeros, even though The input is processed using C's string functions, so must not contain binary zeros, even though
in Unix-like environments, fgets() treats any bytes other than newline as data characters. An in Unix-like environments, fgets() treats any bytes other than newline as data characters. An
error is generated if a binary zero is encountered. By default subject lines are processed for error is generated if a binary zero is encountered. By default subject lines are processed for
backslash escapes, which makes it possible to include any data value in strings that are passed backslash escapes, which makes it possible to include any data value in strings that are passed
to the library for matching. For patterns, there is a facility for specifying some or all of the to the library for matching. For patterns, there is a facility for specifying some or all of the
8-bit input characters as hexadecimal pairs, which makes it possible to include binary zeros. 8-bit input characters as hexadecimal pairs, which makes it possible to include binary zeros.
Input for the 16-bit and 32-bit libraries Input for the 16-bit and 32-bit libraries
When testing the 16-bit or 32-bit libraries, there is a need to be able to generate character When testing the 16-bit or 32-bit libraries, there is a need to be able to generate character
code points greater than 255 in the strings that are passed to the library. For subject lines, code points greater than 255 in the strings that are passed to the library. For subject lines
backslash escapes can be used. In addition, when the utf modifier (see "Setting compilation op‐ and some patterns, backslash escapes can be used. In addition, when the utf modifier (see "Set‐
tions" below) is set, the pattern and any following subject lines are interpreted as UTF-8 ting compilation options" below) is set, the pattern and any following subject lines are inter‐
strings and translated to UTF-16 or UTF-32 as appropriate. preted as UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
For non-UTF testing of wide characters, the utf8_input modifier can be used. This is mutually For non-UTF testing of wide characters, the utf8_input modifier can be used. This is mutually
exclusive with utf, and is allowed only in 16-bit or 32-bit mode. It causes the pattern and fol‐ exclusive with utf, and is allowed only in 16-bit or 32-bit mode. It causes the pattern and fol‐
lowing subject lines to be treated as UTF-8 according to the ori ginal definition (RFC 2279), lowing subject lines to be treated as UTF-8 according to the ori ginal definition (RFC 2279),
which allows for character values up to 0x7fffffff. Each character i s placed in one 16-bit or which allows for character values up to 0x7fffffff. Each character i s placed in one 16-bit or
32-bit code unit (in the 16-bit case, values greater than 0xffff cau se an error to occur). 32-bit code unit (in the 16-bit case, values greater than 0xffff cau se an error to occur).
UTF-8 (in its original definition) is not capable of encoding val ues greater than 0x7fffffff, UTF-8 (in its original definition) is not capable of encoding val ues greater than 0x7fffffff,
but such values can be handled by the 32-bit library. When testing t his library in non-UTF mode but such values can be handled by the 32-bit library. When testing t his library in non-UTF mode
with utf8_input set, if any character is preceded by the byte 0xff (which is an invalid byte in with utf8_input set, if any character is preceded by the byte 0xff (which is an invalid byte in
UTF-8) 0x80000000 is added to the character's value. This is the only way of passing such code UTF-8) 0x80000000 is added to the character's value. For subject strings, using an escape se‐
points in a pattern string. For subject strings, using an escape sequence is preferable. quence is preferable.
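The utf8_input rules described above can be modelled with a short Python sketch. It is an illustration of the documented behaviour only, not pcre2test's own code: the function name decode_utf8_input and its width parameter are hypothetical, and rejecting the 0xff prefix in 16-bit mode is an assumption (the text only describes it for the 32-bit library).

```python
# Lead-byte forms of original-definition UTF-8 (RFC 2279):
# (sequence length, payload mask of the lead byte, marker bits of the lead byte)
_LEAD_FORMS = ((2, 0x1F, 0xC0), (3, 0x0F, 0xE0), (4, 0x07, 0xF0),
               (5, 0x03, 0xF8), (6, 0x01, 0xFC))


def decode_utf8_input(data: bytes, width: int = 32) -> list:
    """Turn RFC 2279 UTF-8 bytes into one 16-bit or 32-bit code unit per character."""
    if width not in (16, 32):
        raise ValueError("width must be 16 or 32")
    units = []
    i = 0
    while i < len(data):
        extra = 0
        if data[i] == 0xFF:
            # 0xff is never valid UTF-8, so it can act as a marker that adds
            # 0x80000000 to the next character (assumed to be 32-bit only).
            if width != 32:
                raise ValueError("0xff prefix is only modelled for 32-bit mode")
            extra = 0x80000000
            i += 1
            if i == len(data):
                raise ValueError("0xff is not followed by a character")
        lead = data[i]
        if lead < 0x80:
            length, value = 1, lead
        else:
            for length, mask, marker in _LEAD_FORMS:
                if (lead & ~mask & 0xFF) == marker:
                    value = lead & mask
                    break
            else:
                raise ValueError("invalid lead byte 0x%02x" % lead)
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError("truncated or malformed sequence")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        value += extra
        if width == 16 and value > 0xFFFF:
            raise ValueError("value does not fit in one 16-bit code unit")
        units.append(value)
        i += length
    return units
```

For example, the six-byte sequence fd bf bf bf bf bf decodes to 0x7fffffff, the largest value the original UTF-8 definition can express, and prefixing a character with 0xff in 32-bit mode yields a value above that limit.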
COMMAND LINE OPTIONS COMMAND LINE OPTIONS
-8 If the 8-bit library has been built, this option causes it to be used (this is the de‐ -8 If the 8-bit library has been built, this option causes it to be used (this is the de‐
fault). If the 8-bit library has not been built, this opti on causes an error. fault). If the 8-bit library has not been built, this opti on causes an error.
-16 If the 16-bit library has been built, this option causes it to be used. If the 8-bit -16 If the 16-bit library has been built, this option causes it to be used. If the 8-bit
library has not been built, this is the default. If the 16 -bit library has not been library has not been built, this is the default. If the 16 -bit library has not been
built, this option causes an error. built, this option causes an error.
skipping to change at line 105 skipping to change at line 105
-C Output the version number of the PCRE2 library, and all a vailable information about -C Output the version number of the PCRE2 library, and all a vailable information about
the optional features that are included, and then exit wi th zero exit code. All other the optional features that are included, and then exit wi th zero exit code. All other
options are ignored. If both -C and -LM are present, which ever is first is recognized. options are ignored. If both -C and -LM are present, which ever is first is recognized.
-C option Output information about a specific build-time option, the n exit. This functionality -C option Output information about a specific build-time option, the n exit. This functionality
is intended for use in scripts such as RunTest. The follow ing options output the value is intended for use in scripts such as RunTest. The follow ing options output the value
and set the exit code as indicated: and set the exit code as indicated:
ebcdic-nl the code for LF (= NL) in an EBCDIC environme nt: ebcdic-nl the code for LF (= NL) in an EBCDIC environme nt:
0x15 or 0x25 either 0x15 or 0x25
0 if used in an ASCII environment 0 if used in an ASCII/Unicode environment
exit code is always 0 exit code is always 0
linksize the configured internal link size (2, 3, or 4 ) linksize the configured internal link size (2, 3, or 4 )
exit code is set to the link size exit code is set to the link size
newline the default newline setting: newline the default newline setting:
CR, LF, CRLF, ANYCRLF, ANY, or NUL CR, LF, CRLF, ANYCRLF, ANY, or NUL
exit code is always 0 exit code is always 0
bsr the default setting for what \R matches: bsr the default setting for what \R matches:
ANYCRLF or ANY ANYCRLF or ANY
exit code is always 0 exit code is always 0
skipping to change at line 128 skipping to change at line 128
same value: same value:
backslash-C \C is supported (not locked out) backslash-C \C is supported (not locked out)
ebcdic compiled for an EBCDIC environment ebcdic compiled for an EBCDIC environment
jit just-in-time support is available jit just-in-time support is available
pcre2-16 the 16-bit library was built pcre2-16 the 16-bit library was built
pcre2-32 the 32-bit library was built pcre2-32 the 32-bit library was built
pcre2-8 the 8-bit library was built pcre2-8 the 8-bit library was built
unicode Unicode support is available unicode Unicode support is available
Note that the availability of JIT support in the library does not guarantee that it
can actually be used because in some environments it is unable to allocate executable
memory. The option "jitusable" gives more detailed information. It returns one of the
following values:
0 JIT is available and usable
1 JIT is available but cannot allocate executable memory
2 JIT is not available
3 Unexpected return from test call to pcre2_jit_compile()
If an unknown option is given, an error message is output; the exit code is 0. If an unknown option is given, an error message is output; the exit code is 0.
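A script that consumes the "jitusable" value might interpret it along the following lines. This is a minimal Python sketch: the names JITUSABLE_MEANINGS and jit_can_be_used are hypothetical, and the sketch deliberately does not assume how the value was obtained from pcre2test, since the text above says only that the option "returns" it.

```python
# Meanings of the "jitusable" values, copied from the list above.
JITUSABLE_MEANINGS = {
    0: "JIT is available and usable",
    1: "JIT is available but cannot allocate executable memory",
    2: "JIT is not available",
    3: "Unexpected return from test call to pcre2_jit_compile()",
}


def jit_can_be_used(value: int) -> bool:
    """Return True only when the reported value means JIT is actually usable."""
    if value not in JITUSABLE_MEANINGS:
        raise ValueError("unrecognized jitusable value: %r" % (value,))
    return value == 0
```

The point of the distinction is that value 1 would satisfy a plain "jit" availability check even though JIT matching cannot run in that environment.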
-d Behave as if each pattern has the debug modifier; the inte rnal form and information -d Behave as if each pattern has the debug modifier; the i nternal form and information
about the compiled pattern is output after compilation; -d is equivalent to -b -i. about the compiled pattern is output after compilation; -d is equivalent to -b -i.
-dfa Behave as if each subject line has the dfa modifier ; matching is done using the -dfa Behave as if each subject line has the dfa modifier; ma tching is done using the
pcre2_dfa_match() function instead of the default pcre2_ma tch(). pcre2_dfa_match() function instead of the default pcre2_ma tch().
-error number[,number,...] -error number[,number,...]
Call pcre2_get_error_message() for each of the error numbers in the comma-separated Call pcre2_get_error_message() for each of the error numbers in the comma-separated
list, display the resulting messages on the standard output, then exit with zero exit list, display the resulting messages on the standard output, then exit with zero exit
code. The numbers may be positive or negative. This is a convenience facility for code. The numbers may be positive or negative. This is a convenience facility for
PCRE2 maintainers. PCRE2 maintainers.
-help Output a brief summary of these options and then exit. -help Output a brief summary of these options and then exit.
-i Behave as if each pattern has the info modifier; informa tion about the compiled pat‐ -i Behave as if each pattern has the info modifier; informati on about the compiled pat‐
tern is given after compilation. tern is given after compilation.
-jit Behave as if each pattern line has the jit modifier; aft er successful compilation, -jit Behave as if each pattern line has the jit modifier; a fter successful compilation,
each pattern is passed to the just-in-time compiler, if av ailable. each pattern is passed to the just-in-time compiler, if av ailable.
-jitfast Behave as if each pattern line has the jitfast modifier; a fter successful compilation, -jitfast Behave as if each pattern line has the jitfast modifier; a fter successful compilation,
each pattern is passed to the just-in-time compiler, if available, and each subject each pattern is passed to the just-in-time compiler, if av ailable, and each subject
line is passed directly to the JIT matcher via its "fast p ath". line is passed directly to the JIT matcher via its "fast p ath".
-jitverify -jitverify
Behave as if each pattern line has the jitverify modifier; after successful compila‐ Behave as if each pattern line has the jitverify modifier; after successful compila‐
tion, each pattern is passed to the just-in-time compiler, if available, and the use tion, each pattern is passed to the just-in-time compiler, if available, and the use
of JIT for matching is verified. of JIT for matching is verified.
-LM List modifiers: write a list of available pattern and subj ect modifiers to the stan‐ -LM List modifiers: write a list of available pattern and su bject modifiers to the stan‐
dard output, then exit with zero exit code. All other opti ons are ignored. If both -C dard output, then exit with zero exit code. All other opti ons are ignored. If both -C
and any -Lx options are present, whichever is first is rec ognized. and any -Lx options are present, whichever is first is rec ognized.
-LP List properties: write a list of recognized Unicode proper ties to the standard output, -LP List properties: write a list of recognized Unicode proper ties to the standard output,
then exit with zero exit code. All other options are ign ored. If both -C and any -Lx then exit with zero exit code. All other options are ignor ed. If both -C and any -Lx
options are present, whichever is first is recognized. options are present, whichever is first is recognized.
-LS List scripts: write a list of recognized Unicode script names to the standard output, -LS List scripts: write a list of recognized Unicode script names to the standard output,
then exit with zero exit code. All other options are ignored. If both -C and any -Lx then exit with zero exit code. All other options are ignored. If both -C and any -Lx
options are present, whichever is first is recognized. options are present, whichever is first is recognized.
-pattern modifier-list -pattern modifier-list
Behave as if each pattern line contains the given modifier s. Behave as if each pattern line contains the given modifier s.
-q Do not output the version number of pcre2test at the start of execution. -q Do not output the version number of pcre2test at the start of execution.
-S size On Unix-like systems, set the size of the run-time stack t o size mebibytes (units of -S size On Unix-like systems, set the size of the run-time stack to size mebibytes (units of
1024*1024 bytes). 1024*1024 bytes).
-subject modifier-list -subject modifier-list
Behave as if each subject line contains the given modifier s. Behave as if each subject line contains the given modifier s.
-t Run each compile and match many times with a timer, and ou tput the resulting times per -t Run each compile and match many times with a timer, and ou tput the resulting times per
compile or match. When JIT is used, separate times are g iven for the initial compile compile or match. When JIT is used, separate times are giv en for the initial compile
and the JIT compile. You can control the number of iterati ons that are used for timing and the JIT compile. You can control the number of iterati ons that are used for timing
by following -t with a number (as a separate item on the c ommand line). For example, by following -t with a number (as a separate item on the command line). For example,
"-t 1000" iterates 1000 times. The default is to iterate 5 00,000 times. "-t 1000" iterates 1000 times. The default is to iterate 5 00,000 times.
-tm This is like -t except that it times only the matching pha se, not the compile phase. -tm This is like -t except that it times only the matching pha se, not the compile phase.
-T -TM These behave like -t and -tm, but in addition, at the e nd of a run, the total times -T -TM These behave like -t and -tm, but in addition, at the end of a run, the total times
for all compiles and matches are output. for all compiles and matches are output.
-version Output the PCRE2 version number and then exit. -version Output the PCRE2 version number and then exit.
DESCRIPTION DESCRIPTION
If pcre2test is given two filename arguments, it reads from the firs t and writes to the second. If pcre2test is given two filename arguments, it reads from the fir st and writes to the second.
If the first name is "-", input is taken from the standard input. If pcre2test is given only one If the first name is "-", input is taken from the standard input. If pcre2test is given only one
argument, it reads from that file and writes to stdout. Otherw ise, it reads from stdin and argument, it reads from that file and writes to stdout. Otherwise, it reads from stdin and
writes to stdout. writes to stdout.
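The argument rules in the paragraph above amount to a small decision table, sketched here in Python. The function name choose_streams is hypothetical, None stands for the standard stream, and two points are assumptions because the text does not address them: "-" is treated as stdin in the one-argument case as well, and more than two filenames is rejected.

```python
def choose_streams(filenames):
    """Map pcre2test's filename arguments to (input, output).

    None means the standard stream (stdin for input, stdout for output).
    """
    if len(filenames) > 2:
        # Assumption: the text describes only zero, one, or two filenames.
        raise ValueError("at most two filename arguments are described")
    source = None
    sink = None
    if len(filenames) >= 1 and filenames[0] != "-":
        source = filenames[0]
    if len(filenames) == 2:
        sink = filenames[1]
    return (source, sink)
```

With this model, ["-", "out.txt"] reads from stdin and writes to out.txt, which is the case the "-" convention exists for.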
When pcre2test is built, a configuration option can specify that it should be linked with the When pcre2test is built, a configuration option can specify that it should be linked with the
libreadline or libedit library. When this is done, if the input is from a terminal, it is read libreadline or libedit library. When this is done, if the input is from a terminal, it is read
using the readline() function. This provides line-editing and history facilities. The output using the readline() function. This provides line-editing and history facilities. The output
from the -help option states whether or not readline() will be used. from the -help option states whether or not readline() will be used.
The program handles any number of tests, each of which consists o The program handles any number of tests, each of which consists of a
f a set of input lines. Each set of input lines. Each
set starts with a regular expression pattern, followed by any number set starts with a regular expression pattern, followed by any n
of subject lines to be umber of subject lines to be
matched against that pattern. In between sets of test data, command matched against that pattern. In between sets of test data, command
lines that begin with # may lines that begin with # may
appear. This file format, with some restrictions, can also be pro appear. This file format, with some restrictions, can also be
cessed by the perltest.sh processed by the perltest.sh
script that is distributed with PCRE2 as a means of checking tha script that is distributed with PCRE2 as a means of checking that th
t the behaviour of PCRE2 and e behaviour of PCRE2 and
Perl is the same. For a specification of perltest.sh, see the commen Perl is the same. For a specification of perltest.sh, see the comm
ts near its beginning. See ents near its beginning. See
also the #perltest command below. also the #perltest command below.
When the input is a terminal, pcre2test prompts for each line of input, using "re>" to prompt When the input is a terminal, pcre2test prompts for each line of inp ut, using "re>" to prompt
for regular expression patterns, and "data>" to prompt for subject l ines. Command lines starting for regular expression patterns, and "data>" to prompt for subject l ines. Command lines starting
with # can be entered only in response to the "re>" prompt. with # can be entered only in response to the "re>" prompt.
Each subject line is matched separately and independently. If you want to do multi-line matches, Each subject line is matched separately and independently. If you want to do multi-line matches,
you have to use the \n escape sequence (or \r or \r\n, etc., depending on the newline setting) you have to use the \n escape sequence (or \r or \r\n, etc., depending on the newline setting)
in a single line of input to encode the newline sequences. There is no limit on the length of in a single line of input to encode the newline sequences. There is no limit on the length of
subject lines; the input buffer is automatically extended if it is too small. There are replica‐ subject lines; the input buffer is automatically extended if it is too small. There are replica‐
tion features that make it possible to generate long repetitive pattern or subject lines with‐ tion features that make it possible to generate long repetitive pattern or subject lines with‐
out having to supply them explicitly. out having to supply them explicitly.
An empty line or the end of the file signals the end of the subjec t lines for a test, at which An empty line or the end of the file signals the end of the subject lines for a test, at which
point a new pattern or command line is expected if there is still in put to be read. point a new pattern or command line is expected if there is still in put to be read.
COMMAND LINES COMMAND LINES
In between sets of test data, a line that begins with # is interpret ed as a command line. If the In between sets of test data, a line that begins with # is interpret ed as a command line. If the
first character is followed by white space or an exclamation mark, t he line is treated as a com‐ first character is followed by white space or an exclamation mark, t he line is treated as a com‐
ment, and ignored. Otherwise, the following commands are recognized: ment, and ignored. Otherwise, the following commands are recognized:
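The comment-versus-command rule can be sketched as follows. This Python fragment only models the rule stated above; classify_hash_line is a hypothetical name, and treating a bare "#" with nothing after it as a comment is an assumption (the line's own newline counts as white space here).

```python
def classify_hash_line(line):
    """Classify a line starting with '#' as ('comment', '', '') or ('command', name, argument)."""
    if not line.startswith("#"):
        raise ValueError("not a command line: it does not begin with '#'")
    body = line[1:].rstrip("\n")
    # '#' followed by white space or '!' marks a comment, which is ignored.
    if body == "" or body[0].isspace() or body[0] == "!":
        return ("comment", "", "")
    parts = body.split(None, 1)
    name = parts[0]
    argument = parts[1].strip() if len(parts) > 1 else ""
    return ("command", name, argument)
```

The sketch does not check that the name is one of the recognized commands listed below; it only separates the command name from whatever follows it.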
#forbid_utf #forbid_utf
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PC Subsequent patterns automatically have the PCRE2_NEVER_UTF and
RE2_NEVER_UCP options set, PCRE2_NEVER_UCP options set,
which locks out the use of the PCRE2_UTF and PCRE2_UCP options and which locks out the use of the PCRE2_UTF and PCRE2_UCP options and t
the use of (*UTF) and (*UCP) he use of (*UTF) and (*UCP)
at the start of patterns. This command also forces an error if a sub sequent pattern contains any at the start of patterns. This command also forces an error if a sub sequent pattern contains any
occurrences of \P, \p, or \X, which are still supported when PCRE2_U TF is not set, but which re‐ occurrences of \P, \p, or \X, which are still supported when PCRE2_U TF is not set, but which re‐
quire Unicode property support to be included in the library. quire Unicode property support to be included in the library.
This is a trigger guard that is used in test files to ensure that UT This is a trigger guard that is used in test files to ensure that U
F or Unicode property tests TF or Unicode property tests
are not accidentally added to files that are used when Unicode su are not accidentally added to files that are used when Unicode suppo
pport is not included in the rt is not included in the
library. Setting PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as a default ca library. Setting PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as a default
n also be obtained by the can also be obtained by the
use of #pattern; the difference is that #forbid_utf cannot be unse use of #pattern; the difference is that #forbid_utf cannot be unset,
t, and the automatic options and the automatic options
are not displayed in pattern information, to avoid cluttering up tes t output. are not displayed in pattern information, to avoid cluttering up tes t output.
#load <filename> #load <filename>
This command is used to load a set of precompiled patterns from a fi le, as described in the sec‐ This command is used to load a set of precompiled patterns from a fi le, as described in the sec‐
tion entitled "Saving and restoring compiled patterns" below. tion entitled "Saving and restoring compiled patterns" below.
#loadtables <filename> #loadtables <filename>
This command is used to load a set of binary character tables that c an be accessed by the ta‐ This command is used to load a set of binary character tables tha t can be accessed by the ta‐
bles=3 qualifier. Such tables can be created by the pcre2_dftables p rogram with the -b option. bles=3 qualifier. Such tables can be created by the pcre2_dftables p rogram with the -b option.
#newline_default [<newline-list>] #newline_default [<newline-list>]
When PCRE2 is built, a default newline convention can be specified. When PCRE2 is built, a default newline convention can be specified.
This determines which char‐ This determines which char‐
acters and/or character pairs are recognized as indicating a newline acters and/or character pairs are recognized as indicating a new
in a pattern or subject line in a pattern or subject
string. The default can be overridden when a pattern is compiled. string. The default can be overridden when a pattern is compiled. Th
The standard test files con‐ e standard test files con‐
tain tests of various newline conventions, but the majority of the t tain tests of various newline conventions, but the majority of the
ests expect a single line‐ tests expect a single line‐
feed to be recognized as a newline by default. Without special acti feed to be recognized as a newline by default. Without special actio
on the tests would fail when n the tests would fail when
PCRE2 is compiled with either CR or CRLF as the default newline. PCRE2 is compiled with either CR or CRLF as the default newline.
The #newline_default command specifies a list of newline types that are acceptable as the de‐ The #newline_default command specifies a list of newline types th at are acceptable as the de‐
fault. The types must be one of CR, LF, CRLF, ANYCRLF, ANY, or NUL ( in upper or lower case), for fault. The types must be one of CR, LF, CRLF, ANYCRLF, ANY, or NUL ( in upper or lower case), for
example: example:
#newline_default LF Any anyCRLF #newline_default LF Any anyCRLF
If the default newline is in the list, this command has no effect. Otherwise, except when test‐ If the default newline is in the list, this command has no effect. Otherwise, except when test‐
ing the POSIX API, a newline modifier that specifies the first newline convention in the list ing the POSIX API, a newline modifier that specifies the first newline convention in the list
(LF in the above example) is added to any pattern that does not already have a newline modifier. (LF in the above example) is added to any pattern that does not already have a newline modifier.
If the newline list is empty, the feature is turned off. This command is present in a number of If the newline list is empty, the feature is turned off. This command is present in a number of
the standard test input files. the standard test input files.
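The decision that #newline_default makes for each pattern can be written out as a small Python sketch. It models the rules in the preceding paragraphs and is not taken from the pcre2test source: the function name, its parameters, and the lower-case spelling of the generated modifier are all illustrative assumptions.

```python
NEWLINE_TYPES = {"CR", "LF", "CRLF", "ANYCRLF", "ANY", "NUL"}


def newline_modifier_to_add(build_default, acceptable, pattern_modifiers, posix=False):
    """Return the newline modifier to add to a pattern, or None if none is added.

    build_default     -- newline convention PCRE2 was built with, e.g. "CR"
    acceptable        -- list given to #newline_default, e.g. ["LF", "Any", "anyCRLF"]
    pattern_modifiers -- modifiers already on the pattern, e.g. ["caseless"]
    posix             -- True when the POSIX API is being tested
    """
    types = [t.upper() for t in acceptable]     # upper or lower case is allowed
    for t in types:
        if t not in NEWLINE_TYPES:
            raise ValueError("unknown newline type: %s" % t)
    if not types:
        return None                             # empty list: feature turned off
    if build_default.upper() in types:
        return None                             # default is acceptable: no effect
    if posix:
        return None                             # default cannot be overridden via POSIX
    for modifier in pattern_modifiers:
        if modifier.split("=", 1)[0].strip() == "newline":
            return None                         # pattern already sets its own newline
    return "newline=" + types[0].lower()        # first convention in the list
```

With the example line "#newline_default LF Any anyCRLF" and a library built with CR as the default, a pattern without its own newline modifier would get the first listed convention, LF.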
When the POSIX API is being tested there is no way to override the d When the POSIX API is being tested there is no way to override the
efault newline convention, default newline convention,
though it is possible to set the newline convention from within the though it is possible to set the newline convention from within the
pattern. A warning is given pattern. A warning is given
if the posix or posix_nosub modifier is used when #newline_default w if the posix or posix_nosub modifier is used when #newline_default
ould set a default for the would set a default for the
non-POSIX API. non-POSIX API.
#pattern <modifier-list> #pattern <modifier-list>
This command sets a default modifier list that applies to all subse quent patterns. Modifiers on This command sets a default modifier list that applies to all subseq uent patterns. Modifiers on
a pattern can change these settings. a pattern can change these settings.
#perltest #perltest
This line is used in test files that can also be processed by perlte This line is used in test files that can also be processed by perl
st.sh to confirm that Perl test.sh to confirm that Perl
gives the same results as PCRE2. Subsequent tests are checked for t gives the same results as PCRE2. Subsequent tests are checked for th
he use of pcre2test features e use of pcre2test features
that are incompatible with the perltest.sh script. that are incompatible with the perltest.sh script.
Patterns must use '/' as their delimiter, and only certain modifie Patterns must use '/' as their delimiter, and only certain modi
rs are supported. Comment fiers are supported. Comment
lines, #pattern commands, and #subject commands that set or uns lines, #pattern commands, and #subject commands that set or unset
et "mark" are recognized and "mark" are recognized and
acted on. The #perltest, #forbid_utf, and #newline_default commands, acted on. The #perltest, #forbid_utf, and #newline_default comma
which are needed in the nds, which are needed in the
relevant pcre2test files, are silently ignored. All other command l relevant pcre2test files, are silently ignored. All other command li
ines are ignored, but give a nes are ignored, but give a
warning message. The #perltest command helps detect tests that are a ccidentally put in the wrong warning message. The #perltest command helps detect tests that are a ccidentally put in the wrong
file or use the wrong delimiter. For more details of the perltest.sh script see the comments it file or use the wrong delimiter. For more details of the perltest.s h script see the comments it
contains. contains.
#pop [<modifiers>] #pop [<modifiers>]
#popcopy [<modifiers>] #popcopy [<modifiers>]
These commands are used to manipulate the stack of compiled patter ns, as described in the sec‐ These commands are used to manipulate the stack of compiled patterns , as described in the sec‐
tion entitled "Saving and restoring compiled patterns" below. tion entitled "Saving and restoring compiled patterns" below.
#save <filename> #save <filename>
This command is used to save a set of compiled patterns to a file, a s described in the section This command is used to save a set of compiled patterns to a file, as described in the section
entitled "Saving and restoring compiled patterns" below. entitled "Saving and restoring compiled patterns" below.
#subject <modifier-list> #subject <modifier-list>
This command sets a default modifier list that applies to all sub sequent subject lines. Modi‐ This command sets a default modifier list that applies to all subseq uent subject lines. Modi‐
fiers on a subject line can change these settings. fiers on a subject line can change these settings.
MODIFIER SYNTAX MODIFIER SYNTAX
Modifier lists are used with both pattern and subject lines. Items i n a list are separated by Modifier lists are used with both pattern and subject lines. Item s in a list are separated by
commas followed by optional white space. Trailing whitespace in a mo difier list is ignored. Some commas followed by optional white space. Trailing whitespace in a mo difier list is ignored. Some
modifiers may be given for both patterns and subject lines, where modifiers may be given for both patterns and subject lines, whereas
as others are valid only for others are valid only for
one or the other. Each modifier has a long name, for example "anchor one or the other. Each modifier has a long name, for example "anch
ed", and some of them must ored", and some of them must
be followed by an equals sign and a value, for example, "offset=12". Values cannot contain comma be followed by an equals sign and a value, for example, "offset=12". Values cannot contain comma
characters, but may contain spaces. Modifiers that do not take value s may be preceded by a minus characters, but may contain spaces. Modifiers that do not take value s may be preceded by a minus
sign to turn off a previous setting. sign to turn off a previous setting.
A few of the more common modifiers can also be specified as single letters, for example "i" for A few of the more common modifiers can also be specified as single letters, for example "i" for
"caseless". In documentation, following the Perl convention, these are written with a slash "caseless". In documentation, following the Perl convention, these are written with a slash
("the /i modifier") for clarity. Abbreviated modifiers must all be concatenated in the first ("the /i modifier") for clarity. Abbreviated modifiers must all be concatenated in the first
item of a modifier list. If the first item is not recognized as a long modifier name, it is in‐ item of a modifier list. If the first item is not recognized as a long modifier name, it is in‐
terpreted as a sequence of these abbreviations. For example: terpreted as a sequence of these abbreviations. For example:
/abc/ig,newline=cr,jit=3 /abc/ig,newline=cr,jit=3
This is a pattern line whose modifier list starts with two one-lette r modifiers (/i and /g). The This is a pattern line whose modifier list starts with two one-lette r modifiers (/i and /g). The
lower-case abbreviated modifiers are the same as used in Perl. lower-case abbreviated modifiers are the same as used in Perl.
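The modifier-list rules above can be illustrated with a small Python parser. It is a sketch under stated assumptions: parse_modifier_list is a hypothetical name, and the LONG_NAMES and ABBREVIATIONS tables are deliberately tiny illustrative subsets rather than the full sets that pcre2test recognizes.

```python
# Illustrative subsets only; pcre2test knows many more modifiers than these.
LONG_NAMES = {"anchored", "caseless", "global", "multiline", "dotall",
              "extended", "newline", "jit", "offset"}
ABBREVIATIONS = {"i": "caseless", "g": "global", "m": "multiline",
                 "s": "dotall", "x": "extended"}


def parse_modifier_list(text):
    """Parse a modifier list into (name, value) pairs.

    value is True (set), False (turned off with a leading minus), or a string.
    """
    parsed = []
    # Trailing white space is ignored; items are separated by commas that may
    # be followed by white space.
    for index, raw in enumerate(text.rstrip().split(",")):
        item = raw.lstrip()
        negated = item.startswith("-")
        body = item[1:] if negated else item
        name, has_value, value = body.partition("=")
        if name in LONG_NAMES:
            if has_value:
                if negated:
                    raise ValueError("a minus sign needs a modifier without a value")
                parsed.append((name, value))
            else:
                parsed.append((name, not negated))
        elif index == 0 and not negated and not has_value:
            # First item is not a long name: read it as concatenated abbreviations.
            for letter in item:
                if letter not in ABBREVIATIONS:
                    raise ValueError("unknown abbreviation: %s" % letter)
                parsed.append((ABBREVIATIONS[letter], True))
        else:
            raise ValueError("unknown modifier: %s" % item)
    return parsed
```

Applied to the list from the example pattern line, "ig,newline=cr,jit=3", this yields caseless and global as set flags followed by newline with value "cr" and jit with value "3".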
PATTERN SYNTAX PATTERN SYNTAX
A pattern line must start with one of the following characters (co mmon symbols, excluding pat‐ A pattern line must start with one of the following characters (comm on symbols, excluding pat‐
tern meta-characters): tern meta-characters):
/ ! " ' ` - = _ : ; , % & @ ~ / ! " ' ` - = _ : ; , % & @ ~
This is interpreted as the pattern's delimiter. A regular expression may be continued over sev‐ This is interpreted as the pattern's delimiter. A regular expressio n may be continued over sev‐
eral input lines, in which case the newline characters are included within it. It is possible to eral input lines, in which case the newline characters are included within it. It is possible to
include the delimiter as a literal within the pattern by escaping it with a backslash, for exam‐ include the delimiter as a literal within the pattern by escaping it with a backslash, for exam‐
ple ple
/abc\/def/ /abc\/def/
If you do this, the escape and the delimiter form part of the pattern, but since the delimiters If you do this, the escape and the delimiter form part of the pattern, but since the delimiters
are all non-alphanumeric, the inclusion of the backslash does not affect the pattern's interpre‐ are all non-alphanumeric, the inclusion of the backslash does not affect the pattern's interpre‐
tation. Note, however, that this trick does not work within \Q...\E literal bracketing because tation. Note, however, that this trick does not work within \Q...\E literal bracketing because
the backslash will itself be interpreted as a literal. If the terminating delimiter is immedi‐ the backslash will itself be interpreted as a literal. If the terminating delimiter is immedi‐
ately followed by a backslash, for example, ately followed by a backslash, for example,
/abc/\ /abc/\
a backslash is added to the end of the pattern. This is done to prov ide a way of testing the er‐ a backslash is added to the end of the pattern. This is done to prov ide a way of testing the er‐
ror condition that arises if a pattern finishes with a backslash, be cause ror condition that arises if a pattern finishes with a backslash, be cause
/abc\/ /abc\/
is interpreted as the first line of a pattern that starts with "abc/ ", causing pcre2test to read is interpreted as the first line of a pattern that starts with "abc/ ", causing pcre2test to read
the next line as a continuation of the regular expression. the next line as a continuation of the regular expression.
A pattern can be followed by a modifier list (details below). A pattern can be followed by a modifier list (details below).
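The delimiter rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not pcre2test's own code: the function name split_pattern_line is hypothetical, the line is assumed to be non-empty, and \Q...\E bracketing is not handled.

```python
def split_pattern_line(line):
    """Split a pattern line into (pattern, modifier_text).

    Returns None when no terminating delimiter is found, which is the
    case where the next input line would be read as a continuation.
    """
    delim = line[0]          # the first character is the delimiter
    i = 1
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            i += 2           # a backslash escapes the next character
            continue
        if ch == delim:
            pattern, rest = line[1:i], line[i + 1:]
            # A backslash right after the closing delimiter is
            # appended to the pattern, as described above.
            if rest.startswith("\\"):
                pattern += "\\"
                rest = rest[1:]
            return pattern, rest
        i += 1
    return None


print(split_pattern_line("/abc\\/def/"))
print(split_pattern_line("/abc/\\"))
print(split_pattern_line("/abc\\/"))
```

In this sketch the escaped delimiter stays in the pattern together with its backslash, and /abc\/ yields None because the only slash after the opening one is escaped.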
SUBJECT LINE SYNTAX SUBJECT LINE SYNTAX
Before each subject line is passed to pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match(), Before each subject line is passed to pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match(),
leading and trailing white space is removed, and the line is scanned for backslash escapes, un‐ leading and trailing white space is removed, and the line is scanned for backslash escapes, un‐
less the subject_literal modifier was set for the pattern. The following provide a means of en‐ less the subject_literal modifier was set for the pattern. The following provide a means of en‐
coding non-printing characters in a visible way: coding non-printing characters in a visible way:
\a alarm (BEL, \x07) \a alarm (BEL, \x07)
\b backspace (\x08) \b backspace (\x08)
\e escape (\x1b) \e escape (\x1b)
\f form feed (\x0c) \f form feed (\x0c)
\n newline (\x0a) \n newline (\x0a)
\r carriage return (\x0d) \N{U+hh...} Unicode character (any number of hex digits)
\t tab (\x09) \r carriage return (\x0d)
\v vertical tab (\x0b) \t tab (\x09)
\nnn octal character (up to 3 octal digits); always \v vertical tab (\x0b)
a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode \ddd octal number (up to 3 octal digits); represents a single
\o{dd...} octal character (any number of octal digits) code point unless larger than 255 with the 8-bit library
\xhh hexadecimal byte (up to 2 hex digits) \o{dd...} octal number (any number of octal digits) representing a
\x{hh...} hexadecimal character (any number of hex digits) character in UTF mode or a code point
The use of \x{hh...} is not dependent on the use of the utf modifier on the pattern. It is rec‐ \xhh hexadecimal byte (up to 2 hex digits)
ognized always. There may be any number of hexadecimal digits inside the braces; invalid values \x{hh...} hexadecimal number (up to 8 hex digits) representing a
provoke error messages. character in UTF mode or a code point
Note that \xhh specifies one byte rather than one character in UTF-8 mode; this makes it possi‐ Invoking \N{U+hh...} or \x{hh...} doesn't require the use of the utf modifier on the pattern. It
ble to construct invalid UTF-8 sequences for testing purposes. On the other hand, \x{hh} is in‐ is always recognized. There may be any number of hexadecimal digits inside the braces; invalid
terpreted as a UTF-8 character in UTF-8 mode, generating more than one byte if the value is values provoke error messages, but when using \N{U+hh...} with some invalid Unicode characters
greater than 127. When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte they will be accepted with a warning instead.
for values less than 256, and causes an error for greater values. Note that even in UTF-8 mode, \xhh (and depending on how large, \ddd) describe one byte rather
than one character; this makes it possible to construct invalid UTF-8 sequences for testing pur‐
poses. On the other hand, \x{hh...} is interpreted as a UTF-8 character in UTF-8 mode, only gen‐
erating more than one byte if the value is greater than 127. To avoid the ambiguity it is pre‐
ferred to use \N{U+hh...} when describing characters. When testing the 8-bit library not in
UTF-8 mode, \x{hh} generates one byte for values that could fit in it, and causes an error for
greater values.
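The difference between \xhh and \x{hh...} for the 8-bit library can be illustrated with a small Python sketch. The helper names encode_xhh and encode_x_braced are hypothetical and only model the byte output described above; they are not part of pcre2test.

```python
def encode_xhh(value):
    # \xhh: always exactly one byte, even in UTF-8 mode.
    return bytes([value])


def encode_x_braced(value, utf):
    # \x{hh...}: a UTF-8 encoded character in UTF-8 mode; otherwise a
    # single byte, with an error for values that do not fit in a byte.
    if utf:
        return chr(value).encode("utf-8")
    if value > 0xFF:
        raise ValueError("value does not fit in one byte")
    return bytes([value])


print(encode_xhh(0xE9))                  # one byte, not valid UTF-8 alone
print(encode_x_braced(0xE9, utf=True))   # two bytes in UTF-8 mode
print(encode_x_braced(0xE9, utf=False))  # one byte outside UTF-8 mode
```

The single byte from encode_xhh(0xE9) is not a complete UTF-8 sequence, which is how \xhh lets a test construct deliberately invalid UTF-8 input.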
In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it possible to construct When testing the 16-bit library, not in UTF-16 mode, all 4-digit \x{hhhh} values are accepted.
invalid UTF-16 sequences for testing purposes. This makes it possible to construct invalid UTF-16 sequences for testing purposes.
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it possible to con‐ When testing the 32-bit library, not in UTF-32 mode, all 4 to 8-digit \x{...} values are ac‐
struct invalid UTF-32 sequences for testing purposes. cepted. This makes it possible to construct invalid UTF-32 sequences for testing purposes.
There is a special backslash sequence that specifies replication of one or more characters: There is a special backslash sequence that specifies replication of one or more characters:
\[<characters>]{<count>} \[<characters>]{<count>}
This makes it possible to test long strings without having to provide them as part of the file. This makes it possible to test long strings without having to provide them as part of the file.
For example: For example:
\[abc]{4} \[abc]{4}
is converted to "abcabcabcabc". This feature does not support nesting. To include a closing is converted to "abcabcabcabc". This feature does not support nesting. To include a closing
square bracket in the characters, code it as \x5D. square bracket in the characters, code it as \x5D.
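The replication syntax can be modelled with a short Python sketch. The name expand_replication is hypothetical, and the sketch leaves any backslash escapes inside the brackets (such as \x5D) untouched for a later pass, which is an assumption about ordering rather than a statement about pcre2test internals.

```python
import re

# Matches \[<characters>]{<count>}; no nesting, and the characters
# may not contain a literal closing square bracket.
_REPLICATION = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")


def expand_replication(subject):
    # Replace each replication item with its characters repeated count times.
    return _REPLICATION.sub(lambda m: m.group(1) * int(m.group(2)), subject)


print(expand_replication(r"\[abc]{4}"))
```

Because the character class excludes the closing bracket, a literal bracket has to be written as an escape, matching the advice above.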
A backslash followed by an equals sign marks the end of the subject string and the start of a A backslash followed by an equals sign marks the end of the subject string and the start of a
modifier list. For example: modifier list. For example:
abc\=notbol,notempty abc\=notbol,notempty
If the subject string is empty and \= is followed by whitespace, the line is treated as a com‐ If the subject string is empty and \= is followed by whitespace, the line is treated as a com‐
ment line, and is not used for matching. For example: ment line, and is not used for matching. For example:
\= This is a comment. \= This is a comment.
abc\= This is an invalid modifier list. abc\= This is an invalid modifier list.
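As a rough illustration of how a subject line divides at \=, the following Python sketch scans for the first unescaped backslash-equals pair. The function name split_subject_line and its return shape are hypothetical, and the sketch does not validate modifier names, so the second example above would be split rather than rejected.

```python
def split_subject_line(line):
    """Return (kind, subject, modifiers) for one subject line."""
    i = 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            if line[i + 1] == "=":
                subject, rest = line[:i], line[i + 2:]
                # Empty subject and whitespace after \= means a comment.
                if subject == "" and rest[:1].isspace():
                    return ("comment", "", [])
                modifiers = [m for m in rest.split(",") if m]
                return ("subject", subject, modifiers)
            i += 2  # skip the escaped character, e.g. a doubled backslash
        else:
            i += 1
    return ("subject", line, [])


print(split_subject_line("abc\\=notbol,notempty"))
print(split_subject_line("\\= This is a comment."))
```

Skipping two characters after a backslash means a doubled backslash followed by an equals sign is not mistaken for the start of a modifier list.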
A backslash followed by any other non-alphanumeric character just escapes that character. A A backslash followed by any other non-alphanumeric character just escapes that character. A
backslash followed by anything else causes an error. However, if the very last character in the backslash followed by anything else causes an error. However, if the very last character in the
line is a backslash (and there is no modifier list), it is ignored. This gives a way of passing line is a backslash (and there is no modifier list), it is ignored. This gives a way of passing
an empty line as data, since a real empty line terminates the data input. an empty line as data, since a real empty line terminates the data input.
If the subject_literal modifier is set for a pattern, all subject lines that follow are treated If the subject_literal modifier is set for a pattern, all subject lines that follow are treated
as literals, with no special treatment of backslashes. No replication is possible, and any sub‐ as literals, with no special treatment of backslashes. No replication is possible, and any sub‐
ject modifiers must be set as defaults by a #subject command. ject modifiers must be set as defaults by a #subject command.
PATTERN MODIFIERS PATTERN MODIFIERS
There are several types of modifier that can appear in pattern lines. Except where noted below, There are several types of modifier that can appear in pattern lines. Except where noted below,
they may also be used in #pattern commands. A pattern's modifier list can add to or override de‐ they may also be used in #pattern commands. A pattern's modifier list can add to or override de‐
fault modifiers that were set by a previous #pattern command. fault modifiers that were set by a previous #pattern command.
Setting compilation options Setting compilation options
The following modifiers set options for pcre2_compile(). Most of them set bits in the options The following modifiers set options for pcre2_compile(). Most of them set bits in the options
argument of that function, but those whose names start with PCRE2_EXTRA are additional options argument of that function, but those whose names start with PCRE2_EXTRA are additional options
that are set in the compile context. Some of these options have single-letter abbreviations. that are set in the compile context. Some of these options have single-letter abbreviations.
There is special handling for /x: if a second x is present, PCRE2_EXTENDED is converted into There is special handling for /x: if a second x is present, PCRE2_EXTENDED is converted into
PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EXTENDED as well, though this PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EXTENDED as well, though this
makes no difference to the way pcre2_compile() behaves. See pcre2api for a description of the makes no difference to the way pcre2_compile() behaves. See pcre2api for a description of the
effects of these options. effects of these options.
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
alt_bsux set PCRE2_ALT_BSUX alt_bsux set PCRE2_ALT_BSUX
alt_circumflex set PCRE2_ALT_CIRCUMFLEX alt_circumflex set PCRE2_ALT_CIRCUMFLEX
alt_extended_class set PCRE2_ALT_EXTENDED_CLASS
alt_verbnames set PCRE2_ALT_VERBNAMES alt_verbnames set PCRE2_ALT_VERBNAMES
anchored set PCRE2_ANCHORED anchored set PCRE2_ANCHORED
/a ascii_all set all ASCII options /a ascii_all set all ASCII options
ascii_bsd set PCRE2_EXTRA_ASCII_BSD ascii_bsd set PCRE2_EXTRA_ASCII_BSD
ascii_bss set PCRE2_EXTRA_ASCII_BSS ascii_bss set PCRE2_EXTRA_ASCII_BSS
ascii_bsw set PCRE2_EXTRA_ASCII_BSW ascii_bsw set PCRE2_EXTRA_ASCII_BSW
ascii_digit set PCRE2_EXTRA_ASCII_DIGIT ascii_digit set PCRE2_EXTRA_ASCII_DIGIT
ascii_posix set PCRE2_EXTRA_ASCII_POSIX ascii_posix set PCRE2_EXTRA_ASCII_POSIX
auto_callout set PCRE2_AUTO_CALLOUT auto_callout set PCRE2_AUTO_CALLOUT
bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
skipping to change at line 491 skipping to change at line 508
/xx extended_more set PCRE2_EXTENDED_MORE /xx extended_more set PCRE2_EXTENDED_MORE
extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
firstline set PCRE2_FIRSTLINE firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE match_line set PCRE2_EXTRA_MATCH_LINE
match_invalid_utf set PCRE2_MATCH_INVALID_UTF match_invalid_utf set PCRE2_MATCH_INVALID_UTF
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
match_word set PCRE2_EXTRA_MATCH_WORD match_word set PCRE2_EXTRA_MATCH_WORD
/m multiline set PCRE2_MULTILINE /m multiline set PCRE2_MULTILINE
never_backslash_c set PCRE2_NEVER_BACKSLASH_C never_backslash_c set PCRE2_NEVER_BACKSLASH_C
never_callout set PCRE2_EXTRA_NEVER_CALLOUT
never_ucp set PCRE2_NEVER_UCP never_ucp set PCRE2_NEVER_UCP
never_utf set PCRE2_NEVER_UTF never_utf set PCRE2_NEVER_UTF
/n no_auto_capture set PCRE2_NO_AUTO_CAPTURE /n no_auto_capture set PCRE2_NO_AUTO_CAPTURE
no_auto_possess set PCRE2_NO_AUTO_POSSESS no_auto_possess set PCRE2_NO_AUTO_POSSESS
no_bs0 set PCRE2_EXTRA_NO_BS0
no_dotstar_anchor set PCRE2_NO_DOTSTAR_ANCHOR no_dotstar_anchor set PCRE2_NO_DOTSTAR_ANCHOR
no_start_optimize set PCRE2_NO_START_OPTIMIZE no_start_optimize set PCRE2_NO_START_OPTIMIZE
no_utf_check set PCRE2_NO_UTF_CHECK no_utf_check set PCRE2_NO_UTF_CHECK
python_octal set PCRE2_EXTRA_PYTHON_OCTAL
turkish_casing set PCRE2_EXTRA_TURKISH_CASING
ucp set PCRE2_UCP ucp set PCRE2_UCP
ungreedy set PCRE2_UNGREEDY ungreedy set PCRE2_UNGREEDY
use_offset_limit set PCRE2_USE_OFFSET_LIMIT use_offset_limit set PCRE2_USE_OFFSET_LIMIT
utf set PCRE2_UTF utf set PCRE2_UTF
As well as turning on the PCRE2_UTF option, the utf modifier causes all non-printing characters As well as turning on the PCRE2_UTF option, the utf modifier causes all non-printing characters
in output strings to be printed using the \x{hh...} notation. Otherwise, those less than 0x100 in output strings to be printed using the \x{hh...} notation. Otherwise, those less than 0x100
are output in hex without the curly brackets. Setting utf in 16-bit or 32-bit mode also causes are output in hex without the curly brackets. Setting utf in 16-bit or 32-bit mode also causes
pattern and subject strings to be translated to UTF-16 or UTF-32, respectively, before being pattern and subject strings to be translated to UTF-16 or UTF-32, respectively, before being
passed to library functions. passed to library functions.
The following modifiers enable or disable performance optimizations by calling pcre2_set_opti‐
mize() before invoking the regex compiler.
optimization_full enable all optional optimizations
optimization_none disable all optional optimizations
auto_possess auto-possessify variable quantifiers
auto_possess_off don't auto-possessify variable quantifiers
dotstar_anchor anchor patterns starting with .*
dotstar_anchor_off don't anchor patterns starting with .*
start_optimize enable pre-scan of subject string
start_optimize_off disable pre-scan of subject string
See the pcre2_set_optimize documentation for details on these optimizations.
Setting compilation controls Setting compilation controls
The following modifiers affect the compilation process or request information about the pattern. The following modifiers affect the compilation process or request information about the pattern.
There are single-letter abbreviations for some that are heavily used in the test files. There are single-letter abbreviations for some that are heavily used in the test files.
bsr=[anycrlf|unicode] specify \R handling
/B bincode show binary code without lengths /B bincode show binary code without lengths
bsr=[anycrlf|unicode] specify \R handling
callout_info show callout information callout_info show callout information
convert=<options> request foreign pattern conversion convert=<options> request foreign pattern conversion
convert_glob_escape=c set glob escape character convert_glob_escape=c set glob escape character
convert_glob_separator=c set glob separator character convert_glob_separator=c set glob separator character
convert_length set convert buffer length convert_length set convert buffer length
debug same as info,fullbincode debug same as info,fullbincode
expand expand repetition syntax in pattern
framesize show matching frame size framesize show matching frame size
fullbincode show binary code with lengths fullbincode show binary code with lengths
/I info show info about compiled pattern /I info show info about compiled pattern
hex unquoted characters are hexadecimal hex unquoted characters are hexadecimal
jit[=<number>] use JIT jit[=<number>] use JIT
jitfast use JIT fast path jitfast use JIT fast path
jitverify verify JIT use jitverify verify JIT use
locale=<name> use this locale locale=<name> use this locale
max_pattern_compiled ) set maximum compiled pattern max_pattern_compiled ) set maximum compiled pattern
_length=<n> ) length (bytes) _length=<n> ) length (bytes)
skipping to change at line 543 skipping to change at line 579
max_varlookbehind=<n> set maximum variable lookbehind length max_varlookbehind=<n> set maximum variable lookbehind length
memory show memory used memory show memory used
newline=<type> set newline type newline=<type> set newline type
null_context compile with a NULL context null_context compile with a NULL context
null_pattern pass pattern as NULL null_pattern pass pattern as NULL
parens_nest_limit=<n> set maximum parentheses depth parens_nest_limit=<n> set maximum parentheses depth
posix use the POSIX API posix use the POSIX API
posix_nosub use the POSIX API with REG_NOSUB posix_nosub use the POSIX API with REG_NOSUB
push push compiled pattern onto the stack push push compiled pattern onto the stack
pushcopy push a copy onto the stack pushcopy push a copy onto the stack
pushtablescopy push a copy with tables onto the stack
stackguard=<number> test the stackguard feature stackguard=<number> test the stackguard feature
subject_literal treat all subject lines as literal subject_literal treat all subject lines as literal
tables=[0|1|2|3] select internal tables tables=[0|1|2|3] select internal tables
use_length do not zero-terminate the pattern use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8 utf8_input treat input as UTF-8
The effects of these modifiers are described in the following sections. The effects of these modifiers are described in the following sections.
Newline and \R handling Newline and \R handling
skipping to change at line 858 skipping to change at line 895
allvector show the entire ovector allvector show the entire ovector
allusedtext show all consulted text allusedtext show all consulted text
altglobal alternative global matching altglobal alternative global matching
/g global global matching /g global global matching
heapframes_size show match data heapframes size heapframes_size show match data heapframes size
jitstack=<n> set size of JIT stack jitstack=<n> set size of JIT stack
mark show mark values mark show mark values
replace=<string> specify a replacement string replace=<string> specify a replacement string
startchar show starting character when relevant startchar show starting character when relevant
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_case_callout use substitution case callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=<n> skip substitution <n> substitute_skip=<n> skip substitution <n>
substitute_stop=<n> skip substitution <n> and following substitute_stop=<n> skip substitution <n> and following
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
skipping to change at line 925 skipping to change at line 963
The convert_glob_escape and convert_glob_separator modifiers can be used to specify the escape The convert_glob_escape and convert_glob_separator modifiers can be used to specify the escape
and separator characters for glob processing, overriding the defaults, which are operating-sys‐ and separator characters for glob processing, overriding the defaults, which are operating-sys‐
tem dependent. tem dependent.
SUBJECT MODIFIERS SUBJECT MODIFIERS
The modifiers that can appear in subject lines and the #subject command are of two types. The modifiers that can appear in subject lines and the #subject command are of two types.
Setting match options Setting match options
The following modifiers set options for pcre2_match() or pcre2_dfa_match(). See pcreapi for a The following modifiers set options for pcre2_match() or pcre2_dfa_match(). See pcre2api for a
description of their effects. description of their effects.
anchored set PCRE2_ANCHORED anchored set PCRE2_ANCHORED
copy_matched_subject set PCRE2_COPY_MATCHED_SUBJECT
endanchored set PCRE2_ENDANCHORED endanchored set PCRE2_ENDANCHORED
dfa_restart set PCRE2_DFA_RESTART dfa_restart set PCRE2_DFA_RESTART
dfa_shortest set PCRE2_DFA_SHORTEST dfa_shortest set PCRE2_DFA_SHORTEST
disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK
no_jit set PCRE2_NO_JIT no_jit set PCRE2_NO_JIT
no_utf_check set PCRE2_NO_UTF_CHECK no_utf_check set PCRE2_NO_UTF_CHECK
notbol set PCRE2_NOTBOL notbol set PCRE2_NOTBOL
notempty set PCRE2_NOTEMPTY notempty set PCRE2_NOTEMPTY
notempty_atstart set PCRE2_NOTEMPTY_ATSTART notempty_atstart set PCRE2_NOTEMPTY_ATSTART
noteol set PCRE2_NOTEOL noteol set PCRE2_NOTEOL
skipping to change at line 972 skipping to change at line 1011
Setting match controls Setting match controls
The following modifiers affect the matching process or request additional information. Some of The following modifiers affect the matching process or request additional information. Some of
them may also be specified on a pattern line (see above), in which case they apply to every sub‐ them may also be specified on a pattern line (see above), in which case they apply to every sub‐
ject line that is matched against that pattern, but can be overridden by modifiers on the sub‐ ject line that is matched against that pattern, but can be overridden by modifiers on the sub‐
ject. ject.
aftertext show text after match aftertext show text after match
allaftertext show text after captures allaftertext show text after captures
allcaptures show all captures allcaptures show all captures
allvector show the entire ovector
allusedtext show all consulted text (non-JIT only) allusedtext show all consulted text (non-JIT only)
allvector show the entire ovector
altglobal alternative global matching altglobal alternative global matching
callout_capture show captures at callout time callout_capture show captures at callout time
callout_data=<n> set a value to pass via callouts callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error callout_error=<n>[:<m>] control callout error
callout_extra show extra callout information callout_extra show extra callout information
callout_fail=<n>[:<m>] control callout failure callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout callout_no_where do not show position of a callout
callout_none do not supply a callout function callout_none do not supply a callout function
copy=<number or name> copy captured substring copy=<number or name> copy captured substring
depth_limit=<n> set a depth limit depth_limit=<n> set a depth limit
skipping to change at line 1007 skipping to change at line 1046
null_replacement substitute with NULL replacement null_replacement substitute with NULL replacement
null_subject match with NULL subject null_subject match with NULL subject
offset=<n> set starting offset offset=<n> set starting offset
offset_limit=<n> set offset limit offset_limit=<n> set offset limit
ovector=<n> set size of output vector ovector=<n> set size of output vector
recursion_limit=<n> obsolete synonym for depth_limit recursion_limit=<n> obsolete synonym for depth_limit
replace=<string> specify a replacement string replace=<string> specify a replacement string
startchar show startchar when relevant startchar show startchar when relevant
startoffset=<n> same as offset=<n> startoffset=<n> same as offset=<n>
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED substitute_case_callout use substitution case callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=<n> skip substitution number n substitute_skip=<n> skip substitution number n
substitute_stop=<n> skip substitution number n and greater substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated zero_terminate pass the subject as zero-terminated
skipping to change at line 1236 skipping to change at line 1276
1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED" 1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
2(1) Old 6 9 "abc" New 6 11 "<abc>" 2(1) Old 6 9 "abc" New 6 11 "<abc>"
2: abcdef<abc>pqr 2: abcdef<abc>pqr
abcdefabcpqr\=substitute_stop=1 abcdefabcpqr\=substitute_stop=1
1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED" 1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
1: abcdefabcpqr 1: abcdefabcpqr
If both are set for the same number, stop takes precedence. Only a single skip or stop is sup‐ If both are set for the same number, stop takes precedence. Only a single skip or stop is sup‐
ported, which is sufficient for testing that the feature works. ported, which is sufficient for testing that the feature works.
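The skip and stop behaviour shown in the output above can be imitated with Python's re module. This is a sketch under stated assumptions: the pattern is taken to match "abc" and the replacement to wrap the match in angle brackets, which is what the sample output suggests, and the function substitute_all is hypothetical and unrelated to the PCRE2 API.

```python
import re


def substitute_all(pattern, wrap, subject, skip=None, stop=None):
    # Global substitution where match number `skip` is left unchanged
    # and matches numbered `stop` or greater are not substituted.
    out, pos = [], 0
    for number, m in enumerate(re.finditer(pattern, subject), start=1):
        if stop is not None and number >= stop:
            break  # checked first, so stop takes precedence over skip
        out.append(subject[pos:m.start()])
        out.append(m.group(0) if number == skip else wrap(m.group(0)))
        pos = m.end()
    out.append(subject[pos:])
    return "".join(out)


def bracket(text):
    return "<" + text + ">"


print(substitute_all("abc", bracket, "abcdefabcpqr", skip=1))
print(substitute_all("abc", bracket, "abcdefabcpqr", stop=1))
```

With skip=1 only the second occurrence is bracketed, and with stop=1 the subject comes back unchanged, in line with the two result lines in the sample output.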
Testing substitute case callouts
If the substitute_case_callout modifier is set, a substitution case callout function is set up.
The callout function is called for each substituted chunk which is to be case-transformed.
The callout function passed is a fixed function with implementation for certain behaviours: in‐
puts which shrink when case-transformed; inputs which grow; inputs with distinct upper/lower/ti‐
tlecase forms. The characters which are not special-cased for testing purposes are left unmodi‐
fied, as if they are caseless characters.
Setting the JIT stack size Setting the JIT stack size
The jitstack modifier provides a way of setting the maximum stack size that is used by the just- The jitstack modifier provides a way of setting the maximum stack size that is used by the just-
in-time optimization code. It is ignored if JIT optimization is not being used. The value is a in-time optimization code. It is ignored if JIT optimization is not being used. The value is a
number of kibibytes (units of 1024 bytes). Setting zero reverts to the default of 32KiB. Provid‐ number of kibibytes (units of 1024 bytes). Setting zero reverts to the default of 32KiB. Provid‐
ing a stack that is larger than the default is necessary only for very complicated patterns. If ing a stack that is larger than the default is necessary only for very complicated patterns. If
jitstack is set non-zero on a subject line it overrides any value that was set on the pattern. jitstack is set non-zero on a subject line it overrides any value that was set on the pattern.
Setting heap, match, and depth limits Setting heap, match, and depth limits
The heap_limit, match_limit, and depth_limit modifiers set the appropriate limits in the match The heap_limit, match_limit, and depth_limit modifiers set the appropriate limits in the match
context. These values are ignored when the find_limits or find_limits_noheap modifier is speci‐ context. These values are ignored when the find_limits or find_limits_noheap modifier is speci‐
fied. fied.
Finding minimum limits Finding minimum limits
If the find_limits modifier is present on a subject line, pcre2test calls the relevant matching If the find_limits modifier is present on a subject line, pcre2test calls the relevant matching
function several times, setting different values in the match context via function several times, setting different values in the match context via
pcre2_set_heap_limit(), pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the pcre2_set_heap_limit(), pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
smallest value for each parameter that allows the match to complete without a "limit exceeded" smallest value for each parameter that allows the match to complete without a "limit exceeded"
error. The match itself may succeed or fail. An alternative modifier, find_limits_noheap, omits error. The match itself may succeed or fail. An alternative modifier, find_limits_noheap, omits
the heap limit. This is used in the standard tests, because the minimum heap limit varies be‐ the heap limit. This is used in the standard tests, because the minimum heap limit varies be‐
tween systems. If JIT is being used, only the match limit is relevant, and the other two are au‐ tween systems. If JIT is being used, only the match limit is relevant, and the other two are au‐
tomatically omitted. tomatically omitted.
When using this modifier, the pattern should not contain any limit settings such as When using this modifier, the pattern should not contain any limit settings such as
(*LIMIT_MATCH=...) within it. If such a setting is present and is lower than the minimum match‐ (*LIMIT_MATCH=...) within it. If such a setting is present and is lower than the minimum match‐
ing value, the minimum value cannot be found because pcre2_set_match_limit() etc. are only able ing value, the minimum value cannot be found because pcre2_set_match_limit() etc. are only able
to reduce the value of an in-pattern limit; they cannot increase it. to reduce the value of an in-pattern limit; they cannot increase it.
For non-DFA matching, the minimum depth_limit number is a measure of how much nested backtrack‐ For non-DFA matching, the minimum depth_limit number is a measure of how much nested backtrack‐
ing happens (that is, how deeply the pattern's tree is searched). In the case of DFA matching, ing happens (that is, how deeply the pattern's tree is searched). In the case of DFA matching,
depth_limit controls the depth of recursive calls of the internal function that is used for han‐ depth_limit controls the depth of recursive calls of the internal function that is used for han‐
dling pattern recursion, lookaround assertions, and atomic groups. dling pattern recursion, lookaround assertions, and atomic groups.
For non-DFA matching, the match_limit number is a measure of the amount of backtracking that For non-DFA matching, the match_limit number is a measure of the amount of backtracking that
takes place, and learning the minimum value can be instructive. For most simple matches, the takes place, and learning the minimum value can be instructive. For most simple matches, the
number is quite small, but for patterns with very large numbers of matching possibilities, it number is quite small, but for patterns with very large numbers of matching possibilities, it
can become large very quickly with increasing length of subject string. In the case of DFA can become large very quickly with increasing length of subject string. In the case of DFA
matching, match_limit controls the total number of calls, both recursive and non-recursive, to matching, match_limit controls the total number of calls, both recursive and non-recursive, to
the internal matching function, thus controlling the overall amount of computing resource that the internal matching function, thus controlling the overall amount of computing resource that
is used. is used.
For both kinds of matching, the heap_limit number, which is in kibibytes (units of 1024 bytes), For both kinds of matching, the heap_limit number, which is in kibibytes (units of 1024 bytes),
limits the amount of heap memory used for matching. limits the amount of heap memory used for matching.
Showing MARK names
The mark modifier causes the names from backtracking control verbs that are returned from calls
to pcre2_match() to be displayed. If a mark is returned for a match, non-match, or partial
match, pcre2test shows it. For a match, it is on a line by itself, tagged with "MK:".
Otherwise, it is added to the non-match message.
Showing memory usage
The memory modifier causes pcre2test to log the sizes of all heap memory allocation and freeing
calls that occur during a call to pcre2_match() or pcre2_dfa_match(). In the latter case, heap
memory is used only when a match requires more internal workspace than the default allocation on
the stack, so in many cases there will be no output. No heap memory is allocated during matching
with JIT. For this modifier to work, the null_context modifier must not be set on both the
pattern and the subject, though it can be set on one or the other.
Showing the heap frame overall vector size
The heapframes_size modifier is relevant for matches using pcre2_match() without JIT. After a
match has run (whether successful or not) the size, in bytes, of the allocated heap frames
vector that is left attached to the match data block is shown. If the matching action involved
several calls to pcre2_match() (for example, global matching or for timing) only the final value
is shown.
This modifier is ignored, with a warning, for POSIX or DFA matching. JIT matching does not use
the heap frames vector, so the size is always zero, unless there was a previous non-JIT match.
Note that specifying a size of zero for the output vector (see below) causes pcre2test to free
its match data block (and associated heap frames vector) and allocate a new one.
Setting a starting offset
The offset modifier sets an offset in the subject string at which matching starts. Its value is
a number of code units, not characters.
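As an illustration of why the distinction matters, the following Python sketch converts a
character index into a code unit offset for each library width. The helper name
code_unit_offset is hypothetical and is not part of pcre2test, and the sketch assumes the
subject is being treated as UTF; in non-UTF mode every code unit is simply one character.

```python
def code_unit_offset(subject, char_index, width=8):
    """Return the number of code units that precede subject[char_index].

    width is the code unit width in bits (8, 16 or 32). This is an
    illustration only; it assumes UTF encoding of the subject.
    """
    codec = {8: "utf-8", 16: "utf-16-le", 32: "utf-32-le"}[width]
    encoded_prefix = subject[:char_index].encode(codec)
    return len(encoded_prefix) // (width // 8)


subject = "café au lait"
char_index = subject.index("au")          # counted in characters
for width in (8, 16, 32):
    print(width, code_unit_offset(subject, char_index, width))
```

In the 8-bit case the accented letter occupies two code units, so the value to give to the
offset modifier is one larger than the character index.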
Setting an offset limit
The offset_limit modifier sets a limit for unanchored matches. If a match cannot be found
starting at or before this offset in the subject, a "no match" return is given. The data value
is a number of code units, not characters. When this modifier is used, the use_offset_limit
modifier must have been set for the pattern; if not, an error is generated.
Setting the size of the output vector
The ovector modifier applies only to the subject line in which it appears, though of course it
can also be used to set a default in a #subject command. It specifies the number of pairs of
offsets that are available for storing matching information. The default is 15.
A value of zero is useful when testing the POSIX API because it causes regexec() to be called
with a NULL capture vector. When not testing the POSIX API, a value of zero is used to cause
pcre2_match_data_create_from_pattern() to be called, in order to create a new match block of
exactly the right size for the pattern. (It is not possible to create a match block with a
zero-length ovector; there is always at least one pair of offsets.) The old match data block is
freed.
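To make "pairs of offsets" concrete, here is a rough Python sketch that flattens the group
spans of a match into a vector of start/end pairs, truncated to a given number of pairs. It
uses Python's re module as a stand-in, not PCRE2, and the function name ovector_pairs is
hypothetical. Python reports an unset group as (-1, -1), which is only an analogue of the way
the library marks unset offsets.

```python
import re


def ovector_pairs(match, pairs=15):
    """Flatten (start, end) for group 0, 1, 2, ... into one list.

    Only the first `pairs` groups are kept, mimicking a vector that has
    room for that many pairs of offsets.
    """
    vector = []
    available = min(match.re.groups + 1, pairs)
    for group in range(available):
        vector.extend(match.span(group))
    return vector


m = re.search(r"^abc(\d+)", "abc123")
print(ovector_pairs(m))            # pair 0 is the whole match, pair 1 the capture
print(ovector_pairs(m, pairs=1))   # room for the whole match only
```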
Passing the subject as zero-terminated
By default, the subject string is passed to a native API matching function with its correct
length. In order to test the facility for passing a zero-terminated string, the zero_terminate
modifier is provided. It causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching
via the POSIX interface, this modifier is ignored, with a warning.
When testing pcre2_substitute(), this modifier also has the effect of passing the replacement
string as zero-terminated.
Passing a NULL context, subject, or replacement
Normally, pcre2test passes a context block to pcre2_match(), pcre2_dfa_match(),
pcre2_jit_match() or pcre2_substitute(). If the null_context modifier is set, however, NULL is
passed. This is for testing that the matching and substitution functions behave correctly in
this case (they use default values). This modifier cannot be used with the find_limits,
find_limits_noheap, or substitute_callout modifiers.
Similarly, for testing purposes, if the null_subject or null_replacement modifier is set, the
subject or replacement string pointers are passed as NULL, respectively, to the relevant
functions.
THE ALTERNATIVE MATCHING FUNCTION
By default, pcre2test uses the standard PCRE2 matching function, pcre2_match(), to match each
subject line. PCRE2 also supports an alternative matching function, pcre2_dfa_match(), which
operates in a different way, and has some restrictions. The differences between the two
functions are described in the pcre2matching documentation.
If the dfa modifier is set, the alternative matching function is used. This function finds all
possible matches at a given point in the subject. If, however, the dfa_shortest modifier is set,
processing stops after the first match is found. This is always the shortest possible match.
DEFAULT OUTPUT FROM pcre2test
This section describes the output when the normal matching function, pcre2_match(), is being
used.
When a match succeeds, pcre2test outputs the list of captured substrings, starting with number 0
for the string that matched the whole pattern. Otherwise, it outputs "No match" when the return
is PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially matching substring when
the return is PCRE2_ERROR_PARTIAL. (Note that this is the entire substring that was inspected
during the partial match; it may include characters before the actual match start if a
lookbehind assertion, \K, \b, or \B was involved.)
For any other return, pcre2test outputs the PCRE2 negative error number and a short descriptive
phrase. If the error is a failed UTF string check, the code unit offset of the start of the
failing character is also output. Here is an example of an interactive pcre2test run.
  $ pcre2test
  PCRE2 version 10.22 2016-07-29
    re> /^abc(\d+)/
  data> abc123
   0: abc123
   1: 123
  data> xyz
  No match
Unset capturing substrings that are not followed by one that is set are not shown by pcre2test
unless the allcaptures modifier is specified. In the following example, there are two capturing
substrings, but when the first data line is matched, the second, unset substring is not shown.
An "internal" unset substring is shown as "<unset>", as for the second data line.
    re> /(a)|(b)/
  data> a
   0: a
   1: a
  data> b
   0: b
   1: <unset>
   2: b
If the strings contain any non-printing characters, they are output as \xhh escapes if the value
is less than 256 and UTF mode is not set. Otherwise they are output as \x{hh...} escapes. See
below for the definition of non-printing characters. If the aftertext modifier is set, the
output for substring 0 is followed by the rest of the subject string, identified by "0+" like
this:
    re> /cat/aftertext
  data> cataract
   0: cat
   0+ aract
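The "0+" line is simply the part of the subject that follows the end of the overall match. A
small Python sketch of the same idea follows; it uses Python's re module rather than PCRE2, the
function name show_with_aftertext is hypothetical, and the output layout is only an
approximation of what pcre2test prints.

```python
import re


def show_with_aftertext(pattern, subject):
    """Return display lines for substring 0 and the text that follows it."""
    m = re.search(pattern, subject)
    if m is None:
        return ["No match"]
    rest = subject[m.end():]          # everything after the whole match
    return [" 0: " + m.group(0), " 0+ " + rest]


for line in show_with_aftertext(r"cat", "cataract"):
    print(line)
```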
If global matching is requested, the results of successive matching attempts are output in
sequence, like this:
    re> /\Bi(\w\w)/g
  data> Mississippi
   0: iss
   1: ss
   0: iss
   1: ss
   0: ipp
   1: pp
"No match" is output only if the first match attempt fails. Here is an example of a failure
message (the offset 4 that is specified by the offset modifier is past the end of the subject
string):
    re> /xyz/
  data> xyz\=offset=4
  Error -24 (bad offset value)
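The global matching run shown earlier (/\Bi(\w\w)/g against "Mississippi") can be cross-checked
with a short Python sketch. This uses re.finditer() from Python's standard library, which is
not PCRE2 and may treat empty matches differently; for this particular pattern the two agree.
The function name global_matches is hypothetical.

```python
import re


def global_matches(pattern, subject):
    """List every successive match, one display line per substring."""
    lines = []
    for m in re.finditer(pattern, subject):
        for group in range(m.re.groups + 1):
            # Unset groups would be None here; this pattern has none.
            lines.append("%2d: %s" % (group, m.group(group)))
    # "No match" appears only when not even the first attempt succeeds.
    return lines or ["No match"]


for line in global_matches(r"\Bi(\w\w)", "Mississippi"):
    print(line)
```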
Note that whereas patterns can be continued over several lines (a plain ">" prompt is used for
continuations), subject lines may not. However, newlines can be included in a subject by means
of the \n escape (or \r, \r\n, etc., depending on the newline sequence setting).
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
When the alternative matching function, pcre2_dfa_match(), is used, the output consists of a
list of all the matches that start at the first point in the subject where there is at least one
match. For example:
    re> /(tang|tangerine|tan)/
  data> yellow tangerine\=dfa
   0: tangerine
   1: tang
   2: tan
Using the normal matching function on this data finds only "tang". The longest matching string
is always given first (and numbered zero). After a PCRE2_ERROR_PARTIAL return, the output is
"Partial match:", followed by the partially matching substring. Note that this is the entire
substring that was inspected during the partial match; it may include characters before the
actual match start if a lookbehind assertion, \b, or \B was involved. (\K is not supported for
DFA matching.)
If global matching is requested, the search for further matches resumes at the end of the
longest match. For example:
    re> /(tang|tangerine|tan)/g
  data> yellow tangerine and tangy sultana\=dfa
   0: tangerine
   1: tang
   2: tan
   0: tang
   1: tan
   0: tan
The alternative matching function does not support substring capture, so the modifiers that are
concerned with captured substrings are not relevant.
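The behaviour in the two "tangerine" examples can be imitated with a small Python sketch. It is
not a DFA and does not call PCRE2: it handles only a list of literal alternatives, collects
every one that matches at the first possible position, sorts them longest first, and for a
global search resumes at the end of the longest match. All function names are hypothetical. For
contrast, Python's backtracking re module plays the role of the normal matching function.

```python
import re

ALTERNATIVES = ["tang", "tangerine", "tan"]


def all_matches_at(alternatives, subject, pos):
    """Every literal alternative that matches at pos, longest first."""
    found = [alt for alt in alternatives if subject.startswith(alt, pos)]
    return sorted(found, key=len, reverse=True)


def dfa_style_search(alternatives, subject, global_search=False):
    """Return one list of matches per matching position."""
    results = []
    pos = 0
    while pos < len(subject):
        found = all_matches_at(alternatives, subject, pos)
        if found:
            results.append(found)
            if not global_search:
                break
            pos += len(found[0])      # resume after the longest match
        else:
            pos += 1
    return results


def backtracking_match(subject):
    """What a backtracking matcher reports: the first alternative that works."""
    m = re.search("(" + "|".join(ALTERNATIVES) + ")", subject)
    return m.group(0) if m else None


print(dfa_style_search(ALTERNATIVES, "yellow tangerine"))
print(dfa_style_search(ALTERNATIVES, "yellow tangerine and tangy sultana", True))
print(backtracking_match("yellow tangerine"))
```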
RESTARTING AFTER A PARTIAL MATCH
When the alternative matching function has given the PCRE2_ERROR_PARTIAL return, indicating that
the subject partially matched the pattern, you can restart the match with additional subject
data by means of the dfa_restart modifier. For example:
    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 23ja\=ps,dfa
  Partial match: 23ja
  data> n05\=dfa,dfa_restart
   0: n05
For further information about partial matching, see the pcre2partial documentation.
CALLOUTS
If the pattern contains any callout requests, pcre2test's callout function is called during
matching unless callout_none is specified. This works with both matching functions, and with
JIT, though there are some differences in behaviour. The output for callouts with numerical
arguments and those with string arguments is slightly different.
Callouts with numerical arguments
By default, the callout function displays the callout number, the start and current positions in
the subject text at the callout time, and the next pattern item to be tested. For example:
  --->pqrabcdef
    0    ^  ^     \d
This output indicates that callout number 0 occurred for a match attempt starting at the fourth
character of the subject string, when the pointer was at the seventh character, and when the
next pattern item was \d. Just one circumflex is output if the start and current positions are
the same, or if the current position precedes the start position, which can happen if the
callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as a result of the
auto_callout pattern modifier. In this case, instead of showing the callout number, the offset
in the pattern, preceded by a plus, is output. For example:
    re> /\d?[A-E]\*/auto_callout
  data> E*
  --->E*
   +0 ^      \d?
   +3 ^      [A-E]
   +8 ^^     \*
  +10 ^ ^
   0: E*
If a pattern contains (*MARK) items, an additional line is output whenever a change of latest
mark is passed to the callout function. For example:
    re> /a(*MARK:X)bc/auto_callout
  data> abc
  --->abc
   +0 ^       a
   +1 ^^      (*MARK:X)
  +10 ^^      b
  Latest Mark: X
  +11 ^ ^     c
  +12 ^  ^
   0: abc
The mark changes between matching "a" and "b", but stays the same for the rest of the match, so
nothing more is output. If, as a result of backtracking, the mark reverts to being unset, the
text "<unset>" is output.
Callouts with string arguments
The output for a callout with a string argument is similar, except that instead of outputting a
callout number before the position indicators, the callout string and its offset in the pattern
string are output before the reflection of the subject string, and the subject string is
reflected for each callout. For example:
    re> /^ab(?C'first')cd(?C"second")ef/
  data> abcdefg
  Callout (7): 'first'
  --->abcdefg
      ^ ^         c
  Callout (20): "second"
  --->abcdefg
      ^   ^       e
   0: abcdef
Callout modifiers
The callout function in pcre2test returns zero (carry on matching) by default, but you can use a
callout_fail modifier in a subject line to change this and other parameters of the callout (see
below).
If the callout_capture modifier is set, the current captured groups are output when a callout
occurs. This is useful only for non-DFA matching, as pcre2_dfa_match() does not support
capturing, so no captures are ever shown.
The normal callout output, showing the callout number or pattern offset (as described above) is
suppressed if the callout_no_where modifier is set.
When using the interpretive matching function pcre2_match() without JIT, setting the
callout_extra modifier causes additional output from pcre2test's callout function to be
generated. For the first callout in a match attempt at a new starting position in the subject,
"New match attempt" is output. If there has been a backtrack since the last callout (or start of
matching if this is the first callout), "Backtrack" is output, followed by "No other matching
paths" if the backtrack ended the previous match attempt. For example:
    re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
  data> aac\=callout_extra
  New match attempt
  --->aac
   +0 ^       (
   +1 ^       a+
   +3 ^ ^     )
   +4 ^ ^     b
skipping to change at line 1614 skipping to change at line 1664
   +0   ^     (
   +1   ^     a+
  Backtrack
  No other matching paths
  New match attempt
  --->aac
   +0    ^    (
   +1    ^    a+
  No match
Notice that various optimizations must be turned off if you want all possible matching paths to
be scanned. If no_start_optimize is not used, there is an immediate "no match", without any
callouts, because the starting optimization fails to find "b" in the subject, which it knows
must be present for any match. If no_auto_possess is not used, the "a+" item is turned into
"a++", which reduces the number of backtracks.
The callout_extra modifier has no effect if used with the DFA matching function, or with JIT.
Return values from callouts
The default return from the callout function is zero, which allows matching to continue. The
callout_fail modifier can be given one or two numbers. If there is only one number, 1 is
returned instead of 0 (causing matching to backtrack) when a callout of that number is reached.
If two numbers (<n>:<m>) are given, 1 is returned when callout <n> is reached and there have
been at least <m> callouts. The callout_error modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be aborted. If both
these modifiers are set for the same callout number, callout_error takes precedence. Note that
callouts with string arguments are always given the number zero.
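The decision described above can be summarised in a short Python sketch. This is a model of the
documented rules only, not pcre2test's source: the function name callout_return is hypothetical,
the error return is represented by a symbolic placeholder rather than the library's numeric
value, and the running callout count is supplied by the caller because the text does not say
exactly how it is accumulated.

```python
# Symbolic stand-in for the library's error code (its number is not assumed here).
CALLOUT_ERROR = "PCRE2_ERROR_CALLOUT"


def callout_return(number, count, fail=None, error=None):
    """Model the value returned for one callout.

    number -- the callout number (string callouts are always number zero)
    count  -- how many callouts have happened so far (caller-supplied)
    fail   -- (n, m) from callout_fail, or None; use m = 0 for the one-number form
    error  -- (n, m) from callout_error, or None
    """
    # callout_error takes precedence when both name the same callout number.
    if error is not None and number == error[0] and count >= error[1]:
        return CALLOUT_ERROR
    if fail is not None and number == fail[0] and count >= fail[1]:
        return 1                      # forces a backtrack
    return 0                          # carry on matching


print(callout_return(1, 1, fail=(1, 0)))
print(callout_return(1, 2, fail=(1, 3)))
print(callout_return(1, 3, fail=(1, 0), error=(1, 0)))
```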
The callout_data modifier can be given an unsigned or a negative number. This is set as the
"user data" that is passed to the matching function, and passed back when the callout function
is invoked. Any value other than zero is used as a return from pcre2test's callout function.
Inserting callouts can be helpful when using pcre2test to check complicated regular expressions.
For further information about callouts, see the pcre2callout documentation.
NON-PRINTING CHARACTERS
When pcre2test is outputting text in the compiled version of a pattern, bytes other than 32-126
are always treated as non-printing characters and are therefore shown as hex escapes.
When pcre2test is outputting text that is a matched part of a subject string, it behaves in the
same way, unless a different locale has been set for the pattern (using the locale modifier). In
this case, the isprint() function is used to distinguish printing and non-printing characters.
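The default rule (no locale set) combined with the escape forms described under DEFAULT OUTPUT
can be sketched in Python as follows. The function name show_text is hypothetical, and the
exact number of hex digits that pcre2test prints inside the braces is an assumption here.

```python
def show_text(code_points, utf=False):
    """Render code points the way the default (no locale) rule describes.

    Values 32-126 are printed as themselves. Other values become \\xhh when
    they are below 256 and UTF mode is off, and \\x{hh...} otherwise.
    """
    out = []
    for cp in code_points:
        if 32 <= cp <= 126:
            out.append(chr(cp))
        elif cp < 256 and not utf:
            out.append("\\x%02x" % cp)
        else:
            out.append("\\x{%02x}" % cp)   # digit count is an assumption
    return "".join(out)


print(show_text([0x61, 0x0A, 0x62]))
print(show_text([0x100]))
print(show_text([0x0A], utf=True))
```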
SAVING AND RESTORING COMPILED PATTERNS SAVING AND RESTORING COMPILED PATTERNS
It is possible to save compiled patterns on disc or elsewhere, and r eload them later, subject to It is possible to save compiled patterns on disc or elsewhere, and r eload them later, subject to
a number of restrictions. JIT data cannot be saved. The host on whi ch the patterns are reloaded a number of restrictions. JIT data cannot be saved. The host on whic h the patterns are reloaded
must be running the same version of PCRE2, with the same code unit w idth, and must also have the must be running the same version of PCRE2, with the same code unit w idth, and must also have the
same endianness, pointer width and PCRE2_SIZE type. Before compiled same endianness, pointer width and PCRE2_SIZE type. Before compile
patterns can be saved they d patterns can be saved they
must be serialized, that is, converted to a stream of bytes. A si must be serialized, that is, converted to a stream of bytes. A singl
ngle byte stream may contain e byte stream may contain
any number of compiled patterns, but they must all use the same char any number of compiled patterns, but they must all use the same cha
acter tables. A single copy racter tables. A single copy
of the tables is included in the byte stream (its size is 1088 bytes ). of the tables is included in the byte stream (its size is 1088 bytes ).
The functions whose names begin with pcre2_serialize_ are used for serializing and de-serializ‐ The functions whose names begin with pcre2_serialize_ are used for serializing and de-serializ‐
ing. They are described in the pcre2serialize documentation. In this section we describe the ing. They are described in the pcre2serialize documentation. In this section we describe the
features of pcre2test that can be used to test these functions. features of pcre2test that can be used to test these functions.
Note that "serialization" in PCRE2 does not convert compiled patterns to an abstract format like Note that "serialization" in PCRE2 does not convert compiled patterns to an abstract format like
Java or .NET. It just makes a reloadable byte code stream. Hence the restrictions on reloading Java or .NET. It just makes a reloadable byte code stream. Hence the restrictions on reloading
mentioned above. mentioned above.
In pcre2test, when a pattern with push modifier is successfully compiled, it is pushed onto a In pcre2test, when a pattern with push modifier is successfully compiled, it is pushed onto a
stack of compiled patterns, and pcre2test expects the next line to contain a new pattern (or stack of compiled patterns, and pcre2test expects the next line to contain a new pattern (or
command) instead of a subject line. By contrast, the pushcopy modifier causes a copy of the com‐ command) instead of a subject line. By contrast, the pushcopy modifier causes a copy of the com‐
piled pattern to be stacked, leaving the original available for immediate matching. By using piled pattern to be stacked, leaving the original available for immediate matching. By using
push and/or pushcopy, a number of patterns can be compiled and retained. These modifiers are in‐ push and/or pushcopy, a number of patterns can be compiled and retained. These modifiers are in‐
compatible with posix, and control modifiers that act at match time are ignored (with a message) compatible with posix, and control modifiers that act at match time are ignored (with a message)
for the stacked patterns. The jitverify modifier applies only at compile time. for the stacked patterns. The jitverify modifier applies only at compile time.
The command The command
#save <filename> #save <filename>
causes all the stacked patterns to be serialized and the result written to the named file. Af‐ causes all the stacked patterns to be serialized and the result written to the named file. Af‐
terwards, all the stacked patterns are freed. The command terwards, all the stacked patterns are freed. The command
#load <filename> #load <filename>
reads the data in the file, and then arranges for it to be de-serialized, with the resulting reads the data in the file, and then arranges for it to be de-serialized, with the resulting
compiled patterns added to the pattern stack. The pattern on the top of the stack can be re‐ compiled patterns added to the pattern stack. The pattern on the top of the stack can be re‐
trieved by the #pop command, which must be followed by lines of subjects that are to be matched trieved by the #pop command, which must be followed by lines of subjects that are to be matched
with the pattern, terminated as usual by an empty line or end of file. This command may be fol‐ with the pattern, terminated as usual by an empty line or end of file. This command may be fol‐
lowed by a modifier list containing only control modifiers that act after a pattern has been lowed by a modifier list containing only control modifiers that act after a pattern has been
compiled. In particular, hex, posix, posix_nosub, push, and pushcopy are not allowed, nor are compiled. In particular, hex, posix, posix_nosub, push, and pushcopy are not allowed, nor are
any option-setting modifiers. The JIT modifiers are, however permitted. Here is an example that any option-setting modifiers. The JIT modifiers are, however permitted. Here is an example that
saves and reloads two patterns. saves and reloads two patterns.
/abc/push /abc/push
/xyz/push /xyz/push
#save tempfile #save tempfile
#load tempfile #load tempfile
#pop info #pop info
xyz xyz
#pop jit,bincode #pop jit,bincode
abc abc
If jitverify is used with #pop, it does not automatically imply jit, which is different behav‐ If jitverify is used with #pop, it does not automatically imply jit, which is different behav‐
iour from when it is used on a pattern. iour from when it is used on a pattern.
The #popcopy command is analogous to the pushcopy modifier in that it makes current a copy of The #popcopy command is analogous to the pushcopy modifier in that it makes current a copy of
the topmost stack pattern, leaving the original still on the stack. the topmost stack pattern, leaving the original still on the stack.
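Editorial illustration (not part of either version of the man page being compared): the example above uses only push and #pop, so the following pcre2test input is a minimal sketch of how pushcopy and #popcopy might be combined with #save and #load, based solely on the behaviour described in this section. The file name tempfile is reused from the example above, the patterns and subjects are arbitrary, and no output is shown because it has not been verified against a particular PCRE2 release.

  /abc/pushcopy
  abc

  /xyz/push
  #save tempfile
  #load tempfile
  #popcopy
  xyz

  #pop
  xyz

  #pop
  abc

Here the first subject line is matched against the original of /abc/, because pushcopy stacks only a copy. After the reload, #popcopy should make a copy of the topmost pattern (xyz) current while leaving it on the stack, so the following #pop retrieves xyz again and the final #pop retrieves abc. The empty lines terminate each list of subjects.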
SEE ALSO SEE ALSO
pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3), pcre2partial(d), pcre2pat‐ pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3), pcre2partial(d), pcre2pat‐
tern(3), pcre2serialize(3). tern(3), pcre2serialize(3).
AUTHOR AUTHOR
Philip Hazel Philip Hazel
Retired from University Computing Service Retired from University Computing Service
Cambridge, England. Cambridge, England.
REVISION REVISION
Last updated: 24 April 2024 Last updated: 26 December 2024
Copyright (c) 1997-2024 University of Cambridge. Copyright (c) 1997-2024 University of Cambridge.
PCRE 10.44 24 April 2024 PCRE2TEST(1) PCRE2 10.45-RC1 26 December 2024 PCRE2TEST(1)
 End of changes. 146 change blocks. 
440 lines changed or deleted 511 lines changed or added
