pcre2test.txt | pcre2test.txt | |||
---|---|---|---|---|
skipping to change at line 622 | skipping to change at line 622 | |||
convert_length set convert buffer length | convert_length set convert buffer length | |||
debug same as info,fullbincode | debug same as info,fullbincode | |||
framesize show matching frame size | framesize show matching frame size | |||
fullbincode show binary code with lengths | fullbincode show binary code with lengths | |||
/I info show info about compiled pattern | /I info show info about compiled pattern | |||
hex unquoted characters are hexadecimal | hex unquoted characters are hexadecimal | |||
jit[=<number>] use JIT | jit[=<number>] use JIT | |||
jitfast use JIT fast path | jitfast use JIT fast path | |||
jitverify verify JIT use | jitverify verify JIT use | |||
locale=<name> use this locale | locale=<name> use this locale | |||
max_pattern_length=<n> set maximum pattern length | max_pattern_compiled ) set maximum compiled pattern | |||
_length=<n> ) length (bytes) | ||||
max_pattern_length=<n> set maximum pattern length (code units) | ||||
max_varlookbehind=<n> set maximum variable lookbehind length | max_varlookbehind=<n> set maximum variable lookbehind length | |||
memory show memory used | memory show memory used | |||
newline=<type> set newline type | newline=<type> set newline type | |||
null_context compile with a NULL context | null_context compile with a NULL context | |||
null_pattern pass pattern as NULL | null_pattern pass pattern as NULL | |||
parens_nest_limit=<n> set maximum parentheses depth | parens_nest_limit=<n> set maximum parentheses depth | |||
posix use the POSIX API | posix use the POSIX API | |||
posix_nosub use the POSIX API with REG_NOSUB | posix_nosub use the POSIX API with REG_NOSUB | |||
push push compiled pattern onto the stack | push push compiled pattern onto the stack | |||
pushcopy push a copy onto the stack | pushcopy push a copy onto the stack | |||
skipping to change at line 904 | skipping to change at line 906 | |||
pcre2test sets its own default of 220, which is required for running | pcre2test sets its own default of 220, which is required for running | |||
the standard test suite. | the standard test suite. | |||
Limiting the pattern length | Limiting the pattern length | |||
The max_pattern_length modifier sets a limit, in code units, to the | The max_pattern_length modifier sets a limit, in code units, to the | |||
length of pattern that pcre2_compile() will accept. Breaching the limit | length of pattern that pcre2_compile() will accept. Breaching the limit | |||
causes a compilation error. The default is the largest number a | causes a compilation error. The default is the largest number a | |||
PCRE2_SIZE variable can hold (essentially unlimited). | PCRE2_SIZE variable can hold (essentially unlimited). | |||
Limiting the size of a compiled pattern | ||||
The max_pattern_compiled_length modifier sets a limit, in bytes, to the | ||||
amount of memory used by a compiled pattern. Breaching the limit causes | ||||
a compilation error. The default is the largest number a PCRE2_SIZE | ||||
variable can hold (essentially unlimited). | ||||
Using the POSIX wrapper API | Using the POSIX wrapper API | |||
The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via | The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via | |||
the POSIX wrapper API rather than its native API. When posix_nosub is | the POSIX wrapper API rather than its native API. When posix_nosub is | |||
used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX | used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX | |||
wrapper supports only the 8-bit library. Note that it does not imply | wrapper supports only the 8-bit library. Note that it does not imply | |||
POSIX matching semantics; for more detail see the pcre2posix | POSIX matching semantics; for more detail see the pcre2posix | |||
documentation. The following pattern modifiers set options for the | documentation. The following pattern modifiers set options for the | |||
regcomp() function: | regcomp() function: | |||
caseless REG_ICASE | caseless REG_ICASE | |||
multiline REG_NEWLINE | multiline REG_NEWLINE | |||
dotall REG_DOTALL ) | dotall REG_DOTALL ) | |||
ungreedy REG_UNGREEDY ) These options are not part of | ungreedy REG_UNGREEDY ) These options are not part of | |||
ucp REG_UCP ) the POSIX standard | ucp REG_UCP ) the POSIX standard | |||
utf REG_UTF8 ) | utf REG_UTF8 ) | |||
The regerror_buffsize modifier specifies a size for the error buffer | The regerror_buffsize modifier specifies a size for the error buffer | |||
that is passed to regerror() in the event of a compilation error. For | that is passed to regerror() in the event of a compilation error. For | |||
example: | example: | |||
/abc/posix,regerror_buffsize=20 | /abc/posix,regerror_buffsize=20 | |||
This provides a means of testing the behaviour of regerror() when the | This provides a means of testing the behaviour of regerror() when the | |||
buffer is too small for the error message. If this modifier has not | buffer is too small for the error message. If this modifier has not | |||
been set, a large buffer is used. | been set, a large buffer is used. | |||
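As a rough illustration of the regerror() contract that regerror_buffsize exercises, the following C sketch uses the system <regex.h> as a stand-in for the PCRE2 POSIX wrapper. This assumes both follow the POSIX rule that regerror() writes at most the buffer size minus one bytes plus a terminating zero, and returns the size needed for the whole message. The names error_into_buffer() and message_was_truncated() are hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Compile a pattern with the system POSIX regex library (a stand-in for
   PCRE2's pcre2posix wrapper) and, on failure, copy the error text into
   a caller-sized buffer, much as regerror_buffsize does in pcre2test.
   Returns the size regerror() says the full message needs (including
   the terminating zero), or 0 if the pattern compiled successfully. */
size_t error_into_buffer(const char *pattern, char *buf, size_t bufsize)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc == 0) {
        regfree(&re);   /* compiled fine: nothing to report */
        return 0;
    }
    /* regerror() truncates to bufsize - 1 bytes plus a zero terminator. */
    return regerror(rc, &re, buf, bufsize);
}

/* The message was cut short if it needed more room than it got. */
int message_was_truncated(const char *buf, size_t needed)
{
    return strlen(buf) + 1 < needed;
}
```

Called with an 8-byte buffer on an unbalanced pattern such as "a(", the returned size exceeds the buffer, which is the situation a small regerror_buffsize value is meant to provoke.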
The aftertext and allaftertext subject modifiers work as described | The aftertext and allaftertext subject modifiers work as described | |||
below. All other modifiers are either ignored, with a warning message, or | below. All other modifiers are either ignored, with a warning message, or | |||
cause an error. | cause an error. | |||
The pattern is passed to regcomp() as a zero-terminated string by | The pattern is passed to regcomp() as a zero-terminated string by | |||
default, but if the use_length or hex modifiers are set, the REG_PEND | default, but if the use_length or hex modifiers are set, the REG_PEND | |||
extension is used to pass it by length. | extension is used to pass it by length. | |||
Testing the stack guard feature | Testing the stack guard feature | |||
The stackguard modifier is used to test the use of | The stackguard modifier is used to test the use of | |||
pcre2_set_compile_recursion_guard(), a function that is provided to | pcre2_set_compile_recursion_guard(), a function that is provided to | |||
enable stack availability to be checked during compilation (see the | enable stack availability to be checked during compilation (see the | |||
pcre2api documentation for details). If the number specified by the | pcre2api documentation for details). If the number specified by the | |||
modifier is greater than zero, pcre2_set_compile_recursion_guard() is | modifier is greater than zero, pcre2_set_compile_recursion_guard() is | |||
called to set up a callback from pcre2_compile() to a local function. | called to set up a callback from pcre2_compile() to a local function. | |||
The argument it receives is the current nesting parenthesis depth; if | The argument it receives is the current nesting parenthesis depth; if | |||
this is greater than the value given by the modifier, non-zero is | this is greater than the value given by the modifier, non-zero is | |||
returned, causing the compilation to be aborted. | returned, causing the compilation to be aborted. | |||
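A guard function of the kind described above can be sketched in C. The signature matches what pcre2_set_compile_recursion_guard() accepts (a uint32_t depth and a void * user-data pointer, returning int). The name depth_guard() and the choice to pass the stackguard limit through the user-data pointer are assumptions for illustration, not pcre2test's actual code.

```c
#include <stdint.h>

/* Hypothetical compile-time recursion guard. depth is the current
   parenthesis nesting depth reported by pcre2_compile(); user_data is
   assumed to point at the limit taken from the stackguard modifier.
   Returning non-zero asks pcre2_compile() to abort the compilation. */
int depth_guard(uint32_t depth, void *user_data)
{
    uint32_t limit = *(const uint32_t *)user_data;
    return depth > limit ? 1 : 0;
}
```

Given a compile context, registration would then look like pcre2_set_compile_recursion_guard(ccontext, depth_guard, &limit), where limit is a uint32_t holding the stackguard value.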
Using alternative character tables | Using alternative character tables | |||
The value specified for the tables modifier must be one of the digits | The value specified for the tables modifier must be one of the digits | |||
0, 1, 2, or 3. It causes a specific set of built-in character tables to | 0, 1, 2, or 3. It causes a specific set of built-in character tables to | |||
be passed to pcre2_compile(). This is used in the PCRE2 tests to check | be passed to pcre2_compile(). This is used in the PCRE2 tests to check | |||
behaviour with different character tables. The digit specifies the | behaviour with different character tables. The digit specifies the | |||
tables as follows: | tables as follows: | |||
0 do not pass any special character tables | 0 do not pass any special character tables | |||
1 the default ASCII tables, as distributed in | 1 the default ASCII tables, as distributed in | |||
pcre2_chartables.c.dist | pcre2_chartables.c.dist | |||
2 a set of tables defining ISO 8859 characters | 2 a set of tables defining ISO 8859 characters | |||
3 a set of tables loaded by the #loadtables command | 3 a set of tables loaded by the #loadtables command | |||
In tables 2, some characters whose codes are greater than 128 are | In tables 2, some characters whose codes are greater than 128 are | |||
identified as letters, digits, spaces, etc. Tables 3 can be used only | identified as letters, digits, spaces, etc. Tables 3 can be used only | |||
after a #loadtables command has loaded them from a binary file. Setting | after a #loadtables command has loaded them from a binary file. Setting | |||
alternate character tables and a locale are mutually exclusive. | alternate character tables and a locale are mutually exclusive. | |||
Setting certain match controls | Setting certain match controls | |||
The following modifiers are really subject modifiers, and are described | The following modifiers are really subject modifiers, and are described | |||
under "Subject Modifiers" below. However, they may be included in a | under "Subject Modifiers" below. However, they may be included in a | |||
pattern's modifier list, in which case they are applied to every | pattern's modifier list, in which case they are applied to every | |||
subject line that is processed with that pattern. These modifiers do | subject line that is processed with that pattern. These modifiers do | |||
not affect the compilation process. | not affect the compilation process. | |||
aftertext show text after match | aftertext show text after match | |||
allaftertext show text after captures | allaftertext show text after captures | |||
allcaptures show all captures | allcaptures show all captures | |||
allvector show the entire ovector | allvector show the entire ovector | |||
allusedtext show all consulted text | allusedtext show all consulted text | |||
altglobal alternative global matching | altglobal alternative global matching | |||
/g global global matching | /g global global matching | |||
heapframes_size show match data heapframes size | heapframes_size show match data heapframes size | |||
skipping to change at line 1001 | skipping to change at line 1010 | |||
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED | substitute_extended use PCRE2_SUBSTITUTE_EXTENDED | |||
substitute_literal use PCRE2_SUBSTITUTE_LITERAL | substitute_literal use PCRE2_SUBSTITUTE_LITERAL | |||
substitute_matched use PCRE2_SUBSTITUTE_MATCHED | substitute_matched use PCRE2_SUBSTITUTE_MATCHED | |||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | |||
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | |||
substitute_skip=<n> skip substitution <n> | substitute_skip=<n> skip substitution <n> | |||
substitute_stop=<n> skip substitution <n> and following | substitute_stop=<n> skip substitution <n> and following | |||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | |||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | |||
These modifiers may not appear in a #pattern command. If you want them | These modifiers may not appear in a #pattern command. If you want them | |||
as defaults, set them in a #subject command. | as defaults, set them in a #subject command. | |||
Specifying literal subject lines | Specifying literal subject lines | |||
If the subject_literal modifier is present on a pattern, all the | If the subject_literal modifier is present on a pattern, all the | |||
subject lines that it matches are taken as literal strings, with no | subject lines that it matches are taken as literal strings, with no | |||
interpretation of backslashes. It is not possible to set subject | interpretation of backslashes. It is not possible to set subject | |||
modifiers on such lines, but any that are set as defaults by a | modifiers on such lines, but any that are set as defaults by a | |||
#subject command are recognized. | #subject command are recognized. | |||
Saving a compiled pattern | Saving a compiled pattern | |||
When a pattern with the push modifier is successfully compiled, it is | When a pattern with the push modifier is successfully compiled, it is | |||
pushed onto a stack of compiled patterns, and pcre2test expects the | pushed onto a stack of compiled patterns, and pcre2test expects the | |||
next line to contain a new pattern (or a command) instead of a subject | next line to contain a new pattern (or a command) instead of a subject | |||
line. This facility is used when saving compiled patterns to a file, as | line. This facility is used when saving compiled patterns to a file, as | |||
described in the section entitled "Saving and restoring compiled | described in the section entitled "Saving and restoring compiled | |||
patterns" below. If pushcopy is used instead of push, a copy of the | patterns" below. If pushcopy is used instead of push, a copy of the | |||
compiled pattern is stacked, leaving the original as current, ready to | compiled pattern is stacked, leaving the original as current, ready to | |||
match the following input lines. This provides a way of testing the | match the following input lines. This provides a way of testing the | |||
pcre2_code_copy() function. The push and pushcopy modifiers are | pcre2_code_copy() function. The push and pushcopy modifiers are | |||
incompatible with compilation modifiers such as global that act at | incompatible with compilation modifiers such as global that act at | |||
match time. Any that are specified are ignored (for the stacked copy), | match time. Any that are specified are ignored (for the stacked copy), | |||
with a warning message, except for replace, which causes an error. | with a warning message, except for replace, which causes an error. | |||
Note that jitverify, which is allowed, does not carry through to any | Note that jitverify, which is allowed, does not carry through to any | |||
subsequent matching that uses a stacked pattern. | subsequent matching that uses a stacked pattern. | |||
Testing foreign pattern conversion | Testing foreign pattern conversion | |||
The experimental foreign pattern conversion functions in PCRE2 can be | The experimental foreign pattern conversion functions in PCRE2 can be | |||
tested by setting the convert modifier. Its argument is a | tested by setting the convert modifier. Its argument is a | |||
colon-separated list of options, which set the equivalent option for | colon-separated list of options, which set the equivalent option for | |||
the pcre2_pattern_convert() function: | the pcre2_pattern_convert() function: | |||
glob PCRE2_CONVERT_GLOB | glob PCRE2_CONVERT_GLOB | |||
glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR | glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR | |||
glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR | glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR | |||
posix_basic PCRE2_CONVERT_POSIX_BASIC | posix_basic PCRE2_CONVERT_POSIX_BASIC | |||
posix_extended PCRE2_CONVERT_POSIX_EXTENDED | posix_extended PCRE2_CONVERT_POSIX_EXTENDED | |||
unset Unset all options | unset Unset all options | |||
The "unset" value is useful for turning off a default that has been set | The "unset" value is useful for turning off a default that has been set | |||
by a #pattern command. When one of these options is set, the input | by a #pattern command. When one of these options is set, the input | |||
pattern is passed to pcre2_pattern_convert(). If the conversion is | pattern is passed to pcre2_pattern_convert(). If the conversion is | |||
successful, the result is reflected in the output and then passed to | successful, the result is reflected in the output and then passed to | |||
pcre2_compile(). The normal utf and no_utf_check options, if set, cause | pcre2_compile(). The normal utf and no_utf_check options, if set, cause | |||
the PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be | the PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be | |||
passed to pcre2_pattern_convert(). | passed to pcre2_pattern_convert(). | |||
By default, the conversion function is allowed to allocate a buffer for | By default, the conversion function is allowed to allocate a buffer for | |||
its output. However, if the convert_length modifier is set to a value | its output. However, if the convert_length modifier is set to a value | |||
greater than zero, pcre2test passes a buffer of the given length. This | greater than zero, pcre2test passes a buffer of the given length. This | |||
makes it possible to test the length check. | makes it possible to test the length check. | |||
The convert_glob_escape and convert_glob_separator modifiers can be | The convert_glob_escape and convert_glob_separator modifiers can be | |||
used to specify the escape and separator characters for glob | used to specify the escape and separator characters for glob | |||
processing, overriding the defaults, which are operating-system | processing, overriding the defaults, which are operating-system | |||
dependent. | dependent. | |||
SUBJECT MODIFIERS | SUBJECT MODIFIERS | |||
The modifiers that can appear in subject lines and the #subject command | The modifiers that can appear in subject lines and the #subject command | |||
are of two types. | are of two types. | |||
Setting match options | Setting match options | |||
The following modifiers set options for pcre2_match() or | The following modifiers set options for pcre2_match() or | |||
pcre2_dfa_match(). See pcre2api for a description of their effects. | pcre2_dfa_match(). See pcre2api for a description of their effects. | |||
anchored set PCRE2_ANCHORED | anchored set PCRE2_ANCHORED | |||
endanchored set PCRE2_ENDANCHORED | endanchored set PCRE2_ENDANCHORED | |||
dfa_restart set PCRE2_DFA_RESTART | dfa_restart set PCRE2_DFA_RESTART | |||
dfa_shortest set PCRE2_DFA_SHORTEST | dfa_shortest set PCRE2_DFA_SHORTEST | |||
disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK | disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK | |||
no_jit set PCRE2_NO_JIT | no_jit set PCRE2_NO_JIT | |||
no_utf_check set PCRE2_NO_UTF_CHECK | no_utf_check set PCRE2_NO_UTF_CHECK | |||
notbol set PCRE2_NOTBOL | notbol set PCRE2_NOTBOL | |||
notempty set PCRE2_NOTEMPTY | notempty set PCRE2_NOTEMPTY | |||
notempty_atstart set PCRE2_NOTEMPTY_ATSTART | notempty_atstart set PCRE2_NOTEMPTY_ATSTART | |||
noteol set PCRE2_NOTEOL | noteol set PCRE2_NOTEOL | |||
partial_hard (or ph) set PCRE2_PARTIAL_HARD | partial_hard (or ph) set PCRE2_PARTIAL_HARD | |||
partial_soft (or ps) set PCRE2_PARTIAL_SOFT | partial_soft (or ps) set PCRE2_PARTIAL_SOFT | |||
The partial matching modifiers are provided with abbreviations because | The partial matching modifiers are provided with abbreviations because | |||
they appear frequently in tests. | they appear frequently in tests. | |||
If the posix or posix_nosub modifier was present on the pattern, | If the posix or posix_nosub modifier was present on the pattern, | |||
causing the POSIX wrapper API to be used, the only option-setting | causing the POSIX wrapper API to be used, the only option-setting | |||
modifiers that have any effect are notbol, notempty, and noteol, | modifiers that have any effect are notbol, notempty, and noteol, | |||
causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be | causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be | |||
passed to regexec(). The other modifiers are ignored, with a warning | passed to regexec(). The other modifiers are ignored, with a warning | |||
message. | message. | |||
There is one additional modifier that can be used with the POSIX | There is one additional modifier that can be used with the POSIX | |||
wrapper. It is ignored (with a warning) if used for non-POSIX matching. | wrapper. It is ignored (with a warning) if used for non-POSIX matching. | |||
posix_startend=<n>[:<m>] | posix_startend=<n>[:<m>] | |||
This causes the subject string to be passed to regexec() using the | This causes the subject string to be passed to regexec() using the | |||
REG_STARTEND option, which uses offsets to specify which part of the | REG_STARTEND option, which uses offsets to specify which part of the | |||
string is searched. If only one number is given, the end offset is | string is searched. If only one number is given, the end offset is | |||
passed as the end of the subject string. For more detail of | passed as the end of the subject string. For more detail of | |||
REG_STARTEND, see the pcre2posix documentation. If the subject string | REG_STARTEND, see the pcre2posix documentation. If the subject string | |||
contains binary zeros (coded as escapes such as \x{00} because | contains binary zeros (coded as escapes such as \x{00} because | |||
pcre2test does not support actual binary zeros in its input), you must | pcre2test does not support actual binary zeros in its input), you must | |||
use posix_startend to specify its length. | use posix_startend to specify its length. | |||
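The way a posix_startend=<n>[:<m>] value maps onto the REG_STARTEND offsets can be sketched as a small parser. Everything here is hypothetical: the function name, the error convention, and the decision to reject a start offset past the end offset. It only mirrors the rule stated above that a missing <m> means the end of the subject.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parser for the value of posix_startend=<n>[:<m>].
   On success it stores the start and end offsets and returns 0; if <m>
   is omitted, the end offset defaults to the subject length. Returns -1
   for malformed input or a start offset beyond the end offset. */
int parse_startend(const char *arg, size_t subject_len,
                   size_t *start, size_t *end)
{
    char *rest;
    unsigned long n = strtoul(arg, &rest, 10);
    if (rest == arg)
        return -1;                       /* no digits for <n> */
    *start = (size_t)n;

    if (*rest == '\0') {
        *end = subject_len;              /* <m> omitted: end of subject */
    } else if (*rest == ':') {
        const char *m_text = rest + 1;
        unsigned long m = strtoul(m_text, &rest, 10);
        if (rest == m_text || *rest != '\0')
            return -1;                   /* missing <m> or trailing junk */
        *end = (size_t)m;
    } else {
        return -1;                       /* unexpected character after <n> */
    }
    return *start <= *end ? 0 : -1;
}
```

The two offsets would then be stored in pmatch[0].rm_so and pmatch[0].rm_eo before regexec() is called with REG_STARTEND in its flags.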
Setting match controls | Setting match controls | |||
The following modifiers affect the matching process or request | The following modifiers affect the matching process or request | |||
additional information. Some of them may also be specified on a | additional information. Some of them may also be specified on a | |||
pattern line (see above), in which case they apply to every subject | pattern line (see above), in which case they apply to every subject | |||
line that is matched against that pattern, but can be overridden by | line that is matched against that pattern, but can be overridden by | |||
modifiers on the subject. | modifiers on the subject. | |||
aftertext show text after match | aftertext show text after match | |||
allaftertext show text after captures | allaftertext show text after captures | |||
allcaptures show all captures | allcaptures show all captures | |||
allvector show the entire ovector | allvector show the entire ovector | |||
allusedtext show all consulted text (non-JIT only) | allusedtext show all consulted text (non-JIT only) | |||
altglobal alternative global matching | altglobal alternative global matching | |||
callout_capture show captures at callout time | callout_capture show captures at callout time | |||
callout_data=<n> set a value to pass via callouts | callout_data=<n> set a value to pass via callouts | |||
skipping to change at line 1165 | skipping to change at line 1174 | |||
substitute_matched use PCRE2_SUBSTITUTE_MATCHED | substitute_matched use PCRE2_SUBSTITUTE_MATCHED | |||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | |||
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | |||
substitute_skip=<n> skip substitution number n | substitute_skip=<n> skip substitution number n | |||
substitute_stop=<n> skip substitution number n and greater | substitute_stop=<n> skip substitution number n and greater | |||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | |||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | |||
zero_terminate pass the subject as zero-terminated | zero_terminate pass the subject as zero-terminated | |||
The effects of these modifiers are described in the following sections. | The effects of these modifiers are described in the following sections. | |||
When matching via the POSIX wrapper API, the aftertext, allaftertext, | When matching via the POSIX wrapper API, the aftertext, allaftertext, | |||
and ovector subject modifiers work as described below. All other | and ovector subject modifiers work as described below. All other | |||
modifiers are either ignored, with a warning message, or cause an error. | modifiers are either ignored, with a warning message, or cause an error. | |||
Showing more text | Showing more text | |||
The aftertext modifier requests that as well as outputting the part of | The aftertext modifier requests that as well as outputting the part of | |||
the subject string that matched the entire pattern, pcre2test should in | the subject string that matched the entire pattern, pcre2test should in | |||
addition output the remainder of the subject string. This is useful for | addition output the remainder of the subject string. This is useful for | |||
tests where the subject contains multiple copies of the same substring. | tests where the subject contains multiple copies of the same substring. | |||
The allaftertext modifier requests the same action for captured | The allaftertext modifier requests the same action for captured | |||
substrings as well as the main matched substring. In each case the | substrings as well as the main matched substring. In each case the | |||
remainder is output on the following line with a plus character | remainder is output on the following line with a plus character | |||
following the capture number. | following the capture number. | |||
The allusedtext modifier requests that all the text that was consulted | The allusedtext modifier requests that all the text that was consulted | |||
during a successful pattern match by the interpreter should be shown, | during a successful pattern match by the interpreter should be shown, | |||
for both full and partial matches. This feature is not supported for | for both full and partial matches. This feature is not supported for | |||
JIT matching, and if requested with JIT it is ignored (with a warning | JIT matching, and if requested with JIT it is ignored (with a warning | |||
message). Setting this modifier affects the output if there is a | message). Setting this modifier affects the output if there is a | |||
lookbehind at the start of a match, or, for a complete match, a | lookbehind at the start of a match, or, for a complete match, a | |||
lookahead at the end, or if \K is used in the pattern. Characters that | lookahead at the end, or if \K is used in the pattern. Characters that | |||
precede or follow the start and end of the actual match are indicated | precede or follow the start and end of the actual match are indicated | |||
in the output by '<' or '>' characters underneath them. Here is an | in the output by '<' or '>' characters underneath them. Here is an | |||
example: | example: | |||
re> /(?<=pqr)abc(?=xyz)/ | re> /(?<=pqr)abc(?=xyz)/ | |||
data> 123pqrabcxyz456\=allusedtext | data> 123pqrabcxyz456\=allusedtext | |||
0: pqrabcxyz | 0: pqrabcxyz | |||
<<<   >>> | <<<   >>> | |||
data> 123pqrabcxy\=ph,allusedtext | data> 123pqrabcxy\=ph,allusedtext | |||
Partial match: pqrabcxy | Partial match: pqrabcxy | |||
<<< | <<< | |||
The first, complete match shows that the matched string is "abc", with | The first, complete match shows that the matched string is "abc", with | |||
the preceding and following strings "pqr" and "xyz" having been | the preceding and following strings "pqr" and "xyz" having been | |||
consulted during the match (when processing the assertions). The | consulted during the match (when processing the assertions). The | |||
partial match can indicate only the preceding string. | partial match can indicate only the preceding string. | |||
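The marker line in the example above can be reproduced with a short C sketch. It is not pcre2test's printing code: used_text_markers() is a hypothetical helper that assumes the consulted text starts at offset 0 and that the match occupies offsets match_start up to (but not including) match_end within it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the line printed under allusedtext output.
   used_len is the length of the consulted text; the actual match covers
   [match_start, match_end), with match_start <= match_end <= used_len.
   out must have room for used_len + 1 bytes. */
void used_text_markers(size_t used_len, size_t match_start,
                       size_t match_end, char *out)
{
    memset(out, ' ', used_len);          /* matched part stays blank */
    memset(out, '<', match_start);       /* consulted before the match */
    if (match_end < used_len)            /* consulted after the match */
        memset(out + match_end, '>', used_len - match_end);
    out[used_len] = '\0';

    /* Trim trailing blanks so a partial match shows only the '<' run. */
    while (used_len > 0 && out[used_len - 1] == ' ')
        out[--used_len] = '\0';
}
```

For the complete match, used_len is 9 ("pqrabcxyz") with the match at offsets 3 to 6, giving "<<<   >>>"; for the partial match, used_len is 8 and match_end is 8, so only "<<<" remains after trimming.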
The startchar modifier requests that the starting character for the | The startchar modifier requests that the starting character for the | |||
match be indicated, if it is different to the start of the matched | match be indicated, if it is different to the start of the matched | |||
string. The only time when this occurs is when \K has been processed as | string. The only time when this occurs is when \K has been processed as | |||
part of the match. In this situation, the output for the matched string | part of the match. In this situation, the output for the matched string | |||
is displayed from the starting character instead of from the match | is displayed from the starting character instead of from the match | |||
point, with circumflex characters under the earlier characters. For | point, with circumflex characters under the earlier characters. For | |||
example: | example: | |||
re> /abc\Kxyz/ | re> /abc\Kxyz/ | |||
data> abcxyz\=startchar | data> abcxyz\=startchar | |||
0: abcxyz | 0: abcxyz | |||
^^^ | ^^^ | |||
Unlike allusedtext, the startchar modifier can be used with JIT. | Unlike allusedtext, the startchar modifier can be used with JIT. | |||
However, these two modifiers are mutually exclusive. | However, these two modifiers are mutually exclusive. | |||
Showing the value of all capture groups | Showing the value of all capture groups | |||
The allcaptures modifier requests that the values of all potential | The allcaptures modifier requests that the values of all potential | |||
captured parentheses be output after a match. By default, only those up | captured parentheses be output after a match. By default, only those up | |||
to the highest one actually used in the match are output (corresponding | to the highest one actually used in the match are output (corresponding | |||
to the return code from pcre2_match()). Groups that did not take part | to the return code from pcre2_match()). Groups that did not take part | |||
in the match are output as "<unset>". This modifier is not relevant for | in the match are output as "<unset>". This modifier is not relevant for | |||
DFA matching (which does no capturing) and does not apply when replace | DFA matching (which does no capturing) and does not apply when replace | |||
is specified; it is ignored, with a warning message, if present. | is specified; it is ignored, with a warning message, if present. | |||

Showing the entire ovector, for all outcomes

The allvector modifier requests that the entire ovector be shown,
whatever the outcome of the match. Compare allcaptures, which shows only
up to the maximum number of capture groups for the pattern, and then
only for a successful complete non-DFA match. This modifier, which acts
after any match result, and also for DFA matching, provides a means of
checking that there are no unexpected modifications to ovector fields.
Before each match attempt, the ovector is filled with a special value,
and if this is found in both elements of a capturing pair, "<unchanged>"
is output. After a successful match, this applies to all groups after
the maximum capture group for the pattern. In other cases it applies to
the entire ovector. After a partial match, the first two elements are
the only ones that should be set. After a DFA match, the amount of
ovector that is used depends on the number of matches that were found.
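
The "<unchanged>" test can be modelled in a few lines of C. This is a
minimal sketch rather than pcre2test's own code: the sentinel value
OVEC_SENTINEL and the functions fill_ovector() and pair_unchanged() are
hypothetical, and PCRE2_SIZE is approximated by size_t. It shows only
the rule stated above, namely that a pair is reported as unchanged when
both of its elements still hold the fill value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fill value; pcre2test's real sentinel may differ. */
#define OVEC_SENTINEL ((size_t)0xDEADBEEF)

/* Before a match attempt, set every element of an ovector of npairs
   pairs (2 * npairs elements) to the sentinel. */
void fill_ovector(size_t *ovector, size_t npairs)
{
  for (size_t i = 0; i < 2 * npairs; i++) ovector[i] = OVEC_SENTINEL;
}

/* A pair counts as "<unchanged>" only if BOTH its start and its end
   element still contain the sentinel. */
bool pair_unchanged(const size_t *ovector, size_t pair)
{
  return ovector[2 * pair] == OVEC_SENTINEL &&
         ovector[2 * pair + 1] == OVEC_SENTINEL;
}
```

For example, after fill_ovector(ov, 3) followed by a "match" that sets
only ov[0] and ov[1], pair 0 is no longer unchanged, while pairs 1 and 2
still are.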

Testing pattern callouts

A callout function is supplied when pcre2test calls the library
matching functions, unless callout_none is specified. Its behaviour can
be controlled by various modifiers listed above whose names begin with
callout_. Details are given in the section entitled "Callouts" below.

Testing callouts from pcre2_substitute() is described separately in
"Testing substitute callouts" below.

Finding all matches in a string

Searching for all possible matches within a subject can be requested by
the global or altglobal modifier. After finding a match, the matching
function is called again to search the remainder of the subject. The
difference between global and altglobal is that the former uses the
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
searching at a new point within the entire string (which is what Perl
does), whereas the latter passes a shortened subject. This makes a
difference to the matching process if the pattern begins with a
lookbehind assertion (including \b or \B).
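
The difference can be illustrated with a simplified model of \b. In the
sketch below (hypothetical names, ASCII word characters only, not
PCRE2's implementation), boundary_global() tests a position within the
whole subject, as a start offset would, while boundary_altglobal()
tests position 0 of a subject that has been shortened to begin at that
offset, so the preceding character is no longer visible.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII "word" character: letter, digit or underscore. */
static bool is_word(unsigned char c)
{
  return isalnum(c) || c == '_';
}

/* \b at position pos of subject[0..len): nothing before subject[0] is
   visible, so position 0 never sees a preceding character. */
bool at_word_boundary(const char *subject, size_t len, size_t pos)
{
  bool before = pos > 0 && is_word((unsigned char)subject[pos - 1]);
  bool after = pos < len && is_word((unsigned char)subject[pos]);
  return before != after;
}

/* global: keep the whole subject and start at an offset. */
bool boundary_global(const char *subject, size_t len, size_t offset)
{
  return at_word_boundary(subject, len, offset);
}

/* altglobal: pass a shortened subject that begins at the offset. */
bool boundary_altglobal(const char *subject, size_t len, size_t offset)
{
  return at_word_boundary(subject + offset, len - offset, 0);
}
```

For the subject "ab" and offset 1, boundary_global() returns false
(both neighbours are word characters) but boundary_altglobal() returns
true, so a pattern such as \bb could match at that point only when the
shortened subject is used.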

If an empty string is matched, the next match is done with the
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
for another, non-empty, match at the same point in the subject. If this
match fails, the start offset is advanced, and the normal match is
retried. This imitates the way Perl handles such cases when using the /g
modifier or the split() function. Normally, the start offset is advanced
by one character, but if the newline convention recognizes CRLF as a
newline, and the current character is CR followed by LF, an advance of
two characters occurs.
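
The advance rule in the last two sentences can be sketched as follows.
This is an illustration under stated assumptions, not pcre2test's code:
advance_after_empty() is a hypothetical name, and it assumes an 8-bit
non-UTF subject in which every character is one code unit (in UTF mode
a one-character advance may cover several code units). The anchored,
non-empty retry itself is not modelled; the function only computes
where the normal match is retried after that retry fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* New start offset after an empty match at `offset` could not be
   extended to a non-empty one (8-bit, non-UTF subject assumed). */
size_t advance_after_empty(const char *subject, size_t len, size_t offset,
                           bool crlf_is_newline)
{
  if (crlf_is_newline && offset + 1 < len &&
      subject[offset] == '\r' && subject[offset + 1] == '\n')
    return offset + 2;  /* step over the whole CRLF pair */
  return offset + 1;    /* otherwise advance by one character */
}
```

With the subject "a\r\nb" and an empty match at offset 1, the next
attempt starts at offset 3 when CRLF is a newline, but at offset 2
otherwise.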

Testing substring extraction functions

The copy and get modifiers can be used to test the
pcre2_substring_copy_xxx() and pcre2_substring_get_xxx() functions.
They can be given more than once, and each can specify a capture group
name or number, for example:

    abcd\=copy=1,copy=3,get=G1

If the #subject command is used to set default copy and/or get lists,
these can be unset by specifying a negative number to cancel all
numbered groups and an empty name to cancel all named groups.

The getall modifier tests pcre2_substring_list_get(), which extracts all
captured substrings.

If the subject line is successfully matched, the substrings extracted
by the convenience functions are output with C, G, or L after the string
number instead of a colon. This is in addition to the normal full list.
The string length (that is, the return from the extraction function) is
given in parentheses after each substring, followed by the name when
the extraction was by name.

Testing the substitution function

If the replace modifier is set, the pcre2_substitute() function is
called instead of one of the matching functions (or after one call of
pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
replacement strings cannot contain commas, because a comma signifies the
end of a modifier. This is not thought to be an issue in a test program.

Specifying a completely empty replacement string disables this
modifier. However, it is possible to specify an empty replacement by
providing a buffer length, as described below, for an otherwise empty
replacement.

Unlike subject strings, pcre2test does not process replacement strings
for escape sequences. In UTF mode, a replacement string is checked to
see if it is a valid UTF-8 string. If so, it is correctly converted to a
UTF string of the appropriate code unit width. If it is not a valid
UTF-8 string, the individual code units are copied directly. This
provides a means of passing an invalid UTF-8 string for testing
purposes.
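
A possible shape for this check-then-convert-or-copy rule, for the
32-bit code unit width, is sketched below. It is not pcre2test's code;
the function names are hypothetical, and the validator follows the
usual UTF-8 rules (no overlong forms, no surrogates, nothing above
U+10FFFF). The decision is made for the whole string: a single invalid
sequence causes every byte to be copied unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s[i]. Returns its length in bytes and
   stores the code point in *cp, or returns 0 if it is invalid
   (truncated, bad continuation, overlong, surrogate, or > U+10FFFF). */
static size_t utf8_decode(const unsigned char *s, size_t len, size_t i,
                          uint32_t *cp)
{
  unsigned char c = s[i];
  size_t n;
  uint32_t v, min;
  if (c < 0x80) { *cp = c; return 1; }
  else if ((c & 0xE0) == 0xC0) { n = 2; v = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0) { n = 3; v = c & 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0) { n = 4; v = c & 0x07; min = 0x10000; }
  else return 0;                        /* stray continuation or bad lead */
  if (i + n > len) return 0;            /* truncated sequence */
  for (size_t k = 1; k < n; k++) {
    if ((s[i + k] & 0xC0) != 0x80) return 0;
    v = (v << 6) | (s[i + k] & 0x3F);
  }
  if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) return 0;
  *cp = v;
  return n;
}

/* Convert a replacement to 32-bit code units. If the whole string is
   valid UTF-8 it is decoded; otherwise each byte is copied unchanged.
   out must have room for len units. Returns the number of units. */
size_t replacement_to_utf32(const unsigned char *s, size_t len, uint32_t *out)
{
  size_t i, n = 0;
  uint32_t cp;
  for (i = 0; i < len; ) {              /* first pass: validate */
    size_t step = utf8_decode(s, len, i, &cp);
    if (step == 0) break;
    i += step;
  }
  if (i < len) {                        /* invalid: copy code units */
    for (size_t k = 0; k < len; k++) out[k] = s[k];
    return len;
  }
  for (i = 0; i < len; n++)             /* valid: convert */
    i += utf8_decode(s, len, i, &out[n]);
  return n;
}
```

For example, the two bytes C3 A9 become the single code unit 0xE9,
whereas the bytes 61 FF are copied as the two code units 0x61 and 0xFF.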

The following modifiers set options (in addition to the normal match
options) for pcre2_substitute():

    global                       PCRE2_SUBSTITUTE_GLOBAL
    substitute_extended          PCRE2_SUBSTITUTE_EXTENDED
    substitute_literal           PCRE2_SUBSTITUTE_LITERAL
    substitute_matched           PCRE2_SUBSTITUTE_MATCHED
    substitute_overflow_length   PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
    substitute_replacement_only  PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
    substitute_unknown_unset     PCRE2_SUBSTITUTE_UNKNOWN_UNSET
    substitute_unset_empty       PCRE2_SUBSTITUTE_UNSET_EMPTY

See the pcre2api documentation for details of these options.

After a successful substitution, the modified string is output,
preceded by the number of replacements. This may be zero if there were
no matches. Here is a simple example of a substitution test:

    /abc/replace=xxx
        =abc=abc=
     1: =xxx=abc=
        =abc=abc=\=global
     2: =xxx=xxx=

Subject and replacement strings should be kept relatively short (fewer
than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement
string starts with a number in square brackets, that number is passed to
pcre2_substitute() as the size of the output buffer, with the
replacement string starting at the next character. Here is an example
that tests the edge case:

    /abc/
        123abc123\=replace=[10]XYZ
     1: 123XYZ123
        123abc123\=replace=[9]XYZ
    Failed: error -47: no more memory

The default action of pcre2_substitute() is to return
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
substitute_overflow_length modifier), pcre2_substitute() continues to
go through the motions of matching and substituting (but not doing any
callouts), in order to compute the size of buffer that is required. When
this happens, pcre2test shows the required buffer length (which includes
space for the trailing zero) as part of the error message. For example:

    /abc/substitute_overflow_length
        123abc123\=replace=[9]XYZ
    Failed: error -47: no more memory: 10 code units are needed
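
The buffer-size arithmetic in these examples can be checked with a
small model. This sketch uses a plain literal search with strstr() in
place of PCRE2 matching, performs a single non-global substitution, and
uses a hypothetical function name. It mirrors only the behaviour
described for PCRE2_SUBSTITUTE_OVERFLOW_LENGTH: when the buffer is too
small, the required size, including the trailing zero, is still
reported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Replace the first occurrence of the literal `needle`. On success the
   result (with its terminating zero) is in out and true is returned.
   In both cases *needed receives the required size in code units,
   including the zero, as the overflow-length option reports it. */
bool substitute_first(const char *subject, const char *needle,
                      const char *replacement, char *out, size_t outsize,
                      size_t *needed)
{
  const char *hit = strstr(subject, needle);
  size_t slen = strlen(subject), nlen = strlen(needle);
  size_t rlen = strlen(replacement);
  size_t total = (hit == NULL) ? slen : slen - nlen + rlen;

  *needed = total + 1;                  /* +1 for the trailing zero */
  if (*needed > outsize) return false;  /* the "no more memory" case */

  if (hit == NULL) {
    memcpy(out, subject, slen + 1);
  } else {
    size_t pre = (size_t)(hit - subject);
    memcpy(out, subject, pre);
    memcpy(out + pre, replacement, rlen);
    /* copy the rest of the subject, including its zero */
    memcpy(out + pre + rlen, hit + nlen, slen - pre - nlen + 1);
  }
  return true;
}
```

With the subject 123abc123 and replacement XYZ, the result 123XYZ123 is
9 characters long, so a buffer of 10 code units succeeds, and a buffer
of 9 fails with a required size of 10, in line with the pcre2test
output shown above.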

A replacement string is ignored with POSIX and DFA matching. Specifying
partial matching provokes an error return ("bad option value") from
pcre2_substitute().

Testing substitute callouts

If the substitute_callout modifier is set, a substitution callout
function is set up. The null_context modifier must not be set, because
the address of the callout function is passed in a match context. When
the callout function is called (after each substitution), details of
the input and output strings are output. For example:

    /abc/g,replace=<$0>,substitute_callout
        abcdefabcpqr
     1(1) Old 0 3 "abc" New 0 5 "<abc>"
     2(1) Old 6 9 "abc" New 8 13 "<abc>"
     2: <abc>def<abc>pqr

The first number on each callout line is the count of matches. The
parenthesized number is the number of pairs that are set in the ovector
(that is, one more than the number of capturing groups that were set).
These are followed by the offsets of the old substring and its
contents, and then the same for the replacement.

By default, the substitution callout function returns zero, which
accepts the replacement and causes matching to continue if /g was used.
Two further modifiers can be used to test other return values. If
substitute_skip is set to a value greater than zero, the callout
function returns +1 for the match of that number, and similarly
substitute_stop returns -1. These cause the replacement to be rejected,
and -1 causes no further matching to take place. If either of them is
set, substitute_callout is assumed. For example:

    /abc/g,replace=<$0>,substitute_skip=1
        abcdefabcpqr
     1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
     2(1) Old 6 9 "abc" New 6 11 "<abc>"
     2: abcdef<abc>pqr
        abcdefabcpqr\=substitute_stop=1
     1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
     1: abcdefabcpqr

If both are set for the same number, stop takes precedence. Only a
single skip or stop is supported, which is sufficient for testing that
the feature works.
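
The New offsets in both callout examples follow from a simple running
adjustment, inferred here from the example output rather than taken
from the PCRE2 source. In this sketch (struct and function names are
hypothetical, 8-bit code units assumed), each replacement starts at its
old start offset plus the net growth produced by earlier accepted
replacements. A skipped replacement leaves the old text in the output,
so it contributes no growth.

```c
#include <stddef.h>

/* Offsets of one match in the subject ("Old") and of its replacement
   in the output string ("New"). */
struct sub_offsets {
  size_t old_start, old_end;
  size_t new_start, new_end;
};

/* For matches m[0..n) with replacement lengths rlen[i], fill in the New
   offsets. accepted[i] is zero when the callout rejected replacement i,
   in which case later replacements are not shifted by it. */
void compute_new_offsets(struct sub_offsets *m, const size_t *rlen,
                         const int *accepted, size_t n)
{
  long long shift = 0;  /* net growth of the output so far */
  for (size_t i = 0; i < n; i++) {
    m[i].new_start = (size_t)((long long)m[i].old_start + shift);
    m[i].new_end = m[i].new_start + rlen[i];
    if (accepted[i])
      shift += (long long)rlen[i] -
               (long long)(m[i].old_end - m[i].old_start);
  }
}
```

With both replacements accepted, the second lands at 8 to 13; when the
first is skipped, it lands at 6 to 11, as in the output above.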

Setting the JIT stack size

The jitstack modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kibibytes
(units of 1024 bytes). Setting zero reverts to the default of 32KiB.
Providing a stack that is larger than the default is necessary only for
very complicated patterns. If jitstack is set non-zero on a subject
line, it overrides any value that was set on the pattern.
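
The unit handling and the override rule are simple arithmetic, sketched
below with hypothetical function names. The byte value is the kind of
figure that would be passed as the maximum size to
pcre2_jit_stack_create(); whether pcre2test does exactly this is an
assumption.

```c
#include <stddef.h>

#define JIT_STACK_DEFAULT_KIB 32u  /* default stated in the text above */

/* Convert a jitstack value in kibibytes to bytes; 0 means default. */
size_t jitstack_bytes(unsigned int kib)
{
  if (kib == 0) kib = JIT_STACK_DEFAULT_KIB;
  return (size_t)kib * 1024u;
}

/* A non-zero subject-line value overrides the pattern's value. */
unsigned int effective_jitstack(unsigned int pattern_kib,
                                unsigned int subject_kib)
{
  return subject_kib != 0 ? subject_kib : pattern_kib;
}
```

So jitstack=100 requests 102400 bytes, and jitstack=0 gives 32768 bytes.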

Setting heap, match, and depth limits

The heap_limit, match_limit, and depth_limit modifiers set the
appropriate limits in the match context. These values are ignored when
the find_limits or find_limits_noheap modifier is specified.

Finding minimum limits

If the find_limits modifier is present on a subject line, pcre2test
calls the relevant matching function several times, setting different
values in the match context via pcre2_set_heap_limit(),
pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
smallest value for each parameter that allows the match to complete
without a "limit exceeded" error. The match itself may succeed or fail.
An alternative modifier, find_limits_noheap, omits the heap limit. This
is used in the standard tests, because the minimum heap limit varies
between systems. If JIT is being used, only the match limit is
relevant, and the other two are automatically omitted.

When using this modifier, the pattern should not contain any limit
settings such as (*LIMIT_MATCH=...) within it. If such a setting is
present and is lower than the minimum matching value, the minimum value
cannot be found because pcre2_set_match_limit() etc. are only able to
reduce the value of an in-pattern limit; they cannot increase it.
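
The text does not say how pcre2test chooses the values it tries. One
plausible strategy, shown here purely as an assumption, is to double
the limit until the match completes and then bisect between the last
failing and the first succeeding value. The callback type and both
functions are hypothetical; completes_at_least() is a stand-in for
actually running the match, and the search is valid only if completion
is monotonic in the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: run the match with the given limit and return
   true if it completed without a "limit exceeded" error, whether the
   match itself succeeded or failed. */
typedef bool (*completes_fn)(uint32_t limit, void *ctx);

/* Stand-in for a real match: completes once the limit reaches the
   threshold stored in ctx. Useful only for trying out the search. */
bool completes_at_least(uint32_t limit, void *ctx)
{
  return limit >= *(const uint32_t *)ctx;
}

/* Smallest limit in 1..max_limit for which completes() is true, or 0 if
   there is none. Doubles the limit until it works, then bisects. */
uint32_t find_min_limit(completes_fn completes, void *ctx, uint32_t max_limit)
{
  uint32_t lo = 0, hi = 1;  /* lo is known (or assumed) to fail */
  if (max_limit == 0) return 0;
  while (!completes(hi, ctx)) {
    if (hi >= max_limit) return 0;
    lo = hi;
    hi = (hi > max_limit / 2) ? max_limit : hi * 2;
  }
  while (hi - lo > 1) {     /* invariant: hi completes, lo does not */
    uint32_t mid = lo + (hi - lo) / 2;
    if (completes(mid, ctx)) hi = mid;
    else lo = mid;
  }
  return hi;
}
```

For example, find_min_limit(completes_at_least, &t, 1000) returns 7
when t is 7, and 0 when t is larger than 1000.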

For non-DFA matching, the minimum depth_limit number is a measure of
how much nested backtracking happens (that is, how deeply the pattern's
tree is searched). In the case of DFA matching, depth_limit controls
the depth of recursive calls of the internal function that is used for
handling pattern recursion, lookaround assertions, and atomic groups.

For non-DFA matching, the match_limit number is a measure of the amount
of backtracking that takes place, and learning the minimum value can be
instructive. For most simple matches, the number is quite small, but
for patterns with very large numbers of matching possibilities, it can
become large very quickly with increasing length of subject string. In
the case of DFA matching, match_limit controls the total number of
calls, both recursive and non-recursive, to the internal matching
function, thus controlling the overall amount of computing resource
that is used.

For both kinds of matching, the heap_limit number, which is in
kibibytes (units of 1024 bytes), limits the amount of heap memory used
for matching.

Showing MARK names

The mark modifier causes the names from backtracking control verbs that
are returned from calls to pcre2_match() to be displayed. If a mark is
returned for a match, non-match, or partial match, pcre2test shows it.
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
it is added to the non-match message.

Showing memory usage

The memory modifier causes pcre2test to log the sizes of all heap
memory allocation and freeing calls that occur during a call to
pcre2_match() or pcre2_dfa_match(). In the latter case, heap memory is
used only when a match requires more internal workspace than the
default allocation on the stack, so in many cases there will be no
output. No heap memory is allocated during matching with JIT. For this
modifier to work, the null_context modifier must not be set on both the
pattern and the subject, though it can be set on one or the other.

Showing the heap frame overall vector size

The heapframes_size modifier is relevant for matches using
pcre2_match() without JIT. After a match has run (whether successful or
not), the size, in bytes, of the allocated heap frames vector that is
left attached to the match data block is shown. If the matching action
involved several calls to pcre2_match() (for example, global matching
or for timing), only the final value is shown.

This modifier is ignored, with a warning, for POSIX or DFA matching.
JIT matching does not use the heap frames vector, so the size is always
zero, unless there was a previous non-JIT match. Note that specifying a
size of zero for the output vector (see below) causes pcre2test to free
its match data block (and associated heap frames vector) and allocate a
new one.

Setting a starting offset

The offset modifier sets an offset in the subject string at which
matching starts. Its value is a number of code units, not characters.

Setting an offset limit

The offset_limit modifier sets a limit for unanchored matches. If a
match cannot be found starting at or before this offset in the subject,
a "no match" return is given. The data value is a number of code units,
not characters. When this modifier is used, the use_offset_limit
modifier must have been set for the pattern; if not, an error is
generated.
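
The effect of the limit can be seen in a naive model of an unanchored
search. This sketch uses a literal memcmp() search in place of PCRE2
matching, and the function and NO_MATCH names are hypothetical. In the
library itself the limit is set with pcre2_set_offset_limit() in a
match context, and the pattern must be compiled with
PCRE2_USE_OFFSET_LIMIT, which is what the use_offset_limit modifier
requests.

```c
#include <stddef.h>
#include <string.h>

#define NO_MATCH ((size_t)-1)

/* Unanchored literal search in which a match may only START at
   positions start_offset..offset_limit (inclusive); the matched text
   itself may extend beyond offset_limit. */
size_t find_with_offset_limit(const char *subject, size_t len,
                              const char *needle, size_t start_offset,
                              size_t offset_limit)
{
  size_t nlen = strlen(needle);
  for (size_t pos = start_offset;
       pos <= offset_limit && pos + nlen <= len; pos++)
    if (memcmp(subject + pos, needle, nlen) == 0) return pos;
  return NO_MATCH;
}
```

Searching "xxabc" for "abc" with an offset limit of 2 finds the match
at offset 2; with a limit of 1 it returns NO_MATCH, even though the
text occurs later in the subject.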

Setting the size of the output vector

The ovector modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a
#subject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15.

A value of zero is useful when testing the POSIX API because it causes
regexec() to be called with a NULL capture vector. When not testing the
POSIX API, a value of zero is used to cause
pcre2_match_data_create_from_pattern() to be called, in order to create
a new match block of exactly the right size for the pattern. (It is not
possible to create a match block with a zero-length ovector; there is
always at least one pair of offsets.) The old match data block is freed.
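
For the non-POSIX case, the sizing rules above reduce to a small
decision, sketched here with a hypothetical function name. The
capture_count parameter stands for the pattern's highest capture group
number; a match data block created from the pattern has one pair for
the whole match plus one per group, which is why it is never
zero-length.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_OVECTOR_PAIRS 15u  /* default stated in the text above */

/* Number of offset pairs available for a (non-POSIX) subject line. */
uint32_t ovector_pairs(bool have_modifier, uint32_t requested,
                       uint32_t capture_count)
{
  if (!have_modifier) return DEFAULT_OVECTOR_PAIRS;
  if (requested == 0) return capture_count + 1;  /* sized from pattern */
  return requested;
}
```

For a pattern with three capture groups, ovector=0 therefore yields four
pairs (eight offsets), while omitting the modifier yields 15 pairs.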
Passing the subject as zero-terminated

By default, the subject string is passed to a native API matching
function with its correct length. In order to test the facility for
passing a zero-terminated string, the zero_terminate modifier is
provided. It causes the length to be passed as PCRE2_ZERO_TERMINATED.
When matching via the POSIX interface, this modifier is ignored, with a
warning.

When testing pcre2_substitute(), this modifier also has the effect of
passing the replacement string as zero-terminated.

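The practical difference shows up when a subject contains a binary zero. The C sketch below is only a model, not PCRE2 source. MODEL_ZERO_TERMINATED stands in for PCRE2_ZERO_TERMINATED, and it assumes PCRE2_SIZE is size_t. model_subject_length() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_ZERO_TERMINATED, which pcre2.h defines as
   ~(PCRE2_SIZE)0; this sketch assumes PCRE2_SIZE is size_t. */
#define MODEL_ZERO_TERMINATED (~(size_t)0)

/* Return the number of code units a matcher would examine: an explicit
   length is taken as given (binary zeros included), while the sentinel
   means "stop at the first NUL". */
size_t model_subject_length(const char *subject, size_t length)
{
    assert(subject != NULL);
    return (length == MODEL_ZERO_TERMINATED) ? strlen(subject) : length;
}
```

With an explicit length of 7, the matcher sees the whole subject "abc\0def". With the sentinel, it sees only "abc".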
Passing a NULL context, subject, or replacement

Normally, pcre2test passes a context block to pcre2_match(),
pcre2_dfa_match(), pcre2_jit_match() or pcre2_substitute(). If the
null_context modifier is set, however, NULL is passed. This is for
testing that the matching and substitution functions behave correctly
in this case (they use default values). This modifier cannot be used
with the find_limits, find_limits_noheap, or substitute_callout
modifiers.

Similarly, for testing purposes, if the null_subject or null_replacement
modifier is set, the subject or replacement string pointers are passed
as NULL, respectively, to the relevant functions.

THE ALTERNATIVE MATCHING FUNCTION

By default, pcre2test uses the standard PCRE2 matching function,
pcre2_match(), to match each subject line. PCRE2 also supports an
alternative matching function, pcre2_dfa_match(), which operates in a
different way and has some restrictions. The differences between the
two functions are described in the pcre2matching documentation.

If the dfa modifier is set, the alternative matching function is used.
This function finds all possible matches at a given point in the
subject. If, however, the dfa_shortest modifier is set, processing
stops after the first match is found. This is always the shortest
possible match.

DEFAULT OUTPUT FROM pcre2test

This section describes the output when the normal matching function,
pcre2_match(), is being used.

When a match succeeds, pcre2test outputs the list of captured
substrings, starting with number 0 for the string that matched the
whole pattern. Otherwise, it outputs "No match" when the return is
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the partial
match; it may include characters before the actual match start if a
lookbehind assertion, \K, \b, or \B was involved.)

For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string
check, the code unit offset of the start of the failing character is
also output. Here is an example of an interactive pcre2test run.

    $ pcre2test
    PCRE2 version 10.22 2016-07-29
      re> /^abc(\d+)/
    data> abc123
     0: abc123
     1: 123
    data> xyz
    No match

Unset capturing substrings that are not followed by one that is set are
not shown by pcre2test unless the allcaptures modifier is specified. In
the following example, there are two capturing substrings, but when the
first data line is matched, the second, unset substring is not shown.
An "internal" unset substring is shown as "<unset>", as for the second
data line.

      re> /(a)|(b)/
    data> a
     0: a
     1: a
    data> b
     0: b
     1: <unset>
     2: b

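The display rule can be modelled in a few lines of C. This is a sketch under stated assumptions, not pcre2test's code. MODEL_UNSET stands in for PCRE2_UNSET. model_lines_to_show() and model_format_line() are hypothetical helpers that take an ovector of (start, end) offset pairs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET (~(PCRE2_SIZE)0 in pcre2.h), assuming
   PCRE2_SIZE is size_t. */
#define MODEL_UNSET (~(size_t)0)

/* How many "n:" lines to print for one match. ovector holds npairs
   (start, end) pairs; pair 0 is the whole match. Without allcaptures the
   list stops at the highest-numbered group that is set. */
size_t model_lines_to_show(const size_t *ovector, size_t npairs, int allcaptures)
{
    size_t shown = 0;
    assert(npairs > 0);
    if (allcaptures)
        return npairs;
    for (size_t i = 0; i < npairs; i++)
        if (ovector[2 * i] != MODEL_UNSET)
            shown = i + 1;
    return shown;
}

/* Format line i as " 1: text", or " 1: <unset>" for an unset group.
   Returns the length actually stored in out (snprintf may truncate). */
size_t model_format_line(char *out, size_t outsize, const char *subject,
                         const size_t *ovector, size_t i)
{
    size_t start = ovector[2 * i], end = ovector[2 * i + 1];
    assert(outsize > 0);
    if (start == MODEL_UNSET)
        snprintf(out, outsize, "%2zu: <unset>", i);
    else
        snprintf(out, outsize, "%2zu: %.*s", i, (int)(end - start), subject + start);
    return strlen(out);
}
```

Consider /(a)|(b)/ matching "a". The ovector is {0,1, 0,1, unset,unset}, so two lines are shown by default and three with allcaptures. For "b", group 1 precedes a set group, so it is printed as <unset>.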
If the strings contain any non-printing characters, they are output as
\xhh escapes if the value is less than 256 and UTF mode is not set.
Otherwise they are output as \x{hh...} escapes. See below for the
definition of non-printing characters. If the aftertext modifier is
set, the output for substring 0 is followed by the rest of the subject
string, identified by "0+" like this:

      re> /cat/aftertext
    data> cataract
     0: cat
     0+ aract

If global matching is requested, the results of successive matching
attempts are output in sequence, like this:

      re> /\Bi(\w\w)/g
    data> Mississippi
     0: iss
     1: ss
     0: iss
     1: ss
     0: ipp
     1: pp

"No match" is output only if the first match attempt fails. Here is an
example of a failure message (the offset 4 that is specified by the
offset modifier is past the end of the subject string):

      re> /xyz/
    data> xyz\=offset=4
    Failed: error -33: bad offset value

Note that whereas patterns can be continued over several lines (a plain
">" prompt is used for continuations), subject lines may not. However,
newlines can be included in a subject by means of the \n escape (or \r,
\r\n, etc., depending on the newline sequence setting).

OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

When the alternative matching function, pcre2_dfa_match(), is used, the
output consists of a list of all the matches that start at the first
point in the subject where there is at least one match. For example:

      re> /(tang|tangerine|tan)/
    data> yellow tangerine\=dfa
     0: tangerine
     1: tang
     2: tan

Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero).
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
followed by the partially matching substring. Note that this is the
entire substring that was inspected during the partial match; it may
include characters before the actual match start if a lookbehind
assertion, \b, or \B was involved. (\K is not supported for DFA
matching.)

If global matching is requested, the search for further matches
resumes at the end of the longest match. For example:

      re> /(tang|tangerine|tan)/g
    data> yellow tangerine and tangy sultana\=dfa
     0: tangerine
     1: tang
     2: tan
     0: tang
     1: tan
     0: tan

The alternative matching function does not support substring capture,
so the modifiers that are concerned with captured substrings are not
relevant.

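A deliberately limited C model illustrates two rules. All matches at the first matching point are reported longest first, and global matching resumes at the end of the longest match. The model handles only a pattern that is a plain alternation of literal strings, such as tang|tangerine|tan. It is not how pcre2_dfa_match() is implemented, and model_dfa_alternation() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the first position >= from where at least one literal alternative
   matches, store every matching length there (longest first) in lengths,
   which must have room for nalts entries, and return that position.
   Returns -1 (with *nfound = 0) if no alternative matches anywhere. */
long model_dfa_alternation(const char *subject, size_t from,
                           const char *const *alts, size_t nalts,
                           size_t *lengths, size_t *nfound)
{
    size_t len = strlen(subject);
    assert(from <= len);
    for (size_t pos = from; pos <= len; pos++) {
        size_t n = 0;
        for (size_t a = 0; a < nalts; a++) {
            size_t alen = strlen(alts[a]);
            if (alen <= len - pos && memcmp(subject + pos, alts[a], alen) == 0)
                lengths[n++] = alen;
        }
        if (n > 0) {
            /* Insertion sort, longest match first. */
            for (size_t i = 1; i < n; i++) {
                size_t v = lengths[i], j = i;
                while (j > 0 && lengths[j - 1] < v) {
                    lengths[j] = lengths[j - 1];
                    j--;
                }
                lengths[j] = v;
            }
            *nfound = n;
            return (long)pos;
        }
    }
    *nfound = 0;
    return -1;
}
```

Call it repeatedly on "yellow tangerine and tangy sultana", restarting each time at the returned offset plus lengths[0]. This yields the length groups {9, 4, 3}, {4, 3} and {3}, which correspond to the three groups of output lines in the global example.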
RESTARTING AFTER A PARTIAL MATCH

When the alternative matching function has given the PCRE2_ERROR_PARTIAL
return, indicating that the subject partially matched the pattern, you
can restart the match with additional subject data by means of the
dfa_restart modifier. For example:

      re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    data> 23ja\=ps,dfa
    Partial match: 23ja
    data> n05\=dfa,dfa_restart
     0: n05

For further information about partial matching, see the pcre2partial
documentation.

CALLOUTS

If the pattern contains any callout requests, pcre2test's callout
function is called during matching unless callout_none is specified.
This works with both matching functions, and with JIT, though there are
some differences in behaviour. The output for callouts with numerical
arguments and those with string arguments is slightly different.

Callouts with numerical arguments

By default, the callout function displays the callout number, the start
and current positions in the subject text at the callout time, and the
next pattern item to be tested. For example:

    --->pqrabcdef
      0    ^  ^     \d

This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion.

Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the auto_callout pattern modifier. In this case, instead of
showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example:

      re> /\d?[A-E]\*/auto_callout
    data> E*
    --->E*
     +0 ^      \d?
     +3 ^      [A-E]
     +8 ^^     \*
    +10 ^ ^
     0: E*

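The layout of the position indicators can be reproduced with a small C model. It assumes single-byte subject characters and a 4-column number field that lines up with "--->". model_callout_prefix() is a hypothetical helper, not pcre2test's own formatting code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the part of a callout line that precedes the next pattern item.
   Callout 255 is treated as automatic and shows "+<pattern offset>";
   offsets above 99 would need a wider field and are not handled here. */
void model_callout_prefix(char *out, size_t outsize, unsigned number,
                          size_t pattern_offset, size_t start, size_t current)
{
    size_t n, i;
    size_t last = (current > start) ? current : start;

    if (number == 255)
        snprintf(out, outsize, "%+3d ", (int)pattern_offset);
    else
        snprintf(out, outsize, "%3u ", number);
    n = strlen(out);
    assert(n + last + 2 <= outsize);    /* room for pointers and NUL */

    for (i = 0; i < start; i++)
        out[n++] = ' ';
    out[n++] = '^';                      /* start of this match attempt */
    if (current > start) {               /* otherwise only one ^ is shown */
        for (i = start + 1; i < current; i++)
            out[n++] = ' ';
        out[n++] = '^';                  /* current position */
    }
    out[n] = '\0';
}
```

For callout 0 with start 3 and current 6, the model gives "  0    ^  ^", the prefix in the pqrabcdef example. For automatic callout 255 at pattern offset 8, with positions 0 and 1, it gives " +8 ^^".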
skipping to change at line 1763 | skipping to change at line 1772
    data> abc
    --->abc
     +0 ^       a
     +1 ^^      (*MARK:X)
    +10 ^^      b
    Latest Mark: X
    +11 ^ ^     c
    +12 ^  ^
     0: abc

The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is
output.

Callouts with string arguments

The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is
reflected for each callout. For example:

      re> /^ab(?C'first')cd(?C"second")ef/
    data> abcdefg
    Callout (7): 'first'
    --->abcdefg
        ^ ^         c
    Callout (20): "second"
    --->abcdefg
        ^   ^       e
     0: abcdef

Callout modifiers

The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout (see below).

If the callout_capture modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching,
as pcre2_dfa_match() does not support capturing, so no captures are
ever shown.

The normal callout output, showing the callout number or pattern offset
(as described above), is suppressed if the callout_no_where modifier is
set.

When using the interpretive matching function pcre2_match() without
JIT, setting the callout_extra modifier causes additional output from
pcre2test's callout function to be generated. For the first callout in
a match attempt at a new starting position in the subject, "New match
attempt" is output. If there has been a backtrack since the last
callout (or start of matching if this is the first callout),
"Backtrack" is output, followed by "No other matching paths" if the
backtrack ended the previous match attempt. For example:

      re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
    data> aac\=callout_extra
    New match attempt
    --->aac
     +0 ^       (
     +1 ^       a+
     +3 ^ ^     )
     +4 ^ ^     b
skipping to change at line 1844 | skipping to change at line 1853
     +0   ^     (
     +1   ^     a+
    Backtrack
    No other matching paths
    New match attempt
    --->aac
     +0    ^    (
     +1    ^    a+
    No match

Notice that various optimizations must be turned off if you want all
possible matching paths to be scanned. If no_start_optimize is not
used, there is an immediate "no match", without any callouts, because
the starting optimization fails to find "b" in the subject, which it
knows must be present for any match. If no_auto_possess is not used,
the "a+" item is turned into "a++", which reduces the number of
backtracks.

The callout_extra modifier has no effect if used with the DFA matching
function, or with JIT.

Return values from callouts

The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0
(causing matching to backtrack) when a callout of that number is
reached. If two numbers (<n>:<m>) are given, 1 is returned when callout
<n> is reached and there have been at least <m> callouts. The
callout_error modifier is similar, except that PCRE2_ERROR_CALLOUT is
returned, causing the entire matching process to be aborted. If both
these modifiers are set for the same callout number, callout_error
takes precedence.

Note that callouts with string arguments are always given the number
zero.

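The rules above can be summarized as a decision function. This is a sketch, not pcre2test's code. struct model_trigger and model_callout_return() are hypothetical, and MODEL_ERROR_CALLOUT stands in for PCRE2_ERROR_CALLOUT. The sketch also assumes the callout count includes the callout currently being handled. A string callout would be passed with number 0.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PCRE2_ERROR_CALLOUT, which is -37 in pcre2.h. */
#define MODEL_ERROR_CALLOUT (-37)

/* One parsed callout_fail or callout_error setting (hypothetical type). */
struct model_trigger {
    int set;             /* nonzero if the modifier was given */
    unsigned number;     /* <n>: the callout number it applies to */
    unsigned min_count;  /* <m>: callouts required so far; 0 if only <n> */
};

static int model_fires(const struct model_trigger *t, unsigned number,
                       unsigned callout_count)
{
    return t->set && t->number == number && callout_count >= t->min_count;
}

/* Value the modelled callout function returns: 0 continues, 1 forces a
   backtrack, MODEL_ERROR_CALLOUT aborts the match. callout_error is
   tested first, which is what gives it precedence. */
int model_callout_return(const struct model_trigger *fail,
                         const struct model_trigger *error,
                         unsigned number, unsigned callout_count)
{
    assert(fail != NULL && error != NULL);
    if (model_fires(error, number, callout_count))
        return MODEL_ERROR_CALLOUT;
    if (model_fires(fail, number, callout_count))
        return 1;
    return 0;
}
```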
The callout_data modifier can be given an unsigned or a negative
number. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.

Inserting callouts can be helpful when using pcre2test to check
complicated regular expressions. For further information about
callouts, see the pcre2callout documentation.

NON-PRINTING CHARACTERS

When pcre2test is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters
and are therefore shown as hex escapes.

When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing
characters.

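Here is a C sketch of the default escaping rule, which applies when no locale is set. It combines the 32-126 definition with the \xhh and \x{hh...} forms described earlier. model_show_char() and model_show_bytes() are hypothetical. The locale and isprint() case is not modelled, and the minimum of two hex digits inside braces is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one character: 32-126 literally; otherwise \xhh if below 256 in
   non-UTF mode, else \x{...}. */
void model_show_char(char *out, size_t outsize, unsigned long c, int utf)
{
    assert(outsize > 0);
    if (c >= 32 && c <= 126)
        snprintf(out, outsize, "%c", (int)c);
    else if (!utf && c < 256)
        snprintf(out, outsize, "\\x%02lx", c);
    else
        snprintf(out, outsize, "\\x{%02lx}", c);
}

/* Render a non-UTF byte string. Each byte needs at most 4 characters, so
   output stops early rather than overflowing out. */
void model_show_bytes(char *out, size_t outsize, const unsigned char *s, size_t len)
{
    size_t used = 0;
    assert(outsize > 0);
    out[0] = '\0';
    for (size_t i = 0; i < len && used + 5 <= outsize; i++) {
        model_show_char(out + used, outsize - used, s[i], 0);
        used += strlen(out + used);   /* actual length, even if truncated */
    }
}
```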
SAVING AND RESTORING COMPILED PATTERNS

It is possible to save compiled patterns on disc or elsewhere, and
reload them later, subject to a number of restrictions. JIT data cannot
be saved. The host on which the patterns are reloaded must be running
the same version of PCRE2, with the same code unit width, and must also
have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is,
converted to a stream of bytes. A single byte stream may contain any
number of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its
size is 1088 bytes).

The functions whose names begin with pcre2_serialize_ are used for
serializing and de-serializing. They are described in the
pcre2serialize documentation. In this section we describe the features
of pcre2test that can be used to test these functions.

Note that "serialization" in PCRE2 does not convert compiled patterns
to an abstract format like Java or .NET. It just makes a reloadable
byte code stream. Hence the restrictions on reloading mentioned above.

In pcre2test, when a pattern with the push modifier is successfully
compiled, it is pushed onto a stack of compiled patterns, and pcre2test
expects the next line to contain a new pattern (or command) instead of
a subject line. By contrast, the pushcopy modifier causes a copy of the
compiled pattern to be stacked, leaving the original available for
immediate matching. By using push and/or pushcopy, a number of patterns
can be compiled and retained. These modifiers are incompatible with
posix, and control modifiers that act at match time are ignored (with a
message) for the stacked patterns. The jitverify modifier applies only
at compile time.

The command

    #save <filename>

causes all the stacked patterns to be serialized and the result written
to the named file. Afterwards, all the stacked patterns are freed. The
command

    #load <filename>

reads the data in the file, and then arranges for it to be
de-serialized, with the resulting compiled patterns added to the
pattern stack.

The pattern on the top of the stack can be retrieved by the #pop
command, which must be followed by lines of subjects that are to be
matched with the pattern, terminated as usual by an empty line or end
of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not
allowed, nor are any option-setting modifiers. The JIT modifiers are,
however, permitted. Here is an example that saves and reloads two
patterns.

    /abc/push
    /xyz/push
    #save tempfile
    #load tempfile
    #pop info
    xyz

    #pop jit,bincode
    abc

If jitverify is used with #pop, it does not automatically imply jit,
which is different behaviour from when it is used on a pattern.

The #popcopy command is analogous to the pushcopy modifier in that it
makes current a copy of the topmost stack pattern, leaving the original
still on the stack.

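Push versus pushcopy, and #pop versus #popcopy, amount to operations on a simple stack. The C model below represents each compiled pattern by an integer id. struct model_state and its functions are hypothetical, the fixed depth of 16 is arbitrary, and #save and #load are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_STACK_MAX 16

/* current is the pattern that subject lines are matched against,
   or -1 when no pattern is current. */
struct model_state {
    int stack[MODEL_STACK_MAX];
    size_t depth;
    int current;
};

/* push: stack the new pattern; nothing is left current, so the next
   line must be another pattern or a command. */
void model_push(struct model_state *st, int id)
{
    assert(st->depth < MODEL_STACK_MAX);
    st->stack[st->depth++] = id;
    st->current = -1;
}

/* pushcopy: stack a copy and keep the original current for matching. */
void model_pushcopy(struct model_state *st, int id)
{
    assert(st->depth < MODEL_STACK_MAX);
    st->stack[st->depth++] = id;
    st->current = id;
}

/* #pop: remove the top pattern and make it current. */
void model_pop(struct model_state *st)
{
    assert(st->depth > 0);
    st->current = st->stack[--st->depth];
}

/* #popcopy: make a copy of the top pattern current, leaving it stacked. */
void model_popcopy(struct model_state *st)
{
    assert(st->depth > 0);
    st->current = st->stack[st->depth - 1];
}
```

Push ids 1 and 2, then pop twice: 2 becomes current first, then 1. Assuming #load restores the patterns in the order they were saved, this is why the example matches "xyz" before "abc".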
SEE ALSO

pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3), pcre2matching(3),
pcre2partial(3), pcre2pattern(3), pcre2serialize(3).

AUTHOR

Philip Hazel
Retired from University Computing Service
Cambridge, England.

REVISION

Last updated: 27 January 2024 | Last updated: 24 April 2024
Copyright (c) 1997-2024 University of Cambridge.

PCRE 10.43  27 January 2024  PCRE2TEST(1) | PCRE 10.44  24 April 2024  PCRE2TEST(1)

End of changes. 145 change blocks.
610 lines changed or deleted | 623 lines changed or added
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |