pcre2test.1 | pcre2test.1 | |||
---|---|---|---|---|
skipping to change at line 570 | skipping to change at line 570 | |||
convert_length set convert buffer length | convert_length set convert buffer length | |||
debug same as info,fullbincode | debug same as info,fullbincode | |||
framesize show matching frame size | framesize show matching frame size | |||
fullbincode show binary code with lengths | fullbincode show binary code with lengths | |||
/I info show info about compiled pattern | /I info show info about compiled pattern | |||
hex unquoted characters are hexadecimal | hex unquoted characters are hexadecimal | |||
jit[=<number>] use JIT | jit[=<number>] use JIT | |||
jitfast use JIT fast path | jitfast use JIT fast path | |||
jitverify verify JIT use | jitverify verify JIT use | |||
locale=<name> use this locale | locale=<name> use this locale | |||
max_pattern_length=<n> set maximum pattern length | max_pattern_compiled ) set maximum compiled pattern | |||
_length=<n> ) length (bytes) | ||||
max_pattern_length=<n> set maximum pattern length (code units) | ||||
max_varlookbehind=<n> set maximum variable lookbehind length | max_varlookbehind=<n> set maximum variable lookbehind length | |||
memory show memory used | memory show memory used | |||
newline=<type> set newline type | newline=<type> set newline type | |||
null_context compile with a NULL context | null_context compile with a NULL context | |||
null_pattern pass pattern as NULL | null_pattern pass pattern as NULL | |||
parens_nest_limit=<n> set maximum parentheses depth | parens_nest_limit=<n> set maximum parentheses depth | |||
posix use the POSIX API | posix use the POSIX API | |||
posix_nosub use the POSIX API with REG_NOSUB | posix_nosub use the POSIX API with REG_NOSUB | |||
push push compiled pattern onto the stack | push push compiled pattern onto the stack | |||
pushcopy push a copy onto the stack | pushcopy push a copy onto the stack | |||
skipping to change at line 826 | skipping to change at line 828 | |||
brary is set when PCRE2 is built, but pcre2test sets its own default of 220, which | brary is set when PCRE2 is built, but pcre2test sets its own default of 220, which | |||
is required for running the standard test suite. | is required for running the standard test suite. | |||
Limiting the pattern length | Limiting the pattern length | |||
The max_pattern_length modifier sets a limit, in code units, to the length of pattern | The max_pattern_length modifier sets a limit, in code units, to the length of pattern | |||
that pcre2_compile() will accept. Breaching the limit causes a compilation error. | that pcre2_compile() will accept. Breaching the limit causes a compilation error. | |||
The default is the largest number a PCRE2_SIZE variable can hold (essentially | The default is the largest number a PCRE2_SIZE variable can hold (essentially | |||
unlimited). | unlimited). | |||
Limiting the size of a compiled pattern | ||||
The max_pattern_compiled_length modifier sets a limit, in bytes, to the amount of | ||||
memory used by a compiled pattern. Breaching the limit causes a compilation error. | ||||
The default is the largest number a PCRE2_SIZE variable can hold (essentially unlimited). | ||||
Using the POSIX wrapper API | Using the POSIX wrapper API | |||
The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via the POSIX | The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via the POSIX | |||
wrapper API rather than its native API. When posix_nosub is used, the POSIX option | wrapper API rather than its native API. When posix_nosub is used, the POSIX option | |||
REG_NOSUB is passed to regcomp(). The POSIX wrapper supports only the 8-bit library. | REG_NOSUB is passed to regcomp(). The POSIX wrapper supports only the 8-bit library. | |||
Note that it does not imply POSIX matching semantics; for more detail see | Note that it does not imply POSIX matching semantics; for more detail see | |||
the pcre2posix documentation. The following pattern modifiers set options for the | the pcre2posix documentation. The following pattern modifiers set options for the | |||
regcomp() function: | regcomp() function: | |||
caseless REG_ICASE | caseless REG_ICASE | |||
multiline REG_NEWLINE | multiline REG_NEWLINE | |||
dotall REG_DOTALL ) | dotall REG_DOTALL ) | |||
ungreedy REG_UNGREEDY ) These options are not part of | ungreedy REG_UNGREEDY ) These options are not part of | |||
ucp REG_UCP ) the POSIX standard | ucp REG_UCP ) the POSIX standard | |||
utf REG_UTF8 ) | utf REG_UTF8 ) | |||
The regerror_buffsize modifier specifies a size for the error buffer that is passed | The regerror_buffsize modifier specifies a size for the error buffer that is passed | |||
to regerror() in the event of a compilation error. For example: | to regerror() in the event of a compilation error. For example: | |||
/abc/posix,regerror_buffsize=20 | /abc/posix,regerror_buffsize=20 | |||
This provides a means of testing the behaviour of regerror() when the buffer is too | This provides a means of testing the behaviour of regerror() when the buffer is too | |||
small for the error message. If this modifier has not been set, a large buffer is | small for the error message. If this modifier has not been set, a large buffer is | |||
used. | used. | |||
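A minimal sketch of the behaviour that regerror_buffsize exercises, assuming the system POSIX library rather than PCRE2's wrapper: POSIX specifies that regerror() truncates the message to fit the buffer but always returns the size needed for the whole message. The helper `error_message` and the invalid pattern are illustrative choices.

```c
#include <regex.h>
#include <stddef.h>

/* Compile a deliberately invalid pattern (an unterminated bracket
   expression) and copy its error message into `buf`, which holds
   `bufsize` bytes. Returns the size regerror() reports for the complete
   message, including the terminating NUL, or 0 if the pattern
   unexpectedly compiled. */
size_t error_message(char *buf, size_t bufsize)
{
    regex_t re;
    int rc = regcomp(&re, "[abc", REG_EXTENDED);
    if (rc == 0) {
        regfree(&re);
        return 0;
    }
    /* regerror() stores at most bufsize - 1 bytes plus a NUL, but its
       return value is the length needed for the whole message. */
    return regerror(rc, &re, buf, bufsize);
}
```

Comparing the return value with the buffer size is how a caller detects that the message was cut short.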
The aftertext and allaftertext subject modifiers work as described below. All other | The aftertext and allaftertext subject modifiers work as described below. All other | |||
modifiers are either ignored, with a warning message, or cause an error. | modifiers are either ignored, with a warning message, or cause an error. | |||
The pattern is passed to regcomp() as a zero-terminated string by default, but if | The pattern is passed to regcomp() as a zero-terminated string by default, but if | |||
the use_length or hex modifiers are set, the REG_PEND extension is used to pass it | the use_length or hex modifiers are set, the REG_PEND extension is used to pass it | |||
by length. | by length. | |||
Testing the stack guard feature | Testing the stack guard feature | |||
The stackguard modifier is used to test the use of pcre2_set_compile_recursion_guard(), | The stackguard modifier is used to test the use of pcre2_set_compile_recursion_guard(), | |||
a function that is provided to enable stack availability to be checked during | a function that is provided to enable stack availability to be checked during | |||
compilation (see the pcre2api documentation for details). If the number specified by | compilation (see the pcre2api documentation for details). If the number specified by | |||
the modifier is greater than zero, pcre2_set_compile_recursion_guard() is called to | the modifier is greater than zero, pcre2_set_compile_recursion_guard() is called to | |||
set up callback from pcre2_compile() to a local function. The argument it receives | set up callback from pcre2_compile() to a local function. The argument it receives | |||
is the current nesting parenthesis depth; if this is greater than the value given by | is the current nesting parenthesis depth; if this is greater than the value given by | |||
the modifier, non-zero is returned, causing the compilation to be aborted. | the modifier, non-zero is returned, causing the compilation to be aborted. | |||
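The guard described above is a plain C callback. This sketch assumes the callback type `int (*)(uint32_t, void *)` used by pcre2_set_compile_recursion_guard(); the `guard_data` struct and the `stack_guard` name are hypothetical, and the registration call appears only as a comment so the example builds without PCRE2.

```c
#include <stdint.h>

/* Hypothetical user data: the depth limit a stackguard=<n> value would supply. */
typedef struct {
    uint32_t max_depth;
} guard_data;

/* Receives the current parenthesis nesting depth and the user data pointer.
   Returning non-zero asks pcre2_compile() to abandon compilation. */
int stack_guard(uint32_t depth, void *user_data)
{
    const guard_data *g = user_data;
    return depth > g->max_depth;
}

/* With the real library this would be registered roughly as:
     guard_data data = { 10 };
     pcre2_set_compile_recursion_guard(ccontext, stack_guard, &data);
   (ccontext being a pcre2_compile_context created by the caller). */
```

Keeping the limit in user data rather than a global lets one callback serve several compile contexts with different limits.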
Using alternative character tables | Using alternative character tables | |||
The value specified for the tables modifier must be one of the digits 0, 1, 2, or | The value specified for the tables modifier must be one of the digits 0, 1, 2, or | |||
3. It causes a specific set of built-in character tables to be passed to pcre2_compile(). | 3. It causes a specific set of built-in character tables to be passed to pcre2_compile(). | |||
This is used in the PCRE2 tests to check behaviour with different character | This is used in the PCRE2 tests to check behaviour with different character | |||
tables. The digit specifies the tables as follows: | tables. The digit specifies the tables as follows: | |||
0 do not pass any special character tables | 0 do not pass any special character tables | |||
1 the default ASCII tables, as distributed in | 1 the default ASCII tables, as distributed in | |||
pcre2_chartables.c.dist | pcre2_chartables.c.dist | |||
2 a set of tables defining ISO 8859 characters | 2 a set of tables defining ISO 8859 characters | |||
3 a set of tables loaded by the #loadtables command | 3 a set of tables loaded by the #loadtables command | |||
In tables 2, some characters whose codes are greater than 128 are identified as | In tables 2, some characters whose codes are greater than 128 are identified as | |||
letters, digits, spaces, etc. Tables 3 can be used only after a #loadtables command | letters, digits, spaces, etc. Tables 3 can be used only after a #loadtables command | |||
has loaded them from a binary file. Setting alternate character tables and a locale | has loaded them from a binary file. Setting alternate character tables and a locale | |||
are mutually exclusive. | are mutually exclusive. | |||
Setting certain match controls | Setting certain match controls | |||
The following modifiers are really subject modifiers, and are described under | The following modifiers are really subject modifiers, and are described under | |||
"Subject Modifiers" below. However, they may be included in a pattern's modifier list, | "Subject Modifiers" below. However, they may be included in a pattern's modifier list, | |||
in which case they are applied to every subject line that is processed with that | in which case they are applied to every subject line that is processed with that | |||
pattern. These modifiers do not affect the compilation process. | pattern. These modifiers do not affect the compilation process. | |||
aftertext show text after match | aftertext show text after match | |||
allaftertext show text after captures | allaftertext show text after captures | |||
allcaptures show all captures | allcaptures show all captures | |||
allvector show the entire ovector | allvector show the entire ovector | |||
allusedtext show all consulted text | allusedtext show all consulted text | |||
altglobal alternative global matching | altglobal alternative global matching | |||
/g global global matching | /g global global matching | |||
heapframes_size show match data heapframes size | heapframes_size show match data heapframes size | |||
skipping to change at line 923 | skipping to change at line 932 | |||
substitute_stop=<n> skip substitution <n> and following | substitute_stop=<n> skip substitution <n> and following | |||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | |||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | |||
These modifiers may not appear in a #pattern command. If you want them as defaults, | These modifiers may not appear in a #pattern command. If you want them as defaults, | |||
set them in a #subject command. | set them in a #subject command. | |||
Specifying literal subject lines | Specifying literal subject lines | |||
If the subject_literal modifier is present on a pattern, all the subject lines that | If the subject_literal modifier is present on a pattern, all the subject lines that | |||
it matches are taken as literal strings, with no interpretation of backslashes. It | it matches are taken as literal strings, with no interpretation of backslashes. It | |||
is not possible to set subject modifiers on such lines, but any that are set as de‐ | is not possible to set subject modifiers on such lines, but any that are set as de‐ | |||
faults by a #subject command are recognized. | faults by a #subject command are recognized. | |||
Saving a compiled pattern | Saving a compiled pattern | |||
When a pattern with the push modifier is successfully compiled, it is pushed onto a | When a pattern with the push modifier is successfully compiled, it is pushed onto a | |||
stack of compiled patterns, and pcre2test expects the next line to contain a new | stack of compiled patterns, and pcre2test expects the next line to contain a new | |||
pattern (or a command) instead of a subject line. This facility is used when saving | pattern (or a command) instead of a subject line. This facility is used when saving | |||
compiled patterns to a file, as described in the section entitled "Saving and | compiled patterns to a file, as described in the section entitled "Saving and | |||
restoring compiled patterns" below. If pushcopy is used instead of push, a copy of | restoring compiled patterns" below. If pushcopy is used instead of push, a copy of | |||
the compiled pattern is stacked, leaving the original as current, ready to match | the compiled pattern is stacked, leaving the original as current, ready to match | |||
the following input lines. This provides a way of testing the pcre2_code_copy() | the following input lines. This provides a way of testing the pcre2_code_copy() | |||
function. The push and pushcopy modifiers are incompatible with compilation modifiers | function. The push and pushcopy modifiers are incompatible with compilation modifiers | |||
such as global that act at match time. Any that are specified are ignored | such as global that act at match time. Any that are specified are ignored | |||
(for the stacked copy), with a warning message, except for replace, which causes an | (for the stacked copy), with a warning message, except for replace, which causes an | |||
error. Note that jitverify, which is allowed, does not carry through to any subsequent | error. Note that jitverify, which is allowed, does not carry through to any subsequent | |||
matching that uses a stacked pattern. | matching that uses a stacked pattern. | |||
Testing foreign pattern conversion | Testing foreign pattern conversion | |||
The experimental foreign pattern conversion functions in PCRE2 can be tested by | The experimental foreign pattern conversion functions in PCRE2 can be tested by | |||
setting the convert modifier. Its argument is a colon-separated list of options, | setting the convert modifier. Its argument is a colon-separated list of options, | |||
which set the equivalent option for the pcre2_pattern_convert() function: | which set the equivalent option for the pcre2_pattern_convert() function: | |||
glob PCRE2_CONVERT_GLOB | glob PCRE2_CONVERT_GLOB | |||
glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR | glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR | |||
glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR | glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR | |||
posix_basic PCRE2_CONVERT_POSIX_BASIC | posix_basic PCRE2_CONVERT_POSIX_BASIC | |||
posix_extended PCRE2_CONVERT_POSIX_EXTENDED | posix_extended PCRE2_CONVERT_POSIX_EXTENDED | |||
unset Unset all options | unset Unset all options | |||
The "unset" value is useful for turning off a default that has been set by a #pattern | The "unset" value is useful for turning off a default that has been set by a #pattern | |||
command. When one of these options is set, the input pattern is passed to | command. When one of these options is set, the input pattern is passed to | |||
pcre2_pattern_convert(). If the conversion is successful, the result is reflected | pcre2_pattern_convert(). If the conversion is successful, the result is reflected | |||
in the output and then passed to pcre2_compile(). The normal utf and no_utf_check | in the output and then passed to pcre2_compile(). The normal utf and no_utf_check | |||
options, if set, cause the PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options | options, if set, cause the PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options | |||
to be passed to pcre2_pattern_convert(). | to be passed to pcre2_pattern_convert(). | |||
By default, the conversion function is allowed to allocate a buffer for its output. | By default, the conversion function is allowed to allocate a buffer for its output. | |||
However, if the convert_length modifier is set to a value greater than zero, | However, if the convert_length modifier is set to a value greater than zero, | |||
pcre2test passes a buffer of the given length. This makes it possible to test the | pcre2test passes a buffer of the given length. This makes it possible to test the | |||
length check. | length check. | |||
The convert_glob_escape and convert_glob_separator modifiers can be used to specify | The convert_glob_escape and convert_glob_separator modifiers can be used to specify | |||
the escape and separator characters for glob processing, overriding the defaults, | the escape and separator characters for glob processing, overriding the defaults, | |||
which are operating-system dependent. | which are operating-system dependent. | |||
SUBJECT MODIFIERS | SUBJECT MODIFIERS | |||
The modifiers that can appear in subject lines and the #subject command are of two | The modifiers that can appear in subject lines and the #subject command are of two | |||
types. | types. | |||
Setting match options | Setting match options | |||
The following modifiers set options for pcre2_match() or pcre2_dfa_match(). See | The following modifiers set options for pcre2_match() or pcre2_dfa_match(). See | |||
pcreapi for a description of their effects. | pcreapi for a description of their effects. | |||
anchored set PCRE2_ANCHORED | anchored set PCRE2_ANCHORED | |||
endanchored set PCRE2_ENDANCHORED | endanchored set PCRE2_ENDANCHORED | |||
dfa_restart set PCRE2_DFA_RESTART | dfa_restart set PCRE2_DFA_RESTART | |||
dfa_shortest set PCRE2_DFA_SHORTEST | dfa_shortest set PCRE2_DFA_SHORTEST | |||
disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK | disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK | |||
no_jit set PCRE2_NO_JIT | no_jit set PCRE2_NO_JIT | |||
no_utf_check set PCRE2_NO_UTF_CHECK | no_utf_check set PCRE2_NO_UTF_CHECK | |||
notbol set PCRE2_NOTBOL | notbol set PCRE2_NOTBOL | |||
notempty set PCRE2_NOTEMPTY | notempty set PCRE2_NOTEMPTY | |||
notempty_atstart set PCRE2_NOTEMPTY_ATSTART | notempty_atstart set PCRE2_NOTEMPTY_ATSTART | |||
noteol set PCRE2_NOTEOL | noteol set PCRE2_NOTEOL | |||
partial_hard (or ph) set PCRE2_PARTIAL_HARD | partial_hard (or ph) set PCRE2_PARTIAL_HARD | |||
partial_soft (or ps) set PCRE2_PARTIAL_SOFT | partial_soft (or ps) set PCRE2_PARTIAL_SOFT | |||
The partial matching modifiers are provided with abbreviations because they appear | The partial matching modifiers are provided with abbreviations because they appear | |||
frequently in tests. | frequently in tests. | |||
If the posix or posix_nosub modifier was present on the pattern, causing the POSIX | If the posix or posix_nosub modifier was present on the pattern, causing the POSIX | |||
wrapper API to be used, the only option-setting modifiers that have any effect are | wrapper API to be used, the only option-setting modifiers that have any effect are | |||
notbol, notempty, and noteol, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, | notbol, notempty, and noteol, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, | |||
respectively, to be passed to regexec(). The other modifiers are ignored, with a | respectively, to be passed to regexec(). The other modifiers are ignored, with a | |||
warning message. | warning message. | |||
There is one additional modifier that can be used with the POSIX wrapper. It is | There is one additional modifier that can be used with the POSIX wrapper. It is | |||
ignored (with a warning) if used for non-POSIX matching. | ignored (with a warning) if used for non-POSIX matching. | |||
posix_startend=<n>[:<m>] | posix_startend=<n>[:<m>] | |||
This causes the subject string to be passed to regexec() using the REG_STARTEND | This causes the subject string to be passed to regexec() using the REG_STARTEND | |||
option, which uses offsets to specify which part of the string is searched. If only | option, which uses offsets to specify which part of the string is searched. If only | |||
one number is given, the end offset is passed as the end of the subject string. For | one number is given, the end offset is passed as the end of the subject string. For | |||
more detail of REG_STARTEND, see the pcre2posix documentation. If the subject | more detail of REG_STARTEND, see the pcre2posix documentation. If the subject | |||
string contains binary zeros (coded as escapes such as \x{00} because pcre2test | string contains binary zeros (coded as escapes such as \x{00} because pcre2test | |||
does not support actual binary zeros in its input), you must use posix_startend to | does not support actual binary zeros in its input), you must use posix_startend to | |||
specify its length. | specify its length. | |||
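The sketch below shows REG_STARTEND searching a subject that contains a binary zero. REG_STARTEND is a non-standard extension: glibc and the BSDs provide it, but it may be missing elsewhere, so the code is guarded and returns -2 when the macro is absent. The helper `search_range` is hypothetical, and the returned offset is assumed to be relative to the start of the string, as glibc and the BSDs report it.

```c
#include <regex.h>

/* Hypothetical helper: search subject[start..end) for `pattern`.
   Returns the match start offset, -1 for no match, or -2 if REG_STARTEND
   is unsupported or the pattern fails to compile. */
long search_range(const char *pattern, const char *subject,
                  regoff_t start, regoff_t end)
{
#ifdef REG_STARTEND
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;
    regmatch_t m[1];
    m[0].rm_so = start;  /* input: where searching begins */
    m[0].rm_eo = end;    /* input: end of subject; NUL bytes before it are data */
    int rc = regexec(&re, subject, 1, m, REG_STARTEND);
    regfree(&re);
    return rc == 0 ? (long)m[0].rm_so : -1;
#else
    (void)pattern; (void)subject; (void)start; (void)end;
    return -2;
#endif
}
```

Without REG_STARTEND, regexec() would stop at the first zero byte and never see "bc" in a subject such as {'a', '\0', 'b', 'c'}.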
Setting match controls | Setting match controls | |||
The following modifiers affect the matching process or request additional | The following modifiers affect the matching process or request additional | |||
information. Some of them may also be specified on a pattern line (see above), in which | information. Some of them may also be specified on a pattern line (see above), in which | |||
case they apply to every subject line that is matched against that pattern, but can | case they apply to every subject line that is matched against that pattern, but can | |||
be overridden by modifiers on the subject. | be overridden by modifiers on the subject. | |||
aftertext show text after match | aftertext show text after match | |||
allaftertext show text after captures | allaftertext show text after captures | |||
allcaptures show all captures | allcaptures show all captures | |||
allvector show the entire ovector | allvector show the entire ovector | |||
allusedtext show all consulted text (non-JIT only) | allusedtext show all consulted text (non-JIT only) | |||
altglobal alternative global matching | altglobal alternative global matching | |||
callout_capture show captures at callout time | callout_capture show captures at callout time | |||
skipping to change at line 1074 | skipping to change at line 1083 | |||
substitute_matched use PCRE2_SUBSTITUTE_MATCHED | substitute_matched use PCRE2_SUBSTITUTE_MATCHED | |||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | |||
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | |||
substitute_skip=<n> skip substitution number n | substitute_skip=<n> skip substitution number n | |||
substitute_stop=<n> skip substitution number n and greater | substitute_stop=<n> skip substitution number n and greater | |||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | |||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | |||
zero_terminate pass the subject as zero-terminated | zero_terminate pass the subject as zero-terminated | |||
The effects of these modifiers are described in the following sections. When | The effects of these modifiers are described in the following sections. When | |||
matching via the POSIX wrapper API, the aftertext, allaftertext, and ovector subject | matching via the POSIX wrapper API, the aftertext, allaftertext, and ovector subject | |||
modifiers work as described below. All other modifiers are either ignored, with a | modifiers work as described below. All other modifiers are either ignored, with a | |||
warning message, or cause an error. | warning message, or cause an error. | |||
Showing more text | Showing more text | |||
The aftertext modifier requests that as well as outputting the part of the subject | The aftertext modifier requests that as well as outputting the part of the subject | |||
string that matched the entire pattern, pcre2test should in addition output the | string that matched the entire pattern, pcre2test should in addition output the | |||
remainder of the subject string. This is useful for tests where the subject contains | remainder of the subject string. This is useful for tests where the subject contains | |||
multiple copies of the same substring. The allaftertext modifier requests the same | multiple copies of the same substring. The allaftertext modifier requests the same | |||
action for captured substrings as well as the main matched substring. In each case | action for captured substrings as well as the main matched substring. In each case | |||
the remainder is output on the following line with a plus character following the | the remainder is output on the following line with a plus character following the | |||
capture number. | capture number. | |||
The allusedtext modifier requests that all the text that was consulted during a | The allusedtext modifier requests that all the text that was consulted during a | |||
successful pattern match by the interpreter should be shown, for bot h full and par‐ | successful pattern match by the interpreter should be shown, for bot h full and par‐ | |||
tial matches. This feature is not supported for JIT matching, and if requested with | tial matches. This feature is not supported for JIT matching, and if requested with | |||
JIT it is ignored (with a warning message). Setting this modifier affects the output | JIT it is ignored (with a warning message). Setting this modifier affects the output | |||
if there is a lookbehind at the start of a match, or, for a complete match, a | if there is a lookbehind at the start of a match, or, for a complete match, a | |||
lookahead at the end, or if \K is used in the pattern. Characters that precede or | lookahead at the end, or if \K is used in the pattern. Characters that precede or | |||
follow the start and end of the actual match are indicated in the output by '<' or | follow the start and end of the actual match are indicated in the output by '<' or | |||
'>' characters underneath them. Here is an example: | '>' characters underneath them. Here is an example: | |||
re> /(?<=pqr)abc(?=xyz)/ | re> /(?<=pqr)abc(?=xyz)/ | |||
data> 123pqrabcxyz456\=allusedtext | data> 123pqrabcxyz456\=allusedtext | |||
0: pqrabcxyz | 0: pqrabcxyz | |||
<<< >>> | <<< >>> | |||
data> 123pqrabcxy\=ph,allusedtext | data> 123pqrabcxy\=ph,allusedtext | |||
Partial match: pqrabcxy | Partial match: pqrabcxy | |||
<<< | <<< | |||
The first, complete match shows that the matched string is "abc", with the preceding | The first, complete match shows that the matched string is "abc", with the preceding | |||
and following strings "pqr" and "xyz" having been consulted during the match | and following strings "pqr" and "xyz" having been consulted during the match | |||
(when processing the assertions). The partial match can indicate only the preceding | (when processing the assertions). The partial match can indicate only the preceding | |||
string. | string. | |||
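The marker line can be modelled from four offsets. The following is a hypothetical sketch, not pcre2test's own code: it assumes the consulted range encloses the reported match and that trailing blanks under the match are dropped, which reproduces both marker lines of the example above.

```c
#include <stddef.h>

/* Hypothetical model of the allusedtext display. Offsets index `subject`:
   [used_start, used_end) is the consulted text and [match_start, match_end)
   the reported match, with used_start <= match_start <= match_end <= used_end.
   `text` and `marks` must each hold used_end - used_start + 1 bytes. */
void show_used_text(const char *subject,
                    size_t used_start, size_t match_start,
                    size_t match_end, size_t used_end,
                    char *text, char *marks)
{
    size_t n = 0, last_mark = 0;
    for (size_t i = used_start; i < used_end; i++, n++) {
        text[n] = subject[i];
        if (i < match_start)
            marks[n] = '<';   /* consulted before the match */
        else if (i >= match_end)
            marks[n] = '>';   /* consulted after the match */
        else
            marks[n] = ' ';   /* part of the match itself */
        if (marks[n] != ' ')
            last_mark = n + 1;
    }
    text[n] = '\0';
    marks[last_mark] = '\0';  /* drop trailing blanks under the match */
}
```

For "123pqrabcxyz456" with offsets 3, 6, 9, 12 this gives "pqrabcxyz" over "<<<   >>>"; for the partial match (offsets 3, 6, 11, 11) it gives "pqrabcxy" over "<<<".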
The startchar modifier requests that the starting character for the match be indicated, | The startchar modifier requests that the starting character for the match be indicated, | |||
if it is different to the start of the matched string. The only time when | if it is different to the start of the matched string. The only time when | |||
this occurs is when \K has been processed as part of the match. In this situation, | this occurs is when \K has been processed as part of the match. In this situation, | |||
the output for the matched string is displayed from the starting character instead | the output for the matched string is displayed from the starting character instead | |||
of from the match point, with circumflex characters under the earlier characters. | of from the match point, with circumflex characters under the earlier characters. | |||
For example: | For example: | |||
re> /abc\Kxyz/ | re> /abc\Kxyz/ | |||
data> abcxyz\=startchar | data> abcxyz\=startchar | |||
0: abcxyz | 0: abcxyz | |||
^^^ | ^^^ | |||
Unlike allusedtext, the startchar modifier can be used with JIT. However, these | Unlike allusedtext, the startchar modifier can be used with JIT. However, these | |||
two modifiers are mutually exclusive. | two modifiers are mutually exclusive. | |||
Showing the value of all capture groups | Showing the value of all capture groups | |||
The allcaptures modifier requests that the values of all potential captured | The allcaptures modifier requests that the values of all potential captured | |||
parentheses be output after a match. By default, only those up to the highest one | parentheses be output after a match. By default, only those up to the highest one | |||
actually used in the match are output (corresponding to the return code from | actually used in the match are output (corresponding to the return code from | |||
pcre2_match()). Groups that did not take part in the match are output as "<unset>". | pcre2_match()). Groups that did not take part in the match are output as "<unset>". | |||
This modifier is not relevant for DFA matching (which does no capturing) and does | This modifier is not relevant for DFA matching (which does no capturing) and does | |||
not apply when replace is specified; it is ignored, with a warning message, if | not apply when replace is specified; it is ignored, with a warning message, if | |||
present. | present. | |||
Showing the entire ovector, for all outcomes | Showing the entire ovector, for all outcomes | |||
The allvector modifier requests that the entire ovector be shown, whatever the | The allvector modifier requests that the entire ovector be shown, whatever the | |||
outcome of the match. Compare allcaptures, which shows only up to the maximum number | outcome of the match. Compare allcaptures, which shows only up to the maximum number | |||
of capture groups for the pattern, and then only for a successful complete non-DFA | of capture groups for the pattern, and then only for a successful complete non-DFA | |||
match. This modifier, which acts after any match result, and also for DFA matching, | match. This modifier, which acts after any match result, and also for DFA matching, | |||
provides a means of checking that there are no unexpected modifications to ovector | provides a means of checking that there are no unexpected modifications to ovector | |||
fields. Before each match attempt, the ovector is filled with a special value, and | fields. Before each match attempt, the ovector is filled with a special value, and | |||
if this is found in both elements of a capturing pair, "<unchanged>" is output. | if this is found in both elements of a capturing pair, "<unchanged>" is output. | |||
After a successful match, this applies to all groups after the maximum capture group | After a successful match, this applies to all groups after the maximum capture group | |||
for the pattern. In other cases it applies to the entire ovector. After a partial | for the pattern. In other cases it applies to the entire ovector. After a partial | |||
match, the first two elements are the only ones that should be set. After a DFA | match, the first two elements are the only ones that should be set. After a DFA | |||
match, the amount of ovector that is used depends on the number of matches that | match, the amount of ovector that is used depends on the number of matches that | |||
were found. | were found. | |||
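The prefill-and-compare idea behind allvector can be sketched in C. This is a minimal illustration under stated assumptions: the sentinel (SIZE_MAX here) and the helper names are hypothetical, because the text above says only that "a special value" is used, and pcre2test's own code is not reproduced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel; pcre2test's real "special value" may differ. */
#define OVEC_UNSET_MARK SIZE_MAX

/* Fill all 2 * npairs elements with the sentinel before a match attempt. */
void ovector_prefill(size_t *ovector, size_t npairs)
{
  for (size_t i = 0; i < 2 * npairs; i++)
    ovector[i] = OVEC_UNSET_MARK;
}

/* A pair is "<unchanged>" only if BOTH of its elements still hold the
   sentinel after the match attempt. */
int ovector_pair_unchanged(const size_t *ovector, size_t pair)
{
  return ovector[2 * pair] == OVEC_UNSET_MARK &&
         ovector[2 * pair + 1] == OVEC_UNSET_MARK;
}

/* Count unchanged pairs from `first` to the end of the ovector. After a
   successful match, `first` would be one past the pattern's highest
   capture group; in other cases it would be 0 (the entire ovector). */
size_t ovector_count_unchanged(const size_t *ovector, size_t npairs,
                               size_t first)
{
  size_t count = 0;
  for (size_t pair = first; pair < npairs; pair++)
    if (ovector_pair_unchanged(ovector, pair))
      count++;
  return count;
}
```

For a pattern with two capture groups, a successful match would be checked from pair 3 onwards, while a failed or partial match would be checked from pair 0.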
Testing pattern callouts | Testing pattern callouts | |||
A callout function is supplied when pcre2test calls the library matching | A callout function is supplied when pcre2test calls the library matching | |||
functions, unless callout_none is specified. Its behaviour can be controlled by | functions, unless callout_none is specified. Its behaviour can be controlled by | |||
various modifiers listed above whose names begin with callout_. Details are given | various modifiers listed above whose names begin with callout_. Details are given | |||
in the section entitled "Callouts" below. Testing callouts from pcre2_substitute() | in the section entitled "Callouts" below. Testing callouts from pcre2_substitute() | |||
is described separately in "Testing the substitution function" below. | is described separately in "Testing the substitution function" below. | |||
Finding all matches in a string | Finding all matches in a string | |||
Searching for all possible matches within a subject can be requested by the global | Searching for all possible matches within a subject can be requested by the global | |||
or altglobal modifier. After finding a match, the matching function is called again | or altglobal modifier. After finding a match, the matching function is called again | |||
to search the remainder of the subject. The difference between global and altglobal | to search the remainder of the subject. The difference between global and altglobal | |||
is that the former uses the start_offset argument to pcre2_match() or | is that the former uses the start_offset argument to pcre2_match() or | |||
pcre2_dfa_match() to start searching at a new point within the entire string (which | pcre2_dfa_match() to start searching at a new point within the entire string (which | |||
is what Perl does), whereas the latter passes over a shortened subject. This makes | is what Perl does), whereas the latter passes over a shortened subject. This makes | |||
a difference to the matching process if the pattern begins with a lookbehind | a difference to the matching process if the pattern begins with a lookbehind | |||
assertion (including \b or \B). | assertion (including \b or \B). | |||
If an empty string is matched, the next match is done with the | If an empty string is matched, the next match is done with the | |||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for | PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for | |||
another, non-empty, match at the same point in the subject. If this match fails, | another, non-empty, match at the same point in the subject. If this match fails, | |||
the start offset is advanced, and the normal match is retried. This imitates the | the start offset is advanced, and the normal match is retried. This imitates the | |||
way Perl handles such cases when using the /g modifier or the split() function. | way Perl handles such cases when using the /g modifier or the split() function. | |||
Normally, the start offset is advanced by one character, but if the newline | Normally, the start offset is advanced by one character, but if the newline | |||
convention recognizes CRLF as a newline, and the current character is CR followed | convention recognizes CRLF as a newline, and the current character is CR followed | |||
by LF, an advance of two characters occurs. | by LF, an advance of two characters occurs. | |||
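The offset advance described above can be sketched as a plain C function. It is an illustration, not pcre2test's own code: it assumes an 8-bit subject in UTF-8 mode, so "one character" means a lead byte plus any continuation bytes, and the function name is hypothetical.

```c
#include <stddef.h>

/* Where the next unanchored attempt starts after an empty match whose
   anchored, non-empty retry at `offset` failed. Advance by two when CRLF
   is a newline and CR LF sits at `offset`; otherwise skip one UTF-8
   character (its lead byte and any 10xxxxxx continuation bytes). */
size_t advance_after_empty_match(const unsigned char *subject, size_t length,
                                 size_t offset, int crlf_is_newline)
{
  if (offset >= length)
    return length;                        /* nothing left to skip */

  if (crlf_is_newline && subject[offset] == '\r' &&
      offset + 1 < length && subject[offset + 1] == '\n')
    return offset + 2;

  offset++;                               /* lead byte of the character */
  while (offset < length && (subject[offset] & 0xC0) == 0x80)
    offset++;                             /* its continuation bytes */
  return offset;
}
```

With CRLF not recognized as a newline, the CR and LF are simply two separate characters and the advance is one.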
Testing substring extraction functions | Testing substring extraction functions | |||
The copy and get modifiers can be used to test the pcre2_substring_copy_xxx() and | The copy and get modifiers can be used to test the pcre2_substring_copy_xxx() and | |||
pcre2_substring_get_xxx() functions. They can be given more than once, and each | pcre2_substring_get_xxx() functions. They can be given more than once, and each | |||
can specify a capture group name or number, for example: | can specify a capture group name or number, for example: | |||
abcd\=copy=1,copy=3,get=G1 | abcd\=copy=1,copy=3,get=G1 | |||
If the #subject command is used to set default copy and/or get lists, these can be | If the #subject command is used to set default copy and/or get lists, these can be | |||
unset by specifying a negative number to cancel all numbered groups and an empty | unset by specifying a negative number to cancel all numbered groups and an empty | |||
name to cancel all named groups. | name to cancel all named groups. | |||
The getall modifier tests pcre2_substring_list_get(), which extracts all captured | The getall modifier tests pcre2_substring_list_get(), which extracts all captured | |||
substrings. | substrings. | |||
If the subject line is successfully matched, the substrings extracted by the | If the subject line is successfully matched, the substrings extracted by the | |||
convenience functions are output with C, G, or L after the string number instead | convenience functions are output with C, G, or L after the string number instead | |||
of a colon. This is in addition to the normal full list. The string length (that | of a colon. This is in addition to the normal full list. The string length (that | |||
is, the return from the extraction function) is given in parentheses after each | is, the return from the extraction function) is given in parentheses after each | |||
substring, followed by the name when the extraction was by name. | substring, followed by the name when the extraction was by name. | |||
Testing the substitution function | Testing the substitution function | |||
If the replace modifier is set, the pcre2_substitute() function is called instead | If the replace modifier is set, the pcre2_substitute() function is called instead | |||
of one of the matching functions (or after one call of pcre2_match() in the case of | of one of the matching functions (or after one call of pcre2_match() in the case of | |||
PCRE2_SUBSTITUTE_MATCHED). Note that replacement strings cannot contain commas, | PCRE2_SUBSTITUTE_MATCHED). Note that replacement strings cannot contain commas, | |||
because a comma signifies the end of a modifier. This is not thought to be an issue | because a comma signifies the end of a modifier. This is not thought to be an issue | |||
in a test program. | in a test program. | |||
Specifying a completely empty replacement string disables this modifier. However, | Specifying a completely empty replacement string disables this modifier. However, | |||
it is possible to specify an empty replacement by providing a buffer length, as | it is possible to specify an empty replacement by providing a buffer length, as | |||
described below, for an otherwise empty replacement. | described below, for an otherwise empty replacement. | |||
Unlike subject strings, pcre2test does not process replacement strings for escape | Unlike subject strings, pcre2test does not process replacement strings for escape | |||
sequences. In UTF mode, a replacement string is checked to see if it is a valid | sequences. In UTF mode, a replacement string is checked to see if it is a valid | |||
UTF-8 string. If so, it is correctly converted to a UTF string of the appropriate | UTF-8 string. If so, it is correctly converted to a UTF string of the appropriate | |||
code unit width. If it is not a valid UTF-8 string, the individual code units are | code unit width. If it is not a valid UTF-8 string, the individual code units are | |||
copied directly. This provides a means of passing an invalid UTF-8 string for | copied directly. This provides a means of passing an invalid UTF-8 string for | |||
testing purposes. | testing purposes. | |||
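The validity check mentioned above can be illustrated with a generic UTF-8 validator in C. This is a sketch written for this purpose, not pcre2test's own routine; it rejects truncated sequences, overlong encodings, surrogates, and values above U+10FFFF, which are the usual well-formedness rules. The conversion step to a wider code unit width is not shown.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, else 0. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len) {
    unsigned char c = buf[i];
    size_t extra;
    unsigned long cp;

    if (c < 0x80) { i++; continue; }                 /* ASCII */
    else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
    else return 0;                   /* stray continuation or 0xF8..0xFF */

    if (len - i <= extra) return 0;                  /* truncated */
    for (size_t k = 1; k <= extra; k++) {
      if ((buf[i + k] & 0xC0) != 0x80) return 0;     /* not a continuation */
      cp = (cp << 6) | (buf[i + k] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates, and out-of-range values. */
    if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
        (extra == 3 && cp < 0x10000)) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp > 0x10FFFF) return 0;
    i += extra + 1;
  }
  return 1;
}
```

A replacement that fails this kind of check would, per the text above, be copied code unit by code unit rather than converted.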
The following modifiers set options (in addition to the normal match options) for | The following modifiers set options (in addition to the normal match options) for | |||
pcre2_substitute(): | pcre2_substitute(): | |||
global PCRE2_SUBSTITUTE_GLOBAL | global PCRE2_SUBSTITUTE_GLOBAL | |||
substitute_extended PCRE2_SUBSTITUTE_EXTENDED | substitute_extended PCRE2_SUBSTITUTE_EXTENDED | |||
substitute_literal PCRE2_SUBSTITUTE_LITERAL | substitute_literal PCRE2_SUBSTITUTE_LITERAL | |||
substitute_matched PCRE2_SUBSTITUTE_MATCHED | substitute_matched PCRE2_SUBSTITUTE_MATCHED | |||
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | |||
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | |||
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET | substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET | |||
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY | substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY | |||
See the pcre2api documentation for details of these options. | See the pcre2api documentation for details of these options. | |||
After a successful substitution, the modified string is output, preceded by the | After a successful substitution, the modified string is output, preceded by the | |||
number of replacements. This may be zero if there were no matches. Here is a simple | number of replacements. This may be zero if there were no matches. Here is a simple | |||
example of a substitution test: | example of a substitution test: | |||
/abc/replace=xxx | /abc/replace=xxx | |||
=abc=abc= | =abc=abc= | |||
1: =xxx=abc= | 1: =xxx=abc= | |||
=abc=abc=\=global | =abc=abc=\=global | |||
2: =xxx=xxx= | 2: =xxx=xxx= | |||
Subject and replacement strings should be kept relatively short (fewer than 256 | Subject and replacement strings should be kept relatively short (fewer than 256 | |||
characters) for substitution tests, as fixed-size buffers are used. To make it easy | characters) for substitution tests, as fixed-size buffers are used. To make it easy | |||
to test for buffer overflow, if the replacement string starts with a number in | to test for buffer overflow, if the replacement string starts with a number in | |||
square brackets, that number is passed to pcre2_substitute() as the size of the | square brackets, that number is passed to pcre2_substitute() as the size of the | |||
output buffer, with the replacement string starting at the next character. Here is | output buffer, with the replacement string starting at the next character. Here is | |||
an example that tests the edge case: | an example that tests the edge case: | |||
/abc/ | /abc/ | |||
123abc123\=replace=[10]XYZ | 123abc123\=replace=[10]XYZ | |||
1: 123XYZ123 | 1: 123XYZ123 | |||
123abc123\=replace=[9]XYZ | 123abc123\=replace=[9]XYZ | |||
Failed: error -47: no more memory | Failed: error -47: no more memory | |||
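The "[n]" prefix convention can be sketched as a small C parser. The function name and the -1 "no prefix" return value are hypothetical choices for this illustration; pcre2test's own parsing code is not reproduced, and the sketch does no overflow checking on very large numbers.

```c
#include <ctype.h>
#include <stddef.h>

/* If `text` starts with "[digits]", return the number and point
   *replacement just past the ']'. Otherwise return -1 and leave
   *replacement pointing at the whole text. No overflow checking. */
long parse_buffer_prefix(const char *text, const char **replacement)
{
  const char *p = text;
  long size = 0;

  *replacement = text;                   /* default: whole string */
  if (*p != '[' || !isdigit((unsigned char)p[1]))
    return -1;
  for (p++; isdigit((unsigned char)*p); p++)
    size = size * 10 + (*p - '0');
  if (*p != ']')
    return -1;                           /* not a well-formed prefix */
  *replacement = p + 1;
  return size;
}
```

A text such as "[9]" yields a buffer size with an empty replacement, which is how an empty replacement can be given without disabling the modifier.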
The default action of pcre2_substitute() is to return PCRE2_ERROR_NOMEMORY when the | The default action of pcre2_substitute() is to return PCRE2_ERROR_NOMEMORY when the | |||
output buffer is too small. However, if the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option | output buffer is too small. However, if the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option | |||
is set (by using the substitute_overflow_length modifier), pcre2_substitute() | is set (by using the substitute_overflow_length modifier), pcre2_substitute() | |||
continues to go through the motions of matching and substituting (but not doing | continues to go through the motions of matching and substituting (but not doing | |||
any callouts), in order to compute the size of buffer that is required. When this | any callouts), in order to compute the size of buffer that is required. When this | |||
happens, pcre2test shows the required buffer length (which includes space for the | happens, pcre2test shows the required buffer length (which includes space for the | |||
trailing zero) as part of the error message. For example: | trailing zero) as part of the error message. For example: | |||
/abc/substitute_overflow_length | /abc/substitute_overflow_length | |||
123abc123\=replace=[9]XYZ | 123abc123\=replace=[9]XYZ | |||
Failed: error -47: no more memory: 10 code units are needed | Failed: error -47: no more memory: 10 code units are needed | |||
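The arithmetic behind "10 code units are needed" can be checked with a small C sketch: replacing "abc" (3 units) in the 9-unit subject "123abc123" with "XYZ" gives 9 - 3 + 3 = 9 units plus one for the trailing zero. The function below is a hypothetical stand-in for a single, already-located substitution in an 8-bit setting; it is not pcre2_substitute(), but it reports the needed size on overflow in the same spirit as PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.

```c
#include <stddef.h>
#include <string.h>

/* Replace subject[start..end) with repl, writing into out[0..outsize).
   Always stores the required size (including the terminating zero) in
   *needed. Returns 0 on success, or -1 if outsize is too small. */
int substitute_once(const char *subject, size_t start, size_t end,
                    const char *repl, char *out, size_t outsize,
                    size_t *needed)
{
  size_t slen = strlen(subject);
  size_t rlen = strlen(repl);
  size_t total = slen - (end - start) + rlen + 1;

  *needed = total;
  if (total > outsize)
    return -1;                           /* buffer too small */
  memcpy(out, subject, start);                       /* text before match */
  memcpy(out + start, repl, rlen);                   /* replacement */
  memcpy(out + start + rlen, subject + end, slen - end + 1); /* rest + NUL */
  return 0;
}
```

With an output size of 10 the call succeeds; with 9 it fails and still reports 10, matching the two test lines shown above.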
A replacement string is ignored with POSIX and DFA matching. Specifying partial | A replacement string is ignored with POSIX and DFA matching. Specifying partial | |||
matching provokes an error return ("bad option value") from pcre2_substitute(). | matching provokes an error return ("bad option value") from pcre2_substitute(). | |||
Testing substitute callouts | Testing substitute callouts | |||
If the substitute_callout modifier is set, a substitution callout function is set | If the substitute_callout modifier is set, a substitution callout function is set | |||
up. The null_context modifier must not be set, because the address of the callout | up. The null_context modifier must not be set, because the address of the callout | |||
function is passed in a match context. When the callout function is called (after | function is passed in a match context. When the callout function is called (after | |||
each substitution), details of the input and output strings are output. For | each substitution), details of the input and output strings are output. For | |||
example: | example: | |||
/abc/g,replace=<$0>,substitute_callout | /abc/g,replace=<$0>,substitute_callout | |||
abcdefabcpqr | abcdefabcpqr | |||
1(1) Old 0 3 "abc" New 0 5 "<abc>" | 1(1) Old 0 3 "abc" New 0 5 "<abc>" | |||
2(1) Old 6 9 "abc" New 8 13 "<abc>" | 2(1) Old 6 9 "abc" New 8 13 "<abc>" | |||
2: <abc>def<abc>pqr | 2: <abc>def<abc>pqr | |||
The first number on each callout line is the count of matches. The parenthesized | The first number on each callout line is the count of matches. The parenthesized | |||
number is the number of pairs that are set in the ovector (that is, one more than | number is the number of pairs that are set in the ovector (that is, one more than | |||
the number of capturing groups that were set). Next come the offsets of the old | the number of capturing groups that were set). Next come the offsets of the old | |||
substring and its contents, followed by the same information for the replacement. | substring and its contents, followed by the same information for the replacement. | |||
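The "New" offsets in the example follow from simple bookkeeping: each accepted replacement shifts later text by the difference between the replacement length and the matched length. The C sketch below models that arithmetic; the struct and function names are hypothetical, and the `accepted` flag is an assumption covering rejected replacements (as with substitute_skip), whose example output shows later offsets left unshifted.

```c
#include <stddef.h>

/* Running difference between output and input lengths caused by the
   replacements accepted so far. */
typedef struct {
  long delta;
} subst_tracker;

/* Compute the "New" start/end offsets for one callout line. */
void callout_new_offsets(subst_tracker *t, size_t old_start, size_t old_end,
                         size_t repl_len, int accepted,
                         size_t *new_start, size_t *new_end)
{
  *new_start = (size_t)((long)old_start + t->delta);
  *new_end = *new_start + repl_len;
  /* Only an accepted replacement changes where later text lands. */
  if (accepted)
    t->delta += (long)repl_len - (long)(old_end - old_start);
}
```

Feeding in the two matches of "abcdefabcpqr" reproduces "Old 0 3 ... New 0 5" and "Old 6 9 ... New 8 13".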
By default, the substitution callout function returns zero, which accepts the | By default, the substitution callout function returns zero, which accepts the | |||
replacement and causes matching to continue if /g was used. Two further modifiers | replacement and causes matching to continue if /g was used. Two further modifiers | |||
can be used to test other return values. If substitute_skip is set to a value | can be used to test other return values. If substitute_skip is set to a value | |||
greater than zero, the callout function returns +1 for the match of that number, | greater than zero, the callout function returns +1 for the match of that number, | |||
and similarly substitute_stop returns -1. These cause the replacement to be | and similarly substitute_stop returns -1. These cause the replacement to be | |||
rejected, and -1 causes no further matching to take place. If either of them is | rejected, and -1 causes no further matching to take place. If either of them is | |||
set, substitute_callout is assumed. For example: | set, substitute_callout is assumed. For example: | |||
/abc/g,replace=<$0>,substitute_skip=1 | /abc/g,replace=<$0>,substitute_skip=1 | |||
abcdefabcpqr | abcdefabcpqr | |||
1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED" | 1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED" | |||
2(1) Old 6 9 "abc" New 6 11 "<abc>" | 2(1) Old 6 9 "abc" New 6 11 "<abc>" | |||
2: abcdef<abc>pqr | 2: abcdef<abc>pqr | |||
abcdefabcpqr\=substitute_stop=1 | abcdefabcpqr\=substitute_stop=1 | |||
1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED" | 1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED" | |||
1: abcdefabcpqr | 1: abcdefabcpqr | |||
If both are set for the same number, stop takes precedence. Only a single skip or | If both are set for the same number, stop takes precedence. Only a single skip or | |||
stop is supported, which is sufficient for testing that the feature works. | stop is supported, which is sufficient for testing that the feature works. | |||
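The decision rules for substitute_skip and substitute_stop can be summarized as a tiny C function. This is an illustrative sketch, not pcre2test's callout: it assumes 0 means "not set" for either modifier, and the function name is hypothetical.

```c
/* Return value the test callout would give for match number n:
   -1 rejects the replacement and stops matching, +1 rejects only this
   replacement, 0 accepts it. Stop takes precedence over skip. */
int substitute_callout_result(unsigned n, unsigned skip, unsigned stop)
{
  if (stop != 0 && n == stop)
    return -1;
  if (skip != 0 && n == skip)
    return 1;
  return 0;
}
```

Checking stop first is what gives it precedence when both modifiers name the same match.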
Setting the JIT stack size | Setting the JIT stack size | |||
The jitstack modifier provides a way of setting the maximum stack size that is used | The jitstack modifier provides a way of setting the maximum stack size that is used | |||
by the just-in-time optimization code. It is ignored if JIT optimization is not | by the just-in-time optimization code. It is ignored if JIT optimization is not | |||
being used. The value is a number of kibibytes (units of 1024 bytes). Setting zero | being used. The value is a number of kibibytes (units of 1024 bytes). Setting zero | |||
reverts to the default of 32KiB. Providing a stack that is larger than the default | reverts to the default of 32KiB. Providing a stack that is larger than the default | |||
is necessary only for very complicated patterns. If jitstack is set non-zero on a | is necessary only for very complicated patterns. If jitstack is set non-zero on a | |||
subject line it overrides any value that was set on the pattern. | subject line it overrides any value that was set on the pattern. | |||
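The precedence and unit rules above can be captured in a few lines of C. This sketch only computes the effective size in bytes from the pattern-line and subject-line jitstack values; it assumes a subject value of 0 means "no override", and the function name is hypothetical.

```c
#include <stddef.h>

/* Effective JIT stack size in bytes. A non-zero subject value overrides
   the pattern value; zero overall means the 32KiB default. */
size_t jit_stack_bytes(unsigned pattern_kib, unsigned subject_kib)
{
  unsigned kib = (subject_kib != 0) ? subject_kib : pattern_kib;
  if (kib == 0)
    kib = 32;                            /* default of 32KiB */
  return (size_t)kib * 1024;             /* kibibytes to bytes */
}
```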
Setting heap, match, and depth limits | Setting heap, match, and depth limits | |||
The heap_limit, match_limit, and depth_limit modifiers set the appropriate limits | The heap_limit, match_limit, and depth_limit modifiers set the appropriate limits | |||
in the match context. These values are ignored when the find_limits or | in the match context. These values are ignored when the find_limits or | |||
find_limits_noheap modifier is specified. | find_limits_noheap modifier is specified. | |||
Finding minimum limits | Finding minimum limits | |||
If the find_limits modifier is present on a subject line, pcre2test calls the | If the find_limits modifier is present on a subject line, pcre2test calls the | |||
relevant matching function several times, setting different values in the match | relevant matching function several times, setting different values in the match | |||
context via pcre2_set_heap_limit(), pcre2_set_match_limit(), or | context via pcre2_set_heap_limit(), pcre2_set_match_limit(), or | |||
pcre2_set_depth_limit() until it finds the smallest value for each parameter that | pcre2_set_depth_limit() until it finds the smallest value for each parameter that | |||
allows the match to complete without a "limit exceeded" error. The match itself | allows the match to complete without a "limit exceeded" error. The match itself | |||
may succeed or fail. An alternative modifier, find_limits_noheap, omits the heap | may succeed or fail. An alternative modifier, find_limits_noheap, omits the heap | |||
limit. This is used in the standard tests, because the minimum heap limit varies | limit. This is used in the standard tests, because the minimum heap limit varies | |||
between systems. If JIT is being used, only the match limit is relevant, and the | between systems. If JIT is being used, only the match limit is relevant, and the | |||
other two are automatically omitted. | other two are automatically omitted. | |||
When using this modifier, the pattern should not contain any limit settings such | When using this modifier, the pattern should not contain any limit settings such | |||
as (*LIMIT_MATCH=...) within it. If such a setting is present and is lower than | as (*LIMIT_MATCH=...) within it. If such a setting is present and is lower than | |||
the minimum matching value, the minimum value cannot be found because | the minimum matching value, the minimum value cannot be found because | |||
pcre2_set_match_limit() etc. are only able to reduce the value of an in-pattern | pcre2_set_match_limit() etc. are only able to reduce the value of an in-pattern | |||
limit; they cannot increase it. | limit; they cannot increase it. | |||
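Finding "the smallest value for each parameter" is a search over a monotone yes/no question: if a limit lets the match complete, any larger limit does too. The C sketch below uses doubling followed by bisection; it is a generic technique, not necessarily the exact search pcre2test performs. The `limit_probe` callback is hypothetical and stands in for one matching call made with the limit set; `threshold_probe` is a toy probe included only so the sketch can be exercised.

```c
/* One trial: return non-zero if the match completes with `limit`. */
typedef int (*limit_probe)(unsigned long limit, void *ctx);

unsigned long find_min_limit(limit_probe completes, void *ctx)
{
  unsigned long lo = 0;   /* largest limit known to fail (0 assumed to) */
  unsigned long hi = 1;   /* current candidate */

  /* Double the candidate until one works. Assumes some limit does. */
  while (!completes(hi, ctx)) {
    lo = hi;
    hi *= 2;
  }
  /* Bisect (lo, hi]: lo fails (or is 0) and hi works. */
  while (hi - lo > 1) {
    unsigned long mid = lo + (hi - lo) / 2;
    if (completes(mid, ctx))
      hi = mid;
    else
      lo = mid;
  }
  return hi;
}

/* Toy probe: pretends the match needs at least *ctx units of limit. */
int threshold_probe(unsigned long limit, void *ctx)
{
  return limit >= *(const unsigned long *)ctx;
}
```

An in-pattern limit lower than the true minimum would make every probe fail, so the doubling loop would never terminate; that is the situation the paragraph above warns against.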
For non-DFA matching, the minimum depth_limit number is a measure of how much | For non-DFA matching, the minimum depth_limit number is a measure of how much | |||
nested backtracking happens (that is, how deeply the pattern's tree is searched). | nested backtracking happens (that is, how deeply the pattern's tree is searched). | |||
In the case of DFA matching, depth_limit controls the depth of recursive calls of | In the case of DFA matching, depth_limit controls the depth of recursive calls of | |||
the internal function that is used for handling pattern recursion, lookaround | the internal function that is used for handling pattern recursion, lookaround | |||
assertions, and atomic groups. | assertions, and atomic groups. | |||
For non-DFA matching, the match_limit number is a measure of the amount of | For non-DFA matching, the match_limit number is a measure of the amount of | |||
backtracking that takes place, and learning the minimum value can be instructive. | backtracking that takes place, and learning the minimum value can be instructive. | |||
For most simple matches, the number is quite small, but for patterns with very | For most simple matches, the number is quite small, but for patterns with very | |||
large numbers of matching possibilities, it can become large very quickly with | large numbers of matching possibilities, it can become large very quickly with | |||
increasing length of subject string. In the case of DFA matching, match_limit | increasing length of subject string. In the case of DFA matching, match_limit | |||
controls the total number of calls, both recursive and non-recursive, to the | controls the total number of calls, both recursive and non-recursive, to the | |||
internal matching function, thus controlling the overall amount of computing | internal matching function, thus controlling the overall amount of computing | |||
resource that is used. | resource that is used. | |||
For both kinds of matching, the heap_limit number, which is in kibibytes (units of | For both kinds of matching, the heap_limit number, which is in kibibytes (units of | |||
1024 bytes), limits the amount of heap memory used for matching. | 1024 bytes), limits the amount of heap memory used for matching. | |||
Showing MARK names | Showing MARK names | |||
The mark modifier causes the names from backtracking control verbs that are | The mark modifier causes the names from backtracking control verbs that are | |||
returned from calls to pcre2_match() to be displayed. If a mark is returned for a | returned from calls to pcre2_match() to be displayed. If a mark is returned for a | |||
match, non-match, or partial match, pcre2test shows it. For a match, it is on a | match, non-match, or partial match, pcre2test shows it. For a match, it is on a | |||
line by itself, tagged with "MK:". Otherwise, it is added to the non-match message. | line by itself, tagged with "MK:". Otherwise, it is added to the non-match message. | |||
Showing memory usage | Showing memory usage | |||
The memory modifier causes pcre2test to log the sizes of all heap memory allocation | The memory modifier causes pcre2test to log the sizes of all heap memory allocation | |||
and freeing calls that occur during a call to pcre2_match() or pcre2_dfa_match(). | and freeing calls that occur during a call to pcre2_match() or pcre2_dfa_match(). | |||
In the latter case, heap memory is used only when a match requires more internal | In the latter case, heap memory is used only when a match requires more internal | |||
workspace than the default allocation on the stack, so in many cases there will be | workspace than the default allocation on the stack, so in many cases there will be | |||
no output. No heap memory is allocated during matching with JIT. For this modifier | no output. No heap memory is allocated during matching with JIT. For this modifier | |||
to work, the null_context modifier must not be set on both the pattern and the | to work, the null_context modifier must not be set on both the pattern and the | |||
subject, though it can be set on one or the other. | subject, though it can be set on one or the other. | |||
Showing the heap frame overall vector size | Showing the heap frame overall vector size | |||
The heapframes_size modifier is relevant for matches using pcre2_match() without | The heapframes_size modifier is relevant for matches using pcre2_match() without | |||
JIT. After a match has run (whether successful or not) the size, in bytes, of the | JIT. After a match has run (whether successful or not) the size, in bytes, of the | |||
allocated heap frames vector that is left attached to the match data block is | allocated heap frames vector that is left attached to the match data block is | |||
shown. If the matching action involved several calls to pcre2_match() (for | shown. If the matching action involved several calls to pcre2_match() (for | |||
example, global matching or for timing) only the final value is shown. | example, global matching or for timing) only the final value is shown. | |||
This modifier is ignored, with a warning, for POSIX or DFA matching. JIT matching | This modifier is ignored, with a warning, for POSIX or DFA matching. JIT matching | |||
does not use the heap frames vector, so the size is always zero, unless there was a | does not use the heap frames vector, so the size is always zero, unless there was a | |||
previous non-JIT match. Note that specifying a size of zero for the output vector | previous non-JIT match. Note that specifying a size of zero for the output vector | |||
(see below) causes pcre2test to free its match data block (and associated heap | (see below) causes pcre2test to free its match data block (and associated heap | |||
frames vector) and allocate a new one. | frames vector) and allocate a new one. | |||
Setting a starting offset | Setting a starting offset | |||
The offset modifier sets an offset in the subject string at which matching starts. | The offset modifier sets an offset in the subject string at which matching starts. | |||
Its value is a number of code units, not characters. | Its value is a number of code units, not characters. | |||
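Because the offset is in code units, a non-ASCII UTF-8 subject needs a conversion when you think in characters. The C sketch below maps a character index to the corresponding code-unit offset for an 8-bit UTF-8 subject. It assumes the subject is valid UTF-8, and the function name is hypothetical.

```c
#include <stddef.h>

/* Return the code-unit (byte) offset at which character number `nchars`
   begins in a valid UTF-8 string s[0..len). */
size_t utf8_char_to_unit_offset(const unsigned char *s, size_t len,
                                size_t nchars)
{
  size_t off = 0;
  while (nchars > 0 && off < len) {
    off++;                                       /* lead byte */
    while (off < len && (s[off] & 0xC0) == 0x80)
      off++;                                     /* continuation bytes */
    nchars--;
  }
  return off;
}
```

For example, to start matching at the third character of the 5-byte UTF-8 string "été", the offset would be 3, not 2.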
Setting an offset limit | Setting an offset limit | |||
The offset_limit modifier sets a limit for unanchored matches. If a match cannot be | The offset_limit modifier sets a limit for unanchored matches. If a match cannot be | |||
found starting at or before this offset in the subject, a "no match" return is | found starting at or before this offset in the subject, a "no match" return is | |||
given. The data value is a number of code units, not characters. When this modifier | given. The data value is a number of code units, not characters. When this modifier | |||
is used, the use_offset_limit modifier must have been set for the pattern; if not, | is used, the use_offset_limit modifier must have been set for the pattern; if not, | |||
an error is generated. | an error is generated. | |||
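The effect of offset_limit on an unanchored search can be shown with a toy C model that substitutes a literal substring search for a regex match. This is only an analogy under that assumption, and the function name is hypothetical: candidate start positions beyond the limit are never tried, so a later occurrence still yields "no match".

```c
#include <stddef.h>
#include <string.h>

/* Try start positions from `start` up to and including `limit`.
   Return the first position where `lit` occurs, or -1 for no match. */
long find_with_offset_limit(const char *subject, size_t len,
                            const char *lit, size_t start, size_t limit)
{
  size_t n = strlen(lit);
  for (size_t i = start; i <= limit && i + n <= len; i++)
    if (memcmp(subject + i, lit, n) == 0)
      return (long)i;
  return -1;
}
```

In "xxabcxxabc", searching for "abc" from offset 3 finds the second occurrence at 7 with a limit of 7, but reports no match with a limit of 6.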
Setting the size of the output vector | Setting the size of the output vector | |||
The ovector modifier applies only to the subject line in which it appears, though | The ovector modifier applies only to the subject line in which it appears, though | |||
of course it can also be used to set a default in a #subject command. It specifies | of course it can also be used to set a default in a #subject command. It specifies | |||
the number of pairs of offsets that are available for storing matching information. | the number of pairs of offsets that are available for storing matching information. | |||
The default is 15. | The default is 15. | |||
A value of zero is useful when testing the POSIX API because it causes regexec() to | A value of zero is useful when testing the POSIX API because it causes regexec() to | |||
be called with a NULL capture vector. When not testing the POSIX API, a value of | be called with a NULL capture vector. When not testing the POSIX API, a value of | |||
zero is used to cause pcre2_match_data_create_from_pattern() to be called, in order | zero is used to cause pcre2_match_data_create_from_pattern() to be called, in order | |||
to create a new match block of exactly the right size for the pattern. (It is not | to create a new match block of exactly the right size for the pattern. (It is not | |||
possible to create a match block with a zero-length ovector; there is always at | possible to create a match block with a zero-length ovector; there is always at | |||
least one pair of offsets.) The old match data block is freed. | least one pair of offsets.) The old match data block is freed. | |||
Passing the subject as zero-terminated | Passing the subject as zero-terminated | |||
By default, the subject string is passed to a native API matching function with its | By default, the subject string is passed to a native API matching function with its | |||
correct length. In order to test the facility for passing a zero-terminated string, | correct length. In order to test the facility for passing a zero-terminated string, | |||
the zero_terminate modifier is provided. It causes the length to be passed as | the zero_terminate modifier is provided. It causes the length to be passed as | |||
PCRE2_ZERO_TERMINATED. When matching via the POSIX interface, this modifier is | PCRE2_ZERO_TERMINATED. When matching via the POSIX interface, this modifier is | |||
ignored, with a warning. | ignored, with a warning. | |||
When testing pcre2_substitute(), this modifier also has the effect of passing the | When testing pcre2_substitute(), this modifier also has the effect of passing the | |||
replacement string as zero-terminated. | replacement string as zero-terminated. | |||
Passing a NULL context, subject, or replacement | Passing a NULL context, subject, or replacement | |||
Normally, pcre2test passes a context block to pcre2_match(), pcre2_dfa_match(), | Normally, pcre2test passes a context block to pcre2_match(), pcre2_dfa_match(), | |||
pcre2_jit_match() or pcre2_substitute(). If the null_context modifier is set, | pcre2_jit_match() or pcre2_substitute(). If the null_context modifier is set, | |||
however, NULL is passed. This is for testing that the matching and substitution | however, NULL is passed. This is for testing that the matching and substitution | |||
functions behave correctly in this case (they use default values). This modifier | functions behave correctly in this case (they use default values). This modifier | |||
cannot be used with the find_limits, find_limits_noheap, or substitute_callout | cannot be used with the find_limits, find_limits_noheap, or substitute_callout | |||
modifiers. | modifiers. | |||
Similarly, for testing purposes, if the null_subject or null_replacement modifier | Similarly, for testing purposes, if the null_subject or null_replacement modifier | |||
is set, the subject or replacement string pointers are passed as NULL, | is set, the subject or replacement string pointers are passed as NULL, | |||
respectively, to the relevant functions. | respectively, to the relevant functions. | |||
THE ALTERNATIVE MATCHING FUNCTION | THE ALTERNATIVE MATCHING FUNCTION | |||
By default, pcre2test uses the standard PCRE2 matching function, pcre2_match(), to | By default, pcre2test uses the standard PCRE2 matching function, pcre2_match(), to | |||
match each subject line. PCRE2 also supports an alternative matching function, | match each subject line. PCRE2 also supports an alternative matching function, | |||
pcre2_dfa_match(), which operates in a different way, and has some restrictions. | pcre2_dfa_match(), which operates in a different way, and has some restrictions. | |||
The differences between the two functions are described in the pcre2matching documentation. | The differences between the two functions are described in the pcre2matching documentation. | |||
If the dfa modifier is set, the alternative matching function is used. This function | If the dfa modifier is set, the alternative matching function is used. This function | |||
finds all possible matches at a given point in the subject. If, however, the | finds all possible matches at a given point in the subject. If, however, the | |||
dfa_shortest modifier is set, processing stops after the first match is found. This | dfa_shortest modifier is set, processing stops after the first match is found. This | |||
is always the shortest possible match. | is always the shortest possible match. | |||
DEFAULT OUTPUT FROM pcre2test | DEFAULT OUTPUT FROM pcre2test | |||
This section describes the output when the normal matching function, pcre2_match(), | This section describes the output when the normal matching function, pcre2_match(), | |||
is being used. | is being used. | |||
When a match succeeds, pcre2test outputs the list of captured substrings, starting | When a match succeeds, pcre2test outputs the list of captured substrings, starting | |||
with number 0 for the string that matched the whole pattern. Otherwise, it outputs | with number 0 for the string that matched the whole pattern. Otherwise, it outputs | |||
"No match" when the return is PCRE2_ERROR_NOMATCH, or "Partial match:" followed by | "No match" when the return is PCRE2_ERROR_NOMATCH, or "Partial match:" followed by | |||
the partially matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that | the partially matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that | |||
this is the entire substring that was inspected during the partial match; it may | this is the entire substring that was inspected during the partial match; it may | |||
include characters before the actual match start if a lookbehind assertion, \K, \b, | include characters before the actual match start if a lookbehind assertion, \K, \b, | |||
or \B was involved.) | or \B was involved.) | |||
For any other return, pcre2test outputs the PCRE2 negative error number and a short | For any other return, pcre2test outputs the PCRE2 negative error number and a short | |||
descriptive phrase. If the error is a failed UTF string check, the code unit offset | descriptive phrase. If the error is a failed UTF string check, the code unit offset | |||
of the start of the failing character is also output. Here is an example of an interactive pcre2test run. | of the start of the failing character is also output. Here is an example of an interactive pcre2test run. | |||
$ pcre2test | $ pcre2test | |||
PCRE2 version 10.22 2016-07-29 | PCRE2 version 10.22 2016-07-29 | |||
re> /^abc(\d+)/ | re> /^abc(\d+)/ | |||
data> abc123 | data> abc123 | |||
0: abc123 | 0: abc123 | |||
1: 123 | 1: 123 | |||
data> xyz | data> xyz | |||
No match | No match | |||
Unset capturing substrings that are not followed by one that is set are not shown | Unset capturing substrings that are not followed by one that is set are not shown | |||
by pcre2test unless the allcaptures modifier is specified. In the following example, | by pcre2test unless the allcaptures modifier is specified. In the following example, | |||
there are two capturing substrings, but when the first data line is matched, | there are two capturing substrings, but when the first data line is matched, | |||
the second, unset substring is not shown. An "internal" unset substring is shown as | the second, unset substring is not shown. An "internal" unset substring is shown as | |||
"<unset>", as for the second data line. | "<unset>", as for the second data line. | |||
re> /(a)|(b)/ | re> /(a)|(b)/ | |||
data> a | data> a | |||
0: a | 0: a | |||
1: a | 1: a | |||
data> b | data> b | |||
0: b | 0: b | |||
1: <unset> | 1: <unset> | |||
2: b | 2: b | |||
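The rule for which groups are listed can be sketched in C. This is an illustrative sketch, not code taken from pcre2test: the function format_captures(), the SKETCH_UNSET marker (a stand-in for PCRE2_UNSET) and the two-column index format are assumptions. It takes an ovector of (start, end) offset pairs, drops trailing unset groups unless allcaptures is requested, and prints an internal unset group as "<unset>".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker for an unset group (PCRE2 itself uses PCRE2_UNSET). */
#define SKETCH_UNSET ((size_t)-1)

/* ovector holds count (start, end) pairs for groups 0..count-1. Trailing
   unset groups are omitted unless allcaptures is nonzero; an unset group
   that is followed by a set one is shown as "<unset>". Writes one line per
   group into out and returns the number of lines written. */
size_t format_captures(const char *subject, const size_t *ovector,
                       size_t count, int allcaptures,
                       char *out, size_t outlen)
{
    assert(count > 0 && outlen > 0);
    size_t last = count;  /* one past the last group to display */
    if (!allcaptures)
        while (last > 1 && ovector[2 * (last - 1)] == SKETCH_UNSET)
            last--;

    size_t used = 0, lines = 0;
    out[0] = '\0';
    for (size_t i = 0; i < last; i++) {
        size_t start = ovector[2 * i], end = ovector[2 * i + 1];
        int n = (start == SKETCH_UNSET)
            ? snprintf(out + used, outlen - used, "%2zu: <unset>\n", i)
            : snprintf(out + used, outlen - used, "%2zu: %.*s\n", i,
                       (int)(end - start), subject + start);
        if (n < 0 || (size_t)n >= outlen - used) {
            out[used] = '\0';  /* buffer too small: drop the partial line */
            break;
        }
        used += (size_t)n;
        lines++;
    }
    return lines;
}
```

With the ovector {0,1, unset,unset, 0,1} from the second data line above, it yields the lines "0: b", "1: <unset>" and "2: b".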
If the strings contain any non-printing characters, they are output as \xhh escapes | If the strings contain any non-printing characters, they are output as \xhh escapes | |||
if the value is less than 256 and UTF mode is not set. Otherwise the y are output as | if the value is less than 256 and UTF mode is not set. Otherwise the y are output as | |||
\x{hh...} escapes. See below for the definition of non-printing characters. If the | \x{hh...} escapes. See below for the definition of non-printing characters. If the | |||
aftertext modifier is set, the output for substring 0 is followed by the rest of | aftertext modifier is set, the output for substring 0 is followed by the rest of | |||
the subject string, identified by "0+" like this: | the subject string, identified by "0+" like this: | |||
re> /cat/aftertext | re> /cat/aftertext | |||
data> cataract | data> cataract | |||
0: cat | 0: cat | |||
0+ aract | 0+ aract | |||
If global matching is requested, the results of successive matching attempts are | If global matching is requested, the results of successive matching attempts are | |||
output in sequence, like this: | output in sequence, like this: | |||
re> /\Bi(\w\w)/g | re> /\Bi(\w\w)/g | |||
data> Mississippi | data> Mississippi | |||
0: iss | 0: iss | |||
1: ss | 1: ss | |||
0: iss | 0: iss | |||
1: ss | 1: ss | |||
0: ipp | 0: ipp | |||
1: pp | 1: pp | |||
"No match" is output only if the first match attempt fails. Here is an example of a | "No match" is output only if the first match attempt fails. Here is an example of a | |||
failure message (the offset 4 that is specified by the offset modifier is past the | failure message (the offset 4 that is specified by the offset modifier is past the | |||
end of the subject string): | end of the subject string): | |||
re> /xyz/ | re> /xyz/ | |||
data> xyz\=offset=4 | data> xyz\=offset=4 | |||
Error -24 (bad offset value) | Error -24 (bad offset value) | |||
Note that whereas patterns can be continued over several lines (a plain ">" prompt | Note that whereas patterns can be continued over several lines (a plain ">" prompt | |||
is used for continuations), subject lines may not. However, newlines can be included | is used for continuations), subject lines may not. However, newlines can be included | |||
in a subject by means of the \n escape (or \r, \r\n, etc., depending on the newline | in a subject by means of the \n escape (or \r, \r\n, etc., depending on the newline | |||
sequence setting). | sequence setting). | |||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION | OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION | |||
When the alternative matching function, pcre2_dfa_match(), is used, the output consists | When the alternative matching function, pcre2_dfa_match(), is used, the output consists | |||
of a list of all the matches that start at the first point in the subject | of a list of all the matches that start at the first point in the subject | |||
where there is at least one match. For example: | where there is at least one match. For example: | |||
re> /(tang|tangerine|tan)/ | re> /(tang|tangerine|tan)/ | |||
data> yellow tangerine\=dfa | data> yellow tangerine\=dfa | |||
0: tangerine | 0: tangerine | |||
1: tang | 1: tang | |||
2: tan | 2: tan | |||
Using the normal matching function on this data finds only "tang". The longest | Using the normal matching function on this data finds only "tang". The longest | |||
matching string is always given first (and numbered zero). After a PCRE2_ERROR_PARTIAL | matching string is always given first (and numbered zero). After a PCRE2_ERROR_PARTIAL | |||
return, the output is "Partial match:", followed by the partially matching | return, the output is "Partial match:", followed by the partially matching | |||
substring. Note that this is the entire substring that was inspected during the | substring. Note that this is the entire substring that was inspected during the | |||
partial match; it may include characters before the actual match start if a lookbehind | partial match; it may include characters before the actual match start if a lookbehind | |||
assertion, \b, or \B was involved. (\K is not supported for DFA matching.) | assertion, \b, or \B was involved. (\K is not supported for DFA matching.) | |||
If global matching is requested, the search for further matches resumes at the end | If global matching is requested, the search for further matches resumes at the end | |||
of the longest match. For example: | of the longest match. For example: | |||
re> /(tang|tangerine|tan)/g | re> /(tang|tangerine|tan)/g | |||
data> yellow tangerine and tangy sultana\=dfa | data> yellow tangerine and tangy sultana\=dfa | |||
0: tangerine | 0: tangerine | |||
1: tang | 1: tang | |||
2: tan | 2: tan | |||
0: tang | 0: tang | |||
1: tan | 1: tan | |||
0: tan | 0: tan | |||
The alternative matching function does not support substring capture, so the modifiers | The alternative matching function does not support substring capture, so the modifiers | |||
that are concerned with captured substrings are not relevant. | that are concerned with captured substrings are not relevant. | |||
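The ordering of DFA results can be illustrated for a pattern that is just an alternation of literal strings, such as /(tang|tangerine|tan)/. The C sketch below is hypothetical and does not reflect how pcre2_dfa_match() works internally: dfa_like_match() and MAX_ALTS are invented names. It finds the earliest offset at which any alternative matches and returns the lengths of all alternatives that match there, longest first; a global search restarts at the end of the longest match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALTS 8

/* Hypothetical helper: scans subject from offset `from` for the earliest
   position where at least one literal alternative matches. Stores the
   lengths of all alternatives matching there in lens[], longest first,
   sets *pos to that position, and returns how many there are (0 if none).
   Alternatives are assumed to be non-empty. */
size_t dfa_like_match(const char *subject, size_t from,
                      const char *const *alts, size_t nalts,
                      size_t *pos, size_t lens[MAX_ALTS])
{
    assert(nalts <= MAX_ALTS);
    size_t slen = strlen(subject);
    for (size_t p = from; p < slen; p++) {
        size_t n = 0;
        for (size_t a = 0; a < nalts; a++) {
            size_t alen = strlen(alts[a]);
            if (alen <= slen - p && memcmp(subject + p, alts[a], alen) == 0)
                lens[n++] = alen;
        }
        if (n > 0) {
            /* Insertion sort into descending order of length. */
            for (size_t i = 1; i < n; i++) {
                size_t v = lens[i], j = i;
                while (j > 0 && lens[j - 1] < v) {
                    lens[j] = lens[j - 1];
                    j--;
                }
                lens[j] = v;
            }
            *pos = p;
            return n;
        }
    }
    return 0;
}
```

Calling it repeatedly on "yellow tangerine and tangy sultana", each time starting at pos + lens[0], gives the same three groups as the global example above: lengths 9, 4 and 3; then 4 and 3; then 3.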
RESTARTING AFTER A PARTIAL MATCH | RESTARTING AFTER A PARTIAL MATCH | |||
When the alternative matching function has given the PCRE2_ERROR_PARTIAL return, | When the alternative matching function has given the PCRE2_ERROR_PARTIAL return, | |||
indicating that the subject partially matched the pattern, you can restart the | indicating that the subject partially matched the pattern, you can restart the | |||
match with additional subject data by means of the dfa_restart modifier. For example: | match with additional subject data by means of the dfa_restart modifier. For example: | |||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d $/ | re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d $/ | |||
data> 23ja\=ps,dfa | data> 23ja\=ps,dfa | |||
Partial match: 23ja | Partial match: 23ja | |||
data> n05\=dfa,dfa_restart | data> n05\=dfa,dfa_restart | |||
0: n05 | 0: n05 | |||
For further information about partial matching, see the pcre2partial documentation. | For further information about partial matching, see the pcre2partial documentation. | |||
CALLOUTS | CALLOUTS | |||
If the pattern contains any callout requests, pcre2test's callout function is | If the pattern contains any callout requests, pcre2test's callout function is | |||
called during matching unless callout_none is specified. This works with both | called during matching unless callout_none is specified. This works with both | |||
matching functions, and with JIT, though there are some differences in behaviour. | matching functions, and with JIT, though there are some differences in behaviour. | |||
The output for callouts with numerical arguments and those with string arguments is | The output for callouts with numerical arguments and those with string arguments is | |||
slightly different. | slightly different. | |||
Callouts with numerical arguments | Callouts with numerical arguments | |||
By default, the callout function displays the callout number, the start and current | By default, the callout function displays the callout number, the start and current | |||
positions in the subject text at the callout time, and the next pattern item to be | positions in the subject text at the callout time, and the next pattern item to be | |||
tested. For example: | tested. For example: | |||
--->pqrabcdef | --->pqrabcdef | |||
0 ^ ^ \d | 0 ^ ^ \d | |||
This output indicates that callout number 0 occurred for a match attempt starting | This output indicates that callout number 0 occurred for a match attempt starting | |||
at the fourth character of the subject string, when the pointer was at the seventh | at the fourth character of the subject string, when the pointer was at the seventh | |||
character, and when the next pattern item was \d. Just one circumflex is output if | character, and when the next pattern item was \d. Just one circumflex is output if | |||
the start and current positions are the same, or if the current position precedes | the start and current positions are the same, or if the current position precedes | |||
the start position, which can happen if the callout is in a lookbehind assertion. | the start position, which can happen if the callout is in a lookbehind assertion. | |||
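The position line can be modelled with a short C function. This is a sketch under stated assumptions, not pcre2test's own code: the name callout_indicator() is hypothetical, the layout (a three-column callout number followed by a space, so that the markers line up with the subject printed after "--->") is assumed, and the padding that pcre2test adds before the next pattern item is reduced to a single space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Builds the line printed under "--->subject" for a numbered callout.
   Assumed layout: "%3d " for the callout number; a '^' at the start
   position and another at the current position; a single '^' (at the
   start position) when current <= start. Returns the length of the line,
   or -1 if out is too small. */
int callout_indicator(char *out, size_t outlen, int number,
                      size_t start, size_t current, const char *next_item)
{
    assert(out != NULL && next_item != NULL);
    int n = snprintf(out, outlen, "%3d ", number);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    size_t used = (size_t)n;
    size_t last = current > start ? current : start;
    for (size_t i = 0; i <= last; i++) {
        if (used + 1 >= outlen)
            return -1;
        out[used++] = (i == start || (i == current && current > start)) ? '^' : ' ';
    }
    n = snprintf(out + used, outlen - used, " %s", next_item);
    if (n < 0 || (size_t)n >= outlen - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

For the example above (start offset 3, current offset 6, next item \d), it produces "  0    ^  ^ \d"; with the offsets swapped, as can happen in a lookbehind, only one circumflex appears.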
Callouts numbered 255 are assumed to be automatic callouts, inserted as a result of | Callouts numbered 255 are assumed to be automatic callouts, inserted as a result of | |||
the auto_callout pattern modifier. In this case, instead of showing the callout | the auto_callout pattern modifier. In this case, instead of showing the callout | |||
number, the offset in the pattern, preceded by a plus, is output. For example: | number, the offset in the pattern, preceded by a plus, is output. For example: | |||
re> /\d?[A-E]\*/auto_callout | re> /\d?[A-E]\*/auto_callout | |||
data> E* | data> E* | |||
--->E* | --->E* | |||
+0 ^ \d? | +0 ^ \d? | |||
+3 ^ [A-E] | +3 ^ [A-E] | |||
+8 ^^ \* | +8 ^^ \* | |||
+10 ^ ^ | +10 ^ ^ | |||
0: E* | 0: E* | |||
skipping to change at line 1631 | skipping to change at line 1640 | |||
data> abc | data> abc | |||
--->abc | --->abc | |||
+0 ^ a | +0 ^ a | |||
+1 ^^ (*MARK:X) | +1 ^^ (*MARK:X) | |||
+10 ^^ b | +10 ^^ b | |||
Latest Mark: X | Latest Mark: X | |||
+11 ^ ^ c | +11 ^ ^ c | |||
+12 ^ ^ | +12 ^ ^ | |||
0: abc | 0: abc | |||
The mark changes between matching "a" and "b", but stays the same for the rest of | The mark changes between matching "a" and "b", but stays the same for the rest of | |||
the match, so nothing more is output. If, as a result of backtracking, the mark reverts | the match, so nothing more is output. If, as a result of backtracking, the mark reverts | |||
to being unset, the text "<unset>" is output. | to being unset, the text "<unset>" is output. | |||
Callouts with string arguments | Callouts with string arguments | |||
The output for a callout with a string argument is similar, except that instead of | The output for a callout with a string argument is similar, except that instead of | |||
outputting a callout number before the position indicators, the callout string and | outputting a callout number before the position indicators, the callout string and | |||
its offset in the pattern string are output before the reflection of the subject | its offset in the pattern string are output before the reflection of the subject | |||
string, and the subject string is reflected for each callout. For example: | string, and the subject string is reflected for each callout. For example: | |||
re> /^ab(?C'first')cd(?C"second")ef/ | re> /^ab(?C'first')cd(?C"second")ef/ | |||
data> abcdefg | data> abcdefg | |||
Callout (7): 'first' | Callout (7): 'first' | |||
--->abcdefg | --->abcdefg | |||
^ ^ c | ^ ^ c | |||
Callout (20): "second" | Callout (20): "second" | |||
--->abcdefg | --->abcdefg | |||
^ ^ e | ^ ^ e | |||
0: abcdef | 0: abcdef | |||
Callout modifiers | Callout modifiers | |||
The callout function in pcre2test returns zero (carry on matching) by default, but | The callout function in pcre2test returns zero (carry on matching) by default, but | |||
you can use a callout_fail modifier in a subject line to change this and other parameters | you can use a callout_fail modifier in a subject line to change this and other parameters | |||
of the callout (see below). | of the callout (see below). | |||
If the callout_capture modifier is set, the current captured groups are output when | If the callout_capture modifier is set, the current captured groups are output when | |||
a callout occurs. This is useful only for non-DFA matching, as pcre2_dfa_match() | a callout occurs. This is useful only for non-DFA matching, as pcre2_dfa_match() | |||
does not support capturing, so no captures are ever shown. | does not support capturing, so no captures are ever shown. | |||
The normal callout output, showing the callout number or pattern offset (as described | The normal callout output, showing the callout number or pattern offset (as described | |||
above), is suppressed if the callout_no_where modifier is set. | above), is suppressed if the callout_no_where modifier is set. | |||
When using the interpretive matching function pcre2_match() without JIT, setting | When using the interpretive matching function pcre2_match() without JIT, setting | |||
the callout_extra modifier causes additional output from pcre2test's callout function | the callout_extra modifier causes additional output from pcre2test's callout function | |||
to be generated. For the first callout in a match attempt at a new starting | to be generated. For the first callout in a match attempt at a new starting | |||
position in the subject, "New match attempt" is output. If there has been a backtrack | position in the subject, "New match attempt" is output. If there has been a backtrack | |||
since the last callout (or start of matching if this is the first callout), | since the last callout (or start of matching if this is the first callout), | |||
"Backtrack" is output, followed by "No other matching paths" if the backtrack ended | "Backtrack" is output, followed by "No other matching paths" if the backtrack ended | |||
the previous match attempt. For example: | the previous match attempt. For example: | |||
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess | re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess | |||
data> aac\=callout_extra | data> aac\=callout_extra | |||
New match attempt | New match attempt | |||
--->aac | --->aac | |||
+0 ^ ( | +0 ^ ( | |||
+1 ^ a+ | +1 ^ a+ | |||
+3 ^ ^ ) | +3 ^ ^ ) | |||
skipping to change at line 1707 | skipping to change at line 1716 | |||
+0 ^ ( | +0 ^ ( | |||
+1 ^ a+ | +1 ^ a+ | |||
Backtrack | Backtrack | |||
No other matching paths | No other matching paths | |||
New match attempt | New match attempt | |||
--->aac | --->aac | |||
+0 ^ ( | +0 ^ ( | |||
+1 ^ a+ | +1 ^ a+ | |||
No match | No match | |||
Notice that various optimizations must be turned off if you want all possible | Notice that various optimizations must be turned off if you want all possible | |||
matching paths to be scanned. If no_start_optimize is not used, there is an immediate | matching paths to be scanned. If no_start_optimize is not used, there is an immediate | |||
"no match", without any callouts, because the starting optimization fails to | "no match", without any callouts, because the starting optimization fails to | |||
find "b" in the subject, which it knows must be present for any match. If | find "b" in the subject, which it knows must be present for any match. If | |||
no_auto_possess is not used, the "a+" item is turned into "a++", which reduces the | no_auto_possess is not used, the "a+" item is turned into "a++", which reduces the | |||
number of backtracks. | number of backtracks. | |||
The callout_extra modifier has no effect if used with the DFA matching function, or | The callout_extra modifier has no effect if used with the DFA matching function, or | |||
with JIT. | with JIT. | |||
Return values from callouts | Return values from callouts | |||
The default return from the callout function is zero, which allows matching to continue. | The default return from the callout function is zero, which allows matching to continue. | |||
The callout_fail modifier can be given one or two numbers. If there is only | The callout_fail modifier can be given one or two numbers. If there is only | |||
one number, 1 is returned instead of 0 (causing matching to backtrack) when a callout | one number, 1 is returned instead of 0 (causing matching to backtrack) when a callout | |||
of that number is reached. If two numbers (<n>:<m>) are given, 1 is returned | of that number is reached. If two numbers (<n>:<m>) are given, 1 is returned | |||
when callout <n> is reached and there have been at least <m> callouts. The callout_error | when callout <n> is reached and there have been at least <m> callouts. The callout_error | |||
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, causing | modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, causing | |||
the entire matching process to be aborted. If both these modifiers are set for the | the entire matching process to be aborted. If both these modifiers are set for the | |||
same callout number, callout_error takes precedence. Note that callouts with string | same callout number, callout_error takes precedence. Note that callouts with string | |||
arguments are always given the number zero. | arguments are always given the number zero. | |||
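The precedence rules above can be summarized as a small decision function. This is a hypothetical model, not pcre2test's implementation: struct fail_rule, callout_return() and SKETCH_ERROR_CALLOUT (a placeholder for PCRE2_ERROR_CALLOUT, whose real value is defined in pcre2.h) are invented for illustration, and whether <m> counts the current callout is an assumption. A callout with a string argument would be passed as number 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for PCRE2_ERROR_CALLOUT; the real value comes from pcre2.h. */
#define SKETCH_ERROR_CALLOUT (-1000)

/* One callout_fail or callout_error setting: <n>[:<m>]. */
struct fail_rule {
    bool set;           /* was the modifier given? */
    unsigned number;    /* <n>: the callout number it applies to */
    unsigned min_count; /* <m>: required callout count (0 when omitted) */
};

/* Returns what the callout function would hand back to the matcher.
   callouts_so_far is assumed to include the current callout. */
int callout_return(const struct fail_rule *fail, const struct fail_rule *error,
                   unsigned number, unsigned callouts_so_far)
{
    assert(fail != NULL && error != NULL);
    /* callout_error is checked first because it takes precedence. */
    if (error->set && number == error->number &&
        callouts_so_far >= error->min_count)
        return SKETCH_ERROR_CALLOUT;  /* abort the whole match */
    if (fail->set && number == fail->number &&
        callouts_so_far >= fail->min_count)
        return 1;                     /* fail here, so matching backtracks */
    return 0;                         /* carry on matching */
}
```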
The callout_data modifier can be given an unsigned or a negative number. This is | The callout_data modifier can be given an unsigned or a negative number. This is | |||
set as the "user data" that is passed to the matching function, and passed back | set as the "user data" that is passed to the matching function, and passed back | |||
when the callout function is invoked. Any value other than zero is used as a return | when the callout function is invoked. Any value other than zero is used as a return | |||
from pcre2test's callout function. | from pcre2test's callout function. | |||
Inserting callouts can be helpful when using pcre2test to check complicated regular | Inserting callouts can be helpful when using pcre2test to check complicated regular | |||
expressions. For further information about callouts, see the pcre2callout documentation. | expressions. For further information about callouts, see the pcre2callout documentation. | |||
NON-PRINTING CHARACTERS | NON-PRINTING CHARACTERS | |||
When pcre2test is outputting text in the compiled version of a pattern, bytes other | When pcre2test is outputting text in the compiled version of a pattern, bytes other | |||
than 32-126 are always treated as non-printing characters and are therefore shown | than 32-126 are always treated as non-printing characters and are therefore shown | |||
as hex escapes. | as hex escapes. | |||
When pcre2test is outputting text that is a matched part of a subject string, it | When pcre2test is outputting text that is a matched part of a subject string, it | |||
behaves in the same way, unless a different locale has been set for the pattern | behaves in the same way, unless a different locale has been set for the pattern | |||
(using the locale modifier). In this case, the isprint() function is used to distinguish | (using the locale modifier). In this case, the isprint() function is used to distinguish | |||
printing and non-printing characters. | printing and non-printing characters. | |||
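For the default case (no locale set), the escaping rule described in this section and under DEFAULT OUTPUT can be sketched as follows. The function name format_char() and the use of lower-case, at-least-two-digit hex are assumptions made for illustration; with a locale, pcre2test would consult isprint() instead of the fixed 32-126 range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Code points 32-126 print as themselves; anything else becomes \xhh when
   it is below 256 and UTF mode is off, otherwise \x{hh...}. Returns the
   number of characters written (excluding the terminator). */
int format_char(char *out, size_t outlen, uint32_t c, bool utf)
{
    assert(out != NULL && outlen > 0);
    if (c >= 32 && c <= 126)
        return snprintf(out, outlen, "%c", (int)c);
    if (c < 256 && !utf)
        return snprintf(out, outlen, "\\x%02x", (unsigned)c);
    return snprintf(out, outlen, "\\x{%02x}", (unsigned)c);
}
```

Under these assumptions a newline in a non-UTF subject is shown as \x0a, while U+20AC in UTF mode is shown as \x{20ac}.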
SAVING AND RESTORING COMPILED PATTERNS | SAVING AND RESTORING COMPILED PATTERNS | |||
It is possible to save compiled patterns on disc or elsewhere, and reload them | It is possible to save compiled patterns on disc or elsewhere, and reload them | |||
later, subject to a number of restrictions. JIT data cannot be saved. The host on | later, subject to a number of restrictions. JIT data cannot be saved. The host on | |||
which the patterns are reloaded must be running the same version of PCRE2, with the | which the patterns are reloaded must be running the same version of PCRE2, with the | |||
same code unit width, and must also have the same endianness, pointer width and | same code unit width, and must also have the same endianness, pointer width and | |||
PCRE2_SIZE type. Before compiled patterns can be saved they must be serialized, | PCRE2_SIZE type. Before compiled patterns can be saved they must be serialized, | |||
that is, converted to a stream of bytes. A single byte stream may contain any number | that is, converted to a stream of bytes. A single byte stream may contain any number | |||
of compiled patterns, but they must all use the same character tables. A single | of compiled patterns, but they must all use the same character tables. A single | |||
copy of the tables is included in the byte stream (its size is 1088 bytes). | copy of the tables is included in the byte stream (its size is 1088 bytes). | |||
The functions whose names begin with pcre2_serialize_ are used for serializing and | The functions whose names begin with pcre2_serialize_ are used for serializing and | |||
de-serializing. They are described in the pcre2serialize documentation. In this | de-serializing. They are described in the pcre2serialize documentation. In this | |||
section we describe the features of pcre2test that can be used to test these functions. | section we describe the features of pcre2test that can be used to test these functions. | |||
Note that "serialization" in PCRE2 does not convert compiled patterns to an abstract | Note that "serialization" in PCRE2 does not convert compiled patterns to an abstract | |||
format like Java or .NET. It just makes a reloadable byte code stream. | format like Java or .NET. It just makes a reloadable byte code stream. | |||
Hence the restrictions on reloading mentioned above. | Hence the restrictions on reloading mentioned above. | |||
In pcre2test, when a pattern with the push modifier is successfully compiled, it is | In pcre2test, when a pattern with the push modifier is successfully compiled, it is | |||
pushed onto a stack of compiled patterns, and pcre2test expects the next line to | pushed onto a stack of compiled patterns, and pcre2test expects the next line to | |||
contain a new pattern (or command) instead of a subject line. By contrast, the | contain a new pattern (or command) instead of a subject line. By contrast, the | |||
pushcopy modifier causes a copy of the compiled pattern to be stacked, leaving the | pushcopy modifier causes a copy of the compiled pattern to be stacked, leaving the | |||
original available for immediate matching. By using push and/or pushcopy, a number | original available for immediate matching. By using push and/or pushcopy, a number | |||
of patterns can be compiled and retained. These modifiers are incompatible with | of patterns can be compiled and retained. These modifiers are incompatible with | |||
posix, and control modifiers that act at match time are ignored (with a message) | posix, and control modifiers that act at match time are ignored (with a message) | |||
for the stacked patterns. The jitverify modifier applies only at compile time. | for the stacked patterns. The jitverify modifier applies only at compile time. | |||
The command | The command | |||
#save <filename> | #save <filename> | |||
causes all the stacked patterns to be serialized and the result written to the | causes all the stacked patterns to be serialized and the result written to the | |||
named file. Afterwards, all the stacked patterns are freed. The command | named file. Afterwards, all the stacked patterns are freed. The command | |||
#load <filename> | #load <filename> | |||
reads the data in the file, and then arranges for it to be de-serialized, with the | reads the data in the file, and then arranges for it to be de-serialized, with the | |||
resulting compiled patterns added to the pattern stack. The pattern on the top of | resulting compiled patterns added to the pattern stack. The pattern on the top of | |||
the stack can be retrieved by the #pop command, which must be followed by lines of | the stack can be retrieved by the #pop command, which must be followed by lines of | |||
subjects that are to be matched with the pattern, terminated as usual by an empty | subjects that are to be matched with the pattern, terminated as usual by an empty | |||
line or end of file. This command may be followed by a modifier list containing | line or end of file. This command may be followed by a modifier list containing | |||
only control modifiers that act after a pattern has been compiled. In particular, | only control modifiers that act after a pattern has been compiled. In particular, | |||
hex, posix, posix_nosub, push, and pushcopy are not allowed, nor are any option-setting | hex, posix, posix_nosub, push, and pushcopy are not allowed, nor are any option-setting | |||
modifiers. The JIT modifiers are, however, permitted. Here is an example | modifiers. The JIT modifiers are, however, permitted. Here is an example | |||
that saves and reloads two patterns. | that saves and reloads two patterns. | |||
/abc/push | /abc/push | |||
/xyz/push | /xyz/push | |||
#save tempfile | #save tempfile | |||
#load tempfile | #load tempfile | |||
#pop info | #pop info | |||
xyz | xyz | |||
#pop jit,bincode | #pop jit,bincode | |||
abc | abc | |||
If jitverify is used with #pop, it does not automatically imply jit, which is different | If jitverify is used with #pop, it does not automatically imply jit, which is different | |||
behaviour from when it is used on a pattern. | behaviour from when it is used on a pattern. | |||
The #popcopy command is analogous to the pushcopy modifier in that it makes current | The #popcopy command is analogous to the pushcopy modifier in that it makes current | |||
a copy of the topmost stack pattern, leaving the original still on the stack. | a copy of the topmost stack pattern, leaving the original still on the stack. | |||
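The stack behaviour of push, #save, #load, #pop and #popcopy can be modelled with a toy C program. It is only a sketch under stated assumptions: patterns are represented by their source strings, "saving" copies pointers into an array rather than calling the pcre2_serialize_ functions, and all names (struct pat_stack, pat_push() and so on) are hypothetical. It reproduces the order seen in the example above, where the first #pop after #load yields /xyz/.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STACK 16

/* Toy pattern stack: each entry is a pattern's source text. */
struct pat_stack {
    const char *items[MAX_STACK];
    size_t n;
};

/* push / pushcopy: add a pattern to the top of the stack. */
void pat_push(struct pat_stack *s, const char *pat)
{
    assert(s->n < MAX_STACK);
    s->items[s->n++] = pat;
}

/* #save: copy the whole stack, bottom to top, into file[], then empty it
   (standing in for "all the stacked patterns are freed"). */
size_t pat_save(struct pat_stack *s, const char **file)
{
    size_t n = s->n;
    memcpy(file, s->items, n * sizeof *file);
    s->n = 0;
    return n;
}

/* #load: add the saved patterns to the stack in their saved order. */
void pat_load(struct pat_stack *s, const char *const *file, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pat_push(s, file[i]);
}

/* #pop: remove the top pattern and make it current. */
const char *pat_pop(struct pat_stack *s)
{
    assert(s->n > 0);
    return s->items[--s->n];
}

/* #popcopy: make the top pattern current but leave it on the stack. */
const char *pat_popcopy(const struct pat_stack *s)
{
    assert(s->n > 0);
    return s->items[s->n - 1];
}
```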
SEE ALSO | SEE ALSO | |||
pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3), pcre2matching(3), pcre2partial(3), | pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3), pcre2matching(3), pcre2partial(3), | |||
pcre2pattern(3), pcre2serialize(3). | pcre2pattern(3), pcre2serialize(3). | |||
AUTHOR | AUTHOR | |||
Philip Hazel | Philip Hazel | |||
Retired from University Computing Service | Retired from University Computing Service | |||
Cambridge, England. | Cambridge, England. | |||
REVISION | REVISION | |||
Last updated: 27 January 2024 | Last updated: 24 April 2024 | |||
Copyright (c) 1997-2024 University of Cambridge. | Copyright (c) 1997-2024 University of Cambridge. | |||
PCRE 10.43 27 January 2024 PCRE2TEST(1) | PCRE 10.44 24 April 2024 PCRE2TEST(1) | |||
End of changes. 138 change blocks. | ||||
473 lines changed or deleted | 486 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |