| pcre2test.txt | pcre2test.txt | |||
|---|---|---|---|---|
| skipping to change at line 622 | skipping to change at line 622 | |||
| convert_length set convert buffer length | convert_length set convert buffer length | |||
| debug same as info,fullbincode | debug same as info,fullbincode | |||
| framesize show matching frame size | framesize show matching frame size | |||
| fullbincode show binary code with lengths | fullbincode show binary code with lengths | |||
| /I info show info about compiled pattern | /I info show info about compiled pattern | |||
| hex unquoted characters are hexadecimal | hex unquoted characters are hexadecimal | |||
| jit[=<number>] use JIT | jit[=<number>] use JIT | |||
| jitfast use JIT fast path | jitfast use JIT fast path | |||
| jitverify verify JIT use | jitverify verify JIT use | |||
| locale=<name> use this locale | locale=<name> use this locale | |||
| max_pattern_length=<n> set maximum pattern length | max_pattern_compiled ) set maximum compiled pattern | |||
| | _length=<n> ) length (bytes) | |||
| | max_pattern_length=<n> set maximum pattern length (code units) | |||
| max_varlookbehind=<n> set maximum variable lookbehind length | max_varlookbehind=<n> set maximum variable lookbehind length | |||
| memory show memory used | memory show memory used | |||
| newline=<type> set newline type | newline=<type> set newline type | |||
| null_context compile with a NULL context | null_context compile with a NULL context | |||
| null_pattern pass pattern as NULL | null_pattern pass pattern as NULL | |||
| parens_nest_limit=<n> set maximum parentheses depth | parens_nest_limit=<n> set maximum parentheses depth | |||
| posix use the POSIX API | posix use the POSIX API | |||
| posix_nosub use the POSIX API with REG_NOSUB | posix_nosub use the POSIX API with REG_NOSUB | |||
| push push compiled pattern onto the stack | push push compiled pattern onto the stack | |||
| pushcopy push a copy onto the stack | pushcopy push a copy onto the stack | |||
| skipping to change at line 904 | skipping to change at line 906 | |||
| pcre2test sets its own default of 220, which is required for running | pcre2test sets its own default of 220, which is required for running | |||
| the standard test suite. | the standard test suite. | |||
| Limiting the pattern length | Limiting the pattern length | |||
| The max_pattern_length modifier sets a limit, in code units, to the | The max_pattern_length modifier sets a limit, in code units, to the | |||
| length of pattern that pcre2_compile() will accept. Breaching the limit | length of pattern that pcre2_compile() will accept. Breaching the limit | |||
| causes a compilation error. The default is the largest number a | causes a compilation error. The default is the largest number a | |||
| PCRE2_SIZE variable can hold (essentially unlimited). | PCRE2_SIZE variable can hold (essentially unlimited). | |||
| | Limiting the size of a compiled pattern | |||
| | The max_pattern_compiled_length modifier sets a limit, in bytes, to the | |||
| | amount of memory used by a compiled pattern. Breaching the limit causes | |||
| | a compilation error. The default is the largest number a PCRE2_SIZE | |||
| | variable can hold (essentially unlimited). | |||
| Using the POSIX wrapper API | Using the POSIX wrapper API | |||
| The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via | The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via | |||
| the POSIX wrapper API rather than its native API. When posix_nosub is | the POSIX wrapper API rather than its native API. When posix_nosub is | |||
| used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX | used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX | |||
| wrapper supports only the 8-bit library. Note that it does not imply | wrapper supports only the 8-bit library. Note that it does not imply | |||
| POSIX matching semantics; for more detail see the pcre2posix documenta- | POSIX matching semantics; for more detail see the pcre2posix documenta- | |||
| tion. The following pattern modifiers set options for the regcomp() | tion. The following pattern modifiers set options for the regcomp() | |||
| function: | function: | |||
| caseless REG_ICASE | caseless REG_ICASE | |||
| multiline REG_NEWLINE | multiline REG_NEWLINE | |||
| dotall REG_DOTALL ) | dotall REG_DOTALL ) | |||
| ungreedy REG_UNGREEDY ) These options are not part of | ungreedy REG_UNGREEDY ) These options are not part of | |||
| ucp REG_UCP ) the POSIX standard | ucp REG_UCP ) the POSIX standard | |||
| utf REG_UTF8 ) | utf REG_UTF8 ) | |||
| The regerror_buffsize modifier specifies a size for the error buffer | The regerror_buffsize modifier specifies a size for the error buffer | |||
| that is passed to regerror() in the event of a compilation error. For | that is passed to regerror() in the event of a compilation error. For | |||
| example: | example: | |||
| /abc/posix,regerror_buffsize=20 | /abc/posix,regerror_buffsize=20 | |||
| This provides a means of testing the behaviour of regerror() when the | This provides a means of testing the behaviour of regerror() when the | |||
| buffer is too small for the error message. If this modifier has not | buffer is too small for the error message. If this modifier has not | |||
| been set, a large buffer is used. | been set, a large buffer is used. | |||
| The aftertext and allaftertext subject modifiers work as described be- | The aftertext and allaftertext subject modifiers work as described be- | |||
| low. All other modifiers are either ignored, with a warning message, or | low. All other modifiers are either ignored, with a warning message, or | |||
| cause an error. | cause an error. | |||
| The pattern is passed to regcomp() as a zero-terminated string by de- | The pattern is passed to regcomp() as a zero-terminated string by de- | |||
| fault, but if the use_length or hex modifiers are set, the REG_PEND ex- | fault, but if the use_length or hex modifiers are set, the REG_PEND ex- | |||
| tension is used to pass it by length. | tension is used to pass it by length. | |||
| Testing the stack guard feature | Testing the stack guard feature | |||
| The stackguard modifier is used to test the use of pcre2_set_com- | The stackguard modifier is used to test the use of pcre2_set_com- | |||
| pile_recursion_guard(), a function that is provided to enable stack | pile_recursion_guard(), a function that is provided to enable stack | |||
| availability to be checked during compilation (see the pcre2api docu- | availability to be checked during compilation (see the pcre2api docu- | |||
| mentation for details). If the number specified by the modifier is | mentation for details). If the number specified by the modifier is | |||
| greater than zero, pcre2_set_compile_recursion_guard() is called to set | greater than zero, pcre2_set_compile_recursion_guard() is called to set | |||
| up a callback from pcre2_compile() to a local function. The argument it | up a callback from pcre2_compile() to a local function. The argument it | |||
| receives is the current nesting parenthesis depth; if this is greater | receives is the current nesting parenthesis depth; if this is greater | |||
| than the value given by the modifier, non-zero is returned, causing the | than the value given by the modifier, non-zero is returned, causing the | |||
| compilation to be aborted. | compilation to be aborted. | |||
| Using alternative character tables | Using alternative character tables | |||
| The value specified for the tables modifier must be one of the digits | The value specified for the tables modifier must be one of the digits | |||
| 0, 1, 2, or 3. It causes a specific set of built-in character tables to | 0, 1, 2, or 3. It causes a specific set of built-in character tables to | |||
| be passed to pcre2_compile(). This is used in the PCRE2 tests to check | be passed to pcre2_compile(). This is used in the PCRE2 tests to check | |||
| behaviour with different character tables. The digit specifies the ta- | behaviour with different character tables. The digit specifies the ta- | |||
| bles as follows: | bles as follows: | |||
| 0 do not pass any special character tables | 0 do not pass any special character tables | |||
| 1 the default ASCII tables, as distributed in | 1 the default ASCII tables, as distributed in | |||
| pcre2_chartables.c.dist | pcre2_chartables.c.dist | |||
| 2 a set of tables defining ISO 8859 characters | 2 a set of tables defining ISO 8859 characters | |||
| 3 a set of tables loaded by the #loadtables command | 3 a set of tables loaded by the #loadtables command | |||
| In tables 2, some characters whose codes are greater than 128 are iden- | In tables 2, some characters whose codes are greater than 128 are iden- | |||
| tified as letters, digits, spaces, etc. Tables 3 can be used only after | tified as letters, digits, spaces, etc. Tables 3 can be used only after | |||
| a #loadtables command has loaded them from a binary file. Setting al- | a #loadtables command has loaded them from a binary file. Setting al- | |||
| ternate character tables and a locale are mutually exclusive. | ternate character tables and a locale are mutually exclusive. | |||
| Setting certain match controls | Setting certain match controls | |||
| The following modifiers are really subject modifiers, and are described | The following modifiers are really subject modifiers, and are described | |||
| under "Subject Modifiers" below. However, they may be included in a | under "Subject Modifiers" below. However, they may be included in a | |||
| pattern's modifier list, in which case they are applied to every sub- | pattern's modifier list, in which case they are applied to every sub- | |||
| ject line that is processed with that pattern. These modifiers do not | ject line that is processed with that pattern. These modifiers do not | |||
| affect the compilation process. | affect the compilation process. | |||
| aftertext show text after match | aftertext show text after match | |||
| allaftertext show text after captures | allaftertext show text after captures | |||
| allcaptures show all captures | allcaptures show all captures | |||
| allvector show the entire ovector | allvector show the entire ovector | |||
| allusedtext show all consulted text | allusedtext show all consulted text | |||
| altglobal alternative global matching | altglobal alternative global matching | |||
| /g global global matching | /g global global matching | |||
| heapframes_size show match data heapframes size | heapframes_size show match data heapframes size | |||
| skipping to change at line 1001 | skipping to change at line 1010 | |||
| substitute_extended use PCRE2_SUBSTITUTE_EXTENDED | substitute_extended use PCRE2_SUBSTITUTE_EXTENDED | |||
| substitute_literal use PCRE2_SUBSTITUTE_LITERAL | substitute_literal use PCRE2_SUBSTITUTE_LITERAL | |||
| substitute_matched use PCRE2_SUBSTITUTE_MATCHED | substitute_matched use PCRE2_SUBSTITUTE_MATCHED | |||
| substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | |||
| substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | |||
| substitute_skip=<n> skip substitution <n> | substitute_skip=<n> skip substitution <n> | |||
| substitute_stop=<n> skip substitution <n> and following | substitute_stop=<n> skip substitution <n> and following | |||
| substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | |||
| substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | |||
| These modifiers may not appear in a #pattern command. If you want them | These modifiers may not appear in a #pattern command. If you want them | |||
| as defaults, set them in a #subject command. | as defaults, set them in a #subject command. | |||
| Specifying literal subject lines | Specifying literal subject lines | |||
| If the subject_literal modifier is present on a pattern, all the sub- | If the subject_literal modifier is present on a pattern, all the sub- | |||
| ject lines that it matches are taken as literal strings, with no inter- | ject lines that it matches are taken as literal strings, with no inter- | |||
| pretation of backslashes. It is not possible to set subject modifiers | pretation of backslashes. It is not possible to set subject modifiers | |||
| on such lines, but any that are set as defaults by a #subject command | on such lines, but any that are set as defaults by a #subject command | |||
| are recognized. | are recognized. | |||
| Saving a compiled pattern | Saving a compiled pattern | |||
| When a pattern with the push modifier is successfully compiled, it is | When a pattern with the push modifier is successfully compiled, it is | |||
| pushed onto a stack of compiled patterns, and pcre2test expects the | pushed onto a stack of compiled patterns, and pcre2test expects the | |||
| next line to contain a new pattern (or a command) instead of a subject | next line to contain a new pattern (or a command) instead of a subject | |||
| line. This facility is used when saving compiled patterns to a file, as | line. This facility is used when saving compiled patterns to a file, as | |||
| described in the section entitled "Saving and restoring compiled pat- | described in the section entitled "Saving and restoring compiled pat- | |||
| terns" below. If pushcopy is used instead of push, a copy of the com- | terns" below. If pushcopy is used instead of push, a copy of the com- | |||
| piled pattern is stacked, leaving the original as current, ready to | piled pattern is stacked, leaving the original as current, ready to | |||
| match the following input lines. This provides a way of testing the | match the following input lines. This provides a way of testing the | |||
| pcre2_code_copy() function. The push and pushcopy modifiers are in- | pcre2_code_copy() function. The push and pushcopy modifiers are in- | |||
| compatible with compilation modifiers such as global that act at match | compatible with compilation modifiers such as global that act at match | |||
| time. Any that are specified are ignored (for the stacked copy), with a | time. Any that are specified are ignored (for the stacked copy), with a | |||
| warning message, except for replace, which causes an error. Note that | warning message, except for replace, which causes an error. Note that | |||
| jitverify, which is allowed, does not carry through to any subsequent | jitverify, which is allowed, does not carry through to any subsequent | |||
| matching that uses a stacked pattern. | matching that uses a stacked pattern. | |||
| Testing foreign pattern conversion | Testing foreign pattern conversion | |||
| The experimental foreign pattern conversion functions in PCRE2 can be | The experimental foreign pattern conversion functions in PCRE2 can be | |||
| tested by setting the convert modifier. Its argument is a colon-sepa- | tested by setting the convert modifier. Its argument is a colon-sepa- | |||
| rated list of options, which set the equivalent option for the | rated list of options, which set the equivalent option for the | |||
| pcre2_pattern_convert() function: | pcre2_pattern_convert() function: | |||
| glob PCRE2_CONVERT_GLOB | glob PCRE2_CONVERT_GLOB | |||
| glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR | glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR | |||
| glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR | glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR | |||
| posix_basic PCRE2_CONVERT_POSIX_BASIC | posix_basic PCRE2_CONVERT_POSIX_BASIC | |||
| posix_extended PCRE2_CONVERT_POSIX_EXTENDED | posix_extended PCRE2_CONVERT_POSIX_EXTENDED | |||
| unset Unset all options | unset Unset all options | |||
| The "unset" value is useful for turning off a default that has been set | The "unset" value is useful for turning off a default that has been set | |||
| by a #pattern command. When one of these options is set, the input pat- | by a #pattern command. When one of these options is set, the input pat- | |||
| tern is passed to pcre2_pattern_convert(). If the conversion is suc- | tern is passed to pcre2_pattern_convert(). If the conversion is suc- | |||
| cessful, the result is reflected in the output and then passed to | cessful, the result is reflected in the output and then passed to | |||
| pcre2_compile(). The normal utf and no_utf_check options, if set, cause | pcre2_compile(). The normal utf and no_utf_check options, if set, cause | |||
| the PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be | the PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be | |||
| passed to pcre2_pattern_convert(). | passed to pcre2_pattern_convert(). | |||
| By default, the conversion function is allowed to allocate a buffer for | By default, the conversion function is allowed to allocate a buffer for | |||
| its output. However, if the convert_length modifier is set to a value | its output. However, if the convert_length modifier is set to a value | |||
| greater than zero, pcre2test passes a buffer of the given length. This | greater than zero, pcre2test passes a buffer of the given length. This | |||
| makes it possible to test the length check. | makes it possible to test the length check. | |||
| The convert_glob_escape and convert_glob_separator modifiers can be | The convert_glob_escape and convert_glob_separator modifiers can be | |||
| used to specify the escape and separator characters for glob process- | used to specify the escape and separator characters for glob process- | |||
| ing, overriding the defaults, which are operating-system dependent. | ing, overriding the defaults, which are operating-system dependent. | |||
| SUBJECT MODIFIERS | SUBJECT MODIFIERS | |||
| The modifiers that can appear in subject lines and the #subject command | The modifiers that can appear in subject lines and the #subject command | |||
| are of two types. | are of two types. | |||
| Setting match options | Setting match options | |||
| The following modifiers set options for pcre2_match() or | The following modifiers set options for pcre2_match() or | |||
| pcre2_dfa_match(). See pcre2api for a description of their effects. | pcre2_dfa_match(). See pcre2api for a description of their effects. | |||
| anchored set PCRE2_ANCHORED | anchored set PCRE2_ANCHORED | |||
| endanchored set PCRE2_ENDANCHORED | endanchored set PCRE2_ENDANCHORED | |||
| dfa_restart set PCRE2_DFA_RESTART | dfa_restart set PCRE2_DFA_RESTART | |||
| dfa_shortest set PCRE2_DFA_SHORTEST | dfa_shortest set PCRE2_DFA_SHORTEST | |||
| disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK | disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK | |||
| no_jit set PCRE2_NO_JIT | no_jit set PCRE2_NO_JIT | |||
| no_utf_check set PCRE2_NO_UTF_CHECK | no_utf_check set PCRE2_NO_UTF_CHECK | |||
| notbol set PCRE2_NOTBOL | notbol set PCRE2_NOTBOL | |||
| notempty set PCRE2_NOTEMPTY | notempty set PCRE2_NOTEMPTY | |||
| notempty_atstart set PCRE2_NOTEMPTY_ATSTART | notempty_atstart set PCRE2_NOTEMPTY_ATSTART | |||
| noteol set PCRE2_NOTEOL | noteol set PCRE2_NOTEOL | |||
| partial_hard (or ph) set PCRE2_PARTIAL_HARD | partial_hard (or ph) set PCRE2_PARTIAL_HARD | |||
| partial_soft (or ps) set PCRE2_PARTIAL_SOFT | partial_soft (or ps) set PCRE2_PARTIAL_SOFT | |||
| The partial matching modifiers are provided with abbreviations because | The partial matching modifiers are provided with abbreviations because | |||
| they appear frequently in tests. | they appear frequently in tests. | |||
| If the posix or posix_nosub modifier was present on the pattern, caus- | If the posix or posix_nosub modifier was present on the pattern, caus- | |||
| ing the POSIX wrapper API to be used, the only option-setting modifiers | ing the POSIX wrapper API to be used, the only option-setting modifiers | |||
| that have any effect are notbol, notempty, and noteol, causing REG_NOT- | that have any effect are notbol, notempty, and noteol, causing REG_NOT- | |||
| BOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to | BOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to | |||
| regexec(). The other modifiers are ignored, with a warning message. | regexec(). The other modifiers are ignored, with a warning message. | |||
| There is one additional modifier that can be used with the POSIX wrap- | There is one additional modifier that can be used with the POSIX wrap- | |||
| per. It is ignored (with a warning) if used for non-POSIX matching. | per. It is ignored (with a warning) if used for non-POSIX matching. | |||
| posix_startend=<n>[:<m>] | posix_startend=<n>[:<m>] | |||
| This causes the subject string to be passed to regexec() using the | This causes the subject string to be passed to regexec() using the | |||
| REG_STARTEND option, which uses offsets to specify which part of the | REG_STARTEND option, which uses offsets to specify which part of the | |||
| string is searched. If only one number is given, the end offset is | string is searched. If only one number is given, the end offset is | |||
| passed as the end of the subject string. For more detail of REG_STAR- | passed as the end of the subject string. For more detail of REG_STAR- | |||
| TEND, see the pcre2posix documentation. If the subject string contains | TEND, see the pcre2posix documentation. If the subject string contains | |||
| binary zeros (coded as escapes such as \x{00} because pcre2test does | binary zeros (coded as escapes such as \x{00} because pcre2test does | |||
| not support actual binary zeros in its input), you must use posix_star- | not support actual binary zeros in its input), you must use posix_star- | |||
| tend to specify its length. | tend to specify its length. | |||
| Setting match controls | Setting match controls | |||
| The following modifiers affect the matching process or request addi- | The following modifiers affect the matching process or request addi- | |||
| tional information. Some of them may also be specified on a pattern | tional information. Some of them may also be specified on a pattern | |||
| line (see above), in which case they apply to every subject line that | line (see above), in which case they apply to every subject line that | |||
| is matched against that pattern, but can be overridden by modifiers on | is matched against that pattern, but can be overridden by modifiers on | |||
| the subject. | the subject. | |||
| aftertext show text after match | aftertext show text after match | |||
| allaftertext show text after captures | allaftertext show text after captures | |||
| allcaptures show all captures | allcaptures show all captures | |||
| allvector show the entire ovector | allvector show the entire ovector | |||
| allusedtext show all consulted text (non-JIT only) | allusedtext show all consulted text (non-JIT only) | |||
| altglobal alternative global matching | altglobal alternative global matching | |||
| callout_capture show captures at callout time | callout_capture show captures at callout time | |||
| callout_data=<n> set a value to pass via callouts | callout_data=<n> set a value to pass via callouts | |||
| skipping to change at line 1165 | skipping to change at line 1174 | |||
| substitute_matched use PCRE2_SUBSTITUTE_MATCHED | substitute_matched use PCRE2_SUBSTITUTE_MATCHED | |||
| substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | |||
| substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | |||
| substitute_skip=<n> skip substitution number n | substitute_skip=<n> skip substitution number n | |||
| substitute_stop=<n> skip substitution number n and greater | substitute_stop=<n> skip substitution number n and greater | |||
| substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET | |||
| substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY | |||
| zero_terminate pass the subject as zero-terminated | zero_terminate pass the subject as zero-terminated | |||
| The effects of these modifiers are described in the following sections. | The effects of these modifiers are described in the following sections. | |||
| When matching via the POSIX wrapper API, the aftertext, allaftertext, | When matching via the POSIX wrapper API, the aftertext, allaftertext, | |||
| and ovector subject modifiers work as described below. All other modi- | and ovector subject modifiers work as described below. All other modi- | |||
| fiers are either ignored, with a warning message, or cause an error. | fiers are either ignored, with a warning message, or cause an error. | |||
| Showing more text | Showing more text | |||
| The aftertext modifier requests that as well as outputting the part of | The aftertext modifier requests that as well as outputting the part of | |||
| the subject string that matched the entire pattern, pcre2test should in | the subject string that matched the entire pattern, pcre2test should in | |||
| addition output the remainder of the subject string. This is useful for | addition output the remainder of the subject string. This is useful for | |||
| tests where the subject contains multiple copies of the same substring. | tests where the subject contains multiple copies of the same substring. | |||
| The allaftertext modifier requests the same action for captured sub- | The allaftertext modifier requests the same action for captured sub- | |||
| strings as well as the main matched substring. In each case the remain- | strings as well as the main matched substring. In each case the remain- | |||
| der is output on the following line with a plus character following the | der is output on the following line with a plus character following the | |||
| capture number. | capture number. | |||
| The allusedtext modifier requests that all the text that was consulted | The allusedtext modifier requests that all the text that was consulted | |||
| during a successful pattern match by the interpreter should be shown, | during a successful pattern match by the interpreter should be shown, | |||
| for both full and partial matches. This feature is not supported for | for both full and partial matches. This feature is not supported for | |||
| JIT matching, and if requested with JIT it is ignored (with a warning | JIT matching, and if requested with JIT it is ignored (with a warning | |||
| message). Setting this modifier affects the output if there is a look- | message). Setting this modifier affects the output if there is a look- | |||
| behind at the start of a match, or, for a complete match, a lookahead | behind at the start of a match, or, for a complete match, a lookahead | |||
| at the end, or if \K is used in the pattern. Characters that precede or | at the end, or if \K is used in the pattern. Characters that precede or | |||
| follow the start and end of the actual match are indicated in the out- | follow the start and end of the actual match are indicated in the out- | |||
| put by '<' or '>' characters underneath them. Here is an example: | put by '<' or '>' characters underneath them. Here is an example: | |||
| re> /(?<=pqr)abc(?=xyz)/ | re> /(?<=pqr)abc(?=xyz)/ | |||
| data> 123pqrabcxyz456\=allusedtext | data> 123pqrabcxyz456\=allusedtext | |||
| 0: pqrabcxyz | 0: pqrabcxyz | |||
| <<< >>> | <<< >>> | |||
| data> 123pqrabcxy\=ph,allusedtext | data> 123pqrabcxy\=ph,allusedtext | |||
| Partial match: pqrabcxy | Partial match: pqrabcxy | |||
| <<< | <<< | |||
| The first, complete match shows that the matched string is "abc", with | The first, complete match shows that the matched string is "abc", with | |||
| the preceding and following strings "pqr" and "xyz" having been con- | the preceding and following strings "pqr" and "xyz" having been con- | |||
| sulted during the match (when processing the assertions). The partial | sulted during the match (when processing the assertions). The partial | |||
| match can indicate only the preceding string. | match can indicate only the preceding string. | |||
| The startchar modifier requests that the starting character for the | The startchar modifier requests that the starting character for the | |||
| match be indicated, if it is different to the start of the matched | match be indicated, if it is different to the start of the matched | |||
| string. The only time when this occurs is when \K has been processed as | string. The only time when this occurs is when \K has been processed as | |||
| part of the match. In this situation, the output for the matched string | part of the match. In this situation, the output for the matched string | |||
| is displayed from the starting character instead of from the match | is displayed from the starting character instead of from the match | |||
| point, with circumflex characters under the earlier characters. For ex- | point, with circumflex characters under the earlier characters. For ex- | |||
| ample: | ample: | |||
| re> /abc\Kxyz/ | re> /abc\Kxyz/ | |||
| data> abcxyz\=startchar | data> abcxyz\=startchar | |||
| 0: abcxyz | 0: abcxyz | |||
| ^^^ | ^^^ | |||
| Unlike allusedtext, the startchar modifier can be used with JIT. How- | Unlike allusedtext, the startchar modifier can be used with JIT. How- | |||
| ever, these two modifiers are mutually exclusive. | ever, these two modifiers are mutually exclusive. | |||
| Showing the value of all capture groups | Showing the value of all capture groups | |||
| The allcaptures modifier requests that the values of all potential cap- | The allcaptures modifier requests that the values of all potential cap- | |||
| tured parentheses be output after a match. By default, only those up to | tured parentheses be output after a match. By default, only those up to | |||
| the highest one actually used in the match are output (corresponding to | the highest one actually used in the match are output (corresponding to | |||
| the return code from pcre2_match()). Groups that did not take part in | the return code from pcre2_match()). Groups that did not take part in | |||
| the match are output as "<unset>". This modifier is not relevant for | the match are output as "<unset>". This modifier is not relevant for | |||
| DFA matching (which does no capturing) and does not apply when replace | DFA matching (which does no capturing) and does not apply when replace | |||
| is specified; it is ignored, with a warning message, if present. | is specified; it is ignored, with a warning message, if present. | |||
| Showing the entire ovector, for all outcomes | Showing the entire ovector, for all outcomes | |||
| The allvector modifier requests that the entire ovector be shown, what- | The allvector modifier requests that the entire ovector be shown, what- | |||
| ever the outcome of the match. Compare allcaptures, which shows only up | ever the outcome of the match. Compare allcaptures, which shows only up | |||
| to the maximum number of capture groups for the pattern, and then only | to the maximum number of capture groups for the pattern, and then only | |||
| for a successful complete non-DFA match. This modifier, which acts af- | for a successful complete non-DFA match. This modifier, which acts af- | |||
| ter any match result, and also for DFA matching, provides a means of | ter any match result, and also for DFA matching, provides a means of | |||
| checking that there are no unexpected modifications to ovector fields. | checking that there are no unexpected modifications to ovector fields. | |||
| Before each match attempt, the ovector is filled with a special value, | Before each match attempt, the ovector is filled with a special value, | |||
| and if this is found in both elements of a capturing pair, "<un- | and if this is found in both elements of a capturing pair, "<un- | |||
| changed>" is output. After a successful match, this applies to all | changed>" is output. After a successful match, this applies to all | |||
| groups after the maximum capture group for the pattern. In other cases | groups after the maximum capture group for the pattern. In other cases | |||
| it applies to the entire ovector. After a partial match, the first two | it applies to the entire ovector. After a partial match, the first two | |||
| elements are the only ones that should be set. After a DFA match, the | elements are the only ones that should be set. After a DFA match, the | |||
| amount of ovector that is used depends on the number of matches that | amount of ovector that is used depends on the number of matches that | |||
| were found. | were found. | |||
| Testing pattern callouts | Testing pattern callouts | |||
| A callout function is supplied when pcre2test calls the library | A callout function is supplied when pcre2test calls the library | |||
| matching functions, unless callout_none is specified. Its behaviour can | matching functions, unless callout_none is specified. Its behaviour can | |||
| be controlled by various modifiers listed above whose names begin with | be controlled by various modifiers listed above whose names begin with | |||
| callout_. Details are given in the section entitled "Callouts" below. | callout_. Details are given in the section entitled "Callouts" below. | |||
| Testing callouts from pcre2_substitute() is described separately in | Testing callouts from pcre2_substitute() is described separately in | |||
| "Testing the substitution function" below. | "Testing the substitution function" below. | |||
| Finding all matches in a string | Finding all matches in a string | |||
| Searching for all possible matches within a subject can be requested by | Searching for all possible matches within a subject can be requested by | |||
| the global or altglobal modifier. After finding a match, the matching | the global or altglobal modifier. After finding a match, the matching | |||
| function is called again to search the remainder of the subject. The | function is called again to search the remainder of the subject. The | |||
| difference between global and altglobal is that the former uses the | difference between global and altglobal is that the former uses the | |||
| start_offset argument to pcre2_match() or pcre2_dfa_match() to start | start_offset argument to pcre2_match() or pcre2_dfa_match() to start | |||
| searching at a new point within the entire string (which is what Perl | searching at a new point within the entire string (which is what Perl | |||
| does), whereas the latter passes over a shortened subject. This makes a | does), whereas the latter passes over a shortened subject. This makes a | |||
| difference to the matching process if the pattern begins with a | difference to the matching process if the pattern begins with a | |||
| lookbehind assertion (including \b or \B). | lookbehind assertion (including \b or \B). | |||
| If an empty string is matched, the next match is done with the | If an empty string is matched, the next match is done with the | |||
| PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search | PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search | |||
| for another, non-empty, match at the same point in the subject. If this | for another, non-empty, match at the same point in the subject. If this | |||
| match fails, the start offset is advanced, and the normal match is | match fails, the start offset is advanced, and the normal match is | |||
| retried. This imitates the way Perl handles such cases when using the /g | retried. This imitates the way Perl handles such cases when using the /g | |||
| modifier or the split() function. Normally, the start offset is | modifier or the split() function. Normally, the start offset is | |||
| advanced by one character, but if the newline convention recognizes CRLF | advanced by one character, but if the newline convention recognizes CRLF | |||
| as a newline, and the current character is CR followed by LF, an | as a newline, and the current character is CR followed by LF, an | |||
| advance of two characters occurs. | advance of two characters occurs. | |||
| Testing substring extraction functions | Testing substring extraction functions | |||
| The copy and get modifiers can be used to test the | The copy and get modifiers can be used to test the | |||
| pcre2_substring_copy_xxx() and pcre2_substring_get_xxx() functions. | pcre2_substring_copy_xxx() and pcre2_substring_get_xxx() functions. | |||
| They can be given more than once, and each can specify a capture group | They can be given more than once, and each can specify a capture group | |||
| name or number, for example: | name or number, for example: | |||
| abcd\=copy=1,copy=3,get=G1 | abcd\=copy=1,copy=3,get=G1 | |||
| If the #subject command is used to set default copy and/or get lists, | If the #subject command is used to set default copy and/or get lists, | |||
| these can be unset by specifying a negative number to cancel all | these can be unset by specifying a negative number to cancel all | |||
| numbered groups and an empty name to cancel all named groups. | numbered groups and an empty name to cancel all named groups. | |||
| The getall modifier tests pcre2_substring_list_get(), which extracts | The getall modifier tests pcre2_substring_list_get(), which extracts | |||
| all captured substrings. | all captured substrings. | |||
| If the subject line is successfully matched, the substrings extracted | If the subject line is successfully matched, the substrings extracted | |||
| by the convenience functions are output with C, G, or L after the | by the convenience functions are output with C, G, or L after the | |||
| string number instead of a colon. This is in addition to the normal | string number instead of a colon. This is in addition to the normal | |||
| full list. The string length (that is, the return from the extraction | full list. The string length (that is, the return from the extraction | |||
| function) is given in parentheses after each substring, followed by the | function) is given in parentheses after each substring, followed by the | |||
| name when the extraction was by name. | name when the extraction was by name. | |||
| Testing the substitution function | Testing the substitution function | |||
| If the replace modifier is set, the pcre2_substitute() function is | If the replace modifier is set, the pcre2_substitute() function is | |||
| called instead of one of the matching functions (or after one call of | called instead of one of the matching functions (or after one call of | |||
| pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note that | pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note that | |||
| replacement strings cannot contain commas, because a comma signifies the | replacement strings cannot contain commas, because a comma signifies the | |||
| end of a modifier. This is not thought to be an issue in a test | end of a modifier. This is not thought to be an issue in a test | |||
| program. | program. | |||
| Specifying a completely empty replacement string disables this | Specifying a completely empty replacement string disables this | |||
| modifier. However, it is possible to specify an empty replacement by | modifier. However, it is possible to specify an empty replacement by | |||
| providing a buffer length, as described below, for an otherwise empty | providing a buffer length, as described below, for an otherwise empty | |||
| replacement. | replacement. | |||
| Unlike subject strings, pcre2test does not process replacement strings | Unlike subject strings, pcre2test does not process replacement strings | |||
| for escape sequences. In UTF mode, a replacement string is checked to | for escape sequences. In UTF mode, a replacement string is checked to | |||
| see if it is a valid UTF-8 string. If so, it is correctly converted to | see if it is a valid UTF-8 string. If so, it is correctly converted to | |||
| a UTF string of the appropriate code unit width. If it is not a valid | a UTF string of the appropriate code unit width. If it is not a valid | |||
| UTF-8 string, the individual code units are copied directly. This | UTF-8 string, the individual code units are copied directly. This | |||
| provides a means of passing an invalid UTF-8 string for testing | provides a means of passing an invalid UTF-8 string for testing | |||
| purposes. | purposes. | |||
| The following modifiers set options (in addition to the normal match | The following modifiers set options (in addition to the normal match | |||
| options) for pcre2_substitute(): | options) for pcre2_substitute(): | |||
| global PCRE2_SUBSTITUTE_GLOBAL | global PCRE2_SUBSTITUTE_GLOBAL | |||
| substitute_extended PCRE2_SUBSTITUTE_EXTENDED | substitute_extended PCRE2_SUBSTITUTE_EXTENDED | |||
| substitute_literal PCRE2_SUBSTITUTE_LITERAL | substitute_literal PCRE2_SUBSTITUTE_LITERAL | |||
| substitute_matched PCRE2_SUBSTITUTE_MATCHED | substitute_matched PCRE2_SUBSTITUTE_MATCHED | |||
| substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH | |||
| substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY | |||
| substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET | substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET | |||
| substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY | substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY | |||
| See the pcre2api documentation for details of these options. | See the pcre2api documentation for details of these options. | |||
| After a successful substitution, the modified string is output, | After a successful substitution, the modified string is output, | |||
| preceded by the number of replacements. This may be zero if there were | preceded by the number of replacements. This may be zero if there were | |||
| no matches. Here is a simple example of a substitution test: | no matches. Here is a simple example of a substitution test: | |||
| /abc/replace=xxx | /abc/replace=xxx | |||
| =abc=abc= | =abc=abc= | |||
| 1: =xxx=abc= | 1: =xxx=abc= | |||
| =abc=abc=\=global | =abc=abc=\=global | |||
| 2: =xxx=xxx= | 2: =xxx=xxx= | |||
| Subject and replacement strings should be kept relatively short (fewer | Subject and replacement strings should be kept relatively short (fewer | |||
| than 256 characters) for substitution tests, as fixed-size buffers are | than 256 characters) for substitution tests, as fixed-size buffers are | |||
| used. To make it easy to test for buffer overflow, if the replacement | used. To make it easy to test for buffer overflow, if the replacement | |||
| string starts with a number in square brackets, that number is passed | string starts with a number in square brackets, that number is passed | |||
| to pcre2_substitute() as the size of the output buffer, with the | to pcre2_substitute() as the size of the output buffer, with the | |||
| replacement string starting at the next character. Here is an example | replacement string starting at the next character. Here is an example | |||
| that tests the edge case: | that tests the edge case: | |||
| /abc/ | /abc/ | |||
| 123abc123\=replace=[10]XYZ | 123abc123\=replace=[10]XYZ | |||
| 1: 123XYZ123 | 1: 123XYZ123 | |||
| 123abc123\=replace=[9]XYZ | 123abc123\=replace=[9]XYZ | |||
| Failed: error -47: no more memory | Failed: error -47: no more memory | |||
| The default action of pcre2_substitute() is to return | The default action of pcre2_substitute() is to return | |||
| PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if | PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if | |||
| the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the | the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the | |||
| substitute_overflow_length modifier), pcre2_substitute() continues to go | substitute_overflow_length modifier), pcre2_substitute() continues to go | |||
| through the motions of matching and substituting (but not doing any | through the motions of matching and substituting (but not doing any | |||
| callouts), in order to compute the size of buffer that is required. | callouts), in order to compute the size of buffer that is required. | |||
| When this happens, pcre2test shows the required buffer length (which | When this happens, pcre2test shows the required buffer length (which | |||
| includes space for the trailing zero) as part of the error message. For | includes space for the trailing zero) as part of the error message. For | |||
| example: | example: | |||
| /abc/substitute_overflow_length | /abc/substitute_overflow_length | |||
| 123abc123\=replace=[9]XYZ | 123abc123\=replace=[9]XYZ | |||
| Failed: error -47: no more memory: 10 code units are needed | Failed: error -47: no more memory: 10 code units are needed | |||
| A replacement string is ignored with POSIX and DFA matching. Specifying | A replacement string is ignored with POSIX and DFA matching. Specifying | |||
| partial matching provokes an error return ("bad option value") from | partial matching provokes an error return ("bad option value") from | |||
| pcre2_substitute(). | pcre2_substitute(). | |||
| Testing substitute callouts | Testing substitute callouts | |||
| If the substitute_callout modifier is set, a substitution callout | If the substitute_callout modifier is set, a substitution callout | |||
| function is set up. The null_context modifier must not be set, because | function is set up. The null_context modifier must not be set, because | |||
| the address of the callout function is passed in a match context. When | the address of the callout function is passed in a match context. When | |||
| the callout function is called (after each substitution), details of the | the callout function is called (after each substitution), details of the | |||
| input and output strings are output. For example: | input and output strings are output. For example: | |||
| /abc/g,replace=<$0>,substitute_callout | /abc/g,replace=<$0>,substitute_callout | |||
| abcdefabcpqr | abcdefabcpqr | |||
| 1(1) Old 0 3 "abc" New 0 5 "<abc>" | 1(1) Old 0 3 "abc" New 0 5 "<abc>" | |||
| 2(1) Old 6 9 "abc" New 8 13 "<abc>" | 2(1) Old 6 9 "abc" New 8 13 "<abc>" | |||
| 2: <abc>def<abc>pqr | 2: <abc>def<abc>pqr | |||
| The first number on each callout line is the count of matches. The | The first number on each callout line is the count of matches. The | |||
| parenthesized number is the number of pairs that are set in the ovector | parenthesized number is the number of pairs that are set in the ovector | |||
| (that is, one more than the number of capturing groups that were set). | (that is, one more than the number of capturing groups that were set). | |||
| The offsets of the old substring and its contents are listed next, | The offsets of the old substring and its contents are listed next, | |||
| followed by the same for the replacement. | followed by the same for the replacement. | |||
| By default, the substitution callout function returns zero, which | By default, the substitution callout function returns zero, which | |||
| accepts the replacement and causes matching to continue if /g was used. | accepts the replacement and causes matching to continue if /g was used. | |||
| Two further modifiers can be used to test other return values. If | Two further modifiers can be used to test other return values. If | |||
| substitute_skip is set to a value greater than zero, the callout | substitute_skip is set to a value greater than zero, the callout | |||
| function returns +1 for the match of that number, and similarly | function returns +1 for the match of that number, and similarly | |||
| substitute_stop returns -1. These cause the replacement to be rejected, | substitute_stop returns -1. These cause the replacement to be rejected, | |||
| and -1 causes no further matching to take place. If either of them is | and -1 causes no further matching to take place. If either of them is | |||
| set, substitute_callout is assumed. For example: | set, substitute_callout is assumed. For example: | |||
| /abc/g,replace=<$0>,substitute_skip=1 | /abc/g,replace=<$0>,substitute_skip=1 | |||
| abcdefabcpqr | abcdefabcpqr | |||
| 1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED" | 1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED" | |||
| 2(1) Old 6 9 "abc" New 6 11 "<abc>" | 2(1) Old 6 9 "abc" New 6 11 "<abc>" | |||
| 2: abcdef<abc>pqr | 2: abcdef<abc>pqr | |||
| abcdefabcpqr\=substitute_stop=1 | abcdefabcpqr\=substitute_stop=1 | |||
| 1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED" | 1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED" | |||
| 1: abcdefabcpqr | 1: abcdefabcpqr | |||
| If both are set for the same number, stop takes precedence. Only a | If both are set for the same number, stop takes precedence. Only a | |||
| single skip or stop is supported, which is sufficient for testing that | single skip or stop is supported, which is sufficient for testing that | |||
| the feature works. | the feature works. | |||
| Setting the JIT stack size | Setting the JIT stack size | |||
| The jitstack modifier provides a way of setting the maximum stack size | The jitstack modifier provides a way of setting the maximum stack size | |||
| that is used by the just-in-time optimization code. It is ignored if | that is used by the just-in-time optimization code. It is ignored if | |||
| JIT optimization is not being used. The value is a number of kibibytes | JIT optimization is not being used. The value is a number of kibibytes | |||
| (units of 1024 bytes). Setting zero reverts to the default of 32KiB. | (units of 1024 bytes). Setting zero reverts to the default of 32KiB. | |||
| Providing a stack that is larger than the default is necessary only for | Providing a stack that is larger than the default is necessary only for | |||
| very complicated patterns. If jitstack is set non-zero on a subject | very complicated patterns. If jitstack is set non-zero on a subject | |||
| line it overrides any value that was set on the pattern. | line it overrides any value that was set on the pattern. | |||
| Setting heap, match, and depth limits | Setting heap, match, and depth limits | |||
| The heap_limit, match_limit, and depth_limit modifiers set the | The heap_limit, match_limit, and depth_limit modifiers set the | |||
| appropriate limits in the match context. These values are ignored when | appropriate limits in the match context. These values are ignored when | |||
| the find_limits or find_limits_noheap modifier is specified. | the find_limits or find_limits_noheap modifier is specified. | |||
| Finding minimum limits | Finding minimum limits | |||
| If the find_limits modifier is present on a subject line, pcre2test | If the find_limits modifier is present on a subject line, pcre2test | |||
| calls the relevant matching function several times, setting different | calls the relevant matching function several times, setting different | |||
| values in the match context via pcre2_set_heap_limit(), | values in the match context via pcre2_set_heap_limit(), | |||
| pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the | pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the | |||
| smallest value for each parameter that allows the match to complete | smallest value for each parameter that allows the match to complete | |||
| without a "limit exceeded" error. The match itself may succeed or fail. | without a "limit exceeded" error. The match itself may succeed or fail. | |||
| An alternative modifier, find_limits_noheap, omits the heap limit. This | An alternative modifier, find_limits_noheap, omits the heap limit. This | |||
| is used in the standard tests, because the minimum heap limit varies | is used in the standard tests, because the minimum heap limit varies | |||
| between systems. If JIT is being used, only the match limit is | between systems. If JIT is being used, only the match limit is | |||
| relevant, and the other two are automatically omitted. | relevant, and the other two are automatically omitted. | |||
| When using this modifier, the pattern should not contain any limit | When using this modifier, the pattern should not contain any limit | |||
| settings such as (*LIMIT_MATCH=...) within it. If such a setting is | settings such as (*LIMIT_MATCH=...) within it. If such a setting is | |||
| present and is lower than the minimum matching value, the minimum value | present and is lower than the minimum matching value, the minimum value | |||
| cannot be found because pcre2_set_match_limit() etc. are only able to | cannot be found because pcre2_set_match_limit() etc. are only able to | |||
| reduce the value of an in-pattern limit; they cannot increase it. | reduce the value of an in-pattern limit; they cannot increase it. | |||
| For non-DFA matching, the minimum depth_limit number is a measure of | For non-DFA matching, the minimum depth_limit number is a measure of | |||
| how much nested backtracking happens (that is, how deeply the pattern's | how much nested backtracking happens (that is, how deeply the pattern's | |||
| tree is searched). In the case of DFA matching, depth_limit controls | tree is searched). In the case of DFA matching, depth_limit controls | |||
| the depth of recursive calls of the internal function that is used for | the depth of recursive calls of the internal function that is used for | |||
| handling pattern recursion, lookaround assertions, and atomic groups. | handling pattern recursion, lookaround assertions, and atomic groups. | |||
| For non-DFA matching, the match_limit number is a measure of the amount | For non-DFA matching, the match_limit number is a measure of the amount | |||
| of backtracking that takes place, and learning the minimum value can be | of backtracking that takes place, and learning the minimum value can be | |||
| instructive. For most simple matches, the number is quite small, but | instructive. For most simple matches, the number is quite small, but | |||
| for patterns with very large numbers of matching possibilities, it can | for patterns with very large numbers of matching possibilities, it can | |||
| become large very quickly with increasing length of subject string. In | become large very quickly with increasing length of subject string. In | |||
| the case of DFA matching, match_limit controls the total number of | the case of DFA matching, match_limit controls the total number of | |||
| calls, both recursive and non-recursive, to the internal matching | calls, both recursive and non-recursive, to the internal matching | |||
| function, thus controlling the overall amount of computing resource | function, thus controlling the overall amount of computing resource | |||
| that is used. | that is used. | |||
| For both kinds of matching, the heap_limit number, which is in | For both kinds of matching, the heap_limit number, which is in | |||
| kibibytes (units of 1024 bytes), limits the amount of heap memory used | kibibytes (units of 1024 bytes), limits the amount of heap memory used | |||
| for matching. | for matching. | |||
| Showing MARK names | Showing MARK names | |||
| The mark modifier causes the names from backtracking control verbs that | The mark modifier causes the names from backtracking control verbs that | |||
| are returned from calls to pcre2_match() to be displayed. If a mark is | are returned from calls to pcre2_match() to be displayed. If a mark is | |||
| returned for a match, non-match, or partial match, pcre2test shows it. | returned for a match, non-match, or partial match, pcre2test shows it. | |||
| For a match, it is on a line by itself, tagged with "MK:". Otherwise, | For a match, it is on a line by itself, tagged with "MK:". Otherwise, | |||
| it is added to the non-match message. | it is added to the non-match message. | |||
| Showing memory usage | Showing memory usage | |||
| The memory modifier causes pcre2test to log the sizes of all heap | The memory modifier causes pcre2test to log the sizes of all heap | |||
| memory allocation and freeing calls that occur during a call to | memory allocation and freeing calls that occur during a call to | |||
| pcre2_match() or pcre2_dfa_match(). In the latter case, heap memory is | pcre2_match() or pcre2_dfa_match(). In the latter case, heap memory is | |||
| used only when a match requires more internal workspace than the | used only when a match requires more internal workspace than the | |||
| default allocation on the stack, so in many cases there will be no | default allocation on the stack, so in many cases there will be no | |||
| output. No heap memory is allocated during matching with JIT. For this | output. No heap memory is allocated during matching with JIT. For this | |||
| modifier to work, the null_context modifier must not be set on both the | modifier to work, the null_context modifier must not be set on both the | |||
| pattern and the subject, though it can be set on one or the other. | pattern and the subject, though it can be set on one or the other. | |||
| Showing the heap frame overall vector size | Showing the heap frame overall vector size | |||
| The heapframes_size modifier is relevant for matches using | The heapframes_size modifier is relevant for matches using | |||
| pcre2_match() without JIT. After a match has run (whether successful or | pcre2_match() without JIT. After a match has run (whether successful or | |||
| not) the size, in bytes, of the allocated heap frames vector that is | not) the size, in bytes, of the allocated heap frames vector that is | |||
| left attached to the match data block is shown. If the matching action | left attached to the match data block is shown. If the matching action | |||
| involved several calls to pcre2_match() (for example, global matching | involved several calls to pcre2_match() (for example, global matching | |||
| or for timing) only the final value is shown. | or for timing) only the final value is shown. | |||
| This modifier is ignored, with a warning, for POSIX or DFA matching. | This modifier is ignored, with a warning, for POSIX or DFA matching. | |||
| JIT matching does not use the heap frames vector, so the size is always | JIT matching does not use the heap frames vector, so the size is always | |||
| zero, unless there was a previous non-JIT match. Note that specifying a | zero, unless there was a previous non-JIT match. Note that specifying a | |||
| size of zero for the output vector (see below) causes pcre2test to free | size of zero for the output vector (see below) causes pcre2test to free | |||
| its match data block (and associated heap frames vector) and allocate a | its match data block (and associated heap frames vector) and allocate a | |||
| new one. | new one. | |||
| Setting a starting offset | Setting a starting offset | |||
| The offset modifier sets an offset in the subject string at which | The offset modifier sets an offset in the subject string at which | |||
| matching starts. Its value is a number of code units, not characters. | matching starts. Its value is a number of code units, not characters. | |||
| Setting an offset limit | Setting an offset limit | |||
| The offset_limit modifier sets a limit for unanchored matches. If a | The offset_limit modifier sets a limit for unanchored matches. If a | |||
| match cannot be found starting at or before this offset in the subject, | match cannot be found starting at or before this offset in the subject, | |||
| a "no match" return is given. The data value is a number of code units, | a "no match" return is given. The data value is a number of code units, | |||
| not characters. When this modifier is used, the use_offset_limit | not characters. When this modifier is used, the use_offset_limit | |||
| modifier must have been set for the pattern; if not, an error is | modifier must have been set for the pattern; if not, an error is | |||
| generated. | generated. | |||
| Setting the size of the output vector | Setting the size of the output vector | |||
| The ovector modifier applies only to the subject line in which it | The ovector modifier applies only to the subject line in which it | |||
| appears, though it can also be used to set a default in a #subject | appears, though it can also be used to set a default in a #subject | |||
| command. It specifies the number of pairs of offsets that are | command. It specifies the number of pairs of offsets that are | |||
| available for storing matching information. The default is 15. | available for storing matching information. The default is 15. | |||
| A value of zero is useful when testing the POSIX API because it causes | A value of zero is useful when testing the POSIX API because it causes | |||
| regexec() to be called with a NULL capture vector. When not testing the | regexec() to be called with a NULL capture vector. When not testing the | |||
| POSIX API, a value of zero is used to cause | POSIX API, a value of zero is used to cause | |||
| pcre2_match_data_create_from_pattern() to be called, in order to create | pcre2_match_data_create_from_pattern() to be called, in order to create | |||
| a new match block of exactly the right size for the pattern. (It is not | a new match block of exactly the right size for the pattern. (It is not | |||
| possible to create a match block with a zero-length ovector; there is | possible to create a match block with a zero-length ovector; there is | |||
| always at least one pair of offsets.) The old match data block is freed. | always at least one pair of offsets.) The old match data block is freed. | |||
| Passing the subject as zero-terminated | Passing the subject as zero-terminated | |||
| By default, the subject string is passed to a native API matching | By default, the subject string is passed to a native API matching | |||
| function with its correct length. In order to test the facility for | function with its correct length. In order to test the facility for | |||
| passing a zero-terminated string, the zero_terminate modifier is | passing a zero-terminated string, the zero_terminate modifier is | |||
| provided. It causes the length to be passed as PCRE2_ZERO_TERMINATED. | provided. It causes the length to be passed as PCRE2_ZERO_TERMINATED. | |||
| When matching via the POSIX interface, this modifier is ignored, with a | When matching via the POSIX interface, this modifier is ignored, with a | |||
| warning. | warning. | |||
| When testing pcre2_substitute(), this modifier also has the effect of | When testing pcre2_substitute(), this modifier also has the effect of | |||
| passing the replacement string as zero-terminated. | passing the replacement string as zero-terminated. | |||
| Passing a NULL context, subject, or replacement | Passing a NULL context, subject, or replacement | |||
| Normally, pcre2test passes a context block to pcre2_match(), | Normally, pcre2test passes a context block to pcre2_match(), | |||
| pcre2_dfa_match(), pcre2_jit_match() or pcre2_substitute(). If the | pcre2_dfa_match(), pcre2_jit_match() or pcre2_substitute(). If the | |||
| null_context modifier is set, however, NULL is passed. This is for | null_context modifier is set, however, NULL is passed. This is for | |||
| testing that the matching and substitution functions behave correctly | testing that the matching and substitution functions behave correctly | |||
| in this case (they use default values). This modifier cannot be used | in this case (they use default values). This modifier cannot be used | |||
| with the find_limits, find_limits_noheap, or substitute_callout | with the find_limits, find_limits_noheap, or substitute_callout | |||
| modifiers. | modifiers. | |||
| Similarly, for testing purposes, if the null_subject or | Similarly, for testing purposes, if the null_subject or | |||
| null_replacement modifier is set, the subject or replacement string | null_replacement modifier is set, the subject or replacement string | |||
| pointers are passed as NULL, respectively, to the relevant functions. | pointers are passed as NULL, respectively, to the relevant functions. | |||
| THE ALTERNATIVE MATCHING FUNCTION | THE ALTERNATIVE MATCHING FUNCTION | |||
| By default, pcre2test uses the standard PCRE2 matching function, | By default, pcre2test uses the standard PCRE2 matching function, | |||
| pcre2_match(), to match each subject line. PCRE2 also supports an | pcre2_match(), to match each subject line. PCRE2 also supports an | |||
| alternative matching function, pcre2_dfa_match(), which operates in a | alternative matching function, pcre2_dfa_match(), which operates in a | |||
| different way, and has some restrictions. The differences between the | different way, and has some restrictions. The differences between the | |||
| two functions are described in the pcre2matching documentation. | two functions are described in the pcre2matching documentation. | |||
| If the dfa modifier is set, the alternative matching function is used. | If the dfa modifier is set, the alternative matching function is used. | |||
| This function finds all possible matches at a given point in the | This function finds all possible matches at a given point in the | |||
| subject. If, however, the dfa_shortest modifier is set, processing | subject. If, however, the dfa_shortest modifier is set, processing | |||
| stops after the first match is found. This is always the shortest | stops after the first match is found. This is always the shortest | |||
| possible match. | possible match. | |||
| DEFAULT OUTPUT FROM pcre2test | DEFAULT OUTPUT FROM pcre2test | |||
| This section describes the output when the normal matching function, | This section describes the output when the normal matching function, | |||
| pcre2_match(), is being used. | pcre2_match(), is being used. | |||
| When a match succeeds, pcre2test outputs the list of captured | When a match succeeds, pcre2test outputs the list of captured | |||
| substrings, starting with number 0 for the string that matched the | substrings, starting with number 0 for the string that matched the | |||
| whole pattern. Otherwise, it outputs "No match" when the return is | whole pattern. Otherwise, it outputs "No match" when the return is | |||
| PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially | PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially | |||
| matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that | matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that | |||
| this is the entire substring that was inspected during the partial | this is the entire substring that was inspected during the partial | |||
| match; it may include characters before the actual match start if a | match; it may include characters before the actual match start if a | |||
| lookbehind assertion, \K, \b, or \B was involved.) | lookbehind assertion, \K, \b, or \B was involved.) | |||
| For any other return, pcre2test outputs the PCRE2 negative error number | For any other return, pcre2test outputs the PCRE2 negative error number | |||
| and a short descriptive phrase. If the error is a failed UTF string | and a short descriptive phrase. If the error is a failed UTF string | |||
| check, the code unit offset of the start of the failing character is | check, the code unit offset of the start of the failing character is | |||
| also output. Here is an example of an interactive pcre2test run. | also output. Here is an example of an interactive pcre2test run. | |||
| $ pcre2test | $ pcre2test | |||
| PCRE2 version 10.22 2016-07-29 | PCRE2 version 10.22 2016-07-29 | |||
| re> /^abc(\d+)/ | re> /^abc(\d+)/ | |||
| data> abc123 | data> abc123 | |||
| 0: abc123 | 0: abc123 | |||
| 1: 123 | 1: 123 | |||
| data> xyz | data> xyz | |||
| No match | No match | |||
| Unset capturing substrings that are not followed by one that is set are | Unset capturing substrings that are not followed by one that is set are | |||
| not shown by pcre2test unless the allcaptures modifier is specified. In | not shown by pcre2test unless the allcaptures modifier is specified. In | |||
| the following example, there are two capturing substrings, but when the | the following example, there are two capturing substrings, but when the | |||
| first data line is matched, the second, unset substring is not shown. | first data line is matched, the second, unset substring is not shown. | |||
| An "internal" unset substring is shown as "<unset>", as for the second | An "internal" unset substring is shown as "<unset>", as for the second | |||
| data line. | data line. | |||
| re> /(a)|(b)/ | re> /(a)|(b)/ | |||
| data> a | data> a | |||
| 0: a | 0: a | |||
| 1: a | 1: a | |||
| data> b | data> b | |||
| 0: b | 0: b | |||
| 1: <unset> | 1: <unset> | |||
| 2: b | 2: b | |||
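The display rule for unset substrings can be sketched in a few lines of Python. This is only an illustration of the rule described above, not pcre2test's own code: the helper name format_captures and its allcaptures parameter are hypothetical, and Python's re module stands in for PCRE2.

```python
import re

def format_captures(match, allcaptures=False):
    """Format captured groups following the rule described above (illustrative)."""
    groups = [match.group(i) for i in range(match.re.groups + 1)]
    last = len(groups) - 1
    if not allcaptures:
        # Trailing unset groups (not followed by a set one) are not shown.
        while last > 0 and groups[last] is None:
            last -= 1
    lines = []
    for i in range(last + 1):
        # An "internal" unset group is shown as <unset>.
        text = "<unset>" if groups[i] is None else groups[i]
        lines.append(f"{i:2d}: {text}")
    return lines

pattern = re.compile(r"(a)|(b)")
for subject in ("a", "b"):
    print("\n".join(format_captures(pattern.match(subject))))
```

For the subject "a" the second group is unset and trailing, so it is omitted; for "b" the first group is unset but internal, so it appears as <unset>.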
| If the strings contain any non-printing characters, they are output as | If the strings contain any non-printing characters, they are output as | |||
| \xhh escapes if the value is less than 256 and UTF mode is not set. | \xhh escapes if the value is less than 256 and UTF mode is not set. | |||
| Otherwise they are output as \x{hh...} escapes. See below for the | Otherwise they are output as \x{hh...} escapes. See below for the | |||
| definition of non-printing characters. If the aftertext modifier is | definition of non-printing characters. If the aftertext modifier is | |||
| set, the output for substring 0 is followed by the rest of the subject | set, the output for substring 0 is followed by the rest of the subject | |||
| string, identified by "0+" like this: | string, identified by "0+" like this: | |||
| re> /cat/aftertext | re> /cat/aftertext | |||
| data> cataract | data> cataract | |||
| 0: cat | 0: cat | |||
| 0+ aract | 0+ aract | |||
| If global matching is requested, the results of successive matching at- | If global matching is requested, the results of successive matching at- | |||
| tempts are output in sequence, like this: | tempts are output in sequence, like this: | |||
| re> /\Bi(\w\w)/g | re> /\Bi(\w\w)/g | |||
| data> Mississippi | data> Mississippi | |||
| 0: iss | 0: iss | |||
| 1: ss | 1: ss | |||
| 0: iss | 0: iss | |||
| 1: ss | 1: ss | |||
| 0: ipp | 0: ipp | |||
| 1: pp | 1: pp | |||
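As a rough cross-check of the global matching example, the same pattern can be run with Python's re module. This is a sketch under the assumption that \B and \w behave the same way for this ASCII subject; Python's engine is not PCRE2, and the variable name results is only for illustration.

```python
import re

# Collect (whole match, group 1) for each successive match attempt,
# mirroring the sequence of "0:" and "1:" lines in the example above.
results = [(m.group(0), m.group(1))
           for m in re.finditer(r"\Bi(\w\w)", "Mississippi")]

for whole, group1 in results:
    print(f" 0: {whole}")
    print(f" 1: {group1}")
```

Each match attempt resumes where the previous match ended, which is why "iss" appears twice before "ipp".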
| "No match" is output only if the first match attempt fails. Here is an | "No match" is output only if the first match attempt fails. Here is an | |||
| example of a failure message (the offset 4 that is specified by the | example of a failure message (the offset 4 that is specified by the | |||
| offset modifier is past the end of the subject string): | offset modifier is past the end of the subject string): | |||
| re> /xyz/ | re> /xyz/ | |||
| data> xyz\=offset=4 | data> xyz\=offset=4 | |||
| Error -24 (bad offset value) | Error -24 (bad offset value) | |||
| Note that whereas patterns can be continued over several lines (a plain | Note that whereas patterns can be continued over several lines (a plain | |||
| ">" prompt is used for continuations), subject lines may not. However, | ">" prompt is used for continuations), subject lines may not. However, | |||
| newlines can be included in a subject by means of the \n escape (or \r, | newlines can be included in a subject by means of the \n escape (or \r, | |||
| \r\n, etc., depending on the newline sequence setting). | \r\n, etc., depending on the newline sequence setting). | |||
| OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION | OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION | |||
| When the alternative matching function, pcre2_dfa_match(), is used, the | When the alternative matching function, pcre2_dfa_match(), is used, the | |||
| output consists of a list of all the matches that start at the first | output consists of a list of all the matches that start at the first | |||
| point in the subject where there is at least one match. For example: | point in the subject where there is at least one match. For example: | |||
| re> /(tang|tangerine|tan)/ | re> /(tang|tangerine|tan)/ | |||
| data> yellow tangerine\=dfa | data> yellow tangerine\=dfa | |||
| 0: tangerine | 0: tangerine | |||
| 1: tang | 1: tang | |||
| 2: tan | 2: tan | |||
| Using the normal matching function on this data finds only "tang". The | Using the normal matching function on this data finds only "tang". The | |||
| longest matching string is always given first (and numbered zero). | longest matching string is always given first (and numbered zero). | |||
| After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", | After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", | |||
| followed by the partially matching substring. Note that this is the | followed by the partially matching substring. Note that this is the | |||
| entire substring that was inspected during the partial match; it may | entire substring that was inspected during the partial match; it may | |||
| include characters before the actual match start if a lookbehind | include characters before the actual match start if a lookbehind | |||
| assertion, \b, or \B was involved. (\K is not supported for DFA | assertion, \b, or \B was involved. (\K is not supported for DFA | |||
| matching.) | matching.) | |||
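The "all matches at the first matching point, longest first" behaviour can be imitated for the special case of literal alternatives. The following Python sketch is only a model of the output ordering; it says nothing about how pcre2_dfa_match() works internally, and the function name all_matches_at_first_point is hypothetical.

```python
def all_matches_at_first_point(alternatives, subject):
    """Return every literal alternative that matches at the first position
    in subject where at least one of them matches, longest first."""
    for start in range(len(subject) + 1):
        found = {alt for alt in alternatives if subject.startswith(alt, start)}
        if found:
            return sorted(found, key=len, reverse=True)
    return []

matches = all_matches_at_first_point(["tang", "tangerine", "tan"],
                                     "yellow tangerine")
for number, text in enumerate(matches):
    print(f"{number:2d}: {text}")
```

Note that the numbers here index the alternative matches, with the longest as number zero; they are not capture group numbers.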
| If global matching is requested, the search for further matches resumes | If global matching is requested, the search for further matches resumes | |||
| at the end of the longest match. For example: | at the end of the longest match. For example: | |||
| re> /(tang|tangerine|tan)/g | re> /(tang|tangerine|tan)/g | |||
| data> yellow tangerine and tangy sultana\=dfa | data> yellow tangerine and tangy sultana\=dfa | |||
| 0: tangerine | 0: tangerine | |||
| 1: tang | 1: tang | |||
| 2: tan | 2: tan | |||
| 0: tang | 0: tang | |||
| 1: tan | 1: tan | |||
| 0: tan | 0: tan | |||
| The alternative matching function does not support substring capture, | The alternative matching function does not support substring capture, | |||
| so the modifiers that are concerned with captured substrings are not | so the modifiers that are concerned with captured substrings are not | |||
| relevant. | relevant. | |||
| RESTARTING AFTER A PARTIAL MATCH | RESTARTING AFTER A PARTIAL MATCH | |||
| When the alternative matching function has given the | When the alternative matching function has given the | |||
| PCRE2_ERROR_PARTIAL return, indicating that the subject partially | PCRE2_ERROR_PARTIAL return, indicating that the subject partially | |||
| matched the pattern, | matched the pattern, | |||
| you can restart the match with additional subject data by means of the | you can restart the match with additional subject data by means of the | |||
| dfa_restart modifier. For example: | dfa_restart modifier. For example: | |||
| re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ | re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ | |||
| data> 23ja\=ps,dfa | data> 23ja\=ps,dfa | |||
| Partial match: 23ja | Partial match: 23ja | |||
| data> n05\=dfa,dfa_restart | data> n05\=dfa,dfa_restart | |||
| 0: n05 | 0: n05 | |||
| For further information about partial matching, see the pcre2partial | For further information about partial matching, see the pcre2partial | |||
| documentation. | documentation. | |||
| CALLOUTS | CALLOUTS | |||
| If the pattern contains any callout requests, pcre2test's callout | If the pattern contains any callout requests, pcre2test's callout | |||
| function is called during matching unless callout_none is specified. | function is called during matching unless callout_none is specified. | |||
| This works with both matching functions, and with JIT, though there are | This works with both matching functions, and with JIT, though there are | |||
| some differences in behaviour. The output for callouts with numerical | some differences in behaviour. The output for callouts with numerical | |||
| arguments and those with string arguments is slightly different. | arguments and those with string arguments is slightly different. | |||
| Callouts with numerical arguments | Callouts with numerical arguments | |||
| By default, the callout function displays the callout number, the start | By default, the callout function displays the callout number, the start | |||
| and current positions in the subject text at the callout time, and the | and current positions in the subject text at the callout time, and the | |||
| next pattern item to be tested. For example: | next pattern item to be tested. For example: | |||
| --->pqrabcdef | --->pqrabcdef | |||
| 0 ^ ^ \d | 0 ^ ^ \d | |||
| This output indicates that callout number 0 occurred for a match | This output indicates that callout number 0 occurred for a match | |||
| attempt starting at the fourth character of the subject string, when | attempt starting at the fourth character of the subject string, when | |||
| the pointer was at the seventh character, and when the next pattern | the pointer was at the seventh character, and when the next pattern | |||
| item was \d. Just one circumflex is output if the start and current | item was \d. Just one circumflex is output if the start and current | |||
| positions are the same, or if the current position precedes the start | positions are the same, or if the current position precedes the start | |||
| position, which can happen if the callout is in a lookbehind assertion. | position, which can happen if the callout is in a lookbehind assertion. | |||
| Callouts numbered 255 are assumed to be automatic callouts, inserted as | Callouts numbered 255 are assumed to be automatic callouts, inserted as | |||
| a result of the auto_callout pattern modifier. In this case, instead of | a result of the auto_callout pattern modifier. In this case, instead of | |||
| showing the callout number, the offset in the pattern, preceded by a | showing the callout number, the offset in the pattern, preceded by a | |||
| plus, is output. For example: | plus, is output. For example: | |||
| re> /\d?[A-E]\*/auto_callout | re> /\d?[A-E]\*/auto_callout | |||
| data> E* | data> E* | |||
| --->E* | --->E* | |||
| +0 ^ \d? | +0 ^ \d? | |||
| +3 ^ [A-E] | +3 ^ [A-E] | |||
| +8 ^^ \* | +8 ^^ \* | |||
| +10 ^ ^ | +10 ^ ^ | |||
| 0: E* | 0: E* | |||
| skipping to change at line 1763 | skipping to change at line 1772 | |||
| data> abc | data> abc | |||
| --->abc | --->abc | |||
| +0 ^ a | +0 ^ a | |||
| +1 ^^ (*MARK:X) | +1 ^^ (*MARK:X) | |||
| +10 ^^ b | +10 ^^ b | |||
| Latest Mark: X | Latest Mark: X | |||
| +11 ^ ^ c | +11 ^ ^ c | |||
| +12 ^ ^ | +12 ^ ^ | |||
| 0: abc | 0: abc | |||
| The mark changes between matching "a" and "b", but stays the same for | The mark changes between matching "a" and "b", but stays the same for | |||
| the rest of the match, so nothing more is output. If, as a result of | the rest of the match, so nothing more is output. If, as a result of | |||
| backtracking, the mark reverts to being unset, the text "<unset>" is | backtracking, the mark reverts to being unset, the text "<unset>" is | |||
| output. | output. | |||
| Callouts with string arguments | Callouts with string arguments | |||
| The output for a callout with a string argument is similar, except that | The output for a callout with a string argument is similar, except that | |||
| instead of outputting a callout number before the position indicators, | instead of outputting a callout number before the position indicators, | |||
| the callout string and its offset in the pattern string are output | the callout string and its offset in the pattern string are output | |||
| before the reflection of the subject string, and the subject string is | before the reflection of the subject string, and the subject string is | |||
| reflected for each callout. For example: | reflected for each callout. For example: | |||
| re> /^ab(?C'first')cd(?C"second")ef/ | re> /^ab(?C'first')cd(?C"second")ef/ | |||
| data> abcdefg | data> abcdefg | |||
| Callout (7): 'first' | Callout (7): 'first' | |||
| --->abcdefg | --->abcdefg | |||
| ^ ^ c | ^ ^ c | |||
| Callout (20): "second" | Callout (20): "second" | |||
| --->abcdefg | --->abcdefg | |||
| ^ ^ e | ^ ^ e | |||
| 0: abcdef | 0: abcdef | |||
| Callout modifiers | Callout modifiers | |||
| The callout function in pcre2test returns zero (carry on matching) by | The callout function in pcre2test returns zero (carry on matching) by | |||
| default, but you can use a callout_fail modifier in a subject line to | default, but you can use a callout_fail modifier in a subject line to | |||
| change this and other parameters of the callout (see below). | change this and other parameters of the callout (see below). | |||
| If the callout_capture modifier is set, the current captured groups are | If the callout_capture modifier is set, the current captured groups are | |||
| output when a callout occurs. This is useful only for non-DFA matching, | output when a callout occurs. This is useful only for non-DFA matching, | |||
| as pcre2_dfa_match() does not support capturing, so no captures are | as pcre2_dfa_match() does not support capturing, so no captures are | |||
| ever shown. | ever shown. | |||
| The normal callout output, showing the callout number or pattern offset | The normal callout output, showing the callout number or pattern offset | |||
| (as described above) is suppressed if the callout_no_where modifier is | (as described above) is suppressed if the callout_no_where modifier is | |||
| set. | set. | |||
| When using the interpretive matching function pcre2_match() without | When using the interpretive matching function pcre2_match() without | |||
| JIT, setting the callout_extra modifier causes additional output from | JIT, setting the callout_extra modifier causes additional output from | |||
| pcre2test's callout function to be generated. For the first callout in | pcre2test's callout function to be generated. For the first callout in | |||
| a match attempt at a new starting position in the subject, "New match | a match attempt at a new starting position in the subject, "New match | |||
| attempt" is output. If there has been a backtrack since the last | attempt" is output. If there has been a backtrack since the last | |||
| callout (or start of matching if this is the first callout), | callout (or start of matching if this is the first callout), | |||
| "Backtrack" is output, followed by "No other matching paths" if the | "Backtrack" is output, followed by "No other matching paths" if the | |||
| backtrack ended the previous match attempt. For example: | backtrack ended the previous match attempt. For example: | |||
| re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess | re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess | |||
| data> aac\=callout_extra | data> aac\=callout_extra | |||
| New match attempt | New match attempt | |||
| --->aac | --->aac | |||
| +0 ^ ( | +0 ^ ( | |||
| +1 ^ a+ | +1 ^ a+ | |||
| +3 ^ ^ ) | +3 ^ ^ ) | |||
| +4 ^ ^ b | +4 ^ ^ b | |||
| skipping to change at line 1844 | skipping to change at line 1853 | |||
| +0 ^ ( | +0 ^ ( | |||
| +1 ^ a+ | +1 ^ a+ | |||
| Backtrack | Backtrack | |||
| No other matching paths | No other matching paths | |||
| New match attempt | New match attempt | |||
| --->aac | --->aac | |||
| +0 ^ ( | +0 ^ ( | |||
| +1 ^ a+ | +1 ^ a+ | |||
| No match | No match | |||
| Notice that various optimizations must be turned off if you want all | Notice that various optimizations must be turned off if you want all | |||
| possible matching paths to be scanned. If no_start_optimize is not | possible matching paths to be scanned. If no_start_optimize is not | |||
| used, there is an immediate "no match", without any callouts, because | used, there is an immediate "no match", without any callouts, because | |||
| the starting optimization fails to find "b" in the subject, which it | the starting optimization fails to find "b" in the subject, which it | |||
| knows must be present for any match. If no_auto_possess is not used, | knows must be present for any match. If no_auto_possess is not used, | |||
| the "a+" item is turned into "a++", which reduces the number of | the "a+" item is turned into "a++", which reduces the number of | |||
| backtracks. | backtracks. | |||
| The callout_extra modifier has no effect if used with the DFA matching | The callout_extra modifier has no effect if used with the DFA matching | |||
| function, or with JIT. | function, or with JIT. | |||
| Return values from callouts | Return values from callouts | |||
| The default return from the callout function is zero, which allows | The default return from the callout function is zero, which allows | |||
| matching to continue. The callout_fail modifier can be given one or two | matching to continue. The callout_fail modifier can be given one or two | |||
| numbers. If there is only one number, 1 is returned instead of 0 | numbers. If there is only one number, 1 is returned instead of 0 | |||
| (causing matching to backtrack) when a callout of that number is | (causing matching to backtrack) when a callout of that number is | |||
| reached. If two numbers (<n>:<m>) are given, 1 is returned when callout | reached. If two numbers (<n>:<m>) are given, 1 is returned when callout | |||
| <n> is reached and there have been at least <m> callouts. The | <n> is reached and there have been at least <m> callouts. The | |||
| callout_error modifier is similar, except that PCRE2_ERROR_CALLOUT is | callout_error modifier is similar, except that PCRE2_ERROR_CALLOUT is | |||
| returned, causing the entire matching process to be aborted. If both | returned, causing the entire matching process to be aborted. If both | |||
| these modifiers are set for the same callout number, callout_error | these modifiers are set for the same callout number, callout_error | |||
| takes precedence. Note that callouts with string arguments are always | takes precedence. Note that callouts with string arguments are always | |||
| given the number zero. | given the number zero. | |||
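The decision rule for callout_fail and callout_error can be written out as a small Python model. This is a sketch of the rule as stated above, not pcre2test's source: the function callout_return and the constants are hypothetical names, the abort value is kept symbolic rather than numeric, the count is assumed to include the current callout, and the single-number form is modelled as <n>:0.

```python
CONTINUE = 0                    # carry on matching
BACKTRACK = 1                   # force a backtrack at this callout
ABORT = "PCRE2_ERROR_CALLOUT"   # symbolic stand-in for the error return

def callout_return(number, count, callout_fail=None, callout_error=None):
    """number: callout number reached; count: callouts so far (assumed to
    include this one). Each modifier is None or an (n, m) pair."""
    def triggered(setting):
        if setting is None:
            return False
        n, m = setting
        return number == n and count >= m

    # callout_error takes precedence when both are set for the same number.
    if triggered(callout_error):
        return ABORT
    if triggered(callout_fail):
        return BACKTRACK
    return CONTINUE

print(callout_return(1, 1, callout_fail=(1, 0)))   # single-number form
print(callout_return(1, 2, callout_fail=(1, 3)))   # too few callouts so far
print(callout_return(1, 3, callout_fail=(1, 3), callout_error=(1, 3)))
```

The callout_data modifier is deliberately left out of this model.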
| The callout_data modifier can be given an unsigned or a negative | The callout_data modifier can be given an unsigned or a negative | |||
| number. This is set as the "user data" that is passed to the matching | number. This is set as the "user data" that is passed to the matching | |||
| function, and passed back when the callout function is invoked. Any | function, and passed back when the callout function is invoked. Any | |||
| value other than zero is used as a return from pcre2test's callout | value other than zero is used as a return from pcre2test's callout | |||
| function. | function. | |||
| Inserting callouts can be helpful when using pcre2test to check | Inserting callouts can be helpful when using pcre2test to check | |||
| complicated regular expressions. For further information about | complicated regular expressions. For further information about | |||
| callouts, see the pcre2callout documentation. | callouts, see the pcre2callout documentation. | |||
| NON-PRINTING CHARACTERS | NON-PRINTING CHARACTERS | |||
| When pcre2test is outputting text in the compiled version of a pattern, | When pcre2test is outputting text in the compiled version of a pattern, | |||
| bytes other than 32-126 are always treated as non-printing characters | bytes other than 32-126 are always treated as non-printing characters | |||
| and are therefore shown as hex escapes. | and are therefore shown as hex escapes. | |||
| When pcre2test is outputting text that is a matched part of a subject | When pcre2test is outputting text that is a matched part of a subject | |||
| string, it behaves in the same way, unless a different locale has been | string, it behaves in the same way, unless a different locale has been | |||
| set for the pattern (using the locale modifier). In this case, the | set for the pattern (using the locale modifier). In this case, the | |||
| isprint() function is used to distinguish printing and non-printing | isprint() function is used to distinguish printing and non-printing | |||
| characters. | characters. | |||
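The default escaping rule (no locale set) can be modelled in Python as follows. This is a sketch that combines the 32-126 printing range with the \xhh and \x{hh...} forms described earlier; the function name show and its utf parameter are hypothetical, and the use of at least two hex digits inside the braces is an assumption based on the "hh..." notation.

```python
def show(text, utf=False):
    """Render text with non-printing characters as hex escapes (illustrative)."""
    out = []
    for ch in text:
        code = ord(ch)
        if 32 <= code <= 126:
            out.append(ch)                    # printing character, shown as-is
        elif code < 256 and not utf:
            out.append(f"\\x{code:02x}")      # \xhh form
        else:
            out.append(f"\\x{{{code:02x}}}")  # \x{hh...} form
    return "".join(out)

print(show("a\nb"))            # newline shown in the \xhh form
print(show("a\nb", utf=True))  # same character in the \x{hh...} form
print(show("\u0100"))          # value of 256 or more
```

When a locale has been set for the pattern, the printing test would instead depend on isprint(), which this sketch does not model.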
| SAVING AND RESTORING COMPILED PATTERNS | SAVING AND RESTORING COMPILED PATTERNS | |||
| It is possible to save compiled patterns on disc or elsewhere, and | It is possible to save compiled patterns on disc or elsewhere, and | |||
| reload them later, subject to a number of restrictions. JIT data cannot | reload them later, subject to a number of restrictions. JIT data cannot | |||
| be saved. The host on which the patterns are reloaded must be running | be saved. The host on which the patterns are reloaded must be running | |||
| the same version of PCRE2, with the same code unit width, and must also | the same version of PCRE2, with the same code unit width, and must also | |||
| have the same endianness, pointer width and PCRE2_SIZE type. Before | have the same endianness, pointer width and PCRE2_SIZE type. Before | |||
| compiled patterns can be saved they must be serialized, that is, | compiled patterns can be saved they must be serialized, that is, | |||
| converted to a stream of bytes. A single byte stream may contain any | converted to a stream of bytes. A single byte stream may contain any | |||
| number of compiled patterns, but they must all use the same character | number of compiled patterns, but they must all use the same character | |||
| tables. A single copy of the tables is included in the byte stream (its | tables. A single copy of the tables is included in the byte stream (its | |||
| size is 1088 bytes). | size is 1088 bytes). | |||
| The functions whose names begin with pcre2_serialize_ are used for | The functions whose names begin with pcre2_serialize_ are used for | |||
| serializing and de-serializing. They are described in the | serializing and de-serializing. They are described in the | |||
| pcre2serialize documentation. In this section we describe the features | pcre2serialize documentation. In this section we describe the features | |||
| of pcre2test that can be used to test these functions. | of pcre2test that can be used to test these functions. | |||
| Note that "serialization" in PCRE2 does not convert compiled patterns | Note that "serialization" in PCRE2 does not convert compiled patterns | |||
| to an abstract format like Java or .NET. It just makes a reloadable | to an abstract format like Java or .NET. It just makes a reloadable | |||
| byte code stream. Hence the restrictions on reloading mentioned above. | byte code stream. Hence the restrictions on reloading mentioned above. | |||
| In pcre2test, when a pattern with the push modifier is successfully | In pcre2test, when a pattern with the push modifier is successfully | |||
| compiled, it is pushed onto a stack of compiled patterns, and pcre2test | compiled, it is pushed onto a stack of compiled patterns, and pcre2test | |||
| expects the next line to contain a new pattern (or command) instead of | expects the next line to contain a new pattern (or command) instead of | |||
| a subject line. By contrast, the pushcopy modifier causes a copy of the | a subject line. By contrast, the pushcopy modifier causes a copy of the | |||
| compiled pattern to be stacked, leaving the original available for | compiled pattern to be stacked, leaving the original available for | |||
| immediate matching. By using push and/or pushcopy, a number of patterns | immediate matching. By using push and/or pushcopy, a number of patterns | |||
| can be compiled and retained. These modifiers are incompatible with | can be compiled and retained. These modifiers are incompatible with | |||
| posix, and control modifiers that act at match time are ignored (with a | posix, and control modifiers that act at match time are ignored (with a | |||
| message) for the stacked patterns. The jitverify modifier applies only | message) for the stacked patterns. The jitverify modifier applies only | |||
| at compile time. | at compile time. | |||
| The command | The command | |||
| #save <filename> | #save <filename> | |||
| causes all the stacked patterns to be serialized and the result written | causes all the stacked patterns to be serialized and the result written | |||
| to the named file. Afterwards, all the stacked patterns are freed. The | to the named file. Afterwards, all the stacked patterns are freed. The | |||
| command | command | |||
| #load <filename> | #load <filename> | |||
| reads the data in the file, and then arranges for it to be de-serialized, | reads the data in the file, and then arranges for it to be de-serialized, | |||
| with the resulting compiled patterns added to the pattern stack. | with the resulting compiled patterns added to the pattern stack. | |||
| The pattern on the top of the stack can be retrieved by the #pop command, | The pattern on the top of the stack can be retrieved by the #pop command, | |||
| which must be followed by lines of subjects that are to be | which must be followed by lines of subjects that are to be | |||
| matched with the pattern, terminated as usual by an empty line or end | matched with the pattern, terminated as usual by an empty line or end | |||
| of file. This command may be followed by a modifier list containing | of file. This command may be followed by a modifier list containing | |||
| only control modifiers that act after a pattern has been compiled. In | only control modifiers that act after a pattern has been compiled. In | |||
| particular, hex, posix, posix_nosub, push, and pushcopy are not allowed, | particular, hex, posix, posix_nosub, push, and pushcopy are not allowed, | |||
| nor are any option-setting modifiers. The JIT modifiers are, | nor are any option-setting modifiers. The JIT modifiers are, | |||
| however permitted. Here is an example that saves and reloads two patterns. | however permitted. Here is an example that saves and reloads two patterns. | |||
| /abc/push | /abc/push | |||
| /xyz/push | /xyz/push | |||
| #save tempfile | #save tempfile | |||
| #load tempfile | #load tempfile | |||
| #pop info | #pop info | |||
| xyz | xyz | |||
| #pop jit,bincode | #pop jit,bincode | |||
| abc | abc | |||
| If jitverify is used with #pop, it does not automatically imply jit, | If jitverify is used with #pop, it does not automatically imply jit, | |||
| which is different behaviour from when it is used on a pattern. | which is different behaviour from when it is used on a pattern. | |||
| The #popcopy command is analogous to the pushcopy modifier in that it | The #popcopy command is analogous to the pushcopy modifier in that it | |||
| makes current a copy of the topmost stack pattern, leaving the original | makes current a copy of the topmost stack pattern, leaving the original | |||
| still on the stack. | still on the stack. | |||
| SEE ALSO | SEE ALSO | |||
| pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3), | pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3), | |||
| pcre2partial(3), pcre2pattern(3), pcre2serialize(3). | pcre2partial(3), pcre2pattern(3), pcre2serialize(3). | |||
| AUTHOR | AUTHOR | |||
| Philip Hazel | Philip Hazel | |||
| Retired from University Computing Service | Retired from University Computing Service | |||
| Cambridge, England. | Cambridge, England. | |||
| REVISION | REVISION | |||
| Last updated: 27 January 2024 | Last updated: 24 April 2024 | |||
| Copyright (c) 1997-2024 University of Cambridge. | Copyright (c) 1997-2024 University of Cambridge. | |||
| PCRE 10.43 27 January 2024 PCRE2TEST (1) | PCRE 10.44 24 April 2024 PCRE2TEST (1) | |||
| End of changes. 145 change blocks. | ||||
| 610 lines changed or deleted | 623 lines changed or added | |||
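
The stack behaviour described in the push, pushcopy, #save, #load, #pop and #popcopy paragraphs can be pictured with a small model. The following Python sketch is not pcre2test code: `PatternStack` and its method names are hypothetical, plain strings stand in for compiled patterns, and an in-memory byte string stands in for the file named by #save. It only mirrors the ordering rules stated in the text.

```python
import pickle


class PatternStack:
    """Toy model of the compiled-pattern stack (illustrative only)."""

    def __init__(self):
        self.stack = []  # bottom of the stack is index 0, top is the last item

    def push(self, pattern):
        # push: the pattern goes onto the stack and nothing stays current,
        # so the next input line is treated as a new pattern or command.
        self.stack.append(pattern)
        return None

    def pushcopy(self, pattern):
        # pushcopy: a copy is stacked and the original stays current,
        # so subject lines can follow immediately.
        self.stack.append(pattern)
        return pattern

    def pop(self):
        # #pop: the topmost pattern is removed and becomes current.
        return self.stack.pop()

    def popcopy(self):
        # #popcopy: a copy of the topmost pattern becomes current,
        # and the original remains on the stack.
        return self.stack[-1]

    def save(self):
        # #save: serialize every stacked pattern, then free them all.
        # Assumption: bytes are returned here instead of writing a file.
        blob = pickle.dumps(self.stack)
        self.stack.clear()
        return blob

    def load(self, blob):
        # #load: de-serialize and add the patterns to the stack,
        # keeping their saved order.
        self.stack.extend(pickle.loads(blob))
```

Replaying the example from the text against this model (push `abc`, push `xyz`, save, load, pop, pop) yields `xyz` first and `abc` second, matching the order of the subject lines in that example.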