pcre2jit.3 | pcre2jit.3 | |||
---|---|---|---|---|
skipping to change at line 37 | skipping to change at line 37 | |||
ARM 64-bit | ARM 64-bit | |||
IBM s390x 64 bit | IBM s390x 64 bit | |||
Intel x86 32-bit and 64-bit | Intel x86 32-bit and 64-bit | |||
LoongArch 64 bit | LoongArch 64 bit | |||
MIPS 32-bit and 64-bit | MIPS 32-bit and 64-bit | |||
Power PC 32-bit and 64-bit | Power PC 32-bit and 64-bit | |||
RISC-V 32-bit and 64-bit | RISC-V 32-bit and 64-bit | |||
If --enable-jit is set on an unsupported platform, compilation fails. | If --enable-jit is set on an unsupported platform, compilation fails. | |||
A client program can tell if JIT support is available by calling pcre2_config() with the PCRE2_CONFIG_JIT option. The result is one if PCRE2 was built with JIT support, and zero otherwise. However, having the JIT code available does not guarantee that it will be used for any particular match. One reason for this is that there are a number of options and pattern items that are not supported by JIT (see below). Another reason is that in some environments JIT is unable to get memory in which to build its compiled code. The only guarantee from pcre2_config() is that if it returns zero, JIT will definitely not be used. | A client program can tell if JIT support has been compiled by calling pcre2_config() with the PCRE2_CONFIG_JIT option. The result is one if PCRE2 was built with JIT support, and zero otherwise. However, having the JIT code available does not guarantee that it will be used for any particular match. One reason for this is that there are a number of options and pattern items that are not supported by JIT (see below). Another reason is that in some environments JIT is unable to get executable memory in which to build its compiled code. The only guarantee from pcre2_config() is that if it returns zero, JIT will definitely not be used. | |||
| | As of release 10.45 there is a more informative way to test for JIT support. If pcre2_jit_compile() is called with the single option PCRE2_JIT_TEST_ALLOC it returns zero if JIT is available and has a working allocator. Otherwise it returns PCRE2_ERROR_NOMEMORY if JIT is available but cannot allocate executable memory, or PCRE2_ERROR_JIT_UNSUPPORTED if JIT support is not compiled. The code argument is ignored, so it can be a NULL value. | |||
A simple program does not need to check availability in order to use JIT when possible. The API is implemented in a way that falls back to the interpretive code if JIT is not available or cannot be used for a given match. For programs that need the best possible performance, there is a "fast path" API that is JIT-specific. | A simple program does not need to check availability in order to use JIT when possible. The API is implemented in a way that falls back to the interpretive code if JIT is not available or cannot be used for a given match. For programs that need the best possible performance, there is a "fast path" API that is JIT-specific. | |||
SIMPLE USE OF JIT | SIMPLE USE OF JIT | |||
To make use of the JIT support in the simplest way, all you have to do is to call pcre2_jit_compile() after successfully compiling a pattern with pcre2_compile(). This function has two arguments: the first is the compiled pattern pointer that was returned by pcre2_compile(), and the second is zero or more of the following option bits: PCRE2_JIT_COMPLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT. | To make use of the JIT support in the simplest way, all you have to do is to call pcre2_jit_compile() after successfully compiling a pattern with pcre2_compile(). This function has two arguments: the first is the compiled pattern pointer that was returned by pcre2_compile(), and the second is zero or more of the following option bits: PCRE2_JIT_COMPLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT. | |||
If JIT support is not available, a call to pcre2_jit_compile() does nothing and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled pattern is passed to the JIT compiler, which turns it into machine code that executes much faster than the normal interpretive code, but yields exactly the same results. The returned value from pcre2_jit_compile() is zero on success, or a negative error code. | If JIT support is not available, a call to pcre2_jit_compile() does nothing and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled pattern is passed to the JIT compiler, which turns it into machine code that executes much faster than the normal interpretive code, but yields exactly the same results. The returned value from pcre2_jit_compile() is zero on success, or a negative error code. | |||
There is a limit to the size of pattern that JIT supports, imposed by the size of machine stack that it uses. The exact rules are not documented because they may change at any time, in particular, when new optimizations are introduced. If a pattern is too big, a call to pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY. | There is a limit to the size of pattern that JIT supports, imposed by the size of machine stack that it uses. The exact rules are not documented because they may change at any time, in particular, when new optimizations are introduced. If a pattern is too big, a call to pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY. | |||
PCRE2_JIT_COMPLETE requests the JIT compiler to generate code for complete matches. If you want to run partial matches using the PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT options of pcre2_match(), you should set one or both of the other options as well as, or instead of PCRE2_JIT_COMPLETE. The JIT compiler generates different optimized code for each of the three modes (normal, soft partial, hard partial). When pcre2_match() is called, the appropriate code is run if it is available. Otherwise, the pattern is matched using interpretive code. | PCRE2_JIT_COMPLETE requests the JIT compiler to generate code for complete matches. If you want to run partial matches using the PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT options of pcre2_match(), you should set one or both of the other options as well as, or instead of PCRE2_JIT_COMPLETE. The JIT compiler generates different optimized code for each of the three modes (normal, soft partial, hard partial). When pcre2_match() is called, the appropriate code is run if it is available. Otherwise, the pattern is matched using interpretive code. | |||
You can call pcre2_jit_compile() multiple times for the same compiled pattern. It does nothing if it has previously compiled code for any of the option bits. For example, you can call it once with PCRE2_JIT_COMPLETE and (perhaps later, when you find you need partial matching) again with PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it will ignore PCRE2_JIT_COMPLETE and just compile code for partial matching. If pcre2_jit_compile() is called with no option bits set, it immediately returns zero. This is an alternative way of testing whether JIT is available. | You can call pcre2_jit_compile() multiple times for the same compiled pattern. It does nothing if it has previously compiled code for any of the option bits. For example, you can call it once with PCRE2_JIT_COMPLETE and (perhaps later, when you find you need partial matching) again with PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it will ignore PCRE2_JIT_COMPLETE and just compile code for partial matching. If pcre2_jit_compile() is called with no option bits set, it immediately returns zero. This is an alternative way of testing whether JIT support has been compiled. | |||
At present, it is not possible to free JIT compiled code except when the entire compiled pattern is freed by calling pcre2_code_free(). | At present, it is not possible to free JIT compiled code except when the entire compiled pattern is freed by calling pcre2_code_free(). | |||
In some circumstances you may need to call additional functions. These are described in the section entitled "Controlling the JIT stack" below. | In some circumstances you may need to call additional functions. These are described in the section entitled "Controlling the JIT stack" below. | |||
There are some pcre2_match() options that are not supported by JIT, and there are also some pattern items that JIT cannot handle. Details are given below. In both cases, matching automatically falls back to the interpretive code. If you want to know whether JIT was actually used for a particular match, you should arrange for a JIT callback function to be set up as described in the section entitled "Controlling the JIT stack" below, even if you do not need to supply a non-default JIT stack. Such a callback function is called whenever JIT code is about to be obeyed. If the match-time options are not right for JIT execution, the callback function is not obeyed. | There are some pcre2_match() options that are not supported by JIT, and there are also some pattern items that JIT cannot handle. Details are given below. In both cases, matching automatically falls back to the interpretive code. If you want to know whether JIT was actually used for a particular match, you should arrange for a JIT callback function to be set up as described in the section entitled "Controlling the JIT stack" below, even if you do not need to supply a non-default JIT stack. Such a callback function is called whenever JIT code is about to be obeyed. If the match-time options are not right for JIT execution, the callback function is not obeyed. | |||
If the JIT compiler finds an unsupported item, no JIT data is generated. You can find out if JIT compilation was successful for a compiled pattern by calling pcre2_pattern_info() with the PCRE2_INFO_JITSIZE option. A non-zero result means that JIT compilation was successful. A result of 0 means that JIT support is not available, or the pattern was not processed by pcre2_jit_compile(), or the JIT compiler was not able to handle the pattern. Successful JIT compilation does not, however, guarantee the use of JIT at match time because there are some match time options that are not supported by JIT. | If the JIT compiler finds an unsupported item, no JIT data is generated. You can find out if JIT compilation was successful for a compiled pattern by calling pcre2_pattern_info() with the PCRE2_INFO_JITSIZE option. A non-zero result means that JIT compilation was successful. A result of 0 means that JIT support is not available, or the pattern was not processed by pcre2_jit_compile(), or the JIT compiler was not able to handle the pattern. Successful JIT compilation does not, however, guarantee the use of JIT at match time because there are some match time options that are not supported by JIT. | |||
MATCHING SUBJECTS CONTAINING INVALID UTF | MATCHING SUBJECTS CONTAINING INVALID UTF | |||
When a pattern is compiled with the PCRE2_UTF option, subject strings are normally expected to be a valid sequence of UTF code units. By default, this is checked at the start of matching and an error is generated if invalid UTF is detected. The PCRE2_NO_UTF_CHECK option can be passed to pcre2_match() to skip the check (for improved performance) if you are sure that a subject string is valid. If this option is used with an invalid string, the result is undefined. The calling program may crash or loop or otherwise misbehave. | When a pattern is compiled with the PCRE2_UTF option, subject strings are normally expected to be a valid sequence of UTF code units. By default, this is checked at the start of matching and an error is generated if invalid UTF is detected. The PCRE2_NO_UTF_CHECK option can be passed to pcre2_match() to skip the check (for improved performance) if you are sure that a subject string is valid. If this option is used with an invalid string, the result is undefined. The calling program may crash or loop or otherwise misbehave. | |||
However, a way of running matches on strings that may contain invalid UTF sequences is available. Calling pcre2_compile() with the PCRE2_MATCH_INVALID_UTF option has two effects: it tells the interpreter in pcre2_match() to support invalid UTF, and, if pcre2_jit_compile() is subsequently called, the compiled JIT code also supports invalid UTF. Details of how this support works, in both the JIT and the interpretive cases, is given in the pcre2unicode documentation. | However, a way of running matches on strings that may contain invalid UTF sequences is available. Calling pcre2_compile() with the PCRE2_MATCH_INVALID_UTF option has two effects: it tells the interpreter in pcre2_match() to support invalid UTF, and, if pcre2_jit_compile() is subsequently called, the compiled JIT code also supports invalid UTF. Details of how this support works, in both the JIT and the interpretive cases, is given in the pcre2unicode documentation. | |||
There is also an obsolete option for pcre2_jit_compile() called PCRE2_JIT_INVALID_UTF, which currently exists only for backward compatibility. It is superseded by the pcre2_compile() option PCRE2_MATCH_INVALID_UTF and should no longer be used. It may be removed in future. | There is also an obsolete option for pcre2_jit_compile() called PCRE2_JIT_INVALID_UTF, which currently exists only for backward compatibility. It is superseded by the pcre2_compile() option PCRE2_MATCH_INVALID_UTF and should no longer be used. It may be removed in future. | |||
UNSUPPORTED OPTIONS AND PATTERN ITEMS | UNSUPPORTED OPTIONS AND PATTERN ITEMS | |||
The pcre2_match() options that are supported for JIT matching are PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not supported at match time. | The pcre2_match() options that are supported for JIT matching are PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not supported at match time. | |||
If the PCRE2_NO_JIT option is passed to pcre2_match() it disables the use of JIT, forcing matching by the interpreter code. | If the PCRE2_NO_JIT option is passed to pcre2_match() it disables the use of JIT, forcing matching by the interpreter code. | |||
The only unsupported pattern items are \C (match a single data unit) when running in a UTF mode, and a callout immediately before an assertion condition in a conditional group. | The only unsupported pattern items are \C (match a single data unit) when running in a UTF mode, and a callout immediately before an assertion condition in a conditional group. | |||
RETURN VALUES FROM JIT MATCHING | RETURN VALUES FROM JIT MATCHING | |||
When a pattern is matched using JIT, the return values are the same as those given by the interpretive pcre2_match() code, with the addition of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means that the memory used for the JIT stack was insufficient. See "Controlling the JIT stack" below for a discussion of JIT stack usage. | When a pattern is matched using JIT, the return values are the same as those given by the interpretive pcre2_match() code, with the addition of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means that the memory used for the JIT stack was insufficient. See "Controlling the JIT stack" below for a discussion of JIT stack usage. | |||
The error code PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if searching a very large pattern tree goes on for too long, as it is in the same circumstance when JIT is not used, but the details of exactly what is counted are not the same. The PCRE2_ERROR_DEPTHLIMIT error code is never returned when JIT matching is used. | The error code PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if searching a very large pattern tree goes on for too long, as it is in the same circumstance when JIT is not used, but the details of exactly what is counted are not the same. The PCRE2_ERROR_DEPTHLIMIT error code is never returned when JIT matching is used. | |||
CONTROLLING THE JIT STACK | CONTROLLING THE JIT STACK | |||
When the compiled JIT code runs, it needs a block of memory to use as a stack. By default, it uses 32KiB on the machine stack. However, some large or complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT is given when there is not enough stack. Three functions are provided for managing blocks of memory for use as JIT stacks. There is further discussion about the use of JIT stacks in the section entitled "JIT stack FAQ" below. | When the compiled JIT code runs, it needs a block of memory to use as a stack. By default, it uses 32KiB on the machine stack. However, some large or complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT is given when there is not enough stack. Three functions are provided for managing blocks of memory for use as JIT stacks. There is further discussion about the use of JIT stacks in the section entitled "JIT stack FAQ" below. | |||
The pcre2_jit_stack_create() function creates a JIT stack. Its arguments are a starting size, a maximum size, and a general context (for memory allocation functions, or NULL for standard memory allocation). It returns a pointer to an opaque structure of type pcre2_jit_stack, or NULL if there is an error. The pcre2_jit_stack_free() function is used to free a stack that is no longer needed. If its argument is NULL, this function returns immediately, without doing anything. (For the technically minded: the address space is allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB should be more than enough for any pattern. | The pcre2_jit_stack_create() function creates a JIT stack. Its arguments are a starting size, a maximum size, and a general context (for memory allocation functions, or NULL for standard memory allocation). It returns a pointer to an opaque structure of type pcre2_jit_stack, or NULL if there is an error. The pcre2_jit_stack_free() function is used to free a stack that is no longer needed. If its argument is NULL, this function returns immediately, without doing anything. (For the technically minded: the address space is allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB should be more than enough for any pattern. | |||
The pcre2_jit_stack_assign() function specifies which stack JIT code should use. Its arguments are as follows: | The pcre2_jit_stack_assign() function specifies which stack JIT code should use. Its arguments are as follows: | |||
pcre2_match_context *mcontext | pcre2_match_context *mcontext | |||
pcre2_jit_callback callback | pcre2_jit_callback callback | |||
void *data | void *data | |||
The first argument is a pointer to a match context. When this is subsequently passed to a matching function, its information determines which JIT stack is used. If this argument is NULL, the function returns immediately, without doing anything. There are three cases for the values of the other two options: | The first argument is a pointer to a match context. When this is subsequently passed to a matching function, its information determines which JIT stack is used. If this argument is NULL, the function returns immediately, without doing anything. There are three cases for the values of the other two options: | |||
(1) If callback is NULL and data is NULL, an internal 32KiB block | (1) If callback is NULL and data is NULL, an internal 32KiB block | |||
on the machine stack is used. This is the default when a match | on the machine stack is used. This is the default when a match | |||
context is created. | context is created. | |||
(2) If callback is NULL and data is not NULL, data must be | (2) If callback is NULL and data is not NULL, data must be | |||
a pointer to a valid JIT stack, the result of calling | a pointer to a valid JIT stack, the result of calling | |||
pcre2_jit_stack_create(). | pcre2_jit_stack_create(). | |||
(3) If callback is not NULL, it must point to a function that is | (3) If callback is not NULL, it must point to a function that is | |||
called with data as an argument at the start of matching, in | called with data as an argument at the start of matching, in | |||
order to set up a JIT stack. If the return from the callback | order to set up a JIT stack. If the return from the callback | |||
function is NULL, the internal 32KiB stack is used; otherwise the | function is NULL, the internal 32KiB stack is used; otherwise the | |||
return value must be a valid JIT stack, the result of calling | return value must be a valid JIT stack, the result of calling | |||
pcre2_jit_stack_create(). | pcre2_jit_stack_create(). | |||
A callback function is obeyed whenever JIT code is about to be run; it is not obeyed when pcre2_match() is called with options that are incompatible for JIT matching. A callback function can therefore be used to determine whether a match operation was executed by JIT or by the interpreter. | A callback function is obeyed whenever JIT code is about to be run; it is not obeyed when pcre2_match() is called with options that are incompatible for JIT matching. A callback function can therefore be used to determine whether a match operation was executed by JIT or by the interpreter. | |||
You may safely use the same JIT stack for more than one pattern (either by assigning directly or by callback), as long as the patterns are matched sequentially in the same thread. Currently, the only way to set up non-sequential matches in one thread is to use callouts: if a callout function starts another match, that match must use a different JIT stack to the one used for currently suspended match(es). | You may safely use the same JIT stack for more than one pattern (either by assigning directly or by callback), as long as the patterns are matched sequentially in the same thread. Currently, the only way to set up non-sequential matches in one thread is to use callouts: if a callout function starts another match, that match must use a different JIT stack to the one used for currently suspended match(es). | |||
In a multithread application, if you do not specify a JIT stack, or if you assign or pass back NULL from a callback, that is thread-safe, because each thread has its own machine stack. However, if you assign or pass back a non-NULL JIT stack, this must be a different stack for each thread so that the application is thread-safe. | In a multithread application, if you do not specify a JIT stack, or if you assign or pass back NULL from a callback, that is thread-safe, because each thread has its own machine stack. However, if you assign or pass back a non-NULL JIT stack, this must be a different stack for each thread so that the application is thread-safe. | |||
Strictly speaking, even more is allowed. You can assign the same non-NULL stack to a match context that is used by any number of patterns, as long as they are not used for matching by multiple threads at the same time. For example, you could use the same stack in all compiled patterns, with a global mutex in the callback to wait until the stack is available for use. However, this is an inefficient solution, and not recommended. | Strictly speaking, even more is allowed. You can assign the same non-NULL stack to a match context that is used by any number of patterns, as long as they are not used for matching by multiple threads at the same time. For example, you could use the same stack in all compiled patterns, with a global mutex in the callback to wait until the stack is available for use. However, this is an inefficient solution, and not recommended. | |||
This is a suggestion for how a multithreaded program that needs to set up non-default JIT stacks might operate: | This is a suggestion for how a multithreaded program that needs to set up non-default JIT stacks might operate: | |||
During thread initialization | During thread initialization | |||
thread_local_var = pcre2_jit_stack_create(...) | thread_local_var = pcre2_jit_stack_create(...) | |||
During thread exit | During thread exit | |||
pcre2_jit_stack_free(thread_local_var) | pcre2_jit_stack_free(thread_local_var) | |||
Use a one-line callback function | Use a one-line callback function | |||
return thread_local_var | return thread_local_var | |||
All the functions described in this section do nothing if JIT is not available. | All the functions described in this section do nothing if JIT is not available. | |||
JIT STACK FAQ | JIT STACK FAQ | |||
(1) Why do we need JIT stacks? | (1) Why do we need JIT stacks? | |||
PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack where the local data of the current node is pushed before checking its child nodes. Allocating real machine stack on some platforms is difficult. For example, the stack chain needs to be updated every time if we extend the stack on PowerPC. Although it is possible, its updating time overhead decreases performance. So we do the recursion in memory. | PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack where the local data of the current node is pushed before checking its child nodes. Allocating real machine stack on some platforms is difficult. For example, the stack chain needs to be updated every time if we extend the stack on PowerPC. Although it is possible, its updating time overhead decreases performance. So we do the recursion in memory. | |||
(2) Why don't we simply allocate blocks of memory with malloc()? | (2) Why don't we simply allocate blocks of memory with malloc()? | |||
Modern operating systems have a nice feature: they can reserve an address space instead of allocating memory. We can safely allocate memory pages inside this address space, so the stack could grow without moving memory data (this is important because of pointers). Thus we can allocate 1MiB address space, and use only a single memory page (usually 4KiB) if that is enough. However, we can still grow up to 1MiB anytime if needed. | Modern operating systems have a nice feature: they can reserve an address space instead of allocating memory. We can safely allocate memory pages inside this address space, so the stack could grow without moving memory data (this is important because of pointers). Thus we can allocate 1MiB address space, and use only a single memory page (usually 4KiB) if that is enough. However, we can still grow up to 1MiB anytime if needed. | |||
(3) Who "owns" a JIT stack? | (3) Who "owns" a JIT stack? | |||
The owner of the stack is the user program, not the JIT studied pattern or anything else. The user program must ensure that if a stack is being used by pcre2_match(), (that is, it is assigned to a match context that is passed to the pattern currently running), that stack must not be used by any other threads (to avoid overwriting the same memory area). The best practice for multithreaded programs is to allocate a stack for each thread, and return this stack through the JIT callback function. | The owner of the stack is the user program, not the JIT studied pattern or anything else. The user program must ensure that if a stack is being used by pcre2_match(), (that is, it is assigned to a match context that is passed to the pattern currently running), that stack must not be used by any other threads (to avoid overwriting the same memory area). The best practice for multithreaded programs is to allocate a stack for each thread, and return this stack through the JIT callback function. | |||
(4) When should a JIT stack be freed? | (4) When should a JIT stack be freed? | |||
You can free a JIT stack at any time, as long as it will not be used by pcre2_match() again. | You can free a JIT stack at any time, as long as it will not be used by pcre2_match() again. | |||
When you assign the stack to a match context, only a pointer is set. There is no reference counting or any other magic. You can free compiled patterns, contexts, and stacks in any order, anytime. Just do not call pcre2_match() with a match context pointing to an already freed stack, as that will cause SEGFAULT. (Also, do not free a stack currently used by pcre2_match() in another thread). You can also replace the stack in a context at any time when it is not in use. You should free the previous stack before assigning a replacement. | When you assign the stack to a match context, only a pointer is set. There is no reference counting or any other magic. You can free compiled patterns, contexts, and stacks in any order, anytime. Just do not call pcre2_match() with a match context pointing to an already freed stack, as that will cause SEGFAULT. (Also, do not free a stack currently used by pcre2_match() in another thread). You can also replace the stack in a context at any time when it is not in use. You should free the previous stack before assigning a replacement. | |||
(5) Should I allocate/free a stack every time before/after calling pcre2_match()? | (5) Should I allocate/free a stack every time before/after calling pcre2_match()? | |||
No, because this is too costly in terms of resources. However, you could implement some clever idea which release the stack if it is not used in let's say two minutes. The JIT callback can help to achieve this without keeping a list of patterns. | No, because this is too costly in terms of resources. However, you could implement some clever idea which release the stack if it is not used in let's say two minutes. The JIT callback can help to achieve this without keeping a list of patterns. | |||
(6) OK, the stack is for long term memory allocation. But what happens if a pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the stack is freed? | (6) OK, the stack is for long term memory allocation. But what happens if a pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the stack is freed? | |||
Especially on embedded systems, it might be a good idea to release memory sometimes without freeing the stack. There is no API for this at the moment. Probably a function call which returns with the currently allocated memory for any stack and another which allows releasing memory (shrinking the stack) would be a good idea if someone needs this. | Especially on embedded systems, it might be a good idea to release memory sometimes without freeing the stack. There is no API for this at the moment. Probably a function call which returns with the currently allocated memory for any stack and another which allows releasing memory (shrinking the stack) would be a good idea if someone needs this. | |||
(7) This is too much of a headache. Isn't there any better solution for JIT stack handling? | (7) This is too much of a headache. Isn't there any better solution for JIT stack handling? | |||
No, thanks to Windows. If POSIX threads were used everywhere, we could throw out this complicated API. | No, thanks to Windows. If POSIX threads were used everywhere, we could throw out this complicated API. | |||
FREEING JIT SPECULATIVE MEMORY | FREEING JIT SPECULATIVE MEMORY | |||
void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); | void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); | |||
The JIT executable allocator does not free all memory when it is possible. It expects new allocations, and keeps some free memory around to improve allocation speed. However, in low memory conditions, it might be better to free all possible memory. You can cause this to happen by calling pcre2_jit_free_unused_memory(). Its argument is a general context, for custom memory management, or NULL for standard memory management. | The JIT executable allocator does not free all memory when it is possible. It expects new allocations, and keeps some free memory around to improve allocation speed. However, in low memory conditions, it might be better to free all possible memory. You can cause this to happen by calling pcre2_jit_free_unused_memory(). Its argument is a general context, for custom memory management, or NULL for standard memory management. | |||
EXAMPLE CODE | EXAMPLE CODE | |||
This is a single-threaded example that specifies a JIT stack without using a callback. A real program should include error checking after all the function calls. | This is a single-threaded example that specifies a JIT stack without using a callback. A real program should include error checking after all the function calls. | |||
int rc; | int rc; | |||
pcre2_code *re; | pcre2_code *re; | |||
pcre2_match_data *match_data; | pcre2_match_data *match_data; | |||
pcre2_match_context *mcontext; | pcre2_match_context *mcontext; | |||
pcre2_jit_stack *jit_stack; | pcre2_jit_stack *jit_stack; | |||
re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0, | re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0, | |||
&errornumber, &erroffset, NULL); | &errornumber, &erroffset, NULL); | |||
skipping to change at line 325 | skipping to change at line 331 | |||
pcre2_code_free(re); | pcre2_code_free(re); | |||
pcre2_match_data_free(match_data); | pcre2_match_data_free(match_data); | |||
pcre2_match_context_free(mcontext); | pcre2_match_context_free(mcontext); | |||
pcre2_jit_stack_free(jit_stack); | pcre2_jit_stack_free(jit_stack); | |||
JIT FAST PATH API | JIT FAST PATH API | |||
Because the API described above falls back to interpreted matching when JIT is not available, it is convenient for programs that are written for general use in many environments. However, calling JIT via pcre2_match() does have a performance impact. Programs that are written for use where JIT is known to be available, and which need the best possible performance, can instead use a "fast path" API to call JIT matching directly instead of calling pcre2_match() (obviously only for patterns that have been successfully processed by pcre2_jit_compile()). | Because the API described above falls back to interpreted matching when JIT is not available, it is convenient for programs that are written for general use in many environments. However, calling JIT via pcre2_match() does have a performance impact. Programs that are written for use where JIT is known to be available, and which need the best possible performance, can instead use a "fast path" API to call JIT matching directly instead of calling pcre2_match() (obviously only for patterns that have been successfully processed by pcre2_jit_compile()). | |||
The fast path function is called pcre2_jit_match(), and it takes exactly the same arguments as pcre2_match(). However, the subject string must be specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported option bits (for example, PCRE2_ANCHORED and PCRE2_ENDANCHORED) are ignored, as is the PCRE2_NO_JIT option. The return values are also the same as for pcre2_match(), plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested that was not compiled. | The fast path function is called pcre2_jit_match(), and it takes exactly the same arguments as pcre2_match(). However, the subject string must be specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported option bits (for example, PCRE2_ANCHORED and PCRE2_ENDANCHORED) are ignored, as is the PCRE2_NO_JIT option. The return values are also the same as for pcre2_match(), plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested that was not compiled. | |||
When you call pcre2_match(), as well as testing for invalid options, a number of other sanity checks are performed on the arguments. For example, if the subject pointer is NULL but the length is non-zero, an immediate error is given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the interests of speed, these checks do not happen on the JIT fast path. If invalid UTF data is passed when PCRE2_MATCH_INVALID_UTF was not set for pcre2_compile(), the result is undefined. The program may crash or loop or give wrong results. | When you call pcre2_match(), as well as testing for invalid options, a number of other sanity checks are performed on the arguments. For example, if the subject pointer is NULL but the length is non-zero, an immediate error is given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the interests of speed, these checks do not happen on the JIT fast path. If invalid UTF data is passed when PCRE2_MATCH_INVALID_UTF was not set for pcre2_compile(), the result is undefined. The program may crash or loop or give wrong results. | |||
In the absence of PCRE2_MATCH_INVALID_UTF you should call pcre2_jit_match() in UTF mode only if you are sure the subject is valid. | In the absence of PCRE2_MATCH_INVALID_UTF you should call pcre2_jit_match() in UTF mode only if you are sure the subject is valid. | |||
Bypassing the sanity checks and the pcre2_match() wrapping can give speedups of more than 10%. | Bypassing the sanity checks and the pcre2_match() wrapping can give speedups of more than 10%. | |||
SEE ALSO | SEE ALSO | |||
pcre2api(3), pcre2unicode(3) | pcre2api(3), pcre2unicode(3) | |||
AUTHOR | AUTHOR | |||
Philip Hazel (FAQ by Zoltan Herczeg) | Philip Hazel (FAQ by Zoltan Herczeg) | |||
Retired from University Computing Service | Retired from University Computing Service | |||
Cambridge, England. | Cambridge, England. | |||
REVISION | REVISION | |||
Last updated: 21 February 2024 | Last updated: 22 August 2024 | |||
Copyright (c) 1997-2024 University of Cambridge. | Copyright (c) 1997-2024 University of Cambridge. | |||
PCRE2 10.43 21 February 2024 PCRE2JIT(3) | PCRE2 10.45-RC1 22 August 2024 PCRE2JIT(3) | |||
End of changes. 47 change blocks. | ||||
185 lines changed or deleted | 197 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |