vlakoff
Repos
82
Followers
63
Following
41

Events

issue comment
Fix handling of glob wildcards when inside a character class

If we encounter a single backslash at the end of a range, or at the end of the pattern, there is an ambiguity:

  • it could mean a literal backslash that the user overlooked to escape
  • but it could also mean an escape backslash, with nothing to escape after…

In both problematic locations (end of range, and end of pattern), the code was previously resulting in triggering PHP warnings.

Indeed, I think it's better to reject these erroneous patterns, rather than trying to handle them somehow.

In #10 I took care of preserving these PHP warnings, for BC. Ideally, we might want to throw custom exceptions, but let's keep this aside for now.

Created at 2 days ago
issue comment
Fix handling of glob wildcards when inside a character class

all special characters (?, *, but also the other ones) can be escaped with a backslash, and have to when inside a character class

That was the situation before the current changes. With the current changes, they still can, but no longer have to ;-)

Not exactly sure what you mean here, but when writing a glob pattern to match a group of characters a single backslash will be escaped properly.

I think there is some confusion (and that's really understandable) between the escape with \ in Splat patterns, and the escape with \ in PHP strings. There are two layers of escaping.

And a lot of confusion probably comes from the fact that PHP doesn't require escaping the \ in most, but not all circumstances. (refs this answer on Stack Overflow)

Created at 2 days ago
issue comment
Improved code that properly supports special characters in ranges

Alternatively, we might allow unescaped \ as end of range (as you already did), and at end of pattern.

But I really think we should not do this.

So that the rules are consistent (as they were previously): the \ is an escape character in Splat patterns (and the current changes make it not needed in ranges), and the tricky part: \ should itself be escaped in PHP strings.

Maybe this example will help to show it better:

  • 'ab\*' — matches literally a, b, and * which is escaped
  • 'ab\\*' — same as above, but properly inputting the \ in a PHP string
  • 'ab\\\\*' — matches literally a, b, \, then there is a wildcard *
Created at 2 days ago
issue comment
Improved code that properly supports special characters in ranges

PHPUnit just gave me a really hard time. Just to capture a PHP warning.

After a few hours of pain, I couldn't even solve the problem. So I'm just removing the failing tests (they were about unescaped \ as the last character of a range, from being recently accepted, to going back triggering a PHP warning).

Created at 2 days ago

Update tests

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Improved code that properly supports special characters in ranges

Update test

The backslash is an escape character, which itself can be escaped by doubling it.

Therefore, a range cannot (and should not) contain an unescaped backslash as its last character.

Created at 2 days ago

Improved code that properly supports special characters in ranges

Created at 2 days ago

Rename variable, and minor adjustments

Improved code that properly supports special characters in ranges

Created at 2 days ago
pull request opened
Improved code that properly supports special characters in ranges

Follow-up to #9.

With this new code, all special characters can be used in ranges without having to be escaped.

Basically, it's a new approach that uses two different code paths, whether we are in a character range or not. It's both simpler and robuster.

Notes:

  • An unescaped \ at the end of the pattern was previously triggering a PHP warning, and unexpectedly escaping the next character appended after the loop ($ end anchor or # pattern delimiter). In the revised code, I drop out this \, and trigger the same PHP warning for BC.
  • As in previous code, an unescaped \ at the end of a character range is not supported (and should not be), and PHP triggers a "preg_match(): compilation failed" warning.

Disclaimer: published "in emergency" after #9 got merged. Though, everything is working.

Created at 2 days ago
create branch
vlakoff create branch new_code
Created at 2 days ago
create tag
vlakoff create tag 5.0.0
Created at 2 days ago
issue comment
Fix handling of glob wildcards when inside a character class

This PR was incomplete actually. I had further worked on it locally, and noticed that, as I was expecting, many other characters were needing the same handling.

The rule with the previous code was simple: all special characters (?, *, but also the other ones) can be escaped with a backslash, and have to when inside a character class. This PR in its current state was introducing a discrepancy, by alleviating the escaping obligation for ?, * but not the other characters.

About an unescaped \ as the last character of a class: maybe some implementations allow it, but it's basically wrong and as far as I remember, almost all implementations forbid it. With previous code, Splat was triggering a PHP warning in this case, and for BC we should continue to forbid this case.

Thankfully, I had worked locally a new version with all the issues fixed, and containing many code comments explaining the various pitfalls I have discovered. I'm going to open "in emergency" a PR with this code so you can have a look at it.

Created at 2 days ago
Created at 1 month ago
started
Created at 1 month ago
issue comment
[9.x] Use secure randomness in Arr:random and Arr:shuffle

I would still suggest adding new methods secureRandom() and secureShuffle().

Case in point, currently the shuffle() method is secure without seed, but not secure with seed. I think for a given method, the level of security shouldn't be impacted by the presence (or not) of an optional parameter.

Created at 1 month ago
issue comment
[9.x] Use secure randomness in Arr:random and Arr:shuffle

(indeed, the 16 thumbs up surprised me as well)

can you share your benchmarking details, showing the 10x performance hit?

I just ran the following:

$nb = 1000;
$data = range(1, 5000);

$t1 = microtime(true);
for ($ii = $nb; $ii--; ) {
    shuffle_v1($data);
}
$t2 = microtime(true);
for ($ii = $nb; $ii--; ) {
    shuffle_v2($data);
}
$t3 = microtime(true);

echo $t2 - $t1;
echo "\n";
echo $t3 - $t2;
echo "\n\n";
echo $t2 - $t1 == 0 ? 'N/A' : ($t3 - $t2) * 100 / ($t2 - $t1);

function shuffle_v1($array) {
    shuffle($array);
    return $array;
}

function shuffle_v2($array) {
    $values = array_values($array);
    for ($i = count($values) - 1; $i >= 1; --$i) {
        $j = random_int(0, $i);
        $temp = $values[$i];
        $values[$i] = $values[$j];
        $values[$j] = $temp;
    }
    return $values;
}

Results:

0.11835885047913 1.1667959690094

985.81218412152

Created at 1 month ago