alexdowad
Repos: 36 · Followers: 41 · Following: 1

Nostalgic collection of Alex Dowad's old Ruby code

31
3

Fast bitwise operations for Ruby

C
8
0

Ruby EventMachine TFTP implementation

1
0

Alex Dowad's GitHub page

A pure-JS JPEG decoder (for learning only, NOT production use)

Efficient, Immutable, Thread-Safe Collection classes for Ruby

106
6

Events

issue comment
Major overhaul of mbstring (part 26)

@kamil-tekiela Then mb_strtoupper will use your default internal character encoding... what does mb_internal_encoding return on your system?

Created at 1 week ago
issue comment
Major overhaul of mbstring (part 26)

@kamil-tekiela Thank you very much for testing! Could you share your test code?

Created at 1 week ago
issue comment
Major overhaul of mbstring (part 26)

In case they are interested... @Girgias @kocsismate

Created at 1 week ago
pull request opened
Major overhaul of mbstring (part 26)

Use the new (faster) encoding conversion code for case conversion functions like mb_convert_case, mb_strtoupper, and mb_strtolower. Speed increase is only about 50% for title casing, but 2-3x for other types of case conversion.

Fuzzed with libfuzzer. One bug in my first draft of the implementation was found, and a regression test added.

Note: the signature of one function with public symbol (php_unicode_convert_case) is changed. This could break C extensions which link directly to mbstring and call this function. However, none of the PECL extensions do so.

FYA @cmb69 @nikic @kamil-tekiela

Perhaps @mvorisek might be interested. Recently he raised some suggestions about how to make mb_strtoupper and mb_strtolower faster. This PR does not close the performance gap with strtoupper and strtolower, but at least makes it much smaller than it was.

Created at 1 week ago
create branch
alexdowad create branch cleanup-mbstring-26
Created at 1 week ago

Bump commonmarker from 0.23.5 to 0.23.6

Bumps commonmarker from 0.23.5 to 0.23.6.


updated-dependencies:
- dependency-name: commonmarker
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Created at 1 week ago
pull request closed
Bump commonmarker from 0.23.5 to 0.23.6

Bumps commonmarker from 0.23.5 to 0.23.6.

Created at 1 week ago
issue comment
Fix GH-9535: The behavior of mb_strcut in mbstring has been changed in PHP8.1

Before I merge, do any others who are interested in mbstring want to say anything? @cmb69 @nikic @kamil-tekiela

Created at 1 week ago
issue comment
Fix GH-9535: The behavior of mb_strcut in mbstring has been changed in PHP8.1

I guess this fix will be merged into PHP-8.1 first, then from there to PHP-8.2, then from there to master.

If @NathanFreeman can update the description for the 2nd commit as requested, I could merge, or else any other core developer could do it...

Created at 1 week ago
issue comment
Fix GH-9535: The behavior of mb_strcut in mbstring has been changed in PHP8.1

@NathanFreeman Thank you very much for taking the time to add more tests. I strongly suspected that more testing would reveal something interesting. (It usually does!)

As mentioned, I am on a trip. Yesterday I took a bit of time to install everything needed on the laptop which I am carrying so that I can build PHP. I am just contemplating whether I should try to squeeze some time to step through this adjusted code in a debugger and understand the issue and fix in more detail. It would be easier once I get back home.

In any case, your code does fix a bug and you have added a number of additional tests, which gives a bit more confidence that the fix doesn't break anything. I am thinking that I may approve the fix for now...

I would just kindly like to ask that you amend the commit log message for the commit which adds the new tests. Right now we have two commits which are both called "fix php#9535".

Created at 1 week ago
issue comment
mb_convert_encoding "\" (backslash) and "~" (tilde) BC breaks to Shift_JIS-2004

@youkidearitai Thanks. Hopefully I might prepare a PR later today...

Thanks again.

Created at 1 week ago
issue comment
mb_convert_encoding "\" (backslash) and "~" (tilde) BC breaks to Shift_JIS-2004

@youkidearitai Are you aware of any other Shift-JIS-2004 mappings which have changed?

In this case, your concern is just about Unicode -> JIS mappings, not JIS -> Unicode, is that correct?

Created at 2 weeks ago
issue comment
mb_convert_encoding "\" (backslash) and "~" (tilde) BC breaks to Shift_JIS-2004

@cmb69 Agreed.

Created at 2 weeks ago
issue comment
Fix GH-9535: The behavior of mb_strcut in mbstring has been changed in PHP8.1

Hi, @NathanFreeman. Thanks very much for working on mbstring.

I am currently travelling and do not have a machine with me which is set up to build PHP, or I would pull your changes, build, and do some testing. In any case, the issue raised in #9535 is 100% a bug, and your changes do apparently fix it.

Our test suite for mb_strcut is obviously inadequate, or this bug would have been caught earlier. I would be more comfortable if, rather than just adding one test which covers this bug (but who knows how many other bugs are still there), the test suite was filled out to make it really complete. When I started working on mbstring, there were only a handful of tests for mb_strcut; I added a handful more, but they are not enough.

Best would be to add a test which generates several thousand (or tens of thousands of) random strings, of different lengths and in different encodings, runs them through mb_strcut using random indices, and checks that in all cases the strings are cut at character boundaries (further, these should be the closest character boundaries on the left side of the requested cut points). I think you can use mb_str_split to find out where the character boundaries are; have a look at my existing test code in mb_strcut.phpt.

Set the RNG seed to a fixed value at the beginning of the test (with srand) so the results are consistent.
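
As a rough illustration of the kind of randomized test described above, here is a minimal Python sketch. It is not PHP and not the mbstring test suite: `strcut_utf8` is a hypothetical stand-in for the function under test, it covers UTF-8 only, and it assumes that both cut points are moved left to the nearest character boundary. The alphabet, trial count, and seed are arbitrary choices for the example.

```python
import random


def char_boundaries(text):
    """Byte offsets where each character starts in UTF-8, plus the total length."""
    offsets = [0]
    for ch in text:
        offsets.append(offsets[-1] + len(ch.encode("utf-8")))
    return offsets


def snap_left(offset, boundaries):
    """Closest character boundary at or to the left of a byte offset."""
    return max(b for b in boundaries if b <= offset)


def strcut_utf8(data, start, end):
    """Hypothetical function under test: cut valid UTF-8 bytes at char boundaries."""
    def snap(i):
        i = min(i, len(data))
        # Step left past continuation bytes (bit pattern 10xxxxxx).
        while 0 < i < len(data) and (data[i] & 0xC0) == 0x80:
            i -= 1
        return i
    return data[snap(start):snap(end)]


def count_bad_cuts(cut, trials=2000, seed=12345):
    """Run `cut` on random strings and byte indices; return the number of mismatches."""
    rng = random.Random(seed)  # fixed seed so every run sees the same inputs
    alphabet = "aZ09 éßΩж日本語かな😀"  # mix of 1-, 2-, 3- and 4-byte characters
    bad = 0
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
        data = text.encode("utf-8")
        bounds = char_boundaries(text)
        start = rng.randint(0, len(data))
        end = rng.randint(start, len(data))
        # Oracle: boundaries come from per-character encoding, not byte scanning.
        expected = data[snap_left(start, bounds):snap_left(end, bounds)]
        if cut(data, start, end) != expected:
            bad += 1
    return bad


if __name__ == "__main__":
    print("mismatches:", count_bad_cuts(strcut_utf8))
```

The oracle finds boundaries by encoding one character at a time, independently of how the function under test finds them, which plays the role that mb_str_split would play in a PHP version of the test.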

Does that sound like something which might be feasible? If not, please let me know. If you do it, it will be very interesting to see if any other bugs are found...

Created at 2 weeks ago
issue comment
Reintroduce legacy 'SJIS-win' text encoding in mbstring

Merged. Thanks, everyone.

Created at 1 month ago
pull request closed
Reintroduce legacy 'SJIS-win' text encoding in mbstring

Commit log message:

In e2459857af, I combined mbstring's "SJIS-win" text encoding
into CP932. This was done after doing some testing which appeared
to show that the mappings for "SJIS-win" were the same as those
for "CP932".

Later, it was found that there was actually a small difference
prior to e2459857af when converting Unicode to CP932. The
mappings for the following two codepoints were different:

        CP932  SJIS-win
U+203E  0x7E   0x81 0x50
U+00A5  0x5C   0x81 0x8F

As shown, mbstring's "CP932" mapped Unicode's 'OVERLINE' and
'YEN SIGN' to the ASCII bytes which have conflicting uses in
most legacy Japanese text encodings. "SJIS-win" mapped these
to equivalent JIS X 0208 fullwidth characters.
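
A small Python sketch of the difference in the table above, for illustration only. The two dictionaries copy the byte values from the table; the names `CP932_TABLE`, `SJIS_WIN_TABLE`, `differing_codepoints`, and `is_single_ascii_byte` are hypothetical and do not correspond to anything in mbstring.

```python
# Unicode -> byte sequences, copied from the table in the commit message.
CP932_TABLE = {
    0x203E: bytes([0x7E]),        # OVERLINE
    0x00A5: bytes([0x5C]),        # YEN SIGN
}
SJIS_WIN_TABLE = {
    0x203E: bytes([0x81, 0x50]),  # OVERLINE
    0x00A5: bytes([0x81, 0x8F]),  # YEN SIGN
}


def differing_codepoints(table_a, table_b):
    """Codepoints present in both tables whose encoded bytes differ."""
    shared = table_a.keys() & table_b.keys()
    return sorted(cp for cp in shared if table_a[cp] != table_b[cp])


def is_single_ascii_byte(encoded):
    """True if the encoded form is one byte in the ASCII range."""
    return len(encoded) == 1 and encoded[0] < 0x80


if __name__ == "__main__":
    for cp in differing_codepoints(CP932_TABLE, SJIS_WIN_TABLE):
        print(f"U+{cp:04X}: CP932={CP932_TABLE[cp].hex()} "
              f"SJIS-win={SJIS_WIN_TABLE[cp].hex()}")
```

In ASCII, 0x5C is the backslash and 0x7E is the tilde, which is why the single-byte mappings collide with other uses of those bytes.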

Since e2459857af was not intended to cause any user-visible
change in behavior, I am rolling back the merge of "CP932"
and "SJIS-win".

It seems doubtful whether these two text encodings should
be kept separate or merged in a future release. An extensive
discussion of the related historical background and
compatibility issues involved can be found in this
GitHub thread:

https://github.com/php/php-src/issues/8308

FYA @nikic @cmb69 @zonuexe @sj-i

Created at 1 month ago
delete branch
alexdowad delete branch sjiswin
Created at 1 month ago

Reintroduce legacy 'SJIS-win' text encoding in mbstring

In e2459857af, I combined mbstring's "SJIS-win" text encoding into CP932. This was done after doing some testing which appeared to show that the mappings for "SJIS-win" were the same as those for "CP932".

Later, it was found that there was actually a small difference prior to e2459857af when converting Unicode to CP932. The mappings for the following two codepoints were different:

        CP932  SJIS-win
U+203E  0x7E   0x81 0x50
U+00A5  0x5C   0x81 0x8F

As shown, mbstring's "CP932" mapped Unicode's 'OVERLINE' and 'YEN SIGN' to the ASCII bytes which have conflicting uses in most legacy Japanese text encodings. "SJIS-win" mapped these to equivalent JIS X 0208 fullwidth characters.

Since e2459857af was not intended to cause any user-visible change in behavior, I am rolling back the merge of "CP932" and "SJIS-win".

It seems doubtful whether these two text encodings should be kept separate or merged in a future release. An extensive discussion of the related historical background and compatibility issues involved can be found in this GitHub thread:

https://github.com/php/php-src/issues/8308

Merge branch 'PHP-8.1'

  • PHP-8.1: Reintroduce legacy 'SJIS-win' text encoding in mbstring
Created at 1 month ago

Reintroduce legacy 'SJIS-win' text encoding in mbstring

In e2459857af, I combined mbstring's "SJIS-win" text encoding into CP932. This was done after doing some testing which appeared to show that the mappings for "SJIS-win" were the same as those for "CP932".

Later, it was found that there was actually a small difference prior to e2459857af when converting Unicode to CP932. The mappings for the following two codepoints were different:

        CP932  SJIS-win
U+203E  0x7E   0x81 0x50
U+00A5  0x5C   0x81 0x8F

As shown, mbstring's "CP932" mapped Unicode's 'OVERLINE' and 'YEN SIGN' to the ASCII bytes which have conflicting uses in most legacy Japanese text encodings. "SJIS-win" mapped these to equivalent JIS X 0208 fullwidth characters.

Since e2459857af was not intended to cause any user-visible change in behavior, I am rolling back the merge of "CP932" and "SJIS-win".

It seems doubtful whether these two text encodings should be kept separate or merged in a future release. An extensive discussion of the related historical background and compatibility issues involved can be found in this GitHub thread:

https://github.com/php/php-src/issues/8308

Created at 1 month ago
issue comment
Major overhaul of mbstring (part 25)

Merged. Thanks all very much for the reviews. It was appreciated.

Created at 1 month ago