Topic: CentOS 6 + PHP 5.6 JSON Troubles

Thanks for the great repository. It's been a life saver for our CentOS 6 installation.

I am currently upgrading a legacy system from PHP 5.4 to PHP 5.6. After completing the upgrade, I noticed that json_decode behaves differently, specifically in how it handles invalid "\u" Unicode escape sequences. I eventually traced the difference to the two PHP builds using different JSON extensions.

PHP 5.4 from Remi appears to use the built-in JSON support (or at least I don't see any RPM providing it). PHP reports it as version 1.2.1, which I assume is the legacy PECL extension: pecl/package/json
PHP 5.6 from Remi uses the jsonc module, which comes from the php-pecl-jsonc package. PHP reports it as version 1.3.10, which corresponds to the newer jsonc extension: peclpackage/jsonc

I am wondering if it is possible within the PHP 5.6 Remi environment to "downgrade" to the PHP JSON 1.2.1 behavior that we were seeing in the 5.4 environment. I tried to phpize & compile the official 1.2.1 json pecl package, but it ran into a compilation error:
/root/json-1.2.1/json.c:44: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘json_functions’
/root/json-1.2.1/json.c:58: error: ‘json_functions’ undeclared here (not in a function)

Here are the exact versions I am using:

CentOS release 6.9 x86_64
PHP 5.4.45 w/ JSON 1.2.1 -- this is coming from /enterprise/remi-release-6.rpm && remi-php54
PHP 5.6.31 w/ JSONC 1.3.10 -- this is coming from /enterprise/remi-release-6.rpm && remi-php56

In case you are curious, the JSON discrepancy I am seeing is the following.

php > echo phpversion();
5.4.45
php > echo phpversion('json');
1.2.1
php > $json = '"\ud88c\u2c2ehello\ud88c"';
php > $decoded = json_decode($json);
php > var_dump($decoded);
php shell code:1:
string(14) "���Ⱞhello���"
php > echo bin2hex($decoded), "\n";
eda28ce2b0ae68656c6c6feda28c

php > echo phpversion();
5.6.31
php > echo phpversion('json');
1.3.10
php > $json = '"\ud88c\u2c2ehello\ud88c"';
php > $decoded = json_decode($json);
php > var_dump($decoded);
php shell code:1:
string(14) "�Ⱞhello�"
php > echo bin2hex($decoded), "\n";
efbfbde2b0ae68656c6c6fefbfbd
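
To make the difference between the two hex dumps concrete, here is a minimal sketch that checks the leading three bytes of each result for UTF-8 validity. It assumes the pcre extension is available; is_valid_utf8 is a hypothetical helper name, not something either JSON extension provides. The bytes ed a2 8c are the lone surrogate U+D88C written out as a three-byte sequence, which is not valid UTF-8, while ef bf bd is the replacement character U+FFFD.

```php
<?php
// Leading three bytes copied from the two bin2hex() outputs above.
$fromJson121 = hex2bin('eda28c'); // PHP 5.4, JSON 1.2.1
$fromJsonc   = hex2bin('efbfbd'); // PHP 5.6, JSONC 1.3.10

// Hypothetical helper: preg_match() with the /u modifier returns false
// when the subject is not valid UTF-8, and 1 when the empty pattern matches.
function is_valid_utf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(is_valid_utf8($fromJson121)); // bool(false): surrogate encoded directly
var_dump(is_valid_utf8($fromJsonc));   // bool(true): U+FFFD replacement character
```

So, as far as I can tell, 1.2.1 passes the unpaired surrogate through as invalid UTF-8, while 1.3.10 substitutes the replacement character.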

Re: CentOS 6 + PHP 5.6 JSON Troubles

Long story.

The json extension in PHP 5 was not free software, so starting with PHP 5.5 (when we discovered this issue) we replaced it with a free implementation, which has some behavior changes.
=> https://pecl.php.net/package/jsonc

This is not perfect, but it is free.

PHP 7+ now has a new free "json" extension (version 1.4+)
=> https://pecl.php.net/package/jsond

Trying your sample with recent PHP versions:

$ php56 -r 'var_dump(json_decode("\"\ud88c\u2c2ehello\ud88c\""), json_last_error_msg(), phpversion("json"));'
string(14) "�Ⱞhello�"
string(8) "No error"
string(6) "1.3.10"

$ php70 -r 'var_dump(json_decode("\"\ud88c\u2c2ehello\ud88c\""), json_last_error_msg(), phpversion("json"));'
NULL
string(50) "Single unpaired UTF-16 surrogate in unicode escape"
string(5) "1.4.0"

$ php71 -r 'var_dump(json_decode("\"\ud88c\u2c2ehello\ud88c\""), json_last_error_msg(), phpversion("json"));'
NULL
string(50) "Single unpaired UTF-16 surrogate in unicode escape"
string(5) "1.5.0"

$ php72 -r 'var_dump(json_decode("\"\ud88c\u2c2ehello\ud88c\""), json_last_error_msg(), phpversion("json"));'
NULL
string(50) "Single unpaired UTF-16 surrogate in unicode escape"
string(5) "1.6.0"

So your input doesn't seem to be valid, and the answer is mostly "undefined behavior".
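
A minimal sketch of how application code could treat such input as an error, rather than depend on whatever bytes a given decoder emits. decode_json_or_fail is a hypothetical helper name, not part of any extension, and the exception type is just one reasonable choice.

```php
<?php
// Hypothetical helper: decode JSON and throw on any decoder error.
function decode_json_or_fail($json, $assoc = false)
{
    $value = json_decode($json, $assoc);

    // json_decode() also returns NULL for the valid document "null",
    // so check json_last_error() rather than the return value.
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new InvalidArgumentException(
            'Invalid JSON (error code ' . json_last_error() . ')'
        );
    }

    return $value;
}
```

With the decoders in the php70, php71 and php72 runs above, the unpaired surrogate sample would take the exception path. With jsonc 1.3.10 on php56 it would not, because that decoder reports "No error" for the same input.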

Laptop:  Fedora 38 + rpmfusion + remi (SCL only)
x86_64 builder: Fedora 39 + rpmfusion + remi-test
aarch64 builder: RHEL 9 with EPEL
Hosting Server: CentOS 8 Stream with EPEL, rpmfusion, remi

Re: CentOS 6 + PHP 5.6 JSON Troubles

Note that you can also take advantage of the newer extension with old PHP versions.

$ php56  -r 'var_dump(jsond_decode("\"\ud88c\u2c2ehello\ud88c\""), jsond_last_error_msg(), phpversion("jsond"));'
NULL
string(50) "Single unpaired UTF-16 surrogate in unicode escape"
string(5) "1.4.0"

BTW, this uses different function names.

P.S. I have just updated the jsond extension to the latest version, 1.4.0, in the repository.

Re: CentOS 6 + PHP 5.6 JSON Troubles

> BTW, this uses different function names.

$ php56 -r '
// Map the standard json_* names onto the jsond_* functions.
// This only works when no loaded extension already defines json_*;
// otherwise PHP stops with a "Cannot redeclare" fatal error.
function json_decode($json, $assoc=false, $depth=512, $opt=0) {
  return jsond_decode($json, $assoc, $depth, $opt);
}
function json_encode($value, $options=0, $depth=512) {
  return jsond_encode($value, $options, $depth);
}
function json_last_error_msg() {
  return jsond_last_error_msg();
}

var_dump(json_encode("foo"), json_decode("\"\ud88c\u2c2ehello\ud88c\""), json_last_error_msg(), phpversion("jsond"));
'
string(5) ""foo""
NULL
string(50) "Single unpaired UTF-16 surrogate in unicode escape"
string(5) "1.4.0"

;)
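
Since PHP refuses to redeclare json_decode() while another extension that provides it is loaded, a less fragile variant is to call wrappers with their own names. This is only a sketch: the app_json_* names are hypothetical and not part of any extension. Each wrapper prefers jsond when it is loaded and falls back to the stock function otherwise.

```php
<?php
// Hypothetical application-level wrappers; the app_ names are illustrative only.

function app_json_decode($json, $assoc = false, $depth = 512, $options = 0)
{
    // Prefer the stricter jsond parser when the extension is loaded.
    if (function_exists('jsond_decode')) {
        return jsond_decode($json, $assoc, $depth, $options);
    }
    return json_decode($json, $assoc, $depth, $options);
}

function app_json_encode($value, $options = 0, $depth = 512)
{
    if (function_exists('jsond_encode')) {
        return jsond_encode($value, $options, $depth);
    }
    return json_encode($value, $options, $depth);
}

function app_json_last_error_msg()
{
    // Report the error from whichever decoder the wrappers above would use.
    return function_exists('jsond_last_error_msg')
        ? jsond_last_error_msg()
        : json_last_error_msg();
}
```

Application code then calls app_json_decode() everywhere, so which decoder is in use is decided in one place.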

Re: CentOS 6 + PHP 5.6 JSON Troubles

Fantastic! Thanks for the thorough reply. I'll give the latest jsond extension a try. I think the behavior of detecting the invalid character will suffice for our use case. And it matches PHP 7+ behavior, so whenever we upgrade to that we can expect similar results.