Topic: InvalidStringData: strings in documents must be valid UTF-8

Hello again,

I am trying to import the EL8 modular repo using Pulp 2.21, and it fails with the error below. The first byte, \x96, is not valid UTF-8.

Unfortunately I cannot tell which file in the repodata this comes from, and I am hoping you may have a hint and possibly a fix.

InvalidStringData: strings in documents must be valid UTF-8: '\x96\x00\x00\x00\x04remi-8.0\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00\x04remi-7.4\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00\x04remi-7.3\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00\x04remi-8.1\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00\x04remi-7.2\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00\x00'
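
For what it's worth, here is a minimal sketch for inspecting the value quoted in the error, using only the bytes shown above. It checks that the value really is not valid UTF-8, and it also reads the first four bytes as a little-endian 32-bit integer. That integer matches the total length of the value, which is how a BSON document begins, so this may be a binary blob rather than mis-encoded text. Whether Pulp actually stores the value that way is an assumption on my part; the helper name `is_valid_utf8` is only illustrative.

```python
import struct

# Value copied from the error message, split per entry for readability.
raw = (
    b"\x96\x00\x00\x00"
    b"\x04remi-8.0\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00"
    b"\x04remi-7.4\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00"
    b"\x04remi-7.3\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00"
    b"\x04remi-8.1\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00"
    b"\x04remi-7.2\x00\x13\x00\x00\x00\x020\x00\x07\x00\x00\x00common\x00\x00"
    b"\x00"
)


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Interpret the first four bytes as a little-endian signed 32-bit integer.
declared_length = struct.unpack("<i", raw[:4])[0]

print(is_valid_utf8(raw))         # False: 0x96 cannot start a UTF-8 sequence
print(is_valid_utf8(raw[4:]))     # True: everything after the prefix is ASCII
print(declared_length, len(raw))  # 150 150
```

If that reading is right, only the length prefix is at fault: 0x96 is 150, and any prefix byte of 0x80 or above would fail the same check. That is a guess from the bytes alone, not something I have confirmed in Pulp.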

Thank you

Tom

Re: InvalidStringData: strings in documents must be valid UTF-8

Sorry, I don't know Pulp, and I don't even know whether it supports modular repositories...


P.S.1 The modular metadata seems fine to me, though, as it can be consumed by dnf and by all users of the repository.
P.S.2 I also notice that Pulp has been removed from the Fedora repository (at F30 time, 3 years ago).

Desktop: Fedora 35 + rpmfusion + remi-test + remi-dev
Laptop:  Fedora 34 + rpmfusion + remi (SCL only)
Hosting Server: CentOS 8 Stream with EPEL, rpmfusion, remi

Re: InvalidStringData: strings in documents must be valid UTF-8

I don't see any significant difference from the data in the official repo:

---
document: modulemd-defaults
version: 1
data:
  module: php
  stream: "7.2"
  profiles:
    7.2: [common]
    7.3: [common]
    7.4: [common]
    8.0: [common]
...

and from my repo:

---
document: modulemd-defaults
version: 1
data:
  module: php
  stream: "7.2"
  profiles:
    remi-7.2: [common]
    remi-7.3: [common]
    remi-7.4: [common]
    remi-8.0: [common]
    remi-8.1: [common]
...

P.S. The only difference was the missing "" around the default stream, which I have just fixed.


Re: InvalidStringData: strings in documents must be valid UTF-8

Thank you for investigating.

I know Pulp 2 is old. The server is running CentOS 7, and that is the version that ships for EL7.

It doesn't support modular repos explicitly, but it doesn't choke on them either. I have mirrored all of AlmaLinux 8 and CentOS 8 with no issues, other than having to ensure that enough old package versions are retained so that the RPMs belonging to older modules are not deleted.

I will check whether your fix changed anything on my end and, if not, will keep investigating.

Are all of the files you are serving supposed to be encoded as UTF-8? Some of the forums I have been reading suggest that the problem can occur if a file is in a different encoding (Latin-1, for example) and is read as UTF-8.
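
To test that hypothesis on a downloaded metadata file, something like the following rough sketch could be used. It reports the offset of the first byte that breaks UTF-8 decoding, or None if the whole file decodes cleanly. The names `first_invalid_utf8_offset` and `check_file` are my own, and handling `.gz` files through `gzip.open` is an assumption about how the repodata files are compressed.

```python
import gzip
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path: str) -> Optional[int]:
    """Check one file on disk, transparently decompressing .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return first_invalid_utf8_offset(handle.read())


# The same text encoded two ways: UTF-8 passes, Latin-1 fails at the accent.
print(first_invalid_utf8_offset("café".encode("utf-8")))    # None
print(first_invalid_utf8_offset("café".encode("latin-1")))  # 3
```

If every file in the repodata comes back as None, a wrong file encoding is unlikely to be the cause.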

I do really appreciate your time (and the work you do on your repos).

Thank you.

Re: InvalidStringData: strings in documents must be valid UTF-8

> Are all of the files you are serving supposed to be encoded as UTF-8?

Yes.
