From: "Sebert, Holger.ext" <holger.sebert.ext@karlstorz.com>
To: "Reyna, David" <david.reyna@windriver.com>,
	"toaster@lists.yoctoproject.org" <toaster@lists.yoctoproject.org>
Subject: Re: Database erros due to UTF-8 filenames
Date: Thu, 26 Nov 2020 17:32:32 +0000
Message-ID: <46f0a394220a4858bff684e092ba35b8@karlstorz.com>
In-Reply-To: <BY5PR11MB416707D79A8F958CA323BCA2EAE30@BY5PR11MB4167.namprd11.prod.outlook.com>

Hi David,

As far as I can tell, Toaster does not set the character set and collation
itself but relies on the server defaults.

The problem can be solved by passing appropriate parameters when starting
the MySQL server, like so:

    docker run -dit --network host --name running-toaster-db toaster-db --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

If this is the right solution, maybe we can put this somewhere in the documentation?
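
For the client side of the same setting, here is a minimal sketch of a Django
database entry that asks the MySQL driver for a utf8mb4 connection. The
database name, user, password and host are hypothetical placeholders, not
values from Toaster's settings. This only affects the connection; the tables
still get whatever character set the server or database default provides.

```python
# Sketch of a DATABASES entry for a Django settings file.
# NAME, USER, PASSWORD and HOST are hypothetical placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "toaster_data",
        "USER": "toaster",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "3306",
        # Passed through to the MySQL driver when the connection is opened.
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```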

Best,
Holger
________________________________________
From: Reyna, David <david.reyna@windriver.com>
Sent: Monday, 16 November 2020 14:17:53
To: Sebert, Holger.ext; toaster@lists.yoctoproject.org
Subject: RE: Database erros due to UTF-8 filenames

Hi Holger,

This is an interesting problem. I will investigate.

We should see if there are any other localization fields that might have to support UTF-8 strings. Certainly all local path names will need to be supported.

I am also curious about how the local time zone support is working for you.

David

-----Original Message-----
From: toaster@lists.yoctoproject.org <toaster@lists.yoctoproject.org> On Behalf Of Sebert, Holger.ext
Sent: Monday, November 16, 2020 4:57 AM
To: toaster@lists.yoctoproject.org
Subject: [Toaster] Database erros due to UTF-8 filenames

Hi,

I've set up Toaster and a MySQL Docker container, all running on Ubuntu 16.04.
I am encountering the following database error when building my Yocto project:

        ERROR: (1366, "Incorrect string value: '\\xC5\\x91tan\\xC3...' for column 'path' at row 1")
        Traceback (most recent call last):
          File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
                return self.cursor.execute(sql, params)
          File "/usr/local/lib/python3.7/dist-packages/django/db/backends/mysql/base.py", line 71, in execute
                return self.cursor.execute(query, args)
          File "/usr/local/lib/python3.7/dist-packages/MySQLdb/cursors.py", line 206, in execute
                res = self._query(query)
          File "/usr/local/lib/python3.7/dist-packages/MySQLdb/cursors.py", line 319, in _query
                db.query(q)
          File "/usr/local/lib/python3.7/dist-packages/MySQLdb/connections.py", line 260, in query
                _mysql.connection.query(self, query)
        MySQLdb._exceptions.OperationalError: (1366, "Incorrect string value: '\\xC5\\x91tan\\xC3...' for column 'path' at row 1")

The query that raised this error looks as follows:

        INSERT INTO `orm_target_file`
                (`target_id`, `path`, `size`, `inodetype`, `permission`,
                `owner`, `group`, `directory_id`, `sym_target_id`)
        VALUES (19,
                '/usr/share/ca-certificates/mozilla/NetLock_Arany_=Class_Gold=_F\xc5\x91tan\xc3\xbas\xc3\xadtv\xc3\xa1ny.crt',
                1476, 1, 'rw-r--r--', 'root', 'root', NULL, NULL)

The file causing this error has the following UTF-8 encoded filename:

        NetLock_Arany_=Class_Gold=_Főtanúsítvány.crt

When looking into the database I found out that the column `path` of table
`orm_target_file` has the following properties:

        CHARACTER_SET_NAME: latin1
        COLLATION_NAME: latin1_swedish_ci
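
The mismatch can be reproduced outside the database. The following sketch
uses only the Python standard library. It assumes that Python's cp1252 codec
is a fair stand-in for MySQL's latin1, and the helper `fits_charset` is a
hypothetical name introduced for illustration.

```python
# The file name from the failing INSERT, as a Python string.
name = "NetLock_Arany_=Class_Gold=_Főtanúsítvány.crt"

# UTF-8 encodes 'ő' (U+0151) as the two bytes C5 91 that show up in the
# error message ("\xC5\x91tan\xC3...").
utf8_bytes = name.encode("utf-8")
print("ő".encode("utf-8"))                    # b'\xc5\x91'
print(b"\xc5\x91tan\xc3\xbas" in utf8_bytes)  # True


def fits_charset(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Assumption: cp1252 approximates MySQL's latin1 character set.
print(fits_charset(name, "cp1252"))  # False, 'ő' has no cp1252 code point
print(fits_charset(name, "utf-8"))   # True
```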

Apparently, the column `path` is not ready for UTF-8 strings. I can fix that
manually by running the following statement in the `mysql` client:

        ALTER TABLE orm_target_file
        CONVERT TO CHARACTER SET utf8
        COLLATE utf8_general_ci;

This change makes the database error disappear.

I would like to fix that directly in Toaster's `orm/models.py`. I found the
following definition in class `Target_File`:

    path = models.FilePathField()

It seems like I need to pass some clever options to `FilePathField`, but which?
My own research in that direction has brought up nothing useful so far.

My questions are thus:

* How can I parametrize `FilePathField` to properly handle UTF-8 encoded
  filenames in the underlying database?

* What should a corresponding migration file in `orm/migrations` look like?
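
One possible direction for the second question, as a sketch under
assumptions rather than a tested Toaster change: generate the conversion
statement in Python and hand it to a migration. The helper
`convert_table_sql` is a hypothetical name. It defaults to utf8mb4 instead
of the utf8 used in the manual fix above, because utf8mb4 covers all of
Unicode.

```python
import re

# Only plain identifiers are accepted, since the values are interpolated
# into the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def convert_table_sql(table, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build a MySQL statement converting one table to the given charset."""
    for ident in (table, charset, collation):
        if not _IDENTIFIER.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
    )


print(convert_table_sql("orm_target_file"))
# ALTER TABLE `orm_target_file` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

# In a migration file the string could be passed to Django's RunSQL
# operation, for example:
#   operations = [
#       migrations.RunSQL(convert_table_sql("orm_target_file"),
#                         reverse_sql=migrations.RunSQL.noop),
#   ]
```

The statement is MySQL-specific, so a real migration would probably have to
skip it when Toaster runs on another database backend such as SQLite.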

Thanks!

Best,
Holger
