Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-ost Error 1366: Incorrect string value modify charset #1249

Closed
atanthowiz opened this issue Jan 19, 2023 · 12 comments
Closed

gh-ost Error 1366: Incorrect string value modify charset #1249

atanthowiz opened this issue Jan 19, 2023 · 12 comments

Comments

@atanthowiz
Copy link

atanthowiz commented Jan 19, 2023

Hi,

I tried to run gh-ost to modify charset in specific column of table. Eventually it failed with a message like this

**Error 1366: Incorrect string value: '\xA0887\xA04...' for column 'comments' at row 1**

for my table definition:

CREATE TABLE `tbtest` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL DEFAULT '',
  `comments` text NOT NULL,
   PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

and this one for gh-ost command :

gh-ost --user="root" \
--password="xxxxxxx" \
--host="xxxxxxx" \
--port="3306" \
--allow-on-master \
--gcp \
--assume-rbr \
--alter="MODIFY name varchar(255) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',MODIFY comments text CHARACTER SET utf8mb4"

I have tried the gh-ost with latest version (1.1.15), v1.1.2, and v1.0.48, but that doesn't solve the problem. I dont know this is related to this issue 267 or not, but that issue solved with v1.0.21. but I can't use v1.0.2 since I using gcp and in v1.0.2 the -gcp flag is not available.

Thank you!

@jondrummond-codeo
Copy link

We are encountering the same issue as above when changing collation to utf8mb4, is there a workaround for this?

@atanthowiz
Copy link
Author

atanthowiz commented Feb 2, 2023

If I'm not mistaken, this error appears because the data contains char windows_1252 \x80-\xFF. if I convert char to hex, there is a difference between latin1 and utf8.

mysql> CREATE TABLE `tbghost1` ( `id` int(11) NOT NULL AUTO_INCREMENT, `remark` text, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.07 sec)

mysql> CREATE TABLE `tbghost2` ( `id` int(11) NOT NULL AUTO_INCREMENT, `remark` text, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.02 sec)

mysql> insert into tbghost1(remark) values('”');
Query OK, 1 row affected (0.01 sec)

mysql> insert into tbghost2(remark) values('”');
Query OK, 1 row affected (0.00 sec)

mysql> select id,remark,hex(remark) from tbghost1;
+----+--------+-------------+
| id | remark | hex(remark) |
+----+--------+-------------+
|  1 | ”      | 94          |
+----+--------+-------------+
1 row in set (0.01 sec)

mysql> select id,remark,hex(remark) from tbghost2;
+----+--------+-------------+
| id | remark | hex(remark) |
+----+--------+-------------+
|  1 | ”      | E2809D      |
+----+--------+-------------+
1 row in set (0.00 sec)

mysql>

@jondrummond-codeo
Copy link

It looks like a fix may have been done for this (which was merged to master in September 2022) - #1158 - but there has not been a release including this fix since then.

@atanthowiz
Copy link
Author

ah that's right, I've tried it for a small table, which has a size of ~2GB, and no problem appears. but for a large table with a size of ~70GB, I got an error incorrect string value for char \x80-\xFF, I don't know why.

@jondrummond-codeo
Copy link

We got ours to work using a different SQL statement in the gh-ost query, maybe try the following and see if it also works for you.

change column name name varchar(255) not null default '', character set utf8mb4 collate utf8mb4_unicode_ci

@atanthowiz
Copy link
Author

still error, if the column contains one of \x80-\xFF

@jondrummond-codeo
Copy link

I can also confirm that we encounter the issue when the table size is large, it currently fails on a ~900GB table we are attempting to do a migration on. 😢 If I take the offending row that it fails on and add to a temp table, then gh-ost runs and completes correctly.

@timvaillancourt
Copy link
Collaborator

It looks like a fix may have been done for this (which was merged to master in September 2022) - #1158 - but there has not been a release including this fix since then.

@jondrummond-codeo / @atanthowiz if possible, could you test on HEAD of master to validate #1158 fixes this? 🙇

@atanthowiz
Copy link
Author

atanthowiz commented Feb 8, 2023

I'm currently using this branch. previously I ran gh-ost with chunk-size=100 on high traffic and tried it also on low traffic, but always got error incorrect string value for char \x80-\xFF. And yesterday i tried to increase chunk-size to 500, and ran it on low traffic, and it worked, gh-ost ran successfully. I don't know, do the "chunk-size" and "low traffic" affect the gh-ost process?

updated:
but the gh-ost process is fails right now, when there's an update query to a row that has a column contains char \x80-\xFF.

@wangzihuacool
Copy link
Contributor

It has nothing to do with traffic and chunk-size. It may occurs , when an illegal latin1 character (such as windows_1252 \x80-\xFF) was inserted in a transaction just during changing the table.
#1158 is intended to fix the Error 1366 , without doing any characterset conversion.

@jondrummond-codeo
Copy link

Thanks @wangzihuacool - I see that #1158 was merged to master last year September, is there a plan to get this into a release any time soon or should we clone master and do the build ourselves to get around this?

@atanthowiz
Copy link
Author

I think this case is already solved.
I using this branch as mentioned before #1158 (comment), and I've tried migration several times, and it always works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants