mysql varchar和char类型在不同字符集下的字符数和字节数计算

时间:2023-01-09 22:51:50

场景一:CHAR(30)的最大字符数和最大字节数计算

1.CHAR(30)在UTF-8字符集下计算最大字符数和最大字节数
(product)root@localhost [test]> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` char(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

(product)root@localhost [test]> insert into t1 values(repeat('王',30));

Query OK, 1 row affected (0.00 sec)

(product)root@localhost [test]> select * from t1;
+--------------------------------------------------------------------------------------------+
| c1                                                                                         |
+--------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa                                                             |
| 王王王王王王王王王王王王王王王王王王王王王王王王王王王王王王                               |
+--------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

(product)root@localhost [test]> select char_length(c1) , length(c1) from t1;

+-----------------+------------+
| char_length(c1) | length(c1) |
+-----------------+------------+
|              30 |         30 |
|              30 |         90 |
+-----------------+------------+
2 rows in set (0.00 sec)

上面计算可看到CHAR(30)在UTF-8字符集下计算最大字符数和最大字节数分别是30和90(放入汉字时),而存入字母时分别为30和30。


2.CHAR(30)在GBK字符集下计算最大字符数和最大字节数

CREATE TABLE `t2` (
  `c1` char(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=gbk;

insert into t2 values(repeat('a',30));
insert into t2 values(repeat('王',30));

(product)root@localhost [test]> select * from t2;
+--------------------------------------------------------------------------------------------+
| c1                                                                                         |
+--------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa                                                             |
| 王王王王王王王王王王王王王王王王王王王王王王王王王王王王王王                               |
+--------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)


product)root@localhost [test]> select char_length(c1) , length(c1) from t2;
+-----------------+------------+
| char_length(c1) | length(c1) |
+-----------------+------------+
|              30 |         30 |
|              30 |         60 |
+-----------------+------------+

上面计算可看到CHAR(30)在UTF-8字符集下计算最大字符数和最大字节数分别是30和60(放入汉字时),而存入字母时分别为30和30。


3.CHAR(30)在latin1字符集下计算最大字符数和最大字节数
CREATE TABLE `t3` (
  `c1` char(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into t3 values(repeat('a',30));

(product)root@localhost [test]> select * from t3;
+--------------------------------+
| c1                             |
+--------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
+--------------------------------+
1 row in set (0.00 sec)

(product)root@localhost [test]> select char_length(c1) , length(c1) from t3;
+-----------------+------------+
| char_length(c1) | length(c1) |
+-----------------+------------+
|              30 |         30 |
+-----------------+------------+
1 row in set (0.00 sec)

上面计算可看到CHAR(30)在latin1字符集下计算最大字符数和最大字节数分别是30和30

4.CHAR(30)在latin1字符集下能存入UTF8编码的多少汉字,长度是多少

(product)root@localhost [test]> set names latin1;
Query OK, 0 rows affected (0.00 sec)

(product)root@localhost [test]>\s
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    latin1
Conn.  characterset:    latin1

(product)root@localhost [test]> insert into t3 values(repeat('王',10));
Query OK, 1 row affected (0.00 sec)

(product)root@localhost [test]>  select * from t3;;
+--------------------------------+
| c1                             |
+--------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
| 王王王王王王王王王王           |
+--------------------------------+
2 rows in set (0.00 sec)

product)root@localhost [test]> select char_length(c1) , length(c1) from t3;
+-----------------+------------+
| char_length(c1) | length(c1) |
+-----------------+------------+
|              30 |         30 |
|              30 |         30 |
+-----------------+------------+
2 rows in set (0.00 sec)

(product)root@localhost [test]> insert into t3 values(repeat('王',11));
ERROR 1406 (22001): Data too long for column 'c1' at row 1

上面计算可看到CHAR(30)在latin1字符集下能存入UTF8编码的10个汉字,长度是30

5.CHAR(30)在latin1字符集下能存入gbk编码的多少汉字,长度是多少

先在my.cnf文件中更改数据字符集:
character-set-server = gbk

(product)root@localhost [test]>\s
Server characterset:    gbk
Db     characterset:    gbk
Client characterset:    utf8
Conn.  characterset:    utf8


(product)root@localhost [test]> set names latin1;
Query OK, 0 rows affected (0.00 sec)

(product)root@localhost [test]>\s
Server characterset:    gbk
Db     characterset:    gbk
Client characterset:    latin1
Conn.  characterset:    latin1

CREATE TABLE `t4` (
  `c1` char(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into t4 values(repeat('a',30));
insert into t4 values(repeat('王',11));

(product)root@localhost [test]> insert into t4 values(repeat('王',15));
ERROR 1406 (22001): Data too long for column 'c1' at row 1

(product)root@localhost [test]> insert into t4 values(repeat('王',11));
ERROR 1406 (22001): Data too long for column 'c1' at row 1

(product)root@localhost [test]> insert into t4 values(repeat('王',10));
ERROR 1406 (22001): Data too long for column 'c1' at row 1

理认上插入15是可以的,但最多只能插入10,好怪。这里留个问号,以后用python来测试。
.

场景二:VARCHAR(30)的最大字符数和最大字节数计算

1.VARCHAR(30)在utf8字符集下计算最大字符数和最大字节数

CREATE TABLE `t5` (
  `c1` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into t5 values(repeat('a',30));
insert into t5 values(repeat('王',30));

(product)root@localhost [test]> select * from t5;
+--------------------------------------------------------------------------------------------+
| c1                                                                                         |
+--------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa                                                             |
| 王王王王王王王王王王王王王王王王王王王王王王王王王王王王王王                               |
+--------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

(product)root@localhost [test]> select char_length(c1) , length(c1) from t5;
+-----------------+------------+
| char_length(c1) | length(c1) |
+-----------------+------------+
|              30 |         30 |
|              30 |         90 |
+-----------------+------------+
2 rows in set (0.00 sec)
上面计算可看到VARCHAR(30)在UTF-8字符集下计算字节数与CHAR(30)有点不同,varchar实际所占长度计算规则为:长度在1到255间,实际所占字节数加1,长度在255到65535间,
实际所占字节数需要加2。则最大字符数和最大字节数分别是30和91(放入汉字时),而存入字母时分别为30和31。

2.VARCHAR(30)在gbk字符集下计算最大字符数和最大字节数

CREATE TABLE `t6` (
  `c1` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=gbk;

insert into t6 values(repeat('a',30));
insert into t6 values(repeat('王',30));

(product)root@localhost [test]> select * from t6;
+--------------------------------------------------------------------------------------------+
| c1                                                                                         |
+--------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa                                                             |
| 王王王王王王王王王王王王王王王王王王王王王王王王王王王王王王                               |
+--------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

(product)root@localhost [test]> select char_length(c1) , length(c1) from t6;
+-----------------+------------+
| char_length(c1) | length(c1) |
+-----------------+------------+
|              30 |         30 |
|              30 |         60 |
+-----------------+------------+
2 rows in set (0.00 sec)
上面计算可看到VARCHAR(30)在GBK字符集下计算字节数与CHAR(30)有点不同,varchar实际所占长度计算规则为:长度在1到255间,实际所占字节数加1,长度在255到65535间,实际所占字节数需要加2。则最大字符数和最大字节数分别是30和61(放入汉字时),而存入字母时分别为30和31。

3.VARCHAR(30)在latin1字符集下计算最大字符数和最大字节数

CREATE TABLE `t7` (
  `c1` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

set names latin1

insert into t7 values(repeat('a',30));
insert into t7 values(repeat('王',10));

(product)root@localhost [test]> insert into t7 values(repeat('王',10));
Query OK, 1 row affected (0.00 sec)

(product)root@localhost [test]> select * from t7;
+--------------------------------+
| c1                             |
+--------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
| 王王王王王王王王王王           |
+--------------------------------+
2 rows in set (0.00 sec)

(product)root@localhost [test]> select char_length(c1) , length(c1) from t7;
+-----------------+------------+
| char_length(c1) | length(c1) |
+-----------------+------------+
|              30 |         30 |
|              30 |         30 |
+-----------------+------------+
2 rows in set (0.00 sec)
上面计算可看到VARCHAR(30)在GBK字符集下计算字节数与CHAR(30)有点不同,varchar实际所占长度计算规则为:长度在1到255间,实际所占字节数加1,长度在255到65535间,实际所占字节数需要加2。则最大字符数和最大字节数分别是30和31(放入汉字时),而存入字母时分别为30和31。