Rechercher dans le manuel MySQL
10.4 Connection Character Sets and Collations
A “connection” is what a client program makes when it connects to the server, to begin a session within which it interacts with the server. The client sends SQL statements, such as queries, over the session connection. The server sends responses, such as result sets or error messages, over the connection back to the client.
Connection Character Set and Collation System Variables
Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections:
The
character_set_server
andcollation_server
system variables indicate the server character set and collation. See Section 10.3.2, “Server Character Set and Collation”.The
character_set_database
andcollation_database
system variables indicate the character set and collation of the default database. See Section 10.3.3, “Database Character Set and Collation”.
Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has session-specific connection-related character set and collation system variables. These session system variable values are initialized at connect time, but can be changed within the session.
Several questions about character set and collation handling for client connections can be answered in terms of system variables:
What character set are statements in when they leave the client?
The server takes the
character_set_client
system variable to be the character set in which statements are sent by the client.NoteSome character sets cannot be used as the client character set. See Impermissible Client Character Sets.
What character set should the server translate statements to after receiving them?
To determine this, the server uses the
character_set_connection
andcollation_connection
system variables:The server converts statements sent by the client from
character_set_client
tocharacter_set_connection
. Exception: For string literals that have an introducer such as_utf8mb4
or_latin2
, the introducer determines the character set. See Section 10.3.8, “Character Set Introducers”.collation_connection
is important for comparisons of literal strings. For comparisons of strings with column values,collation_connection
does not matter because columns have their own collation, which has a higher collation precedence (see Section 10.8.4, “Collation Coercibility in Expressions”).
What character set should the server translate query results to before shipping them back to the client?
The
character_set_results
system variable indicates the character set in which the server returns query results to the client. This includes result data such as column values, result metadata such as column names, and error messages.To tell the server to perform no conversion of result sets or error messages, set
character_set_results
toNULL
orbinary
:For more information about character sets and error messages, see Section 10.6, “Error Message Character Set”.
To see the values of the character set and collation system variables that apply to the current session, use this statement:
- 'character_set_client', 'character_set_connection',
- 'character_set_results', 'collation_connection'
The following simpler statements also display the connection variables, but include other related variables as well. They can be useful to see all character set and collation system variables:
Clients can fine-tune the settings for these variables, or depend on the defaults (in which case, you can skip the rest of this section). If you do not use the defaults, you must change the character settings for each connection to the server.
The character_set_client
system
variable cannot be set to certain character sets:
ucs2
utf16
utf16le
utf32
Attempting to use any of those character sets as the client character set produces an error:
- ERROR 1231 (42000): Variable 'character_set_client'
- can't be set to the value of 'ucs2'
The same error occurs if any of those character sets are used in
the following contexts, all of which result in an attempt to set
character_set_client
to the
named character set:
The
--default-character-set=
command option used by MySQL client programs such as mysql and mysqladmin.charset_name
The
SET NAMES '
statement.charset_name
'The
SET CHARACTER SET '
statement.charset_name
'
When a client connects to the server, it indicates which
character set it wants to use for communication with the server.
(Actually, the client indicates the default collation for that
character set, from which the server can determine the character
set.) The server uses this information to set the
character_set_client
,
character_set_results
,
character_set_connection
system
variables to the character set, and
collation_connection
to the
character set default collation. In effect, the server performs
the equivalent of a SET NAMES
operation.
If the server does not support the requested character set or collation, it falls back to using the server character set and collation to configure the connection. For additional detail about this fallback behavior, see Connection Character Set Error Handling.
The mysql, mysqladmin, mysqlcheck, mysqlimport, and mysqlshow client programs determine the default character set to use as follows:
In the absence of other information, each client uses the compiled-in default character set, usually
utf8mb4
.Each client can autodetect which character set to use based on the operating system setting, such as the value of the
LANG
orLC_ALL
locale environment variable on Unix systems or the code page setting on Windows systems. For systems on which the locale is available from the OS, the client uses it to set the default character set rather than using the compiled-in default. For example, settingLANG
toru_RU.KOI8-R
causes thekoi8r
character set to be used. Thus, users can configure the locale in their environment for use by MySQL clients.The OS character set is mapped to the closest MySQL character set if there is no exact match. If the client does not support the matching character set, it uses the compiled-in default. For example,
utf8
andutf-8
map toutf8mb4
, anducs2
is not supported as a connection character set, so it maps to the compiled-in default.C applications can use character set autodetection based on the OS setting by invoking
mysql_options()
as follows before connecting to the server:mysql_options(mysql, MYSQL_SET_CHARSET_NAME, MYSQL_AUTODETECT_CHARSET_NAME);
Each client supports a
--default-character-set
option, which enables users to specify the character set explicitly to override whatever default the client otherwise determines.NoteSome character sets cannot be used as the client character set. Attempting to use them with
--default-character-set
produces an error. See Impermissible Client Character Sets.
With the mysql client, to use a character set
different from the default, you could explicitly execute a
SET NAMES
statement every time
you connect to the server (see
Client Program Connection Character Set Configuration). To
accomplish the same result more easily, specify the character
set in your option file. For example, the following option file
setting changes the three connection-related character set
system variables set to koi8r
each time you
invoke mysql:
[mysql]
default-character-set=koi8r
If you are using the mysql client with
auto-reconnect enabled (which is not recommended), it is
preferable to use the charset
command rather
than SET NAMES
. For example:
The charset
command issues a
SET NAMES
statement, and also
changes the default character set that mysql
uses when it reconnects after the connection has dropped.
When configuration client programs, you must also consider the environment within which they execute. See Section 10.5, “Configuring Application Character Set and Collation”.
After a connection has been established, clients can change the
character set and collation system variables for the current
session. These variables can be changed individually using
SET
statements, but two more convenient statements affect the
connection-related character set sytem variables as a group:
SET NAMES '
charset_name
' [COLLATE 'collation_name
']SET NAMES
indicates what character set the client will use to send SQL statements to the server. Thus,SET NAMES 'cp1251'
tells the server, “future incoming messages from this client are in character setcp1251
.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use aSELECT
statement that produces a result set.)A
SET NAMES '
statement is equivalent to these three statements:charset_name
'Setting
character_set_connection
tocharset_name
also implicitly setscollation_connection
to the default collation forcharset_name
. It is unnecessary to set that collation explicitly. To specify a particular collation to use forcollation_connection
, add aCOLLATE
clause:SET CHARACTER SET '
'charset_name
SET CHARACTER SET
is similar toSET NAMES
but setscharacter_set_connection
andcollation_connection
tocharacter_set_database
andcollation_database
(which, as mentioned previously, indicate the character set and collation of the default database).A
SET CHARACTER SET
statement is equivalent to these three statements:charset_name
Setting
collation_connection
also implicitly setscharacter_set_connection
to the character set associated with the collation (equivalent to executingSET character_set_connection = @@character_set_database
). It is unnecessary to setcharacter_set_connection
explicitly.
Some character sets cannot be used as the client character
set. Attempting to use them with SET
NAMES
or SET CHARACTER
SET
produces an error. See
Impermissible Client Character Sets.
Example: Suppose that column1
is defined as
CHAR(5) CHARACTER SET latin2
. If you do not
say SET NAMES
or
SET CHARACTER SET
, then for
SELECT column1 FROM t
, the server sends back
all the values for column1
using the
character set that the client specified when it connected. On
the other hand, if you say SET NAMES 'latin1'
or SET CHARACTER SET 'latin1'
before issuing
the SELECT
statement, the server
converts the latin2
values to
latin1
just before sending results back.
Conversion may be lossy for characters that are not in both
character sets.
Attempts to use an inappropriate connection character set or collation can produce an error, or cause the server to fall back to its default character set and collation for a given connection. This section describes problems that can occur when configuring the connection character set. These problems can occur when establishing a connection or when changing the character set within an established connection.
Connect-Time Error Handling
Some character sets cannot be used as the client character set; see Impermissible Client Character Sets. If you specify a character set that is valid but not permitted as a client character set, the server returns an error:
shell> mysql --default-character-set=ucs2
ERROR 1231 (42000): Variable 'character_set_client' can't be set to
the value of 'ucs2'
If you specify a character set that the client does not recognize, it produces an error:
shell> mysql --default-character-set=bogus
mysql: Character set 'bogus' is not a compiled character set and is
not specified in the '/usr/local/mysql/share/charsets/Index.xml' file
ERROR 2019 (HY000): Can't initialize character set bogus
(path: /usr/local/mysql/share/charsets/)
If you specify a character set that the client recognizes but
the server does not, the server falls back to its default
character set and collation. Suppose that the server is
configured to use latin1
and
latin1_swedish_ci
as its defaults, and that
it does not recognize gb18030
as a valid
character set. A client that specifies
--default-character-set=gb18030
is
able to connect to the server, but the resulting character set
is not what the client wants:
- +--------------------------+--------+
- +--------------------------+--------+
- | character_set_client | latin1 |
- | character_set_connection | latin1 |
- ...
- | character_set_results | latin1 |
- ...
- +--------------------------+--------+
- +----------------------+-------------------+
- +----------------------+-------------------+
- | collation_connection | latin1_swedish_ci |
- +----------------------+-------------------+
You can see that the connection system variables have been set
to reflect a character set and collation of
latin1
and
latin1_swedish_ci
. This occurs because the
server cannot satisfy the client character set request and falls
back to its defaults.
In this case, the client cannot use the character set that it wants because the server does not support it. The client must either be willing to use a different character set, or connect to a different server that supports the desired character set.
The same problem occurs in a more subtle context: When the
client tells the server to use a character set that the server
recognizes, but the default collation for that character set on
the client side is not known on the server side. This occurs,
for example, when a MySQL 8.0 client wants to connect to a MySQL
5.7 server using utf8mb4
as the client
character set. A client that specifies
--default-character-set=utf8mb4
is
able to connect to the server. However, as in the previous
example, the server falls back to its default character set and
collation, not what the client requested:
- +--------------------------+--------+
- +--------------------------+--------+
- | character_set_client | latin1 |
- | character_set_connection | latin1 |
- ...
- | character_set_results | latin1 |
- ...
- +--------------------------+--------+
- +----------------------+-------------------+
- +----------------------+-------------------+
- | collation_connection | latin1_swedish_ci |
- +----------------------+-------------------+
Why does this occur? After all, utf8mb4
is
known to the 8.0 client and the 5.7 server, so both of them
recognize it. To understand this behavior, it is necessary to
understand that when the client tells the server which character
set it wants to use, it really tells the server the default
collation for that character set. Therefore, the aforementioned
behavior occurs due to a combination of factors:
The default collation for
utf8mb4
differs between MySQL 5.7 and 8.0 (utf8mb4_general_ci
for 5.7,utf8mb4_0900_ai_ci
for 8.0).When the 8.0 client requests a character set of
utf8mb4
, what it sends to the server is the default 8.0utf8mb4
collation; that is, theutf8mb4_0900_ai_ci
.utf8mb4_0900_ai_ci
is implemented only as of MySQL 8.0, so the 5.7 server does not recognize it.Because the 5.7 server does not recognize
utf8mb4_0900_ai_ci
, it cannot satisfy the client character set request, and falls back to its default character set and collation (latin1
andlatin1_swedish_ci
).
In this case, the client can still use
utf8mb4
by issuing a SET NAMES
'utf8mb4'
statement after connecting. The resulting
collation is the 5.7 default utf8mb4
collation; that is, utf8mb4_general_ci
. If
the client additionally wants a collation of
utf8mb4_0900_ai_ci
, it cannot achieve that
because the server does not recognize that collation. The client
must either be willing to use a different
utf8mb4
collation, or connect to a server
from MySQL 8.0 or higher.
Runtime Error Handling
Within an established connection, the client can request a
change of connection character set and collation with
SET NAMES
or
SET CHARACTER SET
.
Some character sets cannot be used as the client character set; see Impermissible Client Character Sets. If you specify a character set that is valid but not permitted as a client character set, the server returns an error:
- ERROR 1231 (42000): Variable 'character_set_client' can't be set to
- the value of 'ucs2'
If the server does not recognize the character set (or the collation), it produces an error:
A client that wants to verify whether its requested character set was honored by the server can execute the following statement after connecting and checking that the result is the expected character set:
- SELECT @@character_set_client;
Traduction non disponible
Le manuel MySQL n'est pas encore traduit en français sur l'infobrol. Seule la version anglaise est disponible pour l'instant.
Document créé le 26/06/2006, dernière modification le 26/10/2018
Source du document imprimé : https://www.gaudry.be/mysql-rf-charset-connection.html
L'infobrol est un site personnel dont le contenu n'engage que moi. Le texte est mis à disposition sous licence CreativeCommons(BY-NC-SA). Plus d'info sur les conditions d'utilisation et sur l'auteur.
Références
Ces références et liens indiquent des documents consultés lors de la rédaction de cette page, ou qui peuvent apporter un complément d'information, mais les auteurs de ces sources ne peuvent être tenus responsables du contenu de cette page.
L'auteur de ce site est seul responsable de la manière dont sont présentés ici les différents concepts, et des libertés qui sont prises avec les ouvrages de référence. N'oubliez pas que vous devez croiser les informations de sources multiples afin de diminuer les risques d'erreurs.