Rechercher dans le manuel MySQL
13.2.7 LOAD DATA Syntax
- INFILE 'file_name'
- ]
- ]
- [(col_name_or_user_var
- [, col_name_or_user_var] ...)]
The LOAD DATA
statement reads rows
from a text file into a table at a very high speed.
LOAD DATA
is the complement of
SELECT ... INTO
OUTFILE
. (See Section 13.2.10.1, “SELECT ... INTO Syntax”.) To write
data from a table to a file, use
SELECT ... INTO
OUTFILE
. To read the file back into a table, use
LOAD DATA
. The syntax of the
FIELDS
and LINES
clauses is
the same for both statements.
You can also load data files by using the
mysqlimport utility; see
Section 4.5.5, “mysqlimport — A Data Import Program”. mysqlimport
operates by sending a LOAD DATA
statement to the server.
For more information about the efficiency of
INSERT
versus
LOAD DATA
and speeding up
LOAD DATA
, see
Section 8.2.5.1, “Optimizing INSERT Statements”.
Partitioned Table Support
LOAD DATA
supports explicit
partition selection using the PARTITION
option with a list of one or more comma-separated names of
partitions, subpartitions, or both. When this option is used, if
any rows from the file cannot be inserted into any of the
partitions or subpartitions named in the list, the statement
fails with the error Found a row not matching the
given partition set. For more information and
examples, see Section 23.5, “Partition Selection”.
For partitioned tables using storage engines that employ table
locks, such as MyISAM
,
LOAD DATA
cannot prune any
partition locks. This does not apply to tables using storage
engines that employ row-level locking, such as
InnoDB
. For more information, see
Partitioning and Locking.
The file name must be given as a literal string. On Windows,
specify backslashes in path names as forward slashes or doubled
backslashes. The
character_set_filesystem
system
variable controls the interpretation of the file name character
set.
The server uses the character set indicated by the
character_set_database
system
variable to interpret the information in the file.
SET NAMES
and the setting of
character_set_client
do not
affect interpretation of input. If the contents of the input
file use a character set that differs from the default, it is
usually preferable to specify the character set of the file by
using the CHARACTER SET
clause. A character
set of binary
specifies “no
conversion.”
LOAD DATA
interprets all fields
in the file as having the same character set, regardless of the
data types of the columns into which field values are loaded.
For proper interpretation of file contents, you must ensure that
it was written with the correct character set. For example, if
you write a data file with mysqldump -T or by
issuing a SELECT
... INTO OUTFILE
statement in
mysql, be sure to use a
--default-character-set
option so that output
is written in the character set to be used when the file is
loaded with LOAD DATA
.
It is not possible to load data files that use the
ucs2
, utf16
,
utf16le
, or utf32
character set.
LOAD DATA
is considered unsafe
for statement-based replication. If you do use
LOAD DATA
when
binlog_format=STATEMENT
is set,
a temporary file containing the data is created on the
replication slave where the changes are applied. If binary log
encryption is active on the server, note that this temporary
file is not encrypted. When encryption is required, be sure to
use row-based or mixed binary logging format instead, which do
not create the temporary files. For more information on the
interaction between LOAD DATA
and
replication, see
Section 17.4.1.19, “Replication and LOAD DATA”.
If you use the LOW_PRIORITY
modifier,
execution of the LOAD DATA
statement is delayed until no other clients are reading from the
table. This affects only storage engines that use only
table-level locking (such as MyISAM
,
MEMORY
, and MERGE
).
If you specify the CONCURRENT
modifier with a
MyISAM
table that satisfies the condition for
concurrent inserts (that is, it contains no free blocks in the
middle), other threads can retrieve data from the table while
LOAD DATA
is executing. This
modifier affects the performance of LOAD
DATA
a bit, even if no other thread is using the table
at the same time.
The LOCAL
modifier affects expected location
of the file and error handling, as described later.
LOCAL
works only if your server and your
client both have been configured to permit it. For example, if
mysqld was started with the
local_infile
system variable
disabled, LOCAL
does not work. See
Section 6.1.6, “Security Issues with LOAD DATA LOCAL”.
The LOCAL
modifier affects where the file is
expected to be found:
If
LOCAL
is specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full path name to specify its exact location. If given as a relative path name, the name is interpreted relative to the directory in which the client program was started.When using
LOCAL
withLOAD DATA
, a copy of the file is created in the directory where the MySQL server stores temporary files. See Section B.4.3.5, “Where MySQL Stores Temporary Files”. Lack of sufficient space for the copy in this directory can cause theLOAD DATA LOCAL
statement to fail.If
LOCAL
is not specified, the file must be located on the server host and is read directly by the server. The server uses the following rules to locate the file:If the file name is an absolute path name, the server uses it as given.
If the file name is a relative path name with one or more leading components, the server searches for the file relative to the server's data directory.
If a file name with no leading components is given, the server looks for the file in the database directory of the default database.
In the non-LOCAL
case, these rules mean that
a file named as ./myfile.txt
is read from
the server's data directory, whereas the file named as
myfile.txt
is read from the database
directory of the default database. For example, if
db1
is the default database, the following
LOAD DATA
statement reads the
file data.txt
from the database directory
for db1
, even though the statement explicitly
loads the file into a table in the db2
database:
The server also uses the non-LOCAL
rules to
locate .sdi
files for the
IMPORT TABLE
statement.
Non-LOCAL
load operations read text files
located on the server. For security reasons, such operations
require that you have the FILE
privilege. See Section 6.2.2, “Privileges Provided by MySQL”. Also,
non-LOCAL
load operations are subject to the
secure_file_priv
system
variable setting. If the variable value is a nonempty directory
name, the file to be loaded must be located in that directory.
If the variable value is empty (which is insecure), the file
need only be readable by the server.
Using LOCAL
is a bit slower than letting the
server access the files directly, because the file contents must
be sent over the connection by the client to the server. On the
other hand, you do not need the
FILE
privilege to load local
files.
LOCAL
also affects error handling:
With
LOAD DATA
, data-interpretation and duplicate-key errors terminate the operation.With
LOAD DATA LOCAL
, data-interpretation and duplicate-key errors become warnings and the operation continues because the server has no way to stop transmission of the file in the middle of the operation. For duplicate-key errors, this is the same as ifIGNORE
is specified.IGNORE
is explained further later in this section.
The REPLACE
and IGNORE
modifiers control handling of input rows that duplicate existing
rows on unique key values:
If you specify
REPLACE
, input rows replace existing rows. In other words, rows that have the same value for a primary key or unique index as an existing row. See Section 13.2.9, “REPLACE Syntax”.If you specify
IGNORE
, rows that duplicate an existing row on a unique key value are discarded. For more information, see Comparison of the IGNORE Keyword and Strict SQL Mode.If you do not specify either modifier, the behavior depends on whether the
LOCAL
modifier is specified. WithoutLOCAL
, an error occurs when a duplicate key value is found, and the rest of the text file is ignored. WithLOCAL
, the default behavior is the same as ifIGNORE
is specified; this is because the server has no way to stop transmission of the file in the middle of the operation.
To ignore foreign key constraints during the load operation,
execute a SET foreign_key_checks = 0
statement before executing LOAD
DATA
.
If you use LOAD DATA
on an empty
MyISAM
table, all nonunique indexes are
created in a separate batch (as for REPAIR
TABLE
). Normally, this makes LOAD
DATA
much faster when you have many indexes. In some
extreme cases, you can create the indexes even faster by turning
them off with ALTER
TABLE ... DISABLE KEYS
before loading the file into
the table and using
ALTER TABLE ...
ENABLE KEYS
to re-create the indexes after loading the
file. See Section 8.2.5.1, “Optimizing INSERT Statements”.
For both the LOAD DATA
and
SELECT ... INTO
OUTFILE
statements, the syntax of the
FIELDS
and LINES
clauses
is the same. Both clauses are optional, but
FIELDS
must precede LINES
if both are specified.
If you specify a FIELDS
clause, each of its
subclauses (TERMINATED BY
,
[OPTIONALLY] ENCLOSED BY
, and
ESCAPED BY
) is also optional, except that you
must specify at least one of them. Arguments to these clauses
are permitted to contain only ASCII characters.
If you specify no FIELDS
or
LINES
clause, the defaults are the same as if
you had written this:
Backslash is the MySQL escape character within strings in SQL
statements. Thus, to specify a literal backslash, you must
specify two backslashes for the value to be interpreted as a
single backslash. The escape sequences '\t'
and '\n'
specify tab and newline characters,
respectively.
In other words, the defaults cause LOAD
DATA
to act as follows when reading input:
Look for line boundaries at newlines.
Do not skip any line prefix.
Break lines into fields at tabs.
Do not expect fields to be enclosed within any quoting characters.
Interpret characters preceded by the escape character
\
as escape sequences. For example,\t
,\n
, and\\
signify tab, newline, and backslash, respectively. See the discussion ofFIELDS ESCAPED BY
later for the full list of escape sequences.
Conversely, the defaults cause
SELECT ... INTO
OUTFILE
to act as follows when writing output:
Write tabs between fields.
Do not enclose fields within any quoting characters.
Use
\
to escape instances of tab, newline, or\
that occur within field values.Write newlines at the ends of lines.
For a text file generated on a Windows system, proper file
reading might require LINES TERMINATED BY
'\r\n'
because Windows programs typically use two
characters as a line terminator. Some programs, such as
WordPad, might use \r
as
a line terminator when writing files. To read such files, use
LINES TERMINATED BY '\r'
.
If all the input lines have a common prefix that you want to
ignore, you can use LINES STARTING BY
'
to skip the
prefix and anything before it. If a line
does not include the prefix, the entire line is skipped. Suppose
that you issue the following statement:
prefix_string
'
If the data file looks like this:
xxx"abc",1
something xxx"def",2
"ghi",3
The resulting rows will be ("abc",1)
and
("def",2)
. The third row in the file is
skipped because it does not contain the prefix.
The IGNORE
option can be used to ignore lines at the start
of the file. For example, you can use number
LINESIGNORE 1
LINES
to skip an initial header line containing column
names:
When you use SELECT
... INTO OUTFILE
in tandem with
LOAD DATA
to write data from a
database into a file and then read the file back into the
database later, the field- and line-handling options for both
statements must match. Otherwise, LOAD
DATA
will not interpret the contents of the file
properly. Suppose that you use
SELECT ... INTO
OUTFILE
to write a file with fields delimited by
commas:
To read the comma-delimited file, the correct statement would be:
If instead you tried to read the file with the statement shown
following, it wouldn't work because it instructs
LOAD DATA
to look for tabs
between fields:
The likely result is that each input line would be interpreted as a single field.
LOAD DATA
can be used to read
files obtained from external sources. For example, many programs
can export data in comma-separated values (CSV) format, such
that lines have fields separated by commas and enclosed within
double quotation marks, with an initial line of column names. If
the lines in such a file are terminated by carriage
return/newline pairs, the statement shown here illustrates the
field- and line-handling options you would use to load the file:
If the input values are not necessarily enclosed within
quotation marks, use OPTIONALLY
before the
ENCLOSED BY
option.
Any of the field- or line-handling options can specify an empty
string (''
). If not empty, the
FIELDS [OPTIONALLY] ENCLOSED BY
and
FIELDS ESCAPED BY
values must be a single
character. The FIELDS TERMINATED BY
,
LINES STARTING BY
, and LINES
TERMINATED BY
values can be more than one character.
For example, to write lines that are terminated by carriage
return/linefeed pairs, or to read a file containing such lines,
specify a LINES TERMINATED BY '\r\n'
clause.
To read a file containing jokes that are separated by lines
consisting of %%
, you can do this
FIELDS [OPTIONALLY] ENCLOSED BY
controls
quoting of fields. For output
(SELECT ... INTO
OUTFILE
), if you omit the word
OPTIONALLY
, all fields are enclosed by the
ENCLOSED BY
character. An example of such
output (using a comma as the field delimiter) is shown here:
"1","a string","100.20"
"2","a string containing a , comma","102.20"
"3","a string containing a \" quote","102.20"
"4","a string containing a \", quote and comma","102.20"
If you specify OPTIONALLY
, the
ENCLOSED BY
character is used only to enclose
values from columns that have a string data type (such as
CHAR
,
BINARY
,
TEXT
, or
ENUM
):
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a \" quote",102.20
4,"a string containing a \", quote and comma",102.20
Occurrences of the ENCLOSED BY
character
within a field value are escaped by prefixing them with the
ESCAPED BY
character. Also, if you specify an
empty ESCAPED BY
value, it is possible to
inadvertently generate output that cannot be read properly by
LOAD DATA
. For example, the
preceding output just shown would appear as follows if the
escape character is empty. Observe that the second field in the
fourth line contains a comma following the quote, which
(erroneously) appears to terminate the field:
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20
For input, the ENCLOSED BY
character, if
present, is stripped from the ends of field values. (This is
true regardless of whether OPTIONALLY
is
specified; OPTIONALLY
has no effect on input
interpretation.) Occurrences of the ENCLOSED
BY
character preceded by the ESCAPED
BY
character are interpreted as part of the current
field value.
If the field begins with the ENCLOSED BY
character, instances of that character are recognized as
terminating a field value only if followed by the field or line
TERMINATED BY
sequence. To avoid ambiguity,
occurrences of the ENCLOSED BY
character
within a field value can be doubled and are interpreted as a
single instance of the character. For example, if
ENCLOSED BY '"'
is specified, quotation marks
are handled as shown here:
"The ""BIG"" boss" -> The "BIG" boss
The "BIG" boss -> The "BIG" boss
The ""BIG"" boss -> The ""BIG"" boss
FIELDS ESCAPED BY
controls how to read or
write special characters:
For input, if the
FIELDS ESCAPED BY
character is not empty, occurrences of that character are stripped and the following character is taken literally as part of a field value. Some two-character sequences that are exceptions, where the first character is the escape character. These sequences are shown in the following table (using\
for the escape character). The rules forNULL
handling are described later in this section.For more information about
\
-escape syntax, see Section 9.1.1, “String Literals”.If the
FIELDS ESCAPED BY
character is empty, escape-sequence interpretation does not occur.For output, if the
FIELDS ESCAPED BY
character is not empty, it is used to prefix the following characters on output:The
FIELDS ESCAPED BY
character.The
FIELDS [OPTIONALLY] ENCLOSED BY
character.The first character of the
FIELDS TERMINATED BY
andLINES TERMINATED BY
values, if theENCLOSED BY
character is empty or unspecified.ASCII
0
(what is actually written following the escape character is ASCII0
, not a zero-valued byte).
If the
FIELDS ESCAPED BY
character is empty, no characters are escaped andNULL
is output asNULL
, not\N
. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
In certain cases, field- and line-handling options interact:
If
LINES TERMINATED BY
is an empty string andFIELDS TERMINATED BY
is nonempty, lines are also terminated withFIELDS TERMINATED BY
.If the
FIELDS TERMINATED BY
andFIELDS ENCLOSED BY
values are both empty (''
), a fixed-row (nondelimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. ForTINYINT
,SMALLINT
,MEDIUMINT
,INT
, andBIGINT
, the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.LINES TERMINATED BY
is still used to separate lines. If a line does not contain all fields, the rest of the columns are set to their default values. If you do not have a line terminator, you should set this to''
. In this case, the text file must contain all fields for each row.Fixed-row format also affects handling of
NULL
values, as described later.NoteFixed-size format does not work if you are using a multibyte character set.
Handling of NULL
values varies according to
the FIELDS
and LINES
options in use:
For the default
FIELDS
andLINES
values,NULL
is written as a field value of\N
for output, and a field value of\N
is read asNULL
for input (assuming that theESCAPED BY
character is\
).If
FIELDS ENCLOSED BY
is not empty, a field containing the literal wordNULL
as its value is read as aNULL
value. This differs from the wordNULL
enclosed withinFIELDS ENCLOSED BY
characters, which is read as the string'NULL'
.If
FIELDS ESCAPED BY
is empty,NULL
is written as the wordNULL
.With fixed-row format (which is used when
FIELDS TERMINATED BY
andFIELDS ENCLOSED BY
are both empty),NULL
is written as an empty string. This causes bothNULL
values and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.
An attempt to load NULL
into a NOT
NULL
column causes assignment of the implicit default
value for the column's data type and a warning, or an error in
strict SQL mode. Implicit default values are discussed in
Section 11.7, “Data Type Default Values”.
Some cases are not supported by LOAD
DATA
:
Fixed-size rows (
FIELDS TERMINATED BY
andFIELDS ENCLOSED BY
both empty) andBLOB
orTEXT
columns.If you specify one separator that is the same as or a prefix of another,
LOAD DATA
cannot interpret the input properly. For example, the followingFIELDS
clause would cause problems:If
FIELDS ESCAPED BY
is empty, a field value that contains an occurrence ofFIELDS ENCLOSED BY
orLINES TERMINATED BY
followed by theFIELDS TERMINATED BY
value causesLOAD DATA
to stop reading a field or line too early. This happens becauseLOAD DATA
cannot properly determine where the field or line value ends.
The following example loads all columns of the
persondata
table:
By default, when no column list is provided at the end of the
LOAD DATA
statement, input lines
are expected to contain a field for each table column. If you
want to load only some of a table's columns, specify a column
list:
- (col_name_or_user_var [, col_name_or_user_var] ...);
You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match input fields with table columns.
Each col_name_or_user_var
value is
either a column name or a user variable. With user variables,
the SET
clause enables you to perform
preprocessing transformations on their values before assigning
the result to columns.
User variables in the SET
clause can be used
in several ways. The following example uses the first input
column directly for the value of t1.column1
,
and assigns the second input column to a user variable that is
subjected to a division operation before being used for the
value of t1.column2
:
The SET
clause can be used to supply values
not derived from the input file. The following statement sets
column3
to the current date and time:
You can also discard an input value by assigning it to a user variable and not assigning the variable to a table column:
Use of the column/variable list and SET
clause is subject to the following restrictions:
Assignments in the
SET
clause should have only column names on the left hand side of assignment operators.You can use subqueries in the right hand side of
SET
assignments. A subquery that returns a value to be assigned to a column may be a scalar subquery only. Also, you cannot use a subquery to select from the table that is being loaded.Lines ignored by an
IGNORE
clause are not processed for the column/variable list orSET
clause.User variables cannot be used when loading data with fixed-row format because user variables do not have a display width.
When processing an input line, LOAD
DATA
splits it into fields and uses the values
according to the column/variable list and the
SET
clause, if they are present. Then the
resulting row is inserted into the table. If there are
BEFORE INSERT
or AFTER
INSERT
triggers for the table, they are activated
before or after inserting the row, respectively.
If an input line has too many fields, the extra fields are ignored and the number of warnings is incremented.
If an input line has too few fields, the table columns for which input fields are missing are set to their default values. Default value assignment is described in Section 11.7, “Data Type Default Values”.
An empty field value is interpreted different from a missing field:
For string types, the column is set to the empty string.
For numeric types, the column is set to
0
.For date and time types, the column is set to the appropriate “zero” value for the type. See Section 11.3, “Date and Time Types”.
These are the same values that result if you assign an empty
string explicitly to a string, numeric, or date or time type
explicitly in an INSERT
or
UPDATE
statement.
Treatment of empty or incorrect field values differs from that
just described if the SQL mode is set to a restrictive value.
For example, if sql_mode
is set
to TRADITIONAL
, conversion of
an empty value or a value such as 'x'
for a
numeric column results in an error, not conversion to 0. (With
LOCAL
or IGNORE
, warnings
occur rather than errors, even with a restrictive
sql_mode
value, and the row is
inserted using the same closest-value behavior used for
nonrestrictive SQL modes. This occurs because the server has no
way to stop transmission of the file in the middle of the
operation.)
TIMESTAMP
columns are set to the
current date and time only if there is a NULL
value for the column (that is, \N
) and the
column is not declared to permit NULL
values,
or if the TIMESTAMP
column's
default value is the current timestamp and it is omitted from
the field list when a field list is specified.
LOAD DATA
regards all input as
strings, so you cannot use numeric values for
ENUM
or
SET
columns the way you can with
INSERT
statements. All
ENUM
and
SET
values must be specified as
strings.
BIT
values cannot be loaded
directly using binary notation (for example,
b'011010'
). To work around this, use the
SET
clause to strip off the leading
b'
and trailing '
and
perform a base-2 to base-10 conversion so that MySQL loads the
values into the BIT
column
properly:
shell> cat /tmp/bit_test.txt
b'10'
b'1111111'
shell> mysql test
mysql> LOAD DATA INFILE '/tmp/bit_test.txt'
INTO TABLE bit_test (@var1)
SET b = CAST(CONV(MID(@var1, 3, LENGTH(@var1)-3), 2, 10) AS UNSIGNED);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Deleted: 0 Skipped: 0 Warnings: 0
mysql> SELECT BIN(b+0) FROM bit_test;
+----------+
| BIN(b+0) |
+----------+
| 10 |
| 1111111 |
+----------+
2 rows in set (0.00 sec)
For BIT
values in
0b
binary notation (for example,
0b011010
), use this SET
clause instead to strip off the leading 0b
:
When the LOAD DATA
statement
finishes, it returns an information string in the following
format:
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
Warnings occur under the same circumstances as when values are
inserted using the INSERT
statement (see Section 13.2.6, “INSERT Syntax”), except that
LOAD DATA
also generates warnings
when there are too few or too many fields in the input row.
You can use SHOW WARNINGS
to get
a list of the first
max_error_count
warnings as
information about what went wrong. See
Section 13.7.6.40, “SHOW WARNINGS Syntax”.
If you are using the C API, you can get information about the
statement by calling the
mysql_info()
function. See
Section 28.7.7.36, “mysql_info()”.
On Unix, if you need LOAD DATA
to
read from a pipe, you can use the following technique (the
example loads a listing of the /
directory
into the table db1.t1
):
mkfifo /mysql/data/db1/ls.dat
chmod 666 /mysql/data/db1/ls.dat
find / -ls > /mysql/data/db1/ls.dat &
mysql -e "LOAD DATA INFILE 'ls.dat' INTO TABLE t1" db1
Here you must run the command that generates the data to be loaded and the mysql commands either on separate terminals, or run the data generation process in the background (as shown in the preceding example). If you do not do this, the pipe will block until data is read by the mysql process.
Document created the 26/06/2006, last modified the 26/10/2018
Source of the printed document:https://www.gaudry.be/en/mysql-rf-load-data.html
The infobrol is a personal site whose content is my sole responsibility. The text is available under CreativeCommons license (BY-NC-SA). More info on the terms of use and the author.
References
These references and links indicate documents consulted during the writing of this page, or which may provide additional information, but the authors of these sources can not be held responsible for the content of this page.
The author This site is solely responsible for the way in which the various concepts, and the freedoms that are taken with the reference works, are presented here. Remember that you must cross multiple source information to reduce the risk of errors.