6.4.8.3 Using MySQL Enterprise Data Masking and De-Identification
Before using MySQL Enterprise Data Masking and De-Identification, install it according to the instructions provided at Section 6.4.8.2, “Installing or Uninstalling MySQL Enterprise Data Masking and De-Identification”.
To use MySQL Enterprise Data Masking and De-Identification in applications, invoke the functions that are appropriate for the operations you wish to perform. For detailed function descriptions, see Section 6.4.8.4, “MySQL Enterprise Data Masking and De-Identification User-Defined Function Reference”. This section demonstrates how to use the functions to carry out some representative tasks. It first presents an overview of the available functions, followed by some examples of how the functions might be used in a real-world context.
Masking Data to Remove Identifying Characteristics
MySQL provides general-purpose masking functions that mask arbitrary strings, and special-purpose masking functions that mask specific types of values.
General-Purpose Masking Functions
mask_inner() and mask_outer() are general-purpose functions that mask parts of arbitrary strings based on position within the string:
mask_inner() masks the interior of its string argument, leaving the ends unmasked. Other arguments specify the sizes of the unmasked ends.
mysql> SELECT mask_inner('This is a string', 5, 1);
+--------------------------------------+
| mask_inner('This is a string', 5, 1) |
+--------------------------------------+
| This XXXXXXXXXXg                     |
+--------------------------------------+
mysql> SELECT mask_inner('This is a string', 1, 5);
+--------------------------------------+
| mask_inner('This is a string', 1, 5) |
+--------------------------------------+
| TXXXXXXXXXXtring                     |
+--------------------------------------+
mask_outer() does the reverse, masking the ends of its string argument, leaving the interior unmasked. Other arguments specify the sizes of the masked ends.
mysql> SELECT mask_outer('This is a string', 5, 1);
+--------------------------------------+
| mask_outer('This is a string', 5, 1) |
+--------------------------------------+
| XXXXXis a strinX                     |
+--------------------------------------+
mysql> SELECT mask_outer('This is a string', 1, 5);
+--------------------------------------+
| mask_outer('This is a string', 1, 5) |
+--------------------------------------+
| Xhis is a sXXXXX                     |
+--------------------------------------+
By default, mask_inner() and mask_outer() use 'X' as the masking character, but permit an optional masking-character argument:
mysql> SELECT mask_inner('This is a string', 5, 1, '*');
+-------------------------------------------+
| mask_inner('This is a string', 5, 1, '*') |
+-------------------------------------------+
| This **********g                          |
+-------------------------------------------+
mysql> SELECT mask_outer('This is a string', 5, 1, '#');
+-------------------------------------------+
| mask_outer('This is a string', 5, 1, '#') |
+-------------------------------------------+
| #####is a strin#                          |
+-------------------------------------------+
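The positional rules above can be emulated outside the server to check expectations before writing queries. The following Python sketch is an illustration only, not MySQL's implementation. Its handling of margins that meet or exceed the string length (mask_inner() returns the input unchanged, mask_outer() masks everything) is an assumption of this sketch.

```python
def mask_inner(text: str, margin1: int, margin2: int, mask_char: str = "X") -> str:
    """Mask the interior of text, keeping margin1 characters on the left
    and margin2 characters on the right unmasked."""
    if margin1 < 0 or margin2 < 0:
        raise ValueError("margins must be nonnegative")
    if margin1 + margin2 >= len(text):
        # Assumption: nothing remains to mask, so return the input unchanged.
        return text
    right_start = len(text) - margin2
    return text[:margin1] + mask_char * (right_start - margin1) + text[right_start:]


def mask_outer(text: str, margin1: int, margin2: int, mask_char: str = "X") -> str:
    """Mask margin1 characters on the left and margin2 on the right,
    leaving the interior unmasked."""
    if margin1 < 0 or margin2 < 0:
        raise ValueError("margins must be nonnegative")
    if margin1 + margin2 >= len(text):
        # Assumption: the masked ends cover the whole string.
        return mask_char * len(text)
    right_start = len(text) - margin2
    return mask_char * margin1 + text[margin1:right_start] + mask_char * margin2


if __name__ == "__main__":
    print(mask_inner("This is a string", 5, 1))       # This XXXXXXXXXXg
    print(mask_outer("This is a string", 1, 5))       # Xhis is a sXXXXX
    print(mask_inner("This is a string", 5, 1, "*"))  # This **********g
```

The printed values match the server output shown for the same arguments in this section.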
Special-Purpose Masking Functions
Other masking functions expect a string argument representing a specific type of value and mask it to remove identifying characteristics.
The examples here supply function arguments using the random value generation functions that return the appropriate type of value. For more information about generation functions, see Generating Random Data with Specific Characteristics.
Payment card Primary Account Number masking. Masking functions provide strict and relaxed masking of Primary Account Numbers.
mask_pan() masks all but the last four digits of the number:
mysql> SELECT mask_pan(gen_rnd_pan());
+-------------------------+
| mask_pan(gen_rnd_pan()) |
+-------------------------+
| XXXXXXXXXXXX2461        |
+-------------------------+
mask_pan_relaxed() is similar but leaves the first six digits, which indicate the payment card issuer, unmasked:
mysql> SELECT mask_pan_relaxed(gen_rnd_pan());
+---------------------------------+
| mask_pan_relaxed(gen_rnd_pan()) |
+---------------------------------+
| 770630XXXXXX0807                |
+---------------------------------+
U.S. Social Security number masking. mask_ssn() masks all but the last four digits of the number:
mysql> SELECT mask_ssn(gen_rnd_ssn());
+-------------------------+
| mask_ssn(gen_rnd_ssn()) |
+-------------------------+
| XXX-XX-1723             |
+-------------------------+
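The special-purpose masks shown above follow simple patterns, which the Python sketch below emulates. It is not the server's code. The input validation rules and the sample card numbers are hypothetical, invented only to reproduce the masked shapes shown in this section.

```python
def mask_pan(pan: str) -> str:
    """Mask all but the last four digits of a Primary Account Number."""
    if not pan.isdigit() or len(pan) < 4:
        raise ValueError("expected a string of at least 4 digits")
    return "X" * (len(pan) - 4) + pan[-4:]


def mask_pan_relaxed(pan: str) -> str:
    """Keep the first six (issuer) digits and the last four; mask the rest."""
    if not pan.isdigit() or len(pan) < 10:
        raise ValueError("expected a string of at least 10 digits")
    return pan[:6] + "X" * (len(pan) - 10) + pan[-4:]


def mask_ssn(ssn: str) -> str:
    """Mask all but the last four digits of an SSN given as NNN-NN-NNNN."""
    parts = ssn.split("-")
    if [len(p) for p in parts] != [3, 2, 4] or not all(p.isdigit() for p in parts):
        raise ValueError("expected an SSN of the form NNN-NN-NNNN")
    return "XXX-XX-" + parts[2]


if __name__ == "__main__":
    # Made-up digit strings for illustration; not real account numbers.
    print(mask_pan("1234567890122461"))          # XXXXXXXXXXXX2461
    print(mask_pan_relaxed("7706301234560807"))  # 770630XXXXXX0807
    print(mask_ssn("906-39-0007"))               # XXX-XX-0007
```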
Generating Random Data with Specific Characteristics
Several functions generate random values. These values can be used for testing, simulation, and so forth.
gen_range() returns a random integer selected from a given range:
mysql> SELECT gen_range(1, 10);
+------------------+
| gen_range(1, 10) |
+------------------+
| 6                |
+------------------+
gen_rnd_email() returns a random email address in the example.com domain:
mysql> SELECT gen_rnd_email();
+---------------------------+
| gen_rnd_email()           |
+---------------------------+
| ayxnq.xmkpvvy@example.com |
+---------------------------+
gen_rnd_pan() returns a random payment card Primary Account Number. (The gen_rnd_pan() function result is not shown because its return values should be used only for testing purposes, and not for publication. It cannot be guaranteed that the number is not assigned to a legitimate payment account.)
gen_rnd_ssn() returns a random U.S. Social Security number with the first and second parts each chosen from a range not used for legitimate numbers:
mysql> SELECT gen_rnd_ssn();
+---------------+
| gen_rnd_ssn() |
+---------------+
| 912-45-1615   |
+---------------+
gen_rnd_us_phone() returns a random U.S. phone number in the 555 area code not used for legitimate numbers:
mysql> SELECT gen_rnd_us_phone();
+--------------------+
| gen_rnd_us_phone() |
+--------------------+
| 1-555-747-5627     |
+--------------------+
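When the server functions are not available, for example in client-side unit tests, values of the same shape can be produced with a small stand-in. The Python sketch below is a rough emulation and rests on several assumptions: that gen_range() bounds are inclusive, that the email local part is five letters, a dot, and seven letters (inferred from the single sample above), and that the phone format is 1-555-NNN-NNNN.

```python
import random
import string


def gen_range(lower: int, upper: int) -> int:
    """Return a random integer between lower and upper (assumed inclusive)."""
    return random.randint(lower, upper)


def gen_rnd_email() -> str:
    """Return a random address in the example.com domain."""
    first = "".join(random.choices(string.ascii_lowercase, k=5))
    second = "".join(random.choices(string.ascii_lowercase, k=7))
    return f"{first}.{second}@example.com"


def gen_rnd_us_phone() -> str:
    """Return a random U.S. phone number in the 555 area code."""
    exchange = random.randint(0, 999)
    line = random.randint(0, 9999)
    return f"1-555-{exchange:03d}-{line:04d}"


if __name__ == "__main__":
    print(gen_range(1, 10))
    print(gen_rnd_email())
    print(gen_rnd_us_phone())
```

Because the values are random, each run prints different results.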
Generating Random Data Using Dictionaries
MySQL Enterprise Data Masking and De-Identification enables dictionaries to be used as sources of random values. To use a dictionary, it must first be loaded from a file and given a name. Each loaded dictionary becomes part of the dictionary registry. Items then can be selected from registered dictionaries and used as random values or as replacements for other values.
A valid dictionary file has these characteristics:
The file contents are plain text, one term per line.
Empty lines are ignored.
The file must contain at least one term.
Suppose that a file named de_cities.txt contains these city names in Germany:
Berlin
Munich
Bremen
Also suppose that a file named us_cities.txt contains these city names in the United States:
Chicago
Houston
Phoenix
El Paso
Detroit
Assume that the secure_file_priv system variable is set to /usr/local/mysql/mysql-files. In that case, copy the dictionary files to that directory so that the MySQL server can access them. Then use gen_dictionary_load() to load the dictionaries into the dictionary registry and assign them names:
mysql> SELECT gen_dictionary_load('/usr/local/mysql/mysql-files/de_cities.txt', 'DE_Cities');
+--------------------------------------------------------------------------------+
| gen_dictionary_load('/usr/local/mysql/mysql-files/de_cities.txt', 'DE_Cities') |
+--------------------------------------------------------------------------------+
| Dictionary load success                                                        |
+--------------------------------------------------------------------------------+
mysql> SELECT gen_dictionary_load('/usr/local/mysql/mysql-files/us_cities.txt', 'US_Cities');
+--------------------------------------------------------------------------------+
| gen_dictionary_load('/usr/local/mysql/mysql-files/us_cities.txt', 'US_Cities') |
+--------------------------------------------------------------------------------+
| Dictionary load success                                                        |
+--------------------------------------------------------------------------------+
To select a random term from a dictionary, use gen_dictionary():
mysql> SELECT gen_dictionary('DE_Cities');
+-----------------------------+
| gen_dictionary('DE_Cities') |
+-----------------------------+
| Berlin                      |
+-----------------------------+
mysql> SELECT gen_dictionary('US_Cities');
+-----------------------------+
| gen_dictionary('US_Cities') |
+-----------------------------+
| Phoenix                     |
+-----------------------------+
To select a random term from multiple dictionaries, randomly select one of the dictionaries, then select a term from it:
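mysql> SELECT gen_dictionary(ELT(gen_range(1,2), 'DE_Cities', 'US_Cities'));
+---------------------------------------------------------------+
| gen_dictionary(ELT(gen_range(1,2), 'DE_Cities', 'US_Cities')) |
+---------------------------------------------------------------+
| Detroit                                                       |
+---------------------------------------------------------------+
mysql> SELECT gen_dictionary(ELT(gen_range(1,2), 'DE_Cities', 'US_Cities'));
+---------------------------------------------------------------+
| gen_dictionary(ELT(gen_range(1,2), 'DE_Cities', 'US_Cities')) |
+---------------------------------------------------------------+
| Bremen                                                        |
+---------------------------------------------------------------+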
The gen_blacklist() function enables a term from one dictionary to be replaced by a term from another dictionary, which effects masking by substitution. Its arguments are the term to replace, the dictionary in which the term appears, and the dictionary from which to choose a replacement. For example, to substitute a U.S. city for a German city, or vice versa, use gen_blacklist() like this:
mysql> SELECT gen_blacklist('Munich', 'DE_Cities', 'US_Cities');
+---------------------------------------------------+
| gen_blacklist('Munich', 'DE_Cities', 'US_Cities') |
+---------------------------------------------------+
| Houston                                           |
+---------------------------------------------------+
mysql> SELECT gen_blacklist('El Paso', 'US_Cities', 'DE_Cities');
+----------------------------------------------------+
| gen_blacklist('El Paso', 'US_Cities', 'DE_Cities') |
+----------------------------------------------------+
| Bremen                                             |
+----------------------------------------------------+
If the term to replace is not in the first dictionary, gen_blacklist() returns it unchanged:
mysql> SELECT gen_blacklist('Moscow', 'DE_Cities', 'US_Cities');
+---------------------------------------------------+
| gen_blacklist('Moscow', 'DE_Cities', 'US_Cities') |
+---------------------------------------------------+
| Moscow                                            |
+---------------------------------------------------+
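The substitution rule just described (replace the term only if it appears in the first dictionary, otherwise return it unchanged) can be sketched as follows. This Python emulation uses a hypothetical in-memory DICTIONARIES mapping in place of the server's dictionary registry and is not MySQL's implementation.

```python
import random

# Hypothetical stand-in for the dictionary registry, using the example terms.
DICTIONARIES = {
    "DE_Cities": ["Berlin", "Munich", "Bremen"],
    "US_Cities": ["Chicago", "Houston", "Phoenix", "El Paso", "Detroit"],
}


def gen_blacklist(term: str, source: str, replacement: str) -> str:
    """Replace term with a random term from the replacement dictionary
    if it appears in the source dictionary; otherwise return it unchanged."""
    if term not in DICTIONARIES[source]:
        return term
    return random.choice(DICTIONARIES[replacement])


if __name__ == "__main__":
    print(gen_blacklist("Munich", "DE_Cities", "US_Cities"))  # some U.S. city
    print(gen_blacklist("Moscow", "DE_Cities", "US_Cities"))  # Moscow
```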
Using Masked Data for Customer Identification
At customer-service call centers, one common identity verification technique is to ask customers to provide their last four Social Security number (SSN) digits. For example, a customer might say her name is Joanna Bond and that her last four SSN digits are 0007.
Suppose that a customer table containing customer records has these columns:
id: Customer ID number.
first_name: Customer first name.
last_name: Customer last name.
ssn: Customer Social Security number.
The application used by customer-service representatives to check the customer SSN might execute a query like this:
mysql> SELECT id, ssn
    -> FROM customer
    -> WHERE first_name = 'Joanna' AND last_name = 'Bond';
+-----+-------------+
| id  | ssn         |
+-----+-------------+
| 786 | 906-39-0007 |
+-----+-------------+
However, that exposes the SSN to the customer-service representative, who has no need to see anything but the last four digits. Instead, the application can use this query to display only the masked SSN:
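mysql> SELECT id, mask_ssn(CONVERT(ssn USING binary))
    -> FROM customer
    -> WHERE first_name = 'Joanna' AND last_name = 'Bond';
+-----+-------------------------------------+
| id  | mask_ssn(CONVERT(ssn USING binary)) |
+-----+-------------------------------------+
| 786 | XXX-XX-0007                         |
+-----+-------------------------------------+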
Now the representative sees only what is necessary, and customer privacy is preserved.
Why was the CONVERT() function used for the argument to mask_ssn()? Because mask_ssn() requires an argument of length 11, and because UDFs treat string arguments as binary strings, with one byte per character. Thus, even though ssn is defined as VARCHAR(11), if the ssn column has a multibyte character set, it appears to be longer than 11 bytes when passed to a UDF, and an error occurs. Converting the value to a binary string ensures that the UDF sees an argument of length 11.
A similar technique may be needed for other data masking functions when string arguments do not have a single-byte character set.
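The difference between character length and byte length that motivates the conversion can be seen with any encoder. The Python sketch below only illustrates that distinction. It uses Python's utf-16-be codec as a stand-in for a multibyte MySQL character set, which is an assumption of this example, and it says nothing about how the server passes arguments internally.

```python
def byte_length(value: str, encoding: str) -> int:
    """Return the number of bytes value occupies in the given encoding."""
    return len(value.encode(encoding))


ssn = "906-39-0007"

chars = len(ssn)                         # 11 characters
single = byte_length(ssn, "ascii")       # 11 bytes: one byte per character
double = byte_length(ssn, "utf-16-be")   # 22 bytes: two bytes per character

if __name__ == "__main__":
    print(chars, single, double)  # 11 22 are the interesting pair: 11 11 22
```

A function that checks for exactly 11 bytes would accept the single-byte form and reject the two-byte form, even though both hold the same 11 characters.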
Creating Views that Display Masked Data
If masked data from a table is used for multiple queries, it may be convenient to define a view that produces masked data. That way, applications can select from the view without performing masking in individual queries.
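For example, a masking view on the customer table from the previous section can be defined like this:
CREATE VIEW masked_customer AS
SELECT id, first_name, last_name,
mask_ssn(CONVERT(ssn USING binary)) AS masked_ssn
FROM customer;
Then the query to look up a customer becomes simpler but still returns masked data:
mysql> SELECT id, masked_ssn
    -> FROM masked_customer
    -> WHERE first_name = 'Joanna' AND last_name = 'Bond';
+-----+-------------+
| id  | masked_ssn  |
+-----+-------------+
| 786 | XXX-XX-0007 |
+-----+-------------+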
Document created 26/06/2006, last modified 26/10/2018
Source of the printed document: https://www.gaudry.be/mysql-rf-data-masking-usage.html