InnoDB Import Speed

Introduction

Importing large dump files can be quite time-consuming unless some precautions are taken.

Alternative export methods, such as percona-xtrabackup, are faster if the entire server must be backed up and you are likely to do full restores. Similarly, consider mariabackup for MariaDB 10.2+ instances.

When attempting to restore a single database, or using mysqldump during a transition to innodb_file_per_table, the approach described below is acceptable.

Before exporting the data with mysqldump

Determine if your version of mysqldump supports --innodb-optimize-keys and whether this is appropriate for your use case (e.g. the destination server supports "fast index creation").

You should almost certainly use some light compression to reduce disk I/O and space usage, both while doing a logical export of the data to a flat file with mysqldump and when feeding the SQL file to the mysql client: mysqldump <options> | gzip -1 > dbname.sql.gz (if most of your data is in BLOB columns, which is often already-compressed binary data, this may not be a good idea).

Another often overlooked mysqldump option is --no-autocommit, which wraps each table in a single transaction instead of making each insert a separate transaction.
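Putting these together, an export command might look like the following sketch (dbname is a placeholder; add --innodb-optimize-keys only if your mysqldump supports it):

mysqldump --no-autocommit dbname | gzip -1 > dbname.sql.gz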

Before starting the import

The following variables should be adjusted:

innodb_buffer_pool_size = <value as large as possible>

Set this to as much as 70 to 80 percent of your available memory before starting the import. The variable can be adjusted back down after the import, to allow the server to run other services, if desired. See the official documentation.
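On MySQL 5.7 and later the buffer pool can be resized online; on older versions, set the value in my.cnf and restart. A minimal sketch, assuming a host with 16 GB of RAM where roughly 75% is given to InnoDB:

SET GLOBAL innodb_buffer_pool_size = 12 * 1024 * 1024 * 1024;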

innodb_log_file_size

Increasing the log file size allows larger transactions to fit into one file and reduces the number of checkpoints needed. See how to change the size safely on versions older than 5.6. On modern versions of MySQL it is acceptable to set innodb_log_file_size to 1/2 of innodb_buffer_pool_size, since crash recovery is much faster than it was back when we had to keep innodb_log_file_size "artificially" small. Don't allow this to be smaller than, say, 100M or 256M if you can spare the disk space (by default innodb_log_files_in_group = 2, so each file is half of the total redo log space consumed on disk).
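As a concrete illustration, a my.cnf fragment for the hypothetical 16 GB host above (the sizes are assumptions, so scale them to your hardware; changing innodb_log_file_size requires a restart, and the safe procedure linked above on versions older than 5.6):

[mysqld]
innodb_buffer_pool_size = 12G
innodb_log_file_size = 2G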

Use a sane I/O scheduler (elevator)

Choosing your I/O scheduler wisely will go a long way toward ensuring efficient use of your hardware. Avoid the cfq scheduler; deadline and noop are great.

echo deadline > /sys/block/xvda/queue/scheduler

Substitute your own block device name, and have this command execute on boot via rc.local, grub, or udev rules. The scheduler can be changed even if the import is already in progress.
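To check which scheduler is currently active (xvda is again just an example device; the bracketed entry is the one in use):

cat /sys/block/xvda/queue/scheduler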

Disable transparent_hugepage

Check whether THP is enabled, and whether it is set to always, madvise, or never:

ps axuw | grep khugepaged

cat /sys/kernel/mm/transparent_hugepage/enabled

Change this to either madvise or never:

echo never > /sys/kernel/mm/transparent_hugepage/enabled

Configure your system to keep this setting across reboots (grub, tuned, rc.local, etc.); one possible approach is sketched below.
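For example, on a grub-based distro you can append transparent_hugepage=never to the kernel command line in /etc/default/grub and regenerate the grub configuration (paths and commands vary by distro, so treat this as a sketch; keep your existing parameters where the ... appears):

GRUB_CMDLINE_LINUX="... transparent_hugepage=never"
update-grub    # Debian/Ubuntu; on RHEL-style systems: grub2-mkconfig -o /boot/grub2/grub.cfg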

Start the import

Write to the binary log

If you need to write to the binary log, because you want this import replicated, start the import as follows.

With pv

The pv (pipe viewer) command will show the import progress.

pv dump.sql.gz | zcat - | mysql

Without pv

If you do not have pv installed or do not want a progress indicator, just use zcat:

zcat dump.sql.gz | mysql

Without compression, using pv

pv dump.sql | mysql

Skip writing to the binary log

You can speed up the import significantly if you do not need to write to the binary log. The following sample commands disable binary logging for just your import session.

For a non-compressed dump
Without pv

{ echo "set sql_log_bin=0;"; cat dump.sql ; } | mysql

With pv

pv dump.sql | { echo "set sql_log_bin=0;"; cat; } | mysql

For a compressed dump
Without pv

{ echo "set sql_log_bin=0;"; zcat dump.sql.gz ; } | mysql

With pv

pv dump.sql.gz | { echo "set sql_log_bin=0;"; zcat; } | mysql

When the import is in progress

innodb_flush_log_at_trx_commit = 2

This flushes the log to disk less often (roughly once per second instead of at every commit), reducing the load on the I/O subsystem. See the official documentation. Don't change this setting in my.cnf, at least not on any master.

If there are no production databases on this server right now, innodb_flush_log_at_trx_commit=0 is even faster than innodb_flush_log_at_trx_commit=2.

Make this change dynamically using the mysql cli:

SET GLOBAL innodb_flush_log_at_trx_commit = 2;

After the import, you should restart the service or manually revert this to the default.

sync_binlog = 0

This stops fsyncing binary log writes entirely (see libeatmydata for the truly dedicated folks), so use it with caution. See the official documentation. If you've already started the import but did not use sql_log_bin=0, then consider temporarily reducing the durability of writes to the binary log (for all sessions):

SET GLOBAL sync_binlog = 0;

After the import, you should restart the service or manually revert this to the default.

After the import

Don't forget to revert any dynamically modified server variables; restarting the service is the surest way. Make sure you don't have sync_binlog=0 or innodb_flush_log_at_trx_commit=2 in your my.cnf, at least not on a master.
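For reference, a sketch of reverting both variables from the mysql cli (1 is the default for both on modern versions, but confirm against your own configuration):

SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL sync_binlog = 1;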

Other considerations

Ordering your data

Importing randomly ordered data can have disastrous consequences for performance. Consider a scenario where your buffer pool is significantly smaller than your data: every record then requires a disk seek, and at an average seek time of 10 ms, 200 million randomly ordered rows cost roughly 2,000,000 seconds (about 23 days) in seeks alone, extending the import time to weeks.

When inserting in sorted order, pages won't be revisited once they are filled, so the working set is only the recently inserted data. The older (not recently touched) pages will age out of the buffer pool (by LRU), allowing just the recent data to remain.

If your data fits in your buffer pool, you will still gain speed by ordering it, since poorly filled pages result in more writes for the same amount of data.

Typical fill rates are in the range of ~87.5% (~14% overhead) for sorted data, and ~50% (~100% overhead) for non-sorted data.

Ordering strategies should favour the primary key first, and then other indexes. When multiple indexes are present, the non-PK indexes should be rebuilt after the import to maintain good performance, as sketched below.
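A minimal sketch of deferring a secondary index until after the import (the table and index names are hypothetical):

ALTER TABLE orders DROP KEY idx_customer;
-- ... run the import ...
ALTER TABLE orders ADD KEY idx_customer (customer_id);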

Consider using mysqldump with --order-by-primary if the export speed is not a concern and this is how you are generating the file to import.

If you are importing to a storage engine other than InnoDB, consider whether mysqldump --order-by-primary-desc is supported by your version of mysqldump and whether this is desirable for your target storage engine.

Force mysql

Piping to mysql --force is useful for large datasets, since the import will continue past errors that may turn out to be minor problems (a broken view, etc.). Consider redirecting stderr to a log file to review later:

zcat foo.sql.gz | mysql --force foo 2> import_foo_stderr.txt