MySQL 8.0 Reference Manual（读书笔记76节-Optimizer Statistics for InnoDB (2)）

1. Configuring Non-Persistent Optimizer Statistics Parameters

开始讲解非固化的统计数据

This section describes how to configure non-persistent optimizer statistics. Optimizer statistics are not persisted to disk when innodb_stats_persistent=OFF or when individual tables are created or altered with STATS_PERSISTENT=0. Instead, statistics are stored in memory, and are lost when the server is shut down. Statistics are also updated periodically by certain operations and under certain conditions.

Optimizer Statistics Updates

这类静态数据更新的时机

Non-persistent optimizer statistics are updated when:

• Running ANALYZE TABLE.

• Running SHOW TABLE STATUS, SHOW INDEX, or querying the Information Schema TABLES or STATISTICS tables with the innodb_stats_on_metadata option enabled.

The default setting for innodb_stats_on_metadata is OFF. Enabling innodb_stats_on_metadata may reduce access speed for schemas that have a large number of tables or indexes, and reduce stability of execution plans for queries that involve InnoDB tables. innodb_stats_on_metadata is configured globally using a SET statement.

SET GLOBAL innodb_stats_on_metadata=ON

留意: innodb_stats_on_metadata only applies when optimizer statistics are configured to be non-persistent (when innodb_stats_persistent is disabled).

• Starting a mysql client with the –auto-rehash option enabled, which is the default. The autorehash option causes all InnoDB tables to be opened, and the open table operations cause statistics to be recalculated.

To improve the start up time of the mysql client and to updating statistics, you can turn off autorehash using the –disable-auto-rehash option. The auto-rehash feature enables automatic name completion of database, table, and column names for interactive users.

• A table is first opened.

• InnoDB detects that 1 / 16 of table has been modified since the last time statistics were updated.

Configuring the Number of Sampled Pages

The MySQL query optimizer uses estimated statistics about key distributions to choose the indexes for an execution plan, based on the relative selectivity of the index. When InnoDB updates optimizer statistics, it samples random pages from each index on a table to estimate the cardinality of the index. (This technique is known as random dives.)

To give you control over the quality of the statistics estimate (and thus better information for the query optimizer), you can change the number of sampled pages using the parameter innodb_stats_transient_sample_pages. The default number of sampled pages is 8, which could be insufficient to produce an accurate estimate, leading to poor index choices by the query optimizer. This technique is especially important for large tables and tables used in joins. Unnecessary full table scans for such tables can be a substantial performance issue.innodb_stats_transient_sample_pages is a global parameter that can be set at runtime.

The value of innodb_stats_transient_sample_pages affects the index sampling for all InnoDB tables and indexes when innodb_stats_persistent=0. Be aware of the following potentially significant impacts when you change the index sample size:

• Small values like 1 or 2 can result in inaccurate estimates of cardinality.

• Increasing the innodb_stats_transient_sample_pages value might require more disk reads. Values much larger than 8 (say, 100), can cause a significant slowdown in the time it takes to open a table or execute SHOW TABLE STATUS.

• The optimizer might choose very different query plans based on different estimates of index selectivity.

Whatever value of innodb_stats_transient_sample_pages works best for a system, set the option and leave it at that value. Choose a value that results in reasonably accurate estimates for all tables in your database without requiring excessive I/O. Because the statistics are automatically recalculated at various times other than on execution of ANALYZE TABLE, it does not make sense to increase the index sample size, run ANALYZE TABLE, then decrease sample size again. Smaller tables generally require fewer index samples than larger tables. If your database has many large tables, consider using a higher value for innodb_stats_transient_sample_pages than if you have mostly smaller tables.

2. Estimating ANALYZE TABLE Complexity for InnoDB Tables

ANALYZE TABLE complexity【kəmˈpleksəti 复杂性;难题;难懂;难以理解的局势】 for InnoDB tables is dependent on:

• The number of pages sampled, as defined by innodb_stats_persistent_sample_pages.

• The number of indexed columns in a table

• The number of partitions【pɑːrˈtɪʃnz 分割;隔断;分治;瓜分;隔扇;隔板墙】. If a table has no partitions, the number of partitions is considered to be 1.

Using these parameters, an approximate formula for estimating ANALYZE TABLE complexity would be:

The value of innodb_stats_persistent_sample_pages * number of indexed columns in a table * the number of partitions

Typically, the greater the resulting value, the greater the execution time for ANALYZE TABLE.

说明:

innodb_stats_persistent_sample_pages defines the number of pages sampled at a global level. To set the number of pages sampled for an individual table, use the STATS_SAMPLE_PAGES option with CREATE TABLE or ALTER TABLE.

If innodb_stats_persistent=OFF, the number of pages sampled is defined by innodb_stats_transient_sample_pages.

In Big O notation, ANALYZE TABLE complexity is described as: —公式描述

O(n_sample
 * (n_cols_in_uniq_i
 + n_cols_in_non_uniq_i
 + n_cols_in_pk * (1 + n_non_uniq_i))
 * n_part)

where:

• n_sample is the number of pages sampled (defined by innodb_stats_persistent_sample_pages)

• n_cols_in_uniq_i is total number of all columns in all unique indexes (not counting the primary key columns)

• n_cols_in_non_uniq_i is the total number of all columns in all nonunique indexes

• n_cols_in_pk is the number of columns in the primary key (if a primary key is not defined, InnoDB creates a single column primary key internally)

• n_non_uniq_i is the number of nonunique indexes in the table

• n_part is the number of partitions. If no partitions are defined, the table is considered to be a single partition.

什么是Big O notation?

大O符号（Big O notation）是用于描述函数渐近行为的数学符号。更确切地说，它是用另一个（通常更简单的）函数来描述一个函数数量级的渐近上界。在数学中，它一般用来刻画被截断的无穷级数尤其是渐近级数的剩余项；在计算机科学中，它在分析算法复杂性的方面非常有用。

—https://dev.mysql.com/doc/refman/8.0/en/innodb-statistics-estimation.html

—https://dev.mysql.com/doc/refman/8.0/en/innodb-analyze-table-complexity.html

玄机博客

1.本站内容仅供参考，不作为任何法律依据。用户在使用本站内容时，应自行判断其真实性、准确性和完整性，并承担相应风险。

2.本站部分内容来源于互联网，仅用于交流学习研究知识，若侵犯了您的合法权益，请及时邮件或站内私信与本站联系，我们将尽快予以处理。

3.本文采用知识共享署名4.0国际许可协议 [BY-NC-SA] 进行授权

4.根据《计算机软件保护条例》第十七条规定“为了学习和研究软件内含的设计思想和原理，通过安装、显示、传输或者存储软件等方式使用软件的，可以不经软件著作权人许可，不向其支付报酬。”您需知晓本站所有内容资源均来源于网络，仅供用户交流学习与研究使用，版权归属原版权方所有，版权争议与本站无关，用户本人下载后不能用作商业或非法用途，需在24个小时之内从您的电脑中彻底删除上述内容，否则后果均由用户承担责任；如果您访问和下载此文件，表示您同意只将此文件用于参考、学习而非其他用途，否则一切后果请您自行承担，如果您喜欢该程序，请支持正版软件，购买注册，得到更好的正版服务。