最近碰到一个慢SQL问题,解决过程有点小曲折,和大家分享下。 SQL本身不复杂,表结构、索引也比较简单,不过个别字段存在于多个索引中。
CREATE TABLE `pre_forum_post` ( `pid` int(10) unsigned NOT NULL, `fid` mediumint(8) unsigned NOT NULL DEFAULT '0', `tid` mediumint(8) unsigned NOT NULL DEFAULT '0', `first` tinyint(1) NOT NULL DEFAULT '0', `author` varchar(40) NOT NULL DEFAULT '', `authorid` int(10) unsigned NOT NULL DEFAULT '0', `subject` varchar(80) NOT NULL DEFAULT '', `dateline` int(10) unsigned NOT NULL DEFAULT '0', `message` mediumtext NOT NULL, `useip` varchar(15) NOT NULL DEFAULT '', `invisible` tinyint(1) NOT NULL DEFAULT '0', `anonymous` tinyint(1) NOT NULL DEFAULT '0', `usesig` tinyint(1) NOT NULL DEFAULT '0', `htmlon` tinyint(1) NOT NULL DEFAULT '0', `bbcodeoff` tinyint(1) NOT NULL DEFAULT '0', `smileyoff` tinyint(1) NOT NULL DEFAULT '0', `parseurloff` tinyint(1) NOT NULL DEFAULT '0', `attachment` tinyint(1) NOT NULL DEFAULT '0', `rate` smallint(6) NOT NULL DEFAULT '0', `ratetimes` tinyint(3) unsigned NOT NULL DEFAULT '0', `status` int(10) NOT NULL DEFAULT '0', `tags` varchar(255) NOT NULL DEFAULT '0', `comment` tinyint(1) NOT NULL DEFAULT '0', `replycredit` int(10) NOT NULL DEFAULT '0', `position` int(8) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`tid`,`position`), UNIQUE KEY `pid` (`pid`), KEY `fid` (`fid`), KEY `displayorder` (`tid`,`invisible`,`dateline`), KEY `first` (`tid`,`first`), KEY `new_auth` (`authorid`,`invisible`,`tid`), KEY `idx_dt` (`dateline`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8;"root@localhost Fri Aug 1 11:59:56 2014 11:59:56 [test]>show table status like 'pre_forum_post'\G *************************** 1. row *************************** Name: pre_forum_post Engine: MyISAM Version: 10 Row_format: Dynamic Rows: 23483977 Avg_row_length: 203 Data_length: 4782024708 Max_data_length: 281474976710655 Index_length: 2466093056 Data_free: 0 Auto_increment: 1 Create_time: 2014-08-01 11:00:56 Update_time: 2014-08-01 11:08:49 Check_time: 2014-08-01 11:12:23 Collation: utf8_general_ci Checksum: NULL Create_options: Comment: mysql> show index from pre_forum_post; +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | pre_forum_post | 0 | PRIMARY | 1 | tid | A | 838713 | NULL | NULL | | BTREE | | | | pre_forum_post | 0 | PRIMARY | 2 | position | A | 23483977 | NULL | NULL | | BTREE | | | | pre_forum_post | 0 | pid | 1 | pid | A | 23483977 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | fid | 1 | fid | A | 1470 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 1 | tid | A | 838713 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 2 | invisible | A | 869776 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 3 | dateline | A | 23483977 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | first | 1 | tid | A | 838713 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | first | 2 | first | A | 1174198 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 1 | authorid | A | 1806459 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 2 | invisible | A | 1956998 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 3 | tid | A | 11741988 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | idx_dt | 1 | dateline | A | 23483977 | NULL | NULL | | BTREE | | | +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
我们来看下这个SQL的执行计划:
mysql> explain select * from pre_forum_post where tid=7932612 and `invisible` in('0','-2') order by dateline limit 15\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: pre_forum_post type: index possible_keys: PRIMARY,displayorder,first key: idx_dt key_len: 4 ref: NULL rows: 14042 Extra: Using where
可以看到执行计划比较奇怪,从几个可选的索引中,最终选择了 idx_dt,结果悲剧了,这个SQL执行耗时很长:
mysql> select * from pre_forum_post where tid=7932612 and `invisible` in('0','-2') order by dateline limit 15; 15 rows in set (26.78 sec)
看下MySQL的会话状态值:Handler_read_next
| Handler_read_next | 17274153 |
从1700多万数据中选取15条记录,结果可想而知,非常慢。 我们强制指定比较靠谱的索引再看下:
mysql> explain select * from pre_forum_post force index(displayorder) where tid=7932612 and `invisible` in('0','-2') order by dateline limit 15\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: pre_forum_post type: range possible_keys: displayorder key: displayorder key_len: 4 ref: NULL rows: 46131 Extra: Using index condition; Using filesort
看下实际执行的耗时:
mysql> select * from pre_forum_post force index(displayorder) where tid=7932612 and `invisible` in('0','-2') order by dateline limit 15; 15 rows in set (0.08 sec)
尼玛,怎么可以这么快,查询优化器未免太坑爹了吧。 再看下MySQL的会话状态值:Handler_read_next
| Handler_read_next | 31188 |
和不强制索引的情况相比,差了553倍! 所幸,5.6以上除了EXPLAIN外,还支持OPTIMIZER_TRACE,我们来观察下两种执行计划的区别,发现不强制指定索引时的执行计划有诈,会在最后判断到 ORDER BY 子句时,修改执行计划:
{\"reconsidering_access_paths_for_index_ordering": {\"clause": "ORDER BY",\"index_order_summary": {\"table": "`pre_forum_post`",\"index_provides_order": true,\"order_direction": "asc",\"index": "idx_dt",\"plan_changed": true,\"access_type": "index_scan"\ } /* index_order_summary */\ } /* reconsidering_access_paths_for_index_ordering */\
而在前面analyzing_range_alternatives和considered_execution_plans阶段,都认为其他几个索引也是可选择的,直到这里才给强X了,你Y的… 看起来像是MySQL 5.6查询优化器的bug了,GOOGLE了一下,还真发有人已经反馈过类似的问题: MySQL bug 70245: incorrect costing for range scan causes optimizer to choose incorrect index
看完才发现,其实不是神马BUG,而是原来从5.6开始,增加了一个选项叫 eq_range_index_dive_limit 的高级货,这货大概的用途是: 在较多等值查询(例如多值的IN查询)情景中,预估可能会扫描的记录数,从而选择相对更合适的索引,避免所谓的index dive问题。
当面临下面两种选择时:
1、索引代价较高,但结果较为精确; 2、索引代价较低,但结果可能不够精确;
简单说,选项 eq_range_index_dive_limit 的值设定了 IN列表中的条件个数上线,超过设定值时,会将执行计划分支从 1 变成 2。
该值默认为 10,但社区众多人反馈较低了,因此在5.7版本后,将默认值调整为 200了。
不过,今天我们这里的案例却是想反的,因为优化器选择了看似代价低但精确的索引,实际却选择了更低效的索引。 因此,我们需要将其阈值调低,尝试设置 eq_range_index_dive_limit = 2 后(上面的例子中,IN条件里有2个值),再看下新的查询计划:
mysql> set eq_range_index_dive_limit = 2; mysql> explain select * from pre_forum_post where tid=7932612 and `invisible` in('0','-2') order by dateline limit 15\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: pre_forum_post type: range possible_keys: PRIMARY,displayorder,first key: displayorder key_len: 4 ref: NULL rows: 54 Extra: Using index condition; Using filesort
卧槽,预估扫描记录数又降了557倍,相比最开始降了接近32万倍! 在这个案例中,虽然通过修改选项 eq_range_index_dive_limit 的阈值可以达到优化效果,但事实上更靠谱的做法是:直接删除 idx_dt索引。 是的,没错,删除这个垃圾 重复索引,因为实际上这个索引的用处不大,够坑爹吧~~
参考资料:
http://blog.163.com/li_hx/blog/static/18399141320147521735442/
http://mysqlserverteam.com/you-asked-for-it-new-default-for-eq_range_index_dive_limit/https://www.facebook.com/note.php?note_id=10151533648715933http://bugs.mysql.com/bug.php?id=70586http://bugs.mysql.com/bug.php?id=67980http://bugs.mysql.com/bug.php?id=70331