COUNT(1)、COUNT(*)和COUNT(字段)

这个问题在网上有好多说法,甚至一些已经工作许久的开发者也没能弄清楚,网上博客千千万都不如官方文档来得靠谱,MySQL8.0官方文档地址,可以找到关于COUNT关键字的解释

MySQL官方解释

  1. Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.

    这句话说的是COUNT(expr)是个啥,COUNT函数用于统计在符合搜索条件的记录中,指定的表达式expr
    不为NULL的行数有多少。这里需要特别注意的是,expr不仅仅可以是列名,其他任意表达式都是可以的。

    	SELECT COUNT(1) FROM employees;
    

    1永远不为NULL,即统计employees表中有多少记录,同理COUNT(2),COUNT('abv')也是如此。

    	SELECT COUNT(`birth_date`) FROM employees;
    

    服务器不断向存储引擎要每一条记录birth_date的值,如果为NULL,不计数,否则计数+1。

  2. COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.

    COUNT(*)计数表行,无论是否存在NULL值

  3. For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.
    InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

    InnoDB存储引擎由于并发事务,无法事先存储表行数,因为对于不同的并发事务来说,有多少行是不确定的。

  4. InnoDB processes SELECT COUNT() statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, InnoDB processes SELECT COUNT() statements by scanning the clustered index.

    InnoDB处理COUNT(*)走最小的辅助索引,除非优化器明确指示其他索引。如果表没有辅助索引,走聚簇索引。

  5. Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.

    如果全部索引记录不在缓冲池中,COUNT(*)计数会比较耗时。可以考虑建立一个Counter表用于维护各个表的记录数,随着各个表的操作更新。这种方法在大量的并发事务下可能不合适,如果不是需要精确的记录数,考虑使用SHOW TABLE STATUS获取近似值。

  6. InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

    对于InnoDB存储引擎来说,COUNT(*)和COUNT(1)没有什么区别。

所以答案很清楚了,关键还是在于MYSQL对于COUNT函数的优化,走索引肯定比全表扫描快,同样走索引,由于索引也存储了列,那么走存储小的索引肯定更快,也就是说其实走聚簇索引是最慢的,也就是网上一些文章说的COUNT(主键) < COUNT(*),其实不然,COUNT(主键)同样会走二级索引。

COUNT函数执行过程

以下展示了COUNT执行的流程。

SELECT COUNT(*) FROM employees;

在执行上述查询时,server层会维护一个名叫count的变量,然后:

  • server层向InnoDB要第一条记录(由于是COUNT(*),没有指定字段,返回主键即可,所有二级索引都包含主键,选择最小的二级索引)。

  • InnoDB找到第一条二级索引记录,并返回给server层(注意:由于此时只是统计记录数量,所以并不需要回表)。

  • 由于COUNT函数的参数是*,MySQL会将*当作常数0处理。由于0并不是NULL,server层给count变量加1。

  • server层向InnoDB要下一条记录。

  • InnoDB通过二级索引记录的next_record属性找到下一条二级索引记录,并返回给server层。

  • server层继续给count变量加1。

  • ... 重复上述过程,直到InnoDB向server层返回没记录可查的消息。

  • server层将最终的count变量的值发送到客户端。

综上,最终的结论应该是COUNT(*)=COUNT(常量)=COUNT(主键)>COUNT(非索引字段),优先使用最小的二级索引。

试验分析

还是employees数据库,如下所示

 CREATE TABLE `employees` (
  `emp_no` int NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`),
  KEY `index_all` (`birth_date`,`last_name`,`first_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

表中包含主键索引和一组联合索引
QQ图片20220817232638.png

无条件COUNT(*)

mysql> EXPLAIN SELECT COUNT(*) FROM employees;
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key       | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | index_all | 97      | NULL | 299423 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+

二级索引只有一组联合索引,key_len为97

无条件COUNT(主键)

mysql> EXPLAIN SELECT COUNT(`emp_no`) FROM employees;
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key       | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | index_all | 97      | NULL | 299423 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+

有条件COUNT(*),条件在联合索引中且满足最左匹配原则

mysql> EXPLAIN SELECT COUNT(`birth_date`) FROM employees;
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key       | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | index | NULL          | index_all | 97      | NULL | 299423 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+--------+----------+-------------+

有条件COUNT(*),条件在联合索引中,不满足最左匹配原则

mysql> EXPLAIN SELECT COUNT(*) FROM employees WHERE `last_name` like 'john%';
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+-------+----------+----------------------------------------+
| id | select_type | table     | partitions | type  | possible_keys | key       | key_len | ref  | rows  | filtered | Extra                                  |
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+-------+----------+----------------------------------------+
|  1 | SIMPLE      | employees | NULL       | range | index_all     | index_all | 53      | NULL | 33265 |   100.00 | Using where; Using index for skip scan |
+----+-------------+-----------+------------+-------+---------------+-----------+---------+------+-------+----------+----------------------------------------+

很容易想到,条件改成first_name。key_len为97

有条件COUNT(*),条件不在索引中

mysql> EXPLAIN SELECT COUNT(*) FROM employees WHERE `hire_date` < '1999-01-01';
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299423 |    33.33 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+

全表扫描

关于COUNT的执行流程及COUNT(0)、COUNT(*)、COUNT(字段)之间的区别,这篇文章已经讲得非常清楚了,强烈推荐阅读!
MySQL的COUNT是怎么执行的

Q.E.D.


一切很好,不缺烦恼。