博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
MariaDB · 新特性 · 窗口函数
阅读量:6893 次
发布时间:2019-06-27

本文共 9489 字,大约阅读时间需要 31 分钟。

简介

窗口函数(Window Function)是 SQL:2013 标准中提出的,在后续标准版本的更新中也多次扩展,最新的版本是 SQL:2011 中的标准。

从某些方面来说,窗口函数与聚合函数(Aggregate Function)有些类似,他们都是对一系列的行进行聚合计算,然而不同于聚合函数的是,窗口函数的输出并不是被聚集到单独的一行。像 SUM, AVG, COUNT 这些聚合函数最终输出都是一行记录。

窗口函数可以通过计算每行周围窗口上的集合值来分析数据,例如:计算一定记录范围内、一定值域内、或者一段时间内的累计和以及移动平均值等等。之所以使用窗口这个术语,是因为对结果的处理使用了一个滑动的查询结果集范围。

大部分数据库,尤其是商业数据库都支持SQL标准中定义的部分窗口函数,但是MySQL一直没有支持这个特性。窗口函数在MySQL社区一直呼声很高,却一直没有被实现,直到 MairaDB 10.2 响应了客户的需求,实现了部分窗口函数,并且在持续完善中,我们就来简单介绍一下 MairaDB 的窗口函数。

语法

MariaDB的窗口函数查询指定使用 OVER 关键字,默认情况下,用于计算的行集合(Window,窗口)是整个数据集,并且可以用 ORDER BY 子句排序。PARTITION BY 子句可以用于将窗口缩小到特定的集合内。

使用语法:

function (expression) OVER (  [ PARTITION BY expression_list ]  [ ORDER BY order_list [ frame_clause ] ] ) function:  A valid window functionexpression_list:  expression | column_name [, expr_list ]order_list:  expression | column_name [ ASC | DESC ]   [, ... ]frame_clause:

例如,给出下面的原始数据:

CREATE TABLE student (name CHAR(10), test CHAR(10), score TINYINT); INSERT INTO student VALUES   ('Chun', 'SQL', 75), ('Chun', 'Tuning', 73),   ('Esben', 'SQL', 43), ('Esben', 'Tuning', 31),   ('Kaolin', 'SQL', 56), ('Kaolin', 'Tuning', 88),   ('Tatiana', 'SQL', 87), ('Tatiana', 'Tuning', 83);

下面两个查询可以分别返回按 test 和 name 分区处理的平均数:

SELECT name, test, score, AVG(score) OVER (PARTITION BY test)   AS average_by_test FROM student;+---------+--------+-------+-----------------+| name    | test   | score | average_by_test |+---------+--------+-------+-----------------+| Chun    | SQL    |    75 |         65.2500 || Chun    | Tuning |    73 |         68.7500 || Esben   | SQL    |    43 |         65.2500 || Esben   | Tuning |    31 |         68.7500 || Kaolin  | SQL    |    56 |         65.2500 || Kaolin  | Tuning |    88 |         68.7500 || Tatiana | SQL    |    87 |         65.2500 || Tatiana | Tuning |    83 |         68.7500 |+---------+--------+-------+-----------------+SELECT name, test, score, AVG(score) OVER (PARTITION BY name)   AS average_by_name FROM student;+---------+--------+-------+-----------------+| name    | test   | score | average_by_name |+---------+--------+-------+-----------------+| Chun    | SQL    |    75 |         74.0000 || Chun    | Tuning |    73 |         74.0000 || Esben   | SQL    |    43 |         37.0000 || Esben   | Tuning |    31 |         37.0000 || Kaolin  | SQL    |    56 |         72.0000 || Kaolin  | Tuning |    88 |         72.0000 || Tatiana | SQL    |    87 |         85.0000 || Tatiana | Tuning |    83 |         85.0000 |+---------+--------+-------+-----------------+

用例

RANK

描述:

RANK() 可以显示给定行的序号,从1开始,顺序以 ORDER BY 字段排序后的序列为准。

语法:

RANK() OVER (  [ PARTITION BY partition_expression ]  [ ORDER BY order_list ])

例子

CREATE TABLE student(course VARCHAR(10), mark int, name varchar(10));INSERT INTO student VALUES      ('Maths', 60, 'Thulile'),     ('Maths', 60, 'Pritha'),     ('Maths', 70, 'Voitto'),     ('Biology', 60, 'Bilal'),     ('Biology', 70, 'Roger');SELECT RANK() OVER (PARTITION BY course ORDER BY mark) AS rank,    DENSE_RANK() OVER (PARTITION BY course ORDER BY mark) AS dense_rank,    ROW_NUMBER() OVER (PARTITION BY course ORDER BY mark) AS row_num,    course, mark, name from student;+------+------------+---------+---------+------+---------+| rank | dense_rank | row_num | course  | mark | name    |+------+------------+---------+---------+------+---------+|    1 |          1 |       1 | Maths   |   60 | Thulile ||    1 |          1 |       2 | Maths   |   60 | Pritha  ||    3 |          2 |       3 | Maths   |   70 | Voitto  ||    1 |          1 |       1 | Biology |   60 | Bilal   ||    2 |          2 |       2 | Biology |   70 | Roger   |+------+------------+---------+---------+------+---------+

CUME_DIST

描述:

CUME_DIST() 可以返回一行数据的累积分布(cumulative distribution)。计算公式如下

(number of rows <= current row) / (total rows)

语法:

CUME_DIST() OVER (   [ PARTITION BY partition_expression ]   [ ORDER BY order_list ])

例子:

create table t1 (  pk int primary key,  a int,  b int);insert into t1 values( 1 , 0, 10),( 2 , 0, 10),( 3 , 1, 10),( 4 , 1, 10),( 8 , 2, 10),( 5 , 2, 20),( 6 , 2, 20),( 7 , 2, 20),( 9 , 4, 20),(10 , 4, 20);select pk, a, b,    rank() over (order by a),    percent_rank() over (order by a),    cume_dist() over (order by a)from t1;+----+------+------+--------------------------+----------------------------------+-------------------------------+| pk | a    | b    | rank() over (order by a) | percent_rank() over (order by a) | cume_dist() over (order by a) |+----+------+------+--------------------------+----------------------------------+-------------------------------+|  1 |    0 |   10 |                        1 |                     0.0000000000 |                  0.2000000000 ||  2 |    0 |   10 |                        1 |                     0.0000000000 |                  0.2000000000 ||  3 |    1 |   10 |                        3 |                     0.2222222222 |                  0.4000000000 ||  4 |    1 |   10 |                        3 |                     0.2222222222 |                  0.4000000000 ||  5 |    2 |   20 |                        5 |                     0.4444444444 |                  0.8000000000 ||  6 |    2 |   20 |                        5 |                     0.4444444444 |                  0.8000000000 ||  7 |    2 |   20 |                        5 |                     0.4444444444 |                  0.8000000000 ||  8 |    2 |   10 |                        5 |                     0.4444444444 |                  0.8000000000 ||  9 |    4 |   20 |                        9 |                     0.8888888889 |                  1.0000000000 || 10 |    4 |   20 |                        9 |                     0.8888888889 |                  1.0000000000 |+----+------+------+--------------------------+----------------------------------+-------------------------------+select pk, a, b,       percent_rank() over (order by pk),       cume_dist() over (order by pk)from t1 order by pk;+----+------+------+-----------------------------------+--------------------------------+| pk | a    | b    | percent_rank() over (order by pk) | cume_dist() over (order by pk) |+----+------+------+-----------------------------------+--------------------------------+|  1 |    0 |   10 |                      0.0000000000 |                   0.1000000000 ||  2 |    0 |   10 |                      0.1111111111 |                   0.2000000000 ||  3 |    1 |   10 |                      0.2222222222 |                   0.3000000000 ||  4 |    1 |   10 |                      0.3333333333 |                   0.4000000000 ||  5 |    2 |   20 |                      0.4444444444 |                   0.5000000000 ||  6 |    2 |   20 |                      0.5555555556 |                   0.6000000000 ||  7 |    2 |   20 |                      0.6666666667 |                   0.7000000000 ||  8 |    2 |   10 |                      0.7777777778 |                   0.8000000000 ||  9 |    4 |   20 |                      0.8888888889 |                   0.9000000000 || 10 |    4 |   20 |                      1.0000000000 |                   1.0000000000 |+----+------+------+-----------------------------------+--------------------------------+select pk, a, b,        percent_rank() over (partition by a order by a),        cume_dist() over (partition by a order by a)from t1;+----+------+------+-------------------------------------------------+----------------------------------------------+| pk | a    | b    | percent_rank() over (partition by a order by a) | cume_dist() over (partition by a order by a) |+----+------+------+-------------------------------------------------+----------------------------------------------+|  1 |    0 |   10 |                                    0.0000000000 |                                 1.0000000000 ||  2 |    0 |   10 |                                    0.0000000000 |                                 1.0000000000 ||  3 |    1 |   10 |                                    0.0000000000 |                                 1.0000000000 ||  4 |    1 |   10 |                                    0.0000000000 |                                 1.0000000000 ||  5 |    2 |   20 |                                    0.0000000000 |                                 1.0000000000 ||  6 |    2 |   20 |                                    0.0000000000 |                                 1.0000000000 ||  7 |    2 |   20 |                                    0.0000000000 |                                 1.0000000000 ||  8 |    2 |   10 |                                    0.0000000000 |                                 1.0000000000 ||  9 |    4 |   20 |                                    0.0000000000 |                                 1.0000000000 || 10 |    4 |   20 |                                    0.0000000000 |                                 1.0000000000 |+----+------+------+-------------------------------------------------+----------------------------------------------+

总结

  • 支持 ROWS 和 RANGE类型的Frame
    • 各种类型的Frame界定都支持,包括 RANGE PRECEDING | FOLLOWING n个Frame范围(不同于PostgreSQL 和 SQL Server)
    • RANGE类型的Frame 还不支持 DATE[TIME] 数据类型和四则运算,但MDEV-9727正在进行开发
  • 还不支持 GROUPS类型的Frame(但好像还没有流行的数据库有支持这玩意的)
  • 不支持禁用Frame(好像也没其他数据库支持)
  • 不支持显式的 NULLS FIRST 和 NULLS LAST
  • 不支持窗口函数嵌套处理(就是VALUE_OF(expr AT row_marker [, default_value) 这种语法)
  • 下面这些窗口函数都支持:
    • “Streamable(流式)” 窗口函数:ROW_NUMBER, RANK, DENSE_RANK
    • 一旦分区中的行数知道后,就可以流式处理的窗口函数:PERCENT_RANK, CUME_DIST, NTILE
  • 目前支持窗口函数的聚合函数有: COUNT, SUM, AVG, BIT_OR, BIT_AND, BIT_XOR
  • 包含 DISTINCT 子句的聚合函数不支持作为窗口函数

大家可以在查看具体支持哪些函数以及未来新增了哪些函数

转载地址:http://ucudl.baihongyu.com/

你可能感兴趣的文章
Ubuntu 更改 Gun Make 版本
查看>>
Service学习笔记
查看>>
idea配置git、GitHub
查看>>
Cocopods安装和升级备忘录
查看>>
如何用Python写一个贪吃蛇AI
查看>>
nginx全局变量
查看>>
今日一练习
查看>>
Kylin 在 58 集团的实践和应用
查看>>
javascript性能优化
查看>>
41. First Missing Positive
查看>>
sql的行转列(PIVOT)与列转行(UNPIVOT)
查看>>
sbt配置——数据源问题解决
查看>>
框架模式与设计模式之区别
查看>>
AngularJS+Satellizer+Node.js+MongoDB->Instagram-13
查看>>
CSS 实现打字效果
查看>>
jquery 根据已知值来修改单选框的被选中项
查看>>
html5 aside
查看>>
并发线程学习001(volatile 使用)
查看>>
JSONArray().fromObject(); 出现org.apache.catalina.core.StandardWrapperValve invoke错误的解决办法...
查看>>
取余和取模的小结
查看>>