Optimizing the ROW_NUMBER Window Function (Oracle vs. PostgreSQL Performance Comparison)

I helped a friend tune a very simple window-function query, ROW_NUMBER() OVER(), as part of an Oracle-to-PostgreSQL migration project.

Original SQL and execution plans

STUDENT_BAK is a table of test data that I generated; it holds 30 million rows.

SELECT STU_ID, STU_NAME, STU_SEX, STU_AGE, STU_DATE, STU_CLASSID, STU_SAL
FROM (SELECT SB.*,
             ROW_NUMBER() OVER (PARTITION BY SB.STU_CLASSID, SB.STU_AGE, SB.STU_SEX
                                ORDER BY SB.STU_DATE DESC) RN
      FROM STUDENT_BAK SB) X
WHERE RN = 1;
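
Plans in the format shown in this post can be captured roughly as follows. This is a minimal sketch and not necessarily the exact commands used here: the Oracle part assumes the gather_plan_statistics hint together with DBMS_XPLAN.DISPLAY_CURSOR and the 'ALLSTATS LAST' format (which produces the Starts / E-Rows / A-Rows / A-Time columns), and the PostgreSQL part assumes a plain EXPLAIN ANALYZE.

```sql
-- Oracle: run the statement with runtime row-source statistics enabled.
SELECT /*+ gather_plan_statistics */
       STU_ID, STU_NAME, STU_SEX, STU_AGE, STU_DATE, STU_CLASSID, STU_SAL
FROM (SELECT SB.*,
             ROW_NUMBER() OVER (PARTITION BY SB.STU_CLASSID, SB.STU_AGE, SB.STU_SEX
                                ORDER BY SB.STU_DATE DESC) RN
      FROM STUDENT_BAK SB) X
WHERE RN = 1;

-- Oracle: in the same session, show the plan of the last executed statement
-- together with the actual (A-Rows, A-Time) figures.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

-- PostgreSQL: EXPLAIN ANALYZE executes the query and prints actual times and row counts.
EXPLAIN ANALYZE
SELECT STU_ID, STU_NAME, STU_SEX, STU_AGE, STU_DATE, STU_CLASSID, STU_SAL
FROM (SELECT SB.*,
             ROW_NUMBER() OVER (PARTITION BY SB.STU_CLASSID, SB.STU_AGE, SB.STU_SEX
                                ORDER BY SB.STU_DATE DESC) RN
      FROM STUDENT_BAK SB) X
WHERE RN = 1;
```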

Oracle execution plan

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |             |      1 |        |   5400 |00:00:28.92 |     160K|    166K|   5702 |
|*  1 |  VIEW                    |             |      1 |     30M|   5400 |00:00:28.92 |     160K|    166K|   5702 |
|*  2 |   WINDOW SORT PUSHED RANK|             |      1 |     30M|  12919 |00:00:28.92 |     160K|    166K|   5702 |
|   3 |    TABLE ACCESS FULL     | STUDENT_BAK |      1 |     30M|     30M|00:00:01.44 |     160K|    160K|      0 |
--------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "SB"."STU_CLASSID","SB"."STU_AGE","SB"."STU_SEX" ORDER BY INTERNAL_FUNCTION("SB"."STU_DATE") DESC 
              )<=1)

PostgreSQL execution plan

                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on x  (cost=5085273.94..6210240.94 rows=149996 width=33) (actual time=64787.749..80656.358 rows=5400 loops=1)
   Filter: (x.rn = 1)
   Rows Removed by Filter: 29994600
   ->  WindowAgg  (cost=5085273.94..5835251.94 rows=29999120 width=41) (actual time=64787.747..79956.535 rows=30000000 loops=1)
         ->  Sort  (cost=5085273.94..5160271.74 rows=29999120 width=33) (actual time=64787.728..73281.302 rows=30000000 loops=1)
               Sort Key: sb.stu_classid, sb.stu_age, sb.stu_sex, sb.stu_date DESC
               Sort Method: external merge  Disk: 1369904kB
               ->  Seq Scan on student_bak sb  (cost=0.00..539329.20 rows=29999120 width=33) (actual time=0.008..20064.258 rows=30000000 loops=1)
 Planning Time: 0.123 ms
 Execution Time: 80764.258 ms
(10 rows)

Oracle returns the result in 28 seconds and PG in 80 seconds, a large gap in execution time.

Optimization 1: Use a partitioned table

Partition the STUDENT_BAK table by LIST on the STU_CLASSID column (the partitioned copy appears as STUDENT_PAR_BAK in the plans below).
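
The PostgreSQL side of such a layout might look like the sketch below. Only the table name, the column names, and the partition naming pattern (student_par_bak_p0, student_par_bak_p1, ...) come from the SQL and plans in this post. The column types, the assumption that class IDs are the integers 0 to 299 with one partition each (the Oracle plan shows 300 partition starts), and the way the data is copied over are all illustrative guesses.

```sql
-- Parent table, LIST-partitioned on stu_classid.
-- Column types are assumptions; the post does not show the DDL.
CREATE TABLE student_par_bak (
    stu_id      bigint,
    stu_name    varchar(50),
    stu_sex     varchar(10),
    stu_age     integer,
    stu_date    date,
    stu_classid integer,
    stu_sal     numeric(10,2)
) PARTITION BY LIST (stu_classid);

-- One partition per class ID, assuming the IDs are 0..299.
DO $$
BEGIN
    FOR i IN 0..299 LOOP
        EXECUTE format(
            'CREATE TABLE student_par_bak_p%s PARTITION OF student_par_bak FOR VALUES IN (%s)',
            i, i);
    END LOOP;
END
$$;

-- Copy the rows from the original table; tuple routing places each row
-- in the partition that matches its stu_classid.
INSERT INTO student_par_bak
       (stu_id, stu_name, stu_sex, stu_age, stu_date, stu_classid, stu_sal)
SELECT  stu_id, stu_name, stu_sex, stu_age, stu_date, stu_classid, stu_sal
FROM student_bak;

-- Refresh planner statistics after the bulk load.
ANALYZE student_par_bak;
```

Because STU_CLASSID is the leading PARTITION BY column of the window function, every window partition lies entirely inside one table partition.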

Oracle execution plan after switching to the partitioned table:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |      1 |        |   5400 |00:00:33.62 |     167K|    167K|
|   1 |  PARTITION LIST ALL       |                 |      1 |     30M|   5400 |00:00:33.62 |     167K|    167K|
|*  2 |   VIEW                    |                 |    300 |     30M|   5400 |00:00:21.59 |     167K|    167K|
|*  3 |    WINDOW SORT PUSHED RANK|                 |    300 |     30M|  12919 |00:00:18.20 |     167K|    167K|
|   4 |     TABLE ACCESS FULL     | STUDENT_PAR_BAK |    300 |     30M|     30M|00:00:01.15 |     167K|    167K|
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("RN"=1)
   3 - filter(ROW_NUMBER() OVER ( PARTITION BY "SB"."STU_CLASSID","SB"."STU_AGE","SB"."STU_SEX" ORDER BY 
              INTERNAL_FUNCTION("SB"."STU_DATE") DESC )<=1)

PostgreSQL execution plan after switching to the partitioned table:

                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on x  (cost=10416416.10..11541288.60 rows=149983 width=450) (actual time=54656.336..63401.133 rows=5400 loops=1)
   Filter: (x.rn = 1)
   Rows Removed by Filter: 29994600
   ->  WindowAgg  (cost=10416416.10..11166331.10 rows=29996600 width=458) (actual time=54656.333..62742.542 rows=30000000 loops=1)
         ->  Sort  (cost=10416416.10..10491407.60 rows=29996600 width=450) (actual time=54656.303..56466.219 rows=30000000 loops=1)
               Sort Key: sb.stu_classid, sb.stu_age, sb.stu_sex, sb.stu_date DESC
               Sort Method: external merge  Disk: 1369904kB
               ->  Append  (cost=0.00..539445.00 rows=29996600 width=450) (actual time=0.010..20511.558 rows=30000000 loops=1)
                     ->  Seq Scan on student_par_bak_p0 sb  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.010..67.413 rows=100240 loops=1)
                     ->  Seq Scan on student_par_bak_p1 sb_1  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.010..67.507 rows=100169 loops=1)
                     ->  Seq Scan on student_par_bak_p2 sb_2  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.013..65.356 rows=100260 loops=1)
                     ->  Seq Scan on student_par_bak_p3 sb_3  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.009..65.545 rows=100052 loops=1)
                     ->  Seq Scan on student_par_bak_p4 sb_4  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.010..65.411 rows=100078 loops=1)
                     ->  Seq Scan on student_par_bak_p5 sb_5  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.010..65.672 rows=99757 loops=1)
                     ->  Seq Scan on student_par_bak_p6 sb_6  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.007..66.148 rows=100510 loops=1)
                     ->  Seq Scan on student_par_bak_p7 sb_7  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.008..65.720 rows=99636 loops=1)
                     ->  Seq Scan on student_par_bak_p8 sb_8  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.013..63.716 rows=100625 loops=1)
                     ->  Seq Scan on student_par_bak_p9 sb_9  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.007..64.223 rows=99965 loops=1)
                     ->  Seq Scan on student_par_bak_p10 sb_10  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.009..65.790 rows=100186 loops=1)
                     ->  Seq Scan on student_par_bak_p11 sb_11  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.011..65.542 rows=100171 loops=1)
                     ->  Seq Scan on student_par_bak_p12 sb_12  (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.015..64.736 rows=99841 loops=1)
                    /* 200 lines of per-partition plan omitted here */
 Planning Time: 0.688 ms
 Execution Time: 63466.714 ms
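(310 rows)

Oracle: 33 seconds; PG: 63 seconds.

With the partitioned table, Oracle is actually 5 seconds slower than with the original single table, whereas PG is 17 seconds faster (a decent improvement).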

Optimization 2: Add parallelism on top of the partitioned table
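
As a sketch of where the hints used in this section could be attached: the Oracle hint goes right after the SELECT keyword. For PostgreSQL the sketch assumes the pg_hint_plan extension, which is not named in this post; its documented form is Parallel(table workers soft|hard), written in a comment at the head of the query, so the argument order here differs from the spelling parallel(8 sb hard) used in the text.

```sql
-- Oracle: statement-level parallel hint with a degree of 8.
SELECT /*+ parallel(8) */
       STU_ID, STU_NAME, STU_SEX, STU_AGE, STU_DATE, STU_CLASSID, STU_SAL
FROM (SELECT SB.*,
             ROW_NUMBER() OVER (PARTITION BY SB.STU_CLASSID, SB.STU_AGE, SB.STU_SEX
                                ORDER BY SB.STU_DATE DESC) RN
      FROM STUDENT_PAR_BAK SB) X
WHERE RN = 1;

-- PostgreSQL, assuming pg_hint_plan is installed and loaded:
-- request 8 workers for the scan of the table aliased "sb"; "hard" forces that worker count.
/*+ Parallel(sb 8 hard) */
EXPLAIN ANALYZE
SELECT STU_ID, STU_NAME, STU_SEX, STU_AGE, STU_DATE, STU_CLASSID, STU_SAL
FROM (SELECT SB.*,
             ROW_NUMBER() OVER (PARTITION BY SB.STU_CLASSID, SB.STU_AGE, SB.STU_SEX
                                ORDER BY SB.STU_DATE DESC) RN
      FROM STUDENT_PAR_BAK SB) X
WHERE RN = 1;
```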

Oracle execution plan with the hint parallel(8) added to the statement

------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |      1 |        |   5400 |00:00:02.80 |     910 |
|   1 |  PX COORDINATOR                |                 |      1 |        |   5400 |00:00:02.80 |     910 |
|   2 |   PX SEND QC (RANDOM)          | :TQ10001        |      0 |     30M|      0 |00:00:00.01 |       0 |
|*  3 |    VIEW                        |                 |      0 |     30M|      0 |00:00:00.01 |       0 |
|*  4 |     WINDOW SORT PUSHED RANK    |                 |      0 |     30M|      0 |00:00:00.01 |       0 |
|   5 |      PX RECEIVE                |                 |      0 |     30M|      0 |00:00:00.01 |       0 |
|   6 |       PX SEND HASH             | :TQ10000        |      0 |     30M|      0 |00:00:00.01 |       0 |
|*  7 |        WINDOW CHILD PUSHED RANK|                 |      0 |     30M|      0 |00:00:00.01 |       0 |
|   8 |         PX BLOCK ITERATOR      |                 |      0 |     30M|      0 |00:00:00.01 |       0 |
|*  9 |          TABLE ACCESS FULL     | STUDENT_PAR_BAK |      0 |     30M|      0 |00:00:00.01 |       0 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("RN"=1)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "SB"."STU_CLASSID","SB"."STU_AGE","SB"."STU_SEX" ORDER BY 
              INTERNAL_FUNCTION("SB"."STU_DATE") DESC )<=1)
   7 - filter(ROW_NUMBER() OVER ( PARTITION BY "SB"."STU_CLASSID","SB"."STU_AGE","SB"."STU_SEX" ORDER BY 
              INTERNAL_FUNCTION("SB"."STU_DATE") DESC )<=1)
   9 - access(:Z>=:Z AND :Z<=:Z)
 
Note
-----
   - Degree of Parallelism is 8 because of hint

PostgreSQL execution plan with the hint parallel(8 sb hard) added to the statement

                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on x  (cost=9876971.10..11001843.60 rows=149983 width=450) (actual time=52118.006..61222.471 rows=5400 loops=1)
   Filter: (x.rn = 1)
   Rows Removed by Filter: 29994600
   ->  WindowAgg  (cost=9876971.10..10626886.10 rows=29996600 width=458) (actual time=52118.003..60558.094 rows=30000000 loops=1)
         ->  Sort  (cost=9876971.10..9951962.60 rows=29996600 width=450) (actual time=52117.986..54214.307 rows=30000000 loops=1)
               Sort Key: sb.stu_classid, sb.stu_age, sb.stu_sex, sb.stu_date DESC
               Sort Method: external merge  Disk: 1369904kB
               ->  Gather  (cost=0.00..0.00 rows=29996600 width=450) (actual time=2.655..9516.724 rows=30000000 loops=1)
                     Workers Planned: 8
                     Workers Launched: 8
                     ->  Parallel Append  (cost=0.00..0.00 rows=3749575 width=450) (actual time=0.020..10158.207 rows=3333333 loops=9)
                           ->  Parallel Seq Scan on student_par_bak_p0 sb  (cost=0.00..0.00 rows=12499 width=450) (actual time=0.018..400.671 rows=100240 loops=1)
                           ->  Parallel Seq Scan on student_par_bak_p1 sb_1  (cost=0.00..0.00 rows=12499 width=450) (actual time=0.018..410.512 rows=100169 loops=1)
                           ->  Parallel Seq Scan on student_par_bak_p2 sb_2  (cost=0.00..0.00 rows=12499 width=450) (actual time=0.024..344.441 rows=100260 loops=1)
                           ->  Parallel Seq Scan on student_par_bak_p3 sb_3  (cost=0.00..0.00 rows=12499 width=450) (actual time=0.021..337.809 rows=100052 loops=1)
                           ->  Parallel Seq Scan on student_par_bak_p4 sb_4  (cost=0.00..0.00 rows=12499 width=450) (actual time=0.019..406.484 rows=100078 loops=1)
                           ->  Parallel Seq Scan on student_par_bak_p5 sb_5  (cost=0.00..0.00 rows=12499 width=450) (actual time=0.020..385.837 rows=99757 loops=1)
                           ->  Parallel Seq Scan on student_par_bak_p6 sb_6  (cost=0.00..0.00 rows=12499 width=450) (actual time=0.023..259.886 rows=100510 loops=1)
                           ->  Parallel Seq Scan on student_par_bak_p7 sb_7  (cost=0.00..0.00 rows=12499 width=450) (actual time=0.019..293.689 rows=99636 loops=1)
                          /* 200 lines of per-partition plan omitted here */
 Planning Time: 0.564 ms
 Execution Time: 61279.150 ms
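(313 rows)

Oracle: 2.8 seconds; PG: 61 seconds.
Oracle improves dramatically with parallelism. PG's parallelism does not appear to help ROW_NUMBER much: in the plan above only the partition scans run in parallel (below the Gather node), while the Sort and WindowAgg above it still process all 30 million rows serially, so the execution time is about the same as with the partitioned table alone: 61 s.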

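Optimization 3: Rewrite ROW_NUMBER() OVER() into an equivalent query

Keeping the partitioned table and a parallel degree of 8, rewrite the ROW_NUMBER() OVER() logic into an equivalent form.

I am not publishing the rewritten SQL here; readers who want to dig deeper are welcome to contact me. It took me a while to work out how to do it, and it is purely a logic-transformation exercise.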
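
One plausible shape for such a rewrite, inferred purely from the plans in this section and not the withheld SQL itself, is to compute the latest STU_DATE per group with GROUP BY and then join back to fetch the full rows. The aliases x, x1, x2, b and the column name max_stu_date are borrowed from the plan output; everything else in this sketch is an assumption.

```sql
-- Hypothetical reconstruction of the general idea; not the author's SQL.
WITH x AS (
    -- Base rows from the partitioned table (the "CTE x" seen in the PG plan).
    SELECT b.stu_id, b.stu_name, b.stu_sex, b.stu_age,
           b.stu_date, b.stu_classid, b.stu_sal
    FROM student_par_bak b
),
x2 AS (
    -- Latest date per (class, age, sex) group; this replaces the window sort
    -- with an aggregate that only has to keep one value per group.
    SELECT stu_classid, stu_age, stu_sex, MAX(stu_date) AS max_stu_date
    FROM x
    GROUP BY stu_classid, stu_age, stu_sex
)
-- Join back to pick up the remaining columns of the row carrying that date.
SELECT x1.stu_id, x1.stu_name, x1.stu_sex, x1.stu_age,
       x1.stu_date, x1.stu_classid, x1.stu_sal
FROM x x1
JOIN x2
  ON  x1.stu_classid = x2.stu_classid
  AND x1.stu_age     = x2.stu_age
  AND x1.stu_sex     = x2.stu_sex
  AND x1.stu_date    = x2.max_stu_date;
```

This form matches RN = 1 only when each group has a single row with the latest STU_DATE and none of the three grouping columns is NULL. If several rows tie on the latest date, the join returns all of them, whereas ROW_NUMBER() keeps exactly one, so a tie-breaker (for example a unique key) would be needed for strict equivalence.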

Oracle execution plan for the rewritten ROW_NUMBER() OVER() query

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |      1 |        |   5400 |00:00:00.69 |     910 |       |
|   1 |  PX COORDINATOR               |                 |      1 |        |   5400 |00:00:00.69 |     910 |       |
|   2 |   PX SEND QC (RANDOM)         | :TQ10001        |      0 |     37 |      0 |00:00:00.01 |       0 |       |
|   3 |    NESTED LOOPS               |                 |      0 |     37 |      0 |00:00:00.01 |       0 |       |
|   4 |     VIEW                      |                 |      0 |   2700 |      0 |00:00:00.01 |       0 |       |
|   5 |      SORT GROUP BY            |                 |      0 |   2700 |      0 |00:00:00.01 |       0 | 68608 |
|   6 |       PX RECEIVE              |                 |      0 |   2700 |      0 |00:00:00.01 |       0 |       |
|   7 |        PX SEND HASH           | :TQ10000        |      0 |   2700 |      0 |00:00:00.01 |       0 |       |
|   8 |         SORT GROUP BY         |                 |      0 |   2700 |      0 |00:00:00.01 |       0 |   585K|
|   9 |          PX BLOCK ITERATOR    |                 |      0 |     30M|      0 |00:00:00.01 |       0 |       |
|* 10 |           TABLE ACCESS FULL   | STUDENT_PAR_BAK |      0 |     30M|      0 |00:00:00.01 |       0 |       |
|* 11 |     TABLE ACCESS BY USER ROWID| STUDENT_PAR_BAK |      0 |      1 |      0 |00:00:00.01 |       0 |       |
-------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
  10 - access(:Z>=:Z AND :Z<=:Z)
  11 - filter(("B"."STU_CLASSID"="X2"."STU_CLASSID" AND "B"."STU_AGE"="X2"."STU_AGE" AND "B"."STU_SEX"="X2"."STU_SEX" AND 
              "B"."STU_DATE"="X2"."MAX_STU_DATE"))
 
Note
-----
   - Degree of Parallelism is 8 because of hint

PostgreSQL execution plan for the rewritten ROW_NUMBER() OVER() query

						
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=5627880.30..6621517.67 rows=281 width=450) (actual time=26827.678..32286.183 rows=5400 loops=1)
   Hash Cond: ((x1.stu_classid = sb.stu_classid) AND (x1.stu_age = sb.stu_age) AND ((x1.stu_sex)::text = (sb.stu_sex)::text) AND ((x1.stu_date)::timestamp without time zone = (((max((sb.stu_date)::timestamp without time zone))::date))::timestamp without time zone))
   CTE x
     ->  Gather  (cost=0.00..0.00 rows=29996600 width=456) (actual time=3.752..6509.459 rows=30000000 loops=1)
           Workers Planned: 8
           Workers Launched: 8
           ->  Parallel Append  (cost=0.00..0.00 rows=3749575 width=456) (actual time=0.018..10115.770 rows=3333333 loops=9)
                 ->  Parallel Seq Scan on student_par_bak_p0 b  (cost=0.00..0.00 rows=12499 width=456) (actual time=0.020..308.911 rows=100240 loops=1)
                 ->  Parallel Seq Scan on student_par_bak_p1 b_1  (cost=0.00..0.00 rows=12499 width=456) (actual time=0.018..308.579 rows=100169 loops=1)
                 ->  Parallel Seq Scan on student_par_bak_p2 b_2  (cost=0.00..0.00 rows=12499 width=456) (actual time=0.017..248.827 rows=100260 loops=1)
                 ->  Parallel Seq Scan on student_par_bak_p3 b_3  (cost=0.00..0.00 rows=12499 width=456) (actual time=0.016..278.987 rows=100052 loops=1)
                 ->  Parallel Seq Scan on student_par_bak_p4 b_4  (cost=0.00..0.00 rows=12499 width=456) (actual time=0.021..316.482 rows=100078 loops=1)
                 ->  Parallel Seq Scan on student_par_bak_p5 b_5  (cost=0.00..0.00 rows=12499 width=456) (actual time=0.016..306.535 rows=99757 loops=1)
                 ->  Parallel Seq Scan on student_par_bak_p6 b_6  (cost=0.00..0.00 rows=12499 width=456) (actual time=0.016..312.291 rows=100510 loops=1)
                 ->  Parallel Seq Scan on student_par_bak_p7 b_7  (cost=0.00..0.00 rows=12499 width=456) (actual time=0.015..187.120 rows=99636 loops=1)
                          /* 200 lines of per-partition plan omitted here */
   ->  CTE Scan on x x1  (cost=0.00..599932.00 rows=29996600 width=456) (actual time=3.754..1339.590 rows=30000000 loops=1)
   ->  Hash  (cost=5560387.95..5560387.95 rows=2999660 width=30) (actual time=26808.484..26808.486 rows=5400 loops=1)
         Buckets: 4194304  Batches: 1  Memory Usage: 33095kB
         ->  GroupAggregate  (cost=5042946.60..5530391.35 rows=2999660 width=30) (actual time=20922.491..26805.407 rows=5400 loops=1)
               Group Key: sb.stu_classid, sb.stu_age, sb.stu_sex
               ->  Sort  (cost=5042946.60..5117938.10 rows=29996600 width=30) (actual time=20921.360..22292.427 rows=30000000 loops=1)
                     Sort Key: sb.stu_classid, sb.stu_age, sb.stu_sex
                     Sort Method: external merge  Disk: 1174192kB
                     ->  CTE Scan on x sb  (cost=0.00..599932.00 rows=29996600 width=30) (actual time=0.004..10345.811 rows=30000000 loops=1)
 Planning Time: 1.540 ms
 Execution Time: 32535.572 ms
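(318 rows)

Time: 32541.503 ms (00:32.542)

Oracle: 0.69 seconds; PG: 32 seconds.
Oracle is startlingly fast after the rewrite, and PG also drops from 61 seconds to 32 seconds.

For now, 32 seconds is the best I can get for ROW_NUMBER() OVER() on PG; I will keep looking for a faster approach.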

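Closing remarks

These are three ways to optimize ROW_NUMBER() OVER(). If you run into a similar case in a project, first understand the environment and its requirements, then choose which technique to apply.

If you have a better optimization idea, you are welcome to share it so we can learn from each other.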