Oracle

Extended Statistics

Extended statistics were introduced in the oracle cost based optimized beginning with 11g. They are used to gather information on the correlated columns.

For example there is a correlation between the “county” and “city” column of a table. Number of distinct values over these columns can not be calculated by multiplying the NDV (number of distinct values) of these columns. In the following test table for example.
NDV of vmod1 is 5.
NDV of vmod2 is 5.
But NDV of (vmod1 & vmod2) is still 5.

Possible values you can get from the test table :

vmod1 vmod1
0 1
1 2
2 3
3 4
4 5

Prepare the Table

SQL> drop table eser_sil purge;
SQL> create table eser_sil as 
  2  SELECT LEVEL vrow, mod(level,5) vmod1, mod(level,5)+1 vmod2, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' vval
  3  FROM dual
  4  CONNECT BY LEVEL <= 1e6;
SQL> create index eser_sil_x1 on eser_sil (vmod1);

WITHOUT EXTENDED STATISTICS

Gather Statistic on the indexed columns.

SQL>  set lines 200
SQL>  set pages 0
SQL>  explain plan for
  2   select count(*)  from eser_sil
  3          where vmod1 = :a
  4          and vmod2 = :b ;
Explained.
SQL>  begin
  2   dbms_stats.gather_table_stats(ownname => 'VTYUSER',
  3                                 tabname => 'ESER_SIL',
  4                                 method_opt => 'for all  indexed columns size auto' ,
  5                                 estimate_percent => 100 ,
  6                                 no_invalidate => false );
  7   end;
  8   /
PL/SQL procedure successfully completed.

Histograms are also created on the indexed columns. But there are no extended statistics created as expected.

SQL> select table_name, column_name, histogram from dba_tab_col_statistics where table_name = 'ESER_SIL';
ESER_SIL                       VMOD1                          FREQUENCY
ESER_SIL                       VMOD2                          FREQUENCY
SQL>  select table_name, extension_name, extension from dba_stat_extensions where table_name = 'ESER_SIL';
no rows selected

When we examine the expected rows from the explained plan of a query which uses both columns as predicates. We can see that the expected rows is 40000. How did we get this number ? Lets explain. For both columns the number of distinct values is 5. And since there is AND operator the we have to multiply the selectivity of the two columns. See more details on “Cost Based Optimizer (CBO) Overview (Doc ID 10626.1)”

( number of rows ) * ( selectivity of VMOD1 ) 	             * ( selectivity of VMOD2 )                = 
( number of rows ) * ( 1 / No. Of distinct Values of VMOD1 ) * ( 1 / No. Of distinct Values of VMOD2 ) = 
1000000 * (1/5) * (1/5)	=  40000
SQL>  set lines 200
SQL>  set pages 0
SQL>  explain plan for
  2   select count(*)  from eser_sil
  3          where vmod1 = :a
  4          and vmod2 = :b ;

Explained.

SQL> select * from table(dbms_xplan.display);
Plan hash value: 2186261789

---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |     1 |     6 |    96   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE   |             |     1 |     6 |            |          |
|*  2 |   INDEX RANGE SCAN| ESER_SIL_X1 | 40000 |   234K|    96   (0)| 00:00:02 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VMOD1"=TO_NUMBER(:A) AND "VMOD2"=TO_NUMBER(:B))

14 rows selected.

WITH EXTENDED STATISTICS

Calculate statistics with extended statistics on the columns.

SQL> begin
  2  dbms_stats.gather_table_stats(ownname => 'VTYUSER',
  3                                tabname => 'ESER_SIL',
  4                                method_opt => 'for all indexed columns size auto for columns size 10 (vmod1, vmod2)' ,
  5                                estimate_percent => 100 ,
  6                                no_invalidate => false );
  7  end;
  8  /

PL/SQL procedure successfully completed.

Check if the extended statistics are calculated properly.

SQL> select table_name, column_name, histogram from dba_tab_col_statistics where table_name = 'ESER_SIL';
ESER_SIL                       VMOD1                          FREQUENCY
ESER_SIL                       VMOD2                          FREQUENCY
ESER_SIL                       SYS_STUUKQC85N0T0_U4SUOFY4M29V FREQUENCY

SQL>  select table_name, extension_name, extension from dba_stat_extensions where table_name = 'ESER_SIL';
ESER_SIL                       SYS_STUUKQC85N0T0_U4SUOFY4M29V ("VMOD1","VMOD2")

This time when we check the explain plan of query. The correlation between the columns were taken into consideration by the optimizer. This time the optimizer correctly calculated number of distinct values from the combination of these two columns. There would be only 5 distinct values because of the high correlation between the columns.

( number of rows ) * ( selectivity of VMOD1 & VMOD2 ) 		  = 
( number of rows ) * ( 1 / No. Of distinct Values of VMOD1 & VMOD2 )  = 
1000000 * (1/5)  =  200000
SQL>  set lines 200
SQL>  set pages 0
SQL>  explain plan for
  2   select count(*)  from eser_sil
  3          where vmod1 = :a
  4          and vmod2 = :b ;

Explained.

SQL> select * from table(dbms_xplan.display);
Plan hash value: 2186261789

---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |     1 |     6 |   472   (1)| 00:00:06 |
|   1 |  SORT AGGREGATE   |             |     1 |     6 |            |          |
|*  2 |   INDEX RANGE SCAN| ESER_SIL_X1 |   200K|  1171K|   472   (1)| 00:00:06 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VMOD1"=TO_NUMBER(:A) AND "VMOD2"=TO_NUMBER(:B))
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s