Oracle 78일차 - PLTRwillsoar

Table of Contents

[통계정보]

통계 정보 수집 시 각 컬럼의 히스토그램 수집 가능 -> 각 컬럼의 분포에 따른 실행 계획 세울 수 있음.
만약 시간이 지날수록 분포가 달라진다면 최적의 실행 계획 생성 불가. 따라서 주기적으로 통계정보 업데이트 필요!
– 옵티마이저가 실행계획을 세울 때 판단 근거가 되는 기준
– 테이블, 인덱스, 컬럼 통계 정보
– 주기적 수집 및 업데이트 필요

** 수집 통계정보 종류
1) 테이블 통계정보

NUM_ROWS: 테이블의 총 행(row) 수.
BLOCKS: 테이블이 차지하는 데이터 블록 수.
EMPTY_BLOCKS: 비어 있는 데이터 블록 수.
AVG_ROW_LEN: 평균 행 길이.
CHAIN_CNT: 체인된 행(row chaining) 또는 마이그레이션된 행(row migration)의 수.
AVG_SPACE: 비어 있는 공간의 평균 크기.
AVG_SPACE_FREELIST_BLOCKS: 비어 있는 공간을 포함한 블록의 평균 수.
NUM_FREELIST_BLOCKS: FREELIST 블록의 수.
LAST_ANALYZED: 마지막으로 분석된 날짜와 시간.
SAMPLE_SIZE: 통계를 수집할 때 사용된 샘플의 크기.
DEGREE: 병렬 처리 수준.
INSTANCES: 병렬 인스턴스 수.
CACHE: 테이블이 캐시에 저장될지 여부.
PARTITIONED: 테이블이 파티셔닝되었는지 여부.
IOT_TYPE: 인덱스 조직 테이블의 유형.

2) 인덱스 통계정보

BLOCKS: 인덱스 차지하는 데이터 블록 수.
EMPTY_BLOCKS: 인덱스 있는 데이터 블록 수.

3) 컬럼 통계정보

NUM_DISTINCT: 해당 컬럼의 고유 값 수.
DENSITY: 값의 밀도(고유 값 수의 역수).
NUM_NULLS: 해당 컬럼에 NULL이 들어 있는 행의 수.
LOW_VALUE: 해당 컬럼의 최소값.
HIGH_VALUE: 해당 컬럼의 최대값.
AVG_COL_LEN: 해당 컬럼의 평균 길이.
HISTOGRAM: 값의 분포를 나타내는 히스토그램*
SAMPLE_SIZE: 통계를 수집할 때 사용된 샘플의 크기.
CHARACTER_SET: 문자 집합(CHAR/VARCHAR2 컬럼의 경우).
GLOBAL_STATS: 통계가 글로벌 통계인지 로컬 통계인지 여부.
USER_STATS: 사용자 정의 통계가 있는지 여부.

통계정보 수집 방법

1. analyze
– 통계정보를 간단하고 빠르게 수집하는 방식.
– oracle에서는 권고하지 않음.
– 히스토그램에 대해 정확하게 수집하지 않는다.

2. dbms_stats 패키지
– analyze 보다 많은 기능을 가지고 통계정보를 수집.
– sample_size, degree 등의 옵션 전달 가능.
– 컬럼 히스토그램을 자세하게 수집할 수 있음.

[analyze]

1. 테이블 단위
analyze table 테이블명 compute statistics;

2. 인덱스 단위
analyze index 인덱스명 compute statistics;

3. 컬럼 단위
analyze table 테이블명 compute statistics for columns 컬럼1, 컬럼2, …;

실습

1. 통계정보 수집 확인

-- 테이블 통계정보 확인)
select t.owner, t.table_name, t.last_analyzed, s.bytes
from dba_tables t, dba_segments s
where 1=1
and t.owner = s.owner
and t.table_name = s.segment_name
and t.owner = 'SCOTT'
order by 4 desc;

-- 컬럼 통계정보 확인)
select *
from dba_tab_columns
where table_name = 'LARGE_ACCOUNT';

-- 인덱스 통계정보 확인)
select *
from dba_indexes
where owner = 'SCOTT';

analyze table scott.dept compute statistics;
analyze table scott.large_account compute statistics;

[dbms_stats]

exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'scott', TABNAME=>'emp');

** 파라미터 
1) OWNNAME : 테이블 소유자의 스키마 이름
2) TABNAME : 테이블의 이름
3) PARTNAME : 통계를 수집할 파티션의 이름
4) ESTIMATE_PERCENT : 샘플링 비율을 백분율로 지정 ******
5) BLOCK_SAMPLE : 블록 샘플링을 사용할지 여부를 지정
6) METHOD_OPT : 히스토그램 수집 옵션을 지정*********
기본값: 'FOR ALL COLUMNS SIZE AUTO'
* SIZE : 히스토그램 수집 시 사용하는 버킷 수(버킷수가 많을수록 세부 분포를 정확하게 나타내지만, 생성과 관리에 더 많은 자원 소모)
7) DEGREE : 병렬 처리의 정도
8) GRANULARITY : 통계를 수집할 세부 수준을 지정
기본값: 'DEFAULT'
옵션: 'DEFAULT', 'ALL', 'GLOBAL AND PARTITION', 'APPROX_GLOBAL AND PARTITION', 'PARTITION', 'SUBPARTITION'
9) CASCADE : 테이블과 연결된 인덱스 통계도 함께 수집할지 여부를 지정******** TRUE
10) STALE_PERCENT : 테이블의 변경 임계값을 지정
11) NO_INVALIDATE : 통계 수집 후 커서 캐시 무효화 동작을 지정
12) STAT_TAB : 통계를 저장할 테이블 이름을 지정
13) STATID : 통계 저장 시 사용할 식별자를 지정
14) STATOWN : 통계가 저장될 테이블의 소유자 스키마 이름을 지정합
15) FORCE : 통계를 강제로 수집할지 여부를 지정

실습

1. 통게정보 수집 확인(테이블)

exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'scott', TABNAME=>'emp');

** 쌤이 썼던 통계정보 수집 스크립트 생성 쿼리

SELECT chr(39) FROM DUAL;  -- single quote 아스키 코드

select 'exec dbms_stats.gather_table_stats(ownname=>'||chr(39)||OWNER||chr(39)||',tabname=>'||chr(39)||table_name||chr(39)
    || ',estimate_percent=>'
    || case when MBYTES < 10    then '100'
            when MBYTES < 100   then '20'
            when MBYTES < 1000  then '5'
            when MBYTES < 5000  then '5'
            when MBYTES < 10000 then '1'
            else '1'
       end
    ||',CASCADE=>TRUE,granularity=>'||chr(39)||'ALL'||chr(39)||' ,method_opt=>'||chr(39)||'FOR ALL COLUMNS SIZE AUTO'||chr(39)
    ||' ,degree=>'||
       case when MBYTES <  1000  then '8'
            when MBYTES <  5000  then '8'
            when MBYTES <  10000 then '16'
            when MBYTES >= 10000 then '32'
       end
    || ',no_invalidate=>FALSE);' as STMT
  from (select OWNER, TABLE_NAME, sum(MBYTES) MBYTES
          from (select OWNER, SEGMENT_NAME TABLE_NAME, PARTITION_NAME, round(sum(BYTES)/1024/1024,0) MBYTES
                  from DBA_SEGMENTS seg
                 where 1=1
                   and OWNER       = 'SCOTT'
                   and SEGMENT_TYPE = 'TABLE'
--                   and SEGMENT_NAME like '%'
--                   and length(SEGMENT_NAME) = 14
                 group by OWNER, SEGMENT_NAME, PARTITION_NAME)
         group by OWNER, TABLE_NAME)
 order by MBYTES desc
;

[테이블 압축]

9i 부터 지원.
읽기 전용, 로그성 테이블 같이 대용량, 수정이 잦지 않은 테이블을 주로 압축.
컬럼의 중복값을 실제 물리적으로 저장하지 않고 block에 심볼테이블을 생성하여 해당 값을 기록하면서 실제 저장 공간을 효율적으로 사용.
selectivity가 낮을수록, cardinality가 높은 컬럼일수록 유리하다. = distinct값이 적다.
-> 수치형 컬럼을 많이 갖지 않고 범주형 컬럼(카테고리)을 많이 가질수록 압축 효율성 증대.

** 장점)
– 저장공간 save
– disk I/O 감소

** 단점)
– 압축(compress) / 해제(uncompress)에 시간이 오래 걸림
– update 성능 저하

실습

1. 압축 확인

select owner, table_name, null as partition_name, compression
from dba_tables
where compression = 'ENABLED'
union all
select table_owner, table_name, partition_name, compression
from dba_tab_partitions
where compression = 'ENABLED';

2. 압축 테이블 생성

create table scott.s_suju2
nologging
parallel 4
compress
as
select * from scott.s_suju;

alter table scott.s_suju2 logging;
-- 안바꾸면 redo log 안 생기는 테이블이 되서 복구 불가능. logging으로 바꾼 후에 백업도 다시 받아야 됨.

alter table scott.s_suju2 noparallel;
-- 병렬인 채로 냅두면 조회 시마다 코어를 여러 개 갖다 써서 cpu가 풀참.

3. 사이즈 확인

select segment_name, bytes
from dba_segments
where segment_name like 'S_SUJU%';

select round((268435456 - 154796032) / 268435456 * 100, 2)  -- 압축률 : 42.33%
from dual;

압축 효과 비교

중복값을 많이 갖지 않는 경우 테이블 압축률이 떨어짐

1. 테이블 생성

create table scott.compress_test1(col1 number, col2 varchar2(30), col3 varchar2(30));

begin
  for i in 1..1000000 loop
    insert into scott.compress_test1
    select i, dbms_random.string('U',29), dbms_random.string('A',29) from dual;
  end loop;
  commit;
end;
/

2. 압축

create table scott.compress_test1_comp
nologging
parallel 4
compress
as
select /*+ parallel(a 4) */ *
from scott.compress_test1 a;

alter table scott.compress_test1_comp logging;
alter table scott.compress_test1_comp noparallel;

3. 압축률 확인

select segment_name, bytes
from dba_segments
where segment_name like 'COMPRESS_TEST1%';

select round((83886080 - 72876032) / 83886080 * 100, 2)  -- 압축률 : 13.13%
from dual;

[통계정보 자동 수집 기능]

통계정보는 내부 job 형태로 자동으로 수집됨. (default)
통계정보 자동 수집 시 부하가 크기 때문에 DB 설치 후 자동 수집 disable 시킴을 권고.

1. 파라미터 확인

select client_name, status
from dba_autotask_client
where client_name = 'auto optimizer stats collection';

2. disable 변경

exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection', operation=>null, window_name=>null);

[조인 내부 수행 종류]

1. NL Join (Nested Loop join)
– 데이터를 한 건씩 찾아 연결하는 조인 방식.
– OLTP 환경일 때 유리한 조인 방식.
– 후행 테이블의 조인키에 인덱스가 설계가 된 경우 주로 발생.

2. Sort Merge Join
– 많은 양의 데이터를 조인 후 출력하는 경우 양쪽 테이블을 조인키에 대해 모두 정렬한 뒤 순서대로 데이터를 연결하는 조인 방식.
– NL join으로 풀리지 않는 경우 수행(후행 테이블 조인키에 인덱스가 없는 경우)
– sort 연산은 1차적으로 PGA에서 수행, 2차적으로 temp tablespace에서 수행.
-> PGA 사이즈가 작게 설정되어 있는 경우 sort merge join cost가 높아질 수 있음
-> 양쪽 테이블 조인키에 인덱스를 생성해두면 이미 정렬되어있어서 정렬 연산을 추가적으로 수행하지 않아도 됨. = sort merge join cost를 낮출 수 있음.

3. Hash Join
– 데이터를 hash function에 의해 리턴되는 hash value값에 의해 찾아가는 조인 과정.
– hash function을 생성하는 cost 발생. -> cost를 줄이기 위해 작은 테이블을 선행테이블로 지정. 조인키 컬럼이 selectivity가 높을수록, cardinality가 낮을수록 유리.
– NL join, sort merge join이 유리하지 않은 경우 유리. (index가 없는, 대용량의 데이터 조인 시)

실습

1. 인덱스 조회 및 기존 인덱스 삭제

select i1.owner, i1.index_name, i1.table_name, i2.column_name,
       'drop index '||i1.owner||'.'||i1.index_name||';'
from dba_indexes i1, dba_ind_columns i2
where 1=1
and i1.owner = i2.index_owner
and i1.index_name = i2.index_name
and i1.owner = 'SCOTT'
and i1.table_name in ('LARGE_EMP', 'LARGE_DEPT');

drop index SCOTT.IDX_LARGE_EMP_DEPTNO_SAL;

2. 조인 및 실행계획 확인

alter session set statistics_level = all;

select e.ename, d.dname
from scott.large_emp e, scott.large_dept d
where e.deptno = d.deptno;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));
-- hash join

3. 인덱스 생성 후 조인 실행계획 확인

create index scott.idx_large_dept_deptno on scott.large_dept(deptno);

select e.ename, d.dname
from scott.large_emp e, scott.large_dept d
where e.deptno = d.deptno;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));
-- 여전히 hash join 하는 중

-- hint 줘서 NL join 하게 만듦)
select /*+ use_nl(e d) */ e.ename, d.dname
from scott.large_emp e, scott.large_dept d
where e.deptno = d.deptno;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));
-- NL join이 사이즈 큰 경우에는 유리하지 않다.
-- 쿼리는 OLTP성이 아닌 전체 데이터를 출력하는 형태이므로, 후행테이블의 조인키에 인덱스가 생성되어 있어도 NL join이 유리하지 않음.

-- hint 줘서 sort merge join 하게 만듦)
create index scott.idx_large_emp_deptno on scott.large_emp(deptno);
create index scott.idx_large_dept_deptno on scott.large_emp(deptno);

select /*+ use_merge(e d) */ e.ename, d.dname
from scott.large_emp e, scott.large_dept d
where e.deptno = d.deptno;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));
-- cost가 NL join보다 낮아졌다. 하지만 hash가 훨씬 낮다.
-- 대용량 테이블의 조회 시 조인 테이블의 양쪽 조인키에 인덱스가 생성된 경우 NL join보다 sort merge join이 유리함.

— 예제) 단건 처리하는 쿼리의 조인 성능 비교
– 단건 처리 쿼리의 경우 후행 테이블 조인키에 인덱스가 생성된 경우 NL join 유리!

-- data를 좀 고쳐놔서 매칭이 안되서 update로 일단 하나 매칭 시켜줌.
update scott.large_emp
set deptno = 11
where empno = 29939;
commit;

-- update한 empno를 조인 시킴.
select e.ename, d.dname
from scott.large_emp e, scott.large_dept d
where e.deptno = d.deptno
and empno = 29939;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));
-- NL join으로 됨.

** 실행 순서 : 3-4-2-5-1-0
위처럼 3번에서 full access로 cost 다 먹으니 거기다가 unique index를 추가하면 좀 나아질 거라고 추측해볼 수 있다.

create index scott.idx_large_emp_empno on scott.large_emp(empno);

select e.ename, d.dname
from scott.large_emp e, scott.large_dept d
where e.deptno = d.deptno
and empno = 29939;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));

** 읽는 순서 : 4-3-5-2-6-1-0

[조인 힌트]

1. 조인 방법

NL join : /*+ use_nl(선행테이블명 후행테이블명) */
sort merge join : /*+ use_merge(선행테이블명 후행테이블명) */
hash join : /*+ use_hash(선행테이블명 후행테이블명) */

2. 테이블 스캔 순서(선행 테이블 순서)

from절 나열 순서대로 : /*+ ordered */
선행테이블 지정가능 : /*+ leading(선행테이블명) */

[조인 내부 수행 비교 실습]

1. orcl 에 접속 (11g, windows) & 테이블 조회

select * from oe.customers;
select * from oe.orders;

2. 기존 인덱스 삭제

-- 제약조건 삭제) pk의 경우 fk가 참조하고 있다면 cascade 옵션으로 둘 다 drop해야 삭제 가능.
select 'alter table '||owner||'.'||table_name||' drop constraint '||constraint_name||' cascade;'
from dba_constraints
where table_name in ('CUSTOMERS','ORDERS')
and owner = 'OE'
and constraint_type = 'P';
---------------------------------------------------------------------------
alter table OE.ORDERS drop constraint ORDER_PK cascade;
alter table OE.CUSTOMERS drop constraint CUSTOMERS_PK cascade;

-- 기타 인덱스 삭제)
select 'drop index '||owner||'.'||index_name||';'
from dba_indexes
where table_name in ('CUSTOMERS','ORDERS')
and owner = 'OE'
and index_type != 'LOB';
------------------------------------------------------------
drop index OE.CUST_UPPER_NAME_IX;
drop index OE.CUSTOMERS_PK;
drop index OE.CUST_ACCOUNT_MANAGER_IX;
drop index OE.CUST_LNAME_IX;
drop index OE.CUST_EMAIL_IX;
drop index OE.ORD_SALES_REP_IX;
drop index OE.ORD_CUSTOMER_IX;
drop index OE.ORD_ORDER_DATE_IX;
drop index OE.ORDER_PK;

3. 조인 수행 및 실행계획 확인

select *
from oe.customers c, oe.orders o
where c.customer_id = o.customer_id
and c.customer_id = 147;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));
-- 양쪽 테이블 조인키에 인덱스가 없기 때문에 hash join됨.

-- 양쪽에 index 생성해서 sort merge join을 수행
create index oe.idx_customers_id on oe.customers(customer_id);
create index oe.idx_orders_id on oe.orders(customer_id);

select *
from oe.customers c, oe.orders o
where c.customer_id = o.customer_id
and c.customer_id = 147;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));

-- hint 사용해서 NL join으로 유도 - 선행테이블은 c 에 대해서 where절에 조건이 들어갔으므로 c가 되는게 낫다.

select /*+ use_nl(c o) */ *
from oe.customers c, oe.orders o
where c.customer_id = o.customer_id
and c.customer_id = 147;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));
-- OLTP성 때문에 NL join도 빠르다.

** 읽는 순서 : 3-2-5-4-1-0

[DW성 쿼리의 조인 방식 비교 실습]

1. NL join

select /*+ use_nl(c o) */ *
from oe.customers c, oe.orders o
where c.customer_id = o.customer_id;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));

2. sort merge join

select /*+ use_merge(c o) */ *
from oe.customers c, oe.orders o
where c.customer_id = o.customer_id;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));

3. hash join

select /*+ use_hash(c o) */ *
from oe.customers c, oe.orders o
where c.customer_id = o.customer_id;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced'));