DATA 전문가로 가는 길

[SQL] Oracle LISTAGG, XMLAGG(XMLELEMENT) 해결 되지 않는 문자열 큰 데이터 병합 하는 방법(STRAGG->CLOB, XMLAGG().GETCLOBVAL) 본문

Data Architecture/SQL Query

[SQL] Oracle LISTAGG, XMLAGG(XMLELEMENT) 해결 되지 않는 문자열 큰 데이터 병합 하는 방법(STRAGG->CLOB, XMLAGG().GETCLOBVAL)

EstenPark 2016. 4. 22. 17:15

문자열을 그룹별로 합쳐야 할 때 LISTAGG, XMLAGG를 주로 많이 사용 하거나 SYS_CONNECT_BY_PATH를 활용하기도 합니다. 하지만 4000 Byte 초과할 경우 "[Error] Execution (8: 1): ORA-01489: 문자열 연결의 결과가 너무 깁니다" 에러 메시지를 종종 보게 됩니다.


따라서 이번에는 4000 Byte 초과하는 문자열을 합쳐볼  수 있는지 확인 해보겠습니다.


오라클에서 제공하는 함수, 프로시저, 패키지로는 불가능하고, Object Type과 Table을 활용해서 CLOB 형태의 데이터를 합쳐보겠습니다.


1. 테스트 데이터 가공
DROP TABLE EST_DATA;

CREATE TABLE EST_DATA (ROW_GRP_NO NUMBER(9),ROW_NO NUMBER(9), ROW_DATA VARCHAR2(4000));

INSERT INTO EST_DATA VALUES(1, 1, 'Fun with CLOBS! If you are using Oracle, if you have to deal with text that is over 4000 bytes, you will probably find yourself dealing with CLOBs, which can go up to 4GB. They are pretty tricky, and it took me a long time to figure out these lessons learned. I hope they will help some down-trodden developer out there somehow Here is my original code, which worked great on my Oracle Express Edition: (for all examples, the first one writes a new CLOB, the next one Updates an existing CLOB and the final one reads a CLOB back)');

INSERT INTO EST_DATA VALUES(1, 2, 'When getting the substring of a CLOB column and using a query tool that has size/buffer restrictions sometimes you would need to set the BUFFER to a larger size. For example while using SQL Plus use the SET BUFFER 10000 to set it to 10000 as the default is 4000.');

INSERT INTO EST_DATA VALUES(1, 3, 'A CLOB (character large object) value can be up to 2,147,483,647 characters long. A CLOB is used to store unicode character-based data, such as large documents in any character set.');

INSERT INTO EST_DATA VALUES(1, 4, 'The length is given in number characters for both CLOB, unless one of the suffixes K, M, or G is given, relating to the multiples of 1024, 1024*1024, 1024*1024*1024 respectively.');

INSERT INTO EST_DATA VALUES(1, 5, 'The following article presents a simple methods for importing a file into a CLOB datatype. First a directory object is created to point to the relevant filesystem directory.');

INSERT INTO EST_DATA VALUES(1, 6, 'LOB stands for Large OBject, a stream of data stored in a database. Maximum capacity of a LOB is (4 gigabytes-1) bytes. In Oracle three kinds of LOB data type exist: BLOB datatype stores unstructured binary large objects. BLOB objects can be thought of as bitstreams with no character set semantics. The CLOB datatype stores single-byte and multibyte character data. Both fixed-width and variable-width character sets are supported, and both use the database character set. The NCLOB datatype stores Unicode data. dotConnect for Oracle supports all three datatypes. You can retrieve values of LOB fields using OracleDataReader as well as other types like LONG and LONG RAW. The difference with usage of LOB data type becomes evident when you need to access these fields in DML and PL/SQL statements. For BLOB and CLOB data types only LOB locators (pointers to data) are stored in table columns; actual BLOB and CLOB data is stored in separate tablespace. This is the difference to the way that data of LONG or LONG RAW types is stored in database - tables hold their immediate values. Another issue you should be aware of is temporary LOBs. This kind of object is not referenced by any table yet. It exists in current transaction only. You have to use temporary LOBs when inserting new data to a table. In dotConnect for Oracle, you can create temporary LOB using OracleLob constructors. Once you create an OracleLob instance corresponding temporary LOB appears on the server, and any data you insert into the object is sent immediately to server. An example of using temporary LOBs you will see later in the article. inary data is generally represented as stream of bytes, or buffers. dotConnect for Oracle allows manipulating BLOB data in most convenient ways. You can work with OracleLob.Value property or use OracleLob.Read() and OracleLob.Write() methods to transfer data to and from server. Both ways are presented in the samples below. Note that when OracleLob.LobType property is OracleDbType.Blob the OracleLob.Value is treated as array of bytes (byte[]), whereas OracleDbType.Clob and OracleDbType.NClob represent OracleLob.Value as string. The next sample routines show how to upload a file from hard disk to server and download it back. To execute these routines you have to create a table that is described as follows: Working with CLOB data generally is same as working with BLOB data. The difference is in representation of Value property. For CLOB and NCLOB data types when you read from OracleLob.Value property, you get strings. The data is transparently decoded so you do not have to take care about its character set. If you need raw data however you can use streaming capabilities of OracleLob, that is OracleLob.Read and OracleLob.Write methods. Note that since OracleLob.Value is a read-only property, you can write data to LOB only with OracleLob.Write method, and no encoding is performed. For example consider this table definition.');

COMMIT;


2. 일반적인 방법으로 사용할 경우(LISTAGG)

SELECT    LISTAGG (a.ROW_DATA, ',') WITHIN GROUP (ORDER BY a.ROW_NO) AS c2
FROM      EST_DATA A
GROUP BY  A.ROW_GRP_NO
;

/* 결과 : 
[Error] Execution (33: 1): ORA-01489: 문자열 연결의 결과가 너무 깁니다
*/


3. CLOBAGG 생성

CREATE OR REPLACE TYPE CLOBAGG_TYPE AS OBJECT
(
  TEXT CLOB
 ,STATIC FUNCTION ODCIAGGREGATEINITIALIZE(SCTX IN OUT CLOBAGG_TYPE)
    RETURN NUMBER
 ,MEMBER FUNCTION ODCIAGGREGATEITERATE(SELF IN OUT CLOBAGG_TYPE, VALUE IN CLOB)
    RETURN NUMBER
 ,MEMBER FUNCTION ODCIAGGREGATETERMINATE(SELF IN CLOBAGG_TYPE, RETURNVALUE OUT CLOB, FLAGS IN NUMBER)
    RETURN NUMBER
 ,MEMBER FUNCTION ODCIAGGREGATEMERGE(SELF IN OUT CLOBAGG_TYPE, CTX2 IN CLOBAGG_TYPE)
    RETURN NUMBER
);
/

CREATE OR REPLACE TYPE BODY CLOBAGG_TYPE
IS
  STATIC FUNCTION ODCIAGGREGATEINITIALIZE(SCTX IN OUT CLOBAGG_TYPE)
    RETURN NUMBER
  IS
  BEGIN
    SCTX   := CLOBAGG_TYPE(NULL);
    RETURN ODCICONST.SUCCESS;
  END;

  MEMBER FUNCTION ODCIAGGREGATEITERATE(SELF IN OUT CLOBAGG_TYPE, VALUE IN CLOB)
    RETURN NUMBER
  IS
  BEGIN
    SELF.TEXT   := SELF.TEXT || VALUE;
    RETURN ODCICONST.SUCCESS;
  END;

  MEMBER FUNCTION ODCIAGGREGATETERMINATE(SELF IN CLOBAGG_TYPE, RETURNVALUE OUT CLOB, FLAGS IN NUMBER)
    RETURN NUMBER
  IS
  BEGIN
    RETURNVALUE   := SELF.TEXT;
    RETURN ODCICONST.SUCCESS;
  END;

  MEMBER FUNCTION ODCIAGGREGATEMERGE(SELF IN OUT CLOBAGG_TYPE, CTX2 IN CLOBAGG_TYPE)
    RETURN NUMBER
  IS
  BEGIN
    SELF.TEXT   := SELF.TEXT || CTX2.TEXT;
    RETURN ODCICONST.SUCCESS;
  END;
END;
/

CREATE OR REPLACE FUNCTION CLOBAGG(INPUT CLOB)
  RETURN CLOB
  DETERMINISTIC
  PARALLEL_ENABLE
  AGGREGATE USING CLOBAGG_TYPE;
/


4. CLOBAGG 함수를 이용한 대용량 문자열 연결하는 쿼리

SELECT    CLOBAGG (a.ROW_DATA || '##_division_##') AS AGG_ROW_DATA
FROM      EST_DATA A
GROUP BY  A.ROW_GRP_NO
;

/* 결과 : 
Fun with CLOBS! If you are using Oracle, if you have to deal with text that is over 4000 bytes, you will probably find yourself dealing with CLOBs, which can go up to 4GB. They are pretty tricky, and it took me a long time to figure out these lessons learned. I hope they will help some down-trodden developer out there somehow Here is my original code, which worked great on my Oracle Express Edition: (for all examples, the first one writes a new CLOB, the next one Updates an existing CLOB and the final one reads a CLOB back)##_division_##LOB stands for Large OBject, a stream of data stored in a database. Maximum capacity of a LOB is (4 gigabytes-1) bytes. In Oracle three kinds of LOB data type exist: BLOB datatype stores unstructured binary large objects. BLOB objects can be thought of as bitstreams with no character set semantics. The CLOB datatype stores single-byte and multibyte character data. Both fixed-width and variable-width character sets are supported, and both use the database character set. The NCLOB datatype stores Unicode data. dotConnect for Oracle supports all three datatypes. You can retrieve values of LOB fields using OracleDataReader as well as other types like LONG and LONG RAW. The difference with usage of LOB data type becomes evident when you need to access these fields in DML and PL/SQL statements. For BLOB and CLOB data types only LOB locators (pointers to data) are stored in table columns; actual BLOB and CLOB data is stored in separate tablespace. This is the difference to the way that data of LONG or LONG RAW types is stored in database - tables hold their immediate values. Another issue you should be aware of is temporary LOBs. This kind of object is not referenced by any table yet. It exists in current transaction only. You have to use temporary LOBs when inserting new data to a table. In dotConnect for Oracle, you can create temporary LOB using OracleLob constructors. Once you create an OracleLob instance corresponding temporary LOB appears on the server, and any data you insert into the object is sent immediately to server. An example of using temporary LOBs you will see later in the article. inary data is generally represented as stream of bytes, or buffers. dotConnect for Oracle allows manipulating BLOB data in most convenient ways. You can work with OracleLob.Value property or use OracleLob.Read() and OracleLob.Write() methods to transfer data to and from server. Both ways are presented in the samples below. Note that when OracleLob.LobType property is OracleDbType.Blob the OracleLob.Value is treated as array of bytes (byte[]), whereas OracleDbType.Clob and OracleDbType.NClob represent OracleLob.Value as string. The next sample routines show how to upload a file from hard disk to server and download it back. To execute these routines you have to create a table that is described as follows: Working with CLOB data generally is same as working with BLOB data. The difference is in representation of Value property. For CLOB and NCLOB data types when you read from OracleLob.Value property, you get strings. The data is transparently decoded so you do not have to take care about its character set. If you need raw data however you can use streaming capabilities of OracleLob, that is OracleLob.Read and OracleLob.Write methods. Note that since OracleLob.Value is a read-only property, you can write data to LOB only with OracleLob.Write method, and no encoding is performed. For example consider this table definition.##_division_##The following article presents a simple methods for importing a file into a CLOB datatype. First a directory object is created to point to the relevant filesystem directory.##_division_##The length is given in number characters for both CLOB, unless one of the suffixes K, M, or G is given, relating to the multiples of 1024, 1024*1024, 1024*1024*1024 respectively.##_division_##A CLOB (character large object) value can be up to 2,147,483,647 characters long. A CLOB is used to store unicode character-based data, such as large documents in any character set.##_division_##When getting the substring of a CLOB column and using a query tool that has size/buffer restrictions sometimes you would need to set the BUFFER to a larger size. For example while using SQL Plus use the SET BUFFER 10000 to set it to 10000 as the default is 4000.##_division_##
*/


5. CLOBAGG 함수를 이용한 분석 함수 활용 쿼리(방안 1)

SELECT    CLOBAGG(A.ROW_DATA || '##_division_##') OVER (PARTITION BY ROW_GRP_NO ORDER BY A.ROW_NO 
                                                        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
            AS AGG_ROW_DATA
FROM      EST_DATA A
;


6. XMLAGG(XMLELEMENT())..GETCLOBVAL() 함수를 활용한 쿼리(방안 2)

SELECT    LENGTH(A.STR) AS LEN, A.STR
FROM      (SELECT    XMLAGG(XMLELEMENT(A, A.ROW_DATA || '|#delimiter#|') ORDER BY A.ROW_NO).EXTRACT('//text()').GETCLOBVAL() AS STR
           FROM      EST_DATA A) A
;


최대 100만 Length까지 병합해도 속도는 1초 안쪽으로 나옵니다.


오라클은 Object Type을 활용해서 이러한 문제를 해소하고 있습니다. 성능 테스트는 별도로 하지 않았지만, 아무래도 더 큰 데이터를 가공할 때는 비지니스 로직으로 풀어보는 것을 권장 합니다.


자바, C, C++ 다양한 프로그래밍 언어에서 위와 같은 작업을 파싱해서 데이터를 적재하는 것이 가장 좋은 방법입니다.

Comments