When you need to load millions of rows of data into a table, the most efficient way is usually to use an INSERT, UPDATE, or MERGE statement to process your data in bulk. Similarly, if you want to delete thousands of rows, using a DELETE statement is usually faster than using procedural code. But what if the data you intend to load contains values that might cause an integrity or check constraint to be violated, or what if some values are too big for the column they are to be loaded into?
You may well have loaded 999,999 rows of your million-row batch, but if that last row violates a check constraint, the whole statement fails and rolls back. In situations such as this, you have to use an alternative approach to loading your data.
For example, if your data is held in a file, you can use SQL*Loader to automatically handle data that raises an error, but then you have to put together a control file, run SQL*Loader from the command line, and check the output file and the bad datafile to detect any errors.
If, however, your data is held in a table or another object, you can write a procedure or an anonymous block to process your data row by row, loading the valid rows and using exception handling to process those rows that raise an error. You might even use BULK COLLECT and FORALL to handle data in your PL/SQL routine more efficiently, but even with these improvements, handling your data in this manner is still much slower than performing a bulk load by using a direct-path INSERT DML statement.
Until now, you could take advantage of the set-based performance of INSERT, UPDATE, MERGE, and DELETE statements only if you knew that your data was free from errors; in all other circumstances, you needed to resort to slower alternatives. All of this changes with the release of Oracle Database 10g Release 2, which introduces a new SQL feature called DML error logging.
Efficient Error Handling
DML error logging enables you to write INSERT, UPDATE, MERGE, or DELETE statements that automatically deal with certain constraint violations. With this new feature, you use the new LOG ERRORS clause in your DML statement and Oracle Database automatically handles exceptions, writing erroneous data and details of the error message to an error logging table you've created.
Before you can use the LOG ERRORS clause, you need to create an error logging table, either manually with DDL or automatically with the CREATE_ERROR_LOG procedure in the DBMS_ERRLOG package, whose specification is shown in Listing 1.
Code Listing 1: DBMS_ERRLOG.CREATE_ERROR_LOG parameters
DBMS_ERRLOG.CREATE_ERROR_LOG (
   dml_table_name       IN VARCHAR2,
   err_log_table_name   IN VARCHAR2 := NULL,
   err_log_table_owner  IN VARCHAR2 := NULL,
   err_log_table_space  IN VARCHAR2 := NULL,
   skip_unsupported     IN BOOLEAN  := FALSE);
All the parameters except DML_TABLE_NAME are optional, and if the optional details are omitted, the name of the error logging table will be ERR$_ together with the first 25 characters of the DML_TABLE_NAME. The SKIP_UNSUPPORTED parameter, if set to TRUE, instructs the error logging clause to skip over LONG, LOB, and object type columns that are not supported and omit them from the error logging table.
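If you want more control over these defaults, you can pass the optional parameters explicitly. The following call is a minimal sketch; the error table and tablespace names are illustrative and not part of the example schema used later in this article:

BEGIN
   -- Create an error logging table with a custom name in a named
   -- tablespace, skipping any unsupported column types
   -- (illustrative names; adjust for your own schema)
   DBMS_ERRLOG.CREATE_ERROR_LOG (
      dml_table_name      => 'SALES_TARGET',
      err_log_table_name  => 'SALES_TARGET_ERRLOG',
      err_log_table_space => 'USERS',
      skip_unsupported    => TRUE);
END;
/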
With the error logging table created, you can add the error logging clause to most DML statements, using the following syntax:
LOG ERRORS [INTO [schema.]table][ (simple_expression) ] [ REJECT LIMIT {integer|UNLIMITED} ]
The INTO clause is optional; if you omit it, the error logging clause puts errors into a table whose name follows the same ERR$_ naming format used by the CREATE_ERROR_LOG procedure. SIMPLE_EXPRESSION is any expression that evaluates to a character string and is used for tagging rows in the error table, so you can identify the process that caused the error, the time of the data load, and so on. REJECT LIMIT can be set to any integer or to UNLIMITED and specifies the number of errors that can occur before the statement fails. This value is optional, but if it is omitted, it defaults to 0, which causes the statement to fail on the first error and effectively disables the error logging feature.
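Putting the pieces together, a statement such as the following sketch tags each rejected row with a load identifier (it assumes the ERR$_SALES_TARGET table created later in this article, and the tag format is purely illustrative):

INSERT INTO sales_target
SELECT *
FROM   sales_src
LOG ERRORS INTO err$_sales_target ('SALES_LOAD_' || TO_CHAR(SYSDATE, 'YYYYMMDD'))
REJECT LIMIT UNLIMITED;

The tag value is stored in the ORA_ERR_TAG$ column of the error logging table, which makes it straightforward to tell apart the errors from different loads that share one error table.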
The following types of errors are handled by the error logging clause:
- Column values that are too large
- Constraint violations (NOT NULL, unique, referential, and check constraints), except in certain circumstances detailed below
- Errors raised during trigger execution
- Errors resulting from type conversion between a column in a subquery and the corresponding column of the table
- Partition mapping errors
The following conditions cause the statement to fail and roll back without invoking the error logging capability:
- Violated deferred constraints
- Out-of-space errors
- Any direct-path INSERT operation (INSERT or MERGE) that raises a unique constraint or index violation
- Any UPDATE operation (UPDATE or MERGE) that raises a unique constraint or index violation
To show how the error logging clause works in practice, consider the following scenario, in which data needs to be loaded in batch from one table to another:
You have heard of the new error logging feature in Oracle Database 10g Release 2 and want to compare this new approach with your previous method of writing a PL/SQL package. To do this, you will use data held in the SH sample schema to try out each approach.
Using DML Error Logging
In this example, you will use the data in the SALES table in the SH sample schema, together with values from a sequence, to create a source table for the error logging test. This example assumes that the test schema is called ERRLOG_TEST and that it has the SELECT object privilege for the SH.SALES table. Create the source data and a target table called SALES_TARGET, based on the definition of the SALES_SRC table, and add a check constraint to the AMOUNT_SOLD column to allow only values greater than 0. Listing 2 shows the DDL for creating the source and target tables.
Code Listing 2: Creating the SALES_SRC and SALES_TARGET tables
SQL> CREATE SEQUENCE sales_id_seq;
Sequence created.
SQL> CREATE TABLE sales_src
2 AS
3 SELECT sales_id_seq.nextval AS "SALES_ID"
4 , cust_id
5 , prod_id
6 , channel_id
7 , time_id
8 , promo_id
9 , amount_sold
10 , quantity_sold
11 FROM sh.sales
12 ;
Table created.
SQL> SELECT count(*)
2 , min(sales_id)
3 , max(sales_id)
4 FROM sales_src
5 ;
COUNT(*) MIN(SALES_ID) MAX(SALES_ID)
------ -------- --------
918843 1 918843
SQL> CREATE TABLE sales_target
2 AS
3 SELECT *
4 FROM sales_src
5 WHERE 1=0
6 ;
Table created.
SQL> ALTER TABLE sales_target
2 ADD CONSTRAINT amount_sold_chk
3 CHECK (amount_sold > 0)
4 ENABLE
5 VALIDATE
6 ;
Table altered.
SQL> DESC sales_src
 Name                Null?    Type
 ------------------- -------- -------------
 SALES_ID                     NUMBER
 CUST_ID             NOT NULL NUMBER
 PROD_ID             NOT NULL NUMBER
 CHANNEL_ID          NOT NULL NUMBER
 TIME_ID             NOT NULL DATE
 PROMO_ID            NOT NULL NUMBER
 AMOUNT_SOLD         NOT NULL NUMBER(10,2)
 QUANTITY_SOLD       NOT NULL NUMBER(10,2)
SQL> DESC sales_target
 Name                Null?    Type
 ------------------- -------- -------------
 SALES_ID                     NUMBER
 CUST_ID             NOT NULL NUMBER
 PROD_ID             NOT NULL NUMBER
 CHANNEL_ID          NOT NULL NUMBER
 TIME_ID             NOT NULL DATE
 PROMO_ID            NOT NULL NUMBER
 AMOUNT_SOLD         NOT NULL NUMBER(10,2)
 QUANTITY_SOLD       NOT NULL NUMBER(10,2)
Note from these descriptions of the tables in Listing 2 that SALES_SRC and SALES_TARGET have automatically inherited the NOT NULL constraints that were present on the SH.SALES table, because a CREATE TABLE ... AS SELECT statement copies these column properties across to the new table.
You now introduce some errors into your source data, so that you can subsequently test the error logging feature. Note that because one of the errors you want to test for is a NOT NULL constraint violation on the PROMO_ID column, you need to remove this constraint from the SALES_SRC table before adding null values. The following shows the SQL used to create the data errors.
SQL> ALTER TABLE sales_src
2 MODIFY promo_id NULL
3 ;
Table altered.
SQL> UPDATE sales_src
2 SET promo_id = null
3 WHERE sales_id BETWEEN 5000 and 5005
4 ;
6 rows updated.
SQL> UPDATE sales_src
2 SET amount_sold = 0
3 WHERE sales_id IN (1000,2000,3000)
4 ;
3 rows updated.
SQL> COMMIT;
Commit complete.
Now that your source and target tables are prepared, you can use the DBMS_ERRLOG.CREATE_ERROR_LOG procedure to create the error logging table. Supply the name of the table on which the error logging table is based; the procedure will use default values for the rest of the parameters. Listing 3 shows the creation and description of the error logging table.
Code Listing 3: Creating the err$_sales_target error logging table
SQL> BEGIN
2 DBMS_ERRLOG.CREATE_ERROR_LOG('SALES_TARGET');
3 END;
4 /
PL/SQL procedure successfully completed.
SQL> DESCRIBE err$_sales_target;
Name Null? Type
------------------- ---- -------------
ORA_ERR_NUMBER$ NUMBER
ORA_ERR_MESG$ VARCHAR2(2000)
ORA_ERR_ROWID$ ROWID
ORA_ERR_OPTYP$ VARCHAR2(2)
ORA_ERR_TAG$ VARCHAR2(2000)
SALES_ID VARCHAR2(4000)
CUST_ID VARCHAR2(4000)
PROD_ID VARCHAR2(4000)
CHANNEL_ID VARCHAR2(4000)
TIME_ID VARCHAR2(4000)
PROMO_ID VARCHAR2(4000)
AMOUNT_SOLD VARCHAR2(4000)
QUANTITY_SOLD VARCHAR2(4000)
Note that the CREATE_ERROR_LOG procedure creates five ORA_ERR_% columns to hold the error number, error message, ROWID, operation type, and the tag you supply when using the error logging clause. The remaining columns mirror those of the base table, but they are all created as VARCHAR2(4000) so that the column values of any rejected row, whether numeric, date, or character, can always be stored.
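These ORA_ERR_% columns make it easy to triage failures after a load. For example, a summary query along the following lines (illustrative, and most useful when you tag your loads) groups the rejected rows by load and error number:

SELECT   ora_err_tag$,
         ora_err_number$,
         COUNT(*) AS rejected_rows
FROM     err$_sales_target
GROUP BY ora_err_tag$, ora_err_number$
ORDER BY ora_err_tag$, ora_err_number$;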
The first approach is to load data into the SALES_TARGET table by using a direct-path INSERT statement. This is normally the most efficient way to load data into a table while still making the DML recoverable, but in the past, this INSERT would have failed, because the check constraints on the SALES_TARGET table would have been violated. Listing 4 shows this INSERT and the check constraint violation.
Code Listing 4: Violating the check constraint with direct-path INSERT
SQL> SET SERVEROUTPUT ON
SQL> SET LINESIZE 150
SQL> SET TIMING ON
SQL> ALTER SESSION SET SQL_TRACE = TRUE;
Session altered.
Elapsed: 00:00:00.04
SQL> INSERT /*+ APPEND */
2 INTO sales_target
3 SELECT *
4 FROM sales_src
5 ;
INSERT /*+ APPEND */
*
ERROR at line 1:
ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_SOLD_CHK) violated
Elapsed: 00:00:00.15
If you add the new LOG ERRORS clause to the INSERT statement, however, the statement will complete successfully and save any rows that violate the table constraints to the error logging table, as shown in Listing 5.
Code Listing 5: Violating the constraints and logging the errors with LOG ERRORS
SQL> INSERT /*+ APPEND */
2 INTO sales_target
3 SELECT *
4 FROM sales_src
5 LOG ERRORS
6 REJECT LIMIT UNLIMITED
7 ;
918834 rows created.
Elapsed: 00:00:05.75
SQL> SELECT count(*)
2 FROM err$_sales_target
3 ;
COUNT(*)
---------
9
Elapsed: 00:00:00.06
SQL> COLUMN ora_err_mesg$ FORMAT A50
SQL> SELECT ora_err_number$
2 , ora_err_mesg$
3 FROM err$_sales_target
4 ;
ORA_ERR_NUMBER$ ORA_ERR_MESG$
--------------- ------------------------------
2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_
SOLD_CHK) violated
2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_
SOLD_CHK) violated
2290 ORA-02290: check constraint (ERRLOG_TEST.AMOUNT_
SOLD_CHK) violated
1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST".
"SALES_TARGET"."PROMO_ID")
1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST".
"SALES_TARGET"."PROMO_ID")
1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST".
"SALES_TARGET"."PROMO_ID")
1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST".
"SALES_TARGET"."PROMO_ID")
1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST".
"SALES_TARGET"."PROMO_ID")
1400 ORA-01400: cannot insert NULL into ("ERRLOG_TEST".
"SALES_TARGET"."PROMO_ID")
9 rows selected.
Elapsed: 00:00:00.28
Listing 5 shows that when this INSERT statement uses direct path to insert rows above the table high-water mark, the process takes 5.75 seconds and adds nine rows to the error logging table. Try the same statement again, this time with a conventional-path INSERT, as shown in Listing 6.
Code Listing 6: Violating the check and NOT NULL constraints with conventional-path INSERT
SQL> TRUNCATE TABLE sales_target;
Table truncated.
Elapsed: 00:00:06.07
SQL> TRUNCATE TABLE err$_sales_target;
Table truncated.
Elapsed: 00:00:00.25
SQL> INSERT INTO sales_target
2 SELECT *
3 FROM sales_src
4 LOG ERRORS
5 REJECT LIMIT UNLIMITED
6 ;
918834 rows created.
Elapsed: 00:00:30.65
As you might expect, the results in Listing 6 show that the direct-path load is much faster than the conventional-path load, because the former writes directly to disk whereas the latter writes through the buffer cache. The LOG ERRORS clause also causes kernel device table (KDT) buffering to be disabled when you perform a conventional-path INSERT. One reason you might nevertheless want to use a conventional-path INSERT with error logging is that a direct-path load will fail when a unique constraint or index violation occurs, whereas a conventional-path load will log these errors to the error logging table and continue. Oracle Database will also ignore the /*+ APPEND */ hint when the table you are inserting into contains enabled foreign key constraints, because these cannot be enforced in direct-path mode.
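You can see this difference for yourself with a unique constraint. The following sketch is illustrative (the constraint name is invented, and it assumes that running the load would produce duplicate SALES_ID values, for example because some of the source rows are already present in the target table):

ALTER TABLE sales_target
  ADD CONSTRAINT sales_target_uk UNIQUE (sales_id);

-- Direct path: a duplicate raises ORA-00001 and the whole statement
-- fails and rolls back, despite the LOG ERRORS clause
INSERT /*+ APPEND */ INTO sales_target
SELECT * FROM sales_src
LOG ERRORS REJECT LIMIT UNLIMITED;

-- Conventional path: the duplicate rows are written to the error
-- logging table and the statement completes
INSERT INTO sales_target
SELECT * FROM sales_src
LOG ERRORS REJECT LIMIT UNLIMITED;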
Now compare these direct-path and conventional-path loading timings with the timing for a PL/SQL anonymous block. The traditional technique of declaring a cursor against the source table, reading it row by row, inserting each row into the target table, and dealing with exceptions as they occur is known to be slow. However, Tom Kyte's column in the September/October 2003 issue of Oracle Magazine ("On HTML DB, Bulking Up, and Speeding") shows how BULK COLLECT, FORALL, and SAVE EXCEPTIONS can process dirty data far more efficiently. How does Kyte's 2003 approach compare with using DML error logging? Listing 7 shows a version of Kyte's approach that, like the LOG ERRORS clause, writes error messages to an error logging table.
Code Listing 7: PL/SQL anonymous block doing row-by-row INSERT
SQL> CREATE TABLE sales_target_errors
2 (sql_err_mesg varchar2(4000))
3 /
Table created.
Elapsed: 00:00:00.28
SQL> DECLARE
2 TYPE array IS TABLE OF sales_target%ROWTYPE
3 INDEX BY BINARY_INTEGER;
4 sales_src_arr ARRAY;
5 errors NUMBER;
6 error_mesg VARCHAR2(255);
7 bulk_error EXCEPTION;
8 l_cnt NUMBER := 0;
9 PRAGMA exception_init
10 (bulk_error, -24381);
11 CURSOR c IS
12 SELECT *
13 FROM sales_src;
14 BEGIN
15 OPEN c;
16 LOOP
17 FETCH c
18 BULK COLLECT
19 INTO sales_src_arr
20 LIMIT 100;
21 BEGIN
22 FORALL i IN 1 .. sales_src_arr.count
23 SAVE EXCEPTIONS
24 INSERT INTO sales_target VALUES sales_src_arr(i);
25 EXCEPTION
26 WHEN bulk_error THEN
27 errors :=
28 SQL%BULK_EXCEPTIONS.COUNT;
29 l_cnt := l_cnt + errors;
30 FOR i IN 1..errors LOOP
31 error_mesg := SQLERRM(-SQL%BULK_EXCEPTIONS(i).ERROR_CODE);
32 INSERT INTO sales_target_errors
33 VALUES (error_mesg);
34 END LOOP;
35 END;
36 EXIT WHEN c%NOTFOUND;
37
38 END LOOP;
39 CLOSE c;
40 DBMS_OUTPUT.PUT_LINE
41 ( l_cnt || ' total errors' );
42 END;
43 /
9 total errors
PL/SQL procedure successfully completed.
Elapsed: 00:00:10.46
SQL> alter session set sql_trace = false;
Session altered.
Elapsed: 00:00:00.03
SQL> select * from sales_target_errors;
SQL_ERR_MESG
---------------------------------
ORA-02290: check constraint (.) violated
ORA-02290: check constraint (.) violated
ORA-02290: check constraint (.) violated
ORA-01400: cannot insert NULL into ()
ORA-01400: cannot insert NULL into ()
ORA-01400: cannot insert NULL into ()
ORA-01400: cannot insert NULL into ()
ORA-01400: cannot insert NULL into ()
ORA-01400: cannot insert NULL into ()
9 rows selected.
Elapsed: 00:00:00.21
Processing your data with this method takes 10.46 seconds, longer than the 5.75 seconds for DML error logging with a direct-path INSERT but quicker than a conventional-path INSERT. The results are clear: if you use DML error logging and can insert your data with direct path, your batches can load in roughly half the time taken by row-by-row PL/SQL processing, even when that processing takes advantage of features such as BULK COLLECT, FORALL, and SAVE EXCEPTIONS.
Finally, use TKPROF to format the SQL trace file you generated during your testing and check the explain plan and statistics for the direct-path insertion, shown in Listing 8. Note that the insertions into the error logging table are carried out after the INSERT has taken place and that these rows will stay in the error logging table even if the main statement fails and rolls back.
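If you have not used TKPROF before, formatting the trace file is a one-line operation. The trace file name below is illustrative; yours will depend on your instance name and server process ID:

tkprof orcl_ora_12345.trc errlog_test.prf sys=no sort=exeela

The sys=no option suppresses recursive SYS statements, and sort=exeela lists the statements in descending order of elapsed execution time.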
Code Listing 8: Using TKPROF to look at direct-path INSERT statistics
INSERT /*+ APPEND */
INTO sales_target
SELECT *
FROM sales_src
LOG ERRORS
REJECT LIMIT UNLIMITED
call count cpu elapsed disk query current rows
--- --- ---- ---- ---- ---- ---- ----
Parse 1 0.01 0.10 0 0 0 0
Execute 1 2.84 5.52 3460 5226 6659 918834
Fetch 0 0.00 0.00 0 0 0 0
--- --- ---- ---- ---- ---- ---- ----
total 2 2.85 5.62 3460 5226 6659 918834
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 99
Rows Row Source Operation
------- ---------------------------------------------------
1 LOAD AS SELECT (cr=5907 pr=3462 pw=5066 time=5539104 us)
918843 ERROR LOGGING (cr=5094 pr=3460 pw=0 time=92811603 us)
918843 TABLE ACCESS FULL SALES_SRC (cr=5075 pr=3458 pw=0 time=16547710 us)
***************************************************************************
INSERT INTO ERR$_SALES_TARGET (ORA_ERR_NUMBER$, ORA_ERR_MESG$,
ORA_ERR_ROWID$, ORA_ERR_OPTYP$, ORA_ERR_TAG$, SALES_ID, PROD_ID,
CUST_ID, CHANNEL_ID, TIME_ID, PROMO_ID, AMOUNT_SOLD, QUANTITY_SOLD)
VALUES
(:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13)
call count cpu elapsed disk query current rows
--- --- ---- ---- ---- ---- ---- ----
Parse 1 0.00 0.00 0 0 0 0
Execute 9 0.00 0.01 2 4 39 9
Fetch 0 0.00 0.00 0 0 0 0
--- --- ---- ---- ---- ---- ---- ----
total 10 0.00 0.01 2 4 39 9
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 99 (recursive depth: 1)
Next, locate the part of the formatted trace file that represents the PL/SQL approach and note how the execution of the anonymous block is split into four parts: (1) the anonymous block is parsed, (2) the source data is bulk-collected into an array, (3) the array is unloaded into the target table, and (4) the exceptions are written to the error logging table. Listing 9 shows that, together, these steps take almost twice as long to execute as a direct-path INSERT statement with DML error logging yet involve more coding and store less information about the rows that returned errors.
Code Listing 9: Using TKPROF to look at PL/SQL INSERT statistics
DECLARE
TYPE array IS TABLE OF sales_target%ROWTYPE
INDEX BY BINARY_INTEGER;
sales_src_arr ARRAY;
errors NUMBER;
error_mesg VARCHAR2(255);
bulk_error EXCEPTION;
l_cnt NUMBER := 0;
PRAGMA exception_init
(bulk_error, -24381);
CURSOR c IS
SELECT *
FROM sales_src;
BEGIN
OPEN c;
LOOP
FETCH c
BULK COLLECT
INTO sales_src_arr
LIMIT 100;
BEGIN
FORALL i IN 1 .. sales_src_arr.count
SAVE EXCEPTIONS
INSERT INTO sales_target VALUES sales_src_arr(i);
EXCEPTION
WHEN bulk_error THEN
errors :=
SQL%BULK_EXCEPTIONS.COUNT;
l_cnt := l_cnt + errors;
FOR i IN 1..errors LOOP
error_mesg := SQLERRM(-SQL%BULK_EXCEPTIONS(i).ERROR_CODE);
INSERT INTO sales_target_errors
VALUES (error_mesg);
END LOOP;
END;
EXIT WHEN c%NOTFOUND;
END LOOP;
CLOSE c;
DBMS_OUTPUT.PUT_LINE
( l_cnt || ' total errors' );
END;
call count cpu elapsed disk query current rows
--- --- ---- ---- ---- ---- ---- ----
Parse 1 0.03 0.02 0 0 0 0
Execute 1 1.14 2.71 0 0 0 1
Fetch 0 0.00 0.00 0 0 0 0
--- --- ---- ---- ---- ---- ---- ----
total 2 1.17 2.73 0 0 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 99
********************************************************************
SELECT *
FROM
SALES_SRC
call count cpu elapsed disk query current rows
--- --- ---- ---- ---- ---- ---- ----
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 9189 3.60 3.23 0 14219 0 918843
--- --- ---- ---- ---- ---- ---- ----
total 9191 3.60 3.23 0 14219 0 918843
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 99 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
918843 TABLE ACCESS FULL SALES_SRC (cr=14219 pr=0 pw=0 time=33083496 us)
**************************************************************************
INSERT INTO SALES_TARGET
VALUES
(:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 )
call count cpu elapsed disk query current rows
--- --- ---- ---- ---- ---- ---- ----
Parse 1 0.00 0.00 0 0 0 0
Execute 9189 4.39 4.30 2 6886 54411 918834
Fetch 0 0.00 0.00 0 0 0 0
--- --- ---- ---- ---- ---- ---- ----
total 9190 4.39 4.30 2 6886 54411 918834
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 99 (recursive depth: 1)
************************************************************************
INSERT INTO SALES_TARGET_ERRORS
VALUES (:B1 )
call count cpu elapsed disk query current rows
--- --- ---- ---- ---- ---- ---- ----
Parse 1 0.00 0.00 0 0 0 0
Execute 9 0.00 0.01 2 4 30 9
Fetch 0 0.00 0.00 0 0 0 0
--- --- ---- ---- ---- ---- ---- ----
total 10 0.00 0.01 2 4 30 9
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 99 (recursive depth: 1)
Conclusion
In the past, if you wanted to load data into a table and gracefully handle constraint violations or other DML errors, you had to either use a utility such as SQL*Loader or write a PL/SQL procedure that processed each row individually. The DML error logging feature in Oracle Database 10g Release 2 lets you add a LOG ERRORS clause to most DML statements so that the operation continues past errors, writing the offending rows and error details to an error logging table. By using DML error logging, you can load your batches faster, have errors handled automatically, and do away with custom-written error handling routines in your data loading process.