This chapter describes SQL*Loader's conventional and direct path load methods. The following topics are covered:
- Data Loading Methods
- Conventional Path Load
- Direct Path Load
- Using Direct Path Load
- Optimizing Performance of Direct Path Loads
- Optimizing Direct Path Loads on Multiple-CPU Systems
- Avoiding Index Maintenance
- Direct Loads, Integrity Constraints, and Triggers
- Parallel Data Loading Models
- General Performance Improvement Hints
SQL*Loader provides two methods for loading data:
A conventional path load executes SQL
INSERT
statements to populate tables in an Oracle database. A direct path load eliminates much of the Oracle database overhead by formatting Oracle data blocks and writing the data blocks directly to the database files. A direct load does not compete with other users for database resources, so it can usually load data at near disk speed. Considerations inherent to direct path loads, such as restrictions, security, and backup implications, are discussed in this chapter.
The tables to be loaded must already exist in the database. SQL*Loader never creates tables. It loads existing tables that either already contain data or are empty.
- You must have
INSERT
privileges on the table to be loaded. - You must have
DELETE
privileges on the table to be loaded, when using theREPLACE
orTRUNCATE
option to empty old data from the table before loading the new data in its place.
Figure 11-1 shows how conventional and direct path loads perform database writes.
Description of "Figure 11-1 Database Writes on SQL*Loader Direct Path and Conventional Path"
Conventional Path Load
Conventional path load (the default) uses the SQL
INSERT
statement and a bind array buffer to load data into database tables. This method is used by all Oracle tools and applications.
When SQL*Loader performs a conventional path load, it competes equally with all other processes for buffer resources. This can slow the load significantly. Extra overhead is added as SQL statements are generated, passed to Oracle, and executed.
The Oracle database looks for partially filled blocks and attempts to fill them on each insert. Although appropriate during normal use, this can slow bulk loads dramatically.
Conventional Path Load of a Single Partition
By definition, a conventional path load uses SQL
INSERT
statements. During a conventional path load of a single partition, SQL*Loader uses the partition-extended syntax of the INSERT
statement, which has the following form:INSERT INTO TABLE T PARTITION (P) VALUES ...
The SQL layer of the Oracle kernel determines if the row being inserted maps to the specified partition. If the row does not map to the partition, the row is rejected, and the SQL*Loader log file records an appropriate error message.
When to Use a Conventional Path Load
If load speed is most important to you, you should use direct path load because it is faster than conventional path load. However, certain restrictions on direct path loads may require you to use a conventional path load. You should use a conventional path load in the following situations:
- When accessing an indexed table concurrently with the load, or when applying inserts or updates to a non-indexed table concurrently with the loadTo use a direct path load (with the exception of parallel loads), SQL*Loader must have exclusive write access to the table and exclusive read/write access to any indexes.
- When loading data into a clustered tableA direct path load does not support loading of clustered tables.
- When loading a relatively small number of rows into a large indexed tableDuring a direct path load, the existing index is copied when it is merged with the new index keys. If the existing index is very large and the number of new keys is very small, then the index copy time can offset the time saved by a direct path load.
- When loading a relatively small number of rows into a large table with referential and column-check integrity constraints.Because these constraints cannot be applied to rows loaded on the direct path, they are disabled for the duration of the load. Then they are applied to the whole table when the load completes. The costs could outweigh the savings for a very large table and a small number of new rows.
- When loading records and you want to ensure that a record is rejected under any of the following circumstances:
- If the record, upon insertion, causes an Oracle error
- If the record is formatted incorrectly, so that SQL*Loader cannot find field boundaries
- If the record violates a constraint or tries to make a unique index non-unique.
Direct Path Load
Instead of filling a bind array buffer and passing it to the Oracle database with a SQL
INSERT
statement, a direct path load uses the direct path API to pass the data to be loaded to the load engine in the server. The load engine builds a column array structure from the data passed to it.
The direct path load engine uses the column array structure to format Oracle data blocks and build index keys. The newly formatted database blocks are written directly to the database (multiple blocks per I/O request using asynchronous writes if the host platform supports asynchronous I/O).
Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance
During a direct path load, data conversion occurs on the client side rather than on the server side. This means that NLS parameters in the initialization parameter file (server-side language handle) will not be used. To override this behavior, you can specify a format mask in the SQL*Loader control file that is equivalent to the setting of the NLS parameter in the initialization parameter file, or set the appropriate environment variable. For example, to specify a date format for a field, you can either set the date format in the SQL*Loader control file as shown in Example 11-1 or set an
NLS_DATE_FORMAT
environment variable as shown in Example 11-2.LOAD DATA INFILE 'data.dat' INSERT INTO TABLE emp FIELDS TERMINATED BY "|" ( EMPNO NUMBER(4) NOT NULL, ENAME CHAR(10), JOB CHAR(9), MGR NUMBER(4), HIREDATE DATE 'YYYYMMDD', SAL NUMBER(7,2), COMM NUMBER(7,2), DEPTNO NUMBER(2) )
On UNIX Bourne or Korn shell:
% NLS_DATE_FORMAT='YYYYMMDD' % export NLS_DATE_FORMAT
On UNIX csh:
%setenv NLS_DATE_FORMAT='YYYYMMDD'
Advantages of a Direct Path Load
- Partial blocks are not used, so no reads are needed to find them, and fewer writes are performed.
- SQL*Loader need not execute any SQL
INSERT
statements; therefore, the processing load on the Oracle database is reduced. - A direct path load calls on Oracle to lock tables and indexes at the start of the load and releases them when the load is finished. A conventional path load calls Oracle once for each array of rows to process a SQL
INSERT
statement. - A direct path load uses multiblock asynchronous I/O for writes to the database files.
- During a direct path load, processes perform their own write I/O, instead of using Oracle's buffer cache. This minimizes contention with other Oracle users.
- The sorted indexes option available during direct path loads enables you to presort data using high-performance sort routines that are native to your system or installation.
- When a table to be loaded is empty, the presorting option eliminates the sort and merge phases of index-building. The index is filled in as data arrives.
- Protection against instance failure does not require redo log file entries during direct path loads. Therefore, no time is required to log the load when:
- The Oracle database has the SQL
NOARCHIVELOG
parameter enabled - The SQL*Loader
UNRECOVERABLE
clause is enabled - The object being loaded has the SQL
NOLOGGING
parameter set
Restrictions on Using Direct Path Loads
- Tables are not clustered.
- Tables to be loaded do not have any active transactions pending.To check for this condition, use the Oracle Enterprise Manager command
MONITOR
TABLE
to find the object ID for the tables you want to load. Then use the commandMONITOR
LOCK
to see if there are any locks on the tables. - For versions of the database prior to Oracle9i, you can perform a SQL*Loader direct path load only when the client and server are the same version. This also means that you cannot perform a direct path load of Oracle9i data into a database of an earlier version. For example, you cannot use direct path load to load data from a release 9.0.1 database into a release 8.1.7 database.Beginning with Oracle9i, you can perform a SQL*Loader direct path load when the client and server are different versions. However, both versions must be at least release 9.0.1 and the client version must be the same as or lower than the server version. For example, you can perform a direct path load from a release 9.0.1 database into a release 9.2 database. However, you cannot use direct path load to load data from a release 10.0.0 database into a release 9.2 database.
- Tables to be loaded in direct path mode do not have VPD policies active on INSERT.
The following features are not available with direct path load:
- Loading a parent table together with a child table
- Loading
BFILE
columns - Use of
CREATE SEQUENCE
during the load. This is because in direct path loads there is no SQL being generated to fetch the next value since direct path does not generateINSERT
statements.
When to Use a Direct Path Load
- You have a large amount of data to load quickly. A direct path load can quickly load and index large amounts of data. It can also load data into either an empty or nonempty table.
- You want to load data in parallel for maximum performance.
Integrity Constraints
All integrity constraints are enforced during direct path loads, although not necessarily at the same time.
NOT
NULL
constraints are enforced during the load. Records that fail these constraints are rejected.UNIQUE
constraints are enforced both during and after the load. A record that violates a UNIQUE
constraint is not rejected (the record is not available in memory when the constraint violation is detected).
Integrity constraints that depend on other rows or tables, such as referential constraints, are disabled before the direct path load and must be reenabled afterwards. If
REENABLE
is specified, SQL*Loader can reenable them automatically at the end of the load. When the constraints are reenabled, the entire table is checked. Any rows that fail this check are reported in the specified error log.
Field Defaults on the Direct Path
Default column specifications defined in the database are not available when you use direct path loading. Fields for which default values are desired must be specified with the
DEFAULTIF
clause. If a DEFAULTIF
clause is not specified and the field is NULL
, then a null value is inserted into the database.
SQL*Loader leaves indexes in an Index Unusable state when the data segment being loaded becomes more up-to-date than the index segments that index it.
Any SQL statement that tries to use an index that is in an Index Unusable state returns an error. The following conditions cause a direct path load to leave an index or a partition of a partitioned index in an Index Unusable state:
- SQL*Loader runs out of space for the index and cannot update the index.
- The data is not in the order specified by the
SORTED
INDEXES
clause. - There is an instance failure, or the Oracle shadow process fails while building the index.
- There are duplicate keys in a unique index.
- Data savepoints are being used, and the load fails or is terminated by a keyboard interrupt after a data savepoint occurred.
SELECT INDEX_NAME, STATUS FROM USER_INDEXES WHERE TABLE_NAME = 'tablename';
If you are not the owner of the table, then search
ALL_INDEXES
or DBA_INDEXES
instead of USER_INDEXES
.
To determine if an index partition is in an unusable state, you can execute the following query:
SELECT INDEX_NAME, PARTITION_NAME, STATUS FROM USER_IND_PARTITIONS WHERE STATUS != 'VALID';
If you are not the owner of the table, then search
ALL_IND_PARTITIONS
and DBA_IND_PARTITIONS
instead of USER_IND_PARTITIONS
.
0 comments:
Post a Comment