SQL*Loader Features
SQL*Loader loads data from external files into tables of an Oracle database. It has a powerful data parsing engine that puts little limitation on the format of the data in the datafile. You can use SQL*Loader to do the following:
- Load data across a network. This means that you can run the SQL*Loader client on a different system from the one that is running the SQL*Loader server.
- Load data from multiple datafiles during the same load session.
- Load data into multiple tables during the same load session.
- Specify the character set of the data.
- Selectively load data (you can load records based on the records' values).
- Manipulate the data before loading it, using SQL functions.
- Generate unique sequential key values in specified columns.
- Use the operating system's file system to access the datafiles.
- Load data from disk, tape, or named pipe.
- Generate sophisticated error reports, which greatly aid troubleshooting.
- Load arbitrarily complex object-relational data.
- Use secondary datafiles for loading LOBs and collections.
- Use either conventional or direct path loading. While conventional path loading is very flexible, direct path loading provides superior loading performance.
The figure shows SQL*Loader receiving input datafiles and a SQL*Loader control file as input. SQL*Loader then outputs a log file, bad files, and discard files. Also, the figure shows that the database into which SQL*Loader loaded the input data now contains tables and indexes.
SQL*Loader Parameters
SQL*Loader is invoked when you specify the sqlldr command and, optionally, parameters that establish session characteristics.
In situations where you always use the same parameters for which the values seldom change, it can be more efficient to specify parameters using the following methods, rather than on the command line:
- Parameters can be grouped together in a parameter file. You could then specify the name of the parameter file on the command line using the PARFILE parameter.
- Certain parameters can also be specified within the SQL*Loader control file by using the OPTIONS clause.
Parameters specified on the command line override any parameter values specified in a parameter file or OPTIONS clause.
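The override rule can be pictured as a simple merge of parameter sources. The following Python sketch is only an illustration, not how SQL*Loader is implemented: the function name effective_parameters is hypothetical, and the relative order of the parameter file and the OPTIONS clause is an assumption, since the text only states that the command line wins over both.

```python
def effective_parameters(command_line, parfile=None, options_clause=None):
    """Merge parameter sources so that command-line values always win."""
    merged = {}
    # Lower-priority sources first. Their order relative to each other
    # is an assumption made for this sketch.
    for source in (options_clause or {}, parfile or {}):
        merged.update({name.lower(): value for name, value in source.items()})
    # Command-line values are applied last, so they override everything else.
    merged.update({name.lower(): value for name, value in command_line.items()})
    return merged


params = effective_parameters(
    command_line={"ROWS": 500},
    parfile={"rows": 64, "errors": 10},
    options_clause={"rows": 100, "skip": 1},
)
print(params)
```

Here ROWS is supplied in all three places, and the command-line value is the one that survives, while ERRORS and SKIP keep the values from the only place they were set.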
SQL*Loader Control File
The control file is a text file written in a language that SQL*Loader understands. The control file tells SQL*Loader where to find the data, how to parse and interpret the data, where to insert the data, and more.
Although not precisely defined, a control file can be said to have three sections.
The first section contains session-wide information, for example:
- Global options such as bindsize, rows, records to skip, and so on
- INFILE clauses to specify where the input data is located
- Data to be loaded
The second section consists of one or more INTO TABLE blocks. Each of these blocks contains information about the table into which the data is to be loaded, such as the table name and the columns of the table.
The third section is optional and, if present, contains input data.
Input Data and Datafiles
SQL*Loader reads data from one or more files (or operating system equivalents of files) specified in the control file. From SQL*Loader's perspective, the data in the datafile is organized as records. A particular datafile can be in fixed record format, variable record format, or stream record format. The record format can be specified in the control file with the INFILE parameter. If no record format is specified, the default is stream record format.
Note: If data is specified inside the control file (that is, INFILE * was specified in the control file), then the data is interpreted in the stream record format with the default record terminator.
Fixed Record Format
A file is in fixed record format when all records in a datafile are the same byte length. Although this format is the least flexible, it results in better performance than variable or stream format. Fixed format is also simple to specify. For example:
INFILE datafile_name "fix n"
Example 1 shows a control file that specifies a datafile that should be interpreted in the fixed record format. The datafile in the example contains five physical records. Assuming that a period (.) indicates a space, the first physical record is [001,...cd,.] which is exactly eleven bytes (assuming a single-byte character set). The second record is [0002,fghi,] followed by the newline character (which is the eleventh byte), and so on. Note that newline characters are not required with the fixed record format.
Example 1 Loading Data in Fixed Record Format
load data
infile 'example.dat' "fix 11"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1, col2)
example.dat:
001,   cd, 0002,fghi,
00003,lmn,
1, "pqrs",
0005,uvwx,
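To make the byte counting concrete, here is a minimal Python sketch of how a "fix 11" datafile breaks into physical records. It only mimics the record-splitting step (field parsing is left out), and the helper name split_fixed is made up for this illustration. The byte string assumes the first record contains three spaces before cd and one trailing space, as described above.

```python
def split_fixed(data: bytes, n: int):
    """Cut a byte string into consecutive records of exactly n bytes."""
    return [data[i:i + n] for i in range(0, len(data), n)]


# Contents of example.dat as raw bytes (single-byte character set assumed).
# The first record has no newline; the others end with one as their 11th byte.
raw = (
    b"001,   cd, "
    b"0002,fghi,\n"
    b"00003,lmn,\n"
    b'1, "pqrs",\n'
    b"0005,uvwx,\n"
)

records = split_fixed(raw, 11)
for rec in records:
    print(repr(rec))
```

Because the split is purely positional, a newline is treated like any other byte, which is why the first two records can share one line in the datafile.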
Variable Record Format
A file is in variable record format when the length of each record in a character field is included at the beginning of each record in the datafile. This format provides some added flexibility over the fixed record format and a performance advantage over the stream record format. For example, you can specify a datafile that is to be interpreted as being in variable record format as follows:
INFILE "datafile_name" "var n"
In this example, n specifies the number of bytes in the record length field. If n is not specified, SQL*Loader assumes a length of 5 bytes. Specifying n larger than 40 will result in an error.
Example 2 shows a control file specification that tells SQL*Loader to look for data in the datafile example.dat and to expect variable record format where the record length fields are 3 bytes long. The example.dat datafile consists of three physical records. The first is specified to be 009 (that is, 9) bytes long, the second is 010 bytes long (that is, 10, including a 1-byte newline), and the third is 012 bytes long (also including a 1-byte newline). Note that newline characters are not required with the variable record format. This example also assumes a single-byte character set for the datafile.
The lengths are always interpreted in bytes, even if character-length semantics are in effect for the file. This is necessary because the file could contain a mix of fields, some processed with character-length semantics and others processed with byte-length semantics.
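The length-prefix idea can be sketched in a few lines of Python. This is a rough illustration of the record-splitting step only, under the assumption of a single-byte character set. The helper name split_var and the sample bytes are made up here, and the sample was chosen so that the three records are 9, 10, and 12 bytes long, matching the lengths described above.

```python
def split_var(data: bytes, n: int = 5):
    """Split data into records, each preceded by an n-byte decimal length."""
    records = []
    pos = 0
    while pos < len(data):
        # The first n bytes hold the record length as character digits.
        length = int(data[pos:pos + n].decode("ascii"))
        pos += n
        # The length is always counted in bytes, newline included if present.
        records.append(data[pos:pos + length])
        pos += length
    return records


# Illustrative datafile contents for "var 3" (assumed for this sketch).
raw = b"009hello,cd,010world,im,\n012my,name is,\n"

records = split_var(raw, 3)
for rec in records:
    print(len(rec), repr(rec))
```

The first record carries no newline at all, so the second length field starts immediately after it on the same line.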
Stream Record Format
A file is in stream record format when the records are not specified by size; instead SQL*Loader forms records by scanning for the record terminator. Stream record format is the most flexible format, but there can be a negative effect on performance. The specification of a datafile to be interpreted as being in stream record format looks similar to the following:
INFILE datafile_name ["str terminator_string"]
The terminator_string is specified as either 'char_string' or X'hex_string' where:
- 'char_string' is a string of characters enclosed in single or double quotation marks
- X'hex_string' is a byte string in hexadecimal format
When the terminator_string contains special (nonprintable) characters, it should be specified as a X'hex_string'. However, some nonprintable characters can be specified as ('char_string') by using a backslash. For example:
- \n indicates a line feed
- \t indicates a horizontal tab
- \f indicates a form feed
- \v indicates a vertical tab
- \r indicates a carriage return
On UNIX-based platforms, if no terminator_string is specified, SQL*Loader defaults to the line feed character, \n. On Windows NT, if no terminator_string is specified, then SQL*Loader uses either \n or \r\n as the record terminator, depending on which one it finds first in the datafile.
Example 3 illustrates loading data in stream record format where the terminator string is specified using a character string, '|\n'. The use of the backslash character allows the character string to specify the nonprintable line feed character.
Example 3 Loading Data in Stream Record Format
load data
infile 'example.dat' "str '|\n'"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1 char(5),
col2 char(7))
example.dat:
hello,world,|
james,bond,|
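As a rough picture of what scanning for the record terminator means, the Python sketch below splits the bytes of example.dat on the two-byte terminator '|\n'. The helper name split_stream is hypothetical, and keeping a final unterminated record is an assumption of this sketch rather than documented SQL*Loader behavior.

```python
def split_stream(data: bytes, terminator: bytes = b"\n"):
    """Form records by scanning for the terminator byte sequence."""
    records = data.split(terminator)
    # A terminator at the very end leaves an empty trailing piece; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


# example.dat from Example 3: each record ends with '|' plus a line feed.
raw = b"hello,world,|\njames,bond,|\n"

records = split_stream(raw, b"|\n")
for rec in records:
    print(repr(rec))
```

Unlike the fixed and variable formats, every byte has to be compared against the terminator here, which gives a feel for why stream format can cost more at load time.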
Discarded and Rejected Records
Records read from the input file might not be inserted into the database. Such records are placed in either a bad file or a discard file.
The Bad File
The bad file contains records that were rejected, either by SQL*Loader or by the Oracle database. If you do not specify a bad file and there are rejected records, then SQL*Loader automatically creates one. It will have the same name as the data file, with a .bad extension. Some of the possible reasons for rejection are discussed in the next sections.
SQL*Loader Rejects
Datafile records are rejected by SQL*Loader when the input format is invalid, for example when the second enclosure delimiter is missing or a delimited field exceeds its maximum length. Rejected records are placed in the bad file.
Oracle Database Rejects
After a datafile record is accepted for processing by SQL*Loader, it is sent to the Oracle database for insertion into a table as a row. If the Oracle database determines that the row is valid, then the row is inserted into the table. If the row is determined to be invalid, then the record is rejected and SQL*Loader puts it in the bad file. The row may be invalid, for example, because a key is not unique, because a required field is null, or because the field contains invalid data for the Oracle datatype.
The Discard File
As SQL*Loader executes, it may create a file called the discard file. This file is created only when it is needed, and only if you have specified that a discard file should be enabled. The discard file contains records that were filtered out of the load because they did not match any record-selection criteria specified in the control file.
The discard file therefore contains records that were not inserted into any table in the database. You can specify the maximum number of such records that the discard file can accept. Data written to any database table is not written to the discard file.
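The split between the bad file and the discard file can be summarized as a small routing function. The Python sketch below is only a conceptual model: the names route_records, parse, accept, and insert are all hypothetical, the sample data is invented, and the exact order in which SQL*Loader applies its checks is an assumption.

```python
def route_records(records, parse, accept, insert):
    """Sort records into loaded, bad (rejected), and discarded (filtered)."""
    loaded, bad, discarded = [], [], []
    for rec in records:
        try:
            row = parse(rec)          # invalid input format: loader rejects
        except ValueError:
            bad.append(rec)
            continue
        if not accept(row):           # fails record-selection criteria
            discarded.append(rec)
            continue
        try:
            insert(row)               # invalid row: database rejects
        except ValueError:
            bad.append(rec)
            continue
        loaded.append(rec)
    return loaded, bad, discarded


def parse(rec):
    key, value = rec.split(",")       # raises ValueError unless two fields
    return int(key), value            # raises ValueError on a non-numeric key


seen_keys = set()

def insert(row):
    if row[0] in seen_keys:           # stands in for a unique-key violation
        raise ValueError("duplicate key")
    seen_keys.add(row[0])


records = ["1,a", "x,b", "2,skip", "1,c", "3,d"]
loaded, bad, discarded = route_records(
    records, parse, lambda row: row[1] != "skip", insert
)
print(loaded, bad, discarded)
```

In this model a record lands in the bad list because something went wrong with it, and in the discarded list only because the selection criteria excluded it.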
Log File and Logging Information
When SQL*Loader begins execution, it creates a log file. If it cannot create a log file, execution terminates. The log file contains a detailed summary of the load, including a description of any errors that occurred during the load.
Question: What is the difference between the bad file and the discard file in SQL*Loader?
Answer: The bad file and the discard file both contain records that were not loaded, but the records end up there for different reasons:
- Bad file: The bad file contains records that were rejected because of errors, either by SQL*Loader (invalid input format) or by the Oracle database (for example, a key that is not unique, a required field that is null, or data that is invalid for the column's datatype).
- Discard file: The discard file contains records that were filtered out because they did not match the record-selection criteria specified in the SQL*Loader control file.
Conventional Path Loads, Direct Path Loads, and External Table Loads
SQL*Loader provides the following methods to load data:
- Conventional Path Loads
- Direct Path Loads
- External Table Loads
Conventional Path Loads
During conventional path loads, the input records are parsed according to the field specifications, and each data field is copied to its corresponding bind array. When the bind array is full (or no more data is left to read), an array insert is executed.
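The fill-then-insert cycle of the bind array can be sketched as a simple batching loop. The Python below is a simplified model, not SQL*Loader's implementation: the names conventional_load and array_insert are hypothetical, and the bind array is sized here in rows purely for illustration.

```python
def conventional_load(rows, bind_size, array_insert):
    """Buffer rows in a bind array and flush it with one array insert."""
    bind_array = []
    for row in rows:
        bind_array.append(row)
        if len(bind_array) == bind_size:   # bind array is full
            array_insert(bind_array)
            bind_array = []
    if bind_array:                         # no more data left to read
        array_insert(bind_array)


batches = []
conventional_load(range(7), 3, lambda batch: batches.append(list(batch)))
print(batches)
```

Seven rows with a bind array of three rows lead to three array inserts, the last one being a partial batch.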
Direct Path Loads
A direct path load parses the input records according to the field specifications, converts the input field data to the column datatype, and builds a column array. The column array is passed to a block formatter, which creates data blocks in Oracle database block format. The newly formatted database blocks are written directly to the database, bypassing much of the data processing that normally takes place. Direct path load is much faster than conventional path load, but entails several restrictions.
External Table Loads
An external table load creates an external table for data that is contained in a datafile. The load executes INSERT statements to insert the data from the datafile into the target table.
The advantages of using external table loads over conventional path and direct path loads are as follows:
- An external table load attempts to load datafiles in parallel. If a datafile is big enough, it will attempt to load that file in parallel.
- An external table load allows modification of the data being loaded by using SQL functions and PL/SQL functions as part of the INSERT statement that is used to create the external table.
Choosing External Tables Versus SQL*Loader
The record parsing of external tables and SQL*Loader is very similar, so normally there is not a major performance difference for the same record format. However, due to the different architecture of external tables and SQL*Loader, there are situations in which one method is more appropriate than the other.
In the following situations, use external tables for the best load performance:
- You want to transform the data as it is being loaded into the database.
- You want to use transparent parallel processing without having to split the external data first.
However, in the following situations, use SQL*Loader for the best load performance:
- You want to load data remotely.
- Transformations are not required on the data, and the data does not need to be loaded in parallel.