It's All About ORACLE

Oracle - The number one Database Management System. Hope this Blog will teach a lot about oracle.

Advantages of EXTERNAL TABLE Over SQL*Loader

We have seen some examples of external table syntax but have not yet explored why we might use them over SQL*Loader. It is the case that SQL*Loader can parse and load almost any flat-file we wish to throw at it. External tables, on the other hand, cater for the more common processing requirements. Despite this, their advantages over SQL*Loader are numerous, some of which are as follows:
  • Ease of use: External tables can be selected, sorted, filtered, joined, intersected, minused, unioned and so on using SQL, the language most familiar database to Oracle developers. Anything we can do in a SELECT statement with a "normal" table can be done with an external table. This makes working with external flat-files very simple.
  • Performance (1): reads/loads involving external tables can be paralleled  When combined with direct path (in the case of INSERTs), this dramatically outperforms SQL*Loader which has to load in serial mode. Parallelism can be set at the external table level or more selectively in hints.
  • Performance (2): as seen above, loading a table from an external table is faster than SQL*Loader due to parallel query and DML. In addition to this, ETL processes that read directly from external tables rather than pre-loaded staging tables are faster because they do not incur the SQL*Loader step (i.e. data is only read once). For example, if an ETL process takes 10 minutes in SQL*Loader and 10 minutes in SQL or PL/SQL loading, using external tables directly in the latter process can eradicate up to 50% of the ETL time.
  • Disk Space: External tables do not require any database space; only the space consumed by the flat-files on the filesystem. SQL*Loader requires two copies of each file (one inside the database in a staging table), so external tables are "cheaper" on disk.
  • Error-trapping: SQL*Loader returns different codes based on its outcome. For batch systems, this can be tricky because the error code is sometimes ambiguous. For example, exit code 2 means "a bad file has been created". Of course, this may or may not be a cause for concern. If we allow 50 bad records (errors=50), then the fact that a bad file contains 1-49 bad records should not signal a batch failure. We therefore need to write some clever code to interrogate the badfile or logfile to determine whether this is a "bad code 2" or "acceptable code 2". With external tables, it is much more simple. Until we reach the REJECT LIMIT, the SQL statement continues or completes successfully. If we reach this limit, the statement fails.
  • Debugging and support: External tables are equally useful as debugging and support aids. For example, we can create further external tables over logfiles and badfiles to investigate errors easily with SQL. DBAs can also create external tables over critical files such as the alert log.
We can see that the advantages of external tables are compelling. In our batch systems, we can use them in one of two ways and still achieve their benefits. First, and for maximum benefit, we can completely remove the need for staging tables altogether and use the tables directly in our ETL code. Second, and for medium benefit, we can use them as a straight replacement for SQL*Loader and therefore load our staging tables in parallel direct-path.

There are, however, a few issues to be aware of that might limit the scope of how we use these tables. For example, external tables can only have one location set at a time (i.e. different sessions cannot share a table but set different locations). This means that their use for loading is serialised for the length of time a specific location is required and in use (and probably should be protected as such). If multiple sessions need to share the same table but load different files at the same time, then either multiple tables must be created or some form of locking will be required. For the latter scenario, the length of time a table is locked should be reduced to a minimum to maximise concurrency. For this reason, multi-user external tables will probably not figure in long-running batch processes.

In addition to this, external tables cannot be indexed. For most staging tables, this won't be an issue as we would prefer hash joins for ETL batch queries, but in some cases indexes might be required. For these, it would probably be sensible to use external tables to load the physical staging tables rather than use them directly in the ETL queries.

0 comments:

You Might Also Like

Related Posts with Thumbnails

Pages