The IN Operator
The IN operator allows you to specify multiple values in a WHERE clause.
SQL IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...);
The EXISTS Operator
The EXISTS checks the existence of a result of a subquery. The EXISTS subquery tests whether a subquery fetches at least one row. When no data is returned then this operator returns 'FALSE'.
A valid EXISTS subquery must contain an outer reference and it must be a correlated subquery.
The select list in the EXISTS subquery is not actually used in evaluating the EXISTS so it can contain any valid select list.
SQL JOIN
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
Tables in a database are often related to each other with keys.
A primary key is a column (or a combination of columns) with a unique value for each row. Each primary key value must be unique within the table. The purpose is to bind data together, across tables, without repeating all of the data in every table.
Explaining IN, EXISTS and JOIN:
1. Getting details of all employee having salary more than avg salary of their department.
Wrong:
SELECT LAST_NAME, SALARY, DEPARTMENT_ID
FROM EMPLOYEES A
WHERE SALARY > ( SELECT AVG(SALARY)
FROM EMPLOYEES B
WHERE B.EMPLOYEE_ID = A.EMPLOYEE_ID
GROUP BY DEPARTMENT_ID);
Right:
SELECT LAST_NAME, SALARY, A.DEPARTMENT_ID, I.AVGSAL
FROM EMPLOYEES A, (SELECT AVG(SALARY) AVGSAL, DEPARTMENT_ID
FROM EMPLOYEES GROUP BY DEPARTMENT_ID ) I
WHERE A.DEPARTMENT_ID = I.DEPARTMENT_ID
AND SALARY > ( SELECT AVG(SALARY)
FROM EMPLOYEES
WHERE DEPARTMENT_ID = A.DEPARTMENT_ID);
2. Displaying records of employee who have switched job at least twice.
Wrong:
SELECT EMPLOYEE_ID, LAST_NAME, JOB_ID
FROM EMPLOYEES A
WHERE EXISTS ( SELECT 'X' FROM JOB_HISTORY
WHERE EMPLOYEE_ID = A.EMPLOYEE_ID
HAVING COUNT(*) > 2);
Right:
SELECT EMPLOYEE_ID, LAST_NAME, JOB_ID
FROM EMPLOYEES A
WHERE 2 <= ( SELECT count(*) FROM JOB_HISTORY
WHERE EMPLOYEE_ID = A.EMPLOYEE_ID);
3. EXISTS ensures search in the inner query does not continue when at least one match is found. EXISTS operator tests for existence of rows in the results set of the subquery.
SELECT ENAME, DEPTNO FROM EMP
WHERE EXISTS ( SELECT 'X' FROM DEPT
WHERE DEPTNO = &DEPT_NUM);
4. Playing Employee Manager Self Join:
SQL> select * from emp;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7900 JAMES CLERK 7698 03-DEC-81 950 30
7902 FORD ANALYST 7566 03-DEC-81 3000 20
7934 MILLER CLERK 7782 23-JAN-82 1300 10
14 rows selected.
A). Want to find out manager of each employee
SELECT E.EMPNO, E.ENAME EMPLOYER, E.JOB EmpJob, E.MGR, M.ENAME Managers, M.JOB ManagerJob
FROM EMP E, EMP M
WHERE E.MGR = M.EMPNO
ORDER BY E.EMPNO;
EMPNO EMPLOYER EMPJOB MGR MANAGERS MANAGERJO
---------- ---------- --------- ---------- ---------- ---------
7369 SMITH CLERK 7902 FORD ANALYST
7499 ALLEN SALESMAN 7698 BLAKE MANAGER
7521 WARD SALESMAN 7698 BLAKE MANAGER
7566 JONES MANAGER 7839 KING PRESIDENT
7654 MARTIN SALESMAN 7698 BLAKE MANAGER
7698 BLAKE MANAGER 7839 KING PRESIDENT
7782 CLARK MANAGER 7839 KING PRESIDENT
7788 SCOTT ANALYST 7566 JONES MANAGER
7844 TURNER SALESMAN 7698 BLAKE MANAGER
7876 ADAMS CLERK 7788 SCOTT ANALYST
7900 JAMES CLERK 7698 BLAKE MANAGER
EMPNO EMPLOYER EMPJOB MGR MANAGERS MANAGERJO
---------- ---------- --------- ---------- ---------- ---------
7902 FORD ANALYST 7566 JONES MANAGER
7934 MILLER CLERK 7782 CLARK MANAGER
13 rows selected.
B). Now we need to find the managers and employees who report to them.
SELECT M.EMPNO, M.ENAME Manager, M.JOB ManagerJ, E.EMPNO, E.ENAME Employes, E.JOB EmployerJ
FROM EMP E, EMP M
WHERE E.MGR = M.EMPNO;
EMPNO MANAGER MANAGERJ EMPNO EMPLOYES EMPLOYERJ
---------- ---------- --------- ---------- ---------- ---------
7566 JONES MANAGER 7902 FORD ANALYST
7566 JONES MANAGER 7788 SCOTT ANALYST
7698 BLAKE MANAGER 7844 TURNER SALESMAN
7698 BLAKE MANAGER 7499 ALLEN SALESMAN
7698 BLAKE MANAGER 7521 WARD SALESMAN
7698 BLAKE MANAGER 7900 JAMES CLERK
7698 BLAKE MANAGER 7654 MARTIN SALESMAN
7782 CLARK MANAGER 7934 MILLER CLERK
7788 SCOTT ANALYST 7876 ADAMS CLERK
7839 KING PRESIDENT 7698 BLAKE MANAGER
7839 KING PRESIDENT 7566 JONES MANAGER
EMPNO MANAGER MANAGERJ EMPNO EMPLOYES EMPLOYERJ
---------- ---------- --------- ---------- ---------- ---------
7839 KING PRESIDENT 7782 CLARK MANAGER
7902 FORD ANALYST 7369 SMITH CLERK
13 rows selected.
C). Find out employee who have at least one person reporting to them.
SQL> SELECT DISTINCT M.MGR, E.ENAME, E.JOB
2 FROM EMP M, EMP E
3 WHERE M.MGR = E.EMPNO
4 AND M.MGR IS NOT NULL;
MGR ENAME JOB
---------- ---------- ---------
7839 KING PRESIDENT
7782 CLARK MANAGER
7788 SCOTT ANALYST
7566 JONES MANAGER
7698 BLAKE MANAGER
7902 FORD ANALYST
6 rows selected.
II. Perfoming point C using EXISTS clause
==========================================
SQL> SELECT EMPNO, ENAME, JOB
2 FROM EMP E
3 WHERE EXISTS ( SELECT 'X' FROM EMP WHERE MGR=E.EMPNO);
EMPNO ENAME JOB
---------- ---------- ---------
7566 JONES MANAGER
7698 BLAKE MANAGER
7782 CLARK MANAGER
7788 SCOTT ANALYST
7839 KING PRESIDENT
7902 FORD ANALYST
6 rows selected.
Perfoming this operation using IN:
----------------------------------
SELECT EMPNO, ENAME, JOB
FROM EMP
WHERE EMPNO IN ( SELECT MGR FROM EMP WHERE MGR IS NOT NULL);
Checking Execution plan for the above three methods:
====================================================
1. Using Self Join:
===================
SQL> SELECT DISTINCT M.MGR, E.ENAME, E.JOB
2 FROM EMP M, EMP E
3 WHERE M.MGR = E.EMPNO
4 AND M.MGR IS NOT NULL;
MGR ENAME JOB
---------- ---------- ---------
7839 KING PRESIDENT
7782 CLARK MANAGER
7788 SCOTT ANALYST
7566 JONES MANAGER
7698 BLAKE MANAGER
7902 FORD ANALYST
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1982370969
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 13 | 286 | 7 (29)| 00:00:01 |
| 1 | HASH UNIQUE | | 13 | 286 | 7 (29)| 00:00:01 |
| 2 | MERGE JOIN | | 13 | 286 | 6 (17)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| EMP | 14 | 252 | 2 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | PK_EMP | 14 | | 1 (0)| 00:00:01 |
|* 5 | SORT JOIN | | 13 | 52 | 4 (25)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | EMP | 13 | 52 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("M"."MGR"="E"."EMPNO")
filter("M"."MGR"="E"."EMPNO")
6 - filter("M"."MGR" IS NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
9 consistent gets
0 physical reads
0 redo size
685 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
Using EXISTS
=============
SQL> SELECT EMPNO, ENAME, JOB
2 FROM EMP E
3 WHERE EXISTS ( SELECT 'X' FROM EMP WHERE MGR=E.EMPNO);
EMPNO ENAME JOB
---------- ---------- ---------
7566 JONES MANAGER
7698 BLAKE MANAGER
7782 CLARK MANAGER
7788 SCOTT ANALYST
7839 KING PRESIDENT
7902 FORD ANALYST
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 71815250
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 132 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN SEMI | | 6 | 132 | 6 (17)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 14 | 252 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PK_EMP | 14 | | 1 (0)| 00:00:01 |
|* 4 | SORT UNIQUE | | 13 | 52 | 4 (25)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | EMP | 13 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("MGR"="E"."EMPNO")
filter("MGR"="E"."EMPNO")
5 - filter("MGR" IS NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
687 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
Using IN
=========
SQL> SELECT EMPNO, ENAME, JOB
2 FROM EMP
3 WHERE EMPNO IN ( SELECT MGR FROM EMP WHERE MGR IS NOT NULL);
EMPNO ENAME JOB
---------- ---------- ---------
7566 JONES MANAGER
7698 BLAKE MANAGER
7782 CLARK MANAGER
7788 SCOTT ANALYST
7839 KING PRESIDENT
7902 FORD ANALYST
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 71815250
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 132 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN SEMI | | 6 | 132 | 6 (17)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 14 | 252 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PK_EMP | 14 | | 1 (0)| 00:00:01 |
|* 4 | SORT UNIQUE | | 13 | 52 | 4 (25)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | EMP | 13 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("EMPNO"="MGR")
filter("EMPNO"="MGR")
5 - filter("MGR" IS NOT NULL)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
687 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6 rows processed
Joining two tables using EXISTS and Joins
===========================================
SELECT EMPNO, ENAME, DEPTNO
FROM EMP E
WHERE EXISTS ( SELECT 'X'
FROM DEPT
WHERE DEPTNO = E.DEPTNO);
SELECT EMPNO, ENAME, E.DEPTNO
FROM EMP E, DEPT D
WHERE E.DEPTNO = D.DEPTNO;
EMPNO ENAME DEPTNO
---------- ---------- ----------
7369 SMITH 20
7499 ALLEN 30
7521 WARD 30
7566 JONES 20
7654 MARTIN 30
7698 BLAKE 30
7782 CLARK 10
7788 SCOTT 20
7839 KING 10
7844 TURNER 30
7876 ADAMS 20
EMPNO ENAME DEPTNO
---------- ---------- ----------
7900 JAMES 30
7902 FORD 20
7934 MILLER 10
14 rows selected.
When to use EXISTS and When to use IN:
Lets say the result of the subquery
( select y from T2 )
is "huge" and takes a long time. But the table T1 is relatively small and executing ( select null from t2 where y = x.x ) is very very fast (nice index on t2(y)). Then the exists will be faster as the time to full scan T1 and do the index probe into T2 could be less then the time to simply full scan T2 to build the subquery we need to distinct on.
Lets say the result of the subquery is small -- then IN is typicaly more appropriate.
If both the subquery and the outer table are huge -- either might work as well as the other -- depends on the indexes and other factors.