Running an SQL query with the IN predicate performs very poorly when there are a large number of values in the comparison list.

Document ID : KB000033237
Last Modified Date : 14/02/2018
Show Technical Document Details

Symptoms:  

When running an SQL query with an "IN" predicate in the WHERE clause, the query runs fine as long as we don't exceed a certain number of entries in the list. 

Is there a limit as to the number of values that can be specified in an “IN” list? 

What if we need to specify more values than that?  

 

Environment: 

This issue can arise on all operating systems, and with queries retrieving data from either SQL or network databases, or both.

 

Cause:

If a query experiences a change in performance when it is altered, that is almost certainly due to a different path being taken through the database on the different queries. 

 

Resolution/Workaround:

There are indeed limits for various aspects of SQL, including syntax limits. These are documented in appendix B of the SQL Reference manual. However, it can be confusing as to how the limit applies to a particular SQL statement, because the limit is applied after the compiler processes the statement and translates the clauses into how they will be processed. An 'IN' for example, is actually an "OR-list" so it's a series of OR clauses, each clause representing a statement that compares the table column to a specific value in the list. 

If a limit is exceeded, an error message will be produced indicating that has occurred. In some cases, the query will not receive an error; it will just encounter differences in performance time once it exceeds a certain number of parameter values. 

This change in performance is almost certainly due to a different path being taken through the database on the different queries. That can be verified using the EXPLAIN command, which is documented in the CA IDMS SQL Reference. 

There are a couple of options available if you need to process more comparisons than you are able to efficiently with an IN predicate. 

1- Probably the easiest & most efficient to execute is to create a table to hold the values you want to compare. If the values are mostly static, you can store them in the table and update them as necessary. If they're not know until run-time, you can store them in the table at that point. Then issue a join between the column currently in the IN clause, and the column of the table where the values are stored. 

2- Split the current statement into multiple statements, each one using a subset of the desired list of IN values. Store the results of each query in a temporary table. When all the queries are processed, retrieve the results from the temporary table. 

3- There are some variations on the second option above, such as storing the IN values in an array and processing various number of rows from the array in each command, using a variable to represent the subscript for the place in the array. 

4- Examine the results of the EXPLAIN command and determine if the performance could be improved by adding an additional access method (such as a new index) or by tuning an existing access method.

 

Additional Information:

The limits for various aspects of SQL, including syntax limits, are documented in appendix B of the CA IDMS SQL Reference manual. All manuals can be downloaded from the CA Support Online site.

 

More details about using the EXPLAIN command to determine the access path that will be taken to satisfy a specific query can be found in Knowledge Documents TEC477873 & FAQ260934