Doug,
First, I would avoid using the OVER(PARTION in this query. The only
time I've had a need for this is when I'm needing row numbering,
counting or ranking. I do not believe it is meant to be used in a
subquery as your example.
What I would recommend is adjusting your subquery. For performance
reasons, do not select * for the subquery. By doing so, the systems is
having to read/return data that you do not need/use. All you care about
is if C is in the subquery table.
Select t1.a, t1.b, t1.c, t1.d from tablea t1 where
t1.a = 100 and
t1.b = 2 and
t1.c in (select t2.c from tableb t2 where t2.ZZ = t1.c order by t2.ZZ DESC FETCH FIRST 1 ROWS ONLY);
By using the above, you limit the return values to just C column; and with the FETCH FIRST 1 ROWS ONLY, you limit your dataset. Because there are multiple criteria in the primary select statement, the subquery appears to only care if a record exists; but not necessarily how many times it exists. If you do care about the number of times it exists, just remove the FETCH FIRST 1 ROWS ONLY.
If you limit your returned results in your subquery, I would expect you will see a huge performance improvement.
If you are using the System i Navigator or Navigator for i, you may want to consider running the SQL Performance Monitor; specifically with Visual Explain. This will help troubleshoot your query and better understand its performance. More details on the SQL Performance Monitor (starting on slide 62):
http://www.quser.org/sites/default/files/2013-10_V7R1%20DBA%20TOOLS%20with%20IBM%20i%20Navigator.pdf I typically invoke it when I ran an SQL Script in the run query window and "explain" it. There are other ways to invoke it, too.
While I do not have a table with 166 million rows, I do have several that are almost 20 million. Even on a 20-30 million row dataset, it's amazing how much difference a "select *" vs "select x" in a subquery will make.
Also, with that many rows, I really hope you have at least one index. If not, you really should consider adding an index, not only on the master column, but possible on column c (in the subquery table).
Hope that helps.
-JA-
Jason Aleski / IT Specialist
On 7/5/2016 6:38 PM, Englander, Douglas wrote:
Does anyone know if it is possible to perform a SQL OVER(PARTITION BY...) as a subselect? I want to only perform the OVER(PARTITION BY for the row that is being selected in a SELECT statement. I would rather not execute a OVER(PARTITION BY over the whole second table, since it causes 166,000,000+ rows to be partitioned that don't need to be.
I am selecting rows from a main table, and using a column in that table to narrow down the PARTITION BY to only select rows in the second table that match what is selected from the main table. I need the partition by as a further criteria on the main table select so I only include rows from the main where a specific selection exists in the second table, and that condition needs to be set up as a partition by.
For example:
Select a, b, c, d from tablea where
A = 100 and
B = 2 and
C in (select * from (OVER(PARTION BY SYNTAX built only over rows in tableb that have a value in a column ZZ that equals the value in C)))
We are on 7.1
Thank you,
Doug
As an Amazon Associate we earn from qualifying purchases.