Yan,

I wouldn't want to try to break this up myself...

Do you have 5722-SS1 option 26 - DB2® Symmetric Multiprocessing
installed so that the system can automatically multithread the query?

Do you have 5722-SS1 option 26 - DB2 Multisystem installed so that you
can partition the table?

I would consider the above options "must have" in a DW environment on the i.

Have you run the query though Visual Explain to see what it says?

Lastly, if it's that important the the query return quickly, your only
option may be a materialized query table or some other pre-calculated
summary table that you can use to get the answer more quickly.

HTH,
Charles Wilt


On Wed, Mar 31, 2010 at 9:46 PM, Yan Zhang <yzhtech@xxxxxxxxx> wrote:
I have a question about how to improve the performance of a SQL query by
using multiple threads/connections.

Here is my query:

select A1.aID,D1.dID ,
decimal(count(distinct case when(tdate between date('2009-10-01') and
date('2010-01-31') ) THEN
f1.dist_acct else null end)) count1,
decimal(count(distinct case when(tdate between date('2008-10-01') and
date('2009-01-31') ) THEN
f1.dist_acct else null end)) as count2
from fctsales f1,dimtime t1,dimaccts a1,dimdist D1,dimitem I1
where t1.date = f1.date and a1.dist_acct = f1.dist_acct and D1.DIST =
f1.DIST and I1.ITEM = f1.ITEM
and ((t1.tdate between date('2009-10-01') and date('2010-01-31')) or
(t1.tdate between date('2008-10-01') and date('2009-01-31')))
group by A1.aID,D1.dID;


Table information:
fctsales (100 million records) -- 4 fields: dist, dist_acct, item, date, and
the primary key is <dist_acct, item, date>;
dimaccts (1 million records) -- 10 fields: dist_acct, aID, .... primary key
is <dist_acct>;
dimdist (500 records) -- 10 fields: dist, dID,.... primary key is <dist>;
dimitem (200 records) -- 10 fields: item, .... primary key is <item>;
dimtime (1500 records, around 2 years dates) -- 10 fields: date, ... primary
key is <date>.

Requirement: get the results from the above query within 10-15 seconds for
various date ranges.

------------------------------------------------
Because the fctsales table is huge, we have to use other resources to trade
with the speed. Here is my thoughs: Whether we can split the fctsales table
to some subsets (say, 5 subsets), run these 5 subset parallel, and combine
the results. For example, for above query (assuming there are 5 possible dID
in Dimdist table), we can split it into:

select A1.aID,D1.dID,
decimal(count(distinct case when(tdate between date('2009-10-01') and
date('2010-01-31') ) THEN f1.dist_acct else null end)) count1,
decimal(count(distinct case when(tdate between date('2008-10-01') and
date('2009-01-31') ) THEN f1.dist_acct else null end)) as count2
from fctsales f1,dimtime t1,dimaccts a1,dimdist D1,dimitem I1
where t1.date = f1.date and a1.dist_acct = f1.dist_acct and D1.DIST =
f1.DIST and I1.ITEM = f1.ITEM
and D1.dID in ('0001')
and ((t1.tdate between date('2009-10-01') and date('2010-01-31')) or
(t1.tdate between date('2008-10-01') and date('2009-01-31')))
group by A1.aID,D1.dID;

select A1.aID,D1.dID,
decimal(count(distinct case when(tdate between date('2009-10-01') and
date('2010-01-31') ) THEN f1.dist_acct else null end)) count1,
decimal(count(distinct case when(tdate between date('2008-10-01') and
date('2009-01-31') ) THEN f1.dist_acct else null end)) as count2
from fctsales f1,dimtime t1,dimaccts a1,dimdist D1,dimitem I1
where t1.date = f1.date and a1.dist_acct = f1.dist_acct and D1.DIST =
f1.DIST and I1.ITEM = f1.ITEM
and D1.dID in ('0002')
and ((t1.tdate between date('2009-10-01') and date('2010-01-31')) or
(t1.tdate between date('2008-10-01') and date('2009-01-31')))
group by A1.aID,D1.dID;

.
.
.

select A1.aID,D1.dID,
decimal(count(distinct case when(tdate between date('2009-10-01') and
date('2010-01-31') ) THEN f1.dist_acct else null end)) count1,
decimal(count(distinct case when(tdate between date('2008-10-01') and
date('2009-01-31') ) THEN f1.dist_acct else null end)) as count2
from fctsales f1,dimtime t1,dimaccts a1,dimdist D1,dimitem I1
where t1.date = f1.date and a1.dist_acct = f1.dist_acct and D1.DIST =
f1.DIST and I1.ITEM = f1.ITEM
and D1.dID in ('0005')
and ((t1.tdate between date('2009-10-01') and date('2010-01-31')) or
(t1.tdate between date('2008-10-01') and date('2009-01-31')))
group by A1.aID,D1.dID;


We can perform "union" to union the results. Is this idea reasonable?

If I want to use Java stored procedure to implement it, I want to use
muliple threads (5 threads in this example) to get the database connection,
so that these 5 queries can run at the same time. Then I will get 5
resultSet. But how to union these resultSet in my Java stored procedure
becomes another problem. It is even more complex if the combined resultSet
has to be joined with resultSet generated from other queries.

Any comments are welcome!


thanks,
yan
--
This is the Java Programming on and around the iSeries / AS400 (JAVA400-L) mailing list
To post a message email: JAVA400-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/java400-l
or email: JAVA400-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/java400-l.



As an Amazon Associate we earn from qualifying purchases.

This thread ...

Follow-Ups:

Follow On AppleNews
Return to Archive home page | Return to MIDRANGE.COM home page

This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.