Re: SQL question -- MIDRANGE-L

On 10/18/13 1:59 PM, William Howie wrote:

I'd like to bounce a question off the SQL gurus on the list. I have
a file that has 3 records like this:

FILE_A:
Store 1

PIN 1234567

Register 1

PIN 1234567

Receipt 1

PIN 1234567

So, 3 records, one each for store, register, and receipt, all with
the same PIN value.

So... Using actual SQL DDL and DML versus the above [IMO unclear] attempt used to describe the data, we apparently have the following?:

create table FILE_A
( store dec
, register dec
, receipt dec
, pin int
)
;
insert into FILE_A values
( 1, null, null, 1234567 )
,( null, 1, null, 1234567 )
,( null, null, 1, 1234567 )
;

<<SNIP>> an SQL statement to read the "FILE_A" file shown in the
table, trying to retrieve the PIN number (the 1234567) that is common
to all 3 records. Here is that SQL:

select PIN from FILE A where
STORE = 1
or REGISTER = 1
or RECEIPT = 1
group by PIN having count(*) = 3;

Does the count(*) have to equal three? Perhaps anything other than three is an indication of a problem with the data; i.e. perhaps there should always be exactly three for any one value, which in this case, is one? Yet perhaps an assumption can be made that the data is not in error, thus avoiding the requirement to count? In association with those inquiries... What are the constraints for the data? Are there assumptions that can be made safely about the value of the other two columns when each of those equal predicates is true? To be clear, in a record with STORE=1, must both REGISTER and RECEIPT be the NULL value [or a known default? Effectively, is there or should there be a UNIQUE INDEX across (STORE, REGISTER, RECEIPT, PIN)? Such an unique access path, which although preventing more than three, does not ensure any case of fewer than three...

This works. The problem is that it's a DOG from a performance
standpoint, due to the "verticalness" of the data setup. Can anyone
out there recommend a better SQL script, or maybe even just a
straight set of code without SQL, that would return the same thing
and be a lot faster? I'd appreciate any suggestions. Thanks!

Yet that unique access path would allow /index-only/ access to very quickly and efficiently implement that grouping query... given just a simple revision, to explicitly enforce\match that unique constraint in the predicates.

Without the unique Access Path, either of the following may be able to perform quicker than the GROUP BY with HAVING clause, but effecting the same as if the HAVING predicate had changed to COUNT(*)>=3 which may not be appropriate.? However, only given some other access paths exist; e.g. there should be one for PIN, best also on each of STORE, REGISTER, and RECEIPT because they are all in equals predicates, but each or all of the latter three probably best also include PIN as a second key field:

select distinct t1.PIN /* distinct hides count(*)>3 effect */
from FILE_A t1
where t1.store = 1 /* or instead: t1.store = :hv */
and exists
( select '1' from FILE_A t2
where t2.register = t1.store
and t2.pin = t1.pin )
and exists
( select '1' from FILE_A t3
where t3.receipt = t1.store
and t3.pin = t1.pin )

select distinct t1.PIN /* distinct hides count(*)>3 effect */
from FILE_A t1
inner join FILE_A t2
on t1.store = t2.register
and t1.pin = t2.pin
inner join FILE_A t3
on t1.store = t3.receipt
and t1.pin = t3.pin
where t1.store = 1 /* or instead: t1.store = :hv */