On 10/18/13 1:59 PM, William Howie wrote:
I'd like to bounce a question off the SQL gurus on the list. I have
a file that has 3 records like this:
FILE_A:
Store 1
PIN 1234567
Register 1
PIN 1234567
Receipt 1
PIN 1234567
So, 3 records, one each for store, register, and receipt, all with
the same PIN value.
So... Using actual SQL DDL and DML versus the above [IMO unclear]
attempt used to describe the data, we apparently have the following?:
create table FILE_A
( store dec
, register dec
, receipt dec
, pin int
)
;
insert into FILE_A values
( 1, null, null, 1234567 )
,( null, 1, null, 1234567 )
,( null, null, 1, 1234567 )
;
<<SNIP>> an SQL statement to read the "FILE_A" file shown in the
table, trying to retrieve the PIN number (the 1234567) that is common
to all 3 records. Here is that SQL:
select PIN from FILE A where
STORE = 1
or REGISTER = 1
or RECEIPT = 1
group by PIN having count(*) = 3;
Does the count(*) have to equal three? Perhaps anything other than
three is an indication of a problem with the data; i.e. perhaps there
should always be exactly three for any one value, which in this case, is
one? Yet perhaps an assumption can be made that the data is not in
error, thus avoiding the requirement to count? In association with
those inquiries... What are the constraints for the data? Are there
assumptions that can be made safely about the value of the other two
columns when each of those equal predicates is true? To be clear, in a
record with STORE=1, must both REGISTER and RECEIPT be the NULL value
[or a known default? Effectively, is there or should there be a UNIQUE
INDEX across (STORE, REGISTER, RECEIPT, PIN)? Such an unique access
path, which although preventing more than three, does not ensure any
case of fewer than three...
This works. The problem is that it's a DOG from a performance
standpoint, due to the "verticalness" of the data setup. Can anyone
out there recommend a better SQL script, or maybe even just a
straight set of code without SQL, that would return the same thing
and be a lot faster? I'd appreciate any suggestions. Thanks!
Yet that unique access path would allow /index-only/ access to very
quickly and efficiently implement that grouping query... given just a
simple revision, to explicitly enforce\match that unique constraint in
the predicates.
Without the unique Access Path, either of the following may be able
to perform quicker than the GROUP BY with HAVING clause, but effecting
the same as if the HAVING predicate had changed to COUNT(*)>=3 which may
not be appropriate.? However, only given some other access paths exist;
e.g. there should be one for PIN, best also on each of STORE, REGISTER,
and RECEIPT because they are all in equals predicates, but each or all
of the latter three probably best also include PIN as a second key field:
select distinct t1.PIN /* distinct hides count(*)>3 effect */
from FILE_A t1
where t1.store = 1 /* or instead: t1.store = :hv */
and exists
( select '1' from FILE_A t2
where t2.register = t1.store
and t2.pin = t1.pin )
and exists
( select '1' from FILE_A t3
where t3.receipt = t1.store
and t3.pin = t1.pin )
select distinct t1.PIN /* distinct hides count(*)>3 effect */
from FILE_A t1
inner join FILE_A t2
on t1.store = t2.register
and t1.pin = t2.pin
inner join FILE_A t3
on t1.store = t3.receipt
and t1.pin = t3.pin
where t1.store = 1 /* or instead: t1.store = :hv */
As an Amazon Associate we earn from qualifying purchases.