On 4/8/11 2:05 PM, CRPence wrote:
On 4/8/11 12:15 PM, James Lampert wrote:
I have a problem with phone number searching in a database.
The problem is that the phone numbers are not stored consistently.
Most U.S. phone numbers are stored with the punctuation squeezed out.
But there are a few that are stored including the punctuation.
Most non-U.S. phone numbers are stored "as-entered." But again,
there may be exceptions.
And we don't know which of these possibilities we're looking for
until after we find it.
Is there an SQL construct, that works back to V4R4, in which, if
we're given a raw value of, say, (714) 555-1212, we will find both
"(714) 555-1212" and "7145551212" in the file?
<<SNIP>>
If the assumption that the data is always three groups of digits AAA,
PPP, and NNNN holds, and if those digit-groups are stored in the format
AAAPPPNNNN with the only non-digit data inserted between or around each
group of digits stored in the column PHONENBR, then the following search
would locate both of the noted sample [or other similarly formatted]
phone number values. However such a generic selection almost ensures a
full-table-scan implementation:
select ...
WHERE PhoneNbr LIKE '%AAA%PPP%NNNN%'
That might be written instead, given variables A, P, and N that are the
character strings for each of the digit-groups, as:
select ...
WHERE PhoneNbr LIKE '%' concat :A
             concat '%' concat :P
             concat '%' concat :N concat '%'
<<SNIP>>
  FWiW, something of the following might be of interest or useful.
  An expression defining the value for the LIKE predicate could 
probably be generated from a similarly "inconsistent" input value, or 
even better, from a well-formed\consistent input value.  Consider the 
following scripted actions and results [where the sample input is the 
literal '(714):555[1212]-' which is followed by the same example 
specifying the variable :P].  Replacing all digits to a common character 
like with some typical formatting characters, then grouping the results 
with an optional COUNT(*) might be of interest to review for the most 
common formats stored in the data; maybe to choose into what format an 
update trigger might best effect to minimize changes.
<code>
  > drop   table qtemp/p
   Drop of P in QTEMP complete.
  > create table qtemp/p (pn char(18))
   Table P created in QTEMP.
  > insert into qtemp/p values
      ('(714) 555-1212') ,('(714)555-1212') ,('(714) 5551212')
     ,('714 555 1212'  ) ,('714-555-1212' ) ,('7145551212'   )
     ,('[714](555)1212') ,('a714\\p555n1212x34'),('714: 555.1212 x:44')
   9 rows inserted in P in QTEMP.
  > select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '%')
    from qtemp/p where pn like
              replace( translate( '(714):555[1212]-',' ','[]()-.:_<>')
                          , ' ', '%')
  -- or in a program
  > select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '%')
    from qtemp/p where pn like
              replace( translate( :P,' ','[]()-.:_<>'), ' ', '%')
   ....+....1....+....2....+....3....+...
  PN                  REPLACE
   (714) 555-1212      %714%%555%1212%%%%
   (714)555-1212       %714%555%1212%%%%%
   (714) 5551212       %714%%5551212%%%%%
   714 555 1212        714%555%1212%%%%%%
   714-555-1212        714%555%1212%%%%%%
   7145551212          7145551212%%%%%%%%
   [714](555)1212      %714%%555%1212%%%%
   a714\\p555n1212x34  a714\\p555n1212x34
   714: 555.1212 x:44  714%%555%1212%x%44
   ********  End of data  ********
  > select pn,
    from qtemp/p where pn like
              replace( translate( '(714):555[1212]-',' ','[]()-.:_<>')
                          , ' ', '%')
  -- or in a program
  > select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '' )
    from qtemp/p where pn like
              replace( translate( :P,' ','[]()-.:_<>'), ' ', '%')
  ....+....1....+....2....+....3....+...
  PN                  REPLACE
   (714) 555-1212      7145551212
   (714)555-1212       7145551212
   (714) 5551212       7145551212
   714 555 1212        7145551212
   714-555-1212        7145551212
   7145551212          7145551212
   [714](555)1212      7145551212
   a714\\p555n1212x34  a714\\p555n1212x34
   714: 555.1212 x:44  7145551212x44
   ********  End of data  ********
</code>
  Or because the expression defining the value for the predicate should 
be evaluated just once for the query, a UDF [or even several] to effect 
some of the /better/ searches described in [or as can be inferred from] 
prior examples is possible.  For example a function which strips out the 
non-digit characters can be used to recombine each of the digit groups 
with the zero-to-many percent sign search character.
<code>
  -- Instead of:
    select ...
    WHERE PhoneNbr LIKE 'AAA%PPP%NNNN%'
      OR  PhoneNbr LIKE '(AAA%PPP%NNNN%'
  -- Perhaps:
    select ...
    WHERE PhoneNbr LIKE BldLikeAPN(:P) /* Gen: 'AAA%PPP%NNNN%' */
      OR  PhoneNbr LIKE '(' concat BldLikeAPN(:P)
</code>
  Or like the last example in the quoted message... Outside of SQL or 
using SQL, set the variables AAA, PPP, and NNNN, to the digit-groups 
[strings] of the digits AAAPPPNNNN and then perhaps the following:
<code>
     create table qtemp/pn (pn char(18))
     Exec SQL
      insert into qtemp/pn
      select pn
            ,replace( translate( pn,' ', '[]()-.:_<>'), ' ', '' ) px
      from p
      where
           pn = :P /* Exact Match to  the value typed by user */
      --    :P     20A   inz('(714)-555-1212')
       OR  pn like replace( translate( :P,' ', '[]()-.:_<>'), ' ', '')
                       concat '%'
       OR  pn like :aaa concat '%' concat :ppp
                        concat '%' concat :nnnn concat '%'
       OR  pn like '(' concat :aaa concat '%' concat :ppp
                       concat '%' concat :nnnn concat '%'
      ;
</code>
  Or perhaps [aiming for "prettier" SQL with more and more complex 
tests] move all of the common possible formatting options, for both 
EQuivalence and LiKe, into UDFs; being sure to avoid a replacement 
character [percent or underscore] as the first byte in hopes of enabling 
the implementation by an index:
      where
           pn = :EM    /* Exact Match value typed by user */
       OR  pn = PhoneStyleUSAEQ(:EM) /* (123) 456-7890 */
       OR  pn = PhoneStyleUS1EQ(:EM) /* (123)456-7890  */
       OR  pn = PhoneStyleUS2EQ(:EM) /* 123-456-7890   */
       OR  pn like PhStyleDIGLK(:EM) /* 1234567890%    */
       OR  pn like PhStyleUS1LK(:EM) /* (123%456%7890% */
       OR  pn like PhStyleUS2LK(:EM) /* 123%456%7890%  */
      where
           pn IN ( :EM /* Exact Match value typed by user */
                 , PhoneStyleUSAEQ(:EM) /* (123) 456-7890 */
                 , PhoneStyleUS1EQ(:EM) /* (123)456-7890  */
                 , PhoneStyleUS2EQ(:EM) /* 123-456-7890   */
                 )
       OR  pn like PhoneStyleDIGLK(:EM) /* 1234567890%    */
       OR  pn like PhoneStyleUS1LK(:EM) /* (123%456%7890% */
       OR  pn like PhoneStyleUS2LK(:EM) /* 123%456%7890%  */
</code>
As an Amazon Associate we earn from qualifying purchases.