[ACE] String matching 1

Definition of string matching problem

informal: given a text(sequence of characters) T of length n, and a pattern P(sequence of characters) of length m, determine if P occurs in T

Naive string matching algorithm

NaiveStringMatch(T, P)
    for(s = 0; s < T.length - P.length; s++)
        j = 1;
        while(T[s+j] == P[j] and j <= P.length)
            j++;
        if(j == P.length+1)
            return s;
    return -1;
  • note that the array started at index 1 instead of 0
  • note that s is a shift(not an array index) and so counts from 0 not 1
  • note also that if T.length < P.length, the for loop is skipped, and we return -1 immediately

Naive algorithm

  • NaiveStringMatch moves the pattern along the text string, until it finds a shift(offset) for which the characters in the pattern are equal to the characters in the next
  • The for loop considers each possible shift explicitly, start with a 0 shift
  • The while loop and conditional determines whether the current shift is valid

enter image description here

Complexity

worst case

NaiveStringMatch requires time O( (n-m+1) m) and this bound is tight in the worst case

  • n here is the length of the whole text(P), m here is the length of pattern
  • each shift requires at most m comparison to validate the match, and in the worst case we may have to check n-m+1 shifts to find a match or determine that there isn’t one
  • note that this is O(n^2) if m = n/2(which may be uncommon in practice)
    enter image description here

best case

the best case is when m>n, i.e. the pattern is longer than the text and is O(1)
if we assume that the algorithm is only called when there could be a shift, i.e., when m ≤ n, the best case running time depends on the variant of the algorithm

  • to return(only) the first match requires time O(m) in the best case
  • to return (all) matches requires time O(n) in the best case

Correctness

enter image description here

i.e., if we return s > -1, then there really is a match at s (each element of P matches the corresponding shifted element of T) and there is no shift s’ < s which is a match

so how can we prove the correctness?
We need to find two loop invariants(one for each loop) which allow us to prove the postcondition given the precondition
As we prove before, we should first prove inner loop then outer loop

Inner while

We should find the loop invariant of inner while loop:

  • At the start of each iteration of the inner while loop, the characters in the subarray P[1…j] match the corresponding characters in the subarray T[s…s+j]
  • the inner while loop invariant can be expressed formally as:
  • enter image description here

establishing invariants of while loop

invariant must be true:

  • at the start of every iteration of the loop
  • and when the loop terminates

Establishing initialization:

  • to establish initialization, we must show that the loop invariant holds before the first iteration of the loop, when j = 1;
  • when j = 1, the subarray P[j] containing a single character matches the subarray T[s + j] by the loop condition
  • so the loop invariant trivially holds prior to the first iteration of the loop

Establishing maintenance:

  • consider an iteration for some j
  • by the loop invariant, the characters in the subarray P[1…j-1] match the corresponding characters in the subarray T[s…s+j-1]
  • by the loop test, the subarray P[j] containing a single character matches the subarray T[s + j]
  • so at the top of the loop, the characters in the subarray P[1…j] match the corresponding characters in the subarray T[s…s+j]

outer for loop invariant

at the start of each iteration of the outer for loop, for all shifts 0<= s’ <= s, there is no match
the outer for loop invariant can be expressed formally as:
enter image description here

establishing invariants of for loops

  • in the case of for loops, when establishing initialization we check the loop invariant
  • after the initial assignment to the loop counter, and
  • before the first test of the loop condition
  • when establishing maintenance we check the loop invariant
  • after the increment of the loop counter, and
  • before the next test of the loop condition

Establishing initialization:

  • to establishing initialization, we must show that the loop invariant holds before the first iteration of the loop, when s = 0;
  • when s = 0, there are no shifts s’ such that 0 <= s’ <= s, so the loop invariant vacuously holds.

Establishing maintenance

  • we need to show that the code in the body of the outer for loop maintains the invariant of the outer loop as well as the invariant for the inner for loop
  • consider an iteration for some s
  • by the outer loop invariant, for all shifts 0 <= s’ < s there is no match
  • by the inner loop invariant, the characters in the subarray P[1…j] match the corresponding characters in the subarray T[s…s+j] and this invariant holds when the inner while loop terminates
  • when the inner loop terminates, we evaluate the conditional
  • if the conditional evaluates to true, the algorithm returns, so we need to consider termination as well
  • when the inner loop terminates, j ≤ P.length + 1
  • in the case in which j < P.length + 1 (B is false), P does not match T at shift s (since not all characters in P were matched) so the outer for loop invariant is maintained
  • in the case in which j = P.length + 1 (B is true), P does match T at shift s, and the algorithm terminates (returning s), so again the the outer for loop invariant is maintained

Termination

the inner while loop terminates after at most P.length iterations

  • it may terminate after fewer iterations if P[j] != T[s + j], but
    termination is guaranteed

    the outer for loop terminates after at most T.length - P.length iterations

  • it may terminate after fewer iterations due to the return statement, but gain termination is guaranteed

therefore the algorithm terminates, and the outer loop invariant ensures that the postcondition holds when it does

评论

此博客中的热门博文

[MLE] Linear Classification

[AIM] MetaHeuristics

[CS231] Neural Networks