[ACE] String matching 1
Definition of string matching problem
informal: given a text(sequence of characters) T of length n, and a pattern P(sequence of characters) of length m, determine if P occurs in T
Naive string matching algorithm
NaiveStringMatch(T, P)
for(s = 0; s < T.length - P.length; s++)
j = 1;
while(T[s+j] == P[j] and j <= P.length)
j++;
if(j == P.length+1)
return s;
return -1;
- note that the array started at index 1 instead of 0
- note that s is a shift(not an array index) and so counts from 0 not 1
- note also that if T.length < P.length, the for loop is skipped, and we return -1 immediately
Naive algorithm
- NaiveStringMatch moves the pattern along the text string, until it finds a shift(offset) for which the characters in the pattern are equal to the characters in the next
- The for loop considers each possible shift explicitly, start with a 0 shift
- The while loop and conditional determines whether the current shift is valid
Complexity
worst case
NaiveStringMatch requires time O( (n-m+1) m) and this bound is tight in the worst case
- n here is the length of the whole text(P), m here is the length of pattern
- each shift requires at most m comparison to validate the match, and in the worst case we may have to check n-m+1 shifts to find a match or determine that there isn’t one
- note that this is O(n^2) if m = n/2(which may be uncommon in practice)
best case
the best case is when m>n, i.e. the pattern is longer than the text and is O(1)
if we assume that the algorithm is only called when there could be a shift, i.e., when m ≤ n, the best case running time depends on the variant of the algorithm
- to return(only) the first match requires time O(m) in the best case
- to return (all) matches requires time O(n) in the best case
Correctness
i.e., if we return s > -1, then there really is a match at s (each element of P matches the corresponding shifted element of T) and there is no shift s’ < s which is a match
so how can we prove the correctness?
We need to find two loop invariants(one for each loop) which allow us to prove the postcondition given the precondition
As we prove before, we should first prove inner loop then outer loop
Inner while
We should find the loop invariant of inner while loop:
- At the start of each iteration of the inner while loop, the characters in the subarray P[1…j] match the corresponding characters in the subarray T[s…s+j]
- the inner while loop invariant can be expressed formally as:
establishing invariants of while loop
invariant must be true:
- at the start of every iteration of the loop
- and when the loop terminates
Establishing initialization:
- to establish initialization, we must show that the loop invariant holds before the first iteration of the loop, when j = 1;
- when j = 1, the subarray P[j] containing a single character matches the subarray T[s + j] by the loop condition
- so the loop invariant trivially holds prior to the first iteration of the loop
Establishing maintenance:
- consider an iteration for some j
- by the loop invariant, the characters in the subarray P[1…j-1] match the corresponding characters in the subarray T[s…s+j-1]
- by the loop test, the subarray P[j] containing a single character matches the subarray T[s + j]
- so at the top of the loop, the characters in the subarray P[1…j] match the corresponding characters in the subarray T[s…s+j]
outer for loop invariant
at the start of each iteration of the outer for loop, for all shifts 0<= s’ <= s, there is no match
the outer for loop invariant can be expressed formally as:
establishing invariants of for loops
- in the case of for loops, when establishing initialization we check the loop invariant
- after the initial assignment to the loop counter, and
- before the first test of the loop condition
- when establishing maintenance we check the loop invariant
- after the increment of the loop counter, and
- before the next test of the loop condition
Establishing initialization:
- to establishing initialization, we must show that the loop invariant holds before the first iteration of the loop, when s = 0;
- when s = 0, there are no shifts s’ such that 0 <= s’ <= s, so the loop invariant vacuously holds.
Establishing maintenance
- we need to show that the code in the body of the outer for loop maintains the invariant of the outer loop as well as the invariant for the inner for loop
- consider an iteration for some s
- by the outer loop invariant, for all shifts 0 <= s’ < s there is no match
- by the inner loop invariant, the characters in the subarray P[1…j] match the corresponding characters in the subarray T[s…s+j] and this invariant holds when the inner while loop terminates
- when the inner loop terminates, we evaluate the conditional
- if the conditional evaluates to true, the algorithm returns, so we need to consider termination as well
- when the inner loop terminates, j ≤ P.length + 1
- in the case in which j < P.length + 1 (B is false), P does not match T at shift s (since not all characters in P were matched) so the outer for loop invariant is maintained
- in the case in which j = P.length + 1 (B is true), P does match T at shift s, and the algorithm terminates (returning s), so again the the outer for loop invariant is maintained
Termination
the inner while loop terminates after at most P.length iterations
it may terminate after fewer iterations if P[j] != T[s + j], but
termination is guaranteedthe outer for loop terminates after at most T.length - P.length iterations
it may terminate after fewer iterations due to the return statement, but gain termination is guaranteed
therefore the algorithm terminates, and the outer loop invariant ensures that the postcondition holds when it does
评论
发表评论