Rabin Karp Substring Matching

Hello World

2 min readOct 1, 2020

Just some background before getting into Rabin Karp

Substring Matching Problem:

Given string s of length N, pattern p of length M, find the index of p in s if possible, otherwise return -1

Naive Approach

For each position i in s, check if s[i:i+M] == p

Time complexity:

Best Case: O(N+M)

Worst Case: O(NM)

Rabin Karp

Time complexity:

Best Case: O(N+M)

Worst Case: O(NM)

Average Case: O(N+M)， therefore RabinKarp wins

Intuition:

Just like the naive approach, for each position i in s, check if s[i:i+M] == p.

But instead of doing the check with letter-by-letter comparison, we do the check by comparing hash(s[i:i+M]) and hash(p), if they match, then we do a letter-by-letter comparison to avoid false positive due to hash collision.

If letter-by-letter comparison works, then we find the pattern, otherwise do the same trick at next index i+1, until end of s.

The key why RabinKarp is faster: if we know hash(s[i:i+M]), then it takes no time to calculate hash(s[i+1:i+1+M]), such hash function is called rolling hash.

Figure from book Algorithms by Robert Sedgewick

So the above formula describes how we could calculate hash of M characters starting at position i. Here, R is the radix(e.g it’s 256 in our case). Q is any big prime number.

If we calculate hash using this formula, it would be slow since there are some exponential calculations. There is something called the Honor’s method that that does this in O(M):

How do we calculate hash(s[i+1:i+1+M]) given hash(s[i:i+M])?

Note that in the above picture, R^{M-1} is a constant that can be pre-computed in advance.

Here’s a RabinKarp implementation for copy-paster :)

Rabin Karp Substring Matching

Naive Approach

Rabin Karp

Intuition:

Written by Hello World

No responses yet