Rabin Karp Substring Matching

Hello World
2 min readOct 1, 2020

--

Just some background before getting into Rabin Karp

Substring Matching Problem:

Given string s of length N, pattern p of length M, find the index of p in s if possible, otherwise return -1

Naive Approach

For each position i in s, check if s[i:i+M] == p

Time complexity:

Best Case: O(N+M)

Worst Case: O(NM)

Rabin Karp

Time complexity:

Best Case: O(N+M)

Worst Case: O(NM)

Average Case: O(N+M), therefore RabinKarp wins

Intuition:

Just like the naive approach, for each position i in s, check if s[i:i+M] == p.

But instead of doing the check with letter-by-letter comparison, we do the check by comparing hash(s[i:i+M]) and hash(p), if they match, then we do a letter-by-letter comparison to avoid false positive due to hash collision.

If letter-by-letter comparison works, then we find the pattern, otherwise do the same trick at next index i+1, until end of s.

The key why RabinKarp is faster: if we know hash(s[i:i+M]), then it takes no time to calculate hash(s[i+1:i+1+M]), such hash function is called rolling hash.

Figure from book Algorithms by Robert Sedgewick

So the above formula describes how we could calculate hash of M characters starting at position i. Here, R is the radix(e.g it’s 256 in our case). Q is any big prime number.

If we calculate hash using this formula, it would be slow since there are some exponential calculations. There is something called the Honor’s method that that does this in O(M):

Figure from book Algorithms by Robert Sedgewick

How do we calculate hash(s[i+1:i+1+M]) given hash(s[i:i+M])?

Note that in the above picture, R^{M-1} is a constant that can be pre-computed in advance.

Here’s a RabinKarp implementation for copy-paster :)

--

--

Hello World
Hello World

Written by Hello World

Software engineer, interested in learning how things work. What things? algorithm, system design, tech, business, politics, human.

No responses yet