/ Published in: Python
This is a generalization of the "string contains substring" problem to (more) arbitrary types.
Given an sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:
Example usage (Sequence in Sequence):
>>> seq_in_seq([5,6], [4,'a',3,5,6])
3
>>> seq_in_seq([5,7], [4,'a',3,5,6])
-1 # or None, or whatever
I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/
It finds all the correct subsequences in a given sequence, and should be used as an iterator:
For larger ones, look at the Aho-Corasick algorithm.
Given an sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:
Example usage (Sequence in Sequence):
>>> seq_in_seq([5,6], [4,'a',3,5,6])
3
>>> seq_in_seq([5,7], [4,'a',3,5,6])
-1 # or None, or whatever
I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/
It finds all the correct subsequences in a given sequence, and should be used as an iterator:
For larger ones, look at the Aho-Corasick algorithm.
Expand |
Embed | Plain Text
Copy this code and paste it in your HTML
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s 3 >>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s (nothing) Here is another KMP implementation: def seq_in_seq(seq1,seq2): ''' Return the index where seq1 appears in seq2, or -1 if seq1 is not in seq2, using the Knuth-Morris-Pratt algorithm based heavily on code by Neale Pickett <[email protected]> found at: woozle.org/~neale/src/python/kmp.py >>> seq_in_seq(range(3),range(5)) 0 >>> seq_in_seq(range(3)[-1:],range(5)) 2 >>>seq_in_seq(range(6),range(5)) -1 ''' def compute_prefix_function(p): m = len(p) pi = [0] * m k = 0 for q in xrange(1, m): while k > 0 and p[k] != p[q]: k = pi[k - 1] if p[k] == p[q]: k = k + 1 pi[q] = k return pi t,p = list(tee(seq2)[0]), list(tee(seq1)[0]) m,n = len(p),len(t) pi = compute_prefix_function(p) q = 0 for i in range(n): while q > 0 and p[q] != t[i]: q = pi[q - 1] if p[q] == t[i]: q = q + 1 if q == m: return i - m + 1 return -1
URL: http://stackoverflow.com/questions/425604?sort=oldest#sort-top