Question: What do a philologist and a lollipop have in common?
Answer: LOL (if you don’t get it, you will LOL when you see it below)
The generalized problem statement
Given a few strings:
ewf3hardyharharoiew p90weuhardyharhar hardyharharoie78wjf ahardyharhar787834
Determine the longest substring that they all have in common:
p90weuhardyharhar
|||||||||||
ahardyharhar787834
|||||||||||
ewf3hardyharharoiew
|||||||||||
d1hardyharharoie78wjf
Which in this example above, is ‘hardyharhar’. The question is, how do you code this?
The solution
Today on Hug-An-Algorithm day, we’re giving the hat-tip to an algorithm called longest common substring (LCS).
The solution can be implemented using a generalized suffix tree, or by dynamic programming
(Wikipedia).
Specifically, I’ll walk you through a Python implementation—but focusing on the approach behind it, so that you can implement this in any language you want. I found a few LCS implementations on Wikibooks.org, which is where I got this simple Python implementation from (and adapted it slightly for this demo).
Technical nitty-gritty details warning! (If you’re interested in how this may be useful to you, skip to real-world applications section after it):
Select full screen + high quality plz:
Code from the video above for plugging into your python interpreter:
def LCS(S, T):
m = len(S); n = len(T)
L = [[0] * (n+1) for i in xrange(m+1)]
LCS_set = set()
longest = 0
for i in xrange(m):
for j in xrange(n):
if S[i] == T[j]:
v = L[i][j] + 1
L[i+1][j+1] = v
if v > longest:
longest = v
LCS_set = set()
if v == longest:
LCS_set.add(S[i-v+1:i+1])
print LCS_set
Real world application
As researchers, we often find ourselves swimming in a sea of seemingly random data, and we’re always looking for ways to make sense of this wealth of data, how we can obtain insights and the act accordingly.
Here’s how I use the LCS algorithm to find patterns in malicious links. Obviously, it can be applied to any kind of data outside the boundaries of security research. Hope you’ll find this tool handy in your problem-solving toolkit!
Security Researcher: Jay Liew
p.s. This blog post is dedicated to the folks in #python on Freenode.
This is a cross-post from my company’s blog.














