San Diego’s 1st SuperHappyDevHouse (SDSHDH1)

July 4th, 2010

Update 7/11/2010: Coverage of the event from the Del Mar Times by Steve Perez!

San Diego’s 1st ever SuperHappyDevHouse was a blast and a success! Special thanks to Erica and Richard for hosting the hackathon :) We had about 17-18 software + hardware folks (and one reporter! we have no idea how that happened). The venue was perfect, people brought snacks, drinks, and lawn chairs, and we ordered pizza. Not surprisingly, most of the attendees were from the San Diego Hacker News meetup.

While the attendance was really good for SDSHDH1, I suspect that it would have been as much as 30% higher if the semester had been in session, as many of those who voiced interest are college students from nearby UCSD. Below are some pictures and videos from the event. I’m already looking forward to the next one! :) Thanks to all who stopped by. “Network effects” are key to having a fun SHDH ;)

SDSHDH1


A Simple N-gram Calculator: pyngram

May 20th, 2010
Updated v1.0.1 5/21/2010 – Improved the exception handling, and changed xrange(len(inputstring)) to xrange(len(inputstring)-nlen+1). Thanks to colleague Arik Baratz!
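To see what that range change does, here is a minimal Python 3 sketch. It assumes the pre-1.0.1 code took one slice per character of the input, as the update note implies. The two function names are made up for this comparison and are not part of pyngram.

```python
def ngrams_before(s, n):
    # Assumed pre-1.0.1 behaviour: one slice per character,
    # so the last few slices are shorter than n.
    return [s[i:i + n] for i in range(len(s))]


def ngrams_after(s, n):
    # 1.0.1 behaviour: stop at the last position where a full n-gram still fits.
    return [s[i:i + n] for i in range(len(s) - n + 1)]


print(ngrams_before("abcd", 3))  # ['abc', 'bcd', 'cd', 'd']
print(ngrams_after("abcd", 3))   # ['abc', 'bcd']
```

With the old range, the slices near the end of the string run out of characters, so the short leftovers ('cd' and 'd' here) would be tallied as if they were trigrams.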

Recently, as I was trying to solve a cryptogram, I wrote a tool to parse the bigrams and trigrams from the ciphertext, tally their frequencies, and then display the results sorted from the most to the least frequently occurring bigram and trigram.

First, a quick history of why I did this and how this was handy.

One of the ways to solve a substitution cipher is to do a frequency analysis. Here’s a typical distribution of letters in the English language. Just as it is obvious that the letter ‘e’ is by far the most common in English, you can also calculate the most frequently occurring bigrams (2 consecutive characters) and trigrams (3 consecutive characters). In English, the most frequently occurring bigrams are ‘th’ (1.52%), ‘he’ (1.28%), and ‘in’ (0.94%) (full list from Wikipedia here). For trigrams, the most popular are ‘ th’ (note the leading whitespace) and ‘he ’ (trailing whitespace), followed by ‘the’ (full list here). The biggest assumption here is that the plaintext is in English. If it’s in, say, German, then you’ll have to find the corresponding statistical distribution (Wikipedia has the 1-gram frequency distribution for other languages here).
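As a rough illustration of the first step of a frequency analysis, the sketch below tallies single letters and reports each one as a percentage of all letters seen. It is written for Python 3. The letter_frequencies function and the sample string are made up for this example and are not part of pyngram.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, percent) pairs for a-z, most frequent first."""
    # Ignore case, and skip anything that is not a plain a-z letter.
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return []
    counts = Counter(letters)
    return [(ch, 100.0 * n / total) for ch, n in counts.most_common()]


# Hypothetical ciphertext, just to show the call.
for letter, percent in letter_frequencies("Xlmw mw e wigvix qiwweki")[:3]:
    print("%s: %.1f%%" % (letter, percent))
```

The most frequent ciphertext letters are then candidates for the most frequent plaintext letters (‘e’ first, if the plaintext is English and the sample is long enough).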

Whatever the plaintext’s (human) language is, you’d have to find the top n-grams occurring in the ciphertext first, and that’s what this calculator will do for you. You can import the Python module and call the function calc_ngram, or just run it from your *nix command line.

Example usage from python shell:

>>> from pyngram import calc_ngram
>>> results = calc_ngram('bubble bobble, bubble bobble, bubble bobble', 3) # (inputstring, n-gram size)
>>> for l in results: print l[0] + ' occurred ' + str(l[1]) + ' times'
...
bbl occurred 6 times
ble occurred 6 times
le  occurred 3 times
obb occurred 3 times
 bo occurred 3 times
e b occurred 3 times
ubb occurred 3 times
bub occurred 3 times
bob occurred 3 times
, b occurred 2 times
 bu occurred 2 times
le, occurred 2 times
e,  occurred 2 times

You can install pyngram from the Cheese Shop (PyPI), Python’s package index, with:

sudo pip install pyngram

For some strange reason, Perl’s CPAN had a few such utilities (just search for ngram, bigram, digram), but there weren’t any for Python that I could find. Although CPAN’s offerings on average looked more feature-rich, pyngram by comparison is more lightweight. It does one thing and one thing only, and it does it well.

Writing the calculator was actually the easiest part. Putting it together in a nice package for the PyPI repository and making sure it works with pip was the most time-consuming part! But it’s worth it, because now that I’ve been through the process once (a whole topic on its own), I can easily do it again. Contributing a small module to open source gives me a small jolt of happiness :)

Here’s the source code. Enjoy!

#!/usr/bin/env python
"""
 A simple Python n-gram calculator.

 Given an arbitrary string, and the value of n as the size of the n-gram (int), this module
 will show you the results, sorted from most to least frequently occurring n-gram.

 The 'sort by value' operation for the dict follows the PEP 265 recommendation.

 Quick start:

 >>> from pyngram import calc_ngram

 method expects inputstring as 1st arg, size of n-gram as 2nd arg

 >>> calc_ngram('bubble bobble, bubble bobble, bubble bobble', 3)

 Or just run it from the command line prompt:
 user@host:~$ ./pyngram.py

 Enjoy!

 Jay Liew
 @jaysern

"""

__version__ = '1.0'
__author__ = 'Jay Liew' # @jaysern from @websenselabs
__license__ = 'MIT'

from operator import itemgetter

def calc_ngram(inputstring, nlen):
    if nlen < 1:
        raise ValueError, "Uh, n-grams have to be of size 1 or greater. Makes no sense to have a 0 length n-gram."

    if len(inputstring) < 1:
        raise ValueError, "umm yeah, ... the inputstring has to be at least 1 char long"

    # now, fish out the n-grams from the input string
    ngram_list = [inputstring[x:x+nlen] for x in xrange(len(inputstring)-nlen+1)]

    ngram_freq = {} # dict for storing results

    for n in ngram_list: # collect the distinct n-grams and count
        if n in ngram_freq:
            ngram_freq[n] += 1
        else:
            ngram_freq[n] = 1 # human counting numbers start at 1

    # set reverse = False to change order of sort (ascending/descending)
    return sorted(ngram_freq.iteritems(), key=itemgetter(1), reverse=True)

if __name__ == '__main__':
    inputstring = raw_input('Enter input string: ')
    nlen_str = raw_input('Enter size of n-gram (int): ')
    nlen = int(nlen_str) # cast string to int

    for t in calc_ngram(inputstring, nlen):
        print t[0] + ' occurred ' + str(t[1]) + ' times'
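
The source above targets Python 2 (xrange, iteritems, raw_input, and the print statement). For readers on Python 3, here is a rough equivalent built on collections.Counter. This is a sketch rather than the published module: the name calc_ngram_py3 is made up, and the error messages are reworded.

```python
from collections import Counter


def calc_ngram_py3(inputstring, nlen):
    """Return (ngram, count) pairs, most frequent first."""
    if nlen < 1:
        raise ValueError("n-grams have to be of size 1 or greater")
    if len(inputstring) < 1:
        raise ValueError("inputstring has to be at least 1 char long")

    # Slide a window of width nlen across the string, stopping at the
    # last position where a full n-gram still fits.
    ngrams = (inputstring[i:i + nlen]
              for i in range(len(inputstring) - nlen + 1))

    # Counter does the tallying; most_common() sorts by count, descending.
    return Counter(ngrams).most_common()


sample = 'bubble bobble, bubble bobble, bubble bobble'
for ngram, count in calc_ngram_py3(sample, 3)[:2]:
    print(ngram + ' occurred ' + str(count) + ' times')
```

Ties may come out in a different order than in the Python 2 version, since that depends on how each one iterates over its dict.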

This is a cross-posting from my company’s blog post here: http://community.websense.com/blogs/securitylabs/archive/2010/05/20/a-simple-n-gram-calculator-pyngram.aspx.

Idea –> Drawing –> Prototype –> Is this what I want to spend my life doing?

May 16th, 2010

One of Jack Dorsey’s key points, paraphrased:

Draw out your ideas, share them immediately, and get instant feedback on what works and what doesn’t. If it’s not working, then shelve it. Some elements of it might pop up later. How do you quickly move from idea –> drawing –> prototype –> to a position where you can say, “this is what I want to spend my life doing”? Or “something I want to put away for now so that I can draw out the next idea.”

Just 16 mins!

Entrepreneurial Thought Leaders: Marc Andreessen

May 16th, 2010

This is an awesome 1-hour video that I watched over and over just to make sure I absorbed all the points from the serial entrepreneur himself, Marc Andreessen. If I had more time, I’d transcribe it.

My dev box reloaded! Lucid Lynx 10.04 LTS

May 12th, 2010

Notice the window’s close, maximize and minimize buttons. OS X inspired, eh?

First things first: apps! apps! apps!

* Google Chrome (+jQuery shell extension)
* TweetDeck

Ok, done with Desktop apps. Moving on to other housekeeping items.

* aptitude install emacs
* gnu screen (came out of the box)
* aptitude install irssi
* aptitude install synergy
* aptitude install openssh-server
* aptitude install htop
* aptitude install python-setuptools (this gives you python’s easy_install)
* easy_install pip (if you use aptitude to install ‘pip’, that’s perl’s pip)
* pip install virtualenv virtualenvwrapper
* pip install fabric (for partial continuous deployment)
* aptitude install gitk
* aptitude install python-dev

I’m too tired to do the other stuff, but tomorrow, it will be:

* Apache
* Django
* MySQL, Redis, Mongo
* pip install MySQL-python (and whatever other drivers)
* Set up SSH keys, .bashrc, .screenrc, .irssi,
..

Update 1:
* aptitude install flashplugin-nonfree (Adobe Flash plugin for Mozilla)
* OpenJDK Java 6 runtime (requirement for Eclipse)
* Eclipse IDE
* Firebug

Update 2:
* aptitude install build-essential
* aptitude install ipython

Screenshot