Archive for the ‘coding’ Category

A LAMP guy’s n00b quick start to Amazon Web Services

Sunday, January 16th, 2011

If you haven’t dabbled with AWS before and are impatient to kick off a free instance to play around with, here’s a perspective and some lessons learned from a LAMP + Django guy who had no prior experience. Caution: this is just enough for you to wrap your head around the major concepts and get your first hello-world instance off the ground. Beginning is always the hardest part, so I hope this helps lift you off the ground as I document the things I found out the time-consuming way.

An EC2 instance is like a VM – but without a local hard disk. Instead, the VM uses a highly reliable external drive (EBS). Think of EBS as a really big USB drive, and just like a USB stick, it’s a raw device that needs to be formatted first. The obvious upside of this setup is that you can use a low-powered VM (to save money), and when your site gets popular and you need to scale up, you just turn off that low-powered VM (the EC2 instance), fire up a beefier EC2 instance and attach the same EBS volume to it. And just like that, you’ve got a beefier machine running to handle the load.

Note: The EC2 instance just described above is what is known as an “EBS-backed” instance (the root device is an EBS volume, which is where the OS boots from), as opposed to the other option, “instance store” (a.k.a. S3-backed). The latter option is for advanced use – not the focus of this primer ;) So when you’re firing up your hello-world instance, choose EBS-backed, not instance store. Similarly, ignore the AMI creation process; that’s advanced stuff you can visit later. Use one of the already existing AMIs, like the ones provided by Ubuntu. My example uses Lucid Lynx.

Statistically, EBS is also more reliable than a local physical hard drive. However, you’d probably still want to take snapshots of the volume (how confident are you that your data is safe on a USB stick?). Snapshots are stored in S3, which is even safer because it keeps redundant copies across multiple facilities within a region (in case one data center gets wiped off the planet). Subsequent snapshots are incremental deltas from previous snapshots, so you save space. That said, you’re fine keeping just one daily snapshot unless you have a reason to want more.

Ok, at this point in the post, you should go ahead and follow AWS’s guide to firing up your 1st Ubuntu instance (remember, you can ignore the AMI creation stuff for now, and choose the 8GB EBS-backed instance – especially if you want to use the 1-year free micro instance). I’ll wait.

I’m still waiting.

Ok, at this point you should have it fired up and running, and be able to ssh into it. I ssh into mine with: ssh -i username.pem ubuntu@ec2-111-222-333-444.us-west-1.compute.amazonaws.com

Now on to some basic first time house-keeping items, best practice stuff.

The Delete On Termination flag on your EBS volume

On your new EBS-backed instance, the EBS volume has a flag called the “delete on termination” flag, which defaults to true. You’d probably want to set it to false. If it’s set to true, the EBS root volume is deleted when you terminate the instance. Instances can be started, stopped, and terminated. “Stop” means you can start it back up later. “Terminate” means delete, and you can’t un-delete. The following example uses Amazon’s EC2 API tools. Be sure to download your X509 certificate and private key (these are both files you download from your Amazon account). I’m using OS X 10.5, so change the paths accordingly. Be sure to set your Java home too.

OSX-LEOPARD:bin jliew$ export EC2_HOME=/Users/jliew/Desktop/ec2-api-tools-1.3-62308/
OSX-LEOPARD:bin jliew$ export EC2_PRIVATE_KEY=/Users/jliew/pk-q3ef98zHSDg872hGTQpoX.pem
OSX-LEOPARD:bin jliew$ export EC2_CERT=/Users/jliew/cert-SDkhIzWU3HDqS83sXdsefh.pem
OSX-LEOPARD:bin jliew$ export JAVA_HOME=/Library/Java/Home

Once that’s set up, you’re ready to go. Don’t forget to pass it the region (mine is us-west-1), your instance’s id, and volume id (and volume mount point):

OSX-LEOPARD:bin jliew$ ./ec2-modify-instance-attribute -b /dev/sda1=vol-wuh5da87:false i-ppo983x1 --region=us-west-1
Unexpected error:
java.lang.ClassCastException: com.amazon.aes.webservices.client.InstanceBlockDeviceMappingDescription
	at com.amazon.aes.webservices.client.cmd.Outputter.outputInstanceAttribute(Outputter.java:664)
	at com.amazon.aes.webservices.client.cmd.ModifyInstanceAttribute.invokeOnline(ModifyInstanceAttribute.java:149)
	at com.amazon.aes.webservices.client.cmd.BaseCmd.invoke(BaseCmd.java:795)
	at com.amazon.aes.webservices.client.cmd.ModifyInstanceAttribute.main(ModifyInstanceAttribute.java:269)
OSX-LEOPARD:bin jliew$ ./ec2-modify-instance-attribute --region=us-west-1 -b /dev/sda1=vol-wuh5da87:false i-ppo983x1

Notice how it actually crashed with an error? It turns out other people have hit this problem too, but the command actually succeeded. To verify that it succeeded, run:

OSX-LEOPARD:bin jliew$ ./ec2-describe-instance-attribute --region=us-west-1 --block-device-mapping -v i-ppo983x1

and you should see, somewhere within the output, a line that looks like this: <deleteOnTermination>false</deleteOnTermination> (more…)
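
For anyone who would rather skip the Java tools, the same flag can in principle be flipped and read back from Python. This is only a minimal sketch, assuming the boto3 library (not what this post uses) with AWS credentials already configured. The helper names are made up for illustration, and the device name and region are the ones from the commands above.

```python
def root_volume_mapping(device_name, delete_on_termination):
    """Build the block-device mapping passed to EC2's ModifyInstanceAttribute."""
    return [{
        "DeviceName": device_name,
        "Ebs": {"DeleteOnTermination": delete_on_termination},
    }]


def flag_for_device(mappings, device_name):
    """Return the DeleteOnTermination flag for one device, or None if absent."""
    for mapping in mappings:
        if mapping.get("DeviceName") == device_name:
            return mapping.get("Ebs", {}).get("DeleteOnTermination")
    return None


def keep_volume_on_terminate(instance_id, device_name="/dev/sda1",
                             region="us-west-1"):
    """Set delete-on-termination to false, then read the flag back to verify."""
    import boto3  # assumption: boto3 is installed and credentials are set up

    ec2 = boto3.client("ec2", region_name=region)
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        BlockDeviceMappings=root_volume_mapping(device_name, False),
    )
    attr = ec2.describe_instance_attribute(
        InstanceId=instance_id, Attribute="blockDeviceMapping"
    )
    return flag_for_device(attr["BlockDeviceMappings"], device_name)


if __name__ == "__main__":
    # Offline check of the helpers only; no AWS call is made here.
    sample = root_volume_mapping("/dev/sda1", False)
    print(flag_for_device(sample, "/dev/sda1"))
```

Calling keep_volume_on_terminate with your own instance id should return False once the change has taken effect, which is the same thing the deleteOnTermination line in the verbose output tells you.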

redis unknown command

Thursday, October 21st, 2010

Short story: If you installed redis (or more specifically redis-server) on Ubuntu, and it’s largely working except for a few commands that error out with “unknown command”, you are probably running an old version of redis. It doesn’t help that Ubuntu’s package versioning, which uses epochs, is misleading.

Long story: I’m using redis-py and I’m doing all sorts of operations which work, except for a few sorted set operations like zrank.

>>> r.zrank('a','b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
*snip*
  File "/usr/local/lib/python2.6/dist-packages/redis/client.py", line 349, in _parse_response
    raise ResponseError(response)
redis.exceptions.ResponseError: unknown command 'ZRANK'
>>> r.zrank
<bound method Redis.zrank of <redis.client.Redis object at 0xb7819c0c>>
>>>
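
Since the package version can mislead, a quick way to check is to ask the server itself what it is running. Below is a minimal sketch of that idea: the helper names are hypothetical, the 2.0.0 threshold is taken from the Redis command reference for ZRANK, and the only redis-py call relied on is info(), which returns the server’s INFO fields as a dict.

```python
import re

# Per the Redis command reference, ZRANK is available since 2.0.0.
ZRANK_MIN_VERSION = (2, 0, 0)


def parse_version(version_string):
    """Turn '1.2.0' (or '2.2') into a tuple like (1, 2, 0) for comparison."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError("unrecognised version: %r" % version_string)
    return tuple(int(part or 0) for part in match.groups())


def supports_zrank(version_string):
    """True if a server reporting this version should know ZRANK."""
    return parse_version(version_string) >= ZRANK_MIN_VERSION


def server_supports_zrank(client):
    """client is a redis.Redis instance (or anything with a compatible info())."""
    return supports_zrank(client.info()["redis_version"])


if __name__ == "__main__":
    # Hypothetical version strings, just to exercise the comparison.
    for version in ("1.2.0", "2.0.1"):
        print(version, supports_zrank(version))
```

With a live connection you would call server_supports_zrank(r) on the same r used above; if it comes back False, the server is too old regardless of what the package name suggests.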

(more…)

IE6 effect in HTML5 – How It Works [updated]

Thursday, October 7th, 2010

Quick update: Thanks for the endorsement Mr. doob! I’m honored :)

Here is a web site with a cool JavaScript effect by Mr. doob, recently posted on HN & Reddit. I’m trying to get better at JavaScript, so here’s my dissection of the effect for anyone interested in learning how it’s done.

First, create an HTML5 canvas element (highly recommended short read about canvas here). Make its width and height the same as window.innerWidth and window.innerHeight so that it fills up the content area of the browser window. Append the canvas element to the document body.

var canvas = document.createElement( 'canvas' );
canvas.width = window.innerWidth;
canvas.height = window.innerHeight;
canvas.style.display = 'block';
document.body.appendChild( canvas );

As of now, there’s only a 2D context to pick from. In the future, there might be a 3D context based on OpenGL ES (quote). So just get the context with getContext, and then create the image element.

var context = canvas.getContext( '2d' );
var image = document.createElement( 'img' );

Now let’s add an event handler to the image for when the image loads. ‘this’ refers to the image element itself. bitmapWidthHalf and bitmapHeightHalf are exactly what their names say: half of the image’s width and height respectively. Math.floor rounds the result of the division down to the nearest integer.
(more…)

A Simple N-gram Calculator: pyngram

Thursday, May 20th, 2010
Updated v1.0.1 5/21/2010 – Improved the exception handling, and changed xrange(len(inputstring)) to xrange(len(inputstring)-nlen+1). Thanks to colleague Arik Baratz!

Recently, as I was trying to solve a cryptogram, I wrote a tool to parse the bigrams and trigrams from the ciphertext, tally the frequency, and then display the results sorted from most to least frequently occurring bigram and trigram.

First, a quick history of why I did this and how this was handy.

One of the ways to solve a substitution cipher is to do a frequency analysis. Here’s a typical distribution of letters in the English language. Just as it is obvious that the letter ‘e’ is by far the most popular in the English language, you can also calculate the most frequently occurring bigrams (2 consecutive characters) and trigrams (3 consecutive characters). In English, the most frequently occurring bigrams are ‘th’ (1.52%), ‘he’ (1.28%), ‘in’ (0.94%) (full list from Wikipedia here). For trigrams, the most popular are ‘ th’ (note the leading whitespace), ‘he ‘ (trailing whitespace), followed by ‘the’ (full list here). The biggest assumption here is that the plaintext is in English. If it’s in, say, German, then you’ll have to find the corresponding statistical distribution (Wikipedia has the 1-gram frequency distribution for other languages here).
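
To make the single-letter case concrete, here is a minimal sketch of a letter-frequency tally. It is not part of pyngram; the function name and the sample ciphertext (a Caesar-shifted pangram) are made up for illustration. Comparing the top letters of a real ciphertext against the English distribution is what suggests candidate substitutions.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, percentage) pairs, most common first, ignoring non-letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return []
    counts = Counter(letters)
    return [(letter, 100.0 * n / total) for letter, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up sample: a well-known pangram with every letter shifted by 3.
    ciphertext = "wkh txlfn eurzq ira mxpsv ryhu wkh odcb grj"
    for letter, pct in letter_frequencies(ciphertext)[:3]:
        print("%s %.1f%%" % (letter, pct))
```

A sample this short is far too small for the statistics to be reliable; the technique needs a reasonably long ciphertext before the top letters start to resemble the English distribution.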

Whatever the plaintext’s (human) language is, you’d have to find the top n-grams occurring in the ciphertext first, and that’s what this calculator will do for you. You can import the Python module and call the function calc_ngram, or just run it from your *nix command line.

Example usage from python shell:

>>> from pyngram import calc_ngram
>>> results = calc_ngram('bubble bobble, bubble bobble, bubble bobble', 3) # (inputstring, n-gram size)
>>> for l in results: print l[0] + ' occured ' + str(l[1]) + ' times'
...
bbl occured 6 times
ble occured 6 times
le  occured 3 times
obb occured 3 times
 bo occured 3 times
e b occured 3 times
ubb occured 3 times
bub occured 3 times
bob occured 3 times
, b occured 2 times
 bu occured 2 times
le, occured 2 times
e,  occured 2 times

You can install pyngram from cheeseshop, Python’s package index with

sudo pip install pyngram

For some strange reason, Perl’s CPAN had a few such utilities (just search for ngram, bigram, digram), but there wasn’t any for Python that I could find. Although CPAN’s offerings on average looked more feature-rich, pyngram by comparison is more lightweight. It does one thing and one thing only, and it does it efficiently.

Writing the calculator was actually the easiest part. Putting it together in a nice package for the pypi repository and making sure it works with pip was the most time consuming part! But it’s worth it, because now that I’ve been through the process once (whole topic on its own), I can easily do it again. Contributing a small module to open source gives me a small jolt of happiness :)

Here’s the source code. Enjoy!

#!/usr/bin/env python
"""
 A simple Python n-gram calculator.

 Given an arbitrary string, and the value of n as the size of the n-gram (int), this module
 will show you the results, sorted from most to least frequently occuring n-gram.

 The 'sort by value' operation for the dict follows the PEP 265 recommendation.

 Quick start:

 >>> from pyngram import calc_ngram

 method expects inputstring as 1st arg, size of n-gram as 2nd arg

 >>> calc_ngram('bubble bobble, bubble bobble, bubble bobble', 3)

 Or just run it from the command line prompt:
 user@host:~$ ./pyngram.py

 Enjoy!

 Jay Liew
 @jaysern

"""

__version__ = '1.0'
__author__ = 'Jay Liew' # @jaysern from @websenselabs
__license__ = 'MIT'

from operator import itemgetter

def calc_ngram(inputstring, nlen):
    if nlen < 1:
        raise ValueError, "Uh, n-grams have to be of size 1 or greater. Makes no sense to have a 0 length n-gram."

    if len(inputstring) < 1:
        raise ValueError, "umm yeah, ... the inputstring has to be at least 1 char long"

    # now, fish out the n-grams from the input string
    ngram_list = [inputstring[x:x+nlen] for x in xrange(len(inputstring)-nlen+1)]

    ngram_freq = {} # dict for storing results

    for n in ngram_list: # collect the distinct n-grams and count
        if n in ngram_freq:
            ngram_freq[n] += 1
        else:
            ngram_freq[n] = 1 # human counting numbers start at 1

    # set reverse = False to change order of sort (ascending/descending)
    return sorted(ngram_freq.iteritems(), key=itemgetter(1), reverse=True)

if __name__ == '__main__':
    inputstring = raw_input('Enter input string: ')
    nlen_str = raw_input('Enter size of n-gram (int): ')
    nlen = int(nlen_str) # cast string to int

    for t in calc_ngram(inputstring, nlen):
        print t[0] + ' occured ' + str(t[1]) + ' times'
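
The listing above is Python 2 (xrange, iteritems, raw_input, the print statement). For Python 3, a minimal sketch of the same idea could look like the following. The name calc_ngram_py3 is hypothetical and this is not the published pyngram code; it swaps the hand-rolled dict for collections.Counter, so n-grams with equal counts may come out in a different order than in the original.

```python
from collections import Counter


def calc_ngram_py3(inputstring, nlen):
    """Count the n-grams of size nlen in inputstring, most frequent first."""
    if nlen < 1:
        raise ValueError("n-grams have to be of size 1 or greater")
    if len(inputstring) < 1:
        raise ValueError("the input string has to be at least 1 char long")

    # Stop at len - nlen + 1 so that the last window is still a full nlen
    # characters; ranging over the whole string would tack short fragments
    # onto the tail of the results.
    windows = range(len(inputstring) - nlen + 1)
    ngrams = (inputstring[i:i + nlen] for i in windows)
    return Counter(ngrams).most_common()


if __name__ == "__main__":
    text = "bubble bobble, bubble bobble, bubble bobble"
    for ngram, count in calc_ngram_py3(text, 3):
        print("%s occurred %d times" % (ngram, count))
```

If nlen is larger than the input string, the range is empty and the function simply returns an empty list.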

This is a cross-posting from my company’s blog post here: http://community.websense.com/blogs/securitylabs/archive/2010/05/20/a-simple-n-gram-calculator-pyngram.aspx.

locale.Error: unsupported locale setting?

Thursday, April 22nd, 2010

Is your Python code crashing over some kind of locale setting?

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/tweepy/binder.py", line 178, in _call
    return method.execute()
  File "/usr/local/lib/python2.6/dist-packages/tweepy/binder.py", line 164, in execute
    result = self.api.parser.parse(self, resp.read())
  File "/usr/local/lib/python2.6/dist-packages/tweepy/parsers.py", line 72, in parse
    result = model.parse_list(method.api, json)
  File "/usr/local/lib/python2.6/dist-packages/tweepy/models.py", line 35, in parse_list
    results.append(cls.parse(api, obj))
  File "/usr/local/lib/python2.6/dist-packages/tweepy/models.py", line 50, in parse
    setattr(status, k, parse_datetime(v))
  File "/usr/local/lib/python2.6/dist-packages/tweepy/utils.py", line 20, in parse_datetime
    locale.setlocale(locale.LC_TIME, '')
  File "/usr/lib/python2.6/locale.py", line 513, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

Or when using aptitude or apt-get or dpkg-reconfigure, you also get some LANGUAGE or LC_ALL unset error?

jaysern@jaysern:/home/jaysern# dpkg-reconfigure localeconf
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
      LANGUAGE = (unset),
      LC_ALL = (unset),
      LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
Package `localeconf' is not installed and no info is available.
Use dpkg --info (= dpkg-deb --info) to examine archive files,
and dpkg --contents (= dpkg-deb --contents) to list their contents.

If you tried this in your python code,

locale.setlocale(locale.LC_ALL, '')

or

locale.setlocale(locale.LC_TIME, '')

does it fail too?

If so, try

aptitude install language-pack-en

and see if that does the trick for you (credit). I spent a few hours trying to hunt down a bug in python – turns out the solution was just a one-liner. I hope this saves someone else the headache!
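
If the setlocale call lives in your own code (rather than inside a library like tweepy, as in the traceback above), another option is to catch the error and fall back to the standard "C" locale, the same fallback perl reports in the warning above. A minimal sketch, with a hypothetical helper name:

```python
import locale


def set_locale_or_fallback(category=locale.LC_ALL):
    """Try the environment's locale; fall back to the 'C' locale if unsupported."""
    try:
        # An empty string asks for the locale named by the environment
        # (LANG, LC_ALL and friends); this is the call that raises locale.Error
        # when that locale is not installed.
        return locale.setlocale(category, "")
    except locale.Error:
        return locale.setlocale(category, "C")


if __name__ == "__main__":
    print(set_locale_or_fallback())
    print(set_locale_or_fallback(locale.LC_TIME))
```

This only papers over the symptom: dates and messages will use "C" formatting until the missing locale is actually installed, so installing the language pack is still the real fix.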