A LAMP guy’s n00b quick start to Amazon Web Services

If you haven’t dabbled with AWS before and are impatient to kick off a free instance and play around with it, here are a perspective and some lessons learned from a LAMP + Django guy who had no prior experience. Caution: this is just enough for you to wrap your head around the major concepts and get your first hello-world instance off the ground. Beginning is always the hardest part, so I hope documenting the things I found out the time-consuming way will help you get started.

An EC2 instance is like a VM – but without a local hard disk. Instead, the VM uses a highly reliable external drive (EBS). Think of EBS as a really big USB drive, and just like a USB stick, it’s a raw device that needs to be formatted first. The obvious upside of this setup is that you can use a low-powered VM (to save money), and when your site gets popular and you need to scale up, you just turn off that low-powered VM (the EC2 instance), fire up a beefier EC2 instance and attach the same EBS volume to it. And just like that, you’ve got a beefier machine running to handle the load.

Note: The EC2 instance just described is what is known as an “EBS-backed” instance (the root device is an EBS volume, which is where the OS boots from), as opposed to the other option, “instance store” (a.k.a. S3-backed). The latter option is for advanced use – not the focus of this primer ;) So when you’re firing up your hello-world instance, choose EBS-backed, not instance store. Similarly, ignore the AMI creation process; that’s advanced stuff you can visit later. Use one of the already existing AMIs – like the ones provided by Ubuntu. My example uses Lucid Lynx.

Statistically, EBS is also more reliable than a local physical hard drive. However, you’d probably still want to take snapshots of the volume (how confident are you that your data is safe on a USB stick?). Snapshots are stored in S3, which is safer still because S3 keeps redundant copies of your data across multiple facilities within the region. Subsequent snapshots are also incremental deltas from previous snapshots, so you save space. That said, you’re fine keeping just 1 daily snapshot (unless you have a reason to want more).
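If you do end up with a pile of snapshots, pruning them is easy to script. Here’s a minimal sketch using boto3 (the newer AWS SDK for Python, not the boto library used later in this post). The function name prune_snapshots and the keep parameter are my own invention, and it assumes you pass in a client created with boto3.client("ec2", region_name="us-west-1"). It ignores pagination, so treat it as a starting point only.

```python
def prune_snapshots(ec2_client, volume_id, keep=1):
    """Delete all but the newest `keep` completed snapshots of one volume."""
    response = ec2_client.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    )
    # Leave pending snapshots alone; only completed ones are candidates.
    completed = [s for s in response["Snapshots"] if s["State"] == "completed"]
    # Newest first (StartTime is a datetime in real responses).
    completed.sort(key=lambda s: s["StartTime"], reverse=True)
    deleted = []
    for snapshot in completed[keep:]:
        ec2_client.delete_snapshot(SnapshotId=snapshot["SnapshotId"])
        deleted.append(snapshot["SnapshotId"])
    return deleted

# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-1")
#   prune_snapshots(ec2, "vol-wuh5da87", keep=1)
```

The client is passed in rather than created inside the function so you can try the logic against a fake object before pointing it at a real account.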

Ok, at this point in the post, you should go ahead and follow AWS’s guide to firing up your 1st Ubuntu instance (remember, you can ignore the AMI creation stuff for now, and choose the 8GB EBS-backed instance – especially if you want to use the 1-year free micro instance). I’ll wait.

I’m still waiting.

Ok, at this point you should have it fired up and running, and be able to ssh into it. I ssh into mine with: ssh -i username.pem ubuntu@ec2-111-222-333-444.us-west-1.compute.amazonaws.com

Now on to some basic first time house-keeping items, best practice stuff.

The Delete On Termination flag on your EBS volume

On your new EBS-backed instance, the EBS volume has a flag called “delete on termination”, which defaults to true. You’d probably want to set it to false. If it’s set to true, the EBS root volume is deleted when you terminate the instance. Instances can be started, stopped, and terminated. “Stop” means you can start it back up later. “Terminate” means delete, and you can’t un-delete. The following example uses Amazon’s EC2 API tools. Be sure to download your X.509 certificate and private key (both are files you download from your Amazon account). I’m using OS X 10.5, so change the paths accordingly. Be sure to set your Java home too.

OSX-LEOPARD:bin jliew$ export EC2_HOME=/Users/jliew/Desktop/ec2-api-tools-1.3-62308/
OSX-LEOPARD:bin jliew$ export EC2_PRIVATE_KEY=/Users/jliew/pk-q3ef98zHSDg872hGTQpoX.pem
OSX-LEOPARD:bin jliew$ export EC2_CERT=/Users/jliew/cert-SDkhIzWU3HDqS83sXdsefh.pem
OSX-LEOPARD:bin jliew$ export JAVA_HOME=/Library/Java/Home

Once that’s set up, you’re ready to go. Don’t forget to pass it the region (mine is us-west-1), your instance’s id, and volume id (and volume mount point):

OSX-LEOPARD:bin jliew$ ./ec2-modify-instance-attribute -b /dev/sda1=vol-wuh5da87:false i-ppo983x1 --region=us-west-1
Unexpected error:
java.lang.ClassCastException: com.amazon.aes.webservices.client.InstanceBlockDeviceMappingDescription
	at com.amazon.aes.webservices.client.cmd.Outputter.outputInstanceAttribute(Outputter.java:664)
	at com.amazon.aes.webservices.client.cmd.ModifyInstanceAttribute.invokeOnline(ModifyInstanceAttribute.java:149)
	at com.amazon.aes.webservices.client.cmd.BaseCmd.invoke(BaseCmd.java:795)
	at com.amazon.aes.webservices.client.cmd.ModifyInstanceAttribute.main(ModifyInstanceAttribute.java:269)
OSX-LEOPARD:bin jliew$ ./ec2-modify-instance-attribute --region=us-west-1 -b /dev/sda1=vol-wuh5da87:false i-ppo983x1

Notice how it actually crashed with an error? Turns out other people have hit this problem too, but the command actually succeeded. To verify that it succeeded, run:

OSX-LEOPARD:bin jliew$ ./ec2-describe-instance-attribute --region=us-west-1 --block-device-mapping -v i-ppo983x1

and you should see, somewhere within the output, a line that looks like this: <deleteOnTermination>false</deleteOnTermination>
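If you’d rather skip the Java tools, the same change can be sketched in Python with boto3 (again, the newer SDK, not the boto used below). The helper names here are mine, and the instance id and device name are the placeholders from the commands above. It assumes a client created with boto3.client("ec2", region_name="us-west-1").

```python
def delete_on_termination_mapping(device_name, delete_on_termination):
    """Build the BlockDeviceMappings argument for modify_instance_attribute."""
    return [{
        "DeviceName": device_name,
        "Ebs": {"DeleteOnTermination": delete_on_termination},
    }]


def keep_root_volume(ec2_client, instance_id, device_name="/dev/sda1"):
    """Set DeleteOnTermination to False, then read the flag back to verify."""
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        BlockDeviceMappings=delete_on_termination_mapping(device_name, False),
    )
    response = ec2_client.describe_instance_attribute(
        InstanceId=instance_id, Attribute="blockDeviceMapping"
    )
    # Map each attached device name to its current flag value.
    return {
        m["DeviceName"]: m["Ebs"]["DeleteOnTermination"]
        for m in response["BlockDeviceMappings"]
    }

# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-1")
#   print(keep_root_volume(ec2, "i-ppo983x1"))
```

Reading the attribute back after the change mirrors the verify step above, so you’re not trusting the modify call blindly.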

Disable termination of instance via API

Self-explanatory, and another good safety measure. You could do this using the tools from Amazon like the example right above, but the following example uses a Python wrapper to EC2’s API called boto – for the sake of learning! :) There’s also a good post about using Python + boto from AWS here.

First, get your AWS key and secret, and put them in a file named .boto in your home directory (~/.boto) that looks like this:

[Credentials]
aws_access_key_id = YOur-KEY-heRE
aws_secret_access_key = yoUr-sEcREt-hERe

Then proceed (there was previously a bug here, but I reported it and it was fixed quickly!):

>>> import boto.ec2
>>> regions = boto.ec2.regions()
>>> regions
[RegionInfo:eu-west-1, RegionInfo:us-east-1, RegionInfo:us-west-1, RegionInfo:ap-southeast-1]
>>> usw = regions[2]
>>> conn = usw.connect()
>>> reservations = conn.get_all_instances()
>>> reservations
[Reservation:r-cfad1b8b]
>>> r1 = reservations[0]
>>> for i in r1.instances: print i
...
Instance:i-ppo983x1
>>> instance = r1.instances[0]
>>> instance.get_attribute('disableApiTermination')
{u'instanceId': None, u'disableApiTermination': u'false', u'requestId': None, u'DescribeInstanceAttributeResponse': u'true'}
>>> instance.modify_attribute('disableApiTermination', True)
True
>>> instance.get_attribute('disableApiTermination')
{u'instanceId': None, u'disableApiTermination': u'true', u'requestId': None, u'DescribeInstanceAttributeResponse': u'true'}
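The session above is Python 2 with the original boto library. For anyone on Python 3, here’s a rough equivalent sketched with boto3. The function name is made up for illustration, and it assumes a client created with boto3.client("ec2", region_name="us-west-1").

```python
def protect_from_termination(ec2_client, instance_id):
    """Turn on termination protection and return the value read back."""
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        DisableApiTermination={"Value": True},
    )
    response = ec2_client.describe_instance_attribute(
        InstanceId=instance_id, Attribute="disableApiTermination"
    )
    return response["DisableApiTermination"]["Value"]

# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-1")
#   print(protect_from_termination(ec2, "i-ppo983x1"))
```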

Now, on to some questions you might have.

How many snapshots do I need to keep?

I hear ya! More snapshots means more charges. It’s interesting to note that if you took the first snapshot (which would be a complete binary copy of the volume), then took a second snapshot (which would be only the bits of data that changed since the first snapshot), you could actually then delete the first snapshot and still be able to completely reload the entire volume from your second snapshot! Apparently behind the scenes, transparent to you, AWS does a forward merge before you delete the 1st snapshot. A tip from the smart guys at Skydera! So you can just keep 1 snapshot of each volume if you want.
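To convince myself this can work, here’s a toy model in Python. To be clear, this is NOT how AWS implements snapshots internally (I don’t know the details). It’s just one way to picture it: blocks are stored once, each snapshot is a table of references to blocks, and deleting a snapshot only frees the blocks that no remaining snapshot points to. The class and method names are all made up.

```python
class SnapshotStore:
    """Toy model: each snapshot maps block index -> id of stored block data."""

    def __init__(self):
        self.blocks = {}      # block id -> data
        self.snapshots = {}   # snapshot name -> {block index: block id}
        self._next_id = 0

    def take(self, name, volume, parent=None):
        """Snapshot `volume` (a list of block contents), storing only changes."""
        parent_map = self.snapshots.get(parent, {})
        mapping = {}
        for index, data in enumerate(volume):
            old_id = parent_map.get(index)
            if old_id is not None and self.blocks[old_id] == data:
                mapping[index] = old_id            # unchanged: reuse stored block
            else:
                self.blocks[self._next_id] = data  # changed: store a new block
                mapping[index] = self._next_id
                self._next_id += 1
        self.snapshots[name] = mapping

    def delete(self, name):
        """Drop a snapshot, freeing only blocks no other snapshot references."""
        removed = self.snapshots.pop(name)
        still_used = {bid for m in self.snapshots.values() for bid in m.values()}
        for bid in set(removed.values()) - still_used:
            del self.blocks[bid]

    def restore(self, name):
        """Rebuild the full volume contents from one snapshot."""
        mapping = self.snapshots[name]
        return [self.blocks[mapping[i]] for i in range(len(mapping))]


store = SnapshotStore()
store.take("snap1", ["boot", "mysql", "logs-v1"])
store.take("snap2", ["boot", "mysql", "logs-v2"], parent="snap1")
store.delete("snap1")
print(store.restore("snap2"))  # the full volume, even though snap1 is gone
```

In this model the second snapshot only stored one new block, yet it can still restore everything after the first one is deleted, because the two unchanged blocks are still referenced.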

Why are people adding multiple XFS volumes? Seems like more hassle, no?

One shift in thinking I learned when using AWS is this: it’s generally a best practice to have a separate volume for each application. For instance, it is highly recommended that MySQL’s data live on a separate XFS partition (Ubuntu’s filesystem defaults to ext3). This means your server will have 2 attached EBS volumes: the first EBS vol (ext3) is where it boots Ubuntu from, and where the MySQL server lives and runs from. The second EBS vol (XFS) is where the MySQL data, such as the databases you create, actually lives (not on the first vol!).

There’s a good howto by Eric Hammond here on how to create this XFS vol and how to tell MySQL to save its data on it. I used it too. Definitely read it: it also discusses how MySQL operations should be flushed before taking a snapshot, and how to reload a volume from a snapshot.

Update Nov 2011: I revisited these steps with an EBS-backed instance running Ubuntu 11.10 (Oneiric Ocelot), and ran into an issue on the step in the AWS article that mounts the XFS partition. Turns out, some kernels use /dev/xvd* instead of /dev/sd* (credit), so where the article says “sudo mkfs.xfs /dev/sdh”, try “sudo mkfs.xfs /dev/xvdh”.
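If you’re scripting the setup, you can check which of the two names actually exists instead of guessing. A small sketch (the function name is mine, and the exists parameter is only there so the logic can be tried without a real device):

```python
import os


def resolve_block_device(name, exists=os.path.exists):
    """Return the device path that actually exists for an attached volume.

    Tries the name as given (e.g. /dev/sdh), then the equivalent
    /dev/xvd* name (/dev/xvdh) that some kernels expose instead.
    """
    candidates = [name]
    if name.startswith("/dev/sd"):
        candidates.append("/dev/xvd" + name[len("/dev/sd"):])
    for path in candidates:
        if exists(path):
            return path
    raise FileNotFoundError(
        "none of these devices exist: " + ", ".join(candidates)
    )

# Usage on the instance itself:
#   device = resolve_block_device("/dev/sdh")
```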

Since you’re charged by size and not by number of volumes, there’s no reason not to segment your data. For instance, if I also had Postgres running, I’d add yet another XFS volume and have its data live there. If one volume gets corrupted or crashes, the damage is isolated to that one application. Also, the reason we use XFS is that you can suspend pending I/O before taking a snapshot, so that the snapshot is consistent (and doesn’t get corrupted by accident). This is just a good safety measure; I’ve heard of people taking a snapshot of their instance’s ext3 volume with everything installed on it, and they didn’t have problems. You could do that too if you really wanted to. It’s a trade-off.
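The suspend-then-snapshot idea can be sketched like this in Python, using the xfs_freeze command (-f to freeze, -u to unfreeze) and boto3’s create_snapshot. This is a sketch under assumptions: the function names are mine, the mount point /vol is just an example, it needs to run as root on the instance, and it leaves out the MySQL flush/lock step that Eric Hammond’s article covers.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen_filesystem(mount_point, run=subprocess.check_call):
    """Suspend writes to an XFS mount for the duration of the with-block."""
    run(["xfs_freeze", "-f", mount_point])      # freeze: block new writes
    try:
        yield
    finally:
        run(["xfs_freeze", "-u", mount_point])  # always unfreeze, even on error


def snapshot_frozen(ec2_client, volume_id, mount_point,
                    run=subprocess.check_call):
    """Freeze the filesystem, start an EBS snapshot, then unfreeze."""
    with frozen_filesystem(mount_point, run=run):
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description="frozen snapshot of " + mount_point,
        )
    return response["SnapshotId"]

# Usage (as root on the instance, with boto3 installed and configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-1")
#   snapshot_frozen(ec2, "vol-wuh5da87", "/vol")
```

The try/finally is the important part: if the snapshot call blows up, you really don’t want the filesystem left frozen.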

Static IP address?

By default, the instance does not get a static IP address. You can use what they call an Elastic IP, which is basically them giving you an IP address that you can then assign to any instance you like. Please also note that even in the free tier, they charge you when you own an Elastic IP but do not use it by assigning it to an instance! A gotcha I found out by paying for it. So just assign the Elastic IP if you get one, even if you’re not using the instance. You can then buy a domain and map the domain to the IP, so that you can just ssh to yourdomain.com rather than type ec2-blah-blah-region-compute-aws-blah.com

Think of an Elastic IP as being “decoupled” from the instance. You could have 1 instance running, and if it crashes, you could fire up a second instance and re-map your Elastic IP to the new second instance, in hopes that the outside world doesn’t notice an interruption in service while you do a post-mortem on the 1st instance that crashed.
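Both the “unused IP costs money” gotcha and the re-mapping trick are easy to script. A minimal boto3 sketch (function names are mine; it assumes a client created with boto3.client("ec2", region_name="us-west-1") and that your addresses have an AllocationId, which is an assumption about your account setup):

```python
def idle_elastic_ips(ec2_client):
    """Return Elastic IPs that are allocated but not attached to anything."""
    addresses = ec2_client.describe_addresses()["Addresses"]
    return [
        a["PublicIp"] for a in addresses
        if not a.get("InstanceId") and not a.get("AssociationId")
    ]


def remap_elastic_ip(ec2_client, public_ip, new_instance_id):
    """Point an already-allocated Elastic IP at a different instance."""
    address = ec2_client.describe_addresses(PublicIps=[public_ip])["Addresses"][0]
    ec2_client.associate_address(
        AllocationId=address["AllocationId"],
        InstanceId=new_instance_id,
        AllowReassociation=True,  # take the IP over from the old instance
    )

# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-1")
#   print(idle_elastic_ips(ec2))   # anything listed here is costing you money
```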

Final thoughts

There are other things like Elastic Load Balancing, where you can redirect traffic from “sick” underperforming instances to the remaining “healthy” instances if you have a cluster of them running, but that’s all advanced stuff.

The biggest thing I’ve taken away from trying to bum some free web hosting from the free micro instance plan is this: If you’re looking for just a standalone server with a LAMP stack for some basic web apps you’re trying out for fun – you’re better off using prgmr.com or Linode. Only use AWS if you have some non-trivial process (more involved than just a plain old vanilla LAMP stack), or else you’re not really capitalizing on the benefits of this “cloud computing” by AWS anyway.

What I mean by that is perfectly illustrated in this chart below. You’re only leveraging AWS’s power if you have a process where you queue work requests, then have worker nodes that constantly grab a work request, process it, then get the next one. Architected this way, you can actually scale up and down according to demand.
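The queue-and-workers pattern itself is easy to picture in miniature. In this sketch, Python threads stand in for worker instances and an in-memory queue stands in for a hosted queue service (both are my stand-ins, not something AWS-specific). The worker_count parameter is the knob you’d turn up or down with demand.

```python
import queue
import threading


def run_workers(jobs, handler, worker_count):
    """Process every job with `worker_count` threads pulling from one queue."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = work.get()
            if job is None:          # sentinel: no more work for this worker
                return
            outcome = handler(job)
            with lock:               # results list is shared between threads
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:
        work.put(job)
    for _ in threads:
        work.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return results


# Results arrive in whatever order the workers finish, so sort for display.
print(sorted(run_workers(range(10), lambda n: n * n, worker_count=3)))
```

Note that None is reserved as the stop signal here, so it can’t be used as a real job in this sketch.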

You can imagine scaling web apps using some kind of process architected this way – but when you’re first starting out, you’ve got no traffic anyway, and a standard LAMP stack is one less thing to debug. You’re probably facing a market risk, not a technology risk. So work on actually getting traction first, then when you’ve hit it big and are facing growing pains – which is a great problem to have – consider using AWS.

Or, you can go ahead and do what I did. Free 1 year hosting + you get to learn how to use AWS :) Recommended reading: ‘Host Your Web Site In The Cloud: Amazon Web Services Made Easy’ by Jeff Barr
