Monday, October 13, 2014

Connect Python to AWS -- boto

Boto is the Python module for interfacing with Amazon Web Services. According to the docs, all of the AWS features it supports work with Python 2.6 and 2.7. I start with S3, using Python 2.7.

1. install boto (the easiest way is to use pip)

pip install boto

2. go to AWS and get an access key ID and a secret access key (set up both a user and a group)

3. run the example for testing

Before running it, replace these placeholders in the source code with your own keys:
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
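
Keys pasted into source code are easy to leak by accident, so here is a minimal sketch of an alternative: read them from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (boto also looks at these two variables itself when no keys are passed in). The helper name load_aws_credentials is my own and is not part of boto.

```python
import os


def load_aws_credentials(environ=None):
    """Return (access_key_id, secret_access_key) read from environment variables.

    load_aws_credentials is a hypothetical helper, not a boto function.
    """
    env = os.environ if environ is None else environ
    access_key = env.get('AWS_ACCESS_KEY_ID')
    secret_key = env.get('AWS_SECRET_ACCESS_KEY')
    if not access_key or not secret_key:
        # Fail early with a readable message instead of a confusing auth error later.
        raise ValueError('set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first')
    return access_key, secret_key
```

The returned pair can then be used as aws_access_key_id and aws_secret_access_key instead of hardcoded strings.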

4. try basic s3 functions

from boto.s3.connection import S3Connection
from boto.s3.key import Key

# set up the connection with the AWS credentials (the two values from step 3)
s3 = S3Connection(aws_access_key_id, aws_secret_access_key)

# create a bucket (think of it as a cloud folder on S3)
b = s3.create_bucket('botobucket_1230')
Note: a bucket name has to be unique across all buckets on AWS (similar to a URL). That means you have to come up with a name that nobody else has taken.
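
One way to avoid name collisions is to append a random suffix to a prefix of your choice. This is only a sketch: unique_bucket_name is a hypothetical helper of mine, and it assumes DNS-style names (3 to 63 characters, lowercase letters, digits and hyphens), which is stricter than the name used above.

```python
import re
import uuid


def unique_bucket_name(prefix):
    """Return prefix plus a random 12-character hex suffix, e.g. 'botobucket-<hex>'.

    Hypothetical helper, not part of boto. Assumes DNS-style bucket names.
    """
    suffix = uuid.uuid4().hex[:12]
    name = '%s-%s' % (prefix.lower(), suffix)
    # 3 to 63 characters; must start and end with a lowercase letter or digit.
    if not re.match(r'^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$', name):
        raise ValueError('invalid bucket name: %r' % name)
    return name
```

It would be used as s3.create_bucket(unique_bucket_name('botobucket')). A random suffix makes a clash very unlikely but does not rule it out, so the create call can still fail.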

# create a Key object, which keeps track of data stored in S3 (think of it as a filename)
k = Key(b)
k.key = 'testKey'

# write the content of a string to the key (a file in the bucket)
k.set_contents_from_string('this is a test string')

# check whether a key exists in the bucket: returns the Key object if it does, otherwise None
print b.get_key('testKey')
print b.get_key('testKey2')

#get all keys in the bucket
for k in b.list():
    print k

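# copy content from / to a local file
import os
source = 'path to localfile.txt'
source_size = os.stat(source).st_size  # file size in bytes
print source, source_size
k = b.new_key('localfile.txt')  # 'localfile.txt' is just an example key name
k.set_contents_from_filename(source)  # upload the local file to the key
k.get_contents_to_filename('path to local copy.txt')  # download the key to a local file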
Note: for large files, the FileChunkIO module can help split the original file into smaller chunks for upload. See the boto docs.
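
To make the chunking idea concrete, here is a rough sketch along the lines of the multipart upload example in the boto docs. The function names chunk_ranges and upload_in_chunks are my own, the 50 MB default chunk size is an assumption, and the upload part has not been run against a real bucket here.

```python
import os


def chunk_ranges(file_size, chunk_size=50 * 1024 * 1024):
    """Return a list of (part_number, offset, length) tuples covering file_size bytes."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    # Ceiling division: the last chunk may be shorter than chunk_size.
    count = (file_size + chunk_size - 1) // chunk_size
    ranges = []
    for i in range(count):
        offset = i * chunk_size
        length = min(chunk_size, file_size - offset)
        ranges.append((i + 1, offset, length))  # part numbers start at 1
    return ranges


def upload_in_chunks(bucket, source_path, chunk_size=50 * 1024 * 1024):
    """Upload source_path to a boto bucket as a multipart upload (sketch)."""
    # Imported here so chunk_ranges() works without filechunkio installed.
    from filechunkio import FileChunkIO

    file_size = os.stat(source_path).st_size
    mp = bucket.initiate_multipart_upload(os.path.basename(source_path))
    for part_num, offset, length in chunk_ranges(file_size, chunk_size):
        # Each FileChunkIO object exposes one slice of the file as a file-like object.
        with FileChunkIO(source_path, 'r', offset=offset, bytes=length) as fp:
            mp.upload_part_from_file(fp, part_num=part_num)
    mp.complete_upload()
```

For example, chunk_ranges(10, 4) gives [(1, 0, 4), (2, 4, 4), (3, 8, 2)]. The sketch does not handle empty files and does not cancel the upload if a part fails.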