
Getting Started

The purpose of this repository is to collaborate on the design of the boto 2.0 library. The boto library provides a Python interface to the full suite of Amazon Web Services. It has been around for about three years and the current released version of the library is 1.9b. The current code base can be found at http://boto.googlecode.com/.

The boto 2.0 project is a refactoring and rethinking of boto. Here are some of the goals of this project, from my perspective:

  • Add support for cloud providers other than AWS. I don't anticipate trying to cover the breadth of services offered by the libcloud project, but I would like to refactor the code in a way that makes it possible, and hopefully easy, to add other services to boto.
  • Make it possible to more easily use boto in an asynchronous programming model. There are a lot of great async tools available in Python, such as Twisted, Tornado, Diesel, etc. I don't think that the base install of boto 2.0 will be built on any of those but I would like to refactor the code in a way that would allow boto to be integrated with async toolkits more easily.
  • A better unit test and system test framework and a more complete set of tests. This would probably involve unit tests against mocked endpoints as well as end-to-end tests against live service endpoints similar to the current boto test scripts.

There are lots of other things that I hope to see happen in boto 2.0 but those are tops on my list. The purpose of this wiki page is to describe the initial snapshot of code that I have checked in and to give the boto community a chance to participate in this design process.

Let's Try it Out

Okay, enough talk. Let's look at some code.

First of all, you should know that this initial snapshot of code is just a proof of concept. It implements a few methods for a single service but tries to show how it works from beginning to end. The refactoring so far has produced a code base that is dramatically different from the existing boto code. To me, it's an improvement but I really want your feedback on that.

Even though the 2.0 release will not be backward compatible with the 1.x releases, the goal is still to avoid changing everything and to present an interface that makes it relatively easy to migrate from 1.x to 2.0. Some sort of automated migration tool would be ideal, actually.

So let's start with a few lines of Python code that should look pretty familiar to people using the current version of boto.

>>> import boto
>>> c = boto.connect_ec2()
>>> rs = c.get_all_key_pairs()

At first glance, that looks identical to what you would do in the 1.x release. However, a little inspection shows that all is not as it seems.

>>> c
 <boto.services.aws.ec2.EC2 instance at 0x169db98>
>>>

So, rather than getting an EC2Connection object back from the call to boto.connect_ec2, I now get an EC2 object, which is a type of Service. Also, if we inspect the value returned from the call to get_all_key_pairs, we notice that it is not a ResultSet object as in the 1.x release but instead a native Python data structure.

>>> rs
{u'describe_key_pairs_response': {u'key_set': [{u'key_fingerprint': u'32:34:b4:dd:ce:19:bb:60:f7:5f:af:4d:84:20:37:48:dd:b0:53:28',
                                                u'key_name': u'key1'},
                                               {u'key_fingerprint': u'7c:cb:7d:23:0f:f2:56:8a:c8:6e:5b:98:77:7a:ad:5b:2a:8d:c4:59',
                                                u'key_name': u'key2'},
                                               {u'key_fingerprint': u'7a:f6:18:a7:d2:cb:77:fe:05:bc:39:b0:db:88:e5:d7:12:50:db:47',
                                                u'key_name': u'key3'}],
                                  u'request_id': u'6d885004-f107-489e-b4a0-0e0d8f0c2d4e'}}

>>>
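
Since the result is just nested dictionaries and lists, you can dig into it with ordinary Python. For example, using the response shown above:

>>> kp = rs[u'describe_key_pairs_response'][u'key_set']
>>> [k[u'key_name'] for k in kp]
[u'key1', u'key2', u'key3']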

This is a very deliberate change. One of the things that got ugly in boto was the use of specialized Python classes wrapping just about every kind of data structure returned by a service. That approach is nice in a way because it lets you provide convenience methods on the objects to make certain operations easier. The downside, though, is a proliferation of objects: as the APIs for the different services evolve and grow, more and more objects are needed and it eventually gets a bit overwhelming. So, in this refactoring I have so far stayed at the other end of the spectrum by writing a little bit of code that parses the XML responses sent by the existing Amazon services and transforms them into native Python data structures. I suspect that some middle ground might be required but for now I'm going to keep things as simple as possible.
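
For the curious, the transformation from XML to native Python data does not need much code. The function below is just an illustrative sketch using ElementTree, not the parser from the snapshot; among other things, the actual code presumably also converts AWS's camelCase tag names (keySet, keyFingerprint, etc.) into the snake_case keys shown above and collapses repeated <item> elements into lists.

    import xml.etree.ElementTree as ET

    def xml_to_native(element):
        """Recursively convert an ElementTree element into dicts and strings."""
        children = list(element)
        if not children:
            # leaf node: just return its text content
            return element.text
        result = {}
        for child in children:
            tag = child.tag.split('}', 1)[-1]   # drop any XML namespace prefix
            value = xml_to_native(child)
            if tag in result:
                # a repeated tag becomes a list of values
                if not isinstance(result[tag], list):
                    result[tag] = [result[tag]]
                result[tag].append(value)
            else:
                result[tag] = value
        return result

    # usage: xml_to_native(ET.fromstring(response_body)), where response_body
    # is the raw XML returned by the service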

If we look at the get_all_key_pairs method of the EC2 service object, we can see quite a few changes here as well.

    def get_all_key_pairs(self, keynames=None):
        """
        Get all key pairs associated with your account.

        :type keynames: list
        :param keynames: A list of the names of keypairs to retrieve.
                         If not provided, all key pairs will be returned.

        :rtype: dict
        :return: The parsed DescribeKeyPairs response as a native
                 Python data structure.
        """
        # build the Request object
        req = AWSQueryRequest('GET', 'DescribeKeyPairs', self)
        if keynames:
            # assumes the request object exposes its query parameters
            # as a dict in req.params
            self.build_list_params(req.params, keynames, 'KeyName')
        # now execute the request and get the response
        resp = req.send()
        # now parse the XML results and return the data structure
        return resp.parse()
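
Based on the signature above, fetching only a subset of the key pairs should look something like this (treat it as the intended usage of this proof-of-concept method rather than a guarantee):

>>> rs = c.get_all_key_pairs(keynames=['key1', 'key3'])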

One of the things I've been trying to do with this refactoring is to cleanly separate the functionality. It is my hope that this separation will result in code that is not only cleaner to read but also more flexible to use. The high level abstractions are:

  • Services represent endpoints with APIs that perform some function. So, EC2 would be a Service, as would S3, etc. The Service object knows the name of the service, a description of the service, the endpoints for the service, what types of requests and responses it supports, etc.
  • Connections represent the network connection to the service. This is HTTPS for all services we know about now but could possibly be some other transport. The Connection also knows how to deal with proxy servers and proxy authentication, encryption of the connection, and authentication mechanisms that are inherent to the underlying transport (e.g. HTTP Basic Authentication).
  • Requests represent a request for some action to be performed by a service. At the lowest level it is an HTTP request, but it's also responsible for certain types of authentication (such as request signing).
  • Responses are used to handle the information we receive from a Service after making a request. They are responsible for gathering the information (e.g. handling paging) and also for parsing it into a form that can be used in the application. A rough sketch of how these pieces might fit together follows the list.
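
To make that division of labor concrete, here is a rough sketch of how the four pieces might compose. None of the class or method names below are taken from the actual snapshot (apart from send and parse, which appear in the method shown earlier); they just illustrate the responsibilities described above.

    class Connection(object):
        """Owns the transport: endpoint, HTTPS, proxies, proxy auth, etc."""
        def __init__(self, host, port=443):
            self.host = host
            self.port = port

        def send_request(self, method, path, params):
            # would perform the actual HTTPS round trip and return the raw body
            raise NotImplementedError

    class Response(object):
        """Turns the raw payload from a request into native Python data."""
        def __init__(self, body):
            self.body = body

        def parse(self):
            # XML -> dicts/lists, gathering additional pages if necessary
            raise NotImplementedError

    class Request(object):
        """One action against a Service; knows how to sign itself."""
        def __init__(self, method, action, service):
            self.method = method
            self.action = action
            self.service = service
            self.params = {'Action': action}

        def send(self):
            self.sign()
            raw = self.service.connection.send_request(self.method, '/', self.params)
            return Response(raw)

        def sign(self):
            # add the service-specific authentication parameters/headers
            pass

    class Service(object):
        """Knows the service name, its endpoints and request/response types."""
        def __init__(self, connection):
            self.connection = connection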
