Developed with solid distributed systems fundamentals from the start, Riak scales out to meet your needs. It also scales down easily, simplifying development and prototyping.
Whatever size you need it to be
"Scalability" is an often-misused word on the Web. Scaling is not the same as being fast, and it is not the same as being large. To say meaningfully that something scales, you must do so in terms of some numerically measurable feature (throughput, latency, volume, anything you can describe with a single number) and some other numerically measurable resource that presumably costs you money (such as the number of computers being used). To say that something scales is to say that there is a relationship between that resource and that feature which remains constant regardless of how much you increase the numbers. This is practically useful because it gives you predictable cost ceilings when your needs grow. If there is no such useful resource/benefit relation, your system does not scale. (If turning the dial way up on your scaling factor causes some other aspect of your system to perform much worse, your system's scalability is questionable at best.)
One important way in which Riak scales is in the relationship between total storage capacity and the number of disk-containing servers in the cluster. If your cluster has a given total capacity of some number "C" and is composed of "N" servers (with roughly equivalent amounts of disk in each), then you can double your capacity by doubling your number of servers -- no matter what the original values of C and N.
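This linear relationship can be sketched in a few lines. The following is an illustrative model only (the function name and numbers are invented for the example), showing that capacity doubles when servers double, regardless of the starting values of C and N:

```python
# A minimal sketch of linear capacity scaling, assuming each server
# contributes roughly equal disk. Names and figures are illustrative.

def cluster_capacity(num_servers, disk_per_server_tb):
    """Total capacity grows linearly with the number of servers."""
    return num_servers * disk_per_server_tb

# Start with N = 5 servers of 4 TB each: C = 20 TB.
c1 = cluster_capacity(5, 4)

# Double the servers: capacity doubles, no matter the original C and N.
c2 = cluster_capacity(10, 4)
assert c2 == 2 * c1
```

The point of the model is that the cost-per-unit-of-capacity stays constant as the cluster grows, which is exactly the resource/feature relationship described above.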
This is a huge deal, as it means that you don't have to keep packing more and more resources into a single master database, incurring greater and greater marginal expense the farther you go. Instead, when you need more capacity you just plug in another ordinary server and walk away.
Most storage systems (including both traditional RDBMS and some more recent document databases) do not scale in any meaningful way. While you can often add replicas to such systems for redundancy or load-balancing, such systems still require all of the data to fit on a single master. As soon as you run out of room to add disks to that master, you have hit a hard ceiling on practical capacity.
Some systems achieve scaling through sharding, or dividing the data up into segments, each of which is owned by a separate master. This achieves some scaling, but typically at the cost of multiplying the points of failure that can bring everything crashing down. Between the namenodes responsible for tracking which shard is which and the fact that each shard must now be treated with all of the care of the original master, operational burden climbs rapidly as such systems grow. Riak's operational design has none of these drawbacks: it scales not by creating more "master" locations, but by having no master at all.
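The masterless approach rests on consistent hashing, where every node can independently compute which node owns a given key, with no central lookup table. The following toy ring is a sketch of that idea only, not Riak's implementation (Riak divides a 160-bit ring into fixed partitions spread across virtual nodes; here the node names and key are invented, and node names are hashed directly for brevity):

```python
import hashlib
from bisect import bisect

# Toy consistent-hash ring in the spirit of a masterless design: any node
# can answer "who owns this key?" locally, so there is no namenode to lose.

def _hash(s):
    # Map a string onto the ring as a large integer (SHA-1 is illustrative).
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Place each node at a point on the ring, sorted by hash.
        self.points = sorted((_hash(n), n) for n in nodes)

    def owner(self, key):
        # Walk clockwise from the key's hash to the first node point,
        # wrapping around at the end of the ring.
        hashes = [h for h, _ in self.points]
        i = bisect(hashes, _hash(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["riak1", "riak2", "riak3"])
node = ring.owner("user:1001")  # every node computes the same answer
```

Because ownership is a pure function of the key and the ring membership, adding a server only reassigns the keys that fall into its new ring segments; there is no single machine whose failure takes the routing knowledge down with it.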
Another essential and often ignored aspect of scaling is "scaling down". Systems that are designed to work in the large are often very inconvenient to set up, manage, and use in a very small environment such as a developer's laptop. If a system cannot run in a scaled-down and simple setup, it will be much harder for developers to explore the system and work with it to its fullest potential. Riak works well even when running on just a single node; developers can easily work against a local instance and have the same code work when deployed on a large cluster elsewhere.