Reaching the pinnacle: truly open web services and clouds

Part 5 of the series, "What are the chances for a free software cloud?"

Previous section:

Why web services should be released as free software

Free software in the cloud isn’t just a nice-sounding ideal or
even an efficient way to push innovation forward. Opening the cloud
also opens the path to a bountiful environment of computing for
all. Here are the steps to a better computing future.

Provide choice

The first layer of benefits when companies release their source code
is incremental: incorporating bug fixes, promoting value-added
resellers, finding new staff among volunteer programmers. But a free
software cloud should go far beyond this.

Remember that web services can be run virtually now. When you log in
to a site to handle mail, CRM, or some other service, you may be
firing up a virtual service within a hardware cloud.

So web and cloud providers can set up a gallery of alternative
services, trading off various features or offering alternative
look-and-feel interfaces. Instead of just logging into a site such as
Salesforce.com and accepting whatever the administrators have put up
that day, users could choose from a menu, and perhaps even upload
their own preferred version of the service. The SaaS site would then
launch the chosen application in the cloud. Published APIs would allow
users on different software versions to work together.

If a developer outside the company creates a new version with
substantial enhancements, the company can offer it as an option. If
new features slow down performance, the company can allow clients to
decide whether the delays are worth it. To keep things simple for
casual clients, there will probably always be a default service, but
those who want alternatives can have them.

Vendors can provide “alpha” or test sites where people can try out new
versions created by the vendor or by outsiders. Like stand-alone
software, cloud software can move through different stages of testing
and verification.

And providing such sandboxes can also be helpful to developers in
general. A developer would no longer have to take the trouble to
download, install, and configure software on a local computer to do
development and testing. Just log into the sandbox and play.
Google offers The Go Playground to encourage students of its Go
language. CiviCRM, a free software server (not a cloud or web
service), offers a sandbox for testing new features. Practice Fusion,
a web service company in electronic health records that issued an API
challenge in September, is now creating a sandbox where third-party
developers can test the API functionality on its platform. I would
encourage web and cloud services to go even further: open their own
source code and provide sandboxes for people to rewrite and try out
new versions.

Let’s take a moment for another possible benefit of running a
service as a virtual instance. Infected computer systems present a
serious danger to users (who can suffer from identity theft if their
personal data is scooped up) and to other systems, which can be
victimized by denial-of-service attacks or infections of their own.
One response to this danger is trusted computing, an awkward tower of
authorizations reaching right down into the firmware or hardware. In
trusted computing, the computer itself checks to make sure that a
recognized and uncompromised operating system is running at boot
time. The operating system then validates each application before
launching it.

Trusted computing is Byzantine and overly controlling. The hardware
manufacturer gets to decide which operating system you use and,
through that, which applications you use. Wouldn’t users prefer to
run cloud instances that are born anew each time they log in? That
would wipe out any infection and ensure a trusted environment at the
start of each session without cumbersome gatekeeping.

Loosen the bonds on data

As we’ve seen, one of the biggest fears keeping potential clients away
from web services and cloud computing is the risk entailed in leaving
their data in the hands of another company. Here it can get lost,
stolen, or misused for nefarious purposes.

But data doesn’t have to be stored on the computer where the
processing is done, or even at the same vendor. A user could fire up a
web or cloud service, submit a data source and data store, and keep
results in the data store. IaaS-style cloud computing involves
encrypted instances of operating systems, and if web services did the
same, users would automatically be protected from malicious
prying. There is still a potential privacy issue whenever a user runs
software on someone else’s server, because the software could skim
off private data and hand it to a marketing firm or to law
enforcement.

Alert web service vendors such as Google know they have to assuage
user fears of locked-in data. In Google’s case, they created an
initiative called the Data Liberation Front (see an article by two
Google employees, The Case Against Data Lock-in). It allows users to
extract their data in a form that makes it feasible to reconstitute
the data on another system, but it doesn’t actually sever the data
from the service as I’m suggesting.

A careful client would store data in several places (to guard against
loss in case one location suffers a disk failure or other
catastrophe). The client would then submit one location to the web
service for processing, and either store the results back in all
locations or store them in the original source and copy them to the
others later, after making sure the data has not been corrupted.

A liability issue remains when calculation and data are separated. If
the client experiences loss or corruption, was the web service or the
data storage service responsible? A ping-pong scenario could easily
develop, with the web services provider saying the data storage
service corrupted a disk sector, the data storage service saying the
web service produced incorrect output, and the confused client left
furious with no recourse.

This could perhaps be solved by a hash or digest, a stable and widely
used technique that ensures that any change to the data, even the
flip of a single bit, produces a different output value. A digest is
a small number that represents a larger batch of data. Algorithms
that create digests are fast but generate output that’s reasonably
unguessable. Each time the same input is submitted to the algorithm,
it is guaranteed to generate the same digest, but any change to the
input (through purposeful fiddling or an inadvertent error) will
produce a different digest.
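
As a minimal sketch of those properties, using Python’s standard
hashlib module (the data here is made up purely for illustration):

    import hashlib

    # The same input always yields the same digest.
    original = b"quarterly sales report, version 7"
    digest_one = hashlib.sha256(original).hexdigest()
    digest_two = hashlib.sha256(original).hexdigest()
    assert digest_one == digest_two

    # Flipping even a single bit produces a completely different digest.
    corrupted = bytes([original[0] ^ 0x01]) + original[1:]
    assert hashlib.sha256(corrupted).hexdigest() != digest_one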

The web service could log each completed activity along with the
digest of the data it produces. The data service writes the data,
reads it back, and computes a new digest. Any discrepancy signals a
problem on the data service side, which it can fix by repeating the
write. If the data later turns out to be corrupt but still matches
the original digest, the client can blame the web service, because
the web service must have written corrupt data in the first place.
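
Here is a rough sketch of that handshake, again in Python. The
function names and the dictionaries standing in for the log and the
storage back end are hypothetical; they simply show where each digest
is computed and compared.

    import hashlib

    def web_service_handoff(job_id, result, digest_log):
        """Web service side: log the digest of the data it hands off."""
        digest_log[job_id] = hashlib.sha256(result).hexdigest()
        return result

    def data_service_store(job_id, blob, digest_log, storage):
        """Data service side: write, read back, verify against the log."""
        storage[job_id] = blob                    # write
        echoed = storage[job_id]                  # read back
        if hashlib.sha256(echoed).hexdigest() != digest_log[job_id]:
            # A mismatch at this point implicates the storage layer;
            # the fix is simply to repeat the write.
            raise IOError("stored copy does not match the logged digest")

    digest_log, storage = {}, {}
    result = web_service_handoff("job-42", b"computed results", digest_log)
    data_service_store("job-42", result, digest_log, storage)

If the stored data later proves corrupt yet still matches the logged
digest, the corruption must have come from the web service itself.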

Sascha Meinrath, a wireless networking expert, would like to see
programs run both on local devices and in the cloud. Each program
could exploit the speed and security of the local device but reach
seamlessly back to remote resources when necessary, much as a
microprocessor uses its local caches as much as possible and falls
back to main memory when needed. Such a dual arrangement would offer
flexibility, making it possible to continue work offline, keep
particularly sensitive data off the network, and let the user trade
off compute power against network usage on a case-by-case basis.
(Wireless use on a mobile device can also run down the battery
quickly.)

Before concluding, I should touch on another trend that some
developers hope will free users from proprietary cloud services:
peer-to-peer systems. The concept behind peer-to-peer is appealing
and has been gaining more attention recently: individuals run servers
on their own systems at home or work and serve up the data they
want. But such systems are hard to implement, for reasons I laid out
in two articles, From P2P to Web Services: Addressing and
Coordination and From P2P to Web Services: Trust. Running your own
software is somewhat moot anyway, because you’re well advised to
store your data somewhere else in addition to your own system. So
long as you’re employing a backup service to keep your data safe in
case of catastrophe, you might as well take advantage of other cloud
services as well.

I also don’t believe that individual sites maintained by individuals
will remain the sources for important data, as the peer-to-peer model
postulates. Someone is going to mine that data and aggregate it; just
look at the proliferation of Twitter search services. So even if
users try to live the ideal of keeping control over their data, and
use distributed technologies like the Diaspora project, they will end
up surrendering at least some control and data to a service.

A sunny future for clouds and free software together

The architecture I’m suggesting for computing makes free software even
more accessible than the current practice of putting software on the
Internet where individuals have to download and install it. The cloud
can make free software as convenient as Gmail. In fact, for free
software that consumes a lot of resources, the cloud can open it up to
people who can’t afford powerful computers to run the software.

Web service offerings would migrate to my vision of a free software
cloud by splitting into several parts, any or all of them free
software. A host would simply provide the hardware and
scheduling for the rest of the parts. A guest or
appliance would contain the creative software implementing
the service. A sandbox with tools for compilation, debugging,
and source control would make it easy for developers to create new
versions of the guest. And data would represent the results
of the service’s calculations in a clearly documented
format. Customers would run the default guest, or select another guest
on the vendor’s site or from another developer. The guest would output
data in the standardized format, to be stored in a location of the
customer’s choice and resubmitted for the next run.
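
To make that division of labor concrete, here is a hypothetical
sketch in Python. Every name in it is invented for illustration; it
only shows how one run would pass through host, guest, and the
customer’s own data store.

    # Minimal stand-ins for the separated parts; all names are hypothetical.

    class Guest:
        """The appliance: the creative software implementing the service."""
        def process(self, records):
            return {"count": len(records)}        # placeholder computation

    class Host:
        """Provides hardware and scheduling, nothing service-specific."""
        def launch(self, guest_name):
            print(f"scheduling guest {guest_name!r}")
            return Guest()
        def shutdown(self, guest):
            print("instance discarded; the next run starts fresh")

    def run_service(host, guest_name, data_source, data_store):
        """One run: the guest reads from and writes to storage the
        customer chose, in a documented format."""
        guest = host.launch(guest_name)           # default or customer-picked guest
        results = guest.process(data_source["records"])
        data_store["results"] = results           # customer's location, standard format
        host.shutdown(guest)

    source = {"records": [{"name": "Acme"}, {"name": "Example Co"}]}
    store = {}
    run_service(Host(), "crm-service-community-fork", source, store)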

With cloud computing, the platform you’re on is no longer
important. The application is everything and the computer is (almost)
nothing. The application itself may also dissolve into a variety of
mashed-up components created by different development teams and
communicating over well-defined APIs, a trend I suggested almost a
decade ago in an article titled Applications, User Interfaces, and
Servers in the Soup.

The merger of free software with cloud and web services is a win-win.
The convenience of IaaS and PaaS opens up opportunities for
developers, whereas SaaS simplifies the use of software and extends its
reach. Opening the source code, in turn, makes the cloud more
appealing and more powerful. The transition will take a buy-in from
cloud and SaaS providers, a change in the software development
process, a stronger link between computational and data clouds, and
new conventions to be learned by clients of the services. Let’s get
the word out.

(I’d like to thank Don Marti for suggesting additional ideas for this
article, including the fear of creating a two-tier user society, the
chance to shatter the tyranny of IT departments, the poor quality of
source code created for web services, and the value of logging
information on user interaction. I would also like to thank Sascha
Meinrath for the idea of seamless computing for local devices and the
cloud, Anne Gentle for her idea about running test and production
systems in the same cloud, and Karl Fogel for several suggestions,
especially the value of usage statistics for programmers of web
services.)
