July 2008

Forms on the Web and the Missing Stubs

I have changed my mind a lot over the years about web development. I think I have reached another one of those points of inflection, thanks to incredibly bright folks like Simon Stewart, Dan Worthington-Bodart, Jim Webber and George Malamidis. Unfortunately, it took me a lot longer than they did to figure this out, but at least I’m writing about it :)

About a month ago, a trip to the Brazilian Consulate General to renew my passport made a few things click. We talk a lot about forms on the web, but it’s rare that I get to fill in a form in real life. It’s a very different experience, and while it’s painful in some respects, there are lessons to be learned.

The process goes like this: you queue up at the first booth, where an attendant asks what service you require and gives you a form to fill in and a coloured piece of paper with a number on it. They call the number on that stub when it’s your turn to be seen. When called, you present the stub, the form and any necessary supporting documentation to another attendant, who gives it all a good check and tells you to go over there to pay a fee. Again, you get called by the number, pay the fee, come back, and the attendant checks the receipt. She then decides that your application should be processed, staples the stub to another receipt and tells you to come back in a few days. When you come back, you present the stub, and they hand you the passport.

That tiny little piece of paper is the essential thing we’ve missed on the web. As an example, I’ll use what Rails and Merb generate in the RESTful scaffolding. In this case, you get the magic 8 CRUD actions (that’s Merb’s count; Rails leaves out delete and generates the other seven):

  • index
  • new
  • create
  • show
  • edit
  • update
  • delete
  • destroy

I’m really interested in new, here. Digging a little deeper, you’ll see:

Looks reasonable. Let’s try it out:

This would be the equivalent of being handed a form to fill in in real life… but all I got was the form. Where’s the stub? How is the application on the other side going to know I’m talking about the same interaction?

You could argue that that’s the exact reason why the cookie is there, but the cookie doesn’t represent this particular interaction. It represents my browser’s (or other HTTP agent’s) interaction with the whole app. In real life, I couldn’t use the same stub to also fill in my tax returns; I’d have to get another one, probably of a different colour, even. I need something that the server can use to track this particular form being filled in, for reasons I’ll discuss later.

One quick and easy solution to this is to add a UUID to that form. UUIDs are, for all practical purposes, unique, and are pretty cheap to generate. So cheap, in fact, that there’s no reason not to slap one on the form itself:
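As a rough sketch of the idea, and not what the Rails or Merb scaffolding actually emits, a `new` action could mint a UUID and embed it in the blank form as a hidden field. The helper name `new_passport_application_form` and the field names `stub` and `full_name` are made up for illustration; `SecureRandom.uuid` is from Ruby's standard library.

```ruby
require "securerandom"

# Hypothetical helper: renders a blank passport application form and
# stamps it with a fresh UUID, the web equivalent of the paper stub.
# The `stub` parameter only exists so a known value can be passed in.
def new_passport_application_form(stub = SecureRandom.uuid)
  <<~HTML
    <form action="/passport_applications" method="post">
      <input type="hidden" name="stub" value="#{stub}">
      <input type="text" name="full_name">
      <input type="submit" value="Apply">
    </form>
  HTML
end

# Every call hands out a form carrying a different stub.
puts new_passport_application_form
```

When the form is posted back, the server sees the same `stub` value it handed out, so it can tie the submission to that particular blank form rather than to the browser session as a whole.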

This allows us to track the entire process of filling in a web version of my little passport application workflow. In HTTP-speak, that workflow would be something like:

  • GET /passport_applications/new.xml (200 OK)
  • POST /passport_applications (201 Created)
  • GET /fee_payments/new.xml?for=09711c30-40d5-012b-3f7b-001ec212da96 (200 OK)
  • POST /fee_payments (201 Created)
  • PUT /passport_applications/09711c30-40d5-012b-3f7b-001ec212da96 (202 Accepted)
  • GET /passport_applications/09711c30-40d5-012b-3f7b-001ec212da96 (200 OK)
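The exchange above can be mimicked with a small in-memory model, just to show how the UUID threads through every step. This is only a sketch: the `PassportOffice` class and its method names are hypothetical, nothing here speaks real HTTP, and the 409 returned when the fee is still unpaid is my own assumption, not part of the workflow listed above.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for the server side of the workflow.
# Every application is keyed by the UUID that was stamped on its form.
class PassportOffice
  def initialize
    @applications = {}
  end

  # GET /passport_applications/new: hand out the stub for a blank form.
  def new_form
    SecureRandom.uuid
  end

  # POST /passport_applications: file the completed form under its stub.
  def create(stub, fields)
    @applications[stub] = { fields: fields, fee_paid: false, status: :submitted }
    201
  end

  # POST /fee_payments: record a payment for the application named by stub.
  def pay_fee(stub)
    application = @applications.fetch(stub) { return 404 }
    application[:fee_paid] = true
    201
  end

  # PUT /passport_applications/:stub: accept for processing once paid.
  # The 409 for an unpaid fee is an illustrative choice.
  def accept(stub)
    application = @applications.fetch(stub) { return 404 }
    return 409 unless application[:fee_paid]
    application[:status] = :processing
    202
  end

  # GET /passport_applications/:stub: unknown stubs simply get a 404.
  def show(stub)
    @applications.key?(stub) ? [200, @applications[stub]] : [404, nil]
  end
end

office = PassportOffice.new
stub = office.new_form
office.create(stub, { "full_name" => "Maria" })
office.pay_fee(stub)
office.accept(stub)
```

Note that the server never needs a session to follow the application from one booth to the next; the stub alone identifies it.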

A benefit to using a UUID to identify resources is already evident here: provided they are generated randomly (version 4) rather than from a timestamp, they are practically unguessable, so there’s no problem in using them on URLs for privacy-sensitive documents, as it is extremely unlikely that potential attackers would be able to hit arbitrary UUIDs and get to something other than a 404 Not Found.

Another benefit is that UUIDs also work really well as artificial primary keys in relational databases. SQL Server, Oracle, MySQL, PostgreSQL and most other RDBMSs support some UUID type, or have a UUID function. This means we don’t need sequential IDs on our tables, and while UUIDs need a little extra storage, the upside is that the database doesn’t have to perform expensive synchronization on sequences. If you are not using an RDBMS and need that extra little bit of cheap scalability, non-relational stores such as Amazon SimpleDB, CouchDB, HBase and Google BigTable also love UUIDs.

So what kinds of cool stuff can you do if you buy some more storage and collect data about every step of a user interaction, even when that interaction wasn’t successful? Imagine that every time the number on my stub got called and I talked to the attendant, she also took a photocopy of my form and documents before handing them back. What could be done with that data, given some spare cycles?

I’m sure others will have many more interesting ideas, but the one that jumps to mind immediately is being able to see how long your forms are taking to complete and exactly at which step people trip on common validation mistakes. That data can answer questions like “is it worth adding some JavaScript that checks the email format of this field on the spot?” and “after signing up and logging in, what is the first thing my users do?”, and it can answer them with a lot more detail and accuracy than you would get by trawling the HTTP server logs or adding something like Google Analytics to your pages.
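To make that concrete, here is a minimal sketch of the kind of report those "photocopies" would allow, assuming every submission attempt is stored against the form's UUID. The `Attempt` record, its fields and the sample data are all invented for illustration.

```ruby
# Hypothetical log of every submission attempt, keyed by the form's UUID.
# `at` is seconds since the form was handed out; `errors` lists the fields
# that failed validation on that attempt (empty means it went through).
Attempt = Struct.new(:stub, :at, :errors)

# Seconds each form took from being handed out to its first clean submission.
def completion_times(attempts)
  attempts.select { |attempt| attempt.errors.empty? }
          .group_by(&:stub)
          .transform_values { |clean| clean.map(&:at).min }
end

# How many times each field tripped validation, most troublesome first.
def error_counts(attempts)
  attempts.flat_map(&:errors)
          .tally
          .sort_by { |_field, count| -count }
          .to_h
end

# Made-up sample data: two forms, five attempts in total.
attempts = [
  Attempt.new("form-a", 40, [:email]),
  Attempt.new("form-a", 95, []),
  Attempt.new("form-b", 30, [:email, :birth_date]),
  Attempt.new("form-b", 70, [:email]),
  Attempt.new("form-b", 120, [])
]

p completion_times(attempts)
p error_counts(attempts)
```

With this sample data the email field fails three times and the birth date once, which is exactly the sort of evidence that would justify an on-the-spot email check.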

Suppose that you discovered that quite a few of your users are having trouble paying the fee: they hadn’t been told how much it was, and they had no cash at hand! You could then work out a solution, from the simplest (putting up a list of fees near the entrance) to the most complete (accepting credit and debit cards and putting a cash machine next to the booth). You could even let the process happen asynchronously: users can choose to pay when they come back to get their new passport if it’s more convenient, for example. And, best of all, it’s perfectly possible to do these things while being really nice to HTTP servers, proxies, caches and other bits of the infrastructure of the web. It’s really what REST is about, building and playing nice with the web’s infrastructure… isn’t it?



A Look at Rails’ Complexity

I’ve always been trying to understand a little bit more about project life-cycles: how and when do you consider a codebase mature? When does it become considered “legacy” code that people would rather rewrite than fix?

So, equipped with git-iterate, I ran flog over the Ruby on Rails code, going back to its first commit. In four years, it went from the loudly announced, opinionated web framework that was promising to take over the world to something that actually accomplished it: tons of start-ups are using it all over the world, there’s a thriving market for jobs and books, and it brought Ruby adoption into the near-mainstream while at it.

Meanwhile, the complexity of its code steadily increased: from roughly 16k flog points to ~95k, which is quite a big jump. I was tempted to cry “bloatware!” when, investigating further, I noticed a plateau around four-fifths of the way along the chart: the tests were increasing, but the code was actually stable or becoming simpler. New features were being added and bugs fixed without bloating it up!

That’s what I wanted to see in a mature and stable codebase: people begin finding analogies and abstractions that fit the solution a bit better, and small improvements pile up until you see most code size or complexity metrics decrease. It’s still no guarantee that the right problems are being solved, but at least it suggests that the problem it is solving is being solved well.

You might be asking yourself what happens towards the end of the chart, where there’s a big spike. Near the end of March, Geoff Buesing added an abbreviated version of the TZInfo gem, which is one of the steps taken toward getting timezone support going, one of the biggest new features in Rails 2.1. While it could be considered bloat, not including a dependency on another gem keeps things a lot easier for a bunch of people (plus, if you happen to have the tzinfo gem with a more recent version, it’ll just use that).

In conclusion, good work Railers!


Git Iterator

I wanted to generate some visualizations of our project’s growth, so I decided to put together a little shell script that looked at the output from git log to spit out some metrics.

So git-iterate was born: run anything through your entire project’s history, and get the results in something easily converted into a beautiful chart!

It does that by running git-reset --hard $COMMIT for every commit in the repository, and then calling the script given to it as the first argument. It passes the commit ID to the script too, so this:


git-iterate echo

…will generate a list of all your commit IDs, most recent last.
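The real git-iterate is a shell script; the loop it performs can be sketched in Ruby along these lines. This is a paraphrase under assumptions, not the actual implementation: the function names and the injectable `run` lambda are mine, and since `git reset --hard` throws away uncommitted changes, anything like this should only be pointed at a throwaway clone.

```ruby
# Oldest-first list of commit IDs reachable from the current HEAD.
def commit_ids
  `git rev-list --reverse HEAD`.split
end

# For each commit: hard-reset the working tree to it, then run `command`
# with the commit ID appended as its last argument. `commits` and `run`
# are injectable (an assumption of this sketch) so the loop can be
# exercised without touching a real repository.
def git_iterate(command, commits: commit_ids, run: ->(cmd) { system(cmd) })
  commits.each do |commit|
    run.call("git reset --hard --quiet #{commit}")
    run.call("#{command} #{commit}")
  end
end

# Dry run: print the commands instead of executing them.
git_iterate("echo", commits: %w[abc123 def456], run: ->(cmd) { puts cmd })
```

Because the commits are visited oldest first, the working tree ends up back at the most recent commit once the loop finishes.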

The code is on GitHub, as usual. I’m running a few stats on some projects I have access to, and will upload a few charts as soon as they’re ready. Meanwhile, feel free to send me the output of this:


git-iterate 'echo `flog app` `flog spec`' # if needed, replace "spec" with "test"

…and I’ll chart those for comparison as well. Also, it shouldn’t be difficult to port git-iterate to other source control systems (all of them have a checkout command, right?) and, if you do that, make sure to plug it in the comments.

Have fun! :)


Update: fresh out of the oven, here’s Rails’ total lines of code. Neat, huh?


Networks Are Smart at the Edges

A toothpaste factory had a problem: they sometimes shipped empty boxes, without the tube inside. This was due to the way the production line was set up, and people with experience in designing production lines will tell you how difficult it is to have everything happen with timings so precise that every single unit coming out of it is perfect 100% of the time. Small variations in the environment (which can’t be controlled in a cost-effective fashion) mean you must have quality assurance checks smartly distributed across the line so that customers all the way down at the supermarket don’t get pissed off and buy someone else’s product instead.

Understanding how important that was, the CEO of the toothpaste factory got the top people in the company together and they decided to start a new project, in which they would hire an external engineering company to solve their empty boxes problem, as their engineering department was already too stretched to take on any extra effort.

The project followed the usual process: budget and project sponsor allocated, RFP, third-parties selected, and six months (and $8 million) later they had a fantastic solution: on time, on budget, high quality, and everyone in the project had a great time. They solved the problem by using some high-tech precision scales that would sound a bell and flash lights whenever a toothpaste box weighed less than it should. The line would stop, and someone had to walk over and yank the defective box out of it, pressing another button when done.

A while later, the CEO decides to have a look at the ROI of the project: amazing results! No empty boxes had shipped out of the factory since the scales were put in place. There were very few customer complaints, and they were gaining market share. “That’s some money well spent!”, he says, before looking closely at the other statistics in the report.

It turns out the number of defects picked up by the scales had dropped to 0 after three weeks of production use. It should’ve been picking up at least a dozen a day, so maybe there was something wrong with the report. He filed a bug against it, and after some investigation, the engineers came back saying the report was actually correct. The scales really weren’t picking up any defects, because all boxes that got to that point in the conveyor belt were good.

Puzzled, the CEO travels down to the factory and walks up to the part of the line where the precision scales were installed. A few feet before them, there was a $20 desk fan, blowing the empty boxes off the belt and into a bin.

“Oh, that — one of the guys put it there ’cause he was tired of walking over every time the bell rang”, says one of the workers.


Brown M&M Stories

In the early ’80s, Van Halen figured out a way to make sure everything in a long list of technical specifications was implemented correctly, or at least well understood:

So just as a little test, in the technical aspect of the rider, it would say, (…) in the middle of nowhere: “There will be no brown M&M’s in the backstage area, upon pain of forfeiture of the show, with full compensation.”

I’m left wondering if sometimes I encounter user stories that are just like that: “if this little thing over here doesn’t work by release X, it’s time to review the whole project.” It’s a way of failing fast, without the burden of double-checking the whole lot. The problem is, finding one of these in a project where customer collaboration comes before contract negotiation, to sound a little dogmatic here for a second, is obviously a sign that trust has been broken somewhere down the line and needs to be reestablished.

So, just as Van Halen considered finding brown M&Ms in the backstage area life-threatening, finding a “no brown M&Ms” story in your backlog should be a warning that the customer isn’t involved enough, or doesn’t trust the development team. It’s probably not life-threatening, but it is a very high-priority risk.
