Paul M. Jones

Don't listen to the crowd, they say "jump."

Savant Has A New Owner

As many of you know, I've been the lead of many different PHP libraries over the years: Contact_Vcard_Parse, Contact_Vcard_Build, DB_Table, Text_Wiki, and others. As each matured, I handed them over to other maintainers who continued to improve on them and take them to greater heights. Now that time has come for Savant, one of my early and favorite PHP projects.

Due to time constraints, mostly because of my Solar framework project, I haven't been able to pay as much attention to Savant as I think it deserves, so I made the hard decision to put it up for adoption. Lucky for the Savant community, Brett Bieber (aka Salty Beagle) of the PEAR Group picked up on that call right away. Brett is now the steward and lead developer of the Savant Template System for PHP.

The transfer of code, domain names, and hosting is complete, but the transition period might be a bit bumpy, so please bear with us. Brett is committed to "carrying the torch" for Savant (his words). Anyone who wants to help out the new project lead can contact him at "brett.bieber --at-- gmail --dot-- com".

Thanks, Brett, for taking over the project, and good luck!


Ledger's Joker

I plan on writing a much lengthier post about The Dark Knight, and especially about Heath Ledger's portrayal of the Joker. But I want to get this bit out first.

It took me a while to figure out what it is about the Joker in this movie that strikes me as so fascinating and familiar and yet so terrifying, but I think I have it: Ledger takes the intense psychosis of Hannibal Lecter and mixes in the coyote characteristics of Daffy Duck. There's likely a lot more to it than that, but I've seen the movie three times now -- you tell me:

[Images: Hannibal Lecter + Daffy Duck = Ledger's Joker]

(The Joker was always my favorite villain, and Daffy is one of my favorite Warner Brothers characters, so maybe I'm predisposed to pick out similar behaviors.)



Exceptional command-line PHP

(Yes, I know, I've done no blogging in far too long. I've got a stack of stuff to blog about, but it's all rather heavy. In the meantime, here's something light.)

When executing code at the command line using php -r and PHP 5.2.5, be sure not to extend the Exception class. It will cause a segmentation fault.

For example, the following causes no trouble at all:


Samurai:~ pmjones$ php -r "throw new Exception();"
PHP Fatal error:  Uncaught exception 'Exception' in Command line code:1
Stack trace:
#0 {main}
  thrown in Command line code on line 1

But the next example gives a segmentation fault following a long ... pause ... after the stack trace output:


Samurai:~ pmjones$ php -r "class Foo extends Exception {} throw new Exception();"
Fatal error: Uncaught exception 'Exception' in Command line code:1
Stack trace:
#0 {main}
  thrown in Command line code on line 1
Segmentation fault

Note that we didn't even throw the extended Foo exception; we threw the native PHP exception. The mere presence of the extended class is enough to cause the segfault.

It took me two evenings to track this down; what you see here is the simplified generic case. I've entered a bug with the PHP guys.

Update: I thought I was running 5.2.6, but I was wrong; this was occurring on PHP 5.2.5. Note to self: check to make sure you're running the latest version. :-)
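For what it's worth, that version check can be done from inside PHP itself. This is only a minimal sketch: the 5.2.6 threshold is just the version I thought I was running, and nothing here says that release fixes the segfault.

```php
<?php
// Compare the running PHP version against a chosen minimum.
// '5.2.6' is an assumption for illustration; pick whatever version you expect.
$minimum = '5.2.6';

if (version_compare(PHP_VERSION, $minimum, '<')) {
    echo 'Running PHP ' . PHP_VERSION . ", which is older than $minimum\n";
} else {
    echo 'Running PHP ' . PHP_VERSION . ", which is at least $minimum\n";
}
```

From the command line, php -v gives the same information without writing any code.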

Update (2008-08-12): These guys found the problem earlier, too: https://bugs.launchpad.net/ubuntu/+source/php5/+bug/198246.


On Plumbing

Note to self: next time the bathtub won't drain, first check to make sure the plug is open **before** you assume it's clogged and pour 2 1/2 bottles of Drano down the pipe.



Line Length, Volume, and Density

Update: This entry seems to be getting a lot of new attention; welcome! The lessons of line length, volume, and density, along with lots of other good design principles, are applied to the Solar Framework for PHP 5. Be sure to give it a look if you're interested in well-designed PHP code.

When it comes to coding style, there are various ideas about how you should write the individual lines of code. The usual argument is about "how long should a line of code be"? There's more to it than that, though. Developers should also take into account line volume ("number of lines") and line density ("instructions per line").

Line Length

The PEAR style guide says lines should be no longer than 75-85 characters. Some developers think this is because we need to support terminals where lines may not wrap properly, or because some developer screens may not be big enough to show more than that without having to scroll sideways, or because it's tradition, and so on. These reasons may even be accurate in some sense. However, I see the 75-character rule as recognizing a cognitive limitation, not a requirement that can change with available technology.

How many words per line can a person scan, and still be able to grasp the content of the line in the context of the surrounding lines? Printing and publishing typographers figured out a long time ago that most people can read no more than 10 to 12 words per line before they have trouble differentiating lines from each other. (A "word" is counted as five characters on average.) Even allowing for a 25% to 50% increase, that brings us up to 15 words. Times 5 characters per word, that means 75 characters on a line.

So the style guide limitation on line length is not exactly arbitrary. It is about the developer's ability to effectively scan and comprehend strings of text, not about the technical considerations of terminals and text-editors.
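If you want to see where a file breaks the 75-character guideline, a few lines of PHP will do it. This is a rough sketch, not part of any style-checking tool; findLongLines() is a name I made up for illustration, and the sample source is only test input.

```php
<?php
// Hypothetical helper: return an array of (1-based line number => length)
// for every line longer than $limit characters.
// strlen() counts bytes, which is fine for plain ASCII source code.
function findLongLines($source, $limit = 75)
{
    $long = array();
    $lines = preg_split('/\r\n|\r|\n/', $source);
    foreach ($lines as $index => $line) {
        $length = strlen($line);
        if ($length > $limit) {
            $long[$index + 1] = $length;
        }
    }
    return $long;
}

// Sample input: one short line and one over-long line.
$source = "\$foo = Zim::getVal('foo');\n"
    . "list(\$foo, \$bar, \$baz) = array(Zim::getVal('foo'), "
    . "Dib::getVal('bar'), Gir::getVal('baz', Gir::DOOM));\n";

foreach (findLongLines($source) as $number => $length) {
    echo "line $number is $length characters long\n";
}
```

The limit is a parameter on purpose: the point is the cognitive ceiling, and 75 is the default, not a law.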

Line Volume and Density

Some developers believe you should put as much code as possible on a single line, to reduce line-count. They say this makes the code read more like a "sentence". In doing so, these developers trade line "volume" for line "density" (or line "complexity").

Increasing the density of a line tends to make it less readable. Lines of code are generally lists of statements, not natural-language prose. If you put a lot of instructions on a single line of code, that tends to make it harder for other developers to decipher the logical flow.

Examine the following:

list($foo, $bar, $baz) = array(Zim::getVal('foo'), Dib::getVal('bar'), Gir::getVal('baz', Gir::DOOM));

(Yes, I have actually seen code like this. Only the identifier names have been changed.)

Now compare that to the following equivalent code:

$foo = Zim::getVal('foo');
$bar = Dib::getVal('bar');
$baz = Gir::getVal('baz', Gir::DOOM);

When I showed this rewrite to the initial developer, his complaint was: "But it's more lines!".

Increasing line volume ("more lines") and reducing line density does three things:

  1. It reduces line length to make the code more readable.

  2. Making it more readable makes the intent of the code more clear. The logical flow is easier to comprehend.

  3. In this particular case, it may be faster than the original one-liner, because it drops the list() and array() constructs and the temporary array they require. True devotees of the Zend Engine will be able to say for certain if this translates into faster bytecode execution. (I am not a fan of speed for its own sake, but in this case it would be good gravy over the meat of the above two points.)
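For anyone who wants to try the third point rather than take it on faith, here is a rough sketch. The Zim, Dib, and Gir classes below are stand-ins I invented so the code runs on its own; the real ones were never shown. The timings are only indicative and will vary by machine and PHP version.

```php
<?php
// Hypothetical stand-ins for the Zim, Dib, and Gir classes.
// They just echo back their arguments so the two forms can be compared.
class Zim
{
    public static function getVal($key)
    {
        return "zim-$key";
    }
}

class Dib
{
    public static function getVal($key)
    {
        return "dib-$key";
    }
}

class Gir
{
    const DOOM = 'doom';

    public static function getVal($key, $mode = null)
    {
        return "gir-$key-$mode";
    }
}

// Dense form: one statement, three calls, one temporary array.
function denseForm()
{
    list($foo, $bar, $baz) = array(Zim::getVal('foo'), Dib::getVal('bar'), Gir::getVal('baz', Gir::DOOM));
    return array($foo, $bar, $baz);
}

// Less dense form: one call per line, no temporary array.
function sparseForm()
{
    $foo = Zim::getVal('foo');
    $bar = Dib::getVal('bar');
    $baz = Gir::getVal('baz', Gir::DOOM);
    return array($foo, $bar, $baz);
}

// Time a function over many iterations.
function timeIt($callable, $iterations = 100000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callable);
    }
    return microtime(true) - $start;
}

var_dump(denseForm() === sparseForm()); // both forms yield the same values
printf("dense:  %.4f s\n", timeIt('denseForm'));
printf("sparse: %.4f s\n", timeIt('sparseForm'));
```

Whatever the numbers say on your machine, the readability argument stands on its own.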

In reducing line density, you don't have to make one line correlate with a single statement (although usually that's a good idea). Here's another way to rewrite the original example, this time as a single statement across multiple lines:

list($foo, $bar, $baz) = array(
    Zim::getVal('foo'),
    Dib::getVal('bar'),
    Gir::getVal('baz', Gir::DOOM)
);

I find this less readable than the initial rewrite, but the principle is the same: more lines, but shorter, to improve readability.

Balancing Considerations

If shorter lines are better, does that mean lines should be as short as technically possible?

$foo
=
Zim::getVal(
'foo'
);

$bar
=
Dib::getVal(
'bar'
);

$baz
=
Gir::getVal(
'baz'
,
Gir::DOOM
);

It looks like the answer is "no". The line-volume vs. line-density argument is about readability and comprehension. The above example, while absurd, helps to show that overly-short lines are as difficult to read as over-long ones.

Developers with good style balance all the considerations of line length, volume, and density. That means they write lines of code no more than about 75 characters long, but not so short as to increase line volume without need. They also show attention to line density for reasons related to cognition and comprehension, not merely technical syntax.


Why I Prefer Test-Later

I remain unconvinced of the benefits of test-first and test-driven development (TDD) because I think the underlying principles of TDD are lacking, not because of the way TDD adherents talk about those principles. I believe I understand the test-first adherents very well, and I disagree with them.

I agree with Travis that many TDD and test-first devotees suffer from what Travis calls “expertise syndrome”. I think almost anyone who has a great deal of sophisticated and nuanced knowledge on a particular topic must actively look out for and modify the kind of behavior Travis describes.

But neither the “expertise syndrome” nor the sometimes religious zeal of many TDDers is the cause of my ambivalent feelings about test-first. Instead, I simply fail to see that TDD is really as useful as its adherents anecdotally proclaim. This is not because their communication skills are lacking in some way, but because I find the idea of TDD itself to be lacking.

I have various reasons for thinking test-first is not the fantastic solution its devotees claim. Chief among my reasons is this: a programmer not already familiar with the domain of a non-trivial problem generally doesn’t know enough to write meaningful tests in advance. The act of writing the code as a solution will train the programmer in the domain. He can write more useful tests afterwards to make sure future changes to the code do not break existing behaviors.
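As a small illustration of what I mean by writing the tests afterwards, here is a sketch. The slugify() function is a made-up example, not code from any project of mine: it stands for something written first, while exploring the problem, with the test cases added later to pin down the behaviors discovered along the way.

```php
<?php
// Hypothetical function, written first while exploring the problem.
function slugify($title)
{
    $slug = strtolower(trim($title));
    // Collapse every run of non-alphanumeric characters into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

// Written afterwards: record the known-good behaviors so that future
// changes to slugify() cannot silently break them.
function testSlugify()
{
    $cases = array(
        'Savant Has A New Owner' => 'savant-has-a-new-owner',
        '  Line Length, Volume, and Density ' => 'line-length-volume-and-density',
        'Why I Prefer Test-Later' => 'why-i-prefer-test-later',
    );
    $failures = array();
    foreach ($cases as $input => $expected) {
        $actual = slugify($input);
        if ($actual !== $expected) {
            $failures[] = "$input: expected $expected, got $actual";
        }
    }
    return $failures;
}

$failures = testSlugify();
echo $failures ? implode("\n", $failures) . "\n" : "all cases pass\n";
```

The cases with stray whitespace and punctuation are the kind you tend to find only after working through real input.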

The Map Is Not The Territory

I think test-first is not likely to help most programmers understand the problem any better — test-first is only going to tell the programmer if his code works the way he thinks it should, not if it solves the problem he is addressing.

This is because the map is not the territory. That is, a “paper abstract” map of the problem is not the same as the “physical concrete” territory of the problem. The map may provide useful hints, but until you have explored the territory, your understanding of reality is severely handicapped regardless of how smart, intelligent, or clever you think you are.

Test-first assumes the map of the problem is sufficient. In reality, unexplored assumptions and unexpected interactions abound. Test-later depends on you having explored the territory of the problem, becoming familiar with the reality of the situation by working through it yourself and gaining personal experience with it. Your solutions will be all the better for it, and your tests will then reflect the real territory, not the abstract map.

Thus, I believe test-later is more accurate and more useful, as well as a more effective use of your limited time when checking solutions in code, because it is based on practical experiment, not hypotheticals.

Testing Is Useful

Testing is useful and necessary. Unit, system, and integration testing are great tools to help find, remove, and prevent errors in working code. Still, I do not think that test-first is any better than test-later, especially for programmers who have good self-management and self-discipline.

David Sklar has said in regards to testing that “tools are secondary, discipline is primary” (see slide 40). I heartily agree. In this sense, I think for programmers who have good discipline, test-first might well be a good tool. But test-later is just as good a tool, and perhaps a better one.

Similarly, with programmers who have mediocre or poor self-discipline, test-first and TDD may help them produce better code … but the quality of that code will still be lower than that generated by good programmers who take the time to explore the problem domain.

It appears that at least one bit of research supports my otherwise unproven assertions in this regard. Or, rather, the debunking of one bit of research.

  • The control group (non-TDD or “Test Last”) had higher quality in every dimension: they had higher floor, ceiling, mean, and median quality.

  • The control group produced higher quality with consistently fewer tests.

  • Quality was better correlated to number of tests for the TDD group (an interesting point of differentiation that I'm not sure the authors caught).

  • The control group's productivity was highly predictable as a function of number of tests and had a stronger correlation than the TDD group.

So TDD's relationship to quality is problematic at best.

Having a big clump in that upper left quadrant is troubling enough but then having the “Test Last” group almost double your “Test First” group in the over 90% quality range is something that should be noticed and highlighted.

While correlation doesn't equal causation, the lack of correlation pretty much requires a lack of causation.

Read the whole article for yourself, as well as the original report, and draw your own conclusions.



Another Smarty Emigrant

Hasin Hayder has finally realized that there's no need for Smarty's template language any more (even after writing a book about Smarty).

Harry Fuecks and Brian Lozier reached the same conclusion four years ago, and based on those articles, so did I. It turns out that Smarty tries to solve (mostly) the wrong problem.

You may have heard that you need to keep your PHP and HTML separated, but that's not quite the case. Instead, what you need is to keep your "business logic" separate from your "presentation logic", and that's a different thing entirely.

Thus, all that's required is a way to keep your views and controllers separated, and perhaps provide helpers for common view tasks. Then you can use plain PHP in your view scripts (templates), without needing a whole new language.
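To make that concrete, here is a bare-bones sketch of a plain-PHP view. It is not the Savant, Zend_View, or Solar_View API; the e() helper and renderPostList() function are names invented for this example, and in a real project the view would live in its own template file.

```php
<?php
// Hypothetical view helper: escape a value for HTML output.
function e($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Presentation logic: a view written in plain PHP, using the
// alternative control-structure syntax. No template language needed.
function renderPostList(array $posts)
{
    ob_start();
    ?>
<ul>
<?php foreach ($posts as $post): ?>
    <li><?php echo e($post['title']); ?></li>
<?php endforeach; ?>
</ul>
<?php
    return ob_get_clean();
}

// Business logic: the controller side decides what data the view gets.
// The titles are sample data; the ampersand is there to show escaping.
$posts = array(
    array('title' => 'Savant Has A New Owner'),
    array('title' => 'Line Length, Volume & Density'),
);

echo renderPostList($posts);
```

The view only loops and echoes; deciding which posts to show stays on the controller side. That is the separation that matters.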

That line of thinking led me to write Savant first in 2004 (and versions 2 and 3 later), along with its more-recent ideological descendants Zend_View and Solar_View. Even the Cake and Symfony guys got the point early on (and -- dare I say it? -- the Rails crowd as well with eRB templates).

It's so funny to see the comments in Hasin's article; all the same old tired arguments come up. Here's the deal: if you think you need to protect your business logic from your graphic designer, you don't have a technical problem, you have a hiring/management/training problem.

Update (2008-01-11): Apparently some readers thought "this guy" in the first paragraph meant me; changed it to "Hasin Hayder" to be more explicit. (I'm looking at you, Kimsal.)