on Zope, multiple cores, and the GIL
But why did the site get faster?
Looking at a munin graph of server activity, I observed a noticeable drop in the number of rescheduling interrupts -- a drop that coincided with my change in server configuration.

I suspect that the "before" portion of this graph illustrates a problem that occurs when multi-threaded Python programs run on multi-core machines: threads running on different cores fight for control of the Global Interpreter Lock (a problem Dave Beazley called to the community's attention in a recent presentation). That contention would explain why performance improved once I switched to multiple processes with fewer threads each. With multiple processes, concurrency is managed by the operating system, which is much better at it.
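To see the effect outside of Zope, here is a minimal sketch that runs the same CPU-bound loop in threads and then in processes. It is not taken from my Zope setup: `count_down`, `timed`, and the iteration count are made up for illustration, and it assumes a standard CPython build with the GIL enabled. Timings vary by machine, so it prints them rather than promising any particular numbers.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    while n > 0:
        n -= 1
    return n


def timed(executor_cls, workers, n):
    """Run `workers` copies of count_down(n) and return elapsed seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_down, [n] * workers))
    # Every worker should have counted all the way down to zero.
    assert results == [0] * workers
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000  # arbitrary workload size, chosen only for illustration
    for workers in (1, 2, 4):
        t_threads = timed(ThreadPoolExecutor, workers, n)
        t_procs = timed(ProcessPoolExecutor, workers, n)
        print(f"{workers} workers: threads {t_threads:.2f}s, "
              f"processes {t_procs:.2f}s")
```

If the GIL explanation holds, the thread timings should grow roughly with the number of workers, since only one thread can execute Python bytecode at a time, while the process timings should stay closer to flat as long as there are idle cores.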
Moral of the story: If you're running Zope on a multi-core machine, having more than 2 threads per Zope instance is probably a bad move performance-wise, compared to the option of running more (load-balanced) instances with fewer threads.
(Using a single thread per instance might be even better, although of course you need to make sure you have enough instances to still handle your load, and you need to make sure single-threaded instances don't make calls to external services which then call back to that instance and block. I haven't experimented with using single-threaded instances yet myself.)