> Please don't FUD this one to death. Aligned memory access is
> sometimes important and we currently have no straight-forward
> way to achieve it.
I guess that a simple way to cut the discussion short would be to have a first implementation, and run some benchmarks to measure the benefits.
I can certainly see the benefit of cacheline-aligned data structures in multithreaded code (to avoid false sharing/cacheline bouncing), but I'm really curious to see how much this would help a single-threaded workload.
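
To make the multithreaded case concrete, here is a minimal C11 sketch of what I mean by cacheline-aligned data. This is only an illustration, not a proposed implementation: the 64-byte line size is an assumption (common on x86-64, not universal), and padded_counter, alloc_counters and is_cacheline_aligned are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size: 64 bytes is common on x86-64 but not universal. */
#define CACHELINE 64

/* Hypothetical per-thread counter. alignas pads each element to a full
   cache line, so two threads updating neighbouring counters never write
   to the same line (no false sharing). */
typedef struct {
    alignas(CACHELINE) uint64_t value;
} padded_counter;

/* Each counter occupies exactly one line under the assumption above. */
static_assert(sizeof(padded_counter) == CACHELINE,
              "padded_counter should fill one cache line");

/* Allocate n counters starting on a cache-line boundary using C11
   aligned_alloc. The requested size is a multiple of CACHELINE because
   sizeof(padded_counter) is. Release the result with free(). */
static padded_counter *alloc_counters(size_t n)
{
    return aligned_alloc(CACHELINE, n * sizeof(padded_counter));
}

/* Return 1 if p sits on a cache-line boundary, 0 otherwise. */
static int is_cacheline_aligned(const void *p)
{
    return ((uintptr_t)p % CACHELINE) == 0;
}
```

A benchmark could then compare an array of padded_counter against a plain uint64_t array, with one thread per element, and separately time a single-threaded loop over both layouts to see whether alignment alone changes anything.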