Building ARM Linux from scratch for ILP32 ABI

So… I’m insane. I want to build Linux from scratch - entirely - because I wonder if the ILP32 ABI on ARM is worth some performance. I think it is. And this has been done, but not recently.

If you’re familiar with the x32 ABI, skip the next section.

Typically, OSes are built either for a 32-bit or 64-bit environment. In a 32-bit environment, registers are 32-bit, memory addresses are 32-bit, pointers are 32-bit, etc. In a 64-bit environment, all of those are 64-bit. BUT: That means your pointers are now double the size - even if you’re not using that much memory in a process.

The x32 (x86-64) and ILP32 (AArch64) ABIs run in 64-bit mode, so you get the increased register count, 64-bit-only extensions, bigger math, and typically an improved ISA. BUT: they use 32-bit memory addresses and pointers. So your process only has up to 4GB of virtual address space, but if you use a lot of pointers (which many modern things like browsers do), pointer-heavy data shrinks by up to half, so up to twice as much of it fits in your caches. It’s a thing, and it’s a thing worth quite a bit of performance in some workloads.

Given my hobby of gutless wonder ARM systems, this might be worth some useful performance gains. I’ve seen claims of 20% on ARM, which, given the itty bitty little caches, seems plausible.

The catch: there’s no documented process for this that I can find. And I’m not even sure what will break.

So, I’m considering a Gentoo build on ARM, just to see if I can do it. I may experiment on qemu first just for sanity reasons, but… any advice here other than “Go for it and document it?”


You’re probably already thinking along these lines, but I’d try to script/automate as much of the process as possible so I could tweak/reproduce it later. I’ve had good luck with Vagrant in the past for automating whole-machine (VM) setup, but I know some folks prefer Ansible or similar tools not coupled to the VM orchestration layer.

So…we’re going to get SyonykLinux now? x32/ILP32 pre-compiled ARM distro? Sweet!

Someone logged a bug against it around the kernel audit subsystem last June, so it has been played with somewhat recently.

I honestly wonder if trying to do it on bare metal will be more successful than QEMU as it sounds like this is a rarely travelled path.