Battle of the Boards 2022: Pi4 vs N2+ vs PineBook Pro

Best read here: https://www.sevarg.net/2022/01/08/battle-of-the-boards/

Every now and then, I do something I call a “Battle of the Boards.” It’s a head to head benchmark competition between the various small board computers I have laying around, with the goal of figuring out if there are meaningful differences in performance between them, and, if so, which one is the best option at the point I’m testing them. I run a range of tests, and I try to stick to things that reflect how a lot of people use their computers - so, browser benchmarks matter, kernel builds are at least somewhat less interesting. At this point in my life, I consider memory bandwidth a useful proxy for performance (the Apple M1 chips are utterly insane here), and I just hate waiting on disk IO - so that comes into play as well.

The goal is to figure out, in real world situations, which of the boards is best - or, at least, which is worth the money.

This Year’s Boards

This year’s competition includes the following:

  • Raspberry Pi 4
  • ODroid N2+
  • PineBook Pro
  • An old Core 2 Duo MacBook I’ve got around for utility work

I’ve dropped the Pi3 out of the mix, because it’s just a lot slower than the Pi4, and there’s no good reason to buy one unless you’re fitting into something that requires the form factor. I no longer have any Jetson Nanos, so those are out as well. And, unfortunately, some of these devices are going to be a little bit hard to find - supply chains and all.

The Raspberry Pi 4

If you read this blog, you’re almost certainly familiar with the Raspberry Pi 4. It’s the latest in the Raspberry Pi Foundation’s $35 single board computers (though they’re no longer $35 if you want serious RAM - and I’m glad they’ve gone that way!). It’s got a quad core A72 cluster running at 1.5GHz, though the Raspberry Pi 400 clocks it at 1.8GHz on a slightly different chip revision. I’m testing at both speeds, because 1.8GHz is 20% faster than 1.5GHz, and is a sort of “daily driver” overclock for the Pi4. I’ve previously done testing at 2GHz, and it will run up there, but my personal Pi4 started being flakey at 2.0, even with 0.1V overvolt, so I backed it off.

The Pi4 has an improved SD card controller over the Pi3 and previous, as well as USB 3. They’re well supported, can usually be found in stock somewhere in some configuration, and represent the “baseline small board computer” that most people think of in this space. OS support is excellent, and I’ve got one as my light office utility desktop.

The ODroid N2+

In the “beefier than a Pi4” category of small board computer, ODroid has been making a range of boards for a while. Some are really well suited to network attached storage devices, some are more general purpose. The N2+ is the fastest of the general purpose boards, rocking a six-core SoC with 4x A73 cores at the “stable overclock” of 2.4GHz, and a pair of A53 cores at 2.0GHz. Annoyingly, the ODroid only comes with up to 4GB RAM - so that’s a bit of a downer compared to the 8GB Pi4 you can get, though I’ve not found it to be a huge difference in real world use. For storage, it supports an SD card, an eMMC module, and USB 3. It’s powered by a barrel plug (thankfully - I hate microUSB as a power supply), and I’ve got one churning away as one of my office desktops, having replaced an M1 Mac Mini. OS support is “community,” but it’s not hard to find and install OSes, and there’s a firmware that includes a rather fun little net installer capability for the common Linuxes.

The PineBook Pro

The PineBook Pro is the nicer Pine64 laptop offering - and, of course, is entirely out of stock. This is a $200 laptop (probably more if you can get one) with a pair of A72 cores at 2.0GHz, and a set of 4x A53s at 1.5GHz. Like the N2+, it’s limited to 4GB of RAM, and also has onboard eMMC storage. I’ve had mine for a while, and have been daily driving it as a house laptop for the past several months with great success. Pine64 finally got their trackpad OEM to release a fixed firmware that isn’t horrible, so now the trackpad is “adequate and usable” instead of “insanely frustrating to use.” Battery life is insanely good (8-10 hours typically), and for $200, it’s an awful lot better than any little laptop has any right to be! Mine is running a newer DRAM training algorithm that runs the DRAM at 856MHz instead of the normal 800MHz, so there’s a bit of a performance boost from that.

I’ll be doing some more in depth reviews of the N2+ and PBP in the months to come, if you’re curious for more details. Here, though, I focus on the performance! But, seriously, $200, and a matte screen.

The MacBook5,1 Core 2 Duo

Finally, because it’s interesting to compare things with x86 hardware, I’ve got an aluminum unibody MacBook thrown into the blend. This is my attempt at an admin laptop to replace Clank (for general network configuration and such), and is dual booting Windows 10 and Linux. It’s purring away at 2.0GHz, with two cores of fury. Not fast by modern standards, but was a respectable laptop a decade ago.

Browser Benchmarks

I’ll start out with the browser benchmarks, because browsing makes up a lot of how most people use computers these days, and as the modern web gets heavier and heavier, with more and more CPU overhead, it really chews up the CPU. These tests are all with Chromium, with versions in the 90s. It’s more trouble than it’s worth to try and get identical versions on everything, so I’ve just gone with “What’s up to date on the system at the time I run the tests.” The version differences aren’t large enough to meaningfully change the results.

Starting out, let’s look at SunSpider - an older Javascript benchmark. In this test, the results are in milliseconds, so lower is better. There’s a nice improvement in the Pi4 performance going from 1.5GHz to 1.8GHz, but it’s still the slowest in the tests. The PineBook Pro comes in a bit faster than the 1.8GHz Pi4, but the ODroid dominates the little ARM board tests, performing nearly dead even with a Core 2 Duo.

Moving on to Octane 2.0, where higher numbers are better, the N2+ takes the lead, with the MacBook coming in second. Once again, the performance delta between the two Pi4 speeds is obvious, and the PBP settles in the middle.

With a more modern benchmark (SunSpider and Octane are both considered obsolete), JetStream 2 returns… well, the exact same pattern as Octane did. The N2+ slightly outperforms the MacBook, with the PBP following, then the Pi4.

Finally, Speedometer 2.0 tests general web responsiveness across a range of frameworks. Here, the MacBook on Intel takes the lead, with the N2+ coming in second, and then the PBP and Pi4 ending up at roughly the same performance levels.

In terms of “practical daily use,” these browser benchmarks don’t surprise me in the slightest. The N2+ regularly feels the quickest in general web use. The PBP is also entirely acceptable, and the Pi4, while being a huge improvement over the Pi3B+ I’ve used in the past, will take more time than either of the others to load something complex.

How do these compare to a modern-ish desktop? For comparison, here are the results from an i7-8700K on Linux, as compared with the N2+:

  • SunSpider 1.02: 188.7ms (2.61x)
  • Octane 2.0: 46438 (3.39x)
  • JetStream 2: 135.912 (2.91x)
  • Speedometer 2.0: 144 (4.16x)

The Intel chip is clearly faster, but… not by an awful lot. And these boards are sub-$100 boards, with the exception of the PBP, which is a $200 laptop. New. If you demand the snappiest, fastest, most responsive performance from your browser, any of these systems will probably frustrate you, but I find all of them entirely usable on the modern web. I just don’t expect instant response. Of course, with my ISPs, that won’t happen anyway.

Memory Bandwidth

One of the things I’ve been learning over the years is that, as a general rule of thumb, performance corresponds to memory bandwidth (and cache). It’s not hard to starve a modern CPU of data with a general purpose workload, so the more cache the chip has, and the faster the memory bandwidth, the better it will perform. Here, I’ve used the mbw benchmark tool, which does a range of “used by real software” style memory copies to estimate bandwidth. This is well out of cache and into DRAM, and the N2+ dominates, with the PBP coming in second. What’s interesting here is just how poorly the Core 2 Duo performs, while still dominating browser tests. I like big cache and I cannot lie, and the C2D has an awful lot more cache than most of the other systems here. Of course, the M1, which I will no longer use, has more cache than all of them combined…

7Zip Benchmark

The 7Zip tool comes with a built in benchmark mode, and whatever the N2+ is doing, 7Zip really likes it. It dominates the calculated performance for 7Zip, with, no surprise, the PBP following, and the Pi4 coming in behind. Interestingly, in this test, the Core 2 Duo frequently comes in last, and there’s no real difference between the Pi4 at different clock speeds. I’m not sure exactly how this test reflects reality outside the 7Zip world, but it continues to demonstrate that the N2+ is, by a good margin, the fastest little ARM SBC in the tests.

FLAC Decode/Encode

Continuing with a new test I started for the Pi 32/64-bit benchmarking, I’ve got a FLAC decode/encode cycle here. One gotcha on this test is that it can easily become IO bound - the data sizes are large enough that Linux will start blocking on disk writes before the decode is done, and that isn’t useful for CPU benchmarking. I’ve mentioned vm.dirty_ratio on the blog before in the context of small board performance, and here, it’s worth setting it very high - to 80 (percent) or so. This allows up to 80% of the system RAM to be “dirty” (needing writeback to the disk) pages, and effectively improves write performance by a lot - as long as you don’t mind the risk of an awful lot of data loss in the event of a power failure. For benchmarking? It’s fine! For daily use? It’s probably fine. Use sync if you care. I tend to leave it around 50 on daily driver SBCs and haven’t had problems, but I also have quite reliable power in my office (rather more reliable than the house).
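Setting this on the fly is a sysctl one-liner; here’s a minimal sketch, assuming a systemd-style distro (the /etc/sysctl.d path and the file name are examples, not anything from the post):

```shell
# Check the current writeback thresholds (defaults are often 20/10).
sysctl vm.dirty_ratio vm.dirty_background_ratio

# Raise the ceiling for benchmarking, so large writes buffer in RAM
# instead of blocking on the SD card / eMMC.
sudo sysctl -w vm.dirty_ratio=80

# To make a more conservative daily-driver setting persist across
# reboots (file name is an example):
echo "vm.dirty_ratio = 50" | sudo tee /etc/sysctl.d/99-dirty.conf
```

Remember that anything sitting in those dirty pages is gone if the power drops before writeback - hence `sync` if you care.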

On the decode side, there’s no clear winner beyond “Not the 1.5GHz Pi4.” The rest are all pretty much even. On the encode side, clearly the flac developers have spent some time on Intel optimizations, because the Core 2 Duo just runs away with the contest. Everything else is, again, in the same ballpark except for the 1.5GHz Pi4. So, I suppose, overclock your Pi before encoding FLAC? Or just use Intel.

Building iozone3

I like to build myself a fresh copy of iozone for benchmarking disks, and if I’m doing a build, I may as well time it! It’s not a particularly multi-threaded build, though I give make a fair chance with the -j flag. In any case, either the fast SSD in the MacBook or an awful lot of years spent on Intel optimization show up clearly here, with the MacBook coming in as the clear leader, followed by the ARM boards. I’ve no idea why the PineBook Pro is so slow at building here, but it is.
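The timed build looks roughly like this; the source directory layout and the make target are assumptions (iozone’s makefile uses per-platform targets - `linux` on x86, `linux-arm` on the ARM boards):

```shell
# Assuming the iozone source tarball is already unpacked.
cd iozone3_*/src/current

# Time a parallel build; -j with the core count lets make use every
# CPU, though this particular build isn't especially parallel.
time make -j"$(nproc)" linux-arm
```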

Disk IO: SD Card

Starting out with the disk IO tests, I’m testing the SD card interface on all the systems that have it. One of the choke points of the Raspberry Pis in the past has been the SD card interface, and while the Pi4 is far better than the Pi{1,2,3}, how does it stack up against other ARM systems? I’m using a Samsung Evo Plus here, which is a high performance SD card that should let the interfaces show off what they can do.
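The post doesn’t list the exact flags used, but an iozone invocation along these lines is a reasonable sketch of this kind of test (the file path is an example - point -f at a file on the card under test):

```shell
# -e includes fsync in the timing and -I requests O_DIRECT where
# possible, so the page cache doesn't inflate the numbers; -a sweeps
# sizes, here pinned to 4k and 1M records, for write (-i 0) and
# read (-i 1) tests on a 100MB file.
iozone -e -I -a -s 100M -r 4k -r 1024k -i 0 -i 1 -f /mnt/sdcard/testfile
```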

In general, for disk benchmarks, if everything performs the same, you’ve hit the limit of the disk. Here, for the 4k writes, it looks like the SD card is the limit. In reads, though, the N2+ and PineBook Pro are clearly getting a lot more performance from the card than the Pi4 - and since the Pi4 doesn’t seem to be impacted by clock speed, it’s likely just riding the limit of its SD card interface. The N2+ reliably turns in the best read performance, followed by the PBP - and since SD card read performance is a big part of how an SBC “feels” in daily use, the N2+ is clearly doing something right here. For writes, though, the PBP pulls ahead, with even the Pi4 outperforming the N2+ on large writes for some reason.

USB3 SSD

Moving to external storage, I have a USB3 SSD that I’ve also beat up with the boards. Everything except the MacBook supports USB3 here, and the MacBook just struggles with the SSD, being “horribly glacial” on every test (worse than I’d expect for USB2). In the small block size accesses, the usual pattern shows up, with the N2+ in the lead, the PineBook Pro in second, and the Pi4 being impacted by CPU speed. However, for the larger reads, the Pi4 manages to come in even with, or sometimes slightly ahead of, both the N2+ and PBP. Unfortunately, for general OS use, these big reads aren’t a limiting factor. My SSD appears to be capable of taking about 45MB/s write, and all the systems are pretty much even on the larger block writes. The main takeaway here is that USB3 performance on ARM seems to be impacted by CPU performance at smaller block sizes, and everything seems to end up with about the same performance at large block sizes.

“Internal” Storage: eMMC/SSD

Finally, in what is an absolutely uneven test, I’ve run benchmarks on the “internal” storage for those boards that have it. For the MacBook, I’m running an OWC SATA SSD. For the ODroid, I’m running one of the HardKernel 64GB eMMC modules. And for the PineBook Pro, I’m running a 64GB Pine64 eMMC module. I’m of the impression (untested) that the two eMMC modules can be interchanged, but they’re clearly not the same hardware, judging by the results (or the eMMC interface on the N2+ is just broken).

In what shouldn’t surprise anyone, the “real SSD” in the MacBook dominates everything. Regardless of CPU performance, a discrete SSD with a good controller will just crush everyone else. Except for 4k random read, apparently. But this is running, in the large block reads, up around the boundaries of the 3Gbit SATA link. No surprises.

What I didn’t expect was the huge difference in performance between the Pine64 eMMC and the ODroid eMMC. The Pine64 one performs better at write than read in small block sizes, but then scales up nicely on the larger block accesses. However, the ODroid eMMC just cannot handle much write traffic. Even on the large block size writes, it just parks at about 27MB/s and refuses to go any faster, while the Pine64 eMMC is writing at almost 150MB/s. They both read well, around 160MB/s, but the ODroid write performance is just abysmal (which explains some lags I’ve seen on that system). Further experimentation is needed, but if the Pine64 and ODroid eMMC modules are, in fact, compatible, I may switch to using Pine64 modules on everything.

Final Thoughts and Conclusions

So, what should one conclude from this?

First, it’s impressive just how fast these little SBCs have gotten. They’re entirely usable as daily driver computers now, which wasn’t the case with the Pi3 a few years back. The Jetson Nano was the first of the really daily driveable SBCs, and they’ve only gotten faster since then. While 4GB is a bit restrictive, once you enable zswap, I’ve not found it to be a serious limit in practical use.
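For what it’s worth, zswap can be flipped on at runtime via sysfs, or at boot via the kernel command line; the paths below are the usual ones, but they vary by distro, so treat this as a sketch:

```shell
# See whether zswap is already enabled (prints Y or N).
cat /sys/module/zswap/parameters/enabled

# Enable it at runtime (it still needs a swap device or file
# configured to have anywhere to evict compressed pages).
echo Y | sudo tee /sys/module/zswap/parameters/enabled

# Or enable it at boot by appending to the kernel command line
# (e.g. /boot/cmdline.txt on a Pi; location varies by distro):
#   zswap.enabled=1
```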

Second, if you want CPU performance, get the ODroid N2+. It’s not that much more expensive than the Pi4, and you get a more capable system as a result. However, it doesn’t have built in wireless, which can be a limit for some people, plus it only has 4GB of RAM. Despite having an 8GB Pi4, I almost never use the 8GB, though.

If you can find a PineBook Pro, they’re awesome, and I’ll be talking more about them in posts to come. But good luck there.

And, I suppose I’d better do some more testing on the Pine64 vs ODroid eMMC modules, see if they’re compatible, and see if these bizarre performance numbers hold up to more testing!


This is a companion discussion topic for the original entry at https://www.sevarg.net/2022/01/08/battle-of-the-boards/

Minor point of feedback; I found myself needing to parse back through the previous paragraph a few times to figure out the y-axis units of some of the graphs. This isn’t so bad for a linear read-through, but gets annoying for quickly comparing the graphs. Similarly, it’s really nice if benchmarks add a “(higher | lower | less | more) is better” note to each graph. (e.g. like the graphs here have)

Good points - I’ll try to make those changes in future benchmarks!


Great article, useful data, thanks! I’d be interested in finding out if the ODroid N2+ is bottlenecked on the eMMC chip itself or if it’s the controller/drivers that bottleneck it. Different DMA setups alone can sometimes cause this sort of topping out, and that’s quite a hit to the write performance.

While we’re nitpicking the charts, too, it would be nice if they had consistent colours for each unit throughout the series - there’s one where the ODroid is blue. No big deal for close reading, but at first glance it was very misleading. Just a thought and quite minor.

Updates on this, and will be another post digging into it:

The eMMC issue wasn’t an eMMC issue so much as a “lack of TRIM” issue. I tested on the eMMC module I’ve been thrashing on my N2+, which has done a LOT of little ARM dev build work - and it had never been trimmed. A “bare” card I had laying around performed very well, so I poked and after fstrim on the existing card, performance was right up where it should be for the testing.
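For anyone hitting the same slow-eMMC symptoms, the fix is the sort of thing below - a one-off fstrim, plus the periodic timer most distros ship (the timer part assumes systemd):

```shell
# One-off TRIM of the root filesystem (here assumed to live on the
# eMMC); -v prints how many bytes were discarded.
sudo fstrim -v /

# Enable the weekly trim timer so the module never gets into this
# state again.
sudo systemctl enable --now fstrim.timer
```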

However, the 128GB modules perform better still.

Some notes on mbw:

Memcpy might actually be calling memmove: according to the memcpy man page, binaries built against a glibc earlier than 2.14 but run against glibc 2.14 or later get an aliased memcpy that behaves like memmove. Too many applications didn’t observe the non-overlap requirement and broke.

Memcpy and dumb test results may have their labels swapped due to a bug in older versions of mbw.

Depending on mbw version, mcblock might not read the whole source array. Depending on block size this might result in a surprisingly small amount of memory read. Likewise for the destination array and a small amount of memory written.

Edit: The memcpy with block size test doesn’t actually use memcpy. It uses mempcpy, so the destination IS updated.

Interesting - thanks! So, “Build from source against a modern glibc and it should do what it says on the tin”?

Looks like it to me!