Apple's ARM Transition

Today, Apple actually announced their first wave of ARM hardware - a MacBook Air, MacBook Pro 13, and Mac Mini.

While performance numbers are yet to be determined, they certainly should be promising.

And, of course, I’m excited that the ARM software ecosystem issues in Node and other abominations will likely be fixed. Makes my life easier, at least.

Of particular interest to those who play down deep, AnandTech has a solid review of the A14 up today, which certainly shares a lot with the new M1 chip.

I expect the laptops to work quite well, and their battery life is impressive - 20h on the MBP.

I’ve been looking forward to Apple leaving Intel for, oh, a decade or so now (although to be quite honest I’m also glad they left PowerPC; that was a dead end even when they started on that path), so this is great news.

It’s made even better by the fact that it’s ARM tech under the hood (though how much of it still resembles ARM is anybody’s guess). By all indications, though, it’s a very serious chip made by a team of engineers who seem very serious about improving it for quite some time, and they seem to have done it much faster than Intel has been able to. Maybe they’re onto something good here.

In any case, while I doubt I have a need for a new Apple in the next year or so, I do expect my next Mac will have Apple Silicon in it and I’m looking forward to the reviews and software progress over the next year as these things get pressed into service.

I don’t expect too many surprises given that the A series chips have been already heavily used and this is a solid evolution of them rather than something utterly new.

I still don’t really comprehend what the neural engine wonkery is any good for (ML is not something I’m particularly bullish on - I think it’s a false illusion that humans are exceptionally vulnerable to wanting to believe in), and I hope that’s not “the future” (cough) of computing, but it certainly seems stuck here for a while. Blech.

The GPU seems awfully fast for “only” 8 cores… I wonder if each of those “cores” is actually massively parallelized itself… which brings me to another point: the ARM cores (at least the Firestorm high performance ones) seem to already be extremely parallelized, and the idea of a “core” anymore is basically slowly sliding away into a morph of independently taskable function blocks (ALUs, FPUs, SIMD units, etc) and a hugely deep decoder/out of order processor. There’s only a fine line now between just scaling that idea up, and consolidating all of the many cores’ resources under one decoder/reorder/commit unit which is simply “context aware” … then you have virtual cores (a la hyperthreading, I guess) which can be reordered amongst themselves even to maximize parallelism and utilization of the internal functional blocks.

Does anybody know more about this trend?

I fully expect Apple’s silicon to be ARMv8, with a bunch of the sub extensions, and perhaps a few of their own things added in - yet, still entirely ARM.

I’ve not seen anything regarding massive “virtual cores,” but it’s certainly an interesting concept, and one we might see out of Apple sooner than anyone else. The main problem there is that your L1/L2 caches have to get even larger, and a larger cache takes more power per address lookup. I expect 4x 192kB L1 caches would use less power, by far, than a single large one. But… interesting concept, certainly. If anyone would go this way, I’d expect Apple to do it.

I’m probably going to replace my little-used Intel Mac Mini with an AS one. There’s a lot I’d like to do on ARM based systems (or, rather, not-Intel and ideally not-x86), but the hardware options now have been limited to fairly low power (Pi4) or “server development systems” - massive 64 core workstations with a power draw to match. There’s been nothing “ARM-NUC-like,” until now. And I can flip my Intel Mac Mini for about the cost of an AS one if I’m willing to use some external storage, which doesn’t bother me.

Apple Silicon improving software in the greater ARM ecosystem is what I’m most excited about. I, too, would love an ARM NUC.

I run a NUC right now with Proxmox that has a dozen containers (PiHole, Plex, RIPE-Atlas, UniFi controller & video, SmokePing, EmonCMS, InfluxDB, Grafana) and one VM (Home Assistant) which is great, but I would always take lower power consumption, especially if it came with more…processing power.

That’s part of why I’m interested in the Mac Mini - idle power consumption. My office being solar powered, I care about stuff like that at night, and being able to leave a powerful computer on, yet trusting it to idle at exceedingly low power, would open up some interesting options for me.

If it helps you any, I ordered a mini. Will report back shortly.

As have I…

Apple’s environmental impact summaries on the 2018 vs 2020 (AS) model are interesting.

(2018 Mac Mini, 2020 report)

(2020 AS Mac Mini)

Idle power on the 2018 is listed as 10.6W, which is broadly in line with what I see. The AS Mac is down at 6.3W - which is a noticeable difference.

Interested in what the full power difference/average running difference is. But that idle power is down almost to what a Pi idles at… for a full desktop that should be rather radically faster than a Pi.
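For a rough sense of what those two idle figures mean for a solar/battery setup, here's a quick back-of-the-envelope sketch. The wattages are the ones quoted above; the 14-hour night is purely my assumption for illustration, so plug in your own number.

```python
# Idle power figures quoted above, in watts.
IDLE_2018_W = 10.6   # 2018 Intel Mac Mini
IDLE_2020_W = 6.3    # 2020 Apple Silicon Mac Mini

# Assumption for illustration only: hours per night the machine sits idle.
NIGHT_HOURS = 14


def idle_energy_wh(power_w, hours):
    """Energy in watt-hours used by a constant load of power_w over hours."""
    return power_w * hours


saved_w = IDLE_2018_W - IDLE_2020_W
saved_pct = 100 * saved_w / IDLE_2018_W
saved_wh = (idle_energy_wh(IDLE_2018_W, NIGHT_HOURS)
            - idle_energy_wh(IDLE_2020_W, NIGHT_HOURS))

print(f"Idle saving: {saved_w:.1f} W ({saved_pct:.0f}% lower)")
print(f"Over a {NIGHT_HOURS} h night: {saved_wh:.1f} Wh saved")
```

That works out to roughly 4.3 W, or about 60 Wh per night under that assumption. It only covers idle draw, not the average running difference, which is the number I'd really like to see.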

My reticence has really been due to Apple’s tendency to lock down the OS.

There’s nothing about the ARM transition that allows them to lock things down more than they already have, at least in the context of running OS X. They already did trusted boot, with ways around it as needed. I don’t see any good reason to expect OS X to change drastically on that front.

And, in general, I do think more locked down OSes are the right default answer. As long as there is a way to disable some of the controls (which there has been on OS X for years now - you can limit apps to store apps, signed apps, or run anything), it makes the platform less compelling for malware authors to target. I like Chromebooks for the same reason - you can pop them into developer mode and do whatever you want, but a Chromebook is a hard target, and so few are in developer mode that nobody bothers doing anything beyond evil extensions for them.

It’s just not a huge concern to me.

Well, a couple of benchmarks are out.

The Mini idles at 4W. That’s Raspberry Pi levels of idle power, with near-workstation levels of performance and literally world leading single threaded performance.

I am excited.

I haz one.

LMK if you need any info - most things need Rosetta 2 at this point, including MacPorts and Homebrew, so… unless it’s one of the few already native apps it’ll be slower.

Man. Ars Technica Mac Mini review today. The raw CPU performance is nice, but the overall system performance compared to the previous gen Intel one?

Compared to the Intel i5 6-core, the M1 is:

  • About 50% faster single core.
  • Not quite twice as fast, multicore.
  • 5x faster or so, in terms of integrated graphics.

And then in actual things like “rendering workflows,” it’s just… nuts.

Absolute slaughter on native app performance…

And emulating Intel processors, it’s still faster!

Now if only I had something that was performance bound… something like writing text… :wink:

On-chip memory does that. Not upgradable, though.

I’m not actually sure how much of a latency improvement having it on-package gives vs remote, but it certainly has an impact on power use.

As much as the whole non-replaceable memory thing bugs me, I also recognize that very, very few people actually do that sort of thing. And, the flip side, 24 hour battery life is kind of a nice trade.

So, one open question about Rosetta has been, “Erm. Wait, how did you get that performance?”

A kernel extension that enables total store ordering on Apple silicon, with semantics similar to x86_64’s memory model. This is normally done by the kernel through modifications to a special register upon exit from the kernel for programs running under Rosetta 2; however, it is possible to enable this for arbitrary processes (on a per-thread basis, technically) as well by modifying the flag for this feature and letting the kernel enable it for us. Setting this flag on certain processors can only be done on high-performance cores, so as a side effect of enabling TSO the kernel extension will also migrate your code off the efficiency cores permanently.

Background for those not steeped in the weeds of modern computing architectures:

x86 is a strongly ordered architecture (total store ordering, or TSO), which means that a thread’s stores become visible to other cores in program order, and its loads aren’t reordered with each other - and code relies on this. ARM is weakly ordered, meaning the hardware is free to reorder a lot of memory accesses unless told otherwise. Emulating ARM on x86 is no problem; emulating x86 on ARM (especially multi-threaded code) is a problem. You can do it with a lot of memory fences, but that slaughters performance. Apple pulling off 80% of native on Rosetta is nuts - but they have hardware support for the x86 style ordering. Crazy!
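If the ordering thing is still fuzzy, here's a toy sketch of the classic "message passing" test: one thread writes data and then sets a flag, the other reads the flag and then the data. This is not how real hardware works and it isn't anything from Rosetta; it just enumerates every possible schedule under two simplified rules (the names `WRITER`, `READER`, and `outcomes` are mine). In "strong" mode each thread's accesses stay in program order, which is what TSO gives you for this particular test. In "weak" mode each thread's accesses may be reordered freely, standing in for ARM without fences.

```python
from itertools import permutations

# Thread 0 (writer): publish the data, then set the flag.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
# Thread 1 (reader): read the flag into r1, then read the data into r2.
READER = [("load", "flag", "r1"), ("load", "data", "r2")]


def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against shared memory; return (r1, r2)."""
    mem = {"data": 0, "flag": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["r1"], regs["r2"]


def outcomes(allow_reordering):
    """Collect every (r1, r2) result reachable under the chosen toy model."""
    def orders(thread):
        # Weak model: any order of the thread's accesses. Strong: program order.
        return permutations(thread) if allow_reordering else [tuple(thread)]

    seen = set()
    for w in orders(WRITER):
        for r in orders(READER):
            for schedule in interleavings(list(w), list(r)):
                seen.add(run(schedule))
    return seen


strong = outcomes(allow_reordering=False)
weak = outcomes(allow_reordering=True)
print("strong:", sorted(strong))
print("weak:  ", sorted(weak))
```

The result x86 code quietly relies on never seeing is `(1, 0)`: the flag is set but the data is still stale. It can't happen in the strong model and it can in the weak one. An emulator on a weakly ordered machine has to rule that out itself, either with fences everywhere or, as on the M1, by asking the hardware to behave.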

Yeah the emulation is crazy fast considering.

Well, rumors of the M1X in the upcoming 16" MBP would indicate double the performance cores, so 8/4.

I’d expect to see more memory capacity and bandwidth to feed them - if they just go nuts and double the DRAM chip count and bus width, well then. They can already feed the cores well, but doubling the performance cores without upping memory bandwidth would probably start to choke them a bit. More RAM, more cache, more performance cores, holy crap! :smiley:

Hello from M1 Mac Mini power. This thing is amazing - I mean, it’ll even run Blogger!


This thing is FAST. Wow.

Someone’s breakdown of the whole thing.