Would ARM on RISC-V emulation have lower overhead than, say, x86-64 on ARMv8? I would imagine the common RISC stuff would be beneficial there, but I know next to nothing about CPU architecture.
No. Both x86 and ARM are very expensive to emulate on another ISA, primarily because of the intricate details of updating their respective status registers / condition codes after each instruction. In an emulator, getting that right takes a lot more work than doing the actual add or xor or whatever the instruction appears to do.
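To make the flag cost concrete, here is a rough sketch of what an interpreter has to compute for a single 32-bit x86 ADD compared with a flagless add. The function names `riscv_add` and `x86_add32` are made up for illustration, and inputs are assumed to already be unsigned 32-bit values. Real emulators typically avoid much of this work by computing flags lazily, so treat it as an illustration of the semantics, not of how any particular emulator is written.

```python
MASK32 = 0xFFFFFFFF

def riscv_add(a, b):
    """Flagless add: the only architectural effect is the 32-bit result."""
    return (a + b) & MASK32

def x86_add32(a, b):
    """32-bit x86 ADD: the result plus the six arithmetic flags it updates."""
    full = a + b
    res = full & MASK32
    flags = {
        "CF": int(full > MASK32),                         # unsigned carry out of bit 31
        "ZF": int(res == 0),                              # result is zero
        "SF": res >> 31,                                  # sign bit of the result
        "OF": (((a ^ res) & (b ^ res)) >> 31) & 1,        # signed overflow
        "PF": int(bin(res & 0xFF).count("1") % 2 == 0),   # even parity of the low byte
        "AF": ((a ^ b ^ res) >> 4) & 1,                   # carry out of bit 3
    }
    return res, flags

print(hex(riscv_add(0x7FFFFFFF, 1)))
print(x86_add32(0x7FFFFFFF, 1))
```

The add itself is one line in both functions; everything else in `x86_add32` is bookkeeping the guest program may never look at.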
RISC-V on the other hand is very easy to emulate at high performance because it doesn't have condition codes at all. Rather than doing something like "cmp A,B;blt foo" as x86 and ARM do (x86 spells the branch "jl"; either way the result of the cmp is stored in the condition codes), the equivalent RISC-V code is the single instruction "blt a,b,foo".
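A small sketch of the difference, from the emulator's point of view. The helper names are hypothetical and values are assumed to be unsigned 32-bit register contents. The flag-based path has to materialise SF and OF from the subtraction and then evaluate the signed "less than" condition (SF != OF), while the fused branch is just a signed comparison.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret an unsigned 32-bit value as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def riscv_blt_taken(a, b):
    """blt a,b,target: one signed comparison decides the branch."""
    return to_signed32(a) < to_signed32(b)

def flags_blt_taken(a, b):
    """cmp a,b followed by a signed less-than branch, via SF and OF."""
    res = (a - b) & MASK32
    sf = res >> 31                                 # sign of a - b
    of = (((a ^ b) & (a ^ res)) >> 31) & 1         # signed overflow of a - b
    return sf != of                                # the "less than" condition

for a, b in [(3, 5), (5, 3), (0x80000000, 1), (1, 0x80000000), (7, 7)]:
    print(hex(a), hex(b), riscv_blt_taken(a, b), flags_blt_taken(a, b))
```

Both functions give the same answer for every pair; the point is only how much state the second one has to produce along the way.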
AArch64 doesn't rely on NZCV that much. Unlike AArch32, most instructions can't run conditionally anymore; it's mostly just the usual conditional branches. Like on RISC-V, cb(n)z and tb(n)z perform a comparison against zero or a bit test together with the branch in the same instruction.
Yes, but not because of "common RISC stuff". It comes from a peculiarity of the x86 family, which has a strong memory ordering, while both ARM and RISC-V have a weak memory ordering. Apple avoided that overhead by adding a special strong memory ordering mode to its processors, which is enabled while running their x86 emulator. Emulators running on other processors (unless they have the RISC-V TSO extension, Ztso, or similar) either have more overhead, because they have to enforce the stronger ordering in software, or have to use only a single core to run the emulation.
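A toy model of why this matters, using the classic message-passing pattern: one thread writes `data` and then sets `flag`, another reads `flag` and then `data`. The sketch below simply enumerates execution orders, once keeping each thread's program order (which is what x86 guarantees for this particular pattern) and once allowing any order (a crude stand-in for a weakly ordered host with no barriers). It is a deliberate simplification, not a faithful model of either architecture, and all names in it are made up.

```python
from itertools import permutations

# Message-passing test: (thread, kind, variable)
OPS = [
    (0, "store", "data"),   # producer writes the payload ...
    (0, "store", "flag"),   # ... then publishes it
    (1, "load", "flag"),    # consumer checks the flag ...
    (1, "load", "data"),    # ... then reads the payload
]

def outcomes(keep_program_order):
    """Return every (flag_seen, data_seen) pair the consumer can observe."""
    seen = set()
    for order in permutations(range(len(OPS))):
        if keep_program_order:
            pos = {op: i for i, op in enumerate(order)}
            # Stores stay in order (op 0 before 1), loads stay in order (2 before 3).
            if pos[0] > pos[1] or pos[2] > pos[3]:
                continue
        mem = {"data": 0, "flag": 0}
        regs = {}
        for idx in order:
            _, kind, var = OPS[idx]
            if kind == "store":
                mem[var] = 1
            else:
                regs[var] = mem[var]
        seen.add((regs["flag"], regs["data"]))
    return seen

strong = outcomes(keep_program_order=True)
weak = outcomes(keep_program_order=False)
print("ordered:  ", sorted(strong))
print("unordered:", sorted(weak))
```

The outcome (1, 0), where the consumer sees the flag set but reads stale data, only shows up in the unordered run. x86 code is allowed to assume it never happens, so an emulator on a weakly ordered host has to pay for barriers or ordered loads and stores to rule it out, unless the hardware offers a TSO mode.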