> Jeff Laughton (Dr Jefyll on the 6502.org forum) says, "I recall hanging out with a programmer pal o' mine and a younger fella who was in college. The young fella was complaining, 'We have to take assembly language,' and Len corrected him immediately, saying, 'You get to take assembly language!'"
What's great is that learning assembly is like taking the first step to understanding the bridge between software and hardware.
I remember taking a class that started out as confusing as hell. The first few exercises seemed really mystical and just so brittle. You'd have to be a wizard to ever get this, one thinks. But by the end of the class, we had made an asteroids-like game, complete with an RNG seeded by input timing (the time between game execution and when the user clicked start). That was a really enlightening moment.
"...self-modifying code, [is] sometimes appropriate to solve certain problems that have no other solution, or for improving efficiency, as in double-indirect addressing."
I have read that self-modifying code on the x86 architecture is pretty dangerous at the assembly level.
More broadly, this kind of comes back to all the issues in the "C is not a low level language" thread. Some level of assembler certainly gives the programmer as full access to the machine as possible. But naive assembler from the 8086 - 80486 eras is going to be rearranged in a lot of ways in a modern Pentium processor and counting on in-order execution may be a mistake.
Edit: at the same time, the modern processor doesn't really allow a lower level than assembler normally and the default approach is assuming flat memory but being aware of the pitfalls of multiple caches being involved.
x86 is particularly friendly to self-modifying code, requiring fewer fences and hints when code is modified. It’s generally easier to implement self- and cross-modifying code on x86 than on other ISAs, like, say, ARM.
(I manage a team that writes self-modifying code for a living.)
It's quite widely used for dynamic library jump tables. When calling a function in a dynamically linked library, it calls a stub, which is initially a call to the lazy linker, but gets replaced with a call to the resolved function.
You may be remembering early x86 chips that didn't properly invalidate the instruction cache after a write. Modern chips are fully cache-coherent.
This is interesting. I've read documentation about dynamic linking and it describes this replacement process but I never truly understood the fact it was self-modifying code. Doesn't this imply the program's code is writable? I know that JIT compilers also emit code into writable and executable pages. Aren't there security implications?
There are. That’s why dynamic linking typically doesn’t use self-modifying code and JIT compilers take a number of precautions to prevent attackers from being able to execute arbitrary code.
I don't know much about x86, but on other cpus/architectures (PowerPC, MIPS, and possibly others) one might still need to fiddle with the instruction cache.
Yes but it still needs to:
a) empty the write buffers into the cache, then
b) flush the current instruction stream (in case the CPU has already fetched and decoded instructions from the modified memory)
Doing this on every write (especially considering multiple possible virtual to physical mappings) is very expensive in terms of hardware - it's why some architectures (RISC-V for example) have explicit instructions to trigger these things
Having written self-modifying x86 code, the most annoying thing is that instructions are not a fixed width. This means patching code requires either being able to parse every instruction, keeping a lookup table of where each instruction starts, or just regenerating the entire code whole cloth. This can also cause problems if the self-modifying code has more than one thread, since you may suddenly need to update 1 to 15 bytes atomically.
Generally it's easiest to do the last, or to pad with NOPs, or to do something like Windows' hot-patch points for functions, where hot-patchable functions are preceded by 5 bytes of NOPs and the function always starts with MOV EDI, EDI, which is effectively a NOP but takes two bytes.
This allows one to replace MOV EDI, EDI with a short jump back to the start of those 5 bytes, which are large enough to hold a long jump to any code. Windows went this route because originally multi-byte NOPs were not part of the spec, so if you used one-byte NOPs, not only would each NOP need to be executed, slowing down function calls, but in multi-threaded code you would have to stop all threads to edit the code, since the CPU could be fetching one byte at a time, etc.
The original 8086 had a six-byte prefetch queue (the 8088 in the original IBM PC had a four-byte one). If you modify an instruction less than six bytes ahead, the change won't be seen by the CPU unless you issue a JMP (or CALL) instruction. It was not normally that big of an issue (just make sure you modify the instruction from far enough away). You can even use the difference between the 8086's six-byte queue and the 8088's four-byte queue to determine which CPU the program is executing on.
These days, I think you would need to 1) have memory pages with code with write permissions, 2) possibly flush the instruction cache and 3) hope no other thread is using said routine. With today's security concerns, 1) will not be likely, 2) possibly requires elevated privileges (I don't recall---I've only really done ring-3 level code on x86) and 3) is probably okay in a single-threaded program.
I remember using the prefetch queue as a way to catch debuggers single-stepping through my code. If you rewrote a nearby instruction to be a jump-to-self, single-steppers would fall victim to it.
When I was young I imagined that when I'll be an advanced programming/CS student I would learn advanced stuff like self-modifying code. Too bad real life is not that exciting...
Which is kind of interesting considering how removed the assembly which was written is from what is actually executed. You'd think x86 would be even less likely to notice you changed the plan out from underneath it.
> It is common for the beginner to want all the fancy tools too soon
Am I the only one who’s never felt that way? I get grief from people around me (especially “hurry up and get it done” management types) for spending too much time in the low levels, trying to really understand what I’m doing and what’s going on.
Not sure why HN is saying it is obvious that assembly is relevant because compilers... The article's intent is around a programmer writing assembly. I'm sure there are niches but I can see web developers getting away without writing assembly in their professional career.
Ok, we've put programming in the title above to make this distinction clearer. The author is not writing about computer-generated assembly language, such as compiler back ends.
WebAssembly is not assembly in the ways that the article talks about. Like, writing it directly doesn’t give you any special control or guarantees over timing.
It's assembly against a virtual machine, not a physical one. You're right it's not appropriate for an embedded system or some other RTS, but assembly doesn't stop being assembly when you target a virtual machine.
It kinda does in this case. Don’t kid yourself. In real assembly, the really interesting part is how to use a finite register file. WebAssembly has an infinite slab of variables available, in the sense that you get to say how big it is. That fundamentally changes the game.
Sounds like you’re saying those machines executed bytecode.
Otherwise there isn’t a great limiting principle to your logic. Just because someone once built hardware that executes such a high level assembly that the manual referred to it as bytecode doesn’t mean that all bytecode formats are assembly.
Indeed I am, the interpreter is the microcoded CPU.
Even modern 80x86 assembly is a low-level form of bytecode, given that the micro-ops processed by the microcoded CPU are completely unrelated to 80x86 assembly opcodes.
It targets a virtual machine, not a physical one, but other than that it's "assembly-like" enough that learning some core ASM coding practices will help you.
Not sure why you are sticking to your guns here. First, it's highly unlikely anyone would ever write wasm by hand. Second, the article rambles a bit but the most compelling argument for assembly is writing fast code for constrained hardware. At a high level, that's not what wasm is solving.
Bytecode is just the instructions, which is actually a level lower than asm, but for a virtual machine instead of a physical one. It's a direct enough abstraction that you can predict the bytecode you'd build from the asm you write.
> Assembly language yields maximum control and execution speed.
It yields the only control and execution there is. Almost all programs ultimately pass through an assembler and assembly language. How could it possibly not be relevant?
I read that line as suggesting that when using an HLL the programmer generally does not have control over the assembly that is generated. I guess that is why sometimes, as the author suggests, the programmer may write part of a program in an HLL and another part in assembly in order to achieve increased execution speed.
It's more than just execution speed. It could be something as simple as writing a function in assembly that swaps the virtual memory page table on a processor. That's not something you're going to find in a high-level language.
Yeah, I've done that myself: save all the register state, save the stack pointer, then set a new stack pointer and load that task's register state.
Although doing the same on Windows is kinda annoying. Also, the number of callee-saved registers is kinda "what the heck". Here is some code I wrote doing that for Windows' ABI.
https://pastebin.com/jnxeMRcV
There are a huge number of software engineers today, dare I say most, who cannot read or write assembly and are isolated from it by multiple layers of abstractions.
I would probably struggle to write assembly without blowing something up, but... is most macro assembler really that hard to read? The examples the author of the article gave seemed reasonably easy to parse for anyone who has a conceptual model of control flow, variables, memory locations, and basic operations.
You can generate machine language without an assembly language intermediate phase, if you just directly put bytes into a file. But most compilers have some kind of assembler in them.
I could be mistaken, but don't JVM languages avoid the assembly step? They use bytecode, which is translated directly to machine code; I'm not sure there is a translate-to-assembly step. I'm also not sure how much of this confusion is semantics vs. actually different components of the compilation and running process.
To me, assembly is one of the things that, while self-studying CS, I felt lacked good support in resources such as this, MOOCs, or just plain explanations. Usually when I look for something trying to demystify the topic, I am greeted with an extremely hard-to-digest read that is meant for people already knowledgeable in the subject.
And I feel assembly should be more of a core skill in a programmer's toolbox. So this article is very welcome to me.
I would recommend checking out an old book for an old mainframe's assembly language. They're usually much less mystic by virtue of being much less complex. IBM had some really nice manuals and books; no one ever got fired for buying IBM because an IBM machine could be programmed by a dog.
Octal is where it's really at, though, if you get really into this. A fun weekend project is to write an octal "decompiler" (ideally you won't have compiled anything, just written some octal by hand) that lets you reason about what the code is doing by translating it into an actual language rather than just thin syntactic sugar over 1s and 0s. Octal itself isn't so difficult, and it makes binary much easier to reason with, but this definitely helps you get a more intuitive sense of what is what.
Of course, it's not something that has a substantial amount of value with modern machines. Maybe eventually we'll get back there; I think I'll enjoy it when we do. Until then, though, it's fun to play with.
If you really want to dig into the concepts behind assembler, the assembler language part of TAOCP is available for free download: http://mmix.cs.hm.edu/doc/fasc1.pdf. It doesn’t talk about a “real” assembler language like 6502 or 8086, but a made-up one that was designed to present the concepts. It’s easy to move from Knuth’s academic introduction to an actual assembler.
I actually want to learn it to work on reverse engineering projects.
I just don’t know how to get started at all.
I don’t know how people can reverse engineer a device when you don’t have access to the running program. How do you monitor and track all the bits being passed around to crack firmware? Specifically, video game mods and hacks are what I wanted to dabble in, since I find their programming fascinating and know that’s where I’d be most interested in contributing in my spare time.
If you want to modify 3D rendered output, you normally need to adjust shaders, textures and such. For extreme cases, you can hook the entire Direct3D API adjusting how it works for the game. The only assembly you might need for that is shader assembly https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/... but not always necessary as the HLSL decompilers are often OK.
If you want to modify game logic, it’s normally implemented as scripts. Game designers and level designers don’t often know C++, and they certainly don’t want to recompile the game because it’s slow, they adjust scripts and see the result in real time.
I know and understand C++ as it’s the main language I’ve been working in for some time.
How do I modify the code of that which I don’t have access to?
What reverse engineering projects are good for beginners? I see people post here their first attempts at reversing an older gadget. I’d love to pick up an older gadget, try to reverse engineer it, and make it do what I want it to.
> How do I modify the code of that which I don’t have access to?
Native code reverse engineering is very time consuming. It’s often possible to achieve similar results by focusing on the code which you have access to. You don’t have source code of Windows OS components, but you do have their APIs and debug symbols, and that’s much better than just binaries.
If you want to change what’s rendered, you can replace the GPU API with a wrapped version, like renderdoc does. If you want to change what’s loaded from disk, patch game files, or replace whatever OS file I/O APIs is used by the game (DLL injection, then MinHook or Detours).
Even when you do need to change the game’s own native code, directly patching machine code is rarely a good idea; it’s very hard to implement and especially to debug. An easier way is replacing complete functions with API-compatible replacements implemented in your DLL in C++. Again, use MinHook or Detours to replace the implementation. C++ allows unrestricted memory access so you can read and write everywhere; here are working examples: https://github.com/Const-me/vis_avs_dx/blob/master/avs_dx/Dx... https://github.com/Const-me/vis_avs_dx/blob/master/avs_dx/Dx... I didn’t have source code of these C++ classes but wanted their data regardless. I found the offsets using the VS debugger; these third-party DLLs include a GUI to change the values, and I compared memory before/after making changes.
> What reverse engineer projects are good for beginners?
In the context of modern Windows games, assuming you wanna change what’s rendered, a good start might be https://renderdoc.org/. Officially, the tool is only supported when you run your own code. Technically, it often works with retail games too; just don’t open issues about that, as they’ll be closed as an unsupported use case. As a nice side effect, you’ll learn a thing or two about Direct3D. The tool is open source with a good license (MIT), so you can fork it, disable their frame captures, and change their API wrappers to modify the output of some particular game.
One more thing, modern games use a lot of bytecodes. E.g. D3D shaders are byte code, search “3dmigoto decompiler” to decompile dxbc into HLSL. .NET is often byte code (Unity3D is based on .NET), use reflector to decompile into C#. Many games use custom VMs, sometimes modding community has decompilers for their custom byte code.
> I’d love to pick up an older gadget and try to reverse engineer it
Depends on the platform. Older platforms like the NES or SEGA Genesis ran software written in 6502 and 68000 assembly, respectively; there are huge communities around modifying these games.
I got a fairly good introduction to assembly with LC-3 (Little Computer 3, an instruction set for learning) programming in an elementary electrical engineering class in college. I haven't looked myself, but for those self studying, searching "LC-3" might be a good option for self-learning assembly.
As recently as a year ago I used the built-in assembler in Delphi to perform some low-level timing. It was not just for super-duper performance, though; surprisingly, it was the most straightforward and simple way. Meanwhile, my firmware for 3-phase AC motor torque and speed control on a really low-power microcontroller was doing fine with plain C, without any assembly.
I design and develop products for a living. A universal guy: mostly software, but sometimes part of it is also firmware / electronics / hardware. I happen to be a person who has done everything from billing systems and other giant products for telcos down to firmware for microcontrollers.
Some products I own and some were made to order. Some I did on my own, and for some I was the leader of a big team.
There are many kinds of assembly language: some for actual computer hardware, some for VMs, and some used as both.
I have used and sometimes still do use assembly language, including 6502 (specifically, NMOS 6502 without decimal arithmetic, including unofficial opcodes), and a little bit of x86 stuff (although the modern x86 is very messy, I think), but also Z-machine and Glulx. I have also used MIX and MMIX assembly (and may use MMIX more if I would actually make a computer with it). And then some other programs (such as ZZ Zero, which is similar to ZZT) has its own kind of assembly language.
One feature not mentioned is the relative numbered labels such as 1H and 2H available in MIXAL and MMIXAL; you can then use 2F to find the next 2H label forward, or 2B to find the nearest 2H label backward. My own assemblers for Glulx and ZZ Zero support a similar feature too.
There's food for thought there. As part of my current project I wrote an assembler for 8-bit AVR (on GitHub: https://github.com/Lerc/AvrAsm/).
Part of my motivation for this was to have an assembler that ran in the browser (for my fantasy console that also runs in the browser), but another big part of it was to write an assembler designed to be more friendly to people writing assembly directly.
When I wrote 6502 asm I mostly did it from Supermon which is a no frills experience. It's nice to see the features that assemblers have now, I think I'll be implementing quite a few of those macros from this link in my own assembler.
Even back in the 1980s, Lance Leventhal, an author who wrote some great programming books on assembly, warned that, productivity-wise, you write the same number of lines of code whether it's 6502 or C.
As that 6502 example in the article shows, you don't get great productivity with assembly. And even macros don't improve on it that much.
One writes 6502 assembly by hand because there is really no alternative on that ISA. It does not make a good target for a C compiler, there are very few of them to begin with, and hand written 6502 (or 65816) is going to be better than anything produced by any compiler for it at this point in time.
On modern x86 it's not very useful at all anymore.
An ICC or LLVM backend with compiler intrinsics will get you quicker performance, with reduced maintenance and cost. Performance will also move over time with backend optimizations getting better.
You can still do it if you care about debug build performance.
I'm rarely keen on posting negatives on articles that clearly took a lot of time to make, but I think this requires a bit of correction.
I think this article is very, very simplistic. All of it relates to an 8-bit CPU that is 40+ years old.
I switched to HLLs as soon as I could get my hands on a compiler, namely UCSD Pascal at the time! Then Pascal proper, then C, and then myriad other languages. I covered 6502, Z80, 68k (all of them, up to the 68040), PowerPC (all of them, from 601 prototypes to G5s), ARMs (more than I can count) and x86s (same).
Truth be told, the assembly language I started with /helped a LOT/ with my becoming an efficient developer; a developer who understands what 'code' is being generated when he writes an expression, a statement, a loop, and one who understands what the runtime implications are for most of the 'sugar coating' HLLs give.
However, starting (a bit) with the 68k, then even more so with the PowerPC, it became pretty much impossible to write /from scratch/ an assembly equivalent that was QUICKER than the compiler generated code. That was 20+ years ago. DRAM latency happened, pipelining happened and SIMD happened.
Today, hand-writing assembly is pretty much stupid on modern CPUs. Given the register files, timings, shadow registers, bus latencies, etc., the compiler will ALWAYS be better, because there are so many criteria to think about when generating code...
I'm not saying that having the knowledge is not useful; the best use of assembly is to write some code in an HLL, one that is supposed to be super-mega-critical-quick, then disassemble it and see how it looks. More often than not, you can't make it better than it is in situ -- most of what you will gain comes from preparing your data better, aligning it better etc. -- basically, 'hinting' the compiler to do a better job. You can do serious code butchery like that, without a hint of assembler [0].
But really, I haven't written any assembly for /performance reasons/ in 15 years, and that was Altivec on PowerPC.
For 8 bits, it's all smooth as butter, but the article also doesn't take into account the massive progress in compilers; I'm the author of SimAVR [1] and I've seen my share of generated code for that CPU, and the GCC toolchain is /very hard to beat/ by hand these days.
[0]: critical audio loop on one of my old PCI card driver, converting float<->int, applying gain etc while using the register file to the max, and making most use of the pipelining of the G4 (at the time) https://gist.github.com/buserror/0a3a69cca927b8da6c9c7ee1605... -- note, the inner loop was generated by a script that was doing the cycle calculations (!)
> Today, hand writing assembly is pretty much stupid on modern CPUs
Yup. Explains all that neat hand-written AVX asm code in your video decoder, strcmp() implementation, lzma decompressor, utf8 parser, and the base64 decode logic in your browser.
A lot of people put in a lot of hard work so that you can have the cute thought that there is no more reason to write assembly. Many of them wrote your compilers, some of them wrote some of the logic I mentioned above. Quite sure that none of them appreciate being called "pretty much stupid".
This article is specific to the 6502, where the commonly used CC65 C compiler the author references produces much, much worse code, speed-wise, than what you can get with pure assembly. In that regard, the article is not simplistic in the least. Coincidentally, I messaged the author just yesterday about the for loop example to point out that it was generated without optimization. Even with optimization enabled, the code is still about 3 times slower than hand-written assembly. I know this may not be typical for other architectures like AVR, but it certainly is for 6502.
I used to do my production code exclusively in 6502 assembler, with some tools in P-system Pascal. As I would read in magazines about the C language, I would try to imagine what a C compiler would generate for certain constructs, and I couldn't imagine it being efficient compared to other 8-bit processors. Then we decided to experiment (at the company) with C and got a compiler. I was right, the code was awful. It used exactly the idioms I thought I would use if I had to do it. I can picture a really top-notch compiler doing better (because I'm more familiar with optimization in compilers now), but sooner or later some of the quirks (like 8-bit index registers and only page 0 being usable for pointers) will catch you.
It's more like being an architect and working with bricks and cement yourself. The author's argument is that sometimes an "architect" should do that. Agree or disagree, but it's certainly not a truism.
An architect might in fact design something down to the brick-and-mortar level, such that someone else assembles it to the actual building. That architect is working with bricks and cement, effectively at the design level.
That's the same as using assembly language, rather than poking binary/hex values into memory.
> Jeff Laughton (Dr Jefyll on the 6502.org forum) says, "I recall hanging out with a programmer pal o' mine and a younger fella who was in college. The young fella was complaining, 'We have to take assembly language,' and Len corrected him immediately, saying, 'You get to take assembly language!'"
<g>