In my youth I wrote m68k assembly programs tens of thousands of lines long and speed-optimized every section of the code, even initialization and cleanup code that executed exactly once. It was very, very silly. It was a lot of fun.
#development #assembly #coding #programming
My inner Amiga fanatic thanks you and everyone like you.
In uni we had a couple of assembly projects. It was a lot of fun. I have also never cried so much out of frustration.
I also participated in very useless size/speedcoding competitions - some of them are still accessible from this old web page: https://amycoders.org/compo/
Note that some of the HTML is a bit broken, for example https://amycoders.org/compo/circlecompo.html - you can view source to see the full routine
#m68k #assembly #sizecoding #speedcoding
I still occasionally write some m68k code and apps. These are from 2024:
- Execute code in #amiga color registers: https://sintonen.fi/src/colexec/colexec.asm
- RXS-M-XS 32bit->32bit Permuted Congruential Generator: https://sintonen.fi/src/misc/pcg/_rand.asm
- Minimal modplayer (protracker music player): https://sintonen.fi/src/minimod/ (the replayer routine is mostly from Frank Wille however)
I wrote loads of assembly programs, too, but only one stands out for crazy, stupid, and useless optimization.
I was trying to draw a spline on the screen. I took the algorithms from a scientific paper I had got hold of, and my C program was dead slow. The bottleneck boiled down to one subroutine that solved an equation. I moved this subroutine from C to MC68K assembler and optimized the heck out of it, with no real change to the result: it was still dead slow. Whenever I changed a parameter, it redrew the line, and that took something like five or more seconds. For a single spline.
So I dove down into the algorithm. What were they doing with that time-eating equation? Turned out this was about measuring the distance between two coordinates - and they did not even use Pythagoras for that, but something oddly complicated. I remember 15 or 16 MULS per call. With up to 70 clock cycles per MULS instruction, this burned.
I replaced this with a simple “are delta X and delta Y both in the range -1, 0, or +1” function, and suddenly the algorithm ran like a lightning bolt on steroids. I could move any definition point around with the mouse, and the spline followed smoothly.
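For illustration, a check like that could look roughly like this in C. This is only a sketch of the idea, not the original routine: the name `is_adjacent` and the integer pixel coordinates are assumptions. It needs two subtractions and a few compares, with no multiplies at all, whereas 16 MULS at up to 70 cycles each is up to 1120 cycles per call for the multiplies alone.

```c
/* Hypothetical helper (name and signature are assumptions, not the
 * original code): returns 1 when the two points are the same pixel or
 * direct neighbours, i.e. delta X and delta Y are both -1, 0, or +1.
 * Uses only subtractions and compares, no multiplication. */
static int is_adjacent(int x0, int y0, int x1, int y1)
{
    int dx = x1 - x0;
    int dy = y1 - y0;

    return dx >= -1 && dx <= 1 && dy >= -1 && dy <= 1;
}
```

A caller would use this in place of the real distance computation wherever all it needs to know is "are these two points close enough to stop refining".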
So it is nice to be able to optimize assembler, but by choosing the right algorithm you can do way better than that.