What's bad are the extremes, as commandblockguy indicated. In fact, that often applies beyond the optimization of computer programs, but that's an unrelated topic we really shall not pursue here.
On the TI-68k series, I've witnessed, a number of times, extreme speed optimization blowing size and/or memory consumption out of any reasonable proportion (e.g. preshifted sprites, which store one copy of the sprite per horizontal bit offset and therefore multiply its memory footprint: you'd better really need them...), or extreme size optimization having a significantly negative speed impact, e.g. because multiplication or division instructions were introduced in situations where a handful of shift, add and sub instructions would be faster.
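To make the shift/add/sub point concrete, here's a minimal C sketch (the 30-bytes-per-row layout matches the 240-pixel-wide TI-89/92+ planes; the function names and everything else are made up for illustration). On a 68000, mulu costs upwards of 38 cycles, while the equivalent shift-and-subtract sequence is only a few short, fast instructions, so the seemingly longer source form wins:

```c
#include <stdio.h>

/* Byte offset of pixel (x, y) in a 240-pixel-wide mono plane,
 * i.e. 30 bytes per row. */

/* Straightforward form: a 68k compiler may emit a mulu here,
 * which costs upwards of 38 cycles on a 68000. */
static unsigned offset_mul(unsigned x, unsigned y) {
    return y * 30 + (x >> 3);
}

/* Strength-reduced form: y*30 == (y << 5) - (y << 1),
 * two shifts and a subtract, each only a few cycles. */
static unsigned offset_shift(unsigned x, unsigned y) {
    return ((y << 5) - (y << 1)) + (x >> 3);
}

int main(void) {
    /* Sanity check: both forms agree over the whole screen. */
    unsigned x, y;
    for (y = 0; y < 128; y++)
        for (x = 0; x < 240; x++)
            if (offset_mul(x, y) != offset_shift(x, y))
                return 1;
    puts("OK");
    return 0;
}
```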
One of the few areas where people have historically tolerated a significantly unrolled loop was the software grayscale routine (https://github.com/debrouxl/gcc4ti/blob/experimental/trunk/tigcc/archive/gray.s, below the label __gray_perform_copying; that's the late version, which even uses the stack pointer as a data register, since that part of the code has always run with all interrupts disabled anyway): any cycle wasted there is taken away from the cycles usable by user programs. Too bad I did not realize much earlier that the rest of this software grayscale code was ripe for size optimization, even in the absence of a memory consumption optimization for old calculator models...
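For those who'd rather not read m68k assembly, the underlying trade looks roughly like this in C (PLANE_SIZE and the unroll factor are illustrative stand-ins, not the values used by gray.s):

```c
#define PLANE_SIZE 3840u  /* assumed size of one grayscale plane, in bytes */

/* Plain counted loop: one decrement-and-branch, roughly 10 cycles
 * on a 68000, per 4 bytes copied, i.e. ~960 branches per plane. */
static void copy_plane(unsigned long *dst, const unsigned long *src) {
    unsigned n;
    for (n = PLANE_SIZE / 4; n != 0; n--)
        *dst++ = *src++;
}

/* Unrolled by 8: the same overhead is now paid per 32 bytes, so
 * ~840 of those ~960 branches disappear. In an interrupt-driven
 * grayscale switch, every cycle saved here is a cycle handed back
 * to the user program. */
static void copy_plane_unrolled(unsigned long *dst, const unsigned long *src) {
    unsigned n;
    for (n = PLANE_SIZE / 32; n != 0; n--) {
        dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = src[3];
        dst[4] = src[4]; dst[5] = src[5]; dst[6] = src[6]; dst[7] = src[7];
        dst += 8; src += 8;
    }
}
```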
However, for the likes of https://github.com/debrouxl/ExtGraph/blob/master/src/lib/Misc/FastCopyScreen.s, it makes no sense to make the routine nearly 10x larger by fully unrolling the loop instead of copying a tenth of the screen 10 times: the speed cost of copying with one fewer register (one is needed for the counter) and of branching is almost negligible, as the sketch below illustrates.
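Here's a hedged C sketch of that middle ground (LCD_SIZE and the chunking are assumptions; the real routine moves data with movem.l through most of the register file, which C can't express directly):

```c
#define LCD_SIZE 3840u  /* assumed frame buffer size, in bytes */

/* Copy a tenth of the screen per pass, ten passes in total. The
 * body is unrolled by 8 longwords (32 bytes), so each pass makes
 * 12 trips through it; fully unrolling would instead replicate
 * the body 120 times, making the routine roughly 10x larger to
 * save only the few cycles the counters and branches cost. */
static void fast_copy_screen(unsigned long *d, const unsigned long *s) {
    unsigned pass, i;
    for (pass = 10; pass != 0; pass--)
        for (i = (LCD_SIZE / 10) / 32; i != 0; i--) {
            d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
            d[4] = s[4]; d[5] = s[5]; d[6] = s[6]; d[7] = s[7];
            d += 8; s += 8;
        }
}
```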