How does avl affect register renaming? (There's the vl=0 edge case, which is horrifically stupid and by itself a mistake for which I have seen no justification, but that's probably not what you're thinking of?) Agnostic mode makes it pretty simple for hardware to do whatever it wants.
Over masks, it has the benefit of allowing simple hardware short-circuiting, though I'd imagine it'd be cheap enough to 'or' together groups of mask bits to short-circuit on (which would also have the benefit of better masked throughput).
Cray-1 (1976) had VL, though, granted, that's a pretty long span of no-VL until RVV.
Was thinking of a shorter avl producing partial results merged into another reg.
Something like a += b; a[0] += c[0]. Without avl we'd just have a write-after-write, but with it, we now have an additional input, and whether this happens depends on global state (VL).
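A rough Python model of that dependency, as a sketch only: it assumes tail-undisturbed semantics and a toy 4-element register, and uses a variation of the example in which the destination is not a source operand, so the extra input is easier to see. The helper `vadd_tu` is made up for illustration and is not an RVV intrinsic.

```python
def vadd_tu(vd_old, vs1, vs2, vl):
    """Toy model of a tail-undisturbed vector add.

    Elements [0, vl) are computed from vs1 and vs2; elements [vl, VLMAX)
    are copied from the previous value of the destination register.
    """
    return [vs1[i] + vs2[i] if i < vl else vd_old[i]
            for i in range(len(vd_old))]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = [100, 200, 300, 400]
d0 = [0, 0, 0, 0]

d1 = vadd_tu(d0, a, b, vl=4)        # d = a + b at full length
d2_full = vadd_tu(d1, a, c, vl=4)   # d = a + c: d1 is dead, plain write-after-write
d2_short = vadd_tu(d1, a, c, vl=1)  # d[0] = a[0] + c[0]: d1 is now a real input

print(d1)        # [11, 22, 33, 44]
print(d2_full)   # [101, 202, 303, 404]
print(d2_short)  # [101, 22, 33, 44]
```

Whether the second write needs the old `d` is decided by `vl`, which is not visible in the instruction encoding itself.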
> Agree agnostic would help, but the machine also has to handle SW asking for mask/tail unchanged, right?
Yes, but it should rarely do so.
The problem is that, because of the vl=0 case, you always have a dependency on avl. I think the motivation for the vl=0 case was that any serious OoO implementation will need to predict vl/vtype anyway, so there might as well be this nice-to-have feature.
IMO they should've only supported ta,mu. I think the only use case for ma is when you need to avoid exceptions. And while tu is useful, e.g. for summing an array, it could be handled differently. E.g. once vl<vlmax, you write the sum to a different vector and do two reductions (or rather use two different vectors, given the avl-to-vl rules).
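A sketch of that array-sum idea in Python, modelling registers as lists. Everything here is an assumption for illustration: `VLMAX = 4`, the simplified `vsetvl` rule, the `POISON` stand-in for an agnostic tail, and the function names. The main accumulator is only ever updated at full length, and each short strip goes to its own vector and is reduced with its own vl.

```python
VLMAX = 4     # toy register length in elements (assumption)
POISON = -1   # stands in for whatever an agnostic tail may contain

def vsetvl(avl):
    """Simplified vl rule: min(avl, VLMAX). The spec permits other choices
    when VLMAX < avl < 2*VLMAX, so the tail may span two short strips."""
    return min(avl, VLMAX)

def sum_ta_only(data):
    """Sum an array without relying on tail-undisturbed writes."""
    acc = [0] * VLMAX   # only updated when vl == VLMAX
    tail_total = 0
    i = 0
    while i < len(data):
        vl = vsetvl(len(data) - i)
        strip = data[i:i + vl]
        if vl == VLMAX:
            acc = [x + y for x, y in zip(acc, strip)]
        else:
            # A tail-agnostic add into acc would leave acc[vl:] unspecified,
            # so the short strip lives in its own vector, reduced with vl.
            part = strip + [POISON] * (VLMAX - vl)
            tail_total += sum(part[:vl])
        i += vl
    return sum(acc) + tail_total

print(sum_ta_only(list(range(1, 11))))  # 55
```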
What's the "nice to have feature" of vl=0 not modifying registers? I can't see any benefit from it. If anything, it's worse, due to the problems on reduce and vmv.s.x.
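To make the reduction problem concrete, here is a toy Python model. It assumes, as I read the spec, that a reduction with vl=0 performs no operation and leaves the destination untouched; `vredsum` below is a hypothetical helper that mimics `vredsum.vs`, not a real API, and it ignores tail handling of the destination.

```python
def vredsum(vd_old, vs2, vs1, vl):
    """Toy model of vredsum.vs: vd[0] = vs1[0] + sum(vs2[0:vl]).

    With vl == 0 the destination is returned unchanged, so vd[0] is NOT
    set to the initial value vs1[0].
    """
    if vl == 0:
        return list(vd_old)
    out = list(vd_old)
    out[0] = vs1[0] + sum(vs2[:vl])
    return out

stale = [999, 0, 0, 0]   # whatever was left in the destination register
init = [0, 0, 0, 0]      # vs1[0] holds the initial sum
data = [1, 2, 3, 4]

print(vredsum(stale, data, init, vl=4)[0])  # 10
print(vredsum(stale, data, init, vl=0)[0])  # 999, not the expected 0
```

So code that wants "sum of an empty range is the initial value" still needs a branch or a pre-initialised destination.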
"Nice to have" because it removes the need for a branch for the n=0 case. For regular loops you probably still want it, but there are situations where not needing to worry about vl=0 corrupting your data is somewhat nice.
Huh, in what situation would vl=0 clobbering registers be undesirable while on vl≥1 it's fine?
If hardware will be predicting vl, I'd imagine that would break down anyway. Potentially catastrophically so, if hardware always predicts that vl=0 doesn't happen.
> Agree agnostic would help, but the machine also has to handle SW asking for mask/tail unchanged, right?
The agnosticness flags can be forwarded at decode-time (at the cost of the non-immediate-vtype vsetvl being very slow), so for most purposes it could be as fast as if it were a bit inside the vector instruction itself. Doesn't help vl=0 though.
RVV does have significant departures from prior work, and some of them are difficult to understand:
- the whole concept of avl, which adds complexity in many areas including reg renaming. From where I sit, we could just use masks instead.
- mask bits reside in the lower bits of a vector, so we either require tons of lane-crossing wires or some kind of caching.
- global state LMUL/SEW makes things hard for compilers and OoO.
- LMUL is cool, but I imagine it's not fun to implement reductions and vrgather.
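The mask-layout point in the list above can be sketched with a small Python model. It assumes LMUL=1, elements laid out contiguously across fixed-width datapath lanes, and mask bit i stored at bit i of v0; the function name and the lane model are made up for illustration, and real implementations may stripe registers differently.

```python
def mask_source_lanes(vlen, lane_bits, sew):
    """For each datapath lane, return the set of v0 lanes that hold the
    mask bits of the elements that lane processes (LMUL=1 only)."""
    n_elems = vlen // sew
    result = {}
    for i in range(n_elems):
        data_lane = (i * sew) // lane_bits   # lane holding element i
        mask_lane = i // lane_bits           # mask bit i is bit i of v0
        result.setdefault(data_lane, set()).add(mask_lane)
    return result

lanes = mask_source_lanes(vlen=512, lane_bits=64, sew=32)
# With these toy parameters, every one of the 8 lanes maps to {0}.
print(lanes)
```

With these toy parameters, all eight lanes need mask bits that sit in lane 0, which is where the lane-crossing wires (or a mask cache) come from.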