For VU microcode programs, it is very often helpful if data is interleaved in VU mem, i.e. instead of the data being arranged in VU mem like:
XYZW |
XYZW |
XYZW |
XYZW |
UV |
UV |
UV |
UV |
normal |
normal |
normal |
normal |
it is often helpful if the data is arranged like:
XYZW |
UV |
normal |
XYZW |
UV |
normal |
XYZW |
UV |
normal |
XYZW |
UV |
normal |
This allows for only a single "read" pointer to be maintained, as well as allowing a number of subtle optimisations. It is also more intuitive, which is never a bad thing.
The problem with this, however, is that it is not convenient to store the data in this order in main memory. As the VIF can automatically decompress data, it makes sense to store each kind of data in the most compressed format possible. For colour data, the most sensible format is V4_8 [i.e. 32 bits per colour] or even V4_5 [16 bits per colour]. Obviously, however, that format really isn't appropriate for geometry: there you really want 16 bits per component or better.
If we try to send our interleaved data to the VIF with a single UNPACK block, we find that it will only take one data format [e.g. V4_16], so multiple compression formats are not possible. And if we use multiple UNPACK blocks, the data is no longer interleaved. The trick is to use multiple UNPACK blocks after a STCYCL VIFcode.
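To put a rough number on the saving, here is a small sketch of the arithmetic. It assumes one position, one colour and one normal per vertex, with position and normal in V4_16 and colour in V4_8 [the same choice as the example stream later on]; the names FORMAT_BITS and bytes_per_vertex are made up for illustration.

```python
# Bits occupied in main memory by one element of each UNPACK format
# mentioned in the text (4 components each).
FORMAT_BITS = {"V4_32": 128, "V4_16": 64, "V4_8": 32, "V4_5": 16}

def bytes_per_vertex(formats):
    """Main-memory bytes needed for one vertex made of the given formats."""
    return sum(FORMAT_BITS[f] for f in formats) // 8

# Everything stored uncompressed, one full qword per attribute.
uncompressed = bytes_per_vertex(["V4_32", "V4_32", "V4_32"])

# Position and normal at 16 bits per component, colour at 8 bits.
compressed = bytes_per_vertex(["V4_16", "V4_8", "V4_16"])

print(uncompressed, compressed)
```

Under those assumptions the mixed formats take well under half the main memory of the uncompressed layout, which is why being stuck with one format per UNPACK is such a nuisance.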
The STCYCL VIFcode can be used to tell the VIF to skip some addresses when it's unpacking data. This means that you can make the first block write to the first of every 3 memory addresses, the second block to the second of every 3 and the third block to the third. It's almost like they designed it like that ;)
STCYCL sets the WL and CL values of the VIFn_CYCLE register. The docs aren't exactly clear on how to use these values, so here's what I think they mean [this understanding works for what we're doing here; any corrections to the address at the top please :)]:
WL is the "Write Length". This is the number of qwords per write cycle that are written. CL is the "Cycle Length". This is the number of qwords in a write cycle. The concept of a write cycle is a tricky one and depends on whether WL>CL or not. If WL is greater, a "filling write" is performed. If CL is greater, a "skipping write" is performed [this is what we want]. If they are equal, the data is decompressed and written without bothering with cycles at all.
In a skipping write, the VIF treats its output as blocks of CL qwords. It writes WL qwords into the start of each block and skips the rest. So if WL is 1, and CL is the number of qwords per vertex, the VIF will interleave our data for us! The only other thing we need to do is start each unpack operation at a valid address. So, we could unpack the XYZ to address 1, the UV to address 2 and the normal to address 3 [for example]. That will give the interleaved pattern above.
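The address pattern is easy to check on paper. The following is only a sketch of my understanding of a skipping write as described above, not a model of the real hardware; the function name skipping_write_addresses is made up, and filling writes are deliberately not handled.

```python
def skipping_write_addresses(base, wl, cl, num):
    """Destination qword address of each of `num` unpacked qwords.

    Models only the skipping write described in the text: the output is
    treated as blocks of `cl` qwords and only the first `wl` qwords of
    each block are written.
    """
    if wl < 1 or wl > cl:
        raise ValueError("this sketch only covers 1 <= WL <= CL")
    return [base + (i // wl) * cl + (i % wl) for i in range(num)]

# WL=1, CL=3: one qword written, two skipped, for each of three streams.
print(skipping_write_addresses(1, 1, 3, 4))  # first attribute
print(skipping_write_addresses(2, 1, 3, 4))  # second attribute
print(skipping_write_addresses(3, 1, 3, 4))  # third attribute
```

Each stream lands on every third address, and the three start addresses slot the streams in between each other.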
So, the data stream could look like:
UNPACK size=1, addr=0, flg=1, format=V4_32 |
GIFtag |
STCYCL wl=1, cl=3 |
UNPACK size=4, addr=1, flg=1, format=V4_16 |
XYZW x 4, 16 bits per component |
UNPACK size=4, addr=2, flg=1, format=V4_8 |
RGBA x 4, 8 bits per component |
UNPACK size=4, addr=3, flg=1, format=V4_16 |
normal x 4, 16 bits per component |
This would store the following to VU mem:
GIFtag |
XYZW |
RGBA |
normal |
XYZW |
RGBA |
normal |
XYZW |
RGBA |
normal |
XYZW |
RGBA |
normal |
Lovely.
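If you want to convince yourself without a devkit, the stream above can be replayed in a few lines. This is a toy simulation under the same understanding of skipping write as before: VU mem is just a dictionary of qword labels, and the helper names unpack and build_vu_mem are invented for the sketch. The GIFtag is a single qword, so I assume it simply lands at its start address whatever WL and CL happened to be beforehand.

```python
def unpack(mem, addr, wl, cl, items):
    """Write `items` (one label per qword) into `mem` using skipping write."""
    for i, item in enumerate(items):
        mem[addr + (i // wl) * cl + (i % wl)] = item

def build_vu_mem():
    mem = {}
    # UNPACK size=1, addr=0: a single qword, so the cycle settings don't
    # change where it goes (modelled here as WL=CL=1).
    unpack(mem, 0, 1, 1, ["GIFtag"])
    # STCYCL wl=1, cl=3 applies to the three UNPACKs that follow.
    wl, cl = 1, 3
    unpack(mem, 1, wl, cl, ["XYZW"] * 4)
    unpack(mem, 2, wl, cl, ["RGBA"] * 4)
    unpack(mem, 3, wl, cl, ["normal"] * 4)
    # Return the contents in address order.
    return [mem[a] for a in sorted(mem)]

layout = build_vu_mem()
for address, label in enumerate(layout):
    print(address, label)
```

The printed listing should match the table above, with the GIFtag at address 0 followed by four XYZW / RGBA / normal triples.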
Sony's VU assembler has some special syntax to make this simpler. In order to get the assembler to create the raw data stream for you, you could use something like the following:
unpack[r] 4, 4, V4_32, 0, *
[GIFTAG]
.endunpack
unpack[r] 1, 3, V4_16, 1, *
[XYZ DATA]
.endunpack
unpack[r] 1, 3, V4_8, 2, *
[RGBA DATA]
.endunpack
unpack[r] 1, 3, V4_16, 3, *
[NORMAL DATA]
.endunpack
Note that the assembler inserts an STCYCL VIFcode as well as an UNPACK for each instance of the "unpack" directive.
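If you are building the stream by hand instead, each VIFcode is a single 32-bit word. The sketch below packs the STCYCL and UNPACK words for the example stream. The bit layout [command in the top byte, NUM in the next byte, a 16-bit immediate at the bottom, STCYCL as command 0x01 with WL in the high byte of the immediate, UNPACK as 0x60 plus the format bits] is my reading of the VIF documentation, so treat it as an assumption and check it against the EE manual before relying on it. The function names are made up, and the interrupt, mask and unsigned bits are left at zero.

```python
# Low nibble of the UNPACK command for each V4 format (assumed encoding).
UNPACK_FORMATS = {"V4_32": 0x0C, "V4_16": 0x0D, "V4_8": 0x0E, "V4_5": 0x0F}

def vifcode(cmd, num, immediate):
    """Pack one VIFcode word: CMD in bits 30-24, NUM in 23-16, IMMEDIATE in 15-0."""
    return ((cmd & 0x7F) << 24) | ((num & 0xFF) << 16) | (immediate & 0xFFFF)

def stcycl(wl, cl):
    """STCYCL: WL in the high byte of the immediate, CL in the low byte."""
    return vifcode(0x01, 0, ((wl & 0xFF) << 8) | (cl & 0xFF))

def unpack(fmt, addr, num, flg=1):
    """UNPACK: FLG in bit 15 of the immediate, destination address in bits 9-0."""
    immediate = ((flg & 1) << 15) | (addr & 0x3FF)
    return vifcode(0x60 | UNPACK_FORMATS[fmt], num, immediate)

# The VIFcodes of the example stream, in order (data words omitted).
stream = [
    unpack("V4_32", 0, 1),
    stcycl(1, 3),
    unpack("V4_16", 1, 4),
    unpack("V4_8", 2, 4),
    unpack("V4_16", 3, 4),
]
for word in stream:
    print(f"0x{word:08X}")
```

The data for each UNPACK goes straight after its VIFcode in the real stream; it is left out here because only the VIFcode words are being illustrated.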