5:  Instruction set
Now you're going to need the VU User's Manual.  Don't worry about the size of the beast, we're only interested in the micromode reference, i.e. Chapters 3 and 4, starting on page 29, working through to page 196.

For now, don't worry about the distinction between upper and lower instructions.  VCL takes care of instruction pairing for us, so we don't actually have to know whether an instruction is upper or lower.  However, this knowledge greatly helps when it comes to optimising areas of code that are running too slowly.  For this series, that's a little way into the future, so if you're using VCL, forget I ever mentioned that upper and lower instructions can run at the same time.

If you haven't programmed in assembly before, you may well be surprised at how little the VU can actually do.  For instance, there isn't even an instruction to put the value zero in a register.  In 4: Making your code file, I showed you the instruction iaddiu iKick, vi00, kGifpacket, which was a fancy way of putting a number into a register by adding it to zero.

Before I get into what instructions are available, I need to explain registers.  A register is a place in the VU where a number can be stored.  Almost all of the instructions shown here act on registers, of which there are 16 integer registers (called vi00 to vi15) and 32 floating point registers (called vf00 to vf31).  What I'm trying to say is that in assembly you can't add QWord 1 to QWord 2 and put the result in QWord 3.  Instead, you have to load QWords 1 and 2 into registers, add the values in the registers together, and store the result back into QWord 3.  Remember those three lines we added to the basic sample to move the vertex?  That's what we did there: load, add, store.

There are also some special registers, called ACC, I, Q, R and P, and they each have very specific uses, which I'll explain as they arise.

So, what can go into a register?  Well, a floating point register can store an entire QWord of memory, and thankfully for us we can refer to each of the four 32-bit fields (x, y, z, and w) separately, or together.  An integer register can only store a single field of 16 bits.  Just because I called vi01 an integer register doesn't mean it *has* to store an integer; those 16 bits could mean anything.  However, instructions that use integer registers expect integers to be in there, so don't ask me what'll happen if you try to add two floating point numbers in integer registers!

There are 2 exceptions, vi00 and vf00, the first integer and float registers respectively.  You may remember from last lesson that vi00 *always* contains 0.  That's because we can't ever write into vi00.  Similarly, vf00 always contains (0, 0, 0, 1).  Note: When I want to refer to an entire floating point register, or a QWord of memory, I'll use this (x, y, z, w) 4-vector style notation, so (0, 0, 0, 1) means that vf00.x = 0, vf00.y = 0, vf00.z = 0, and vf00.w = 1.

On to the instruction set then.  There are 127 different instructions that VU1 can perform, but a lot of them belong to big families.  By this I mean that although you might consider ADD to be one instruction, there are actually 11 different add-type instructions, depending on where the numbers you want to add are coming from.  In reality, those 127 instructions comprise a much smaller set of abilities.

Let's tackle them one family at a time:


Floating Point Calculations (ADD, SUB, MUL, MADD, and MSUB)

The first 40 instructions cover simple floating point calculations.  With all of these instructions, you can choose which of the x, y, z, and w fields get operated on.  If you like, you can operate on all 4 fields simultaneously.  Yes, the VU can add 4 32-bit floating point numbers together simultaneously, taking the same amount of time as if only 1 field was used.

For example, if vf01 contains (1.0, 2.0, 3.0, 4.0), and vf02 contains (0.1, 0.2, 0.3, 0.4) then ADD vf03, vf01, vf02 will result in (1.1, 2.2, 3.3, 4.4) in vf03, whereas ADD.xyz vf03, vf01, vf02 would write (1.1, 2.2, 3.3) into the x, y, and z fields only, leaving vf03.w holding whatever it held before.  SUB and MUL are the subtract and multiply operations respectively, as you would expect.
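To make the field selection concrete, here is a minimal Python sketch of the idea.  It is only a toy model, not VU code: the names vu_add and reg are made up for illustration, and it assumes that fields left out of the selection keep whatever the destination register already held.

```python
# Toy model of a VU float register: a dict with the four fields x, y, z, w.
# The real VU works in 32-bit floats; Python floats are 64-bit, so the
# printed values are rounded for display only.

def reg(x, y, z, w):
    """Build a hypothetical register value from four numbers."""
    return {"x": x, "y": y, "z": z, "w": w}

def vu_add(dest, a, b, fields="xyzw"):
    """Model ADD.<fields> dest, a, b: only the named fields of dest are written."""
    result = dict(dest)              # fields not named keep their old value
    for f in fields:
        result[f] = a[f] + b[f]
    return result

vf01 = reg(1.0, 2.0, 3.0, 4.0)
vf02 = reg(0.1, 0.2, 0.3, 0.4)
vf03 = reg(9.0, 9.0, 9.0, 9.0)       # arbitrary old contents (assumption)

full = vu_add(vf03, vf01, vf02)             # like ADD     vf03, vf01, vf02
masked = vu_add(vf03, vf01, vf02, "xyz")    # like ADD.xyz vf03, vf01, vf02

print([round(full[f], 1) for f in "xyzw"])    # [1.1, 2.2, 3.3, 4.4]
print([round(masked[f], 1) for f in "xyzw"])  # [1.1, 2.2, 3.3, 9.0]
```

The 9.0 in the last line is just the old w value showing through, because w was not one of the selected fields.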

One of the flavours of ADD allows for a broadcast field to be used, and again, an example would come in handy.  With vf01 and vf02 as above, ADD vf03, vf01, vf02.x would result in (1.1, 2.1, 3.1, 4.1) being stored in vf03.  In other words, each field of vf01 will have the x field of vf02 added to it.
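The broadcast idea can be sketched in a few lines of Python.  Again this is a toy model rather than VU code, and vu_add_broadcast is a name invented for the example.

```python
# Toy model of a broadcast add: one field of the second register is added
# to every field of the first.

def vu_add_broadcast(a, b, bc):
    """Model ADD dest, a, b.<bc>: add the single field b[bc] to all fields of a."""
    return {f: a[f] + b[bc] for f in "xyzw"}

vf01 = dict(x=1.0, y=2.0, z=3.0, w=4.0)
vf02 = dict(x=0.1, y=0.2, z=0.3, w=0.4)

vf03 = vu_add_broadcast(vf01, vf02, "x")    # like ADD vf03, vf01, vf02.x
print([round(vf03[f], 1) for f in "xyzw"])  # [1.1, 2.1, 3.1, 4.1]
```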

Read the description of the ADD instruction on page 65 of the VU User's Manual, and pay particular attention to the "Example" section at the bottom of the page.  There you'll find an example instruction together with the effect it has.

Also, we can combine operations with the MADD and MSUB instructions.  These take the number currently in the ACC register (a special floating point register) and either add a product to it (MADD) or subtract a product from it (MSUB).  An example will clarify things a bit:

If ACC=(10, 20, 30, 40), vf01=(2, 2, 2, 2) and vf02=(4, 0.5, 2, 3.5), then MSUB vf03, vf01, vf02 would result in (10-2x4, 20-2x0.5, 30-2x2, 40-2x3.5) = (2, 19, 26, 33) being stored in vf03.  Why would we want to do something like that?  Hold tight, this lesson is an overview of the available instructions.  We'll see some applications next lesson.
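If you'd like to check that arithmetic, here is a small Python sketch of the same calculation.  The function names vu_madd and vu_msub are made up for the example, and registers are modelled as plain (x, y, z, w) tuples.

```python
# Toy model of MADD and MSUB on whole registers, field by field.

def vu_madd(acc, a, b):
    """Model MADD dest, a, b: dest = ACC + a * b in every field."""
    return tuple(c + p * q for c, p, q in zip(acc, a, b))

def vu_msub(acc, a, b):
    """Model MSUB dest, a, b: dest = ACC - a * b in every field."""
    return tuple(c - p * q for c, p, q in zip(acc, a, b))

acc = (10.0, 20.0, 30.0, 40.0)
vf01 = (2.0, 2.0, 2.0, 2.0)
vf02 = (4.0, 0.5, 2.0, 3.5)

print(vu_msub(acc, vf01, vf02))   # (2.0, 19.0, 26.0, 33.0)
print(vu_madd(acc, vf01, vf02))   # (18.0, 21.0, 34.0, 47.0)
```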

Note:  In proper VSM, instructions have to be specified differently if the I, Q, or ACC registers are involved, or if a broadcast field is used.  VCL simplifies all these different instructions by working out for you which instruction you mean.  So, to use any of the floating point addition instructions, just use the mnemonic ADD.


Other Floating Point operations (ABS, DIV, MAX, MINI, OPMULA, OPMSUB, SQRT and RSQRT)
As well as addition, subtraction and multiplication, we have a few other operations.  We can take the ABSolute value of a number (i.e. make it positive if it isn't already), find the largest (MAX) or smallest (MINI) of 2 numbers, calculate vector (cross) products (OPMULA and OPMSUB) and perform division (DIV).  I've separated division from the other 3 basic operations because it works a little differently.  For more information, see the appropriate instruction reference pages (there is a contents list at the start of the VU manual!).
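The vector product is the one that tends to puzzle people, because it takes two instructions.  The Python sketch below shows how I understand the pair to work: OPMULA puts one set of products into ACC, and OPMSUB subtracts the other set, with the two source registers swapped over.  The function names and the exact field pairing are my reading of the manual, so treat them as an assumption and check the reference pages.

```python
# Toy model of the OPMULA / OPMSUB pair on (x, y, z) tuples.

def opmula(a, b):
    """Model OPMULA ACC, a, b: ACC = (a.y*b.z, a.z*b.x, a.x*b.y)."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz, az * bx, ax * by)

def opmsub(acc, a, b):
    """Model OPMSUB dest, a, b: dest = ACC - (a.y*b.z, a.z*b.x, a.x*b.y)."""
    px, py, pz = opmula(a, b)
    return (acc[0] - px, acc[1] - py, acc[2] - pz)

def cross(a, b):
    """a cross b, using the two steps with the operands swapped in the second."""
    acc = opmula(a, b)
    return opmsub(acc, b, a)

print(cross((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))   # (0.0, 0.0, 1.0)
print(cross((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)))   # (-3.0, 6.0, -3.0)
```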

The result of a DIV operation has to go into the Q register, and a DIV can only operate on one field at a time.  Trying to place the result in a normal floating point register is an error, and so is trying to specify more than one field to divide.  Divisions also take longer than adds, subtracts, and multiplies.  Grab a piece of paper and work out 2.5 multiplied by 7.4, then work out 2.5 divided by 7.4.  See which one takes you longer!

We can also take the square root (SQRT) of a floating point number, or find the reciprocal (1 divided by) of the square root (RSQRT), but these results also have to go to the Q register.  The SQRT takes the same time as a DIV, but the RSQRT is almost twice as long!

The VU won't normally wait around for these long instructions to finish writing to the Q register.  If you want to force it to wait until the instruction is finished, you must use the WAITQ instruction.


Conversion operations (FTOI, ITOF)
You may recall in 3: Making your data file that we had to shift our X and Y co-ordinates left by 4 bits because the GS expects them in 12:4 fixed point format.  Well, thankfully, the VU is quite happy to convert numbers between floating point and fixed point integers for us (FTOI goes from float to integer, ITOF goes back the other way).  All we have to do is specify how many bits there should be after the binary point: 0, 4, 12 or 15.  The one you'll see me make the most use of is FTOI4, which converts a floating point number into an integer with 4 fractional bits... perfect for sending to the GS!
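Here is a rough Python sketch of what these conversions amount to.  The helper names ftoi and itof are invented for the example, and I'm assuming the fractional part that doesn't fit is simply chopped off (truncated towards zero) rather than rounded.

```python
# Toy model of the FTOI / ITOF family: frac_bits is 0, 4, 12 or 15.

def ftoi(value, frac_bits):
    """Float to fixed point: scale by 2**frac_bits, then drop the fraction."""
    return int(value * (1 << frac_bits))   # int() truncates towards zero

def itof(fixed, frac_bits):
    """Fixed point back to float: divide by 2**frac_bits."""
    return fixed / float(1 << frac_bits)

x = ftoi(100.5, 4)         # like FTOI4: 12:4 fixed point, as the GS wants
print(x, hex(x))           # 1608 0x648
print(itof(x, 4))          # 100.5
print(ftoi(100.5, 0))      # 100
```

Notice that 100.5 times 16 is the same as shifting the whole number part left by 4 bits and using the bottom 4 bits for the fraction.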


Clipping Judgement (CLIP, FCAND, FCEQ, FCOR, FCSET, FCGET)
The CLIP instruction lives in a family of its own, but you'll need the various FC instructions to work with it.  The CLIP instruction compares each of the x, y, and z fields of a register with the w field of another (potentially the same) register, and sets some flags if any of x, y, and z are bigger than w, or smaller than -w.  The FC commands are then used to check the status of those flags.
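The comparison itself is simple enough to sketch in Python.  This toy version returns the flags as a set of labels such as "+x" or "-z" instead of real flag bits, the name clip_flags is made up, and it assumes w is positive.

```python
# Toy model of the clipping judgement: six possible flags, two per field.

def clip_flags(xyz, w):
    """Return which of x, y, z fall outside the range -w .. +w."""
    flags = set()
    for name, value in zip("xyz", xyz):
        if value > w:
            flags.add("+" + name)    # bigger than w
        if value < -w:
            flags.add("-" + name)    # smaller than -w
    return flags

print(sorted(clip_flags((0.5, -2.0, 3.0), 1.0)))   # ['+z', '-y']
print(sorted(clip_flags((0.5, 0.5, -0.5), 1.0)))   # []
```

An empty result means the point is inside on all three axes, which is the sort of thing the FC instructions let you test for.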


Integer Calculations (IADD, ISUB, IAND, IOR)
When it comes to the integer registers, we can only add, subtract, and perform logical ANDs and ORs.  If we wanted to do an integer multiplication, we'd have to load the numbers into a floating point register, use one of those conversion instructions, operate on them as floats, then convert them back to integers.


Loading and Storing values (LQ, SQ, ILW, ISW, LOI)
These are the instructions that go fetch values from VU data memory and put them into registers, or vice versa.  We can load a QWord at a time, suitable to go into a float register, or just one 32-bit field at a time, suitable for an integer register.  Just remember that we load from memory into a register, and we store back into memory.

Hang on, did I just say that a 32-bit field could be loaded into an integer register?  Well, yes I did, but the upper 16 bits are silently discarded as if they were never there.

The LOI instruction puts a number in the I register for us, which we can use with various flavours of the ADD instruction and suchlike.  (Technically, the LOI is a pseudo-instruction, but I don't care enough about the difference to bother you with the details :)


Register Transfer (MOVE, MFIR, MTIR, MR32, MFP)
This set of instructions is here to move values around from float registers to float registers (MOVE or MR32), from integer registers to float registers (MFIR), from float registers to integer registers (MTIR), or from the P register (MFP).  How do values get into the P register?  See the Elementary functions below.

MR32 does a quick alteration as the values are moved, in that all the values get moved round a field, so what was in the x field ends up in the w field, y goes to x, z goes to y, and w goes to z.
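In (x, y, z, w) notation that rotation looks like this.  The sketch is plain Python with a made-up function name, just to show where each field lands.

```python
# Toy model of MR32: rotate the four fields round by one place.

def mr32(src):
    """Return (src.y, src.z, src.w, src.x)."""
    x, y, z, w = src
    return (y, z, w, x)

vf01 = (1.0, 2.0, 3.0, 4.0)
print(mr32(vf01))    # (2.0, 3.0, 4.0, 1.0)
```

Doing it four times in a row gets you back to where you started.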

Be warned that in moving from a 32-bit field of a float register into an integer register using MTIR, the top 16 bits get silently discarded.
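It's worth seeing what that means in practice.  As far as I understand it, MTIR doesn't convert the float into an integer, it just copies the bottom 16 bits of the raw 32-bit pattern.  The Python sketch below (helper names are my own) shows why that rarely gives you the number you were hoping for.

```python
import struct

def float_bits(value):
    """The raw 32-bit pattern of a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def mtir(value):
    """Toy model of MTIR: keep only the low 16 bits of the pattern."""
    return float_bits(value) & 0xFFFF

print(hex(float_bits(1.0)), hex(mtir(1.0)))   # 0x3f800000 0x0
print(hex(float_bits(0.1)), hex(mtir(0.1)))   # 0x3dcccccd 0xcccd
```

So moving the float 1.0 into an integer register this way leaves you with 0, not 1.  If you want the integer value, convert with one of the FTOI instructions first.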


Flags (FC*, FM*, FS*)
We've already met the FC* instructions which are used to check the clipping flags after a clipping instruction, but there are also FM* instructions to check the MAC flags, and FS* instructions to check the Sticky flags.  Flags are a topic I'll come on to a little later.


Branching (B, BAL, JR, JALR, IB*)
If our program started at instruction 0 and flowed right through to the end in one straight line, that'd be pretty boring, so we have a set of instructions that allow us to change program flow by branching or jumping to different instructions.  We can branch unconditionally (B), we can save our place to come back to later (BAL), or we can decide to branch or not based on whether certain pairs of registers are equal or not, and whether certain registers are positive or negative.

These instructions can be put together in lots of imaginative ways to look like commands you're used to in other languages, but I'll get to that later.
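As a small taster, here is how a counted loop falls out of nothing more than an add, a subtract and a "branch if not equal".  This is a made-up mini-interpreter in Python, not VU code: it borrows the mnemonics iaddiu, isubiu and ibne, uses list positions instead of labels, and ignores details of the real hardware such as the instruction that sits after a branch.

```python
# Hypothetical mini-interpreter with three instruction kinds.

def run(program):
    regs = {"vi00": 0}                 # vi00 always reads as zero
    pc = 0                             # index of the next instruction
    while pc < len(program):
        op, a, b, c = program[pc]
        pc += 1
        if op == "iaddiu":             # a = b + constant c
            regs[a] = regs[b] + c
        elif op == "isubiu":           # a = b - constant c
            regs[a] = regs[b] - c
        elif op == "ibne":             # if a != b, continue from index c
            if regs[a] != regs[b]:
                pc = c
    return regs

program = [
    ("iaddiu", "count", "vi00", 3),    # 0: count = 3
    ("iaddiu", "total", "vi00", 0),    # 1: total = 0
    ("iaddiu", "total", "total", 10),  # 2: loop body, total += 10
    ("isubiu", "count", "count", 1),   # 3: count -= 1
    ("ibne", "count", "vi00", 2),      # 4: back to 2 until count hits 0
]

result = run(program)
print(result["total"])    # 30
```

That's a FOR loop in all but name: set a counter, do the work, count down, and branch back until the counter matches vi00.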


Elementary functions (E*)
These last three sections are a little more exotic than the others, and cover commands you might not be used to finding in assembly instruction sets (they were certainly never there on the Z80!).  We have single commands to execute what are termed Elementary functions, and with them we can calculate the sum of the squares of the fields of a register, the reciprocal sum of squares, the length or reciprocal length of a vector, the inverse tangent of a number or a ratio, sines (not cosines) and exponentials.

However, be warned that just because they are represented as one instruction, it doesn't mean that they execute at the same speed as an ADD!  These all take different amounts of time to complete, and they all store their results in the P register.  As with the Q register, the VU won't wait for the P register to be written to unless you use the WAITP instruction.
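To pin down what the first four of those calculate, here is a Python sketch.  The function names follow the mnemonics as I remember them from the manual (ESADD, ERSADD, ELENG, ERLENG), and I'm assuming they use only the x, y, and z fields, so do check the reference pages before relying on either.

```python
import math

def esadd(v):
    """Sum of squares of x, y, z."""
    x, y, z = v[:3]
    return x * x + y * y + z * z

def ersadd(v):
    """Reciprocal of the sum of squares."""
    return 1.0 / esadd(v)

def eleng(v):
    """Length of the (x, y, z) vector."""
    return math.sqrt(esadd(v))

def erleng(v):
    """Reciprocal of the length, handy for normalising a vector."""
    return 1.0 / eleng(v)

vf01 = (3.0, 4.0, 0.0, 1.0)
print(esadd(vf01), eleng(vf01))   # 25.0 5.0
print(round(ersadd(vf01), 6), round(erleng(vf01), 6))   # 0.04 0.2
```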


Random Numbers (RINIT, RGET, RNEXT, RXOR)
The VU is capable of generating its own random numbers!  The 4 instructions give us the ability to set the random number seed, pull a new or old random number out of the hat, and for some reason XOR the R register.  Why we'd want to do that I don't know.  The random numbers are generated in M-series, lie between 1.0 and 2.0, and the R register is used to keep track of the random seed.
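Why between 1.0 and 2.0?  My understanding (an assumption worth checking against the manual) is that the R register holds 23 random bits, and those bits are dropped straight into the fraction part of a float whose sign and exponent are fixed.  The Python sketch below shows that any 23 bits treated this way give a number from 1.0 up to just under 2.0.  The name r_to_float is made up, and the sketch says nothing about how the VU produces the bits themselves.

```python
import struct

def r_to_float(r_bits):
    """Place 23 bits into the fraction of a float with exponent fixed at 2**0."""
    bits = 0x3F800000 | (r_bits & 0x7FFFFF)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(r_to_float(0x000000))          # 1.0
print(r_to_float(0x400000))          # 1.5
print(r_to_float(0x7FFFFF) < 2.0)    # True

# Subtracting 1.0 gives the more familiar 0.0 to 1.0 range.
print(r_to_float(0x400000) - 1.0)    # 0.5
```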

Note that there is another way to get random numbers... the harness gives them to us!  The harness automatically puts a random number in the z field of the last QWord of memory, number 1023.  Use the R* instructions with a constant seed to get predictable random numbers, use the last QWord of memory to get unpredictable random numbers.


External Control (XGKICK, XTOP, XITOP)
Finally, these 3 instructions allow the VU to interact with other parts of the PS2.  We've already met XGKICK, that one kicks off a GIFpacket to the GS.  XTOP and XITOP are used to read from the TOP or ITOP register of the VIF, but as the harness doesn't use those registers, they're a bit less than useless to us.  Ignore those instructions at will!


So, that's an overview of the instructions we have to play with.  "What, no FOR loops?", I hear you cry.  "No IF statements, or switches?"  Firstly, calm down, this is only the set of things that can be done *in one instruction*.  With a combination of instructions we can achieve a lot, lot more, and with 16KB of room to put those instructions in, the possibilities are just about infinite!  The skill of this competition is finding those tricks, and creating something visually stunning.

Assimilate this information.  You need a good grasp of these building blocks before we can put them together to make anything substantial out of them.
