9: Uploading Image data
The Graphics Synthesiser is so called because it creates graphics out of the primitives you ask it to draw.  You send a GIFpacket with the instructions to draw a series of triangles, and you get them.  Many different things are done for you by the GS, such as Z-buffering to ensure things are drawn in the correct order, antialiasing to reduce the jagged appearance of lines, even elementary clipping.  However, sometimes you may want to do something that the GS simply cannot do for you.  For example, you may wish to use a shading method different from the standard flat or Gouraud methods.

If that is the case, you must use a different method to tell the GS what to draw.  You must simply draw the image yourself, and transfer it into the part of GS memory reserved for the frame buffer.  This lesson is about the transfer.

There is another use for the upload of image data, and that is in texture mapping.  The GS must know about a texture before you can draw using it, right?  This lesson deals with the upload of the image data; 10: Texture Mapping deals with actually drawing textured primitives.

Pick a format, any format
There are a number of different pixel formats we can choose for our image transfer.  I would normally recommend the use of full on 32-bit colour, but in this VU competition we are quite severely limited by memory, so sometimes you're going to be better off using 24, 16 or 8 bit colour.  I'm going to teach you how to do a 32-bit colour transfer first, then let you figure out the details of changing down to 24 or 16 bits.  8 bits is another story though, as a further level of indirection is used, the CLUT.

In formats above 8 bit colour, you directly specify the colour of each pixel, with a certain number of bits allocated to each of the red, green and blue components.  In 8 bit colour or below, the value for each pixel instead specifies an entry in a table of colours, called a Colour Look Up Table, or CLUT.  More on that later, let's get an image uploaded!
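To make the difference between the two families of formats concrete, here is a minimal sketch in plain C.  The function names direct_pixel and indexed_pixel are made up for illustration and are not part of the harness; the sketch only shows the extra level of indirection, not how the GS stores the data internally.

```c
#include <assert.h>
#include <stdint.h>

/* Direct colour (e.g. 32-bit): the stored value IS the colour. */
uint32_t direct_pixel(const uint32_t *image, int index)
{
    assert(index >= 0);
    return image[index];
}

/* Indexed colour (8-bit): the stored value is only a position in the
 * CLUT, and the CLUT entry at that position holds the real colour. */
uint32_t indexed_pixel(const uint8_t *image, const uint32_t *clut, int index)
{
    assert(index >= 0);
    return clut[image[index]];
}
```

With a 256-entry CLUT whose entry 3 holds some colour, any pixel storing the value 3 comes out as that colour, at the cost of one byte per pixel instead of four.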

Setting up the transfer
There are two steps needed to actually make an image transfer, and we can use a GIFtag to accomplish each task.  The first tag consists of some setup details, telling the GS how the transfer is going to happen, and the second contains the actual data to upload.

To access the necessary registers to set up the transfer, I have to tell you about a little trick called the A&D register.  Remember in a GIFtag you can specify a list of registers to send data to, and the number of times to loop through the data, in NLOOP and NREG?  Well, what you may not have noticed at the time is that you only use one hex digit to specify each register, and that only allows for data to be sent to one of 16 registers.  However, the GS has 54 registers, so how do you get to them?

The answer is a little secret of PACKED mode: there is a pseudo-register called the A&D register.  You send data to it in packed mode in exactly the same way as you would to the RGBAQ or XYZ2 registers we've been using, only the top 64 bits consist of the address of the real register you want to send data to, and the lower 64 bits consist of the actual data to be sent.  Think of the Address and Data (A&D) register as a postman: you feed it data and an address, and it posts it off to the correct register for you.
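As a rough sketch of what one of those A&D QWords looks like, here it is modelled as two 64-bit halves in plain C.  The type ad_qword and the function make_ad are hypothetical names for illustration; in the tutorial code the same job is done by filling in reg.ul64[0] and reg.ul64[1].

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit QWord, split into halves (hypothetical type). */
typedef struct {
    uint64_t lo; /* bits 0-63: the data to post */
    uint64_t hi; /* bits 64-127: the address of the real GS register */
} ad_qword;

/* Build the QWord that asks the A&D "postman" to deliver
 * `data` to the GS register at `reg_addr`. */
ad_qword make_ad(uint8_t reg_addr, uint64_t data)
{
    ad_qword q;
    q.lo = data;
    q.hi = reg_addr;
    return q;
}
```

For example, make_ad(0x53, 0) would describe a write of zero to the register at address 0x53.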

Our first GIFtag has the job of setting 4 registers, BITBLTBUF, TRXPOS, TRXREG and TRXDIR, at addresses 0x50, 0x51, 0x52 and 0x53 respectively.  None of these are directly specifiable in the GIFtag, so we have to loop through the A&D register 4 times.  NREG = 1, NLOOP = 4.  EOP = 0 as we're going to send the image data immediately afterwards, FLG = 00 for packed mode, as that is the only mode that the A&D register is available in.  We're not sending any actual primitives, so the PRIM field and the PRE field both contain 0.  Finally, the 'address' of the A&D register is 0x0e.
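If you are curious what those field values turn into, the following sketch packs the lower 64 bits of a GIFtag by hand.  The bit positions are my reading of the GIFtag layout (NLOOP in bits 0-14, EOP in bit 15, PRE in bit 46, PRIM in bits 47-57, FLG in bits 58-59, NREG in bits 60-63), and gif_tag_lo is a made-up name; the GIF_SET_TAG macro in defines.h may be written differently.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the lower 64 bits of a GIFtag (assumed field positions,
 * see the paragraph above). The upper 64 bits hold the register list. */
uint64_t gif_tag_lo(uint32_t nloop, uint32_t eop, uint32_t pre,
                    uint32_t prim, uint32_t flg, uint32_t nreg)
{
    assert(nloop <= 0x7FFF); /* NLOOP is a 15-bit field */
    return  (uint64_t)(nloop & 0x7FFF)
         | ((uint64_t)(eop   & 0x1)   << 15)
         | ((uint64_t)(pre   & 0x1)   << 46)
         | ((uint64_t)(prim  & 0x7FF) << 47)
         | ((uint64_t)(flg   & 0x3)   << 58)
         | ((uint64_t)(nreg  & 0xF)   << 60);
}
```

For the setup tag described above, gif_tag_lo(4, 0, 0, 0, 0, 1) gives 0x1000000000000004, and the upper 64 bits would simply be 0x0e, the A&D register.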

This bit of code (which should go into gen_data.c) will set up our image transfer then.  The transfer we will set up will be for an 8x8 texture defined in 32-bit colour, i.e. PSMCT32 pixel mode.

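qword_t reg;

// GIFtag: PACKED mode, one register (A&D) looped 4 times, more data follows (EOP = 0)
gifpacket_addgsdata(&gifpacket, GIF_SET_TAG(NLOOP(4), EOP(0), PRE(0), 0, GIF_PACKED, NREG(1)));
gifpacket_addgsdata(&gifpacket, AD_REG);
// BITBLTBUF: source pointer, width, format, then destination pointer, width, format
reg.ul64[0] = GS_SET_BITBLTBUF(0, 1, PSMCT32, 0x118 << 5, 1, PSMCT32);
reg.ul64[1] = BITBLTBUF_REG;
gifpacket_addgspacked(&gifpacket, reg.ul128);
// TRXPOS: offsets within the buffers, all zero here
reg.ul64[0] = GS_SET_TRXPOS(0, 0, 0, 0, 0);
reg.ul64[1] = TRXPOS_REG;
gifpacket_addgspacked(&gifpacket, reg.ul128);
// TRXREG: width and height of the transfer in pixels
reg.ul64[0] = GS_SET_TRXREG(8, 8);
reg.ul64[1] = TRXREG_REG;
gifpacket_addgspacked(&gifpacket, reg.ul128);
// TRXDIR: writing this register starts the transfer, so it must come last
reg.ul64[0] = GS_SET_TRXDIR(TRXDIR_TO_GS);
reg.ul64[1] = TRXDIR_REG;
gifpacket_addgspacked(&gifpacket, reg.ul128);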

Note: The GS_SET_* macros, and the *_REG macros are defined in the updated defines.h file available at my project page.  Let's go through all those parameters!

Firstly, the BITBLTBUF register.  The first three parameters describe the source of the image transfer, and the last three describe the destination.  We're transferring into the GS from somewhere else, so there's no need to set the source buffer pointer (parameter 1).  We're going to do the transfer in a buffer of width 64 pixels; the width is given in units of 64 pixels, hence the value 1.  I don't think the buffer width parameter is important in Host->GS transfers, as long as you use the same buffer throughout.  And, finally, the incoming data is in the PSMCT32 format.  Then, the destination pointer is 0x118, shifted left by 5 because (as far as I can tell) 0x118 is a page number and BITBLTBUF counts in blocks, of which there are 32 to a page.  We have the same buffer width of 64 pixels, and we are storing the data in PSMCT32 format.
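For the curious, here is a sketch of how those six parameters might be packed into the 64-bit register value.  The field positions (source pointer in bits 0-13, source width in bits 16-21, source format in bits 24-29, and the same three destination fields starting at bit 32) are my understanding of the BITBLTBUF layout, pack_bitbltbuf is a made-up name, and the value 0 for PSMCT32 is likewise an assumption; check the GS_SET_BITBLTBUF macro in defines.h for the version the tutorial actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a BITBLTBUF value (assumed field positions, see above). */
uint64_t pack_bitbltbuf(uint32_t sbp, uint32_t sbw, uint32_t spsm,
                        uint32_t dbp, uint32_t dbw, uint32_t dpsm)
{
    assert(dbp <= 0x3FFF); /* buffer pointers are 14-bit fields */
    return  (uint64_t)(sbp  & 0x3FFF)
         | ((uint64_t)(sbw  & 0x3F)   << 16)
         | ((uint64_t)(spsm & 0x3F)   << 24)
         | ((uint64_t)(dbp  & 0x3FFF) << 32)
         | ((uint64_t)(dbw  & 0x3F)   << 48)
         | ((uint64_t)(dpsm & 0x3F)   << 56);
}
```

Under those assumptions, pack_bitbltbuf(0, 1, 0, 0x118 << 5, 1, 0) comes out as 0x0001230000010000: a destination pointer of 0x2300, and a width of 1 on both sides.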

Note:  I only know that the destination should be 0x118 because I looked at the source code of the harness.  The harness has 4 buffers of 640x224 at the start of GS memory, 2 for frame buffers, one for the Z buffer, and one for the merged in help screen.  That leaves the area of memory from 0x118 onwards completely untouched.  Use it at your own will.  If my calculations are correct, you still have about 1800k of the 4MB of GS RAM there to play with.  That's a nice lot of room to generate textures and CLUTs into.
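The arithmetic behind those numbers can be checked with a few lines of C.  This sketch assumes all four harness buffers are 32 bits per pixel, that a GS page holds 64x32 such pixels (8192 bytes), and that there are 32 blocks to a page; the function names are made up for illustration.

```c
#include <assert.h>

enum {
    GS_TOTAL_BYTES  = 4 * 1024 * 1024, /* 4MB of GS memory */
    GS_PAGE_BYTES   = 8192,            /* one page: 64x32 pixels at 32 bits */
    BLOCKS_PER_PAGE = 32
};

/* Pages needed by one 32-bit buffer, rounded up to whole pages. */
unsigned buffer_pages(unsigned width, unsigned height)
{
    unsigned pages_wide = (width + 63) / 64;
    unsigned pages_high = (height + 31) / 32;
    return pages_wide * pages_high;
}

/* First page not used by the harness's four 640x224 buffers. */
unsigned first_free_page(void)
{
    return 4 * buffer_pages(640, 224);
}

/* Kilobytes of GS memory left over from that page onwards. */
unsigned free_kbytes(void)
{
    assert(first_free_page() * GS_PAGE_BYTES <= GS_TOTAL_BYTES);
    return (GS_TOTAL_BYTES - first_free_page() * GS_PAGE_BYTES) / 1024;
}

/* Convert a page number into the block units used by BITBLTBUF. */
unsigned page_to_block(unsigned page)
{
    return page * BLOCKS_PER_PAGE;
}
```

Each buffer is 10 pages wide and 7 pages high, so four of them end at page 280, which is 0x118.  That leaves 1856k free, and page_to_block(0x118) gives 0x2300, the same value as 0x118 << 5.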

Then, we set the "transmission position".  This register only really has any importance when transferring between areas of the GS, so just set all the parameters to zero.  You probably don't need to set the register at all, but I like to just for safety :)

TRXREG contains the size of the actual data we're sending, which is an 8x8 texture, so we set both the height and width of the transfer to 8.  Finally, we tell the TRXDIR register that the transfer is going into the GS from outside.  Writing to this register has the hidden side effect of actually starting the transfer, so it must be the last register set.

I've deliberately set EOP to 0, so that the following image data is sent to the GS in the same GIFpacket.  In your VCL code, all you need is an xgkick to send the GIFtag off, and you're done.  Of course, you could save a few microseconds by only uploading the texture on your very first frame, and setting a value in memory to say that the textures have been uploaded, then every subsequent frame use a conditional branch to check whether textures have been uploaded, to save duplicating your efforts every frame.  Just a thought.  (Ever noticed how some demos seem to freeze for a little while before starting, and you can't even switch demo for a few seconds?  What do you think they're using those few seconds for?)

Writing the image
The GIFtag which follows is easy to set up, because in IMAGE mode, nearly all of the fields are ignored!  All we need to specify is FLG = 2 (binary 10) to get us into image mode, NLOOP set to the number of QWords we're going to send, and EOP = 1 because we're done with the image upload after this.  No need for the PRE and PRIM fields to be set, nor the NREG and REGS.  An 8x8 texture in PSMCT32 mode takes up 16 QWords: 64 pixels at 4 bytes each is 256 bytes, and a QWord is 16 bytes (see page 80 of the GS manual for more on transmission data format).
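If you change the texture size or drop to a smaller pixel format, NLOOP has to change with it.  Here is a small sketch of the calculation, assuming the pixels are packed tightly into the transfer with no padding; image_qwords is a made-up helper name.

```c
#include <assert.h>

/* Number of 128-bit QWords needed to send a width x height image
 * at the given bits per pixel, rounded up to a whole QWord. */
unsigned image_qwords(unsigned width, unsigned height, unsigned bits_per_pixel)
{
    unsigned long bits = (unsigned long)width * height * bits_per_pixel;
    unsigned long qwords = (bits + 127) / 128;
    assert(qwords <= 0x7FFF); /* NLOOP is only 15 bits wide */
    return (unsigned)qwords;
}
```

An 8x8 image needs 16 QWords at 32 bits, 12 at 24 bits, 8 at 16 bits and 4 at 8 bits.  Anything needing more than 32767 QWords would not fit in a single tag's NLOOP and would have to be split across several tags.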

I'll leave you with a piece of code that generates a very boring image for you to upload, see if you can't do something better.  How about automatically getting your values out of a BMP/JPG/PNG file?

// IMAGE mode GIFtag: NLOOP counts the QWords of image data that follow
gifpacket_addgsdata(&gifpacket, GIF_SET_TAG(NLOOP(16), EOP(1), PRE(0), 0, GIF_IMAGE, NREG(0)));
gifpacket_addgsdata(&gifpacket, 0); //Even though we're not using registers, we still have to finish the GIFtag!
for(i=0, j=0; i<16; i++)
{
    // Four 32-bit texels per QWord: j supplies the red value, 0x80000000 sets alpha to 0x80
    colour.ui32[0] = (j++) | 0x80000000;
    colour.ui32[1] = (j++) | 0x80000000;
    colour.ui32[2] = (j++) | 0x80000000;
    colour.ui32[3] = (j++) | 0x80000000;
    gifpacket_addgspacked(&gifpacket, colour.ul128);
}

Note: Thanks to Sauce of the SPS2 project for spotting a bug in the above which was causing 4 times too much data to be generated.  To clear up a possible area of confusion, each 32-bit word in the above code represents a whole colour, not an individual component, so every QWord carries four texels.  Thus, the code above should generate a red gradient texture for you.  In earlier tutorials, colour.ui32[0] would represent the RED component of a colour; in this part of the code, it represents an entire colour in ABGR format.  The OR with 0x80000000 sets an alpha value of 0x80 on each texel, which could come in handy later.
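If you want texels that are not just shades of red, a helper along these lines may save some head-scratching.  It is only a sketch: pack_abgr is a made-up name, and it assumes the 32-bit layout described above, with red in the lowest byte and alpha in the highest.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 8-bit components into one 32-bit ABGR texel:
 * red in bits 0-7, green in 8-15, blue in 16-23, alpha in 24-31. */
uint32_t pack_abgr(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return  (uint32_t)r
         | ((uint32_t)g << 8)
         | ((uint32_t)b << 16)
         | ((uint32_t)a << 24);
}
```

The loop above is then equivalent to storing pack_abgr(j, 0, 0, 0x80) for each texel; pack_abgr(5, 0, 0, 0x80) gives 0x80000005, the same as 5 | 0x80000000.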

Also, as this is a texture transfer, we need to alert the GS that a new texture has been uploaded by accessing the TEXFLUSH register before sending new primitives in for texture mapping.  For more information, see 10: Texture Mapping.  If we were using a pixel mode of 8-bits or below, we would also need to upload a CLUT, using a very similar register setting sequence.  I'll leave you to figure out those details for yourself, if you want to use such textures.  Some would argue that unless your texture contains more than 256 pixels, it can't possibly need more than 256 colours, so you're just wasting space if you use a colour mode higher than 8-bits.  As usual, it's up to you. (See the note below)

Other transfers
As you may have noticed from the above register setting exercise, it's possible not only to upload images to GS memory, but to download them back again, and also to copy bits of GS memory into other bits of GS memory.  Why would you want to do any of this?  Well, this is where your imagination comes in.  You could, for instance, draw only to the left hand half of the screen, and use an image transfer to copy it across to the right hand side of the screen every frame (ooh, that's a good one, I might use that in my own demo!)
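As a sketch of what that left-to-right copy might look like in register terms, here are hand-written packers for TRXPOS and TRXREG along with the direction values for TRXDIR.  The bit positions (source X in bits 0-10, source Y in 16-26, destination X in 32-42, destination Y in 48-58, scan direction in 59-60 for TRXPOS; width in bits 0-11 and height in 32-43 for TRXREG) and the direction numbering are my understanding of the registers, and all the names are made up; I haven't tried this in a demo.

```c
#include <assert.h>
#include <stdint.h>

/* Values for TRXDIR (assumed numbering). */
enum trx_direction {
    TRX_HOST_TO_LOCAL  = 0, /* upload, as used in this lesson */
    TRX_LOCAL_TO_HOST  = 1, /* download */
    TRX_LOCAL_TO_LOCAL = 2  /* copy within GS memory */
};

/* Pack TRXPOS: top-left corner of the source and destination rectangles. */
uint64_t pack_trxpos(uint32_t ssax, uint32_t ssay,
                     uint32_t dsax, uint32_t dsay, uint32_t dir)
{
    assert(dir <= 3);
    return  (uint64_t)(ssax & 0x7FF)
         | ((uint64_t)(ssay & 0x7FF) << 16)
         | ((uint64_t)(dsax & 0x7FF) << 32)
         | ((uint64_t)(dsay & 0x7FF) << 48)
         | ((uint64_t)(dir  & 0x3)   << 59);
}

/* Pack TRXREG: width and height of the rectangle in pixels. */
uint64_t pack_trxreg(uint32_t width, uint32_t height)
{
    return (uint64_t)(width & 0xFFF) | ((uint64_t)(height & 0xFFF) << 32);
}
```

For a 640x224 frame buffer, the copy would use pack_trxpos(0, 0, 320, 0, 0) and pack_trxreg(320, 224), with BITBLTBUF pointing both source and destination at the frame buffer, and TRX_LOCAL_TO_LOCAL written to TRXDIR last of all.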

Initially, I thought you'd be able to download images from the GS to the VU memory.  It turns out that this is not possible, as amongst other things, you need access to the privileged registers of the GS, which you simply don't have from VU microcode.

A bit of a cheat?
Some people have wondered how certain demos, notably VUniverse, fit all of their data into the available 16KB of VU data memory.  They've accused people (light-heartedly) of cheating, wondered if the rules are different between different competitions, but none of this is true.

The answer, as I now hope you're aware, lies in GS RAM.  Simply put, although you could load your texture into VU data, blast it off to the GS and forget about it, you could instead write a piece of VU code that dynamically generates your texture data and then blasts it off to the GS.  If you can write 1KB of code that produces 1MB of texture data, that's perfectly fine!  There's a whole 4MB of memory to play with in the GS.  Once you've taken out the two frame buffers, the Z buffer and the help screen buffer, we still have just under 2MB free for our own use.  That's how people manage to fit so much into their competition entries.

Am I going to tell you how to dynamically generate textures?  Maybe, one day, when I figure it out myself.  Until then, you're on your own!


A note on memory usage
My comment above, that a texture of 256 pixels or fewer can't possibly need a colour mode capable of more than 256 colours, seemed like a good point at the time, but it generated quite a discussion on the #sps2 channel.  I'll present here the main points of the argument, and let you decide which way round you should go.

Bear in mind that on 8-bit colour, you need a CLUT of 256 colours in at least 16-bit colour.  That means, your CLUT will take up at least 512 bytes of GS ram, at most 1k.  So, a single 16x16 texture in 32-bit mode takes up 1k of memory, where in 8-bit mode it takes up only 256 bytes.  You think you have saved yourself 768 bytes, but if your CLUT is in 32-bit colour you've actually wasted 256 bytes!
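The sums are easy to play with in C.  This sketch counts raw bytes only and ignores any rounding up to GS blocks or pages, which would change the picture for very small textures; the function names are made up for illustration.

```c
#include <assert.h>

/* Raw bytes taken by the texel data alone. */
unsigned texture_bytes(unsigned width, unsigned height, unsigned bits_per_pixel)
{
    return width * height * bits_per_pixel / 8;
}

/* Raw bytes taken by the CLUT: one entry per possible index value.
 * Formats above 8 bits are direct colour and need no CLUT. */
unsigned clut_bytes(unsigned texture_bpp, unsigned clut_bpp)
{
    if (texture_bpp > 8)
        return 0;
    return (1u << texture_bpp) * clut_bpp / 8;
}

/* Texture plus its CLUT, if it has one. */
unsigned total_bytes(unsigned width, unsigned height,
                     unsigned texture_bpp, unsigned clut_bpp)
{
    assert(texture_bpp > 0);
    return texture_bytes(width, height, texture_bpp)
         + clut_bytes(texture_bpp, clut_bpp);
}
```

For a 16x16 texture, total_bytes gives 1024 in 32-bit mode, 1280 in 8-bit mode with a 32-bit CLUT, and 768 in 8-bit mode with a 16-bit CLUT.  Dropping to 4-bit texels with a 32-bit CLUT brings it down to 192.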

On the other hand, I'm talking about memory within the GS there, which isn't that limited.  What is limited is the amount of space in your VU data RAM, just 16k.  If you can come up with a nice routine to generate your CLUT, then you're saving a valuable chunk of VU data memory.  Even better, if you can use the same CLUT for two textures, you're really saving yourself some memory.  If you can drop down to several textures, each of only 4-bit colour, your CLUT cost drops tremendously.

A general rule of thumb seems to be: For larger textures, use lower bit colour, as the CLUT takes up little room compared to the texture.  For smaller textures, don't immediately assume that lower bit modes will save you memory.