CFYC's Guide to PS2Linux Programming

CFYC's Guide to PS2Linux Programming, Part #1
Lionel Lemarié - hikey@playstation2-linux.com
Last update: 2002-08-13

This is Part #1 of a series of guides to using Linux (for Playstation 2) to develop small/simple graphical applications. This series aims to make the users aware of the PS2 architecture and to provide a solid ground of understanding as to how to design an application to take advantage of this hardware.

This document is still being written. It is online for testing purposes only and is not the full version.
You can find the latest version here: http://playstation2-linux.com/download/cfyc/guide_ps2_programming_01.html.

Please do check out the Tips & Tricks page, as I will populate it with a list of what you absolutely have to know, and try to clear up some misconceptions I have witnessed so far.

Should you have any comments, feedback, or insults, please do not hesitate to post them in the forums in the CFYC project or to contact me directly by e-mail.

Chapter #1 - Introduction to the Guide - the general idea

This Guide is aimed at people wanting to get the best out of the kit. A good application has to be designed properly in order to take advantage of the hardware. In the following chapters, we will cover:
- Setting up the devices (GS, VUs, SPR, pads)
- Loading a texture in VRAM
- Building a DMA packet to send a sprite via PATH3, using the texture.
- Using scratchpad for better memory access performance.
- Coding and uploading a VU1 micro-program.
- Building a DMA packet to send geometry via PATH1, using the texture.
- Displaying text on the screen.
- Using VU0 to offload calculations from the main CPU.
- Using Interlock to speed things up (VU0/CPU).
- Using interrupts (if it turns out to be possible at all).

Each part of the Guide includes a step by step chapter, concretely demonstrating how to use what is explained.

I strongly recommend that you make a local copy of the PDF hardware manuals from disc 1 and keep them handy, as I will reference them as much as possible.

The general idea is that the different parts of the PS2 should be kept busy at all times. Designing an application that takes everything into consideration is not an easy task, so you have to study the general architecture of the console before even attempting to do anything.
The console has the following devices: (eeuser_e.pdf, p16)
- GS, which is the graphics chip. It draws 2D primitives (points, lines, triangles, strips and sprites). There are three PATHs to reach it:

- PATH1: Mainly intended for geometry. You send the 3D data of your model to VU1, which transforms it and sends it to the GIF via a dedicated bus. You want to use it as much as possible.
- PATH2: Not interesting for now. Mainly used to grab the VRAM back to main memory.
- PATH3: Mainly intended for texture and pre-transformed geometry uploads (HUD, text, ...).

- VUs, which are two additional processors with custom instruction sets aimed specifically at vector calculations.

- VU0: Ideally used for vertex deformation (e.g. skinning), physics, possibly AI, particle systems...
- VU1: Ideally used for transform and lighting and special effects (e.g. cartoon rendering).

- ScratchPad, which is a small, fast-access memory. You want to use it as much as possible to work around data cache misses, which are a killer on PS2Linux.
- DMAC, which is an intelligent memory transfer manager. It is mandatory that you get familiar with it, as this is the main reason why the PS2 is so fast.

It is important that you make an extra effort to understand the function of each device before you even learn how to use them.

Little design example (1)
Imagine you want to write a little textured rotating cube demo over a textured background.

CPU: The CPU side won't have too much to do in this case, mainly setting up the buffers to be sent.
VU0: VU0 will do the rotation matrix calculations, camera movements, that kind of thing.
VU1: VU1 will do the transform and lighting. Don't even think about doing it on the CPU, it's a waste of time.
ScratchPad: You can store your matrices, model data and temporary variables in there. ScratchPad is generally good for you: it doesn't have any cache issues and Linux doesn't mess with it...
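
As a rough illustration of keeping matrices and temporaries in the ScratchPad, the sketch below carves blocks out of a 16 KB region. The names spr_arena, spr_init and spr_alloc are made up for this example and are not part of any PS2Linux API. It assumes that base is the pointer you get from mmap on the scratchpad device, and it rounds every block up to 16 bytes so that blocks line up on quadwords (an assumption made here for DMA-friendliness, not something the kit enforces for you).

```c
#include <stddef.h>
#include <stdint.h>

#define SPR_SIZE 16384u   /* size of the scratchpad mapping used in this guide */

/* Hypothetical bump allocator; not part of any PS2Linux API. */
typedef struct {
    uint8_t *base;  /* start of the mapped scratchpad */
    size_t   used;  /* bytes handed out so far */
} spr_arena;

void spr_init(spr_arena *a, void *base)
{
    a->base = (uint8_t *)base;
    a->used = 0;
}

/* Returns a block rounded up to 16 bytes, or NULL when the region is full. */
void *spr_alloc(spr_arena *a, size_t bytes)
{
    size_t size;
    void *p;

    if (bytes > SPR_SIZE)
        return NULL;
    size = (bytes + 15u) & ~(size_t)15u;   /* round up to a quadword */
    if (size > SPR_SIZE - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += size;
    return p;
}
```

A bump allocator never frees individual blocks; resetting used to 0 at the start of each frame is usually enough for per-frame temporaries.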

So basically,
- you initialise the console,
- prebuild a GIF packet once and for all to draw the background sprite (we'll see later for optimisations here) in SPRmem,
- prebuild a DMA packet once and for all for your model in SPRmem,
- prebuild a GIF packet once and for all for the textures in main memory,
- prebuild a DMA packet once and for all for the transformation matrices; the data will be updated on each frame,
- then in a loop,

- wait for start of VSYNC,
- DMA send the background sprite in normal mode (eeuser_e.pdf, p44), don't wait for completion,
- update the rotation matrix with VU0,
- wait for completion of DMA transfer of the bg sprite,
- DMA send the transformation matrices to VIF1,
- DMA send the model data to VIF1 in source chain mode (eeuser_e.pdf, p45),
- DMA send the textures to the GIF for the next frame.
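
To make "prebuild a GIF packet" a little more concrete, here is a minimal sketch of packing the lower 64 bits of a GIFtag, the header that starts every GIF packet. The field positions used (NLOOP in bits 0-14, EOP in bit 15, PRE in bit 46, PRIM in bits 47-57, FLG in bits 58-59, NREG in bits 60-63) are taken from the GIFtag description in the EE User's Manual; please check them against your copy of eeuser_e.pdf. The function name gif_tag_lo and the constant names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values follow the GIFtag layout in the EE User's Manual. */
#define GIF_FLG_PACKED 0u    /* data format: PACKED mode */
#define GIF_REG_AD     0xEu  /* register descriptor: A+D (address + data) */
#define GS_PRIM_SPRITE 6u    /* primitive type: sprite */

/* Packs the lower 64 bits of a GIFtag. The upper 64 bits (REGS) hold up to
   sixteen 4-bit register descriptors, first descriptor in the lowest bits. */
uint64_t gif_tag_lo(uint32_t nloop, int eop, int pre,
                    uint32_t prim, uint32_t flg, uint32_t nreg)
{
    assert(nloop <= 0x7FFFu);  /* NLOOP is 15 bits wide */
    assert(prim  <= 0x7FFu);   /* PRIM is 11 bits wide */
    assert(flg   <= 3u);       /* FLG is 2 bits wide */
    assert(nreg  <= 15u);      /* NREG is 4 bits wide; 0 means 16 registers */

    return  (uint64_t)nloop
         | ((uint64_t)(eop != 0) << 15)
         | ((uint64_t)(pre != 0) << 46)
         | ((uint64_t)prim << 47)
         | ((uint64_t)flg  << 58)
         | ((uint64_t)nreg << 60);
}
```

For instance, gif_tag_lo(4, 1, 0, 0, GIF_FLG_PACKED, 1) with REGS set to GIF_REG_AD would describe a packet of four A+D register writes that also ends the transfer.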

Little design example (2)
Now, for the hell of it, you want to make a car game, GTA style, 2D and everything.
I am not saying that this is the greatest design ever; it just shows the way you should think about those things.

CPU: First of all, it will obviously deal with the input from the player. Then it runs the visibility test (e.g. quadtree) and stitches a bunch of prebuilt DMA packets together, sorted by texture usage. A bit of AI as well, to make it sound cool.
VU0: All the DMA packets for the (visible) cars are sent to VU0 to update position, direction, speed, all that. Collision detection could be done here, but that means you first have to work out roughly which cars might collide on the CPU, because chances are they will not all fit in memory at once.
VU1: VU1 will do procedural terrain generation while the CPU and VU0 are busy doing physics (that's a nice name for a textured grid, isn't it?). Then it will generate triangle strips from the position and direction of each car, send a few particles, generate pedestrians, and other eye candy.
ScratchPad: You can use the SPR to exchange data efficiently between the CPU and VU0 at a high rate, without having to deal with cache issues.
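
Sorting the prebuilt packets by texture means each texture only has to be uploaded once per frame. Here is a minimal sketch of that CPU-side step, using qsort from the C library. The car_packet structure and its fields are hypothetical stand-ins for whatever bookkeeping your own code keeps per packet.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-packet bookkeeping kept on the CPU side. */
typedef struct {
    uint32_t texture_id;  /* which texture the packet's geometry uses */
    uint32_t addr;        /* where the prebuilt packet lives */
    uint32_t qwc;         /* packet size in quadwords */
} car_packet;

/* qsort comparison: ascending texture_id. */
int by_texture(const void *a, const void *b)
{
    const car_packet *x = (const car_packet *)a;
    const car_packet *y = (const car_packet *)b;
    return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
}

void sort_by_texture(car_packet *p, size_t n)
{
    qsort(p, n, sizeof *p, by_texture);
}

/* Counts how many texture uploads are needed if packets are sent in order
   and a texture is uploaded whenever it differs from the previous one. */
size_t count_texture_uploads(const car_packet *p, size_t n)
{
    size_t i, uploads;

    if (n == 0)
        return 0;
    uploads = 1;
    for (i = 1; i < n; i++)
        if (p[i].texture_id != p[i - 1].texture_id)
            uploads++;
    return uploads;
}
```

With four cars using textures 2, 1, 2, 1 in that order, sending them unsorted needs four uploads; after sort_by_texture it needs two.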

So again, basically,
- initialise the lot,
- load all the textures in main memory, in DMA packets ready to send,
- prebuild a DMA packet that will hold the data for the cars, making structures with pointers to the data inside the packet,
- prebuild some separate packets that VU1 will use to generate the road grid, they're separate from each other so you can do visibility tests on them,
- then in a loop,

- read input, update stuff accordingly,
- test visibility for the terrain and cars,
- send visible textures (see note),
- send the bits of roads to VU1, do not wait for completion,
- work with CPU and VU0 coupled (possibly interlock) to work out new positions for the cars, collisions and everything,
- wait for terrain rendering to finish,
- send the cars, grouped by texture, VU1 will create a quad (tri-strip) for each car,
- prepare some eye-candy on CPU,
- wait for the cars to be rendered, send the eye-candy.
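
Stitching prebuilt packets together maps naturally onto source chain mode: the CPU writes a short list of DMA tags, each pointing at one visible packet, and the DMAC walks the list. The sketch below only packs the tag bits. It assumes the field layout from the DMAtag description in eeuser_e.pdf (QWC in bits 0-15, ID in bits 28-30, ADDR in bits 32-62), and it assumes that addr is already an address the DMAC can use, which is a separate problem under Linux. All names (dma_qword, packet_ref, dma_tag, build_ref_chain) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag IDs used in source chain mode (subset). */
enum { DMA_ID_REF = 3, DMA_ID_END = 7 };

typedef struct { uint64_t lo, hi; } dma_qword;               /* one 128-bit quadword */
typedef struct { uint32_t addr; uint16_t qwc; } packet_ref;  /* hypothetical */

/* Packs the lower 64 bits of a DMA tag. */
uint64_t dma_tag(uint32_t id, uint32_t addr, uint32_t qwc)
{
    assert(id <= 7u && qwc <= 0xFFFFu && addr <= 0x7FFFFFFFu);
    return (uint64_t)qwc | ((uint64_t)id << 28) | ((uint64_t)addr << 32);
}

/* Writes one "ref" tag per packet, then an "end" tag with no payload.
   chain must have room for n + 1 quadwords (and be 16-byte aligned on the
   real hardware). Returns the number of quadwords written. */
size_t build_ref_chain(dma_qword *chain, const packet_ref *pk, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        chain[i].lo = dma_tag(DMA_ID_REF, pk[i].addr, pk[i].qwc);
        chain[i].hi = 0;   /* upper half is not used in this sketch */
    }
    chain[n].lo = dma_tag(DMA_ID_END, 0, 0);
    chain[n].hi = 0;
    return n + 1;
}
```

Because only the tag list is rebuilt each frame, the packets themselves can stay where they were prebuilt.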

Note: There is room for improvement here (and everywhere, but particularly here). It is actually possible to send the textures via PATH3 and send some geometry via PATH1 at the same time. The main problem is synchronising the two, and it will not be addressed here.

Chapter #2 - Setting the devices

Before you can do anything, you have to open the different devices to get access to the processors, scratchpad, controllers...
Coming pretty soon.

Step by step, Part #1

Setting up the devices

Source taken from harness.c in the base code (not available yet).

Open the pad devices to be able to read the data with a read command later.

pad_fd[0] = open( PS2_DEV_PAD0, O_RDWR );
pad_fd[1] = open( PS2_DEV_PAD1, O_RDWR );

if (read(pad_fd[id],pad,32)!=8) pad->button = 0xFFFF;
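
The open calls above do not check for failure. Here is a minimal sketch of a checked open, assuming nothing beyond POSIX. open_device is a made-up helper name, and the path is whatever the PS2_DEV_* macro expands to on your kit.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: opens a device read/write and reports failure on
   stderr. Returns the file descriptor, or -1 if the open failed. */
int open_device(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        perror(path);
    return fd;
}
```

You would then write pad_fd[0] = open_device(PS2_DEV_PAD0); and bail out early if the result is negative, rather than finding out later when read fails.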

Open the VU0 device first; it needs to be opened before VU1. Then mmap both so we can use them as normal memory, through the pointers vu0mem and vu1mem. Don't hesitate to run man mmap to get more information.

vu0_fd = open( PS2_DEV_VPU0, O_RDWR );
vu1_fd = open( PS2_DEV_VPU1, O_RDWR );

vu0mem = mmap( 0, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, vu0_fd, 0 );
vu1mem = mmap( 0, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, vu1_fd, 0 );

Open the scratchpad device and mmap it.

spr_fd = open( PS2_DEV_SPR, O_RDWR );

sprmem = mmap( (void*)0x70000000, 16384, PROT_READ|PROT_WRITE, MAP_SHARED, spr_fd, 0 );
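
mmap reports failure by returning MAP_FAILED rather than NULL, and without MAP_FIXED the address argument is only a hint, so it is worth checking both. Here is a sketch under those assumptions. map_device is a hypothetical helper, not part of the base code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: opens path and maps length bytes of it read/write.
   hint may be NULL. Returns the mapping, or NULL on failure. On success the
   descriptor is stored in *fd_out so that it can be closed later. */
void *map_device(const char *path, void *hint, size_t length, int *fd_out)
{
    void *mem;
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        perror(path);
        return NULL;
    }
    mem = mmap(hint, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return NULL;
    }
    if (hint != NULL && mem != hint)   /* the kernel may ignore the hint */
        fprintf(stderr, "%s: mapped at %p instead of %p\n", path, mem, hint);
    *fd_out = fd;
    return mem;
}
```

With this helper the scratchpad setup would read sprmem = map_device(PS2_DEV_SPR, (void*)0x70000000, 16384, &spr_fd); followed by a check for NULL.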

Open the GS device. It is then used with ioctl to set the different registers.

gs_fd = open( PS2_DEV_GS, O_RDWR );

err=ioctl( gs_fd, PS2IOC_GSCREENINFO, &old_screeninfo);

Open the event device, so we can wait for vsync.

event_fd = open( PS2_DEV_EVENT, O_RDWR );
