Performance

There are pretty much two ps2gl performance bottlenecks over which the application has control: dma transfer and vu1 rendering.  ps2gl uses a number of different vu1 renderers to do transform and lighting, choosing the fastest one that fits the current rendering requirements.  Take a look at the 'performance' example to get an idea of how different parameters influence the choice of vu1 renderer and the impact on speed.

Tips

What YOU can do to make things go faster!
    use display lists -- display lists have been optimized at the expense of immediate mode. The main problem with them now is inefficient use of memory when they are used to cache glBegin/glEnd draw commands, which brings us to...

    use DrawArrays -- memory is almost allocated efficiently (at least it's loosely related to the size of the input data...) and there's no copying

    when rendering a model, group each of [vertices, normals, tex coords, colors] contiguously in memory

    For example:

    < all vertices >
    < all normals >
    < all tex coords >

    NOT:

    < vertex0, normal0, texCoord0 >
    < vertex1, normal1, texCoord1 >
    ...
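
    As a rough illustration of the grouped layout, here is a minimal C sketch. The GroupedModel type and its helper functions are hypothetical names made up for this example and are not part of ps2gl. It also assumes 4 floats per vertex and normal and 2 per tex coord, and places all three groups in one allocation, which is a convenience rather than a requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type (not part of ps2gl): each attribute is stored
 * as its own contiguous group instead of being interleaved per vertex. */
typedef struct {
    float *base;       /* start of the single allocation */
    float *vertices;   /* num_verts * 4 floats: < all vertices > */
    float *normals;    /* num_verts * 4 floats: < all normals > */
    float *tex_coords; /* num_verts * 2 floats: < all tex coords > */
    size_t num_verts;
} GroupedModel;

/* Lays out the three groups back to back. Returns 0 on success, -1 if the
 * allocation fails. */
int grouped_model_init(GroupedModel *m, size_t num_verts)
{
    assert(m != NULL);
    size_t total_floats = num_verts * (4 + 4 + 2);

    m->base = (float *)calloc(total_floats, sizeof(float));
    if (m->base == NULL)
        return -1;

    m->vertices = m->base;
    m->normals = m->vertices + num_verts * 4;
    m->tex_coords = m->normals + num_verts * 4;
    m->num_verts = num_verts;
    return 0;
}

void grouped_model_free(GroupedModel *m)
{
    assert(m != NULL);
    free(m->base);
    m->base = m->vertices = m->normals = m->tex_coords = NULL;
    m->num_verts = 0;
}
```

    Each pointer would then be handed to the matching gl*Pointer call with a stride of 0, since the elements within a group are tightly packed.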

    for geometry that changes frequently, we have a problem. The DrawArrays call and the creation of display lists both take a fair amount of time, so we don't want to do either every frame. Furthermore, if only the values of vertices and normals are changing (and not the topology), as with a skinned model, we shouldn't need to rebuild the display list, since the data is passed by reference. It would be nice if we could just create one display list that contains calls to DrawArrays pointing at our data, and then change the data behind the display list's back. But according to the documentation, glDrawArrays only mostly references the array data, i.e., some data does get copied.

    Fear not, for all hope is not lost. The only time the display list will copy any data is when it needs to transfer elements that start on a non-qword-aligned boundary. That means that if all your vertices, normals, tex coords, and colors are either 2 or 4 floats, everything should be aligned correctly and nothing will be copied. (It's useful to note at this point that the "w" field of all vertices is implicitly forced to 1.0f, so it doesn't matter what is actually written to that field in memory.) The only hitch in this plan is that glNormalPointer implicitly sets the length of normals to 3 elements. For this reason ps2gl has a new call, 'pglNormalPointer', that allows you to specify the length of the normals, as in glVertexPointer.
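
    If your normals are currently stored as 3 packed floats, they need to be widened to 4 before this works. The following C sketch shows one way to do that, along with a simple alignment check. The function names are hypothetical, and the pglNormalPointer call mentioned in the comment assumes its arguments mirror glVertexPointer, as the text above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QWORD_BYTES 16 /* a PS2 quadword is 128 bits */

/* Returns nonzero if p sits on a qword boundary. */
int is_qword_aligned(const void *p)
{
    return ((uintptr_t)p % QWORD_BYTES) == 0;
}

/* Hypothetical helper: copies `count` packed xyz normals into a buffer of
 * 4-float normals, writing 0.0f into the extra fourth slot. `padded` must
 * have room for count * 4 floats. */
void pad_normals(const float *packed, float *padded, size_t count)
{
    assert(packed != NULL && padded != NULL);
    for (size_t i = 0; i < count; ++i) {
        padded[4 * i + 0] = packed[3 * i + 0];
        padded[4 * i + 1] = packed[3 * i + 1];
        padded[4 * i + 2] = packed[3 * i + 2];
        padded[4 * i + 3] = 0.0f; /* padding only */
    }
}

/* The padded array would then be described with something like
 *     pglNormalPointer(4, GL_FLOAT, 0, padded);
 * (argument order assumed to match glVertexPointer). */
```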

    So to render geometry that's changing frequently, here's the plan: 

    1. Allocate memory for the data starting on a qword boundary (malloc/new). 
    2. Store vertices as (xyz?), tex coords as (uv), and normals as (xyz?), where '?' is a fourth float that pads each element to 4 floats.
    3. Create a display list and render with glDrawArrays. 
    4. Now the data can be modified and glCallList will still render it correctly. 
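
    The plan above can be sketched in C roughly as follows. DynamicMesh and its functions are hypothetical names for this example. The sketch uses C11 aligned_alloc to make the qword alignment explicit, which is a portability choice here rather than something ps2gl requires. The GL calls appear only as comments so the data-handling part stands alone; the pglNormalPointer argument order and the GL_TRIANGLES primitive are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define QWORD_BYTES 16 /* a PS2 quadword is 128 bits */

typedef struct {
    float *vertices;   /* count * 4 floats (xyz?) */
    float *normals;    /* count * 4 floats (xyz?) */
    float *tex_coords; /* count * 2 floats (uv)   */
    size_t count;
} DynamicMesh;

/* Step 1: allocate zeroed floats starting on a qword boundary. */
float *alloc_qword_floats(size_t num_floats)
{
    size_t bytes = num_floats * sizeof(float);
    /* aligned_alloc expects the size to be a multiple of the alignment */
    bytes = (bytes + QWORD_BYTES - 1) / QWORD_BYTES * QWORD_BYTES;
    if (bytes == 0)
        bytes = QWORD_BYTES;
    float *p = (float *)aligned_alloc(QWORD_BYTES, bytes);
    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}

void dynamic_mesh_free(DynamicMesh *mesh)
{
    free(mesh->vertices);
    free(mesh->normals);
    free(mesh->tex_coords);
    mesh->vertices = mesh->normals = mesh->tex_coords = NULL;
    mesh->count = 0;
}

/* Steps 1 and 2: set up the three arrays. Returns 0 on success, -1 on failure. */
int dynamic_mesh_init(DynamicMesh *mesh, size_t count)
{
    assert(mesh != NULL);
    mesh->count = count;
    mesh->vertices = alloc_qword_floats(count * 4);
    mesh->normals = alloc_qword_floats(count * 4);
    mesh->tex_coords = alloc_qword_floats(count * 2);
    if (!mesh->vertices || !mesh->normals || !mesh->tex_coords) {
        dynamic_mesh_free(mesh);
        return -1;
    }
    return 0;
}

/* Step 3, done once (client-state enables omitted, signatures assumed):
 *     glVertexPointer(4, GL_FLOAT, 0, mesh->vertices);
 *     pglNormalPointer(4, GL_FLOAT, 0, mesh->normals);
 *     glTexCoordPointer(2, GL_FLOAT, 0, mesh->tex_coords);
 *     glNewList(list, GL_COMPILE);
 *     glDrawArrays(GL_TRIANGLES, 0, count);
 *     glEndList();
 */

/* Step 4: modify the data in place each frame, then glCallList(list). */
void dynamic_mesh_translate(DynamicMesh *mesh, float dx, float dy, float dz)
{
    for (size_t i = 0; i < mesh->count; ++i) {
        mesh->vertices[4 * i + 0] += dx;
        mesh->vertices[4 * i + 1] += dy;
        mesh->vertices[4 * i + 2] += dz;
        /* the fourth float is left alone; w is forced to 1.0f anyway */
    }
}
```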


    writing custom renderers is, of course, the best way to optimize your app.  Everything from the dma chains that are created to the microcode used can be overridden by the application.  Some ideas:

    • write a dummy renderer that builds dma chains that can be saved to a file, then another renderer that just calls those chains