
I spent a little time with Flash 10’s 3d features recently. Since Flash 10.1 is imminent and FP10 has been at 90%+ penetration for a while now, it’s probably safe to start looking at using FP10 stuff in my projects.
I also used this as an opportunity to try out git. It was easy to get git installed on OSX (I used the command line version, installed from git-osx-installer) and put my code up on Github. You can browse my test code at http://github.com/bengarney/garney-experiments/tree/master/exploringFlash3D/.
My main concern was the transformation pipeline – I think there might be some benefits to using 3d positions for the rendering pipeline in PBE. So I wanted to do a brief survey of the built-in 3d capabilities, then look more closely at the transformation routines.
My first test was making a DisplayObject rotate in 3d (minimal source code). It runs well, and if you turn on redraw region display, you can see that it’s properly calculating the parts of the screen that need to be modified.
This was easy to write, but it revealed the primary flaws with the built-in Flash 3d capabilities. First, look closely – the edges of the shape were sharp, but the interior detail is aliased. This is because whenever the 3D rendering path is engaged, it turns on cacheAsBitmap. This is fine for simple scenarios (say, taking a single element in a UI and giving it a nice transition) but not for more complex situations.
Which brings us to the second and bigger flaw. I added a thousand simple particle sprites at different 3d positions (source code). This runs extremely slowly because of an issue described by Keith Peters involving nested 3d transforms. Nested 3d objects cause excessive bitmap caching, dramatically reducing performance. You might end up with bitmap-caching in action on every 3d object and every DisplayObject containing 3d objects.
In addition, because it’s cached, moving objects further/closer from the camera results in an upsampled/downsampled image. So you tend to get mediocre visual results if your objects move much.
My next step was to stop using the 3d capabilities of DisplayObjects, and just position them in x/y based on their 3D position (source code, notice it is two files now). This gave a massive performance gain. At low quality, 1000 particles runs at 1440×700 at acceptable FPS. Most of the overhead is in the Flash renderer, not the code to update DisplayObject positions, but it still takes a while to do all the transformation, and it causes a lot of pressure on the garbage collector from all the 1000 temporary Vector3D instances that are created every frame. (600kb/second or so – not insignificant.)
Next I figured it would be helpful to make my camera move around (sample code).
This required that I understand the coordinate space all this operated in. What are the coordinate spaces? According to the official docs, screen XY maps to world XY. So forward is Z+, up is Y-, right is X+. Once I figured that out, I had to prepare a worldMatrix with the transform of the camera, then append the projectionMatrix. The PerspectiveProjection class always seems to assume screen coordinate (0,0) is the center of the projection so you will have to manually offset. Maybe I was not using the projection right, since the docs imply otherwise.
There were two other details to sort out. First, I had to reject objects behind the camera, and second, I had to scale objects correctly so they appeared to have perspective. The solution revolved around the same information – the pre-projection Z. By hiding all objects with Z < 0 and scaling by focalLength / preZ, I was able to get it to behave properly.
Next up is Matrix3D.transformVector… which is slow. Transforming 1000 vectors eats 3ms in release build! This is really slow in absolute terms (Ralph Hauwert has a good example of the same functionality running much much faster). I didn’t really want to introduce Alchemy for this project. But we can use AS3 code that avoids the allocations, saving us GC overhead and getting us an incremental improvement in performance.
Andre Michelle has some interesting thoughts on the problem of temporary objects related to transformations (see http://blog.andre-michelle.com/2008/too-late-very-simple-but-striking-feature-request-for-flash10-3d/). I did notice that Utils3D.projectVectors had some options for avoiding allocations, but it did not seem to run significantly faster (even in release build). (sample code for using projectVectors)
In the end, I settled on my own implementation of transformVectors, as it seemed to give the best balance between performance and ease of us. There’s a final version of the sample app where you can toggle between transformVector and the AS3 version by commenting out line 105/106 up on github, so you can test it for yourself. The transform function took some effort to get right, so here it is to save you the pain of implementing it yourself. It transform i by m and stores it in o.
final public function transformVec(m:Matrix3D, i:Vector3D, o:Vector3D):void
{
const x:Number = i.x, y:Number = i.y, z:Number = i.z;
const d:Vector. = m.rawData;
o.x = x * d[0] + y * d[4] + z * d[8] + d[12];
o.y = x * d[1] + y * d[5] + z * d[9] + d[13];
o.z = x * d[2] + y * d[6] + z * d[10] + d[14];
o.w = x * d[3] + y * d[7] + z * d[11] + d[15];
}
Time for some conclusions. I think that the 3D capabilities built into DisplayObject are OK, but very focused on light-weight graphic design use. Building a significant 3D application requires you write your own rendering code built on Flash’s 2D capabilities (either DisplayObjects or drawTriangles and friends). The 3d math classes are ok, but immature. Some things are very handy (like the prepend/append versions of all the methods on Matrix3D), but the tendency for Flash APIs to implicitly allocate temporary objects limits the usefulness of some the most central API calls. In addition, important assumptions like the order of the values in Matrix3D.rawData were not documented, leading to frustrating trial and error. I am excited to see Flash’s 3d capabilities mature.

The main things to watch out for is rendering – you must use hardware acceleration, which acts a lot like cacheAsBitmap – and memory allocation – the iPhone has very slow memory, so the GC running is the kiss of death. I could list a bunch of specifics but at this stage in the game, they would be out of date by the next SDK update. The best thing to do in this and all other optimization cases is to measure, find the biggest bottleneck, change it, and measure again to see if you got a win. Repeat until you have adequate performance.
For the rendering performance, the build of Trading Stuff that is on the App Store is not using hardware acceleration at all. And the ARM11 is a crappy chip for software rendering. I was showing a build at Adobe MAX that used hardware rendering, and it runs smoothly, like a real game should. I am waiting for a few bugs in Adobe’s SDK to be fixed, at which point I will release a new version on the store. The performance difference is night and day. I’ll post about that when I get the update out.
Is AS3 any good at all? JavaScript 
If you’re in the Flash space, you’ve probably seen a lot of coverage of 
Our latest game, 