Enlisted 04/09/2013 to Archive [actionscript, inline, inlining]

AS3: Inline methods

Theory

You might know the concept of inlining from other programming languages such as C++, and the idea is quite simple. If you are familiar with assembly languages and how the CPU processes them, you know that instructions are executed sequentially. When a jump occurs (e.g. a method call), the hardware must jump to the proper address in the code and then jump back once the called routine finishes. This mechanism costs extra time, so why not simply replace the jumps with the code itself? That is exactly how inlining works.

You could argue that this approach increases code size and memory consumption, because every call site gets its own copy of the method body, and you would be right. It is simply a question of priority: do you want better performance or lower memory consumption?

Another issue with inlined code can occur while debugging. How well it works depends heavily on how the debugger maps the inlined bytecode back to the regular code shown in your debug interface. The situation is very similar to debugging optimized bytecode, so I recommend not using inlining while debugging. I would also warn against using inlined code when compiling for iOS with the AIR environment: several times, the generated and optimized LLVM bytecode could not be debugged at all, and inlining certain methods sometimes even made the build fail with an unknown error.

In AS3, inlining is a simple mechanism that lets the compiler avoid method calls by replacing each call with the method body. This can significantly improve code performance, because method calls are quite expensive, as mentioned above, and even more so in the AVM.
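To make the idea concrete at the source level, here is a minimal sketch of what the compiler does for you, written by hand. The class InlineSketch and its method addSquare() are hypothetical names used only for illustration. The compiler actually works on bytecode, not on source, so its real output differs in detail; sumInlinedByHand() only shows the principle of pasting the method body at the call site.

```actionscript
package
{
	// Hypothetical class used only to illustrate inlining done by hand.
	public class InlineSketch
	{
		private var m_sum:Number = 0;

		// Small helper: every call goes through a real method call in the bytecode.
		private function addSquare(v:Number):void
		{
			m_sum += v * v;
		}

		// Calls the helper once per element.
		public function sumWithCalls(values:Vector.<Number>):Number
		{
			m_sum = 0;
			for (var i:int = 0; i < values.length; i++)
			{
				addSquare(values[i]);
			}
			return m_sum;
		}

		// Same loop with the helper's body pasted at the call site,
		// which is the idea behind an inlined method.
		public function sumInlinedByHand(values:Vector.<Number>):Number
		{
			m_sum = 0;
			for (var i:int = 0; i < values.length; i++)
			{
				var v:Number = values[i];
				m_sum += v * v;
			}
			return m_sum;
		}
	}
}
```

Both methods return the same result; the second one simply avoids one method call per element.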

Inlining in ASC 2.0

Version 2.0 of ASC (ActionScript Compiler) introduced inlining. According to its documentation, a method can be inlined when it:

  • is final or static, or its containing scope is a file or package
  • does not contain any activations
  • does not contain any try or with statements
  • does not contain any function closures
  • has a body with fewer than 50 expressions

You also need to add the -inline compiler argument, and I recommend marking inlined methods with the [Inline] metadata to make them obvious to both the compiler and the programmer. When the -inline argument is used, methods that meet the rules above are inlined regardless of whether they carry this metadata.
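The following hypothetical class sketches how the rules above sort methods into candidates and non-candidates, assuming it is compiled with ASC 2.0 and the -inline argument. The class and method names are made up for illustration, and whether a particular method really gets inlined is best confirmed by inspecting the generated bytecode.

```actionscript
package
{
	// Hypothetical class sorting methods by the ASC 2.0 inlining rules listed above.
	public class InlineRules
	{
		private var m_count:int = 0;

		// Candidate: final, tiny body, no try/with, no closures.
		[Inline]
		public final function increment():void
		{
			m_count++;
		}

		// Candidate: static methods qualify as well.
		[Inline]
		public static function square(v:Number):Number
		{
			return v * v;
		}

		// Not a candidate: neither final nor static, so a subclass may override it.
		public function decrement():void
		{
			m_count--;
		}

		// Not a candidate: contains a try statement.
		public final function safeParse(s:String):Object
		{
			try
			{
				return JSON.parse(s);
			}
			catch (e:SyntaxError)
			{
				// Malformed input: fall through and return null
			}
			return null;
		}

		// Not a candidate: creates a function closure that uses m_count.
		public final function makeCounter():Function
		{
			return function():int { return ++m_count; };
		}
	}
}
```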

Bytecode

Now let's look at a simple implementation of two methods that increment a private field of the class. The first one is not inlined, and the second one is inlined according to the rules mentioned above:

private var m_x:uint = 0;

public function _inc_n():void
{
	m_x++;
}

[Inline]
public final function _inc_i():void
{
	m_x++;
}

and their bytecode:

function _inc_n():void
{
   0   getlocal0
   1   pushscope
   2   getlex        	private::m_x
   4   convert_d
   5   increment
   6   findpropstrict	private::m_x
   8   swap
   9   setproperty   	private::m_x
   11  returnvoid
}
function _inc_i():void
{
   0    getlocal0
   1    pushscope
   2    getlex        	private::m_x
   4    convert_d
   5    increment
   6    findpropstrict	private::m_x
   8    swap
   9    setproperty   	private::m_x
   11   returnvoid
}

As we can see above, their bytecode is identical. It simply fetches the variable, increments it and sets the value back. That's it. You are not surprised there is no difference, are you? :) As you already know, the difference lies in the call.

NOTE: You might notice that the variable is converted to a double before it is incremented. This is because the increment instruction works with a Number operand, and the convert_d instruction converts the value to a Number. You will often see such "instruction pairs" in AVM bytecode.

Now let's put these methods into context. We create a script that calls our methods, starting with the non-inlined one:

public function Main()
{
	_inc_n();
}

In the bytecode representation of the script, we can see the callpropvoid instruction, which performs the method call.

function Main():*
{
   0    getlocal0
   1    pushscope
   2    getlocal0
   3    constructsuper	(0)
   5    findpropstrict	_inc_n
   7    callpropvoid  	_inc_n    (0)
   10   returnvoid
}

Now the inlined method call:

public function Main()
{
	_inc_i();
}

In the bytecode below, I marked the inlined block with comments. We have seen this logic before, in the _inc_i() method: the method call has clearly been replaced with the method body. The code is not identical, because the field is fetched differently (getlocal0 and getproperty instead of getlex and findpropstrict) and no additional scope is pushed.

function Main():*
{
   0    getlocal0
   1    pushscope
   2    getlocal0
   3    constructsuper	(0)
   // Inlined method code placed here
   5    getlocal0
   6    getproperty   	private::m_x
   8    convert_d
   9    increment
   10   getlocal0
   11   swap
   12   setproperty   	private::m_x
   // end of the inline block
   14   returnvoid
}

Performance

Let's measure the performance gain of inlining. I wrote a small test that calls the methods in a loop and measures the time needed to execute the calls. After each round it increases the number of calls, so we can see how the duration evolves. I also ran the test with debug mode on and let the slow debug instructions help produce measurable results even for small loop counts.

In the test, I double the number of calls in each round with a left shift (shifting by MULTIPLIER bits multiplies the count by 2^MULTIPLIER), just for slightly faster execution than multiplication. Time is measured with the getTimer() method, and the end time is fetched directly after the inner for loop, so the _reportStatus() call does not distort the result.
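The loop-count schedule can be reproduced with a small hypothetical helper (LoopSchedule.rounds() is not part of the test code). It starts at a given count and shifts left by a given amount until the maximum is reached, stopping early if a shift would overflow the uint range or fail to grow the value.

```actionscript
package
{
	// Hypothetical helper reproducing the test's loop-count schedule.
	public class LoopSchedule
	{
		// Returns every loop count the test would run: start, start << multiplier, ...
		public static function rounds(start:uint, max:uint, multiplier:uint):Vector.<uint>
		{
			var result:Vector.<uint> = new Vector.<uint>();
			var loops:uint = start;
			while (loops < max)
			{
				result.push(loops);
				var next:uint = loops << multiplier;
				// Guards against overflow, a zero start value or a zero shift
				if (next <= loops)
				{
					break;
				}
				loops = next;
			}
			return result;
		}
	}
}
```

With the constants used in the test (STARTING_LOOPS = 10000, MAX_LOOPS = 2 << 15 = 65536, MULTIPLIER = 1), rounds(10000, 2 << 15, 1) yields 10000, 20000 and 40000, because the next value, 80000, already exceeds MAX_LOOPS.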

package
{
	import flash.display.Sprite;
	import flash.utils.getTimer;

	public class Main extends Sprite
	{
		/** Shift amount: each round multiplies the loop count by 2^MULTIPLIER */
		private static const MULTIPLIER:uint = 1;

		/** Too low value gives immeasurable results */
		private static const STARTING_LOOPS:uint = 10000;

		/** Too high value can cause delay and distortion of results */
		private static const MAX_LOOPS:uint = 2<<15; // 65536

		private var m_loops:uint = STARTING_LOOPS;
		private var m_x:uint = 0;

		public function Main()
		{
			_test();
		}


		private function _test():void
		{
			var startTime:uint;
			var call:uint;

			m_loops = STARTING_LOOPS;

			while (m_loops < MAX_LOOPS)
			{
				startTime = getTimer();

				for (call = 0; call < m_loops; call++)
				{
					// Method under test: replace with _inc_i() to measure the inlined variant
					_inc_n();
				}

				// getTimer() is evaluated before _reportStatus() is entered,
				// so the reporting call is not part of the measured time
				_reportStatus(startTime, getTimer());

				m_loops <<= MULTIPLIER;
			}
		}


		private function _reportStatus(startTime:uint, endTime:uint):void
		{
			// Output format: elapsed milliseconds | number of calls
			trace(endTime - startTime + " | " + m_loops);
		}


		//region TEST METHODS

		public function _inc_n():void
		{
			m_x++;
		}


		[Inline]
		public final function _inc_i():void
		{
			m_x++;
		}

		//endregion
	}
}

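Because the loop count doubles every round, the duration roughly doubles as well, so it grows exponentially over the rounds (and linearly with the number of calls). What we are looking for is the ratio between the non-inlined and the inlined duration for the same number of calls. Based on the test, the statistics of this ratio with the debug configuration were: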

  • median = 2.886254154
  • mean = 3.032160526

and for the non-debug (fastest) version:

  • median = 9.413533835
  • mean = 9.057931766

With debug mode on, inlined calls ran roughly 2.9 times as fast as non-inlined calls (median ratio). With debug mode off, the ratio rose to about 9.4. The ratio may be even higher on more powerful machines. Either way, the test shows that a method call really is an expensive operation.
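The ratio statistics above can be computed with a helper like the following. It is a hypothetical sketch, not the code used to produce the numbers above: it assumes the per-round durations in milliseconds from both runs (non-inlined and inlined) are collected into two vectors, divides them round by round, and returns the median and mean of the resulting ratios.

```actionscript
package
{
	// Hypothetical helper for the ratio statistics; not part of the original test.
	public class RatioStats
	{
		// Divides the non-inlined by the inlined duration for each round (both in ms).
		public static function ratios(nonInlined:Vector.<uint>, inlined:Vector.<uint>):Vector.<Number>
		{
			var result:Vector.<Number> = new Vector.<Number>();
			var n:int = int(Math.min(nonInlined.length, inlined.length));
			for (var i:int = 0; i < n; i++)
			{
				// Skip rounds measured as 0 ms (getTimer() has millisecond resolution)
				if (inlined[i] > 0)
				{
					result.push(nonInlined[i] / inlined[i]);
				}
			}
			return result;
		}

		public static function mean(values:Vector.<Number>):Number
		{
			if (values.length == 0) return NaN;
			var sum:Number = 0;
			for each (var v:Number in values)
			{
				sum += v;
			}
			return sum / values.length;
		}

		public static function median(values:Vector.<Number>):Number
		{
			if (values.length == 0) return NaN;
			// Sort a copy so the caller's vector stays untouched
			var sorted:Vector.<Number> = values.concat();
			sorted.sort(function(a:Number, b:Number):Number { return a - b; });
			var mid:int = sorted.length >> 1;
			if (sorted.length % 2 == 1) return sorted[mid];
			return (sorted[mid - 1] + sorted[mid]) / 2;
		}
	}
}
```

For example, per-round times of 30, 80 and 90 ms (non-inlined) against 10, 20 and 0 ms (inlined) give the ratios 3 and 4, since the last round is skipped, so both the median and the mean are 3.5.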

[Figure not available: Inline vs. non-inline AS3 method call performance (debug on)]
[Figure not available: Inline vs. non-inline AS3 method call performance (debug off)]

Conclusion

Based on the theory, the bytecode analysis and the test results, we can summarize when inline methods are worth using:

  • when many method calls occur in the code (especially in loops)
  • when we prefer better performance over lower memory consumption
  • when not debugging

In my experience, inlining is useful for mathematical helpers, utility classes and similar code that is expected to be called in loops. If you need to cut down frame time, it can be a good solution.
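As an illustration of such a helper, here is a minimal sketch of a hypothetical MathUtil class. Its methods are static and tiny, so they satisfy the rules listed earlier; whether calls from other classes are actually inlined depends on the compiler, so it is worth checking the generated bytecode.

```actionscript
package
{
	// Hypothetical math helper: static, tiny methods that meet the inlining rules.
	public final class MathUtil
	{
		// Limits value to the range [min, max].
		[Inline]
		public static function clamp(value:Number, min:Number, max:Number):Number
		{
			return value < min ? min : (value > max ? max : value);
		}

		// Linear interpolation between a and b; t = 0 gives a, t = 1 gives b.
		[Inline]
		public static function lerp(a:Number, b:Number, t:Number):Number
		{
			return a + (b - a) * t;
		}
	}
}
```

For example, MathUtil.clamp(15, 0, 10) returns 10 and MathUtil.lerp(0, 10, 0.25) returns 2.5. Called inside a per-frame loop over many objects, helpers like these are where inlining tends to pay off.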

References

  • ActionScript Virtual Machine 2 (AVM2) Overview [online]
  • ASC 2.0 (ActionScript Compiler) wiki [online]
  • getTimer() reference [online]