Implementing imp_implementationWithBlock()

14 Apr 2011, 06:46 PDT

In iOS 4.3, Apple introduced a new API to support the use of blocks as Objective-C method implementations. The API provides similar functionality as Mike Ash's MABlockClosure, which uses libffi (or libffi-ios) to implement support for generating arbitrary function pointers that dispatch to a block.

However, Apple's new API differs from MABlockClosure in a few important ways:

Today, I'll be discussing how block-based message dispatch is implemented by Apple, and how you can implement your own similar, custom trampolines on iOS and Mac OS X. Additionally, I've posted PLBlockIMP on github. This project provides:

In addition to this article, Bill Bumgarner has written an excellent introduction to imp_implementationWithBlock, and if you need a refresher in objective message dispatch, Mike Ash has a more in-depth explanation here.

This work has been funded by my employer, Plausible Labs. We specialize in Mac OS X, iOS, and Android development and we're available for hire.

Creating Trampolines without Writable Code

The implementation of imp_implementationWithBlock() relies on trampolines to convert between Objective-C method calls and block dispatch. Trampolines are small pieces of code that, when called, perform some intermediary operations and then jump to the actual target destination. When you call imp_implementationWithBlock(), a function pointer to a trampoline is returned; it's this trampoline's responsibility to modify the function arguments and then jump to the actual code corresponding to the block's implementation.

Trampolines often require more information than can be derived from their function parameters -- such is the case with our IMP trampolines, which must have a pointer to the target block that they should call. Historically, this type of trampoline has generally been implemented through the use of writable code pages; the instructions are written to a PROT_EXEC|PROT_WRITE page at runtime, with any additional context information included directly in the generated code.

Unfortunately, iOS has instituted a restriction on the use of writable, executable pages (although there are signs that this may eventually be lifted), necessitating the use of an alternative mechanism for implementing trampoline-specific context data. While iOS does not allow the use of writable code, we can leverage a combination of vm_remap() and PC-relative addressing to implement configurable trampolines without writable code.

On Darwin, vm_remap() provides support for mapping an existing code page at new address, while retaining the existing page protections; using vm_remap(), we can create multiple copies of existing, executable code, placed at arbitrary addresses. If we generate a template page filled with trampolines at build time, we can create arbitrary duplicates of that page at runtime. This allows us to allocate an arbitrary number of trampolines using that template without requiring writable code:

Figure 1: vm_remap()

However, executable trampoline allocation only solves half the problem -- we still need a way to configure each trampoline.

The solution is PC-relative addressing. The processor's program counter register indicates the address of the currently executing instruction; PC-relative addressing uses the program counter to address memory relative to the currently executing instruction. When we remap our trampolines and then jump to them, each trampoline is executing at a unique address. If we then map a writable data page next to our trampoline page, we can use PC-relative addressing to load per-trampoline data from adjacent writable data page:

Figure 2: vm_remap() with writable data pages

Once a full page of trampolines are allocated, we simply need to provide the individual trampolines on request, and allocate additional pages if the pool of trampolines is exhausted. A full implementation of a trampoline allocator is available in PLBlockIMP; refer to pl_trampoline_alloc(), pl_trampoline_data_ptr(), and pl_trampoline_free().

To save space within our trampoline page, each individual trampoline saves the PC register, and then jumps to a common implementation at the start of the trampoline page. The ARM implementation of the individual trampoline stub is two instructions:

mov r12, pc
b _block_tramp_dispatch;

IMP->Block Dispatch (self, _cmd, and Block)

As noted in Bill Bumgarner's article on imp_implementationWithBlock, every Objective-C method has two implicit, pointer-sized arguments at the head of the method's argument list: self, and _cmd. Likewise, block implementations also have an implicit first argument; the block literal, which maintains the block's reference count, bock descriptor, references to captured variables, and other block data.

When called, a trampoline returned by imp_implementationWithBlock() is responsible for re-ordering its arguments to match those required by the block's implementation: the 'self' argument must be moved to second argument slot (overwriting _cmd), and the block literal moved to the (now vacated) first argument slot.

However, there is one wrinkle that Bill didn't touch on: structure return values. On the current architectures supported by Darwin, functions that return structures by value may have an additional pointer at the start of their argument list. This pointer is used to provide the address on the caller's stack at which the structure return value should be written.

In the case where structure return (stret) calling conventions are used, the structure return pointer in the first argument slot must remain unmodified, the block literal argument must be in the second argument slot, and the self pointer in the third. This requires that imp_implementationWithBlock() provide two different trampoline implementations to support both calling conventions, and that the requisite trampoline type for a block be determined when imp_implementationWithBlock is called.

There is no way to determine from a raw function pointer whether a function requires stret calling conventions -- to work around this, Apple's compilers set an additional flag, BLOCK_USE_STRET, when emitting a block that requires the stret calling conventions. This flag may be used to easily determine the necessarily trampoline type for a block.

As described in the previous section, the individual trampolines save their PC address and then immediately jump to a shared implementation at the start of the trampoline page. That shared implementation re-orders the existing arguments, loads the block literal from its PC-relative configuration data, and then jumps to the block's implementation -- on ARM, our non-stret shared implementation looks like this:

    # trampoline address+8 is in r12 -- calculate our config page address
    sub r12, #0x8
    sub r12, #0x1000
    # Set the 'self' argument as the second argument
    mov r1, r0
    # Load the block pointer as the first argument
    ldr r0, [r12]
    # Jump to the block pointer
    ldr pc, [r0, #0xc]


While I still have my fingers crossed for PROT_EXEC|PROT_WRITE pages on iOS, vm_remap()-based trampolines can serve as a viable replacement for some tasks. If you have further questions, or ideas for other neat projects worth tackling, feel free to drop me an e-mail.