This post dives into an accepted PR I made to Firefox for iOS that improves the performance of content blocking and helps Firefox load faster. Instruments is used to profile performance. If you are not familiar with Instruments (part of Xcode), check out this Instruments Overview.

Finding The Performance Opportunity

A startup trace was collected showing CPU and memory allocation from a cold start on an iPhone X. At first glance, the app showed low cumulative overhead from a cold start.

One tactic when profiling performance is to look at the data in different ways to find unexpected patterns. In a memory trace, Instruments allows allocations to be sorted by the number of persistent (still alive) allocations. When sorted by the number of persistent allocations, the Firefox cold start trace showed the following:

Category              Persistent Bytes   # Persistent
icu::UnicodeSet       611.81 KiB         6526
Malloc 48 Bytes       223.59 KiB         4770
Malloc 3.00 KiB       12.31 MiB          4203
icu::RegexMatcher     1.34 MiB           4181
icu::RegexPattern     914.59 KiB         4181
NSRegularExpression   195.98 KiB         4181
icu::UVector32        130.66 KiB         4181
Malloc 128 Bytes      488.12 KiB         3905

Notice that a number of rows are all related to regular expressions, identifiable by their category and their # Persistent counts. Upon further inspection, a large majority of the Malloc 3.00 KiB category is also related to regular expressions. Sample:

Address          Category          Size       Responsible Caller
0x7fecfe87cc00   Malloc 3.00 KiB   3.00 KiB   [NSRegularExpression initWithPattern:options:error:]

In total, we can identify about 15 MiB allocated toward regular expressions from the information presented thus far. Considering that the total allocation of the application (no saved tabs, no saved user, combined Heap and Anonymous VM) is 65.24 MiB, such a significant portion related to regular expressions is unexpected.
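
As a rough check on the 15 MiB figure, the regex-related rows from the allocation table can be summed directly. This is only a sketch: it assumes every listed row counts in full (including icu::UnicodeSet and all of Malloc 3.00 KiB), so the result is an upper bound rather than an exact measurement.

```swift
// Persistent bytes per category, copied from the allocation table.
// Assumption: each row is counted in full, so the total is an upper bound.
let mibPerKiB = 1.0 / 1024.0
let regexRelatedMiB: [(category: String, mib: Double)] = [
    ("icu::UnicodeSet", 611.81 * mibPerKiB),
    ("Malloc 3.00 KiB", 12.31),
    ("icu::RegexMatcher", 1.34),
    ("icu::RegexPattern", 914.59 * mibPerKiB),
    ("NSRegularExpression", 195.98 * mibPerKiB),
    ("icu::UVector32", 130.66 * mibPerKiB),
]

// Sum of the regex-related rows, in MiB.
let totalMiB = regexRelatedMiB.reduce(0.0) { $0 + $1.mib }

// Share of the 65.24 MiB total allocation quoted for the app.
let shareOfApp = totalMiB / 65.24

print("regex-related MiB:", totalMiB)
print("share of total:", shareOfApp)
```

Under these assumptions the total lands between 15 and 16 MiB, a little under a quarter of the app's total allocation.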

Existing Implementation

Most of the regular expressions used in Firefox are created on startup as part of the Content Blocker implementation (more commonly known as AdBlock).

Specifically, the TPStatsBlocklists class loads multiple files containing definitions for which URL patterns to block. The wildcardContentBlockerDomainToRegex(domain:) function creates an NSRegularExpression from the passed-in domain. Since the files contain thousands of URL definitions, thousands of NSRegularExpression objects are allocated.
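
To make the allocation pattern concrete, here is a minimal sketch of eager compilation. It is not the Firefox source: wildcardPattern(for:) is a hypothetical stand-in for the real conversion, and the two domains are made up. The point is only that one NSRegularExpression is created per blocklist entry at load time.

```swift
import Foundation

// Hypothetical stand-in for the real domain-to-regex conversion:
// escape the dots, let a leading "*" match any prefix, anchor at the end.
func wildcardPattern(for domain: String) -> String {
    let escaped = domain.replacingOccurrences(of: ".", with: "\\.")
    let body = escaped.hasPrefix("*") ? "." + escaped : escaped
    return body + "$"
}

// Illustrative entries; the real blocklist files contain thousands.
let domains = ["*.example.com", "tracker.example.net"]

// Eager approach: every entry becomes a live NSRegularExpression object,
// whether or not it is ever matched against a URL.
let compiled: [NSRegularExpression] = domains.compactMap { domain in
    try? NSRegularExpression(pattern: wildcardPattern(for: domain), options: [])
}

print("compiled \(compiled.count) regular expressions")
```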

Note: the existing implementation is correct and NSRegularExpression is a natural choice for dealing with regular expressions. Profiling holistically on a physical device is the most reliable way to find performance opportunities that would not be obvious otherwise.

Identifying The Performance Improvement

Knowing that thousands of NSRegularExpression objects take up a large portion of the total memory allocation from a cold start, the next step is to understand how the objects are used during the application lifecycle.

Firefox uses regular expressions to identify which loading resources should be blocked. Specifically, if NSRegularExpression.firstMatch(in:options:range:) returns a non-nil result, then the resource matches the regular expression and should be blocked.
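
A minimal sketch of that check, assuming an illustrative pattern and host names (neither comes from the Firefox blocklists); isBlocked(_:by:) is a hypothetical helper name:

```swift
import Foundation

// Illustrative pattern; not taken from the Firefox blocklists.
let regex = try! NSRegularExpression(pattern: ".*\\.example\\.com$", options: [])

// Hypothetical helper: a non-nil first match means "block this resource".
func isBlocked(_ host: String, by regex: NSRegularExpression) -> Bool {
    // NSRegularExpression works on NSRange, so cover the whole string.
    let range = NSRange(host.startIndex..., in: host)
    return regex.firstMatch(in: host, options: [], range: range) != nil
}

print(isBlocked("ads.example.com", by: regex))  // true
print(isBlocked("example.org", by: regex))      // false
```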

The TPStatsBlocklists class needs no other regular expression interface. Thus, if there were an alternative to storing NSRegularExpression objects that still allowed a regular expression comparison to be made, much of the memory allocation could be eliminated.

Faster Implementation

One hypothesis, which proved to be the better alternative, was to store the regular expressions as strings until needed. When a sanity test was performed, storing thousands of URLs as String objects in Swift did not exceed 100 KiB.

To support this change, an interface must be available to use strings as regular expressions and identify whether there is a match. iOS already provides this interface in the form of String.range(of:options:) with the .regularExpression option. The result, if not nil, indicates a match was found.
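
A minimal sketch of the string-based approach, again with made-up patterns and a hypothetical helper name, isBlocked(_:byAnyOf:). The patterns stay plain String values, and the regular expression match happens only when a host is checked:

```swift
import Foundation

// Patterns are stored as plain strings; no NSRegularExpression objects are
// kept alive. These entries are illustrative, not from the real blocklists.
let patterns = [".*\\.example\\.com$", "^tracker\\.example\\.net$"]

// Hypothetical helper: a host is blocked if any stored pattern matches it.
func isBlocked(_ host: String, byAnyOf patterns: [String]) -> Bool {
    return patterns.contains { pattern in
        // .regularExpression makes range(of:options:) treat the search
        // string as a regular expression instead of a literal.
        host.range(of: pattern, options: .regularExpression) != nil
    }
}

print(isBlocked("ads.example.com", byAnyOf: patterns))      // true
print(isBlocked("tracker.example.net", byAnyOf: patterns))  // true
print(isBlocked("example.org", byAnyOf: patterns))          // false
```

The trade-off in this sketch is that the pattern is interpreted at match time rather than once up front, in exchange for not holding thousands of compiled objects in memory from startup.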

To ensure correctness, a test case covering the URL blocking features of the TPStatsBlocklists interface was added before any performance changes were made. All tests passed before and after the performance improvement, helping ensure the optimization did not introduce a regression.

Finally, in addition to improving memory performance, removing the need for thousands of expensive allocations at startup also reduced the CPU usage of Firefox from a cold start by about 10%.

App performance, as the sum of many enhancements like this one, can have a large impact on the success of an app. For example, App Load Time is one of the 8 Overlooked Ways To Increase App Revenue I wrote about in another post.