Analyzing Objective-C binary image size

Date: 2022-09-06 21:33:05

I'm looking for tools and approaches for determining which parts of my Cocoa and Cocoa Touch programs contribute most to the final binary image size, and for ways to help reduce it. I'm not looking for a "magic bullet" compiler flag. I'm looking for profiling techniques for evaluating and reducing wasted image size, in the same vein as what Shark and Instruments offer for run-time evaluation.

A first-order approximation may be the size of the .o's, but how trustworthy is this in terms of final image size after optimizations and dead-code stripping? If I add up all the .o's they are much larger than my final image, so clearly the linker is already helping me out significantly. But this means the size of the .o's may not be a useful measure.

Where do others look to reduce image size without undermining code maintainability?

2 Answers

#1


39  

Apple has some awesome docs on Code Size Performance Guidelines, almost all of which apply to this question in some form. There are even tips for pedantic approaches like manually ordering symbols in the binary if desired. :-)

I'm totally a fan of simple, slim code and minimizing disk/memory footprint. Premature optimization is always a bad idea, but consistent housekeeping can be a good way to prevent cruft from accumulating. Unfortunately, I don't know of an automated way to profile code sizes, but several tools exist that can help provide specific insight.

Binary Image Size

Object files aren't as terrible an approximation as you'd guess. One reason the linked binary is smaller than the sum of its object files is that the code is all joined together under a single header instead of each file carrying its own. Although the percentages won't be precise, the biggest object files generally account for the biggest parts of the linked binary.
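
As a rough starting point, the object files can simply be ranked by their size on disk. This is only a sketch: the function name `rank_objects` is made up for illustration, and on-disk size includes debug information and symbols, so treat the ordering as a hint rather than a measurement.

```shell
# rank_objects DIR
# Print "bytes<TAB>path" for every .o file under DIR, largest first.
rank_objects() {
  find "$1" -type f -name '*.o' -exec sh -c '
    for f in "$@"; do
      # wc -c pads its output on some systems, so strip the spaces
      printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
    done
  ' sh {} + | sort -rn
}
```

Something like `rank_objects build | head -n 20` would then show the twenty largest object files, where `build` stands in for wherever your intermediate build products live.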

For understanding the raw length of each particular method in an object file, you could use /usr/bin/otool to print out the assembly code, punctuated by Objective-C method names:

$ otool -tV MyClass.o

I look for long stretches of assembly that correspond to relatively short or simple methods and examine whether the code can be simplified or removed entirely.
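
Scanning the disassembly by eye gets tedious, so here is a minimal sketch that counts instruction lines per label instead. It assumes the usual `otool -tV` layout, where each method or function starts with a label line ending in a colon and each instruction line starts with a hexadecimal address. The function name `count_instructions` is hypothetical, and instruction count is only a proxy for byte size.

```shell
# count_instructions: read `otool -tV` output on stdin and print
# "count<TAB>label" for each label, longest first.
count_instructions() {
  awk '
    # A label line ends in ":" and does not start with an address.
    /:$/ && !/^[0-9a-fA-F]+[ \t]/ { label = $0; sub(/:$/, "", label); next }
    # An instruction line starts with a hex address.
    label != "" && /^[0-9a-fA-F]+[ \t]/ { count[label]++ }
    END { for (l in count) printf "%d\t%s\n", count[l], l }
  ' | sort -rn
}
```

Usage would be along the lines of `otool -tV MyClass.o | count_instructions | head`.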

In addition to otool, I've found that /usr/bin/size can be quite useful, since it breaks up segments and sections hierarchically and shows you the size of each, both for object files and compiled binaries. For example:

$ size -m -s __TEXT __text MyClass.o
$ size -m /Applications/iCal.app/Contents/MacOS/iCal

This is a "bigger picture" view, although it usually just confirms that __TEXT __text is one of the largest sections in the file, and hence a good place to start pruning.

Dead Code Identification

Nobody really wants their binary to be littered with code that is never used. In a dynamic and loosely-coupled language like Objective-C, it can be difficult or impossible to statically determine whether specific code is "used" or not. Even if a class is instantiated or a method is called, tracing code paths (both theoretical and actual) can be a headache. I use a few tricks to help with this.

  • For static analysis, I strongly recommend the Clang Static Analyzer (which is happily built into Xcode 3.2 on Snow Leopard). Among all its other virtues, this tool can trace code paths and identify chunks of code that cannot possibly be executed. Such code should either be removed, or the surrounding code should be fixed so that it can be called.
  • For dynamic analysis, I use gcov (with unit testing) to identify which code is actually executed. Coverage reports (read with something like CoverStory) reveal un-executed code, which, coupled with manual examination and testing, can help identify code that may be dead. You do have to tweak some settings and run gcov manually on your binaries. I used this blog post to get started.

In practice, it's uncommon for dead code to be a large enough proportion of the code to make a substantial difference in binary size or load time, but dead code certainly complicates maintenance, and it's best to get rid of it if you can.

Symbol Visibility

Reducing symbol visibility may seem like a strange recommendation, but it makes things much easier for dyld (the linker that loads programs at runtime) and enables the compiler to perform better optimizations. Consider hiding global variables (that aren't declared as static) etc. by prefixing them with a "hidden" attribute, or enabling "Symbols Hidden by Default" in Xcode and explicitly making symbols visible. I use the following macros:

#define HIDDEN __attribute__((visibility("hidden")))
#define VISIBLE __attribute__((visibility("default")))

I find /usr/bin/nm invaluable for identifying unnecessarily visible symbols, and for identifying potential external dependencies you might be unaware of or hadn't considered.

$ nm -m -s __TEXT __text MyClass.o  # -s displays only a given section
$ nm -m -p MyClass.o  # -p preserves symbol table ordering (no sort) 
$ nm -m -u MyClass.o  # -u displays only undefined symbols
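
To pick out just the symbols that are visible to the outside, the plain (non `-m`) output of nm can be filtered. A minimal sketch, assuming the default three-column format of address, type letter and name, where an uppercase type letter marks an external symbol and undefined symbols have no address. The function name `visible_symbols` is made up for illustration.

```shell
# visible_symbols: read default `nm` output on stdin and print the names of
# symbols that are both defined and external (uppercase type letter).
visible_symbols() {
  awk '
    # Defined symbols start with a hex address; undefined ones start with "U".
    $1 ~ /^[0-9a-fA-F]+$/ && $2 ~ /^[A-Z]$/ {
      sub(/^[0-9a-fA-F]+ +[A-Z] +/, "")
      print
    }
  '
}
```

Running `nm MyClass.o | visible_symbols` then gives a list to review for anything that could be marked hidden or static.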

Although reducing symbol visibility is unlikely to directly reduce the size of your binary, the compiler may be able to make improvements it couldn't otherwise. Also, you stand to reduce accidental dependencies on symbols you didn't intend to expose.

Analyzing Library Dependencies and Loading

In addition to raw binary size, it can often be quite helpful to analyze which dynamic libraries you link to, and eliminate those that might be unnecessary, particularly less-commonly-used frameworks that may not be loaded yet. (You can see this in Xcode too, but with complex projects, sometimes things slip through, so this also makes for a handy sanity check after building.) Again, otool to the rescue...

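$ otool -L /Applications/iCal.app/Contents/MacOS/iCal  # run on the linked binary; a lone .o has no library load commands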
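
For comparing builds, it can help to boil the otool -L listing down to bare library paths. A small sketch, assuming the usual output in which each library appears on an indented line followed by "(compatibility version ..., current version ...)". The function name `linked_libraries` is hypothetical.

```shell
# linked_libraries: read `otool -L` output on stdin and print each linked
# library path once, sorted, without the version information.
linked_libraries() {
  sed -n 's/^[[:space:]]\{1,\}\(.*\) (compatibility version.*$/\1/p' | sort -u
}
```

For example, `otool -L /Applications/iCal.app/Contents/MacOS/iCal | linked_libraries` prints one path per line, which is easy to diff between two versions of a binary.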

Another (extremely verbose) alternative is to have dyld print loaded libraries, like so (from Terminal):

$ export DYLD_PRINT_LIBRARIES=1
$ /Applications/iCal.app/Contents/MacOS/iCal

This shows exactly what is being loaded, including dependencies of the libraries your code links against.

Analyzing Launch Performance

Usually, what you really care about is whether the code size and library dependencies are truly affecting launch time. Setting this environment variable will cause dyld to report load statistics, which can really help pinpoint how time was spent on load:

$ export DYLD_PRINT_STATISTICS=1
$ /Applications/iCal.app/Contents/MacOS/iCal

On Leopard and later, you'll notice entries about "dyld shared cache". Basically, the dynamic linker creates a consolidated "super library" composed of the most frequently-used dynamic libraries. It is mentioned in this Apple documentation, and the behavior can be altered with the DYLD_SHARED_REGION and DYLD_NO_FIX_PREBINDING environment variables, similar to above. See man dyld for details.

#2


3  

You might want to look at otool. Specifically, you probably want to use the -l flag, which displays all the load commands (including the segments and sections) that make up your binary.

Having said all that, you would usually find that the resources are more significant than the code you write, so I'm wondering what problem you’ve encountered that you’re trying to solve. Our applications have a fair bit of code yet are still only a few MB. Maybe you’re statically linking to some big libraries—I don't know.

If most of your code is Objective-C, very little of it will be removed with dead-code stripping (for obvious reasons), so that won’t make much difference.

What will make a difference is the debug information, which will be substantial. Your object files will include this, but you'd typically have it stored in a separate dSYM bundle when you link, so it won't be included in the final binary (or at least this is what you should be doing).

Your code will be in the __TEXT, __text segment/section.

I'm pretty sure the linker will coalesce equivalent strings so the total will be less than the sum of the parts for these sections, but, I guess, typically not by much.

I would also expect your relocation and symbols sections to be less than the sum of the parts. You should strip your linked binary of unneeded symbols to save space (which isn't the same as stripping debug information). See the "Strip Linked Product" setting in Xcode.

One other thing to remember is that your linked binary will be a FAT binary, whereas the object files usually aren’t.
