Why does generated Java code that performs the operations run slower than an "interpreter loop"?

Posted: 2022-04-21 17:22:34

I have some Java code which performs bitwise operations on a BitSet. I have a list of operations and can "interpret" them by looping over them, but it's important to me that I can perform these operations as quickly as possible, so I've been trying to dynamically generate code to apply them. I generate Java source to perform the operations and compile a class implementing those operations using Javassist.
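
To make the setup concrete, the interpreted version looks roughly like the sketch below. The OpCode/Op types and the single apply(BitSet) method on MyInterface are simplified stand-ins for illustration; my real operation encoding is different.

import java.util.BitSet;
import java.util.List;

// Simplified stand-ins for the real types.
interface MyInterface {
    void apply(BitSet target);
}

enum OpCode { AND, OR, XOR, ANDNOT }

class Op {
    final OpCode code;
    final BitSet operand;
    Op(OpCode code, BitSet operand) { this.code = code; this.operand = operand; }
}

// The "interpreter loop": dispatch on each operation at run time, every time.
class Interpreter implements MyInterface {
    private final List<Op> ops;
    Interpreter(List<Op> ops) { this.ops = ops; }

    public void apply(BitSet target) {
        for (Op op : ops) {
            switch (op.code) {
                case AND:    target.and(op.operand);    break;
                case OR:     target.or(op.operand);     break;
                case XOR:    target.xor(op.operand);    break;
                case ANDNOT: target.andNot(op.operand); break;
            }
        }
    }
}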

Unfortunately, my dynamically-generated code runs slower than the interpreted code. It appears that this is because HotSpot is optimizing the interpreted code but isn't optimizing the compiled code: After I run it a few thousand times, my interpreted code runs twice as fast as it did initially, but my compiled code shows no speedup. Consistent with this hypothesis, my interpreted code is initially slower than the compiled code, but is eventually faster.

I'm not sure why this is happening. My guess is that maybe Javassist uses a class loader whose classes HotSpot doesn't touch. But I'm not an expert on class loading in Java, so I'm not sure whether this is a reasonable guess or how to go about testing it. Here's how I'm creating and loading the class with Javassist:

ClassPool pool = ClassPool.getDefault();
CtClass tClass = pool.makeClass("foo");

// foo implements MyInterface, with one method
tClass.addInterface(pool.get(MyInterface.class.getName()));

// Get the source for the method and add it
CtMethod tMethod = CtNewMethod.make(getSource(), tClass);
tClass.addMethod(tMethod);

// finally, compile and load the class
return (MyInterface)tClass.toClass().newInstance();
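
For completeness, getSource() returns ordinary Java source for that single method. Stripped down, it is along these lines; the concrete statements here are placeholders, and the real list is much longer:

// Placeholder version of getSource(): the operation list unrolled into straight-line code.
private String getSource() {
    return "public void apply(java.util.BitSet target) {"
         + "  target.set(3);"
         + "  target.clear(17);"
         + "  target.flip(42);"
         // ... one statement per operation in the real list ...
         + "}";
}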

Does anyone have an idea as to what's going on here? I'd really appreciate whatever help you can give.

I'm using the Sun 1.6 server JVM on Windows XP 32-bit.

3 Answers

#1


HotSpot doesn't care where the code comes from. For instance, it will happily inline code reached through a virtual method call even when the implementation was loaded by a different class loader.

I suggest you write out, in ordinary source code, the operations you are trying to perform for this benchmark, and then benchmark that. It's usually easier to write out an example of the generated code than to write the generator anyway.
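
For example, a hand-written stand-in along these lines (the operations are placeholders) can be compiled normally and timed against the interpreter, which separates the code-generation question from the Javassist question:

// Hand-written stand-in for a generated class; the operations are placeholders.
class HandWrittenOps implements MyInterface {
    public void apply(java.util.BitSet target) {
        target.set(3);
        target.clear(17);
        target.flip(42);
        // ... the rest of the operation list, written out by hand ...
    }
}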

There are a number of reasons why HotSpot might not optimise code as aggressively as it could. For instance, very long methods tend not to be inlined, nor to have methods inlined into them.

#2


I think I understand what was going on here. My first mistake was generating methods which were too long. After I fixed that, I noticed that although my generated code was still slower, it eventually approached the speed of the interpreted code.
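
For reference, a sketch of one way to keep each generated method short: split the statement list into chunks, add one small method per chunk, and have apply() call them in order. CHUNK_SIZE and the per-operation statement strings below are placeholders.

import java.util.List;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtNewMethod;

class ChunkedGenerator {
    // Placeholder chunk size; each statement string is one generated operation.
    private static final int CHUNK_SIZE = 100;

    MyInterface compile(List<String> statements) throws Exception {
        ClassPool pool = ClassPool.getDefault();
        CtClass tClass = pool.makeClass("foo");
        tClass.addInterface(pool.get(MyInterface.class.getName()));

        // One short helper method per chunk of operations...
        StringBuilder apply = new StringBuilder("public void apply(java.util.BitSet target) {");
        for (int chunk = 0; chunk * CHUNK_SIZE < statements.size(); chunk++) {
            int end = Math.min((chunk + 1) * CHUNK_SIZE, statements.size());
            StringBuilder helper = new StringBuilder(
                    "private void chunk" + chunk + "(java.util.BitSet target) {");
            for (String stmt : statements.subList(chunk * CHUNK_SIZE, end)) {
                helper.append(stmt);
            }
            helper.append("}");
            tClass.addMethod(CtNewMethod.make(helper.toString(), tClass));

            // ...and apply() just calls them in order.
            apply.append(" chunk").append(chunk).append("(target);");
        }
        apply.append(" }");
        tClass.addMethod(CtNewMethod.make(apply.toString(), tClass));

        return (MyInterface) tClass.toClass().newInstance();
    }
}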

I think that the biggest speedup here comes from HotSpot optimizing my code. In the interpreted version, there's very little code to optimize, so HotSpot quickly takes care of it. In the generated version, there's a lot of code to optimize, so HotSpot takes longer to work its magic over all the code.

If I run my benchmarks for long enough, I now see my generated code performing just slightly better than the interpreted code.

#3


There is a JVM setting that controls how soon code gets compiled: -XX:CompileThreshold=10000

It sets the number of method invocations/branches before a method is compiled (10,000 is the -server default; the -client default is 1,500).
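
For example (the value below is only an illustration, and MyBenchmark is a placeholder for your main class), a lower threshold makes compilation kick in sooner, and -XX:+PrintCompilation shows which methods actually get compiled:

java -server -XX:CompileThreshold=1000 -XX:+PrintCompilation MyBenchmark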

I do not know whether this will help, because in your example the method size seems to play a vital role.
