(Dis)proving that one algorithm runs faster than another because of language internals

Time: 2023-01-27 23:30:34

For a project at university, we had to implement a few different algorithms to calculate the equivalence classes when given a set of elements and a collection of relations between said elements.

We were instructed to implement, among others, the Union-Find algorithm and its optimizations (Union by Depth, Union by Size). By accident (while doing something I thought was necessary for the correctness of the algorithm), I discovered another way to optimize the algorithm.
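For reference, here is a minimal sketch of the kind of structure involved, with union by size and a plain int[] for the parent links (just an illustration, not the exact code from my project):

    // Minimal Union-Find with union by size; an illustrative sketch only.
    class UnionFind {
        private final int[] parent; // parent[i] == i means i is a root
        private final int[] size;   // size[r] = number of elements in the tree rooted at r

        UnionFind(int n) {
            parent = new int[n];
            size = new int[n];
            for (int i = 0; i < n; i++) {
                parent[i] = i;
                size[i] = 1;
            }
        }

        int find(int x) {
            while (parent[x] != x) {
                x = parent[x]; // walk up to the root
            }
            return x;
        }

        void union(int a, int b) {
            int ra = find(a), rb = find(b);
            if (ra == rb) return;
            // Union by size: attach the smaller tree under the larger one.
            if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
            parent[rb] = ra;
            size[ra] += size[rb];
        }
    }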

It isn't as fast as Union by Depth, but close. I couldn't for the life of me figure out why it was as fast as it was, so I consulted one of the teaching assistants, who couldn't figure it out either.

The project was in Java, and the data structures I used were based on simple arrays of Integers (the object, not the int). Later, at the project's evaluation, I was told that it probably had something to do with 'Java caching', but I can't find anything online about how caching would affect this.
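One documented behaviour that might be what 'Java caching' refers to is that autoboxing goes through Integer.valueOf, which by default reuses shared Integer objects for values from -128 to 127. A tiny demo (my own illustration, not project code):

    public class IntegerCacheDemo {
        public static void main(String[] args) {
            Integer a = 100, b = 100;        // autoboxing calls Integer.valueOf
            Integer c = 1000, d = 1000;
            System.out.println(a == b);      // true: values in [-128, 127] come from a shared cache
            System.out.println(c == d);      // false: values outside that range get distinct objects
            System.out.println(c.equals(d)); // true: equals compares values, not identity
        }
    }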

What would be the best way, without calculating the complexity of the algorithm, to prove or disprove that my optimization is this fast because of the way Java does things? Implementing it in another (lower-level?) language? But who's to say that language won't do the same thing?
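For concreteness, the kind of timing experiment I could run myself looks roughly like the sketch below, where solverA and solverB are placeholders for the two Union-Find variants and the warm-up runs are there to give the JIT time to settle:

    import java.util.Random;
    import java.util.function.Consumer;

    public class Bench {
        // randomRelations, solverA and solverB are hypothetical stand-ins for
        // the generated input and the two Union-Find variants being compared.
        static int[][] randomRelations(int n, int m, Random rnd) {
            int[][] rel = new int[m][2];
            for (int i = 0; i < m; i++) {
                rel[i][0] = rnd.nextInt(n);
                rel[i][1] = rnd.nextInt(n);
            }
            return rel;
        }

        static long timeNanos(Consumer<int[][]> solver, int[][] relations) {
            long start = System.nanoTime();
            solver.accept(relations);
            return System.nanoTime() - start;
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            Consumer<int[][]> solverA = rel -> { /* variant 1 goes here */ };
            Consumer<int[][]> solverB = rel -> { /* variant 2 goes here */ };

            for (int n : new int[] {1_000, 10_000, 100_000, 1_000_000}) {
                int[][] rel = randomRelations(n, 4 * n, rnd);
                // Warm-up runs so the JIT compiler has a chance to kick in.
                for (int i = 0; i < 5; i++) { timeNanos(solverA, rel); timeNanos(solverB, rel); }
                // Measured runs: keep the best of several repetitions.
                long bestA = Long.MAX_VALUE, bestB = Long.MAX_VALUE;
                for (int i = 0; i < 10; i++) {
                    bestA = Math.min(bestA, timeNanos(solverA, rel));
                    bestB = Math.min(bestB, timeNanos(solverB, rel));
                }
                System.out.printf("n=%d  A=%d ns  B=%d ns%n", n, bestA, bestB);
            }
        }
    }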

I hope I made myself clear,

thanks

4 Answers

#1


4  

The only way is to prove the worst-case (average-case, etc.) complexity of the algorithm.

Because if you don't, it might just be a consequence of a combination of:

  • The particular data

  • The size of the data

  • Some aspect of the hardware

  • Some aspect of the language implementation

  • etc.

#2


3  

It is generally very difficult to perform such a task given modern VMs! As you hint, they do all sorts of things behind your back: method calls get inlined, objects are reused, etc. A prime example is seeing how trivial loops get compiled away if they are obviously not doing anything other than counting, or how a function call in functional programming gets inlined or tail-call optimized.
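For instance, a naive benchmark like the following may end up measuring next to nothing, because the JIT is free to notice that the loop's result is never used (an illustrative sketch, not anything specific to your code):

    public class DeadLoop {
        public static void main(String[] args) {
            long start = System.nanoTime();
            int sum = 0;
            for (int i = 0; i < 100_000_000; i++) {
                sum += i; // the result is never read afterwards...
            }
            // ...so the JIT may eliminate the whole loop, making the timing meaningless.
            long elapsed = System.nanoTime() - start;
            System.out.println("elapsed ns: " + elapsed);
        }
    }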

Further, you have the difficulty of proving your point in general on any data set. An O(n^2) algorithm can easily be much faster than a seemingly faster, say O(n), algorithm. Two examples:

  1. Bubble sort is faster at sorting a nearly sorted data collection than quick sort.

  2. Quick sort, in the general case, is of course faster.

Generally, big-O notation purposely ignores constants, which in a practical situation can mean life or death to your implementation, and those constants may have been what hit you. So in practice 0.00001 * n^2 (say, the running time of your algorithm) is faster than 1000000 * n log n.
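To put numbers on that (my own illustrative arithmetic, taking log base 2): with those constants, the quadratic term only overtakes the n log n term when n gets into the trillions:

    public class Crossover {
        public static void main(String[] args) {
            // Compare 0.00001 * n^2 against 1000000 * n * log2(n) for a few sizes.
            for (double n : new double[] {1e3, 1e6, 1e9, 1e12, 1e13}) {
                double quadratic = 0.00001 * n * n;
                double nLogN = 1_000_000 * n * (Math.log(n) / Math.log(2));
                System.out.printf("n=%.0e  0.00001*n^2=%.3e  1e6*n*log2(n)=%.3e%n",
                                  n, quadratic, nLogN);
            }
            // The quadratic cost only overtakes the n*log2(n) cost somewhere above n ~ 4e12.
        }
    }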

So reasoning is hard given the limited information you provide.

#3


1  

It is likely that either the compiler or the JVM found an optimisation for your code. You could try reading the bytecode that is output by the javac compiler, and disabling runtime JIT compilation with the -Djava.compiler=NONE option.

#4


0  

If you have access to the source code -- and the JDK source code is available, I believe -- then you can trawl through it to find the relevant implementation details.
