Correctly indexing latitude and longitude values in Lucene

Time: 2023-02-01 03:07:22

I'm working on a US-based "nearest city within a given radius" search feature using the Lucene API. I'm indexing each city's lat and long values in Lucene as follows:

doc.Add(new Field("latitude", paddedLatitude, Field.Store.YES, Field.Index.UN_TOKENIZED));

doc.Add(new Field("longitude", paddedLongitude, Field.Store.YES, Field.Index.UN_TOKENIZED));

Since Lucene only understands strings and not numbers, I'm padding the lat and long values.

For example, if the original lat and long are 41.811846 and -87.820628 respectively, then after padding the values look like:

paddedLatitude --> "0041.811846" and paddedLongitude --> "-087.820628"

I'm applying the same padding when building the nearest-city query (using Lucene's ConstantScoreRangeQuery class).

Given that lat and long values can be decimal and/or negative numbers, is this the right way to index them, so that I get the correct nearest cities in the search results when Lucene performs a numeric range/comparison operation on these values?

Thanks.

2 answers

#1


Here's the bleeding edge on searching numerical fields in Lucene, from Uwe Schindler, the expert on the subject. You may need to use the older (and slower) ConstantScoreRangeQuery, because Lucene.net is a bit behind Lucene, and the NumericRangeQuery class described in the link had not yet been released in Java Lucene at the time of writing.

#2


The linked article in Yuval F's answer made me realize I was wrong in an earlier answer, which you seem to be relying on.

You shouldn't index negative numbers as is, especially in this case, where some of the values are negative and some are positive.
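To see why, here is a minimal sketch in Python (used only for illustration; the thread's own code is Lucene.NET). It assumes range terms are compared lexicographically as strings, and `pad_signed` is a hypothetical helper that mimics the padding shown in the question. With that padding, negative values sort in the reverse of their numeric order, so a range such as -90 to -80 ends up with a lower bound that compares greater than its upper bound.

```python
def pad_signed(value: float) -> str:
    """Hypothetical helper mimicking the question's padding:
    41.811846 -> '0041.811846', -87.820628 -> '-087.820628'."""
    sign = "-" if value < 0 else "0"
    return f"{sign}{abs(value):010.6f}"


# Three US longitudes, all negative.
longitudes = [-87.820628, -122.419416, -71.058880]

numeric_order = sorted(longitudes)                 # true numeric order
string_order = sorted(longitudes, key=pad_signed)  # order of the padded strings

# For negative values the padded strings sort in the opposite direction.
print(numeric_order)
print(string_order)

# A string range for "longitude between -90 and -80" is inverted,
# so a value that is numerically inside the range fails the check.
lower, upper = pad_signed(-90.0), pad_signed(-80.0)
in_range = lower <= pad_signed(-87.820628) <= upper
print(lower, upper, in_range)
```

Positive values on their own do sort correctly with this padding; the trouble comes from the sign prefix.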

This article seems to have a pretty good discussion of spatial search. The author uses some transformations to make all the values positive, and also touches on other subjects you should probably be aware of, such as distance calculations.
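One possible transformation of this kind, sketched in Python for illustration, is to shift each coordinate by a constant before padding. This is an assumption on my part and not necessarily what the article does: latitude is shifted by 90 and longitude by 180, and the names `encode_coordinate`, `encode_latitude` and `encode_longitude` are hypothetical. Because every encoded value is non-negative and has the same width, string order matches numeric order.

```python
def encode_coordinate(value: float, offset: float) -> str:
    """Shift the value so it is non-negative, then zero-pad to a fixed width."""
    shifted = value + offset
    if shifted < 0:
        raise ValueError("value is below the expected range")
    return f"{shifted:010.6f}"


def encode_latitude(lat: float) -> str:
    return encode_coordinate(lat, 90.0)    # assumed range: -90..90 -> 0..180


def encode_longitude(lon: float) -> str:
    return encode_coordinate(lon, 180.0)   # assumed range: -180..180 -> 0..360


# Mixed negative and positive longitudes.
longitudes = [-87.820628, -122.419416, -71.058880, 10.5]

# Sorting by the encoded string now agrees with sorting by numeric value.
by_string = sorted(longitudes, key=encode_longitude)
print(by_string == sorted(longitudes))

print(encode_latitude(41.811846), encode_longitude(-87.820628))
```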

One thing to remember if you're encoding the values: apply the same encoding both when indexing and when building the query.
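As a rough Python sketch of that point, the same hypothetical `encode` function below produces both the indexed terms and the lower/upper terms of the range query. The offsets (90 and 180), the Earth radius constant, the query point and the 25-mile radius are all assumptions made for illustration. The bounding box is only a coarse filter that ignores the poles and the 180° meridian, so the sketch follows it with a haversine distance check.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def encode(value: float, offset: float) -> str:
    """Single encoder shared by the indexing side and the query side."""
    return f"{value + offset:010.6f}"


def bounding_box_terms(lat, lon, radius_miles):
    """Encoded (lower, upper) range terms per field for a box around a point."""
    lat_delta = math.degrees(radius_miles / EARTH_RADIUS_MILES)
    lon_delta = lat_delta / math.cos(math.radians(lat))
    return {
        "latitude": (encode(lat - lat_delta, 90.0), encode(lat + lat_delta, 90.0)),
        "longitude": (encode(lon - lon_delta, 180.0), encode(lon + lon_delta, 180.0)),
    }


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Indexing side: the city from the question.
city_lat, city_lon = 41.811846, -87.820628
indexed = {"latitude": encode(city_lat, 90.0), "longitude": encode(city_lon, 180.0)}

# Query side: a hypothetical search point and radius.
center_lat, center_lon, radius = 41.8781, -87.6298, 25.0
terms = bounding_box_terms(center_lat, center_lon, radius)

# String range check per field, then an exact distance check.
in_box = all(lo <= indexed[field] <= hi for field, (lo, hi) in terms.items())
distance = haversine_miles(center_lat, center_lon, city_lat, city_lon)
within_radius = in_box and distance <= radius
print(in_box, within_radius)
```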
