How to store and search IP addresses

Time: 2022-09-16 13:30:30

I have 4 sources of IP addresses. I want to store them in SQL Server and allow the ranges, which can be categorised by originating country code, to be marked in an exclusion list by country.

For this I have 2 tables: IPAddressRange and CountryCode.

What I need to know is: if this data were returned to the client and then cached for quick querying, what is the best way to store the returned data so that a specific IP address can be looked up within the ranges? I want to know whether a supplied IP address is in the list.

The list is kept in the db for ease of storage.

The reason I want to cache the data and then use it on the client is that I have heard that searching IP addresses is faster in a trie structure. So I think I need to get the list from the db and store it in the cache in a structure that is very quick to search.

I would appreciate any help with a) the SQL structure to store the addresses and b) code to search the IP addresses.

I know of a code project solution that includes a search algorithm, but I am not sure how to combine it with the storage aspect.

Ideally this would not use a third-party library. The code must be hosted on our own server.

7 answers

#1


I've built a filter by country exactly like the one you describe.

However, after experimenting for a while, I found that it can't be done in a performant way with SQL. That's why IP databases like this one (the one I'm using) offer a binary database, which is much faster because it's optimized for this kind of data.

They even say explicitly:

Note that queries made against the CSV data imported into a SQL database can take up to a few seconds. If performance is an issue, the binary format is much faster, and can handle thousands of lookups per second.

Plus, they even give you the code to query this database.

I'm using this on a production website with medium traffic, filtering every request, with no performance problems.

#2


Assuming your IP addresses are IPv4, you could just store them in an integer field. Create 2 fields, one for the lower bound of the range and another for the upper bound. Then make sure these two fields are indexed. When searching, just select the rows where the value is greater than or equal to the lower bound and less than or equal to the upper bound. I would experiment with something simple like this before trying to program something more complicated yourself that doesn't actually give noticeably quicker results.
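A minimal sketch of the two-column approach, written in Python with an in-memory SQLite database standing in for SQL Server so that the example is self-contained (that substitution is an assumption, not part of the answer). The table name `ip_range`, its columns, the index name, and the sample ranges and country codes are all hypothetical.

```python
import ipaddress
import sqlite3


def ip_to_int(ip: str) -> int:
    """Convert dotted-quad IPv4 text to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ip_range ("
    " range_start INTEGER NOT NULL,"
    " range_end INTEGER NOT NULL,"
    " country_code TEXT NOT NULL)"
)
# Index both bounds, as the answer suggests.
conn.execute("CREATE INDEX ix_ip_range ON ip_range (range_start, range_end)")

# Placeholder data: documentation-reserved blocks with made-up country codes.
rows = [
    (ip_to_int("192.0.2.0"), ip_to_int("192.0.2.255"), "AA"),
    (ip_to_int("198.51.100.0"), ip_to_int("198.51.100.255"), "BB"),
]
conn.executemany("INSERT INTO ip_range VALUES (?, ?, ?)", rows)


def lookup_country(ip: str):
    """Return the country code of the range containing ip, or None."""
    n = ip_to_int(ip)
    row = conn.execute(
        "SELECT country_code FROM ip_range"
        " WHERE range_start <= ? AND range_end >= ?",
        (n, n),
    ).fetchone()
    return row[0] if row else None


print(lookup_country("192.0.2.77"))
print(lookup_country("203.0.113.5"))
```

One thing to watch when porting this to SQL Server: its `int` type is signed, so addresses above 2,147,483,647 would likely need a `bigint` column (or an offset applied before storing).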

#3


An IPv4 address can be stored as a four-byte unsigned integer (an uint in C#). An IPv6 address can be an eight-byte unsigned integer (an ulong in C#). Create columns of the appropriate width in SQL, then retrieve and store them in variables. You then use simple integer math to check for the ranges you want, assuming that the ranges are actually contiguous.

A more elaborate solution would be to create an IPAddress class that gives you access to the more familiar dotted-quad notation, but under the covers it would do exactly the same thing as described here.
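As a rough illustration of such a wrapper, here is a small Python sketch (the answer has C# in mind, so treat this as a stand-in). The class name `IPv4Range` and its methods are hypothetical; the conversion relies on the standard-library `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IPv4Range:
    """A contiguous range held as two integers, with dotted-quad input and display."""

    lower: int
    upper: int

    @classmethod
    def from_text(cls, first: str, last: str) -> "IPv4Range":
        return cls(int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last)))

    def __contains__(self, ip: str) -> bool:
        # Once the text is converted, the check is plain integer comparison.
        return self.lower <= int(ipaddress.IPv4Address(ip)) <= self.upper

    def __str__(self) -> str:
        return f"{ipaddress.IPv4Address(self.lower)}-{ipaddress.IPv4Address(self.upper)}"


blocked = IPv4Range.from_text("10.0.0.0", "10.0.255.255")
print(blocked)
print("10.0.12.34" in blocked)
print("10.1.0.0" in blocked)
```

The wrapper only changes how addresses are entered and displayed; storage and comparison still use the two integers.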

#4


I have never attempted this, so take my answer with a grain of salt, but I think a trie isn't actually what you want unless you intend to store every single IP you want to block (as opposed to ranges or subnets/masks). I think a btree would be better suited, in which case just go ahead and use your regular database (many databases are implemented with btrees or equally good data structures). I'd store each of the 4 bytes of the IP in a separate column to aid in searching by class A/B/C subnets, with "don't care" values set to NULL, but there's no reason why you couldn't store it as a single 32-bit integer column and crunch the numbers to figure out what range it should fall into (storing masked-out values would be marginally trickier in this case).
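For the single-integer-column variant, "crunching the numbers" could mean turning each subnet into its first and last address before storing it. A hedged Python sketch using the standard-library `ipaddress` module follows; the helper name `cidr_to_range` and the sample subnet are illustrative assumptions.

```python
import ipaddress


def cidr_to_range(cidr: str) -> tuple[int, int]:
    """Return the (first, last) addresses of a subnet as integers."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


first, last = cidr_to_range("172.16.0.0/12")
print(first, last)
print(ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
```

With subnets expanded this way, a masked entry becomes an ordinary lower/upper pair and can be compared like any other range.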

#5


An IPv6 address can be an eight-byte unsigned integer (an ulong in C#)

IPv6 addresses are 128 bits (16 bytes), not 8 bytes as suggested. I am grappling with this very problem right now for IP ranges.

I am looking to try padded or hex strings and just do < and > comparisons.
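One way this idea could work, sketched in Python: render every IPv6 address as a fixed-width, 32-character lowercase hex string, so that ordinary string comparison gives the same order as numeric comparison. The helper names `ipv6_key` and `in_range` are hypothetical.

```python
import ipaddress


def ipv6_key(ip: str) -> str:
    """Return a fixed-width, 32-character lowercase hex string for an IPv6 address."""
    return f"{int(ipaddress.IPv6Address(ip)):032x}"


def in_range(ip: str, first: str, last: str) -> bool:
    # Fixed width plus consistent case means string order equals numeric order.
    return ipv6_key(first) <= ipv6_key(ip) <= ipv6_key(last)


print(ipv6_key("2001:db8::1"))
print(in_range("2001:db8::abcd", "2001:db8::", "2001:db8::ffff"))
print(in_range("2001:db9::1", "2001:db8::", "2001:db8::ffff"))
```

If the strings are compared inside a database rather than in application code, the column's collation would also need to order these characters consistently, which is worth verifying before relying on it.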

#6


You can do it efficiently provided you store your IPv4 start addresses in the right data type. A varchar (or other string type) is not right; you need to use an int.

For IPv4, store the IP number in an unsigned int, which is big enough, in INET_ATON format (which is easy enough to generate; I'm not sure how in C#, but it isn't difficult).
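The arithmetic behind that format is just the four octets packed into one 32-bit number, the same value MySQL's INET_ATON function returns. A sketch in Python (the same shifts carry over to C#); the function names `ipv4_to_int` and `int_to_ipv4` are illustrative.

```python
def ipv4_to_int(ip: str) -> int:
    """Pack dotted-quad text a.b.c.d into a*2**24 + b*2**16 + c*2**8 + d."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 octets: {ip!r}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range: {ip!r}")
        value = (value << 8) | octet
    return value


def int_to_ipv4(value: int) -> str:
    """Unpack the integer back into dotted-quad text."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(ipv4_to_int("192.168.1.10"))
print(int_to_ipv4(3232235786))
```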

You can then easily and efficiently look up which range an IP address is part of by arranging for the database to do a range scan.

By using LIMIT (or SELECT TOP 1 in MSSQL) you can have it stop once it finds a record.

SELECT TOP 1 networkidorwhatever, IPNumber, IPNumberUpperBoundOrWhateverYouCallIt
FROM networks
WHERE IPNumber <= @IPNumberToQuery  -- the IP address to look up, as an integer
ORDER BY IPNumber DESC;

This should find the highest-numbered network number that is <= the IP number; then it's a trivial check to determine whether that IP address is within the network's range.

The query should be efficient provided there is a conventional index on IPNumber.
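The same "highest start that is <= the address, then check the upper bound" lookup can also be done on a cached copy of the table, which is what the question asks about. Below is a minimal Python sketch using binary search over the sorted range starts. It assumes the cached ranges do not overlap; the names `ranges`, `starts`, and `find_country` and the sample data are hypothetical.

```python
import bisect
import ipaddress


def to_int(ip: str) -> int:
    return int(ipaddress.IPv4Address(ip))


# Hypothetical cached rows: (range_start, range_end, country_code), non-overlapping.
ranges = sorted([
    (to_int("198.51.100.0"), to_int("198.51.100.255"), "BB"),
    (to_int("192.0.2.0"), to_int("192.0.2.255"), "AA"),
])
starts = [r[0] for r in ranges]


def find_country(ip: str):
    """Return the country code whose range contains ip, or None."""
    n = to_int(ip)
    # Index of the last range whose start is <= n (the "TOP 1 ... DESC" row).
    i = bisect.bisect_right(starts, n) - 1
    if i >= 0 and n <= ranges[i][1]:
        return ranges[i][2]
    return None


print(find_country("192.0.2.200"))
print(find_country("192.0.3.1"))
```

Each lookup costs O(log n) comparisons on a plain sorted list, so a trie is not required for range data of this shape.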

For IPv6 the types are different but the principle is the same.

#7


For IPv4, a DBA would normally recommend 4 tinyint fields, but you're working with ranges, which lend themselves more to the integer storage solutions previously provided. In that case you would store a beginning IP address and an ending IP address for the range. Then it's a simple matter to do the comparison.
