Storing data packets in a database

Time: 2022-02-10 16:38:34

Problem description: In my application, I have to present the contents of data packets with a certain format. An example:


Any packed binary data, for example: a 4-byte header, a 4-byte type (with type codes having pre-defined meanings), then source address, destination address, and so on.

Previously, I made home-cooked implementations that stored the data in a binary file (a fixed record length allowed fast lookup), but over time I realized I was inventing some kind of database. For example, I was implementing my own efficient binary storage format for very large data files. I was also implementing my own indexing to rapidly run searches on some fields. I think a real DB (even the simple SQLite) can make this stuff transparently simple.

Question #1: are DBs useful for storing such data, and how should it be done? Note that there are no 1-to-many or many-to-many mappings here, or other advanced things; it's just a plain sequence of packets with a certain internal structure that I want to display to the user and let him interact with (i.e. search by a certain field).

Question #2: Now suppose the user himself can specify the format of his packets, i.e. in a configuration file: the length of each field, its type, what its values mean (in the case of an enumeration), and so on. How do I extend a DB-backed implementation for this? Should the user define DB schemas? Should the configuration file be auto-translated into these schemas? ORM?

Question #3: Even more advanced... Now suppose the data packages can vary in length and contents. I.e., for type #2 packages there are some fields, for type #3 some other fields, and so on. But I'd still like my app to handle it, displaying everything nicely and also allowing users to specify the formats in config files. How is this done?

Thanks in advance.


6 Answers

#1


1  

Question #1: are DBs useful for storing such data, and how should it be done?


Certainly a database is useful for this application. You could implement your own special-purpose data store, and perhaps it would be more efficient for your specific application, because you can design it for that specialization. A relational database is more general-purpose, but you can avoid weeks or months of development time by employing a database.


I answered another question earlier today on the subject of how to handle extensible types, where each new sub-type has its own distinct set of attributes.


"product table, many kind of product, each product have many parameters."


For your application, I would choose the Concrete Table Inheritance design.


Question #2: Now suppose the user himself can specify the format of his packets, i.e. in a configuration file: the length of each field, its type, what its values mean (in case of an enumeration) and so on. How do I extend a DB-backed implementation for this?


I assume the number of packet types is relatively small, and that many packets are then inserted with pretty much the same structure. So you should use the database's ability to manage metadata. I would define an additional table for each new packet type.

I would also store the packets "exploded" so each field of the packet is stored in a separate database column. That way you can index each column individually, to support efficient searching.

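The "exploded" storage described above can be sketched with SQLite, which the question already mentions. The packet layout, column names, and index choices below are illustrative assumptions, not part of this answer:

```python
import sqlite3

# Hypothetical packet layout: header, type code, source, destination.
# Each field becomes its own column so it can be indexed and searched.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE packet (
        packet_id   INTEGER PRIMARY KEY,
        header      INTEGER NOT NULL,
        type_code   INTEGER NOT NULL,
        source      TEXT    NOT NULL,
        destination TEXT    NOT NULL
    )
""")
# Index the fields you expect to search on.
conn.execute("CREATE INDEX idx_packet_source ON packet (source)")
conn.execute("CREATE INDEX idx_packet_type   ON packet (type_code)")

conn.execute(
    "INSERT INTO packet (header, type_code, source, destination) VALUES (?, ?, ?, ?)",
    (0xDEADBEEF, 2, "10.0.0.1", "10.0.0.2"),
)
rows = conn.execute(
    "SELECT packet_id, destination FROM packet WHERE source = ?", ("10.0.0.1",)
).fetchall()
print(rows)  # [(1, '10.0.0.2')]
```

Searches on `source` or `type_code` then use the indexes instead of scanning every row.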

You can also define constraints so that some fields are mandatory (NOT NULL) or their values constrained by lookup tables. Again, leveraging the database's capabilities to use metadata to enforce consistent structure where it's desirable.


SQL already supports a standard, declarative language for specifying fields with data types, constraints, etc. Why develop a different language which you then have to translate to SQL?


Question #3: Even more advanced... Now suppose the data packages can be varying in length and contents.


Fields that are optional in a given packet type should permit NULL in the corresponding column.


#2


1  

A simple rule is this: If you are going to query the data, then it should be a discrete field within a table within the DB. If not, you can store the BLOB and be done with it.


That said, if you want to derive "meta data" from a BLOB, and index THAT, then you can do that readily as well.


If your data types are congruent with what the database can support (or can be accurately converted), there can be some value in exploding the BLOB into its component parts that map nicely into DB columns.
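As a rough sketch of exploding a BLOB into columns during import, Python's struct module plus SQLite could look like the following; the fixed packet layout and field names here are assumptions:

```python
import sqlite3
import struct

# Hypothetical fixed layout: 4-byte header, 4-byte type, 4-byte source
# address, 4-byte destination address, all big-endian unsigned ints.
PACKET_FMT = ">IIII"

raw = struct.pack(PACKET_FMT, 0x1234, 2, 0x0A000001, 0x0A000002)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE packet (
        packet_id INTEGER PRIMARY KEY,
        type_code INTEGER,
        src       INTEGER,
        dst       INTEGER,
        raw       BLOB          -- keep the original bytes alongside
    )
""")

# Explode the BLOB into columns on import; the raw bytes stay available.
header, type_code, src, dst = struct.unpack(PACKET_FMT, raw)
conn.execute(
    "INSERT INTO packet (type_code, src, dst, raw) VALUES (?, ?, ?, ?)",
    (type_code, src, dst, raw),
)
conn.execute("CREATE INDEX idx_src ON packet (src)")

hit = conn.execute(
    "SELECT type_code FROM packet WHERE src = ?", (0x0A000001,)
).fetchone()
print(hit)  # (2,)
```

The exploded columns are what you query; the `raw` column preserves the original packet for repacking or display.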

The problem with defining "tables on the fly" (which could be easily done) is not so much the definition of the table, but the potential change of the table. Tables that are being changed (i.e. a column added, or dropped, etc.) tend to be unusable for the duration of the change. Not an issue for 100 rows. A real problem for millions of rows.


If the data definitions are fairly static, then create a mapping facility that lets your users describe the BLOB; you then use that definition both to create a compliant table and to convert the BLOBs appropriately during import.

As for the "different rows of different types", you can still stuff that data into a single table. Some rows have "unused" columns compared to others, and each row is identified by type. If you have lots of row definitions, and lots of variance, you get lots of wasted space doing this. Then you may want to move to having a table for each row type, and a master table that holds the row types and references to the real rows in the actual tables. You would only need this master table if you care about the relationships of the original data packets to each other (then you can store them in receipt order, say, etc.).

Really, it all boils down to how much data you have, how much you expect, how much work you want to do vs how much you already have done, etc.


#3


1  

Another option you may wish to consider is Berkeley DB or one of its clones. BDB is pretty low level; there's no SQL. It's pretty much a really small, really fast file-backed hash table. It's been around forever, and is used in a lot of places where speed and simplicity are paramount. You'd need to add some functionality on top to do what you're trying to accomplish, though.
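As a minimal illustration of this style of store (using Python's dbm module as a stand-in for Berkeley DB; the key scheme and sample bytes are assumptions):

```python
import dbm
import os
import tempfile

# Python's dbm module gives a BDB-style key/value store: a small,
# fast, file-backed hash table with no SQL layer on top.
path = os.path.join(tempfile.mkdtemp(), "packets.db")

with dbm.open(path, "c") as db:
    # You supply the key scheme yourself, e.g. a packet sequence number.
    db[b"packet:00000001"] = b"\x00\x01\x02\x03raw-packet-bytes"
    db[b"packet:00000002"] = b"\x00\x02\x02\x03more-bytes"

# Reopen read-only and fetch a packet by key.
with dbm.open(path, "r") as db:
    value = db[b"packet:00000001"]
print(value)
```

Note what this does not give you: indexing on fields inside the value, which is exactly the functionality you would have to build on top.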

#4


1  

Despite the fact that you stated that there are no 1-many relationships, there are :)


I would recommend creating two tables for packet storage. One to store "header" or "scalar" information, which is common to the packet and--while it may define WHAT data is present--isn't the actual data stored in the packet.


Your second table would store the data for each packet, with each field-value combination representing a row in this table. For example, the following two tables:


create table packet
(
    packet_id int identity(1, 1) primary key,
    destination varchar(50),
    sender varchar(50),
    packet_type_id int not null
)

create table packet_field
(
    packet_field_id int identity(1, 1) primary key,
    packet_id int not null references packet (packet_id),
    field_id int not null,
    data varbinary(500)
)

Obviously these two tables are making assumptions about the type and size of data being stored and aren't exhaustive in what they'll need to store. However, this fundamental structure will allow for dynamically-defined packet formats and is a schema that's easily indexed (for instance, adding an index on packet_id+field_id in packet_field would be a no-brainer).

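A hedged sketch of using this two-table design, translated to SQLite syntax for testability (INTEGER PRIMARY KEY standing in for identity(1, 1)); the sample field IDs and data bytes are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packet (
        packet_id      INTEGER PRIMARY KEY,
        destination    TEXT,
        sender         TEXT,
        packet_type_id INTEGER NOT NULL
    );
    CREATE TABLE packet_field (
        packet_field_id INTEGER PRIMARY KEY,
        packet_id       INTEGER NOT NULL REFERENCES packet (packet_id),
        field_id        INTEGER NOT NULL,
        data            BLOB
    );
    CREATE INDEX idx_pf ON packet_field (packet_id, field_id);
""")

# Store one packet: scalar header info in packet, each field as a row.
cur = conn.execute(
    "INSERT INTO packet (destination, sender, packet_type_id) VALUES (?, ?, ?)",
    ("10.0.0.2", "10.0.0.1", 1),
)
pid = cur.lastrowid
for field_id, data in [(1, b"\x00\x04"), (2, b"\xff")]:
    conn.execute(
        "INSERT INTO packet_field (packet_id, field_id, data) VALUES (?, ?, ?)",
        (pid, field_id, data),
    )

# Look up one field of one packet via the composite index.
row = conn.execute(
    "SELECT data FROM packet_field WHERE packet_id = ? AND field_id = ?", (pid, 2)
).fetchone()
print(row)  # (b'\xff',)
```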

All your application is then responsible for is unpacking the packet and storing it in your DB in this schema, then repacking (if necessary).


Of course, from this point you'll need tables that store the actual format of the packet. Something like...


create table packet_type
(
    packet_type_id int identity(1, 1) primary key,
    name varchar(200) not null
)

create table packet_type_field
(
    field_id int identity(1, 1) primary key,
    packet_type_id int not null references packet_type (packet_type_id),
    field_offset int not null,
    name varchar(200) not null
)

Again, obviously simplified but it shows the basic idea. You would have a single record in your packet_type table for each packet format, and one row in the packet_type_field for each field in a given packet. This should give you most of the information you would need to be able to process an arbitrary chunk of binary data into the aforementioned packet storage schema.

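One possible sketch of driving the unpacking from such format tables. It assumes the metadata also carries a byte length per field (a hypothetical extension of packet_type_field), and the field list is hard-coded here where a real app would read it from the DB:

```python
# Rows as they might come back from packet_type_field for a "PING"
# packet type, ordered by field_offset: (name, field_offset, length).
ping_fields = [
    ("header",    0, 4),
    ("type_code", 4, 4),
    ("payload",   8, 2),
]

def unpack_packet(raw: bytes, fields):
    """Slice each field out of the raw packet per the stored metadata."""
    return {name: raw[off:off + length] for name, off, length in fields}

raw = bytes.fromhex("00000012" "00000002" "beef")
parsed = unpack_packet(raw, ping_fields)
print(parsed["payload"].hex())  # 'beef'
```

The same field list, read from packet_type_field at runtime, lets the app break an arbitrary binary chunk into rows for the packet_field table above.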

#5


1  

Three methods come to mind.


sFlow and IPFlow can transmit a limited set of packet contents. This can be logged directly into several different databases.


Another, more targeted method would be to write a very simple snort rule, such as one matching a source or destination address, and then have snort capture the payload of the packets. That way you would only get the actual data you require. For instance, you could grab just the fields of data you want inside the packet, e.g. passwords.

ngrep can also grab selective data right off the wire.


Of course each of these could require a tap or monitor session on a port if you are not doing the capture on the server/workstation itself.


#6


0  

Though I'm not a huge fan of this implementation, we have some software that essentially does this for some calling lists. Essentially, here's what they do:


  1. A table with column definitions - call it tblColumnDefs. This table contains columns like "Name", "Type", "Length" and "Description"

  2. An instance master table (tblPacketNames). Essentially, just "PacketTypeID", "PacketName", and "Description" for each packet type you're defining

  3. An instance definition table (for you, this would be tblPacketColumns). This table collects the pre-defined columns together to form the data structure that you're storing. For example, it might hold "PacketTypeID", "ColumnNumber", "ColumnID". In database-normalization-speak, this is a many-to-many table, since it maps the columns to the packets that use them.

  4. In a second database (because of dynamic SQL/injection implications of this step), tables are created dynamically to hold the actual data. For example, if you've defined (in steps 2/3) a packet type called "PING", you might have a table called "PING" in your database to hold that data. You'd use tblPacketColumns, linked to tblColumnDefs, to figure out what field types to create and how big they should be. You end up with a collection of tables that match the packet type definitions from step 3, using the columns from step 1.

NOTE: I don't particularly like the SQL-injection implications of step 4. Creating tables dynamically can lead to some consequences if security isn't designed properly and the input from any user-entered fields in your application isn't cleansed properly, especially if this application has an interface that is available to untrusted callers (i.e., the Internet).

Using this, you can create indexes however you want when the tables are created (maybe you have a column in step 1 where you flag certain columns as "Indexable", and indexes are created on top of them when the tables are created).
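A sketch of the dynamic table creation from step 4, with the injection risk from the NOTE mitigated by whitelisting identifiers and types (table and column names cannot be bound as SQL parameters, which is exactly where the risk lives). The validation rules and SQLite types here are assumptions:

```python
import re
import sqlite3

# Only these column types may appear in user-supplied definitions.
ALLOWED_TYPES = {"INTEGER", "TEXT", "BLOB", "REAL"}
# Identifiers must be plain words: no quotes, spaces, or punctuation.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def create_packet_table(conn, table_name, column_defs):
    """Create a per-packet-type table from (name, type) column defs,
    rejecting anything that could smuggle SQL into the DDL string."""
    if not IDENT.match(table_name):
        raise ValueError(f"bad table name: {table_name!r}")
    cols = []
    for name, sql_type in column_defs:
        if not IDENT.match(name) or sql_type not in ALLOWED_TYPES:
            raise ValueError(f"bad column: {name!r} {sql_type!r}")
        cols.append(f"{name} {sql_type}")
    conn.execute(f"CREATE TABLE {table_name} ({', '.join(cols)})")

conn = sqlite3.connect(":memory:")
create_packet_table(conn, "PING", [("seq", "INTEGER"), ("payload", "BLOB")])
names = [r[1] for r in conn.execute("PRAGMA table_info(PING)")]
print(names)  # ['seq', 'payload']
```

A definition like `("x; DROP TABLE PING; --", "INTEGER")` would be rejected by the identifier check rather than spliced into the DDL.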
