Is using personally identifiable information (PII) as a foreign key discouraged in database design?

Posted: 2022-07-07 12:34:14

While cleansing PII from test data, I have been stuck on a challenging scenario: cascading the changes through the foreign-key relationships in the data. Given the focus on privacy and regulation, should using PII as a key be discouraged? If the PII itself were not used as a key anywhere, a neat trick would be to simply shuffle the values within each PII column.
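
The column-shuffling trick can be sketched in Python roughly as follows. This is a minimal illustration, not a tool: it assumes the table has been loaded as a list of dicts and that the hypothetical PII columns (`first_name`, `last_name`, `ssn`) are not referenced by any key. Each PII column is permuted independently, which breaks up real combinations of values (although, with a random shuffle, an individual value can still land back on its own row).

```python
import random
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def shuffle_pii_columns(rows: List[Row], pii_columns: List[str],
                        seed: Optional[int] = None) -> List[Row]:
    """Return copies of `rows` with each PII column permuted independently.

    Non-PII columns, including primary and foreign keys, are left untouched,
    which is why this only works when no key is derived from PII.
    """
    rng = random.Random(seed)          # seedable for reproducible test data
    out = [dict(row) for row in rows]  # copy so the input rows are not mutated
    for col in pii_columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)            # a separate permutation for every column
        for row, value in zip(out, values):
            row[col] = value
    return out


# Hypothetical sample data.
people = [
    {"id": 1, "first_name": "Alice", "last_name": "Smith", "ssn": "111-11-1111"},
    {"id": 2, "first_name": "Bob", "last_name": "Jones", "ssn": "222-22-2222"},
    {"id": 3, "first_name": "Carol", "last_name": "Brown", "ssn": "333-33-3333"},
]

masked = shuffle_pii_columns(people, ["first_name", "last_name", "ssn"], seed=42)
for row in masked:
    print(row)
```

Shuffling keeps each column's value distribution, so the test data stays realistic, but it does nothing for a PII column that also serves as a foreign key: the referencing tables would still hold the original values.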

There are some commercial tools available to address this problem, but none of them seems to handle a wide variety of databases well.

3 answers

#1


HIPAA has a concept called the "Unique Patient Identifier", which can be used as you describe to link data: http://www.ncvhs.hhs.gov/app4.htm

Unique Patient Identifier eliminates the need for the repetitive use and disclosure of an individual's personal identification information (i.e. name, age, sex, race, marital status, place of residence, etc.) for routine internal and external communications (e.g. orders, results, medication, consultation, etc.) and protects the privacy of the individual. It helps preserve the patient anonymity while facilitating communication and information sharing. Healthcare is fundamentally a multi-disciplinary process. A Unique Patient Identifier enables the integration and the availability of critically needed information from multi-disciplinary sources and multiple care settings. Therefore, the integrity and security of the patient information depend on the use of a reliable Unique Patient Identifier.

The privacy issue hinges not so much on the identifier itself as on the security and privacy of the data that the identifier is used to access, and on how that access is controlled. My understanding is that this typically means a system querying for information via a patient identifier should only get back information that cannot be pieced together to reveal private information.
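
A minimal sketch of that access rule in Python, assuming a hypothetical in-memory record store and a hypothetical set of fields classed as PII: lookups by patient identifier return the clinical fields but strip every field on the PII list.

```python
from typing import Any, Dict

# Hypothetical list of fields treated as PII; a real system would define this by policy.
PII_FIELDS = frozenset({"name", "ssn", "address", "date_of_birth"})

# Hypothetical store keyed by an opaque patient identifier.
RECORDS: Dict[str, Dict[str, Any]] = {
    "P-0001": {
        "name": "Bob Smith",
        "ssn": "123-45-6789",
        "blood_type": "O+",
        "allergies": ["penicillin"],
    },
}


def lookup_by_patient_id(patient_id: str) -> Dict[str, Any]:
    """Return the record for `patient_id` with every PII field removed."""
    record = RECORDS[patient_id]  # raises KeyError for an unknown identifier
    return {field: value for field, value in record.items() if field not in PII_FIELDS}


print(lookup_by_patient_id("P-0001"))
# → {'blood_type': 'O+', 'allergies': ['penicillin']}
```

Filtering on the way out keeps the identifier useful for linking data while making sure this particular path never hands PII back alongside it.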

Essentially, you would generate an artificial key for each person. Even though it is unique to the person, it is not personally identifying unless you also release personally identifiable information along with it. For example, if you let people see only first names with a particular query but also return the artificial key, they now know that artificial key 00003 is associated with the first name Bob. If you then allow them to go back and query with 00003 as the criterion, and give them access to the last name, you can see how they can start to accumulate information. It is important that there be no way for an unauthorized user to get the artificial key and PII returned in the same query, since that would make the artificial key itself PII. That's my interpretation, at least.
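
One way to picture that rule is a guard applied to queries from users who are not authorized to see PII. The sketch below is an assumption-laden illustration, not a standard mechanism: the column names are hypothetical, keys are minted with Python's `secrets` module (the database should still enforce uniqueness with a constraint), and `check_query` treats the key and PII as linked whenever both appear in one query, either as returned columns or as filter criteria such as `WHERE person_key = '00003'`.

```python
import secrets
from typing import Iterable

# Hypothetical names; substitute the real schema's columns.
ARTIFICIAL_KEY = "person_key"
PII_COLUMNS = frozenset({"first_name", "last_name", "ssn"})


def new_artificial_key() -> str:
    """Mint a random key that encodes nothing about the person."""
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex characters


def check_query(selected: Iterable[str], filtered_on: Iterable[str] = ()) -> None:
    """Reject an unauthorized query that links the artificial key to PII.

    A link exists when the key and any PII column are both referenced,
    whether as returned columns or as filter criteria.
    """
    referenced = set(selected) | set(filtered_on)
    leaked = sorted(referenced & PII_COLUMNS)
    if ARTIFICIAL_KEY in referenced and leaked:
        raise PermissionError(f"query links {ARTIFICIAL_KEY} to PII columns {leaked}")


check_query(["person_key", "diagnosis"])  # allowed: key without PII
check_query(["first_name"])               # allowed: PII without the key
try:
    # The "query with 00003 as the criterion, get the last name" case.
    check_query(["last_name"], filtered_on=["person_key"])
except PermissionError as exc:
    print(exc)  # query links person_key to PII columns ['last_name']
```

Checking filters as well as returned columns matters here: the accumulation described above works precisely by feeding a known key back in as a criterion.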

#2


Sounds dangerous and stupid and inefficient. Keys should be synthetic ids.
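
A minimal sketch of "keys should be synthetic IDs", using Python's built-in `sqlite3` module and a hypothetical two-table schema: `person_id` is a surrogate key generated by the database, the SSN is an ordinary attribute, and child rows reference only the surrogate. Scrubbing PII for test data then becomes a single-table update with nothing to cascade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,  -- synthetic key, generated by the database
    ssn       TEXT,                 -- PII stored as an ordinary attribute
    full_name TEXT
);
CREATE TABLE visit (
    visit_id  INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id),  -- no PII in the FK
    reason    TEXT
);
""")

cur = conn.execute("INSERT INTO person (ssn, full_name) VALUES (?, ?)",
                   ("123-45-6789", "Bob Smith"))
bob_id = cur.lastrowid
conn.execute("INSERT INTO visit (person_id, reason) VALUES (?, ?)", (bob_id, "checkup"))

# Scrubbing PII touches only the person table; visit rows are unaffected.
conn.execute("UPDATE person SET ssn = NULL, full_name = 'REDACTED'")
print(conn.execute(
    "SELECT v.reason, p.full_name FROM visit v JOIN person p USING (person_id)"
).fetchall())
# → [('checkup', 'REDACTED')]
```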

#3


Besides the HIPAA issues, another problem with using PII as a key is that it changes. People get new SSNs when their identities are stolen. SSNs are also often mistyped, which links the information to the wrong person (I'm thinking mostly of data imports from other systems here). People (especially women) often change their names. Different people also share the same name (and, often for this reason, databases hold incorrect SSN information for them as well, because they were matched to the wrong SSN for that name), so very little PII is in fact unique enough to be a key field. Further, PII should be stored in an encrypted field, which makes it an even worse choice for a key field.
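
To make the "PII changes" point concrete, here is a minimal sketch of the anti-pattern using Python's built-in `sqlite3` module and a hypothetical schema in which the SSN is the primary key and is copied into every child row as the foreign key. When the SSN changes, `ON UPDATE CASCADE` has to rewrite every referencing row, which is the same propagation problem the question runs into when scrubbing test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE person (
    ssn       TEXT PRIMARY KEY,  -- PII used as the key (the anti-pattern)
    full_name TEXT
);
CREATE TABLE visit (
    visit_id INTEGER PRIMARY KEY,
    ssn      TEXT NOT NULL REFERENCES person(ssn) ON UPDATE CASCADE,
    reason   TEXT
);
""")
conn.execute("INSERT INTO person VALUES ('111-11-1111', 'Bob Smith')")
conn.executemany("INSERT INTO visit (ssn, reason) VALUES (?, ?)",
                 [("111-11-1111", "checkup"), ("111-11-1111", "x-ray")])

# Bob is issued a new SSN after identity theft: the key itself changes, and
# the cascade rewrites the copy of his SSN stored in every visit row.
conn.execute("UPDATE person SET ssn = '999-99-9999' WHERE ssn = '111-11-1111'")
print(conn.execute("SELECT ssn, reason FROM visit ORDER BY visit_id").fetchall())
# → [('999-99-9999', 'checkup'), ('999-99-9999', 'x-ray')]
```

Without `ON UPDATE CASCADE`, SQLite rejects the same UPDATE with a foreign-key constraint error while visit rows still reference the old SSN.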
