How do I create a list of numpy subarrays from an original array, and then apply a function over that list?

Time: 2022-08-07 19:37:34

Good morning. Right now I am working with a CSV of numerical data and have converted it into a numpy matrix. The CSV is rather large (10000x5) and has the following columns (the acronyms for the column vectors aren't super important, I suppose, but I'll include them): name of subject, Blood Pressure, PDAC, GSIC, TDAP

What I would like to do is create a list of numpy matrices such that each matrix contains the rows associated with one unique subject name. A simple example follows. (Edit: as suggested, I changed the "subject name" column to a "subject id" column by creating a mapping from names to ids. In this example Carl has id 1 and Doug has id 2.)

import numpy as np

Original = np.matrix('1 17 28 32 79; 1 89 72 46 22; 1 91 93 88 90; 2 21 57 73 68; 2 43 32 21 22')

Carl = np.matrix('1 17 28 32 79; 1 89 72 46 22; 1 91 93 88 90')
Doug = np.matrix('2 21 57 73 68; 2 43 32 21 22')

matrixlist = [Doug, Carl]

For a few matrices this wouldn't be too tough of a problem, but there are a lot of subjects spread out in the parent CSV, and not every subject has the same number of entries. I have tried converting all the data into a list and then using a list comprehension, but I'm running into some issues.

Lastly, I was wondering whether there is a way to apply a function to each element in the list of matrices. As another simple example: I wrote a function that computes the correlation matrix of a numpy array using its SVD. Is it possible to apply it to every element in the list?

import numpy as np
from scipy.linalg import fractional_matrix_power

def correlation_matrix(x):
    # Rows are observations, columns are variables (rowvar=False)
    covariance_matrix = np.cov(x, rowvar=False)
    inv_sqrt_d = fractional_matrix_power(np.diag(np.diag(covariance_matrix)), -1/2)
    return np.matmul(np.matmul(inv_sqrt_d, covariance_matrix), inv_sqrt_d)

Thanks in advance!

1 answer

#1


Good evening. A convenient way to do this is with a pandas DataFrame. To read your data and group it by subject, do the following:

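import pandas as pd
# names= supplies the column labels (this assumes the CSV has no header row)
my_df = pd.read_csv(your_filename, names=['subject','0','1','2','3'])
# Select only the rows whose 'subject' value is 'Carl'
grouped_output = my_df.groupby('subject').get_group('Carl')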

This returns only Carl's rows from your DataFrame. After this you could loop through all the subject groups and do whatever you'd like with them. A loop could look like this:

# Iterating over a groupby yields (group key, sub-DataFrame) pairs
for key, subject in my_df.groupby('subject'):
    print(subject)
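
To get from the loop above to the list of arrays the question asks for, something like the following might work. This is only a rough sketch: it uses the toy data from the question in place of the real CSV, the names `matrix_by_subject` and `correlations` are made up for illustration, and `correlation_matrix` here is a stand-in built on `np.corrcoef`, not the asker's own function.

```python
import numpy as np
import pandas as pd

# Toy data taken from the question: column 0 is the subject id.
data = np.array([
    [1, 17, 28, 32, 79],
    [1, 89, 72, 46, 22],
    [1, 91, 93, 88, 90],
    [2, 21, 57, 73, 68],
    [2, 43, 32, 21, 22],
])
my_df = pd.DataFrame(data, columns=['subject', '0', '1', '2', '3'])


def correlation_matrix(x):
    # Stand-in for the asker's function (an assumption, not their code):
    # rows are observations, columns are variables.
    return np.corrcoef(x, rowvar=False)


# One 2-D array per subject id, with the id column dropped.
matrix_by_subject = {
    key: group.drop(columns='subject').to_numpy()
    for key, group in my_df.groupby('subject')
}
matrixlist = list(matrix_by_subject.values())

# Apply the function to every sub-array with a list comprehension.
correlations = [correlation_matrix(m) for m in matrixlist]
```

Keeping the dictionary around as well as the list means each result can still be matched back to its subject id. Since subjects may have different numbers of rows, a plain Python list (or dict) of arrays is probably easier to work with than trying to stack everything into one 3-D array.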
