How to resample NumPy arrays to the maximum index length in Python

I'm new to Python and am trying to normalize each element of a list using preprocessing.normalize. However, it fails with ValueError: setting an array element with a sequence.

I then found the cause of the problem: the elements of the np.array have different lengths.

Here is my code:

result = []

for url in target_url :
    sensor = pd.read_csv(url, header=None, delimiter=r"\s+")
    result.append(sensor[2])

result = np.array(result)
# I want to resample here before it goes to normalize.
result = preprocessing.normalize(result, norm='l1')

target_url holds the URLs from which I fetch sensor data from a web server, and the data from each one is appended to the result list. The list is then converted to an array using np.array.

For example, len(result[0]) is 121598 and len(result[1]) is 1215601. I want to make result[0] the same length as result[1], using resample to fill in the NaNs.

How can I do that?

Please help me out here.

Thanks in advance.

EDIT

After normalizing, I'm trying to compute the correlation using corr().

Here is the code:

result = preprocessing.normalize(result, norm='l1')
ret = pd.DataFrame(result)
corMat = pd.DataFrame(ret.T.corr())

1 Answer

#1

Since you are using pandas to read the CSV files, you are off to a good start. One way to do it is simply to use pd.concat to join the Series in the result list (I assume sensor[2] is a Series) into one DataFrame. Here is an example:

a = [pd.Series([1, 2, 3]), pd.Series([1, 2]), pd.Series([1, 2, 3, 4])]
pd.concat(a, axis=1)

Which gives:

     0    1  2
0  1.0  1.0  1
1  2.0  2.0  2
2  3.0  NaN  3
3  NaN  NaN  4
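For context, here is a minimal sketch of why the original np.array call fails and how pd.concat avoids the problem. The toy lists are made up for illustration, and dtype=float is passed explicitly (an assumption, not something the question's code does) so that the error appears regardless of the NumPy version.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for sensor columns of unequal length (hypothetical data).
ragged = [[1, 2, 3], [1, 2], [1, 2, 3, 4]]

# Building a rectangular float array from ragged rows raises ValueError.
try:
    np.array(ragged, dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# pd.concat aligns the Series on their index and pads with NaN instead.
padded = pd.concat([pd.Series(r) for r in ragged], axis=1)

print(error_message)
print(padded.shape)
```

The padded frame has one column per input Series and as many rows as the longest one, so it can be filled and passed on as a regular 2-D array.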

In the example provided by OP, this should suffice:

import pandas as pd
from sklearn import preprocessing

result = []

for url in target_url:
    sensor = pd.read_csv(url, header=None, delimiter=r"\s+")
    result.append(sensor[2])

# concatenate the Series as columns, fill NaNs backward and then forward,
# and transpose so that each row is one sensor, as in the question
result = pd.concat(result, axis=1).bfill().ffill().T

# L1-normalize each row (one row per sensor)
result = preprocessing.normalize(result, norm='l1')

# correlation between sensors
pd.DataFrame(result).T.corr()
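Since target_url is not available here, the same steps can be tried on made-up data. The sketch below is only an illustration: the three Series are hypothetical stand-ins for the sensor[2] columns, the frame is transposed so that each row is one sensor (the layout the question's np.array would have had), and the row-wise L1 normalization is written out with plain pandas as a stand-in for preprocessing.normalize(..., norm='l1').

```python
import pandas as pd

# Hypothetical stand-ins for the sensor[2] columns read from each URL.
series_list = [
    pd.Series([1.0, 2.0, 4.0]),
    pd.Series([2.0, 1.0]),
    pd.Series([1.0, 3.0, 2.0, 5.0]),
]

# Pad to the longest Series, fill the gaps, and put one sensor per row.
filled = pd.concat(series_list, axis=1).bfill().ffill().T

# Row-wise L1 normalization: divide each row by the sum of its absolute values.
normalized = filled.div(filled.abs().sum(axis=1), axis=0)

# corr() correlates columns, so transpose to get sensor-by-sensor correlation.
cor_mat = normalized.T.corr()

print(filled)
print(cor_mat.shape)
```

Because the missing values sit at the end of the shorter Series, the backward fill leaves them untouched and the forward fill repeats the last observed value.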

Depending on what the Series indices look like and on your application, you can do different types of concatenation; see the pandas documentation for pd.concat.
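As one illustration of a different type of concatenation, the join parameter of pd.concat controls how the indices are combined. This is a small sketch on the same toy Series as above, not a recommendation for the sensor data: whether padding or truncating is appropriate depends on the application.

```python
import pandas as pd

a = [pd.Series([1, 2, 3]), pd.Series([1, 2]), pd.Series([1, 2, 3, 4])]

# join="outer" (the default) keeps every index label and pads with NaN.
outer = pd.concat(a, axis=1, join="outer")

# join="inner" keeps only the index labels shared by all Series,
# which here truncates everything to the shortest one.
inner = pd.concat(a, axis=1, join="inner")

print(outer.shape)
print(inner.shape)
```

With join="inner" no NaNs are introduced, so no fill step is needed, at the cost of discarding the extra samples of the longer Series.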
