How to load a cell array of strings from a Matlab mat file into a Python list or tuple using scipy.io.loadmat

I am a Matlab user new to Python. I would like to write a cell array of strings in Matlab to a Mat file, and load this Mat file in Python (perhaps with scipy.io.loadmat) into a similar type (e.g. a list or tuple of strings). But loadmat reads things into an array, and I am not sure how to convert it into a list. I tried the "tolist" method, which does not work as I expected (I have a poor understanding of Python arrays and numpy arrays). For example:

Matlab code:

cell_of_strings = {'thank',  'you', 'very', 'much'};
save('my.mat', 'cell_of_strings');

Python code:

from scipy.io import loadmat

matdata = loadmat('my.mat', chars_as_strings=1, matlab_compatible=1)
array_of_strings = matdata['cell_of_strings']

Then, the variable array_of_strings is:

array([[[[u't' u'h' u'a' u'n' u'k']], [[u'y' u'o' u'u']],
    [[u'v' u'e' u'r' u'y']], [[u'm' u'u' u'c' u'h']]]], dtype=object)

I am not sure how to convert this array_of_strings into a Python list or tuple so that it looks like

list_of_strings = ['thank',  'you', 'very', 'much'];

I am not familiar with the array object in Python or numpy. Your help will be highly appreciated.

2 Solutions

#1

Have you tried this:

import scipy.io as si

a = si.loadmat('my.mat')                # default options, no matlab_compatible flag
b = a['cell_of_strings']                # type(b) <type 'numpy.ndarray'>
list_of_strings = b.tolist()            # type(list_of_strings) <type 'list'>

print(list_of_strings)
# output: [u'thank', u'you', u'very', u'much']
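
A note on why the two answers see different data: according to the scipy documentation, matlab_compatible=True implies chars_as_strings=False and squeeze_me=False, which would explain the per-letter arrays in the question. Depending on the scipy version, tolist() on the default result may also still return a nested list holding small arrays rather than a flat list. Below is a minimal sketch using squeeze_me=True instead. It assumes scipy and numpy are installed, and it writes a stand-in file with savemat (on the assumption that an object array is stored as a cell array) so that it can run without MATLAB; the temporary path is made up for the example.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Stand-in for the MATLAB save() call: an object array is written as a cell array.
path = os.path.join(tempfile.mkdtemp(), "my.mat")  # hypothetical location
cells = np.empty((1, 4), dtype=object)
cells[0, :] = ["thank", "you", "very", "much"]
savemat(path, {"cell_of_strings": cells})

# squeeze_me=True drops the singleton dimensions around the cell and its strings.
matdata = loadmat(path, squeeze_me=True)
list_of_strings = [str(s) for s in matdata["cell_of_strings"]]
print(list_of_strings)
```

Be aware that squeezing a cell array with a single entry gives a 0-d result, so this shortcut suits cells with two or more strings.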

#2

This looks like a job for a list comprehension. Repeating your example, I did this in MATLAB:

cell_of_strings = {'thank',  'you', 'very', 'much'};
save('my.mat', 'cell_of_strings','-v7');

I'm using a newer version of MATLAB, which on my setup saves .mat files in the HDF5-based v7.3 format by default. loadmat can't read those HDF5 files, so the '-v7' flag forces MATLAB to save an older-version .mat file that loadmat can understand.
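
If you are not sure which flavour a given file is, the header text can tell you before loadmat fails. This is a rough sketch, not part of scipy: the function names are made up for illustration, and it relies on the file starting with a descriptive text such as "MATLAB 5.0 MAT-file" or "MATLAB 7.3 MAT-file" (very old Level 4 files have no such text).

```python
def mat_file_version(path):
    """Return the leading header text of a MAT-file, e.g. 'MATLAB 5.0 MAT-file'."""
    with open(path, "rb") as f:
        header = f.read(116)  # the descriptive text field at the start of the file
    text = header.decode("ascii", errors="replace")
    # The text looks like 'MATLAB 7.3 MAT-file, Platform: ..., Created on: ...'
    return text.split(",")[0].strip()


def is_hdf5_based(path):
    """True when the header announces the HDF5-based v7.3 format."""
    return mat_file_version(path).startswith("MATLAB 7.3")
```

When is_hdf5_based returns True, either re-save with '-v7' as above or read the file with an HDF5 library such as h5py.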

In Python, I loaded the cell array just like you did:

import scipy.io as sio

path = '.'  # directory containing my.mat
matdata = sio.loadmat('%s/my.mat' % path, chars_as_strings=1, matlab_compatible=1)
array_of_strings = matdata['cell_of_strings']

Printing array_of_strings gives:

[[array([[u't', u'h', u'a', u'n', u'k']], 
          dtype='<U1')
      array([[u'y', u'o', u'u']], 
          dtype='<U1')
      array([[u'v', u'e', u'r', u'y']], 
          dtype='<U1')
      array([[u'm', u'u', u'c', u'h']], 
          dtype='<U1')]]

The variable array_of_strings is a (1,4) numpy object array, but there are arrays nested within each object. For example, the first element of array_of_strings is a (1,5) array containing the letters of 'thank'. That is,

array_of_strings[0,0]
array([[u't', u'h', u'a', u'n', u'k']], 
      dtype='<U1')

To get at the first letter 't', you have to do something like:

array_of_strings[0,0][0,0]
u't'
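
If you want to play with this structure without a .mat file at hand, here is a small numpy-only sketch that mimics it. fake_cell is a made-up helper for illustration, not something loadmat provides.

```python
import numpy as np


def fake_cell(words):
    """Mimic the (1, N) object array of (1, len) letter arrays shown above."""
    cell = np.empty((1, len(words)), dtype=object)
    for i, word in enumerate(words):
        # Each cell holds a 2-D array with one row of single characters.
        cell[0, i] = np.array([list(word)])
    return cell


array_of_strings = fake_cell(["thank", "you", "very", "much"])
outer_shape = array_of_strings.shape         # shape of the cell array itself
inner_shape = array_of_strings[0, 0].shape   # shape of the letters of 'thank'
first_letter = array_of_strings[0, 0][0, 0]  # outer index first, then inner index
```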

Since we are dealing with nested arrays, we need nested iteration to extract the data, i.e. nested for loops. But first, I'll show you how to extract the first word:

first_word = [str(''.join(letter)) for letter in array_of_strings[0][0]]
first_word
['thank']

Here I am using a list comprehension. Basically, I am looping over array_of_strings[0][0], whose single row holds the letters, and concatenating that row with the ''.join method. The str() function converts the unicode strings into regular strings.
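
To make the iteration explicit, here is a short sketch on a hand-built stand-in for array_of_strings[0][0] (the letters array is made up for the example). On Python 3, ''.join already returns a plain str, so the extra str() call is only needed on Python 2.

```python
import numpy as np

# Stand-in for array_of_strings[0][0]: one row of five single characters.
letters = np.array([["t", "h", "a", "n", "k"]])

# Iterating over a 2-D array yields its rows, so there is exactly one row to join.
first_word = ["".join(row) for row in letters]

# ravel() flattens the letters, giving the bare string instead of a one-item list.
word = "".join(letters.ravel())
```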

Now, to get the list of strings you want, we just need to loop through each array of letters:

words = [str(''.join(letter)) for letter_array in array_of_strings[0] for letter in letter_array]
words
['thank', 'you', 'very', 'much']

List comprehensions take some getting used to, but they are extremely useful. Hope this helps.
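
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: cell_to_list is a made-up name, and it assumes every cell holds a single-row string, stored either as letters or as one whole string.

```python
import numpy as np


def cell_to_list(cell):
    """Turn a loaded cell array of strings into a list of Python str."""
    words = []
    for item in np.asarray(cell, dtype=object).ravel():
        # An item may be an array of letters, an array with one string, or a str.
        pieces = np.atleast_1d(item).ravel()
        words.append("".join(str(p) for p in pieces))
    return words


# Hand-built example mixing both layouts that loadmat can produce.
demo = np.empty((1, 2), dtype=object)
demo[0, 0] = np.array([list("thank")])  # per-letter layout, shape (1, 5)
demo[0, 1] = np.array(["you"])          # whole-string layout, shape (1,)
demo_words = cell_to_list(demo)
```

Use tuple(cell_to_list(...)) if you prefer a tuple over a list.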
