How do I handle named data in scikit-learn?

Time: 2022-08-24 07:00:51

I am about to experiment with clustering algorithms to cluster file attributes (e.g. access time).

Does scikit-learn support clustering of named data, i.e., how can I retrieve the file names after the clustering algorithm has run?

Is there a way to store metadata with the training data, e.g., the file names? This metadata should survive feature scaling, introduction of artificial features, etc.

1 solution

#1


It is currently not possible to attach names or properties to rows in scikit-learn. This will change soon (https://github.com/scikit-learn/scikit-learn/issues/4497). For now, though, it is easy to keep track of this yourself: the order of the data points is the same as the order of the cluster labels you get out, so the first cluster label corresponds to the first file name, the second label to the second file name, and so on.
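As a rough illustration of keeping the names alongside the data yourself, here is a minimal sketch. The file names, access times, the choice of `KMeans` with two clusters, and the squared "artificial" feature are all made up for the example; the only thing it relies on is that scaling and adding columns leave the row order unchanged.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical file names and access times (seconds), kept in the same order.
file_names = ["a.txt", "b.txt", "c.log", "d.log"]
access_times = np.array([[1000.0], [1010.0], [900000.0], [900050.0]])

# Feature scaling changes the values but not the order of the rows.
X = StandardScaler().fit_transform(access_times)

# Adding an artificial feature (here: the squared value) adds a column,
# so row i still belongs to file_names[i].
X = np.hstack([X, X ** 2])

# One label per row, returned in the same order as the input rows.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Pair each file name with its cluster label by position.
clusters = {}
for name, label in zip(file_names, labels):
    clusters.setdefault(int(label), []).append(name)

for label, names in sorted(clusters.items()):
    print(label, names)
```

The pairing only holds as long as nothing reorders or drops rows; if you shuffle or filter the data, apply the same operation to the list of names.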
