
Time: 2022-12-16 22:52:04

I really like data.frames in R because you can store different types of data in one data structure, there are many methods for modifying the data (adding a column, combining data.frames, ...), and it is really easy to extract a subset of the data.


Is there a Java library that offers the same functionality? I'm mostly interested in storing different types of data in a matrix-like fashion and being able to extract a subset of the data.


Using a two-dimensional array in Java can provide a similar structure, but it is much more difficult to add a column and afterwards extract the top k records.


6 Solutions



I have just open-sourced a first draft version of Paleo, a Java 8 library which offers data frames based on typed columns (including support for primitive values). Columns can be created programmatically (through a simple builder API) or imported from a text file.


Please refer to the README for further details.


The project is still wet from birth – I am very interested in feedback / PRs, tia!
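To make the idea of typed columns with primitive storage and a builder a bit more concrete, here is a minimal sketch in plain Java. It is not Paleo's actual API: `TypedColumns`, `Column`, `IntColumn`, and `Builder` are hypothetical names chosen only for illustration, so please check the README for the real interface.

```java
import java.util.Arrays;

/** Hypothetical typed columns; the names are illustrative and are NOT Paleo's API. */
class TypedColumns {

    /** Minimal contract shared by all column types in this sketch. */
    interface Column {
        String name();
        int size();
    }

    /** Column backed by a primitive int[] so that values are never boxed. */
    static final class IntColumn implements Column {
        private final String name;
        private final int[] values;

        private IntColumn(String name, int[] values) {
            this.name = name;
            this.values = values;
        }

        @Override public String name() { return name; }
        @Override public int size() { return values.length; }

        int get(int row) { return values[row]; }

        /** Sums the column; the accumulator is a long to reduce overflow risk. */
        long sum() {
            long total = 0;
            for (int v : values) total += v;
            return total;
        }

        static Builder builder(String name) { return new Builder(name); }

        /** Collects values into a growable buffer, then freezes them into a column. */
        static final class Builder {
            private final String name;
            private int[] buffer = new int[8];
            private int size;

            private Builder(String name) { this.name = name; }

            Builder add(int value) {
                // Double the buffer when it is full.
                if (size == buffer.length) buffer = Arrays.copyOf(buffer, size * 2);
                buffer[size++] = value;
                return this;
            }

            IntColumn build() {
                // Copy exactly the used part so the column cannot change afterwards.
                return new IntColumn(name, Arrays.copyOf(buffer, size));
            }
        }
    }

    public static void main(String[] args) {
        IntColumn age = IntColumn.builder("age").add(30).add(41).add(25).build();
        System.out.println(age.name() + " has " + age.size() + " rows, sum = " + age.sum());
    }
}
```

Keeping one array per column, rather than one array per row, is what allows a column to hold primitives directly; a full library would add further column types and a frame object that groups them.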




I also found myself in need of a data frame structure while working in Java recently. Fortunately, after writing a very basic implementation I was able to get approval to release it as open source. You can find my implementation here: Joinery -- Data frames for Java. Contributions and feature requests are welcome.




Tablesaw is a Java dataframe library begun in 2015 and under active development as of 2017. It's designed to be as scalable as possible without sacrificing ease of use. Features include filtering by rows and columns, descriptive stats, map/reduce functions, cross-tabs, plots, and machine learning. It is released under the Apache license.


In one query test it returned 500+ records from a 500,000,000 record table in 2 ms.


It also includes a column-oriented store that is much smaller and faster than working with CSV files. Contributions, feature requests, and just-plain feedback are welcome.




I'm not very proficient with R, but you should have a look at Guava, specifically Tables. They do not provide the exact functionality you want, but you could either extend them, or use their specification as a guide for writing your own Collection.
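If you go the route of writing your own collection, a rough sketch in plain Java might look like the following. It does not use Guava and does not reproduce its interfaces: `MiniFrame`, `addColumn`, `row`, and `topK` are hypothetical names, and "top k" is assumed here to mean the k rows with the largest values in a numeric column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical column-oriented table; a sketch only, not part of Guava. */
class MiniFrame {
    // Column name -> values. LinkedHashMap keeps columns in insertion order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = -1; // -1 until the first column fixes the row count

    /** Adds a column; every column must have the same number of rows. */
    MiniFrame addColumn(String name, List<?> values) {
        if (columns.containsKey(name)) {
            throw new IllegalArgumentException("duplicate column: " + name);
        }
        if (rowCount >= 0 && values.size() != rowCount) {
            throw new IllegalArgumentException(
                    "expected " + rowCount + " rows, got " + values.size());
        }
        rowCount = values.size();
        columns.put(name, new ArrayList<>(values));
        return this;
    }

    int rowCount() { return Math.max(rowCount, 0); }

    /** Returns one row as an ordered column-name -> value map. */
    Map<String, Object> row(int index) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
            row.put(e.getKey(), e.getValue().get(index));
        }
        return row;
    }

    /** Returns up to k rows with the largest values in the given numeric column. */
    List<Map<String, Object>> topK(String column, int k) {
        List<Object> keys = columns.get(column);
        if (keys == null) {
            throw new IllegalArgumentException("no such column: " + column);
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < rowCount(); i++) order.add(i);
        // Sort row indices by the column value, largest first.
        order.sort(Comparator
                .comparingDouble((Integer i) -> ((Number) keys.get(i)).doubleValue())
                .reversed());
        List<Map<String, Object>> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, order.size()); i++) {
            result.add(row(order.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        MiniFrame frame = new MiniFrame()
                .addColumn("name", Arrays.asList("a", "b", "c"))
                .addColumn("score", Arrays.asList(1.5, 9.0, 4.0));
        frame.addColumn("rank", Arrays.asList(3, 1, 2)); // adding a column later is one call
        System.out.println(frame.topK("score", 2));
    }
}
```

Because the data is stored per column, adding a column is a single map insertion, which is the part that is awkward with a two-dimensional array. The values are held as `Object`, so this sketch gives up compile-time type safety in exchange for brevity.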




Morpheus provides a DataFrame analogous to that of R. It is a high-performance column-store data structure that enables data to be sorted, sliced, grouped, and aggregated in either the row or column dimension. It also supports parallel processing for many of these operations, using the Fork & Join framework internally.


You can easily read & write data to CSV files, databases and also a proprietary JSON format. Adapters to load data from Quandl, Google Finance and others are also available.


It has built-in support for various styles of linear regression, principal component analysis, linear algebra, and other types of analytics. The feature set is still growing, but it is already a very capable framework.




In R we have the data.frame, in Python we have pandas, and in Java there is the Schema from deeplearning4j.


There is also a version for analyzing the ubiquitous iris data set, if you just want to get started.


There are also other custom objects (from Weka, from TensorFlow) that are more or less the same.



