1.什么是tablesaw?
Tablesaw是一款Java的数据可视化库,主要包括两部分:
-
数据解析库,主要用于加载数据,对数据进行操作(转化,过滤,汇总等),类比Python中的Pandas库;
-
数据可视化库,将目标数据转化为可视化的图表,类比Python中的Matplotlib库。
与Pandas不同的是,Tablesaw中的表格以列(Column)为基本单位,因此大部分操作都是基于列进行的。当然也包括部分对行操作的函数,但是功能比较有限
tablesaw目录说明:
-
aggregate:maven 的项目父级项目,主要定义项目打包的配置。
-
beakerx:tablesaw 库的注册中心,主要注册表和列。
-
core:tablesaw 库的核心代码,主要是数据的加工处理操作:数据的追加,排序,分组,查询等。
-
data:项目测试数据目录。
-
docs:项目 MarkDown 文档目录。
-
docs-src:项目文档源码目录,主要作用是生成 MarkDown 文档。
-
excel:解析 excel 文件数据的子项目。
-
html:解析 html 文件数据的子项目。
-
json:解析 json 文件数据的子项目。
-
jsplot:数据可视化的子项目,主要作用加载数据生成可视化图表。
-
saw:tablesaw 读写图表数据的子项目。
2.环境准备
mysql setup
docker run --name docker-mysql-5.7 -e MYSQL_ROOT_PASSWORD=123456 -p 3306:3306 -d mysql:5.7
init data
create database demo;create table user_info(user_id varchar(64) not null primary key,username varchar(100) null ,age int(3) null ,gender tinyint(1) null ,remark varchar(255) null ,create_time datetime null ,create_id varchar(64) null ,update_time datetime null ,update_id varchar(64) null ,enabled tinyint(1) default 1 null);INSERT INTO demo.user_info(user_id, username, age, gender, remark, create_time, create_id, update_time, update_id, enabled)VALUES('1', '1', 1, 1, '1', NULL, '1', NULL, NULL, 1);
remark
msyql username:rootmysql password:123456
3.代码工程
实验目的:
利用tablesaw加工和处理二维数据,并且可视化
pom.xml
<?xml version="1.0" encoding="UTF-8"?><project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <parent> <artifactId>springboot-demo</artifactId> <groupId>com.et</groupId> <version>1.0-SNAPSHOT</version> </parent> <modelVersion>4.0.0</modelVersion>
<artifactId>tablesaw</artifactId>
<properties> <maven.compiler.source>8</maven.compiler.source> <maven.compiler.target>8</maven.compiler.target> </properties> <dependencies>
<dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency>
<dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-autoconfigure</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-test</artifactId> <scope>test</scope> </dependency> <dependency> <groupId>tech.tablesaw</groupId> <artifactId>tablesaw-core</artifactId> <version>0.43.1</version> </dependency> <dependency> <groupId>tech.tablesaw</groupId> <artifactId>tablesaw-jsplot</artifactId> <version>0.43.1</version> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-jdbc</artifactId> </dependency> <!--mysql--> <dependency> <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId> <version>8.0.29</version> </dependency> </dependencies></project>
application.yaml
server: port: 8088spring: datasource: driver-class-name: com.mysql.cj.jdbc.Driver type: com.zaxxer.hikari.HikariDataSource url: jdbc:mysql://127.0.0.1:3306/demo?useUnicode=true&characterEncoding=utf-8&useSSL=false username: root password: 123456
读取csv数据
@Beforepublic void before() throws IOException { log.info("init some data"); tornadoes = Table.read().csv("/Users/liuhaihua/IdeaProjects/springboot-demo/tablesaw/src/main/resources/data/tornadoes_1950-2014.csv");
}
打印列名
@Testpublic void columnNames() throws IOException { System.out.println(tornadoes.columnNames());}
结果 * *
[Date, Time, State, State No, Scale, Injuries, Fatalities, Start Lat, Start Lon, Length, Width]
查看shape
@Testpublic void shape() throws IOException { System.out.println(tornadoes.shape());}
结果 * *
tornadoes_1950-2014.csv: 59945 rows X 11 cols
查看表结构
@Testpublic void structure() throws IOException { System.out.println(tornadoes.structure().printAll());}
结果 * * * * * * * * * * * * * * * * *
2024-06-05 21:13:01.297 INFO 41685 --- [ main] com.et.tablesaw.DemoTests : init some data Structure of tornadoes_1950-2014.csv Index | Column Name | Column Type |----------------------------------------- 0 | Date | LOCAL_DATE | 1 | Time | LOCAL_TIME | 2 | State | STRING | 3 | State No | INTEGER | 4 | Scale | INTEGER | 5 | Injuries | INTEGER | 6 | Fatalities | INTEGER | 7 | Start Lat | DOUBLE | 8 | Start Lon | DOUBLE | 9 | Length | DOUBLE | 10 | Width | INTEGER |2024-06-05 21:13:02.907 INFO 41685 --- [ main] com.et.tablesaw.DemoTests : clean some data
查看数据
@Testpublic void show() throws IOException { System.out.println(tornadoes);}
结果 * * * * * * * * * * * * * * * * * * * * * * * * * * *
2024-06-05 21:14:52.181 INFO 41732 --- [ main] com.et.tablesaw.DemoTests : init some data tornadoes_1950-2014.csv Date | Time | State | State No | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width |----------------------------------------------------------------------------------------------------------------------------------------- 1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 | 1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 | 1950-01-03 | 11:10:00 | IL | 1 | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 | 1950-01-03 | 11:55:00 | IL | 2 | 3 | 3 | 0 | 39.1 | -89.3 | 3.6 | 130 | 1950-01-03 | 16:00:00 | OH | 1 | 1 | 1 | 0 | 40.88 | -84.58 | 0.1 | 10 | 1950-01-13 | 05:25:00 | AR | 1 | 3 | 1 | 1 | 34.4 | -94.37 | 0.6 | 17 | 1950-01-25 | 19:30:00 | MO | 2 | 2 | 5 | 0 | 37.6 | -90.68 | 2.3 | 300 | 1950-01-25 | 21:00:00 | IL | 3 | 2 | 0 | 0 | 41.17 | -87.33 | 0.1 | 100 | 1950-01-26 | 18:00:00 | TX | 1 | 2 | 2 | 0 | 26.88 | -98.12 | 4.7 | 133 | 1950-02-11 | 13:10:00 | TX | 2 | 2 | 0 | 0 | 29.42 | -95.25 | 9.9 | 400 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | 2014-12-23 | 14:46:00 | MS | 0 | 0 | 0 | 0 | 31.4247 | -89.5358 | 0.24 | 50 | 2014-12-23 | 15:22:00 | MS | 0 | 2 | 0 | 2 | 31.6986 | -89.2239 | 5.89 | 350 | 2014-12-23 | 16:27:00 | MS | 0 | 0 | 0 | 0 | 32.0005 | -88.4854 | 0.49 | 100 | 2014-12-24 | 09:30:00 | NC | 0 | 1 | 1 | 0 | 34.3124 | -77.917 | 0.2 | 25 | 2014-12-24 | 11:50:00 | GA | 0 | 1 | 0 | 0 | 31.406 | -82.251 | 5.69 | 100 | 2014-12-24 | 16:16:00 | OH | 0 | 0 | 0 | 0 | 39.728 | -82.634 | 0.63 | 30 | 2014-12-27 | 13:12:00 | TX | 0 | 0 | 0 | 0 | 30.3853 | -94.7823 | 0.23 | 75 | 2014-12-27 | 14:00:00 | TX | 0 | 1 | 0 | 0 | 30.6829 | -94.2553 | 4.25 | 50 | 2014-12-27 | 14:35:00 | TX | 0 | 1 | 0 | 0 | 30.864 | -93.979 | 1.98 | 150 | 2014-12-29 | 10:26:00 | GA | 0 | 2 | 9 | 0 | 30.8157 | -83.2833 | 0.74 | 180 |2024-06-05 21:14:54.430 INFO 41732 --- [ main] com.et.tablesaw.DemoTests : clean some data
表结构过滤
@Testpublic void structurefilter() throws IOException { System.out.println( tornadoes .structure() .where(tornadoes.structure().stringColumn("Column Type").isEqualTo("DOUBLE")));
}
结果
2024-06-05 21:16:12.489 INFO 41766 --- [ main] com.et.tablesaw.DemoTests : init some data Structure of tornadoes_1950-2014.csv Index | Column Name | Column Type |----------------------------------------- 7 | Start Lat | DOUBLE | 8 | Start Lon | DOUBLE | 9 | Length | DOUBLE |2024-06-05 21:16:14.104 INFO 41766 --- [ main] com.et.tablesaw.DemoTests : clean some data
预览数据
@Testpublic void previewdata() throws IOException { System.out.println(tornadoes.first(3));
}
结果 * * * * * * * * *
2024-06-05 21:17:21.268 INFO 41798 --- [ main] com.et.tablesaw.DemoTests : init some data tornadoes_1950-2014.csv Date | Time | State | State No | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width |----------------------------------------------------------------------------------------------------------------------------------------- 1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 | 1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 | 1950-01-03 | 11:10:00 | IL | 1 | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 |2024-06-05 21:17:23.206 INFO 41798 --- [ main] com.et.tablesaw.DemoTests : clean some data
列操作
@Testpublic void ColumnOperate() throws IOException { StringColumn month = tornadoes.dateColumn("Date").month(); tornadoes.addColumns(month); System.out.println(tornadoes.first(3)); tornadoes.removeColumns("State No"); System.out.println(tornadoes.first(3));
}
结果 * * * * * * * * * * * * * * *
2024-06-05 21:20:33.554 INFO 41879 --- [ main] com.et.tablesaw.DemoTests : init some data tornadoes_1950-2014.csv Date | Time | State | State No | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width | Date month |-------------------------------------------------------------------------------------------------------------------------------------------------------- 1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 | JANUARY | 1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 | JANUARY | 1950-01-03 | 11:10:00 | IL | 1 | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 | JANUARY | tornadoes_1950-2014.csv Date | Time | State | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width | Date month |------------------------------------------------------------------------------------------------------------------------------------------- 1950-01-03 | 11:00:00 | MO | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 | JANUARY | 1950-01-03 | 11:00:00 | MO | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 | JANUARY | 1950-01-03 | 11:10:00 | IL | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 | JANUARY |2024-06-05 21:20:37.052 INFO 41879 --- [ main] com.et.tablesaw.DemoTests : clean some data
排序
@Testpublic void sort() throws IOException { tornadoes.sortOn("-Fatalities"); System.out.println(tornadoes.first(20));
}
结果 * * * * * * * * * * * * * * * * * * * * * * * * * *
2024-06-05 21:23:22.978 INFO 41945 --- [ main] com.et.tablesaw.DemoTests : init some data tornadoes_1950-2014.csv Date | Time | State | State No | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width |----------------------------------------------------------------------------------------------------------------------------------------- 1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 | 1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 | 1950-01-03 | 11:10:00 | IL | 1 | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 | 1950-01-03 | 11:55:00 | IL | 2 | 3 | 3 | 0 | 39.1 | -89.3 | 3.6 | 130 | 1950-01-03 | 16:00:00 | OH | 1 | 1 | 1 | 0 | 40.88 | -84.58 | 0.1 | 10 | 1950-01-13 | 05:25:00 | AR | 1 | 3 | 1 | 1 | 34.4 | -94.37 | 0.6 | 17 | 1950-01-25 | 19:30:00 | MO | 2 | 2 | 5 | 0 | 37.6 | -90.68 | 2.3 | 300 | 1950-01-25 | 21:00:00 | IL | 3 | 2 | 0 | 0 | 41.17 | -87.33 | 0.1 | 100 | 1950-01-26 | 18:00:00 | TX | 1 | 2 | 2 | 0 | 26.88 | -98.12 | 4.7 | 133 | 1950-02-11 | 13:10:00 | TX | 2 | 2 | 0 | 0 | 29.42 | -95.25 | 9.9 | 400 | 1950-02-11 | 13:50:00 | TX | 3 | 3 | 12 | 1 | 29.67 | -95.05 | 12 | 1000 | 1950-02-11 | 21:00:00 | TX | 4 | 2 | 5 | 0 | 32.35 | -95.2 | 4.6 | 100 | 1950-02-11 | 23:55:00 | TX | 5 | 2 | 6 | 0 | 32.98 | -94.63 | 4.5 | 67 | 1950-02-12 | 00:30:00 | TX | 6 | 2 | 8 | 1 | 33.33 | -94.42 | 8 | 833 | 1950-02-12 | 01:15:00 | TX | 7 | 1 | 0 | 0 | 32.08 | -98.35 | 2.3 | 233 | 1950-02-12 | 06:10:00 | TX | 8 | 2 | 0 | 0 | 31.52 | -96.55 | 3.4 | 100 | 1950-02-12 | 11:57:00 | TX | 9 | 1 | 32 | 0 | 31.8 | -94.2 | 7.7 | 100 | 1950-02-12 | 12:00:00 | TX | 10 | 3 | 15 | 3 | 31.8 | -94.2 | 1.9 | 50 | 1950-02-12 | 12:00:00 | MS | 2 | 1 | 0 | 0 | 34.6 | -89.12 | 2 | 10 | 1950-02-12 | 12:00:00 | MS | 1 | 2 | 2 | 3 | 34.6 | -89.12 | 0.1 | 10 |2024-06-05 21:23:24.864 INFO 41945 --- [ main] com.et.tablesaw.DemoTests : clean some data
summary
@Testpublic void summary() throws IOException { System.out.println( tornadoes.column("Fatalities").summary().print());
}
结果 * * * * * * * * * * * * * * *
2024-06-05 21:24:17.166 INFO 41963 --- [ main] com.et.tablesaw.DemoTests : init some data Column: Fatalities Measure | Value |------------------------------------ Count | 59945 | sum | 6802 | Mean | 0.11347068145800349 | Min | 0 | Max | 158 | Range | 158 | Variance | 2.901978053261765 | Std. Dev | 1.7035193140266314 |2024-06-05 21:24:19.131 INFO 41963 --- [ main] com.et.tablesaw.DemoTests : clean some data
数据过滤
@Testpublic void filter() throws IOException { Table result = tornadoes.where(tornadoes.intColumn("Fatalities").isGreaterThan(0)); result = tornadoes.where(result.dateColumn("Date").isInApril()); result = tornadoes.where( result .intColumn("Width") .isGreaterThan(300) // 300 yards .or(result.doubleColumn("Length").isGreaterThan(10))); // 10 miles
result = result.select("State", "Date"); System.out.println(result);}
结果 * * * * * * * * * * * * * * * * * * * * * * * * * *
2024-06-05 21:25:03.671 INFO 41980 --- [ main] com.et.tablesaw.DemoTests : init some datatornadoes_1950-2014.csv State | Date |------------------------ MO | 1950-01-03 | IL | 1950-01-03 | OH | 1950-01-03 | AR | 1950-01-13 | MO | 1950-01-25 | IL | 1950-01-25 | TX | 1950-02-12 | TX | 1950-02-12 | LA | 1950-02-12 | AR | 1950-02-12 | ... | ... | KS | 1951-06-21 | OK | 1951-06-21 | KS | 1951-06-23 | WV | 1951-06-26 | KS | 1951-06-27 | IL | 1951-06-27 | PA | 1951-06-27 | SD | 1951-07-20 | KS | 1951-08-06 | NC | 1951-08-09 |2024-06-05 21:25:05.764 INFO 41980 --- [ main] com.et.tablesaw.DemoTests : clean some data
写入文件
@Testpublic void write() throws IOException { tornadoes.write().csv("rev_tornadoes_1950-2014-test.csv");
}
从mysql读取数据
@Resourceprivate JdbcTemplate jdbcTemplate;@Testpublic void datafrommysql() throws IOException { Table table = jdbcTemplate.query("SELECT user_id,username,age from user_info", new ResultSetExtractor<Table>() { @Override public Table extractData(ResultSet resultSet) throws SQLException, DataAccessException { return Table.read().db(resultSet); } }); System.out.println(table);}
结果 * * * * * * * * *
2024-06-05 21:26:04.963 INFO 42016 --- [ main] com.et.tablesaw.DemoTests : init some data2024-06-05 21:26:06.622 INFO 42016 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...2024-06-05 21:26:06.949 INFO 42016 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed. user_id | username | age |-------------------------------- 1 | 1 | 1 |2024-06-05 21:26:07.037 INFO 42016 --- [ main] com.et.tablesaw.DemoTests : clean some data
数据可视化
/* * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */package com.et.tablesaw;
import java.io.IOException;import tech.tablesaw.api.Table;import tech.tablesaw.plotly.Plot;import tech.tablesaw.plotly.api.BubblePlot;import tech.tablesaw.plotly.components.Figure;
/** */public class BubbleExample2 {
public static void main(String[] args) throws IOException {
Table wines = Table.read().csv("/Users/liuhaihua/IdeaProjects/springboot-demo/tablesaw/src/main/resources/data/test_wines.csv");
Table champagne = wines.where( wines .stringColumn("wine type") .isEqualTo("Champagne & Sparkling") .and(wines.stringColumn("region").isEqualTo("California")));
Figure figure = BubblePlot.create( "Average retail price for champagnes by year and rating", champagne, // table namex "highest pro score", // x variable column name "year", // y variable column name "Mean Retail" // bubble size );
Plot.show(figure); }}
结果如下图
以上只是一些关键代码,所有代码请参见下面代码仓库
代码仓库
- https://github.com/Harries/springboot-demo
4.引用
-
https://jtablesaw.github.io/tablesaw/tutorial
-
http://www.liuhaihua.cn/archives/710673.html