1
0
mirror of https://github.com/minio/minio.git synced 2025-03-13 13:02:57 -04:00

22 Commits

Author SHA1 Message Date
Klaus Post
1c90a6bd49 S3 Select: Convert CSV data to JSON () 2019-11-09 09:10:35 -08:00
Klaus Post
38e6d911ea S3 Select: Detect full object ()
Check if select is `SELECT s.* from S3Object s` and forward it to All

Fixes  and makes this case run significantly faster.
2019-10-30 13:46:55 +05:30
Klaus Post
51456e6adc Select: Support Square Bracket Lists ()
Allows for S3 compatible `SELECT * from s3object s WHERE id IN [3,2]`

Fixes 
2019-10-30 11:34:40 +05:30
Klaus Post
002ac82631 S3 Select: Add parser support for lists. () 2019-10-06 07:52:45 -07:00
Klaus Post
1c5b05c130 S3 select: Fix output conversion on select * ()
Fixes 
2019-09-27 12:33:14 -07:00
Klaus Post
520552ffa9 S3 select: flush when reaching limit ()
Add missing flush when reaching select limit.
2019-09-20 11:00:17 -07:00
Klaus Post
ddea0bdf11 Concurrent CSV parsing and reduce S3 select allocations ()
```
CSV parsing, BEFORE:
BenchmarkReaderBasic-12         	    2842	    407533 ns/op	  397860 B/op	     957 allocs/op
BenchmarkReaderReplace-12       	    2718	    429914 ns/op	  397844 B/op	     957 allocs/op
BenchmarkReaderReplaceTwo-12    	    2718	    435556 ns/op	  397855 B/op	     957 allocs/op
BenchmarkAggregateCount_100K-12    	     171	   6798974 ns/op	16667102 B/op	  308077 allocs/op
BenchmarkAggregateCount_1M-12    	      19	  65657411 ns/op	168057743 B/op	 3146610 allocs/op
BenchmarkSelectAll_10M-12    	       1	20882119900 ns/op	2758799896 B/op	41978762 allocs/op

CSV parsing, AFTER:
BenchmarkReaderBasic-12         	    3721	    312549 ns/op	  101920 B/op	     338 allocs/op
BenchmarkReaderReplace-12       	    3776	    318810 ns/op	  101993 B/op	     340 allocs/op
BenchmarkReaderReplaceTwo-12    	    3610	    330967 ns/op	  102012 B/op	     341 allocs/op
BenchmarkAggregateCount_100K-12    	     295	   4149588 ns/op	 3553623 B/op	  103261 allocs/op
BenchmarkAggregateCount_1M-12    	      30	  37746503 ns/op	33827931 B/op	 1049435 allocs/op
BenchmarkSelectAll_10M-12    	       1	17608495800 ns/op	1416504040 B/op	21007082 allocs/op

~ benchcmp old.txt new.txt
benchmark                           old ns/op       new ns/op       delta
BenchmarkReaderBasic-12             407533          312549          -23.31%
BenchmarkReaderReplace-12           429914          318810          -25.84%
BenchmarkReaderReplaceTwo-12        435556          330967          -24.01%
BenchmarkAggregateCount_100K-12     6798974         4149588         -38.97%
BenchmarkAggregateCount_1M-12       65657411        37746503        -42.51%
BenchmarkSelectAll_10M-12           20882119900     17608495800     -15.68%

benchmark                           old allocs     new allocs     delta
BenchmarkReaderBasic-12             957            338            -64.68%
BenchmarkReaderReplace-12           957            340            -64.47%
BenchmarkReaderReplaceTwo-12        957            341            -64.37%
BenchmarkAggregateCount_100K-12     308077         103261         -66.48%
BenchmarkAggregateCount_1M-12       3146610        1049435        -66.65%
BenchmarkSelectAll_10M-12           41978762       21007082       -49.96%

benchmark                           old bytes      new bytes      delta
BenchmarkReaderBasic-12             397860         101920         -74.38%
BenchmarkReaderReplace-12           397844         101993         -74.36%
BenchmarkReaderReplaceTwo-12        397855         102012         -74.36%
BenchmarkAggregateCount_100K-12     16667102       3553623        -78.68%
BenchmarkAggregateCount_1M-12       168057743      33827931       -79.87%
BenchmarkSelectAll_10M-12           2758799896     1416504040     -48.66%
```

```
BenchmarkReaderHuge/97K-12         	    2200	    540840 ns/op	 184.32 MB/s	 1604450 B/op	     687 allocs/op
BenchmarkReaderHuge/194K-12        	    1522	    752257 ns/op	 265.04 MB/s	 2143135 B/op	    1335 allocs/op
BenchmarkReaderHuge/389K-12        	    1190	    947858 ns/op	 420.69 MB/s	 3221831 B/op	    2630 allocs/op
BenchmarkReaderHuge/778K-12        	     806	   1472486 ns/op	 541.61 MB/s	 5201856 B/op	    5187 allocs/op
BenchmarkReaderHuge/1557K-12       	     426	   2575269 ns/op	 619.36 MB/s	 9101330 B/op	   10233 allocs/op
BenchmarkReaderHuge/3115K-12       	     286	   4034656 ns/op	 790.66 MB/s	12397968 B/op	   16099 allocs/op
BenchmarkReaderHuge/6230K-12       	     172	   6830563 ns/op	 934.05 MB/s	16008416 B/op	   26844 allocs/op
BenchmarkReaderHuge/12461K-12      	     100	  11409467 ns/op	1118.39 MB/s	22655163 B/op	   48107 allocs/op
BenchmarkReaderHuge/24922K-12      	      66	  19780395 ns/op	1290.19 MB/s	35158559 B/op	   90216 allocs/op
BenchmarkReaderHuge/49844K-12      	      34	  37282559 ns/op	1369.03 MB/s	60528624 B/op	  174497 allocs/op
```
2019-09-13 14:18:35 -07:00
Ryan Tam
bd56f80250 Fix ignored alias for aggregate result in S3 Select ()
The SQL parser as it stands right now ignores alias for aggregate
result, e.g. `SELECT COUNT(*) AS thing FROM s3object` doesn't actually
return record like `{"thing": 42}`, it returns a record like `{"_1": 42}`.
Column alias for aggregate result is supported in AWS's S3 Select, so
this commit fixes that by respecting the `expr.As` in the expression.

Also improve test for S3 select

On top of testing a simple `SELECT` query, we want to test a few more
"advanced" queries (e.g. aggregation).

Convert existing tests into table driven tests[1], and add the new test
cases with "advanced" queries into them.

[1] - https://github.com/golang/go/wiki/TableDrivenTests
2019-07-03 16:34:54 -07:00
Joe Stevens
a19cf063b5 Fixes for multiplatform dev and testing from forks ()
Add support for correct dependency URLs on all platforms

only build mountinfo.go on linux

make testfile path relative to support fork work
2019-06-04 00:59:40 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links () 2019-04-09 11:39:42 -07:00
Aditya Manthramurthy
91c839ad28 Use a buffer to collect SQL Select result rows ()
Batching records into a single SQL Select message in the response
leads to significant speed up as the message header overhead is made
negligible.

This change leads to a speed up of 3-5x for queries that select many
small records.
2019-01-28 20:00:18 -08:00
Bala FA
e23a42305c Rebase minio/parquet-go and fix null handling. () 2019-01-16 21:52:04 +05:30
Bala FA
b0deea27df Refactor s3select to support parquet. ()
Also handle pretty formatted JSON documents.
2019-01-08 16:53:04 -08:00
Harshavardhana
7e1661f4fa Performance improvements to SELECT API on certain query operations ()
This improves the performance of certain queries dramatically,
such as 'count(*)' etc.

Without this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762

real	0m42.464s
user	0m0.071s
sys	0m0.010s
```

With this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762

real	0m17.603s
user	0m0.093s
sys	0m0.008s
```

Almost a 250% improvement in performance. This PR avoids a lot of type
conversions and instead relies on raw sequences of data and interprets
them lazily.

```
benchcmp old new
benchmark                        old ns/op       new ns/op       delta
BenchmarkSQLAggregate_100K-4     551213          259782          -52.87%
BenchmarkSQLAggregate_1M-4       6981901985      2432413729      -65.16%
BenchmarkSQLAggregate_2M-4       13511978488     4536903552      -66.42%
BenchmarkSQLAggregate_10M-4      68427084908     23266283336     -66.00%

benchmark                        old allocs     new allocs     delta
BenchmarkSQLAggregate_100K-4     2366           485            -79.50%
BenchmarkSQLAggregate_1M-4       47455492       21462860       -54.77%
BenchmarkSQLAggregate_2M-4       95163637       43110771       -54.70%
BenchmarkSQLAggregate_10M-4      476959550      216906510      -54.52%

benchmark                        old bytes       new bytes      delta
BenchmarkSQLAggregate_100K-4     1233079         1086024        -11.93%
BenchmarkSQLAggregate_1M-4       2607984120      557038536      -78.64%
BenchmarkSQLAggregate_2M-4       5254103616      1128149168     -78.53%
BenchmarkSQLAggregate_10M-4      26443524872     5722715992     -78.36%
```
2018-11-14 15:55:10 -08:00
Ashish Kumar Sinha
c0b4bf0a3e SQL select query for CSV/JSON ()
select * , select column names have been implemented for CSV.
select * is implemented for JSON.
2018-10-22 12:12:22 -07:00
Praveen raj Mani
cef044178c Treat columns with spaces inbetween [s3Select] ()
replace the double/single quotes with backticks for the xwb1989/sqlparser
to recognise such queries.

Fixes 
2018-10-17 11:01:26 -07:00
Aditya Manthramurthy
16a100b597 Fix out-of-bound array access crash in select processing ()
Fix test case.
2018-10-09 09:45:56 -07:00
Ashish Kumar Sinha
670f9788e3 Count(*) to give integer value ()
The Max, Min functions were giving float value even when they were integers.  
Resolved max and Min to return integers in that scenario.

Fixes 
2018-10-04 17:33:53 -07:00
Harshavardhana
a0683d3c1f Send progress only when requested by client in SelectObject () 2018-09-17 11:52:46 +05:30
Praveen raj Mani
30d4a2cf53 s3select should honour custom record delimiter ()
Allow custom delimiters like `\r\n`, `a`, `\r` etc in input csv and 
replace with `\n`.

Fixes 
2018-09-10 21:50:28 +05:30
Raphael Randschau
8601f29d95 select: fix int overflow of math.MaxInt64 on ARM () 2018-08-22 16:16:04 +05:30
Arjun Mishra
7c14cdb60e S3 Select API Support for CSV ()
Add support for trivial where clause cases
2018-08-15 03:30:19 -07:00