Klaus Post
e4900b99d7
s3 select: Infer types for comparison ( #9438 )
2020-04-24 13:02:59 -07:00
Anis Elleuch
35ecc04223
Support configurable quote character parameter in Select ( #8955 )
2020-03-13 22:09:34 -07:00
Aditya Manthramurthy
cec8cdb35e
S3Select: Handle array selection in from clause ( #9076 )
2020-03-10 22:34:58 -07:00
Klaus Post
e4020fb41f
SIMDJSON S3 select input ( #8401 )
2020-02-13 14:03:52 -08:00
Klaus Post
f1e2e1cc9e
S3 Select: Mismatched types don't match ( #8608 )
...
When comparing for equality, if types cannot be matched, they don't match.
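A minimal sketch of the rule above, assuming values are held as dynamically typed Go values. The `equal` helper and the set of kinds it handles are hypothetical and are not taken from the MinIO source:

```go
package main

import "fmt"

// equal reports whether two dynamically typed values match.
// Assumption for this sketch: values of different kinds never match,
// and no implicit conversion (e.g. "3" to 3) is attempted.
func equal(a, b interface{}) bool {
	switch x := a.(type) {
	case int64:
		y, ok := b.(int64)
		return ok && x == y
	case float64:
		y, ok := b.(float64)
		return ok && x == y
	case string:
		y, ok := b.(string)
		return ok && x == y
	case bool:
		y, ok := b.(bool)
		return ok && x == y
	}
	// Unknown kinds are treated as not matching.
	return false
}

func main() {
	fmt.Println(equal(int64(3), int64(3))) // true
	fmt.Println(equal("3", int64(3)))      // false
}
```

The point of the sketch is that a type mismatch yields "no match" rather than an error, so a `WHERE` clause simply filters such rows out.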
2019-12-06 07:24:41 -08:00
Klaus Post
1c90a6bd49
S3 Select: Convert CSV data to JSON ( #8464 )
2019-11-09 09:10:35 -08:00
Klaus Post
38e6d911ea
S3 Select: Detect full object ( #8456 )
...
Check if select is `SELECT s.* from S3Object s` and forward it to All
Fixes #8371 and makes this case run significantly faster.
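As a rough illustration of the detection step, the sketch below recognises the `SELECT <alias>.* FROM S3Object <alias>` shape by tokenising the query text. This is an assumption made for brevity: a real implementation would more plausibly inspect the parsed statement, and `isFullObjectQuery` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// isFullObjectQuery reports whether query has the exact shape
// "select <alias>.* from s3object <alias>" (case-insensitive).
// Hypothetical helper: it works on the raw text, not on a parsed AST.
func isFullObjectQuery(query string) bool {
	q := strings.TrimSuffix(strings.TrimSpace(query), ";")
	fields := strings.Fields(strings.ToLower(q))
	if len(fields) != 5 {
		return false // extra clauses (WHERE, LIMIT, ...) disqualify the query
	}
	if fields[0] != "select" || fields[2] != "from" || fields[3] != "s3object" {
		return false
	}
	// The projection must be "<alias>.*" for the alias declared after FROM.
	return fields[1] == fields[4]+".*"
}

func main() {
	fmt.Println(isFullObjectQuery("SELECT s.* from S3Object s"))             // true
	fmt.Println(isFullObjectQuery("SELECT s.* from S3Object s WHERE s.a=1")) // false
}
```

When the check succeeds, the caller can skip per-record expression evaluation and stream the object's records through unchanged.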
2019-10-30 13:46:55 +05:30
Klaus Post
51456e6adc
Select: Support Square Bracket Lists ( #8457 )
...
Allows for S3 compatible `SELECT * from s3object s WHERE id IN [3,2]`
Fixes #8422
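A small sketch of what accepting both list syntaxes could look like, assuming integer-only lists. `parseIntList` and `contains` are hypothetical helpers for illustration, not the parser's actual grammar rules:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseIntList accepts an integer list written with square brackets or
// parentheses, e.g. "[3,2]" or "(3, 2)". Empty lists are rejected here.
func parseIntList(s string) ([]int64, error) {
	s = strings.TrimSpace(s)
	if len(s) < 2 {
		return nil, fmt.Errorf("list too short: %q", s)
	}
	first, last := s[0], s[len(s)-1]
	if !(first == '[' && last == ']') && !(first == '(' && last == ')') {
		return nil, fmt.Errorf("missing or mismatched list delimiters: %q", s)
	}
	var out []int64
	for _, part := range strings.Split(s[1:len(s)-1], ",") {
		n, err := strconv.ParseInt(strings.TrimSpace(part), 10, 64)
		if err != nil {
			return nil, err
		}
		out = append(out, n)
	}
	return out, nil
}

// contains implements the IN test: is v a member of list?
func contains(v int64, list []int64) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

func main() {
	list, err := parseIntList("[3,2]")
	if err != nil {
		panic(err)
	}
	fmt.Println(contains(3, list), contains(5, list)) // true false
}
```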
2019-10-30 11:34:40 +05:30
Klaus Post
002ac82631
S3 Select: Add parser support for lists. ( #8329 )
2019-10-06 07:52:45 -07:00
Klaus Post
1c5b05c130
S3 select: Fix output conversion on select * ( #8303 )
...
Fixes #8268
2019-09-27 12:33:14 -07:00
Klaus Post
520552ffa9
S3 select: flush when reaching limit ( #8279 )
...
Add missing flush when reaching select limit.
2019-09-20 11:00:17 -07:00
Klaus Post
ddea0bdf11
Concurrent CSV parsing and reduce S3 select allocations ( #8200 )
...
```
CSV parsing, BEFORE:
BenchmarkReaderBasic-12 2842 407533 ns/op 397860 B/op 957 allocs/op
BenchmarkReaderReplace-12 2718 429914 ns/op 397844 B/op 957 allocs/op
BenchmarkReaderReplaceTwo-12 2718 435556 ns/op 397855 B/op 957 allocs/op
BenchmarkAggregateCount_100K-12 171 6798974 ns/op 16667102 B/op 308077 allocs/op
BenchmarkAggregateCount_1M-12 19 65657411 ns/op 168057743 B/op 3146610 allocs/op
BenchmarkSelectAll_10M-12 1 20882119900 ns/op 2758799896 B/op 41978762 allocs/op
CSV parsing, AFTER:
BenchmarkReaderBasic-12 3721 312549 ns/op 101920 B/op 338 allocs/op
BenchmarkReaderReplace-12 3776 318810 ns/op 101993 B/op 340 allocs/op
BenchmarkReaderReplaceTwo-12 3610 330967 ns/op 102012 B/op 341 allocs/op
BenchmarkAggregateCount_100K-12 295 4149588 ns/op 3553623 B/op 103261 allocs/op
BenchmarkAggregateCount_1M-12 30 37746503 ns/op 33827931 B/op 1049435 allocs/op
BenchmarkSelectAll_10M-12 1 17608495800 ns/op 1416504040 B/op 21007082 allocs/op
~ benchcmp old.txt new.txt
benchmark old ns/op new ns/op delta
BenchmarkReaderBasic-12 407533 312549 -23.31%
BenchmarkReaderReplace-12 429914 318810 -25.84%
BenchmarkReaderReplaceTwo-12 435556 330967 -24.01%
BenchmarkAggregateCount_100K-12 6798974 4149588 -38.97%
BenchmarkAggregateCount_1M-12 65657411 37746503 -42.51%
BenchmarkSelectAll_10M-12 20882119900 17608495800 -15.68%
benchmark old allocs new allocs delta
BenchmarkReaderBasic-12 957 338 -64.68%
BenchmarkReaderReplace-12 957 340 -64.47%
BenchmarkReaderReplaceTwo-12 957 341 -64.37%
BenchmarkAggregateCount_100K-12 308077 103261 -66.48%
BenchmarkAggregateCount_1M-12 3146610 1049435 -66.65%
BenchmarkSelectAll_10M-12 41978762 21007082 -49.96%
benchmark old bytes new bytes delta
BenchmarkReaderBasic-12 397860 101920 -74.38%
BenchmarkReaderReplace-12 397844 101993 -74.36%
BenchmarkReaderReplaceTwo-12 397855 102012 -74.36%
BenchmarkAggregateCount_100K-12 16667102 3553623 -78.68%
BenchmarkAggregateCount_1M-12 168057743 33827931 -79.87%
BenchmarkSelectAll_10M-12 2758799896 1416504040 -48.66%
```
```
BenchmarkReaderHuge/97K-12 2200 540840 ns/op 184.32 MB/s 1604450 B/op 687 allocs/op
BenchmarkReaderHuge/194K-12 1522 752257 ns/op 265.04 MB/s 2143135 B/op 1335 allocs/op
BenchmarkReaderHuge/389K-12 1190 947858 ns/op 420.69 MB/s 3221831 B/op 2630 allocs/op
BenchmarkReaderHuge/778K-12 806 1472486 ns/op 541.61 MB/s 5201856 B/op 5187 allocs/op
BenchmarkReaderHuge/1557K-12 426 2575269 ns/op 619.36 MB/s 9101330 B/op 10233 allocs/op
BenchmarkReaderHuge/3115K-12 286 4034656 ns/op 790.66 MB/s 12397968 B/op 16099 allocs/op
BenchmarkReaderHuge/6230K-12 172 6830563 ns/op 934.05 MB/s 16008416 B/op 26844 allocs/op
BenchmarkReaderHuge/12461K-12 100 11409467 ns/op 1118.39 MB/s 22655163 B/op 48107 allocs/op
BenchmarkReaderHuge/24922K-12 66 19780395 ns/op 1290.19 MB/s 35158559 B/op 90216 allocs/op
BenchmarkReaderHuge/49844K-12 34 37282559 ns/op 1369.03 MB/s 60528624 B/op 174497 allocs/op
```
2019-09-13 14:18:35 -07:00
Ryan Tam
bd56f80250
Fix ignored alias for aggregate result in S3 Select ( #7849 )
...
The SQL parser currently ignores the alias on an aggregate result:
`SELECT COUNT(*) AS thing FROM s3object` returns a record like
`{"_1": 42}` instead of `{"thing": 42}`.
AWS's S3 Select supports column aliases for aggregate results, so
this commit fixes that by respecting `expr.As` in the expression.
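The naming rule can be sketched as follows. Only the `As` field name comes from the commit message; the `selectExpr` type and `outputName` function are hypothetical stand-ins for the real parser types:

```go
package main

import "fmt"

// selectExpr is a hypothetical stand-in for one parsed select-list entry.
type selectExpr struct {
	As string // alias from "AS <name>"; empty when no alias was given
}

// outputName returns the output key for the i-th (0-based) expression:
// the alias when present, otherwise a positional name such as "_1".
func outputName(e selectExpr, i int) string {
	if e.As != "" {
		return e.As
	}
	return fmt.Sprintf("_%d", i+1)
}

func main() {
	fmt.Println(outputName(selectExpr{As: "thing"}, 0)) // thing
	fmt.Println(outputName(selectExpr{}, 0))            // _1
}
```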
Also improve the tests for S3 Select.
In addition to a simple `SELECT` query, test a few more
"advanced" queries (e.g. aggregation).
Convert the existing tests into table-driven tests [1] and add the new
"advanced" query cases to them.
[1] - https://github.com/golang/go/wiki/TableDrivenTests
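For readers unfamiliar with the pattern, here is a minimal table-driven layout. It is a sketch only: `countRecords` is a toy function invented for the example, and the loop runs from `main` instead of a `testing.T` test function so that the file is runnable on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// countRecords is a toy function under test: it counts non-empty lines.
func countRecords(data string) int {
	n := 0
	for _, line := range strings.Split(data, "\n") {
		if line != "" {
			n++
		}
	}
	return n
}

// runCases walks a table of inputs and expected outputs and returns a
// description of every case that failed.
func runCases() []string {
	cases := []struct {
		name  string
		input string
		want  int
	}{
		{"empty input", "", 0},
		{"single record without newline", "a,b", 1},
		{"two records", "a,b\nc,d\n", 2},
	}
	var failures []string
	for _, tc := range cases {
		if got := countRecords(tc.input); got != tc.want {
			failures = append(failures,
				fmt.Sprintf("%s: got %d, want %d", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	fmt.Println(len(runCases()), "failures")
}
```

Adding a new query to the suite then means adding one row to the table, with no new test logic.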
2019-07-03 16:34:54 -07:00
Joe Stevens
a19cf063b5
Fixes for multiplatform dev and testing from forks ( #7734 )
...
Add support for correct dependency URLs on all platforms
Only build mountinfo.go on Linux
Make the test file path relative to support working from forks
2019-06-04 00:59:40 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links ( #7494 )
2019-04-09 11:39:42 -07:00
Aditya Manthramurthy
91c839ad28
Use a buffer to collect SQL Select result rows ( #7158 )
...
Batching records into a single SQL Select message in the response
yields a significant speedup, because the per-message header overhead
becomes negligible.
This change gives a 3-5x speedup for queries that select many
small records.
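A minimal sketch of the batching idea, assuming a byte-size threshold. The `batcher` type, its `limit` field and the `send` callback are hypothetical; the real message framing is not shown:

```go
package main

import (
	"bytes"
	"fmt"
)

// batcher collects records and hands them to send as one message once
// at least limit bytes have accumulated.
type batcher struct {
	buf   bytes.Buffer
	limit int
	send  func(msg []byte) // hypothetical sink that frames and writes a message
}

// add appends one record and flushes if the threshold is reached.
func (b *batcher) add(record []byte) {
	b.buf.Write(record)
	if b.buf.Len() >= b.limit {
		b.flush()
	}
}

// flush emits the buffered records, if any, as a single message.
func (b *batcher) flush() {
	if b.buf.Len() == 0 {
		return
	}
	// Copy first: the buffer's backing array is reused after Reset.
	msg := append([]byte(nil), b.buf.Bytes()...)
	b.buf.Reset()
	b.send(msg)
}

// countMessages reports how many messages are sent for the given records.
func countMessages(records []string, limit int) int {
	n := 0
	b := &batcher{limit: limit, send: func([]byte) { n++ }}
	for _, r := range records {
		b.add([]byte(r))
	}
	b.flush() // emit whatever is left at the end of the stream
	return n
}

func main() {
	records := []string{"a,b\n", "a,b\n", "a,b\n", "a,b\n", "a,b\n"}
	fmt.Println(countMessages(records, 16)) // 2
	fmt.Println(countMessages(records, 1))  // 5
}
```

With a 16-byte threshold the five 4-byte records go out as two messages instead of five, which is where the header savings come from. The final `flush` matters: without it, trailing records below the threshold would never be sent.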
2019-01-28 20:00:18 -08:00
Bala FA
e23a42305c
Rebase minio/parquet-go and fix null handling. ( #7067 )
2019-01-16 21:52:04 +05:30
Bala FA
b0deea27df
Refactor s3select to support parquet. ( #7023 )
...
Also handle pretty formatted JSON documents.
2019-01-08 16:53:04 -08:00
Harshavardhana
7e1661f4fa
Performance improvements to SELECT API on certain query operations ( #6752 )
...
This improves the performance of certain queries dramatically,
such as 'count(*)' etc.
Without this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762
real 0m42.464s
user 0m0.071s
sys 0m0.010s
```
With this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762
real 0m17.603s
user 0m0.093s
sys 0m0.008s
```
Roughly a 2.4x speedup (42.5s down to 17.6s, about 59% less wall-clock
time). This PR avoids a lot of type conversions and instead relies on
raw sequences of data and interprets them lazily.
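The lazy-interpretation idea can be sketched like this. The `lazyValue` type is a hypothetical illustration of deferring conversion until a value is actually needed and does not reflect the PR's real data structures:

```go
package main

import (
	"fmt"
	"strconv"
)

// lazyValue keeps the raw bytes of a field and converts them to a number
// only on first use, caching the result.
type lazyValue struct {
	raw    []byte
	parsed bool
	num    float64
	err    error
}

// Float parses the raw bytes at most once.
func (v *lazyValue) Float() (float64, error) {
	if !v.parsed {
		v.num, v.err = strconv.ParseFloat(string(v.raw), 64)
		v.parsed = true
	}
	return v.num, v.err
}

// countRows needs no field conversion at all, as with count(*).
func countRows(rows [][]lazyValue) int {
	return len(rows)
}

// sumColumn converts only the one column it reads.
func sumColumn(rows [][]lazyValue, col int) (float64, error) {
	var total float64
	for i := range rows {
		f, err := rows[i][col].Float()
		if err != nil {
			return 0, err
		}
		total += f
	}
	return total, nil
}

func main() {
	rows := [][]lazyValue{
		{{raw: []byte("1.5")}, {raw: []byte("not-a-number")}},
		{{raw: []byte("2.5")}, {raw: []byte("also-text")}},
	}
	total, err := sumColumn(rows, 0)
	fmt.Println(countRows(rows), total, err)
}
```

Note that the second column is never parsed, so its non-numeric content costs nothing and raises no error.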
```
benchcmp old new
benchmark old ns/op new ns/op delta
BenchmarkSQLAggregate_100K-4 551213 259782 -52.87%
BenchmarkSQLAggregate_1M-4 6981901985 2432413729 -65.16%
BenchmarkSQLAggregate_2M-4 13511978488 4536903552 -66.42%
BenchmarkSQLAggregate_10M-4 68427084908 23266283336 -66.00%
benchmark old allocs new allocs delta
BenchmarkSQLAggregate_100K-4 2366 485 -79.50%
BenchmarkSQLAggregate_1M-4 47455492 21462860 -54.77%
BenchmarkSQLAggregate_2M-4 95163637 43110771 -54.70%
BenchmarkSQLAggregate_10M-4 476959550 216906510 -54.52%
benchmark old bytes new bytes delta
BenchmarkSQLAggregate_100K-4 1233079 1086024 -11.93%
BenchmarkSQLAggregate_1M-4 2607984120 557038536 -78.64%
BenchmarkSQLAggregate_2M-4 5254103616 1128149168 -78.53%
BenchmarkSQLAggregate_10M-4 26443524872 5722715992 -78.36%
```
2018-11-14 15:55:10 -08:00
Ashish Kumar Sinha
c0b4bf0a3e
SQL select query for CSV/JSON ( #6648 )
...
`select *` and selection by column name have been implemented for CSV.
`select *` has been implemented for JSON.
2018-10-22 12:12:22 -07:00
Praveen raj Mani
cef044178c
Treat columns with spaces in between [s3Select] ( #6597 )
...
Replace double/single quotes with backticks so that xwb1989/sqlparser
recognises such queries.
Fixes #6589
2018-10-17 11:01:26 -07:00
Aditya Manthramurthy
16a100b597
Fix out-of-bound array access crash in select processing ( #6594 )
...
Fix test case.
2018-10-09 09:45:56 -07:00
Ashish Kumar Sinha
670f9788e3
Count(*) to give integer value ( #6564 )
...
The Max and Min functions returned float values even when the values were integers.
Max and Min now return integers in that case.
Fixes #6472
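One way to get this behaviour is to decide at output time whether an aggregate holds a whole number. That approach is an assumption for this sketch (the commit may instead track the input types), and `formatNumber` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// formatNumber renders f without a fractional part when it is a whole
// number that float64 can represent exactly, and as a float otherwise.
func formatNumber(f float64) string {
	if f == math.Trunc(f) && math.Abs(f) < 1<<53 {
		return strconv.FormatInt(int64(f), 10)
	}
	return strconv.FormatFloat(f, 'f', -1, 64)
}

func main() {
	fmt.Println(formatNumber(42))  // 42
	fmt.Println(formatNumber(2.5)) // 2.5
}
```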
2018-10-04 17:33:53 -07:00
Harshavardhana
a0683d3c1f
Send progress only when requested by client in SelectObject ( #6467 )
2018-09-17 11:52:46 +05:30
Praveen raj Mani
30d4a2cf53
s3select should honour custom record delimiter ( #6419 )
...
Allow custom record delimiters such as `\r\n`, `a`, or `\r` in the input CSV and
replace them with `\n`.
Fixes #6403
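In its simplest in-memory form, the normalisation amounts to a string replacement. The sketch below assumes the whole input fits in memory; `normalizeRecordDelimiter` is a hypothetical name, and a streaming reader would also have to handle a delimiter that is split across two reads:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRecordDelimiter rewrites every occurrence of the custom record
// delimiter to "\n" so that a newline-based CSV reader can consume the data.
func normalizeRecordDelimiter(data, delim string) string {
	if delim == "" || delim == "\n" {
		return data // nothing to do
	}
	return strings.ReplaceAll(data, delim, "\n")
}

func main() {
	fmt.Printf("%q\n", normalizeRecordDelimiter("1,2\r\n3,4\r\n", "\r\n")) // "1,2\n3,4\n"
}
```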
2018-09-10 21:50:28 +05:30
Raphael Randschau
8601f29d95
select: fix int overflow of math.MaxInt64 on ARM ( #6317 )
2018-08-22 16:16:04 +05:30
Arjun Mishra
7c14cdb60e
S3 Select API Support for CSV ( #6127 )
...
Add support for trivial where clause cases
2018-08-15 03:30:19 -07:00