10 Commits

Author SHA1 Message Date
Bala FA
b0deea27df Refactor s3select to support parquet. (#7023)
Also handle pretty formatted JSON documents.
2019-01-08 16:53:04 -08:00
Harshavardhana
7e1661f4fa Performance improvements to SELECT API on certain query operations (#6752)
This improves the performance of certain queries dramatically,
such as 'count(*)' etc.

Without this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762

real	0m42.464s
user	0m0.071s
sys	0m0.010s
```

With this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762

real	0m17.603s
user	0m0.093s
sys	0m0.008s
```

Almost a 250% improvement in performance. This PR avoids a lot of type
conversions and instead relies on raw sequences of data and interprets
them lazily.

```
benchcmp old new
benchmark                        old ns/op       new ns/op       delta
BenchmarkSQLAggregate_100K-4     551213          259782          -52.87%
BenchmarkSQLAggregate_1M-4       6981901985      2432413729      -65.16%
BenchmarkSQLAggregate_2M-4       13511978488     4536903552      -66.42%
BenchmarkSQLAggregate_10M-4      68427084908     23266283336     -66.00%

benchmark                        old allocs     new allocs     delta
BenchmarkSQLAggregate_100K-4     2366           485            -79.50%
BenchmarkSQLAggregate_1M-4       47455492       21462860       -54.77%
BenchmarkSQLAggregate_2M-4       95163637       43110771       -54.70%
BenchmarkSQLAggregate_10M-4      476959550      216906510      -54.52%

benchmark                        old bytes       new bytes      delta
BenchmarkSQLAggregate_100K-4     1233079         1086024        -11.93%
BenchmarkSQLAggregate_1M-4       2607984120      557038536      -78.64%
BenchmarkSQLAggregate_2M-4       5254103616      1128149168     -78.53%
BenchmarkSQLAggregate_10M-4      26443524872     5722715992     -78.36%
```
2018-11-14 15:55:10 -08:00
Ashish Kumar Sinha
c0b4bf0a3e SQL select query for CSV/JSON (#6648)
select * , select column names have been implemented for CSV.
select * is implemented for JSON.
2018-10-22 12:12:22 -07:00
Praveen raj Mani
cef044178c Treat columns with spaces inbetween [s3Select] (#6597)
replace the double/single quotes with backticks for the xwb1989/sqlparser
to recognise such queries.

Fixes #6589
2018-10-17 11:01:26 -07:00
Aditya Manthramurthy
16a100b597 Fix out-of-bound array access crash in select processing (#6594)
Fix test case.
2018-10-09 09:45:56 -07:00
Ashish Kumar Sinha
670f9788e3 Count(*) to give integer value (#6564)
The Max, Min functions were giving float value even when they were integers.  
Resolved max and Min to return integers in that scenario.

Fixes #6472
2018-10-04 17:33:53 -07:00
Harshavardhana
a0683d3c1f Send progress only when requested by client in SelectObject (#6467) 2018-09-17 11:52:46 +05:30
Praveen raj Mani
30d4a2cf53 s3select should honour custom record delimiter (#6419)
Allow custom delimiters like `\r\n`, `a`, `\r` etc in input csv and 
replace with `\n`.

Fixes #6403
2018-09-10 21:50:28 +05:30
Raphael Randschau
8601f29d95 select: fix int overflow of math.MaxInt64 on ARM (#6317) 2018-08-22 16:16:04 +05:30
Arjun Mishra
7c14cdb60e S3 Select API Support for CSV (#6127)
Add support for trivial where clause cases
2018-08-15 03:30:19 -07:00