mirror of https://github.com/minio/minio.git synced 2025-03-22 05:24:15 -04:00

81 Commits

Author SHA1 Message Date
Harshavardhana
91d85a0d53 Fix stale locks held by SelectParquet API ()
Vendorize upstream parquet-go to fix this issue.
2019-03-13 20:33:18 -07:00
Aditya Manthramurthy
e463386921 Add JSON Path expression evaluation support ()
- Includes support for FROM clause JSON path
2019-03-09 08:13:37 -08:00
Aditya Manthramurthy
f4879ed96d Use jstream to serialize records to JSON format in S3Select ()
- Also, switch to jstream to generate internal record representation
  from CSV/JSON readers

- This fixes a bug in which JSON output objects have their keys
  reversed from the order they are specified in the Select columns.

- Also includes a fix for tests.
2019-03-07 00:20:10 -08:00
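
The key-order bug described above is typical of serializing records through a Go `map`, which has no defined iteration order. A minimal sketch of order-preserving output, assuming a hypothetical `field` type and `marshalOrdered` helper (these names are illustrative and are not taken from jstream or MinIO):

```go
package main

import (
	"bytes"
	"encoding/json"
)

// field is a hypothetical key/value pair. A slice of fields keeps the
// order in which the Select columns were specified.
type field struct {
	Key   string
	Value interface{}
}

// marshalOrdered writes the fields as a single JSON object, emitting
// keys in slice order rather than map order.
func marshalOrdered(fields []field) ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteByte('{')
	for i, f := range fields {
		if i > 0 {
			buf.WriteByte(',')
		}
		k, err := json.Marshal(f.Key)
		if err != nil {
			return nil, err
		}
		v, err := json.Marshal(f.Value)
		if err != nil {
			return nil, err
		}
		buf.Write(k)
		buf.WriteByte(':')
		buf.Write(v)
	}
	buf.WriteByte('}')
	return buf.Bytes(), nil
}
```

Calling `marshalOrdered` with the fields `name` then `age` yields an object whose keys appear in that same order.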
Harshavardhana
2520e535a0 Allow lazyQuotes for certain types of CSV ()
Set lazyQuotes to true, to allow a quote to appear
in an unquoted field and a non-doubled quote to
appear in a quoted field.
2019-02-24 06:51:02 -08:00
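
For illustration, the behaviour described above corresponds to the `LazyQuotes` field of Go's standard `encoding/csv` reader. A small sketch, assuming a hypothetical `parseCSV` helper (the name is not from the MinIO source):

```go
package main

import (
	"encoding/csv"
	"strings"
)

// parseCSV reads all records from data. When lazy is true, stray
// quotes inside fields are tolerated instead of being rejected.
func parseCSV(data string, lazy bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.LazyQuotes = lazy
	return r.ReadAll()
}
```

With `lazy` set to false, an input line such as `a "b" c,d` is rejected because of the bare quote in an unquoted field. With `lazy` set to true, the same line parses into two fields and the quotes are kept verbatim.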
Aditya Manthramurthy
80a351633f Update vendorized bcicen/jstream ()
- Includes an error handling fix that is waiting to be merged upstream
- Uses order-preserving (un)marshalling for JSON objects.
2019-02-20 23:59:23 -08:00
Aditya Manthramurthy
8a405cab2f COUNT() function in select should return an int () 2019-02-13 16:32:59 -08:00
Harshavardhana
df35d7db9d Introduce staticcheck for stricter builds () 2019-02-13 18:29:36 +05:30
Aditya Manthramurthy
ee5b3622a5 Evaluate where clause in aggregation queries () 2019-02-12 13:54:26 -08:00
Harshavardhana
85e939636f Fix JSON parser handling for certain objects ()
This PR also adds some comments and simplifies
the code. The main change ensures that the
cached buffer is honored.

Added unit tests as well

Fixes 
2019-02-07 08:04:42 +05:30
Aditya Manthramurthy
4aa9ee153b Fix S3 Select request XML parsing () 2019-02-06 13:25:52 -08:00
Aditya Manthramurthy
fd4e15c116 Flush the records staging buffer periodically ()
- Staging buffer is flushed every 500ms. In cases where the result
  records are slowly generated (e.g. when a where condition
  matches very few records), this change causes the server to send
  results even though the staging buffer is not full.

- Refactor messageWriter code to use simpler channel based
  co-ordination instead of atomic variables.
2019-02-06 16:03:05 +05:30
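
The periodic flush and channel-based co-ordination described above can be sketched roughly as follows. This is not the messageWriter code itself. `flushLoop` and its parameters are hypothetical names chosen for illustration:

```go
package main

import "time"

// flushLoop stages records received on in and passes them to flush
// when maxBatch records are staged, when interval elapses with data
// pending, or when in is closed. It returns once in is closed.
func flushLoop(in <-chan string, maxBatch int, interval time.Duration, flush func([]string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var staged []string
	emit := func() {
		if len(staged) > 0 {
			flush(staged)
			staged = nil // start a fresh buffer; flush owns the old one
		}
	}

	for {
		select {
		case rec, ok := <-in:
			if !ok {
				emit() // send whatever is left before exiting
				return
			}
			staged = append(staged, rec)
			if len(staged) >= maxBatch {
				emit()
			}
		case <-ticker.C:
			emit() // slow producers still get results sent promptly
		}
	}
}
```

In a server, `flush` would write one response message per batch, and `interval` would be `500 * time.Millisecond` to match the figure quoted in the commit message.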
Aditya Manthramurthy
f04f8bbc78 Add support for Timestamp data type in SQL Select ()
This change adds support for casting strings to Timestamp via CAST:
`CAST('2010T' AS TIMESTAMP)`

It also implements the following date-time functions:
  - UTCNOW()
  - DATE_ADD()
  - DATE_DIFF()
  - EXTRACT()

For values passed to these functions, date-types are automatically
inferred.
2019-02-04 20:54:45 -08:00
Aditya Manthramurthy
91c839ad28 Use a buffer to collect SQL Select result rows ()
Batching records into a single SQL Select message in the response
leads to significant speed up as the message header overhead is made
negligible.

This change leads to a speed up of 3-5x for queries that select many
small records.
2019-01-28 20:00:18 -08:00
Aditya Manthramurthy
2786055df4 Add new SQL parser to support S3 Select syntax ()
- New parser written from scratch, allows easier and complete parsing
  of the full S3 Select SQL syntax. Parser definition is directly
  provided by the AST defined for the SQL grammar.

- Bring support to parse and interpret SQL involving JSON path
  expressions; evaluation of JSON path expressions will be
  subsequently added.

- Bring automatic type inference and conversion for untyped
  values (e.g. CSV data).
2019-01-28 17:59:48 -08:00
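
Automatic type inference for untyped values can be pictured with a sketch like the one below, which tries progressively looser interpretations of a CSV field. `inferValue` is a hypothetical helper, and the rules the real parser applies may differ:

```go
package main

import (
	"strconv"
	"strings"
)

// inferValue converts an untyped text field to int64, float64, or bool
// when it parses cleanly as one, and otherwise returns it as a string.
// Note that strconv.ParseFloat also accepts forms such as "NaN" and "Inf".
func inferValue(s string) interface{} {
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		return i
	}
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f
	}
	switch strings.ToLower(s) {
	case "true":
		return true
	case "false":
		return false
	}
	return s
}
```

Integers are tried before floats so that a field like `42` stays an `int64` rather than becoming `42.0`.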
Bala FA
e23a42305c Rebase minio/parquet-go and fix null handling. () 2019-01-16 21:52:04 +05:30
Bala FA
b0deea27df Refactor s3select to support parquet. ()
Also handle pretty formatted JSON documents.
2019-01-08 16:53:04 -08:00
Aditya Manthramurthy
2aeb3fbe86 Fix csv output delimiter bug () 2018-12-19 11:49:06 +05:30
Harshavardhana
4c7c571875 Support JSON to CSV and CSV to JSON output format conversion ()
This PR implements one of the pending items in issue 
In the S3 API, a user can request CSV output for a JSON document
and JSON output for a CSV document. This PR refactors
the code slightly to support this.
2018-12-07 14:55:32 -08:00
Harshavardhana
272b8003d6 Honor header only when requested for use () 2018-11-16 10:27:48 -08:00
Harshavardhana
7e1661f4fa Performance improvements to SELECT API on certain query operations ()
This improves the performance of certain queries dramatically,
such as 'count(*)' etc.

Without this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762

real	0m42.464s
user	0m0.071s
sys	0m0.010s
```

With this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762

real	0m17.603s
user	0m0.093s
sys	0m0.008s
```

Roughly a 2.4x speedup in wall-clock time (42.5s down to 17.6s). This PR
avoids a lot of type conversions and instead relies on raw sequences of
data and interprets them lazily.

```
benchcmp old new
benchmark                        old ns/op       new ns/op       delta
BenchmarkSQLAggregate_100K-4     551213          259782          -52.87%
BenchmarkSQLAggregate_1M-4       6981901985      2432413729      -65.16%
BenchmarkSQLAggregate_2M-4       13511978488     4536903552      -66.42%
BenchmarkSQLAggregate_10M-4      68427084908     23266283336     -66.00%

benchmark                        old allocs     new allocs     delta
BenchmarkSQLAggregate_100K-4     2366           485            -79.50%
BenchmarkSQLAggregate_1M-4       47455492       21462860       -54.77%
BenchmarkSQLAggregate_2M-4       95163637       43110771       -54.70%
BenchmarkSQLAggregate_10M-4      476959550      216906510      -54.52%

benchmark                        old bytes       new bytes      delta
BenchmarkSQLAggregate_100K-4     1233079         1086024        -11.93%
BenchmarkSQLAggregate_1M-4       2607984120      557038536      -78.64%
BenchmarkSQLAggregate_2M-4       5254103616      1128149168     -78.53%
BenchmarkSQLAggregate_10M-4      26443524872     5722715992     -78.36%
```
2018-11-14 15:55:10 -08:00
Harshavardhana
f162d7bd97 Performance improvements by re-using record buffer ()
Avoid unnecessary pointer allocations, for example:

- *SelectFuncs{}
- *Row{}
2018-10-31 08:48:01 +05:30
Ashish Kumar Sinha
c0b4bf0a3e SQL select query for CSV/JSON ()
`select *` and selecting by column name are implemented for CSV.
`select *` is implemented for JSON.
2018-10-22 12:12:22 -07:00
Praveen raj Mani
cef044178c Treat columns with spaces in between [s3Select] ()
Replace double/single quotes with backticks so that xwb1989/sqlparser
recognises such queries.

Fixes 
2018-10-17 11:01:26 -07:00
Aditya Manthramurthy
e3eec89d24 Optimize string processing in select ()
Reduce allocations during string concatenation and simplify some
processing code.
2018-10-09 14:02:19 -07:00
Aditya Manthramurthy
16a100b597 Fix out-of-bound array access crash in select processing ()
Fix test case.
2018-10-09 09:45:56 -07:00
Ashish Kumar Sinha
670f9788e3 Count(*) to give integer value ()
The Max and Min functions returned float values even when the inputs were integers.
Max and Min now return integers in that case.

Fixes 
2018-10-04 17:33:53 -07:00
Harshavardhana
a0683d3c1f Send progress only when requested by client in SelectObject () 2018-09-17 11:52:46 +05:30
Praveen raj Mani
30d4a2cf53 s3select should honour custom record delimiter ()
Allow custom record delimiters such as `\r\n`, `a`, `\r` in input CSV and
replace them with `\n`.

Fixes 
2018-09-10 21:50:28 +05:30
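
One way to picture the delimiter handling described above is a `bufio.Scanner` with a custom split function that cuts records on an arbitrary delimiter and re-emits them terminated by `\n`. `splitOn` and `normalize` are hypothetical names for this sketch, and the actual change may be implemented differently:

```go
package main

import (
	"bufio"
	"bytes"
	"strings"
)

// splitOn returns a split function that cuts input at each occurrence
// of delim. delim must be non-empty.
func splitOn(delim []byte) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		if atEOF && len(data) == 0 {
			return 0, nil, nil
		}
		if i := bytes.Index(data, delim); i >= 0 {
			return i + len(delim), data[:i], nil
		}
		if atEOF {
			// Final record without a trailing delimiter.
			return len(data), data, nil
		}
		return 0, nil, nil // request more data
	}
}

// normalize rewrites input so that every record ends with "\n".
func normalize(input, delim string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(splitOn([]byte(delim)))
	var out strings.Builder
	for sc.Scan() {
		out.WriteString(sc.Text())
		out.WriteByte('\n')
	}
	return out.String(), sc.Err()
}
```

Because the scanner holds each record in memory, very long records would need a larger buffer set via `Scanner.Buffer`.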
Raphael Randschau
8601f29d95 select: fix int overflow of math.MaxInt64 on ARM () 2018-08-22 16:16:04 +05:30
Harshavardhana
5a4a57700b Add select docs and fix return values for Select API () 2018-08-17 17:11:39 -07:00
Arjun Mishra
7c14cdb60e S3 Select API Support for CSV ()
Add support for trivial where clause cases
2018-08-15 03:30:19 -07:00