Commit Graph

40 Commits

Author SHA1 Message Date
Aditya Manthramurthy
b2936243f9
Fix S3Select SQL column reference handling (#11957)
This change fixes handling of these types of queries:

- Double quoted column names with special characters:
    SELECT "column.name" FROM s3object
- Double quoted column names with reserved keywords:
    SELECT "CAST" FROM s3object
- Table name as prefix for column names:
    SELECT S3Object."CAST" FROM s3object
2021-04-06 08:49:04 -07:00
mailsmail
27eb4ae3bc
fix: sql cast function when converting to float (#11817) 2021-03-19 09:14:38 -07:00
Klaus Post
e5a1a2a974
s3 select: fix date_diff behavior (#11786)
Fixes #11785 and adds tests for samples given.
2021-03-15 14:15:52 -07:00
Klaus Post
0987069e37
select: Fix integer conversion overflow (#10437)
Do not convert float value to integer if it will over/underflow.

The comparison cannot be `<=` since rounding may overflow it.

Fixes #10436
2020-09-08 15:56:11 -07:00
Harshavardhana
caad314faa
add ruleguard support, fix all the reported issues (#10335) 2020-08-24 12:11:20 -07:00
Bruce Wang
e464a5bfbc
Fix bug with fields that contain trimming spaces (#10079)
String x might contain trimming spaces. And it needs to be trimmed. For
example, in csv files, there might be trimming spaces in a field that
ought to meet a query condition that contains the value without
trimming spaces. This applies to both intCast and floatCast functions.
2020-07-21 12:57:09 -07:00
Harshavardhana
1bc32215b9
enable full linter across the codebase (#9620)
enable linter using golangci-lint across
codebase to run a bunch of linters together,
we shall enable new linters as we fix more
things the codebase.

This PR fixes the first stage of this
cleanup.
2020-05-18 09:59:45 -07:00
Klaus Post
e4900b99d7
s3 select: Infer types for comparison (#9438) 2020-04-24 13:02:59 -07:00
Anis Elleuch
9902c9baaa
sql: Add support of escape quote in CSV (#9231)
This commit modifies csv parser, a fork of golang csv
parser to support a custom quote escape character.

The quote escape character is used to escape the quote
character when a csv field contains a quote character
as part of data.
2020-04-01 15:39:34 -07:00
Klaus Post
8d98662633
re-implement data usage crawler to be more efficient (#9075)
Implementation overview: 

https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b
2020-03-18 16:19:29 -07:00
Anis Elleuch
35ecc04223
Support configurable quote character parameter in Select (#8955) 2020-03-13 22:09:34 -07:00
Aditya Manthramurthy
cec8cdb35e
S3Select: Handle array selection in from clause (#9076) 2020-03-10 22:34:58 -07:00
Klaus Post
e4020fb41f
SIMDJSON S3 select input (#8401) 2020-02-13 14:03:52 -08:00
Klaus Post
f1e2e1cc9e S3 Select: Mismatched types don't match (#8608)
When comparing for equality, if types cannot be matched, they don't match.
2019-12-06 07:24:41 -08:00
Harshavardhana
5d3d57c12a
Start using error wrapping with fmt.Errorf (#8588)
Use fatih/errwrap to fix all the code to use
error wrapping with fmt.Errorf()
2019-12-02 09:28:01 -08:00
Klaus Post
1c90a6bd49 S3 Select: Convert CSV data to JSON (#8464) 2019-11-09 09:10:35 -08:00
Klaus Post
38e6d911ea S3 Select: Detect full object (#8456)
Check if select is `SELECT s.* from S3Object s` and forward it to All

Fixes #8371 and makes this case run significantly faster.
2019-10-30 13:46:55 +05:30
Klaus Post
51456e6adc Select: Support Square Bracket Lists (#8457)
Allows for S3 compatible `SELECT * from s3object s WHERE id IN [3,2]`

Fixes #8422
2019-10-30 11:34:40 +05:30
Klaus Post
002ac82631 S3 Select: Add parser support for lists. (#8329) 2019-10-06 07:52:45 -07:00
Klaus Post
c1a17c2561 S3 Select: Aggregate AVG/SUM as float (#8326)
Force sum/average to be calculated as a float.

As noted in #8221

> run SELECT AVG(CAST (Score as int)) FROM S3Object on

```
Name,Score
alice,80
bob,81
```

> AWS S3 gives 80.5 and MinIO gives 80.

This also makes overflows much more unlikely.
2019-09-27 16:12:03 -07:00
Klaus Post
1c5b05c130 S3 select: Fix output conversion on select * (#8303)
Fixes #8268
2019-09-27 12:33:14 -07:00
Klaus Post
dac1cf5a9a S3 Select: Parsing tweaks (#8261)
* Don't output empty lines.
* Trim whitespace from byte to int/float/bool conversions.
2019-09-17 17:21:23 -07:00
Klaus Post
c9b8bd8de2 S3 Select: optimize output (#8238)
Queue output items and reuse them.
Remove the unneeded type system in sql and just use the Go type system.

In best case this is more than an order of magnitude speedup:

```
BenchmarkSelectAll_1M-12    	       1	1841049400 ns/op	274299728 B/op	 4198522 allocs/op
BenchmarkSelectAll_1M-12    	      14	  84833400 ns/op	169228346 B/op	 3146541 allocs/op
```
2019-09-17 05:56:27 +05:30
Yao Zongyou
ec9bfd3aef speed up the performance of s3select on csv (#7945) 2019-08-31 00:07:40 -07:00
Yao Zongyou
60831e3299 aggregation functions' argument may already has been cast to numeric (#7876) 2019-07-05 10:38:38 -07:00
Yao Zongyou
037319066f fix unicode support related bugs in s3select (#7877) 2019-07-05 09:43:10 -07:00
Ryan Tam
bd56f80250 Fix ignored alias for aggregate result in S3 Select (#7849)
The SQL parser as it stands right now ignores alias for aggregate
result, e.g. `SELECT COUNT(*) AS thing FROM s3object` doesn't actually
return record like `{"thing": 42}`, it returns a record like `{"_1": 42}`.
Column alias for aggregate result is supported in AWS's S3 Select, so
this commit fixes that by respecting the `expr.As` in the expression.

Also improve test for S3 select

On top of testing a simple `SELECT` query, we want to test a few more
"advanced" queries (e.g. aggregation).

Convert existing tests into table driven tests[1], and add the new test
cases with "advanced" queries into them.

[1] - https://github.com/golang/go/wiki/TableDrivenTests
2019-07-03 16:34:54 -07:00
Yao Zongyou
55092bede1 add timestamp compare support (#7832) 2019-06-25 11:05:37 -07:00
Yao Zongyou
90a3b830f4 fix typo and the string representation of the time.Time value (#7831) 2019-06-25 09:54:14 -07:00
Yao Zongyou
23b9df0694 Fix s3select TRIM function's nil pointer dereference bug (#7817) 2019-06-24 16:59:33 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Kirill Motkov
3d29ab4059 Rewrite if-else chains to switch statements (#7382) 2019-03-18 07:46:20 -07:00
Aditya Manthramurthy
e463386921 Add JSON Path expression evaluation support (#7315)
- Includes support for FROM clause JSON path
2019-03-09 08:13:37 -08:00
Aditya Manthramurthy
8a405cab2f COUNT() function in select should return an int (#7243) 2019-02-13 16:32:59 -08:00
Harshavardhana
df35d7db9d Introduce staticcheck for stricter builds (#7035) 2019-02-13 18:29:36 +05:30
Aditya Manthramurthy
ee5b3622a5 Evaluate where clause in aggregation queries (#7235) 2019-02-12 13:54:26 -08:00
Aditya Manthramurthy
f04f8bbc78 Add support for Timestamp data type in SQL Select (#7185)
This change adds support for casting strings to Timestamp via CAST:
`CAST('2010T' AS TIMESTAMP)`

It also implements the following date-time functions:
  - UTCNOW()
  - DATE_ADD()
  - DATE_DIFF()
  - EXTRACT()

For values passed to these functions, date-types are automatically
inferred.
2019-02-04 20:54:45 -08:00
Aditya Manthramurthy
2786055df4 Add new SQL parser to support S3 Select syntax (#7102)
- New parser written from scratch, allows easier and complete parsing
  of the full S3 Select SQL syntax. Parser definition is directly
  provided by the AST defined for the SQL grammar.

- Bring support to parse and interpret SQL involving JSON path
  expressions; evaluation of JSON path expressions will be
  subsequently added.

- Bring automatic type inference and conversion for untyped
  values (e.g. CSV data).
2019-01-28 17:59:48 -08:00
Bala FA
e23a42305c Rebase minio/parquet-go and fix null handling. (#7067) 2019-01-16 21:52:04 +05:30
Bala FA
b0deea27df Refactor s3select to support parquet. (#7023)
Also handle pretty formatted JSON documents.
2019-01-08 16:53:04 -08:00