19 Commits

Author SHA1 Message Date
Harshavardhana
d087e28dce
start using t.Setenv instead of os.Setenv (#14787) 2022-04-23 15:33:45 -07:00
Aditya Manthramurthy
e8e48e4c4a
S3 select switch to new parquet library and reduce locking (#14731)
- This change switches to a new parquet library
- SelectObjectContent now takes a single lock at the beginning and holds it
during the operation. Previously the operation took a lock every time the
parquet library performed a Seek on the underlying object stream.
- Add basic support for LogicalType annotations for timestamps.
2022-04-14 06:54:47 -07:00
Aditya Manthramurthy
79ba458051
fix: free up reader resources in S3Select properly (#14600) 2022-03-23 20:58:53 -07:00
Klaus Post
c07af89e48
select: Add ScanRange to CSV&JSON (#14546)
Implements https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ScanRange

Fixes #14539
2022-03-14 09:48:36 -07:00
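The linked AWS documentation states that a record is processed when its first byte falls inside the scan range. A minimal sketch of that rule for newline-delimited records follows, assuming inclusive start and end offsets counted from zero. `recordsInRange` is a hypothetical helper for illustration, not the code added in this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// recordsInRange returns the newline-delimited records of data whose first
// byte offset lies in [start, end]. Hypothetical helper, for illustration only.
func recordsInRange(data string, start, end int) []string {
	var out []string
	offset := 0
	for _, rec := range strings.SplitAfter(data, "\n") {
		if rec == "" {
			continue // SplitAfter yields an empty tail when data ends in "\n"
		}
		// A record qualifies by where it starts, even if it ends past the range.
		if offset >= start && offset <= end {
			out = append(out, strings.TrimSuffix(rec, "\n"))
		}
		offset += len(rec)
	}
	return out
}

func main() {
	data := "a,1\nb,2\nc,3\nd,4\n" // records start at offsets 0, 4, 8, 12
	fmt.Println(recordsInRange(data, 2, 9)) // [b,2 c,3]
}
```

The record `c,3` starts at offset 8 and ends after offset 9, yet it is returned whole, because only its first byte has to be in range.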
Klaus Post
88fd1cba71
select: add MISSING operator support (#14406)
Probably not full support, but for regular checks it should work.

Fixes #14358
2022-02-25 12:31:19 -08:00
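As a rough illustration of the distinction a `MISSING` check draws, the sketch below separates an absent JSON attribute from one that is present but null. The helpers `isMissing` and `isNull` are hypothetical and assume records decoded into a Go map. They do not reproduce the evaluator changed in this commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isMissing reports whether key is absent from the record altogether.
func isMissing(record map[string]interface{}, key string) bool {
	_, ok := record[key]
	return !ok
}

// isNull reports whether key is present with an explicit JSON null value.
func isNull(record map[string]interface{}, key string) bool {
	v, ok := record[key]
	return ok && v == nil
}

func main() {
	var rec map[string]interface{}
	if err := json.Unmarshal([]byte(`{"name":"a","age":null}`), &rec); err != nil {
		panic(err)
	}
	fmt.Println(isMissing(rec, "age"), isNull(rec, "age"))     // false true
	fmt.Println(isMissing(rec, "email"), isNull(rec, "email")) // true false
}
```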
Klaus Post
2cea944cdb
select: Allow lower case 'is' (#14405)
Ref: #14358
2022-02-24 09:10:48 -08:00
Harshavardhana
f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Klaus Post
91f72f25ab
select: Return early from bool AND, OR (#13914)
Return as soon as an AND fails and whenever an OR succeeds. Faster and more flexible.

For example, this makes `select * from S3object where _2 != '' AND _2 > 1` work on empty fields.

Followup to #13900
2021-12-15 16:47:21 -08:00
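The early return described above can be sketched as follows. This is a minimal illustration, not the evaluator from the commit: the `cond` type and the `and`, `or`, and `matches` functions are hypothetical names. The right-hand side is only evaluated when the left-hand side has not already decided the result, so a numeric comparison never sees an empty field.

```go
package main

import (
	"fmt"
	"strconv"
)

// cond is a lazily evaluated boolean condition that may fail.
type cond func() (bool, error)

// and returns as soon as the left side is false or fails.
func and(left, right cond) (bool, error) {
	l, err := left()
	if err != nil || !l {
		return false, err
	}
	return right()
}

// or returns as soon as the left side is true or fails.
func or(left, right cond) (bool, error) {
	l, err := left()
	if err != nil || l {
		return l, err
	}
	return right()
}

// matches mimics `_2 != '' AND _2 > 1` for a single field value.
func matches(field string) (bool, error) {
	notEmpty := func() (bool, error) { return field != "", nil }
	greaterThanOne := func() (bool, error) {
		n, err := strconv.ParseFloat(field, 64)
		if err != nil {
			return false, err
		}
		return n > 1, nil
	}
	return and(notEmpty, greaterThanOne)
}

func main() {
	fmt.Println(matches(""))  // false <nil>
	fmt.Println(matches("5")) // true <nil>
}
```

Without the early return, the empty field would reach `strconv.ParseFloat` and the whole expression would fail with a parse error instead of evaluating to false.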
Klaus Post
a8d4042853
select: Add IS (NOT) operators (#13906)
Add `IS` and `IS NOT` as comparison operators.

This may be a bit wider than the S3 spec, but we can rather 
easily remove the forwarding.
2021-12-14 09:54:50 -08:00
Klaus Post
d6fe0f61a9
do not panic when input cannot be parsed (#13791)
Fix cases where `s3Select.Open` fails and doesn't set the recordReader.

Fixes #13786
2021-11-30 08:42:42 -08:00
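The failure mode named above, a reader left unset after a failed open, can be guarded against with a nil check before use. The sketch below is an assumption-laden illustration: `selectRequest` and its methods are hypothetical stand-ins, not the `s3Select` type from the codebase.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// selectRequest is a hypothetical stand-in for a request that owns a reader.
type selectRequest struct {
	reader io.ReadCloser // stays nil when Open fails
}

// Open sets the reader only when the input is usable.
func (s *selectRequest) Open(input string) error {
	if input == "" {
		return errors.New("input cannot be parsed")
	}
	s.reader = io.NopCloser(strings.NewReader(input))
	return nil
}

// Close tolerates a reader that was never set instead of panicking
// on a nil interface value.
func (s *selectRequest) Close() error {
	if s.reader == nil {
		return nil
	}
	return s.reader.Close()
}

func main() {
	var req selectRequest
	fmt.Println(req.Open("")) // input cannot be parsed
	fmt.Println(req.Close())  // <nil>
}
```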
Harshavardhana
661b263e77
add gocritic/ruleguard checks back again, cleanup code. (#13665)
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2021-11-16 09:28:29 -08:00
Harshavardhana
ea820b30bf
fix: use equalFold() instead of lower and compare (#13624) 2021-11-10 08:12:50 -08:00
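Assuming `equalFold()` here refers to the standard library's `strings.EqualFold`, the change can be illustrated as below. `sameKeyword` is a hypothetical helper name. Both forms agree on the ASCII input shown, but `EqualFold` compares in place and does not build lowered copies of its arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// sameKeyword compares two strings case-insensitively without
// creating lowered copies first.
func sameKeyword(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	a, b := "BZIP2", "bzip2"
	// Lower-and-compare: may allocate a new string for each argument.
	fmt.Println(strings.ToLower(a) == strings.ToLower(b)) // true
	// EqualFold: same answer for this input, no intermediate strings.
	fmt.Println(sameKeyword(a, b)) // true
}
```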
Harshavardhana
34680c5ccf
fix: SQL select to honor limits properly for array queries (#13568)
added tests to cover the scenarios as well.
2021-11-02 19:14:46 -07:00
Klaus Post
c2eb60df4a
bz2: limit max concurrent CPU (#13458)
Ensure that bz2 decompression will never take more than 50% CPU.
2021-10-18 08:44:36 -07:00
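One way to keep concurrent decompression at or below half of the available CPUs is to cap the number of workers. The sketch below is a generic illustration under that assumption: `maxWorkers` and `processBlocks` are hypothetical names, and the commit may configure its bz2 library differently.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// maxWorkers caps concurrency at half the given CPU count, with a floor of one.
func maxWorkers(cpus int) int {
	n := cpus / 2
	if n < 1 {
		n = 1
	}
	return n
}

// processBlocks applies fn to every block with at most `workers` goroutines
// running at once, using a buffered channel as a semaphore.
func processBlocks(blocks []int, workers int, fn func(int) int) []int {
	out := make([]int, len(blocks))
	sem := make(chan struct{}, workers)
	var wg sync.WaitGroup
	for i, b := range blocks {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` goroutines are active
		go func(i, b int) {
			defer wg.Done()
			defer func() { <-sem }()
			out[i] = fn(b) // each goroutine writes its own index
		}(i, b)
	}
	wg.Wait()
	return out
}

func main() {
	workers := maxWorkers(runtime.GOMAXPROCS(0))
	squares := processBlocks([]int{1, 2, 3, 4}, workers, func(b int) int { return b * b })
	fmt.Println(squares) // [1 4 9 16]
}
```

On a single-CPU machine the floor of one worker necessarily exceeds 50%, which is the price of still making progress.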
Klaus Post
5e53f767c4
Use concurrent bz2 decompression (#13360)
Testing with `mc sql --compression BZIP2 --csv-input "rd=\n,fh=USE,fd=;" --query="select COUNT(*) from S3Object" local2/testbucket/nyc-taxi-data-10M.csv.bz2`

Before 96.98s, after 10.79s. Uses about 70% CPU while running.
2021-10-14 11:11:07 -07:00
Klaus Post
5a64003f6f
select: Return null for non-existent column indexes (#13196)
Fixes #13186
2021-09-13 09:13:25 -07:00
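The behavior in the title can be sketched as a bounds-checked lookup that yields null instead of an error. This is a minimal illustration assuming 1-based positional columns such as `_1` and `_2`. `column` is a hypothetical helper and a nil pointer stands in for SQL null.

```go
package main

import "fmt"

// column returns a pointer to the value at 1-based position idx,
// or nil (standing in for SQL null) when the record has no such column.
func column(record []string, idx int) *string {
	if idx < 1 || idx > len(record) {
		return nil
	}
	return &record[idx-1]
}

func main() {
	rec := []string{"alice", "42"}
	if v := column(rec, 2); v != nil {
		fmt.Println(*v) // 42
	}
	fmt.Println(column(rec, 5) == nil) // true
}
```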
Klaus Post
b2c92cdaaa
select: Add more compression formats (#13142)
Support Zstandard, LZ4, S2, and snappy as additional 
compression formats for S3 Select.
2021-09-06 09:09:53 -07:00
Harshavardhana
202d0b64eb
fix: enable go1.17 github ci/cd (#12997) 2021-08-18 18:35:22 -07:00
Harshavardhana
1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00