Commit Graph

41 Commits

Author SHA1 Message Date
Klaus Post 974cbb3bb7
Limit jstream parse depth (#20474)
Add https://github.com/bcicen/jstream/pull/15 by vendoring the package.

Sets JSON depth limit to 100 entries in S3 Select.
2024-09-23 12:35:41 -07:00
Harshavardhana 03e996320e
upgrade deps pkg/v3, madmin-go/v3 and lz4/v4 (#20467) 2024-09-21 17:33:43 -07:00
Klaus Post 8a30967542
Limit S3 Select JSON documents to 10MB (#20439)
Closes #20430

Limit allocations from badly formed documents.
2024-09-16 09:59:03 -07:00
Aditya Manthramurthy 8a11282522
[fix] S3Select: Add some missing input validation (#20278)
Prevents server panic when some CSV parameters are empty.
2024-08-20 11:31:45 -07:00
Aditya Manthramurthy 5f78691fcf
ldap: Add user DN attributes list config param (#19758)
This change uses the updated ldap library in minio/pkg (bumped
up to v3). A new config parameter is added for LDAP configuration to
specify extra user attributes to load from the LDAP server and to store
them as additional claims for the user.

A test is added in sts_handlers.go that shows how to access the LDAP
attributes as a claim.

This is in preparation for adding SSH pubkey authentication to MinIO's SFTP
integration.
2024-05-24 16:05:23 -07:00
Harshavardhana 53aa8f5650
use typos instead of codespell (#19088) 2024-02-21 22:26:06 -08:00
Harshavardhana cd419a35fe
simplify broker healthcheck by following kafka guidelines (#19082)
fixes #19081
2024-02-20 00:16:35 -08:00
Harshavardhana dd2542e96c
add codespell action (#18818)
Original work here, #18474,  refixed and updated.
2024-01-17 23:03:17 -08:00
jiuker a89e0bab7d
fix: s3 sql parse error for colums as with quotes (#18765) 2024-01-09 09:19:11 -08:00
Aditya Manthramurthy 496027b589
Fix precendence bug in S3Select SQL IN clauses (#18708)
Fixes a precendence issue in SQL Select where `a in b and c = 3` was parsed as `a
in (b and c = 3)`.

Fixes #18682
2023-12-22 23:19:11 -08:00
Harshavardhana 754f7a8a39
replace io.Discard usage to fix some NUMA copy() latencies (#18394)
replace io.Discard usage to fix NUMA copy() latencies

On NUMA systems copying from 8K buffer allocated via
io.Discard leads to large latency build-up for every

```
copy(new8kbuf, largebuf)
```

can in-cur upto 1ms worth of latencies on NUMA systems
due to memory sharding across NUMA nodes.
2023-11-06 14:26:08 -08:00
Aditya Manthramurthy 1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Harshavardhana e12ab486a2
avoid using os.Getenv for internal code, use env.Get() instead (#17688) 2023-07-20 07:52:49 -07:00
Klaus Post 6fe028b7c5
Revert s3 select simdjson reuse (#17310) 2023-05-30 10:02:22 -07:00
Klaus Post b06d7bf834
fix: leaking connections in JSON SQL with limited return (#17239) 2023-05-18 11:26:46 -07:00
ferhat elmas 714283fae2
cleanup ignored static analysis (#16767) 2023-03-06 08:56:10 -08:00
ferhat elmas 3423028713
cleanup Go linter settings (#16736) 2023-03-04 20:57:35 -08:00
jiuker e470268c7c
fix: a possible closer leak in SelectObjectHandler (#16598) 2023-02-17 01:44:40 -08:00
Klaus Post ff12080ff5
Remove deprecated io/ioutil (#15707) 2022-09-19 11:05:16 -07:00
Klaus Post c22f3ca7a8
fix: S3 Select CSV -> JSON with variable field count (#15677)
When there are fewer fields than expected, output fewer fields.
2022-09-12 17:00:59 -07:00
Abirdcfly d4e0f13bb3
chore: remove duplicate word in comments (#15607)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>

Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2022-08-30 08:26:43 -07:00
Harshavardhana 433b6fa8fe
upgrade golang-lint to the latest (#15600) 2022-08-26 12:52:29 -07:00
Harshavardhana d087e28dce
start using t.SetEnv instead of os.Setenv (#14787) 2022-04-23 15:33:45 -07:00
Aditya Manthramurthy e8e48e4c4a
S3 select switch to new parquet library and reduce locking (#14731)
- This change switches to a new parquet library
- SelectObjectContent now takes a single lock at the beginning and holds it
during the operation. Previously the operation took a lock every time the
parquet library performed a Seek on the underlying object stream.
- Add basic support for LogicalType annotations for timestamps.
2022-04-14 06:54:47 -07:00
Aditya Manthramurthy 79ba458051
fix: free up reader resources in S3Select properly (#14600) 2022-03-23 20:58:53 -07:00
Klaus Post c07af89e48
select: Add ScanRange to CSV&JSON (#14546)
Implements https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ScanRange

Fixes #14539
2022-03-14 09:48:36 -07:00
Klaus Post 88fd1cba71
select: add MISSING operator support (#14406)
Probably not full support, but for regular checks it should work.

Fixes #14358
2022-02-25 12:31:19 -08:00
Klaus Post 2cea944cdb
select: Allow lower case 'is' (#14405)
Ref: #14358
2022-02-24 09:10:48 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Klaus Post 91f72f25ab
select: Return early from bool AND, OR (#13914)
Return as soon as an AND fails and whenever an OR succeeds. Faster and more flexible.

For example makes `select * from S3object where _2 != '' AND _2 > 1` able to operate on empty fields.

Followup to #13900
2021-12-15 16:47:21 -08:00
Klaus Post a8d4042853
select: Add IS (NOT) operators (#13906)
Add `IS` and `IS NOT` as comparison operators.

This may be a bit wider than the S3 spec, but we can rather 
easily remove the forwarding.
2021-12-14 09:54:50 -08:00
Klaus Post d6fe0f61a9
do not panic when input cannot be parsed (#13791)
Fix cases where `s3Select.Open` fails and doesn't set the recordReader.

Fixes #13786
2021-11-30 08:42:42 -08:00
Harshavardhana 661b263e77
add gocritic/ruleguard checks back again, cleanup code. (#13665)
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2021-11-16 09:28:29 -08:00
Harshavardhana ea820b30bf
fix: use equalFold() instead of lower and compare (#13624) 2021-11-10 08:12:50 -08:00
Harshavardhana 34680c5ccf
fix: SQL select to honor limits properly for array queries (#13568)
added tests to cover the scenarios as well.
2021-11-02 19:14:46 -07:00
Klaus Post c2eb60df4a
bz2: limit max concurrent CPU (#13458)
Ensure that bz2 decompression will never take more than 50% CPU.
2021-10-18 08:44:36 -07:00
Klaus Post 5e53f767c4
Use concurrent bz2 decompression (#13360)
Testing with `mc sql --compression BZIP2 --csv-input "rd=\n,fh=USE,fd=;" --query="select COUNT(*) from S3Object" local2/testbucket/nyc-taxi-data-10M.csv.bz2`

Before 96.98s, after 10.79s. Uses about 70% CPU while running.
2021-10-14 11:11:07 -07:00
Klaus Post 5a64003f6f
select: Return null for non-exiting column indexes (#13196)
Fixes #13186
2021-09-13 09:13:25 -07:00
Klaus Post b2c92cdaaa
select: Add more compression formats (#13142)
Support Zstandard, LZ4, S2, and snappy as additional 
compression formats for S3 Select.
2021-09-06 09:09:53 -07:00
Harshavardhana 202d0b64eb
fix: enable go1.17 github ci/cd (#12997) 2021-08-18 18:35:22 -07:00
Harshavardhana 1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00