Klaus Post
adca28801d
feat: disable Parquet by default (breaking change) ( #9920 )
...
I have built a fuzz test and it crashes heavily in seconds and will OOM shortly after.
It seems like supporting Parquet is basically a completely open way to crash the
server if you can upload a file and run s3 select on it.
Until Parquet is more hardened it is DISABLED by default since hostile
crafted input can easily crash the server.
If you are in a controlled environment where it is safe to assume no hostile
content can be uploaded to your cluster you can safely enable Parquet.
To enable Parquet set the environment variable `MINIO_API_SELECT_PARQUET=on`
while starting the MinIO server.
Furthermore, we guard parquet by recover functions.
2020-08-18 10:23:28 -07:00
Leletir
db3f41fcb4
Doc: change url for Total Population CSV ( #8633 )
2019-12-11 14:37:48 -08:00
Klaus Post
ddea0bdf11
Concurrent CSV parsing and reduce S3 select allocations ( #8200 )
...
```
CSV parsing, BEFORE:
BenchmarkReaderBasic-12 2842 407533 ns/op 397860 B/op 957 allocs/op
BenchmarkReaderReplace-12 2718 429914 ns/op 397844 B/op 957 allocs/op
BenchmarkReaderReplaceTwo-12 2718 435556 ns/op 397855 B/op 957 allocs/op
BenchmarkAggregateCount_100K-12 171 6798974 ns/op 16667102 B/op 308077 allocs/op
BenchmarkAggregateCount_1M-12 19 65657411 ns/op 168057743 B/op 3146610 allocs/op
BenchmarkSelectAll_10M-12 1 20882119900 ns/op 2758799896 B/op 41978762 allocs/op
CSV parsing, AFTER:
BenchmarkReaderBasic-12 3721 312549 ns/op 101920 B/op 338 allocs/op
BenchmarkReaderReplace-12 3776 318810 ns/op 101993 B/op 340 allocs/op
BenchmarkReaderReplaceTwo-12 3610 330967 ns/op 102012 B/op 341 allocs/op
BenchmarkAggregateCount_100K-12 295 4149588 ns/op 3553623 B/op 103261 allocs/op
BenchmarkAggregateCount_1M-12 30 37746503 ns/op 33827931 B/op 1049435 allocs/op
BenchmarkSelectAll_10M-12 1 17608495800 ns/op 1416504040 B/op 21007082 allocs/op
~ benchcmp old.txt new.txt
benchmark old ns/op new ns/op delta
BenchmarkReaderBasic-12 407533 312549 -23.31%
BenchmarkReaderReplace-12 429914 318810 -25.84%
BenchmarkReaderReplaceTwo-12 435556 330967 -24.01%
BenchmarkAggregateCount_100K-12 6798974 4149588 -38.97%
BenchmarkAggregateCount_1M-12 65657411 37746503 -42.51%
BenchmarkSelectAll_10M-12 20882119900 17608495800 -15.68%
benchmark old allocs new allocs delta
BenchmarkReaderBasic-12 957 338 -64.68%
BenchmarkReaderReplace-12 957 340 -64.47%
BenchmarkReaderReplaceTwo-12 957 341 -64.37%
BenchmarkAggregateCount_100K-12 308077 103261 -66.48%
BenchmarkAggregateCount_1M-12 3146610 1049435 -66.65%
BenchmarkSelectAll_10M-12 41978762 21007082 -49.96%
benchmark old bytes new bytes delta
BenchmarkReaderBasic-12 397860 101920 -74.38%
BenchmarkReaderReplace-12 397844 101993 -74.36%
BenchmarkReaderReplaceTwo-12 397855 102012 -74.36%
BenchmarkAggregateCount_100K-12 16667102 3553623 -78.68%
BenchmarkAggregateCount_1M-12 168057743 33827931 -79.87%
BenchmarkSelectAll_10M-12 2758799896 1416504040 -48.66%
```
```
BenchmarkReaderHuge/97K-12 2200 540840 ns/op 184.32 MB/s 1604450 B/op 687 allocs/op
BenchmarkReaderHuge/194K-12 1522 752257 ns/op 265.04 MB/s 2143135 B/op 1335 allocs/op
BenchmarkReaderHuge/389K-12 1190 947858 ns/op 420.69 MB/s 3221831 B/op 2630 allocs/op
BenchmarkReaderHuge/778K-12 806 1472486 ns/op 541.61 MB/s 5201856 B/op 5187 allocs/op
BenchmarkReaderHuge/1557K-12 426 2575269 ns/op 619.36 MB/s 9101330 B/op 10233 allocs/op
BenchmarkReaderHuge/3115K-12 286 4034656 ns/op 790.66 MB/s 12397968 B/op 16099 allocs/op
BenchmarkReaderHuge/6230K-12 172 6830563 ns/op 934.05 MB/s 16008416 B/op 26844 allocs/op
BenchmarkReaderHuge/12461K-12 100 11409467 ns/op 1118.39 MB/s 22655163 B/op 48107 allocs/op
BenchmarkReaderHuge/24922K-12 66 19780395 ns/op 1290.19 MB/s 35158559 B/op 90216 allocs/op
BenchmarkReaderHuge/49844K-12 34 37282559 ns/op 1369.03 MB/s 60528624 B/op 174497 allocs/op
```
2019-09-13 14:18:35 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links ( #7494 )
2019-04-09 11:39:42 -07:00
Harshavardhana
a8cd70f3e5
Remove GPL go-lzo dependency for parquet-go ( #7220 )
...
Also remove any other unused dependencies
2019-02-11 14:57:24 +05:30
Aditya Manthramurthy
f04f8bbc78
Add support for Timestamp data type in SQL Select ( #7185 )
...
This change adds support for casting strings to Timestamp via CAST:
`CAST('2010T' AS TIMESTAMP)`
It also implements the following date-time functions:
- UTCNOW()
- DATE_ADD()
- DATE_DIFF()
- EXTRACT()
For values passed to these functions, date-types are automatically
inferred.
2019-02-04 20:54:45 -08:00
Harshavardhana
e005910051
Add more information in our select docs ( #7177 )
2019-02-01 11:34:56 -08:00
Aditya Manthramurthy
2786055df4
Add new SQL parser to support S3 Select syntax ( #7102 )
...
- New parser written from scratch, allows easier and complete parsing
of the full S3 Select SQL syntax. Parser definition is directly
provided by the AST defined for the SQL grammar.
- Bring support to parse and interpret SQL involving JSON path
expressions; evaluation of JSON path expressions will be
subsequently added.
- Bring automatic type inference and conversion for untyped
values (e.g. CSV data).
2019-01-28 17:59:48 -08:00
Harshavardhana
5a4a57700b
Add select docs and fix return values for Select API ( #6300 )
2018-08-17 17:11:39 -07:00
Harshavardhana
f5df3b4795
Remove select docs ( #6287 )
...
Select API is sufficiently documented, this doc is also incomplete.
- https://aws.amazon.com/blogs/aws/s3-glacier-select/
- https://aws.amazon.com/blogs/developer/introducing-support-for-amazon-s3-select-in-the-aws-sdk-for-ruby/
- https://aws.amazon.com/blogs/developer/introducing-support-for-amazon-s3-select-in-the-aws-sdk-for-javascript/
- https://aws.amazon.com/blogs/developer/category/storage/s3-select/
2018-08-15 19:47:22 -07:00
Arjun Mishra
7c14cdb60e
S3 Select API Support for CSV ( #6127 )
...
Add support for trivial where clause cases
2018-08-15 03:30:19 -07:00