minio

mirror of https://github.com/minio/minio.git synced 2025-07-20 14:01:14 -04:00

Author	SHA1	Message	Date
Klaus Post	bf3a97d3aa	S3 Select: Concurrent LINES delimited json parsing (#8610 ) The speedup is ~5x on a 6 core CPU	2019-12-09 06:55:31 -08:00
Klaus Post	f1e2e1cc9e	S3 Select: Mismatched types don't match (#8608 ) When comparing for equality, if types cannot be matched, they don't match.	2019-12-06 07:24:41 -08:00
Harshavardhana	5d3d57c12a	Start using error wrapping with fmt.Errorf (#8588 ) Use fatih/errwrap to fix all the code to use error wrapping with fmt.Errorf()	2019-12-02 09:28:01 -08:00
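A minimal sketch of error wrapping with `fmt.Errorf`, assuming the commit refers to the `%w` verb added in Go 1.13. The names `errObjectNotFound`, `statObject`, `readObject`, and `isNotFound` are hypothetical and do not come from the MinIO code base.

```go
package main

import (
	"errors"
	"fmt"
)

// errObjectNotFound is a hypothetical sentinel error used only in this sketch.
var errObjectNotFound = errors.New("object not found")

// statObject is a hypothetical lower-level call that always fails here.
func statObject(key string) error {
	return errObjectNotFound
}

// readObject adds context with %w, which keeps the original error in the chain.
func readObject(key string) error {
	if err := statObject(key); err != nil {
		return fmt.Errorf("read %q: %w", key, err)
	}
	return nil
}

// isNotFound reports whether err wraps errObjectNotFound anywhere in its chain.
func isNotFound(err error) bool {
	return errors.Is(err, errObjectNotFound)
}

func main() {
	err := readObject("bucket/data.csv")
	fmt.Println(err)             // the message includes the added context
	fmt.Println(isNotFound(err)) // the sentinel is still detectable
}
```

Wrapping with `%w` rather than `%v` lets callers keep using `errors.Is` and `errors.As` after context has been added.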
Klaus Post	1c90a6bd49	S3 Select: Convert CSV data to JSON (#8464 )	2019-11-09 09:10:35 -08:00
Klaus Post	26e760ee62	Fix JSON Close data race. (#8486 ) The JSON stream library has no safe way of aborting while ... Since we cannot expect the caller to safely handle "Read" and "Close" calls, we must handle this. Also, any Read error returned from upstream will crash the server. We preserve the errors and instead always return io.EOF upstream, but send the error on Close. `readahead v1.3.1` handles Read after Close better. Updates to `progressReader` are mostly to ensure safety. Fixes #8481	2019-11-05 14:20:37 -08:00
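The approach described in this commit message (keep the upstream error, return `io.EOF` from `Read`, and report the error from `Close`) could be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `guardedReader`, `failingReader`, and `readThenClose` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

// failingReader is a hypothetical upstream that always returns an error.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) {
	return 0, errors.New("upstream failure")
}

// guardedReader hides upstream read errors behind io.EOF and reports the
// stored error from Close. The mutex serializes Read and Close, so Close
// waits for an in-flight Read in this simplified version.
type guardedReader struct {
	mu     sync.Mutex
	r      io.Reader
	err    error
	closed bool
}

func (g *guardedReader) Read(p []byte) (int, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.closed || g.err != nil {
		return 0, io.EOF
	}
	n, err := g.r.Read(p)
	if err != nil && err != io.EOF {
		g.err = err // remember the real error for Close
		return n, io.EOF
	}
	return n, err
}

func (g *guardedReader) Close() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.closed = true
	return g.err
}

// readThenClose performs one Read and then Close on a failing upstream.
func readThenClose() (sawEOF bool, closeErr error) {
	g := &guardedReader{r: failingReader{}}
	_, err := g.Read(make([]byte, 8))
	return err == io.EOF, g.Close()
}

func main() {
	sawEOF, closeErr := readThenClose()
	fmt.Println(sawEOF, closeErr)
}
```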
Klaus Post	38e6d911ea	S3 Select: Detect full object (#8456 ) Check if select is `SELECT s.* from S3Object s` and forward it to All Fixes #8371 and makes this case run significantly faster.	2019-10-30 13:46:55 +05:30
Klaus Post	51456e6adc	Select: Support Square Bracket Lists (#8457 ) Allows for S3 compatible `SELECT * from s3object s WHERE id IN [3,2]` Fixes #8422	2019-10-30 11:34:40 +05:30
Harshavardhana	d48fd6fde9	Remove unused params and functions (#8399 )	2019-10-15 18:35:41 -07:00
Klaus Post	002ac82631	S3 Select: Add parser support for lists. (#8329 )	2019-10-06 07:52:45 -07:00
Klaus Post	c1a17c2561	S3 Select: Aggregate AVG/SUM as float (#8326 ) Force sum/average to be calculated as a float. As noted in #8221 > run SELECT AVG(CAST (Score as int)) FROM S3Object on ``` Name,Score alice,80 bob,81 ``` > AWS S3 gives 80.5 and MinIO gives 80. This also makes overflows much more unlikely.	2019-09-27 16:12:03 -07:00
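The difference reported in #8221 can be reproduced with plain Go arithmetic. The sketch below is only an illustration of integer versus float averaging; `avgInt` and `avgFloat` are hypothetical helpers, not functions from the S3 Select implementation.

```go
package main

import "fmt"

// avgInt averages with integer arithmetic, so the fractional part is dropped.
func avgInt(scores []int64) int64 {
	var sum int64
	for _, s := range scores {
		sum += s
	}
	return sum / int64(len(scores))
}

// avgFloat accumulates and divides as float64, keeping the fractional part.
func avgFloat(scores []int64) float64 {
	var sum float64
	for _, s := range scores {
		sum += float64(s)
	}
	return sum / float64(len(scores))
}

func main() {
	scores := []int64{80, 81} // the Score column from the example above
	fmt.Println(avgInt(scores))   // 80
	fmt.Println(avgFloat(scores)) // 80.5
}
```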
Klaus Post	1c5b05c130	S3 select: Fix output conversion on select * (#8303 ) Fixes #8268	2019-09-27 12:33:14 -07:00
Klaus Post	be313f1758	S3 Select: Workaround java buffer size (#8312 ) Updates #7475 The Java implementation has a 128KB buffer and a message must be emitted before that is used. #7475 therefore limits the message size to 128KB. But up to 256 bytes are written to the buffer in each call. This means we must emit a message while it is still shorter than 128KB. Therefore we change the limit to 128KB minus 256 bytes.	2019-09-26 04:56:20 +05:30
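The arithmetic behind the new limit, written out as Go constants. This assumes "128KB" means 128 KiB (131072 bytes), as stated in #7475; the constant names are hypothetical.

```go
package main

import "fmt"

const (
	// javaBufferSize is the client-side buffer size, assuming 128 KiB.
	javaBufferSize = 128 * 1024
	// maxWritePerCall is the most that one call may add to the buffer.
	maxWritePerCall = 256
	// maxRecordMessageSize leaves room for one more write before the
	// buffer would be exceeded.
	maxRecordMessageSize = javaBufferSize - maxWritePerCall
)

func main() {
	fmt.Println(maxRecordMessageSize) // 130816
}
```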
Klaus Post	520552ffa9	S3 select: flush when reaching limit (#8279 ) Add missing flush when reaching select limit.	2019-09-20 11:00:17 -07:00
Klaus Post	dac1cf5a9a	S3 Select: Parsing tweaks (#8261 ) * Don't output empty lines. * Trim whitespace from byte to int/float/bool conversions.	2019-09-17 17:21:23 -07:00
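A small sketch of why trimming matters for the byte-to-number conversions mentioned above: `strconv.ParseInt` rejects surrounding whitespace. `bytesToInt` is a hypothetical helper, not the function changed in the commit.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// bytesToInt trims surrounding whitespace before parsing a base-10 integer.
func bytesToInt(b []byte) (int64, error) {
	return strconv.ParseInt(string(bytes.TrimSpace(b)), 10, 64)
}

func main() {
	// Without trimming, the padded value is rejected.
	_, err := strconv.ParseInt(" 42 ", 10, 64)
	fmt.Println(err != nil)

	// With trimming, the same bytes parse cleanly.
	v, err := bytesToInt([]byte(" 42 "))
	fmt.Println(v, err)
}
```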
Klaus Post	c9b8bd8de2	S3 Select: optimize output (#8238 ) Queue output items and reuse them. Remove the unneeded type system in sql and just use the Go type system. In the best case this is more than an order of magnitude speedup:
```
BenchmarkSelectAll_1M-12 1 1841049400 ns/op 274299728 B/op 4198522 allocs/op
BenchmarkSelectAll_1M-12 14 84833400 ns/op 169228346 B/op 3146541 allocs/op
```
	2019-09-17 05:56:27 +05:30
Klaus Post	017456df63	Wait clearing the close channel (#8250 ) Close channel should not be nilled before goroutines have exited. Fixes potential hang on closing.	2019-09-16 16:18:01 -07:00
Klaus Post	ddea0bdf11	Concurrent CSV parsing and reduce S3 select allocations (#8200 )
```
CSV parsing, BEFORE:
BenchmarkReaderBasic-12 2842 407533 ns/op 397860 B/op 957 allocs/op
BenchmarkReaderReplace-12 2718 429914 ns/op 397844 B/op 957 allocs/op
BenchmarkReaderReplaceTwo-12 2718 435556 ns/op 397855 B/op 957 allocs/op
BenchmarkAggregateCount_100K-12 171 6798974 ns/op 16667102 B/op 308077 allocs/op
BenchmarkAggregateCount_1M-12 19 65657411 ns/op 168057743 B/op 3146610 allocs/op
BenchmarkSelectAll_10M-12 1 20882119900 ns/op 2758799896 B/op 41978762 allocs/op

CSV parsing, AFTER:
BenchmarkReaderBasic-12 3721 312549 ns/op 101920 B/op 338 allocs/op
BenchmarkReaderReplace-12 3776 318810 ns/op 101993 B/op 340 allocs/op
BenchmarkReaderReplaceTwo-12 3610 330967 ns/op 102012 B/op 341 allocs/op
BenchmarkAggregateCount_100K-12 295 4149588 ns/op 3553623 B/op 103261 allocs/op
BenchmarkAggregateCount_1M-12 30 37746503 ns/op 33827931 B/op 1049435 allocs/op
BenchmarkSelectAll_10M-12 1 17608495800 ns/op 1416504040 B/op 21007082 allocs/op

~ benchcmp old.txt new.txt
benchmark old ns/op new ns/op delta
BenchmarkReaderBasic-12 407533 312549 -23.31%
BenchmarkReaderReplace-12 429914 318810 -25.84%
BenchmarkReaderReplaceTwo-12 435556 330967 -24.01%
BenchmarkAggregateCount_100K-12 6798974 4149588 -38.97%
BenchmarkAggregateCount_1M-12 65657411 37746503 -42.51%
BenchmarkSelectAll_10M-12 20882119900 17608495800 -15.68%

benchmark old allocs new allocs delta
BenchmarkReaderBasic-12 957 338 -64.68%
BenchmarkReaderReplace-12 957 340 -64.47%
BenchmarkReaderReplaceTwo-12 957 341 -64.37%
BenchmarkAggregateCount_100K-12 308077 103261 -66.48%
BenchmarkAggregateCount_1M-12 3146610 1049435 -66.65%
BenchmarkSelectAll_10M-12 41978762 21007082 -49.96%

benchmark old bytes new bytes delta
BenchmarkReaderBasic-12 397860 101920 -74.38%
BenchmarkReaderReplace-12 397844 101993 -74.36%
BenchmarkReaderReplaceTwo-12 397855 102012 -74.36%
BenchmarkAggregateCount_100K-12 16667102 3553623 -78.68%
BenchmarkAggregateCount_1M-12 168057743 33827931 -79.87%
BenchmarkSelectAll_10M-12 2758799896 1416504040 -48.66%
```
```
BenchmarkReaderHuge/97K-12 2200 540840 ns/op 184.32 MB/s 1604450 B/op 687 allocs/op
BenchmarkReaderHuge/194K-12 1522 752257 ns/op 265.04 MB/s 2143135 B/op 1335 allocs/op
BenchmarkReaderHuge/389K-12 1190 947858 ns/op 420.69 MB/s 3221831 B/op 2630 allocs/op
BenchmarkReaderHuge/778K-12 806 1472486 ns/op 541.61 MB/s 5201856 B/op 5187 allocs/op
BenchmarkReaderHuge/1557K-12 426 2575269 ns/op 619.36 MB/s 9101330 B/op 10233 allocs/op
BenchmarkReaderHuge/3115K-12 286 4034656 ns/op 790.66 MB/s 12397968 B/op 16099 allocs/op
BenchmarkReaderHuge/6230K-12 172 6830563 ns/op 934.05 MB/s 16008416 B/op 26844 allocs/op
BenchmarkReaderHuge/12461K-12 100 11409467 ns/op 1118.39 MB/s 22655163 B/op 48107 allocs/op
BenchmarkReaderHuge/24922K-12 66 19780395 ns/op 1290.19 MB/s 35158559 B/op 90216 allocs/op
BenchmarkReaderHuge/49844K-12 34 37282559 ns/op 1369.03 MB/s 60528624 B/op 174497 allocs/op
```
	2019-09-13 14:18:35 -07:00
Yao Zongyou	18fedc67d5	friendly prompt for s3select MalformedXML error (#8171 ) partly fix #7911	2019-09-09 21:33:27 -07:00
Yao Zongyou	ec9bfd3aef	speed up the performance of s3select on csv (#7945 )	2019-08-31 00:07:40 -07:00
Kanagaraj M	12353caf35	Fix: Support Unicode delimiters in s3 select (#7931 )	2019-07-17 19:10:17 +01:00
Yao Zongyou	c4f480a839	fix csv read bug (#7885 )	2019-07-05 12:08:56 -07:00
Yao Zongyou	60831e3299	aggregation functions' argument may already have been cast to numeric (#7876 )	2019-07-05 10:38:38 -07:00
Yao Zongyou	037319066f	fix unicode support related bugs in s3select (#7877 )	2019-07-05 09:43:10 -07:00
Ryan Tam	bd56f80250	Fix ignored alias for aggregate result in S3 Select (#7849 ) The SQL parser as it stands right now ignores the alias for an aggregate result, e.g. `SELECT COUNT(*) AS thing FROM s3object` doesn't actually return a record like `{"thing": 42}`, it returns a record like `{"_1": 42}`. Column aliases for aggregate results are supported in AWS's S3 Select, so this commit fixes that by respecting the `expr.As` in the expression. Also improve tests for S3 Select. On top of testing a simple `SELECT` query, we want to test a few more "advanced" queries (e.g. aggregation). Convert existing tests into table driven tests[1], and add the new test cases with "advanced" queries into them. [1] - https://github.com/golang/go/wiki/TableDrivenTests	2019-07-03 16:34:54 -07:00
Yao Zongyou	941fed8e4a	s3Select: call Close on error to release the read lock (#7830 )	2019-06-25 13:30:48 -07:00
Yao Zongyou	55092bede1	add timestamp compare support (#7832 )	2019-06-25 11:05:37 -07:00
Yao Zongyou	90a3b830f4	fix typo and the string representation of the time.Time value (#7831 )	2019-06-25 09:54:14 -07:00
Yao Zongyou	23b9df0694	Fix s3select TRIM function's nil pointer dereference bug (#7817 )	2019-06-24 16:59:33 -07:00
Joe Stevens	a19cf063b5	Fixes for multiplatform dev and testing from forks (#7734 ) Add support for correct dependency URLs on all platforms; only build mountinfo.go on linux; make testfile path relative to support fork work.	2019-06-04 00:59:40 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00
Aditya Manthramurthy	b1b1d77893	Set S3 Select record message length to 128KiB (#7475 ) - Previously this limit was a little more than 1MiB, and it broke compatibility with AWS SDK Java causing a buffer overflow error.	2019-04-04 00:41:52 -07:00
Kirill Motkov	3d29ab4059	Rewrite if-else chains to switch statements (#7382 )	2019-03-18 07:46:20 -07:00
Harshavardhana	91d85a0d53	Fix stale locks held by SelectParquet API (#7364 ) Vendorize upstream parquet-go to fix this issue.	2019-03-13 20:33:18 -07:00
Aditya Manthramurthy	e463386921	Add JSON Path expression evaluation support (#7315 ) - Includes support for FROM clause JSON path	2019-03-09 08:13:37 -08:00
Aditya Manthramurthy	f4879ed96d	Use jstream to serialize records to JSON format in S3Select (#7318 ) - Also, switch to jstream to generate internal record representation from CSV/JSON readers - This fixes a bug in which JSON output objects have their keys reversed from the order they are specified in the Select columns. - Also includes a fix for tests.	2019-03-07 00:20:10 -08:00
Harshavardhana	2520e535a0	Allow lazyQuotes for certain types of CSV (#7278 ) Set lazyQuotes to true, to allow a quote to appear in an unquoted field and a non-doubled quote to appear in a quoted field.	2019-02-24 06:51:02 -08:00
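The effect of lazy quotes can be seen with the standard library's `encoding/csv` reader, whose `LazyQuotes` field has the semantics described above. Whether MinIO's reader uses this exact package at this commit is an assumption, and `parseLine` is a hypothetical helper.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLine reads a single CSV record from line, optionally with LazyQuotes.
func parseLine(line string, lazy bool) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.LazyQuotes = lazy
	return r.Read()
}

func main() {
	line := `a,b"c,d` // a bare quote inside an unquoted field

	// Strict mode rejects the bare quote.
	_, err := parseLine(line, false)
	fmt.Println("strict error:", err != nil)

	// Lazy mode keeps the quote as part of the field.
	rec, err := parseLine(line, true)
	fmt.Println(rec, err)
}
```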
Aditya Manthramurthy	80a351633f	Update vendorized bcicen/jstream (#7257 ) - Includes an error handling fix that is waiting to be merged upstream - Uses order-preserving (un)marshalling for JSON objects.	2019-02-20 23:59:23 -08:00
Aditya Manthramurthy	8a405cab2f	COUNT() function in select should return an int (#7243 )	2019-02-13 16:32:59 -08:00
Harshavardhana	df35d7db9d	Introduce staticcheck for stricter builds (#7035 )	2019-02-13 18:29:36 +05:30
Aditya Manthramurthy	ee5b3622a5	Evaluate where clause in aggregation queries (#7235 )	2019-02-12 13:54:26 -08:00
Harshavardhana	85e939636f	Fix JSON parser handling for certain objects (#7162 ) This PR also adds some comments and simplifies the code. The primary change ensures that the cached buffer is honored. Unit tests are added as well. Fixes #7141	2019-02-07 08:04:42 +05:30
Aditya Manthramurthy	4aa9ee153b	Fix S3 Select request XML parsing (#7202 )	2019-02-06 13:25:52 -08:00
Aditya Manthramurthy	fd4e15c116	Flush the records staging buffer periodically (#7193 ) - Staging buffer is flushed every 500ms. In cases where the result records are slowly generated (e.g. when a where condition matches very few records), this change causes the server to send results even though the staging buffer is not full. - Refactor messageWriter code to use simpler channel based co-ordination instead of atomic variables.	2019-02-06 16:03:05 +05:30
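A rough sketch of channel-based coordination for a staging buffer that is flushed either when full or on a periodic tick, as the commit message describes. It is not the `messageWriter` code itself: `stageAndFlush`, its parameters, and the 128 KiB limit used in `main` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// stageAndFlush collects records into a staging buffer. It calls flush when
// the buffer reaches limit bytes, when tick fires with data pending, and once
// more when records is closed.
func stageAndFlush(records <-chan []byte, tick <-chan time.Time, limit int, flush func([]byte)) {
	var buf []byte
	emit := func() {
		if len(buf) > 0 {
			flush(buf)
			buf = nil // start a fresh buffer; flush may keep the old slice
		}
	}
	for {
		select {
		case rec, ok := <-records:
			if !ok {
				emit()
				return
			}
			buf = append(buf, rec...)
			if len(buf) >= limit {
				emit()
			}
		case <-tick:
			emit()
		}
	}
}

func main() {
	records := make(chan []byte)
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()

	done := make(chan struct{})
	go func() {
		defer close(done)
		stageAndFlush(records, ticker.C, 128*1024, func(b []byte) {
			fmt.Printf("flushed %d bytes\n", len(b))
		})
	}()

	records <- []byte("row-1\n")
	time.Sleep(600 * time.Millisecond) // the tick flushes the lone record
	records <- []byte("row-2\n")
	close(records) // the remainder is flushed on close
	<-done
}
```

A single goroutine owns the buffer, so no atomics or locks are needed; the `select` decides which event is handled next.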
Aditya Manthramurthy	f04f8bbc78	Add support for Timestamp data type in SQL Select (#7185 ) This change adds support for casting strings to Timestamp via CAST: `CAST('2010T' AS TIMESTAMP)` It also implements the following date-time functions: - UTCNOW() - DATE_ADD() - DATE_DIFF() - EXTRACT() For values passed to these functions, date-types are automatically inferred.	2019-02-04 20:54:45 -08:00
Aditya Manthramurthy	91c839ad28	Use a buffer to collect SQL Select result rows (#7158 ) Batching records into a single SQL Select message in the response leads to significant speed up as the message header overhead is made negligible. This change leads to a speed up of 3-5x for queries that select many small records.	2019-01-28 20:00:18 -08:00
Aditya Manthramurthy	2786055df4	Add new SQL parser to support S3 Select syntax (#7102 ) - New parser written from scratch, allows easier and complete parsing of the full S3 Select SQL syntax. Parser definition is directly provided by the AST defined for the SQL grammar. - Bring support to parse and interpret SQL involving JSON path expressions; evaluation of JSON path expressions will be subsequently added. - Bring automatic type inference and conversion for untyped values (e.g. CSV data).	2019-01-28 17:59:48 -08:00
Bala FA	e23a42305c	Rebase minio/parquet-go and fix null handling. (#7067 )	2019-01-16 21:52:04 +05:30
Bala FA	b0deea27df	Refactor s3select to support parquet. (#7023 ) Also handle pretty formatted JSON documents.	2019-01-08 16:53:04 -08:00
Aditya Manthramurthy	2aeb3fbe86	Fix csv output delimiter bug (#6994 )	2018-12-19 11:49:06 +05:30
Harshavardhana	4c7c571875	Support JSON to CSV and CSV to JSON output format conversion (#6910 ) This PR implements one of the pending items in issue #6286. In the S3 API, a user can request CSV output for a JSON document and JSON output for a CSV document. This PR refactors the code a little to bring in this feature.	2018-12-07 14:55:32 -08:00


63 Commits