minio

mirror of https://github.com/minio/minio.git synced 2025-11-24 19:46:16 -05:00

Author	SHA1	Message	Date
Yao Zongyou	60831e3299	aggregation functions' argument may already have been cast to numeric (#7876)	2019-07-05 10:38:38 -07:00
Yao Zongyou	037319066f	fix unicode support related bugs in s3select (#7877)	2019-07-05 09:43:10 -07:00
Ryan Tam	bd56f80250	Fix ignored alias for aggregate result in S3 Select (#7849) The SQL parser as it stands right now ignores the alias for an aggregate result, e.g. `SELECT COUNT(*) AS thing FROM s3object` doesn't return a record like `{"thing": 42}`; it returns a record like `{"_1": 42}`. Column aliases for aggregate results are supported in AWS's S3 Select, so this commit fixes that by respecting `expr.As` in the expression. Also improve the tests for S3 Select: on top of testing a simple `SELECT` query, we want to test a few more "advanced" queries (e.g. aggregation). Convert existing tests into table-driven tests [1], and add the new test cases with "advanced" queries to them. [1] - https://github.com/golang/go/wiki/TableDrivenTests	2019-07-03 16:34:54 -07:00
Yao Zongyou	941fed8e4a	s3Select: call Close on error to release the read lock (#7830)	2019-06-25 13:30:48 -07:00
Yao Zongyou	55092bede1	add timestamp compare support (#7832)	2019-06-25 11:05:37 -07:00
Yao Zongyou	90a3b830f4	fix typo and the string representation of the time.Time value (#7831)	2019-06-25 09:54:14 -07:00
Yao Zongyou	23b9df0694	Fix s3select TRIM function's nil pointer dereference bug (#7817)	2019-06-24 16:59:33 -07:00
Joe Stevens	a19cf063b5	Fixes for multiplatform dev and testing from forks (#7734) Add support for correct dependency URLs on all platforms; only build mountinfo.go on Linux; make testfile path relative to support fork work	2019-06-04 00:59:40 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494)	2019-04-09 11:39:42 -07:00
Aditya Manthramurthy	b1b1d77893	Set S3 Select record message length to 128KiB (#7475) - Previously this limit was a little more than 1MiB, and it broke compatibility with AWS SDK Java causing a buffer overflow error.	2019-04-04 00:41:52 -07:00
Kirill Motkov	3d29ab4059	Rewrite if-else chains to switch statements (#7382)	2019-03-18 07:46:20 -07:00
Harshavardhana	91d85a0d53	Fix stale locks held by SelectParquet API (#7364) Vendorize upstream parquet-go to fix this issue.	2019-03-13 20:33:18 -07:00
Aditya Manthramurthy	e463386921	Add JSON Path expression evaluation support (#7315) - Includes support for FROM clause JSON path	2019-03-09 08:13:37 -08:00
Aditya Manthramurthy	f4879ed96d	Use jstream to serialize records to JSON format in S3Select (#7318) - Also, switch to jstream to generate internal record representation from CSV/JSON readers - This fixes a bug in which JSON output objects have their keys reversed from the order they are specified in the Select columns. - Also includes a fix for tests.	2019-03-07 00:20:10 -08:00
Harshavardhana	2520e535a0	Allow lazyQuotes for certain types of CSV (#7278) Set lazyQuotes to true, to allow a quote to appear in an unquoted field and a non-doubled quote to appear in a quoted field.	2019-02-24 06:51:02 -08:00
Aditya Manthramurthy	80a351633f	Update vendorized bcicen/jstream (#7257) - Includes an error handling fix that is waiting to be merged upstream - Uses order-preserving (un)marshalling for JSON objects.	2019-02-20 23:59:23 -08:00
Aditya Manthramurthy	8a405cab2f	COUNT() function in select should return an int (#7243)	2019-02-13 16:32:59 -08:00
Harshavardhana	df35d7db9d	Introduce staticcheck for stricter builds (#7035)	2019-02-13 18:29:36 +05:30
Aditya Manthramurthy	ee5b3622a5	Evaluate where clause in aggregation queries (#7235)	2019-02-12 13:54:26 -08:00
Harshavardhana	85e939636f	Fix JSON parser handling for certain objects (#7162) This PR also adds some comments and simplifies the code. The primary change ensures that we honor the cached buffer. Added unit tests as well. Fixes #7141	2019-02-07 08:04:42 +05:30
Aditya Manthramurthy	4aa9ee153b	Fix S3 Select request XML parsing (#7202)	2019-02-06 13:25:52 -08:00
Aditya Manthramurthy	fd4e15c116	Flush the records staging buffer periodically (#7193) - Staging buffer is flushed every 500ms. In cases where the result records are slowly generated (e.g. when a where condition matches very few records), this change causes the server to send results even though the staging buffer is not full. - Refactor messageWriter code to use simpler channel based co-ordination instead of atomic variables.	2019-02-06 16:03:05 +05:30
Aditya Manthramurthy	f04f8bbc78	Add support for Timestamp data type in SQL Select (#7185) This change adds support for casting strings to Timestamp via CAST: `CAST('2010T' AS TIMESTAMP)` It also implements the following date-time functions: - UTCNOW() - DATE_ADD() - DATE_DIFF() - EXTRACT() For values passed to these functions, date-types are automatically inferred.	2019-02-04 20:54:45 -08:00
Aditya Manthramurthy	91c839ad28	Use a buffer to collect SQL Select result rows (#7158) Batching records into a single SQL Select message in the response leads to a significant speedup, as the message header overhead is made negligible. This change leads to a speedup of 3-5x for queries that select many small records.	2019-01-28 20:00:18 -08:00
Aditya Manthramurthy	2786055df4	Add new SQL parser to support S3 Select syntax (#7102) - New parser written from scratch, allows easier and complete parsing of the full S3 Select SQL syntax. Parser definition is directly provided by the AST defined for the SQL grammar. - Bring support to parse and interpret SQL involving JSON path expressions; evaluation of JSON path expressions will be subsequently added. - Bring automatic type inference and conversion for untyped values (e.g. CSV data).	2019-01-28 17:59:48 -08:00
Bala FA	e23a42305c	Rebase minio/parquet-go and fix null handling. (#7067)	2019-01-16 21:52:04 +05:30
Bala FA	b0deea27df	Refactor s3select to support parquet. (#7023) Also handle pretty formatted JSON documents.	2019-01-08 16:53:04 -08:00
Aditya Manthramurthy	2aeb3fbe86	Fix csv output delimiter bug (#6994)	2018-12-19 11:49:06 +05:30
Harshavardhana	4c7c571875	Support JSON to CSV and CSV to JSON output format conversion (#6910) This PR implements one of the pending items in issue #6286: in the S3 API, a user can request CSV output for a JSON document and JSON output for a CSV document. This PR refactors the code a little to bring in this feature.	2018-12-07 14:55:32 -08:00
Harshavardhana	272b8003d6	Honor header only when requested for use (#6815)	2018-11-16 10:27:48 -08:00
Harshavardhana	7e1661f4fa	Performance improvements to SELECT API on certain query operations (#6752) This improves the performance of certain queries dramatically, such as 'count(*)' etc. Without this PR ``` ~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz 2173762 real 0m42.464s user 0m0.071s sys 0m0.010s ``` With this PR ``` ~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz 2173762 real 0m17.603s user 0m0.093s sys 0m0.008s ``` Roughly a 2.4x speedup (42.464s down to 17.603s). This PR avoids a lot of type conversions and instead relies on raw sequences of data and interprets them lazily. ``` benchcmp old new benchmark old ns/op new ns/op delta BenchmarkSQLAggregate_100K-4 551213 259782 -52.87% BenchmarkSQLAggregate_1M-4 6981901985 2432413729 -65.16% BenchmarkSQLAggregate_2M-4 13511978488 4536903552 -66.42% BenchmarkSQLAggregate_10M-4 68427084908 23266283336 -66.00% benchmark old allocs new allocs delta BenchmarkSQLAggregate_100K-4 2366 485 -79.50% BenchmarkSQLAggregate_1M-4 47455492 21462860 -54.77% BenchmarkSQLAggregate_2M-4 95163637 43110771 -54.70% BenchmarkSQLAggregate_10M-4 476959550 216906510 -54.52% benchmark old bytes new bytes delta BenchmarkSQLAggregate_100K-4 1233079 1086024 -11.93% BenchmarkSQLAggregate_1M-4 2607984120 557038536 -78.64% BenchmarkSQLAggregate_2M-4 5254103616 1128149168 -78.53% BenchmarkSQLAggregate_10M-4 26443524872 5722715992 -78.36% ```	2018-11-14 15:55:10 -08:00
Harshavardhana	f162d7bd97	Performance improvements by re-using record buffer (#6622) Avoid unnecessary pointer reference allocations, for example - SelectFuncs{} - Row{}	2018-10-31 08:48:01 +05:30
Ashish Kumar Sinha	c0b4bf0a3e	SQL select query for CSV/JSON (#6648) `select *` and selecting by column name have been implemented for CSV. `select *` is implemented for JSON.	2018-10-22 12:12:22 -07:00
Praveen raj Mani	cef044178c	Treat columns with spaces in between [s3Select] (#6597) Replace the double/single quotes with backticks so that xwb1989/sqlparser recognises such queries. Fixes #6589	2018-10-17 11:01:26 -07:00
Aditya Manthramurthy	e3eec89d24	Optimize string processing in select (#6593) Reduce allocations during string concatenation and simplify some processing code.	2018-10-09 14:02:19 -07:00
Aditya Manthramurthy	16a100b597	Fix out-of-bound array access crash in select processing (#6594) Fix test case.	2018-10-09 09:45:56 -07:00
Ashish Kumar Sinha	670f9788e3	Count(*) to give integer value (#6564) The Max and Min functions were returning float values even when the inputs were integers. Max and Min now return integers in that scenario. Fixes #6472	2018-10-04 17:33:53 -07:00
Harshavardhana	a0683d3c1f	Send progress only when requested by client in SelectObject (#6467)	2018-09-17 11:52:46 +05:30
Praveen raj Mani	30d4a2cf53	s3select should honour custom record delimiter (#6419) Allow custom delimiters like `\r\n`, `a`, `\r` etc. in input CSV and replace them with `\n`. Fixes #6403	2018-09-10 21:50:28 +05:30
Raphael Randschau	8601f29d95	select: fix int overflow of math.MaxInt64 on ARM (#6317)	2018-08-22 16:16:04 +05:30
Harshavardhana	5a4a57700b	Add select docs and fix return values for Select API (#6300)	2018-08-17 17:11:39 -07:00
Arjun Mishra	7c14cdb60e	S3 Select API Support for CSV (#6127) Add support for trivial where clause cases	2018-08-15 03:30:19 -07:00

42 Commits