The JSON stream library has no safe way of aborting while
Since we cannot expect the called to safely handle "Read" and "Close" calls we must handle this.
Also any Read error returned from upstream will crash the server. We preserve the errors and instead always return io.EOF upstream, but send the error on Close.
`readahead v1.3.1` handles Read after Close better.
Updates to `progressReader` is mostly to ensure safety.
Fixes#8481
Force sum/average to be calculated as a float.
As noted in #8221
> run SELECT AVG(CAST (Score as int)) FROM S3Object on
```
Name,Score
alice,80
bob,81
```
> AWS S3 gives 80.5 and MinIO gives 80.
This also makes overflows much more unlikely.
Updates #7475
The Java implementation has a 128KB buffer and a message must be emitted before that is used. #7475 therefore limits the message size to 128KB. But up to 256 bytes are written to the buffer in each call. This means we must emit a message before shorter than 128KB.
Therefore we change the limit to 128KB minus 256 bytes.
Queue output items and reuse them.
Remove the unneeded type system in sql and just use the Go type system.
In best case this is more than an order of magnitude speedup:
```
BenchmarkSelectAll_1M-12 1 1841049400 ns/op 274299728 B/op 4198522 allocs/op
BenchmarkSelectAll_1M-12 14 84833400 ns/op 169228346 B/op 3146541 allocs/op
```
The SQL parser as it stands right now ignores alias for aggregate
result, e.g. `SELECT COUNT(*) AS thing FROM s3object` doesn't actually
return record like `{"thing": 42}`, it returns a record like `{"_1": 42}`.
Column alias for aggregate result is supported in AWS's S3 Select, so
this commit fixes that by respecting the `expr.As` in the expression.
Also improve test for S3 select
On top of testing a simple `SELECT` query, we want to test a few more
"advanced" queries (e.g. aggregation).
Convert existing tests into table driven tests[1], and add the new test
cases with "advanced" queries into them.
[1] - https://github.com/golang/go/wiki/TableDrivenTests
- Also, switch to jstream to generate internal record representation
from CSV/JSON readers
- This fixes a bug in which JSON output objects have their keys
reversed from the order they are specified in the Select columns.
- Also includes a fix for tests.
This PR also adds some comments and simplifies
the code. Primary handling is done to ensure
that we make sure to honor cached buffer.
Added unit tests as well
Fixes#7141
- Staging buffer is flushed every 500ms. In cases where the result
records are slowly generated (e.g. when a where condition
matches very few records), this change causes the server to send
results even though the staging buffer is not full.
- Refactor messageWriter code to use simpler channel based
co-ordination instead of atomic variables.
This change adds support for casting strings to Timestamp via CAST:
`CAST('2010T' AS TIMESTAMP)`
It also implements the following date-time functions:
- UTCNOW()
- DATE_ADD()
- DATE_DIFF()
- EXTRACT()
For values passed to these functions, date-types are automatically
inferred.
Batching records into a single SQL Select message in the response
leads to significant speed up as the message header overhead is made
negligible.
This change leads to a speed up of 3-5x for queries that select many
small records.
- New parser written from scratch, allows easier and complete parsing
of the full S3 Select SQL syntax. Parser definition is directly
provided by the AST defined for the SQL grammar.
- Bring support to parse and interpret SQL involving JSON path
expressions; evaluation of JSON path expressions will be
subsequently added.
- Bring automatic type inference and conversion for untyped
values (e.g. CSV data).
This PR implements one of the pending items in issue #6286
in S3 API a user can request CSV output for a JSON document
and a JSON output for a CSV document. This PR refactors
the code a little bit to bring this feature.