This improves the performance of certain queries dramatically, such as 'count(*)' etc. Without this PR ``` ~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz 2173762 real 0m42.464s user 0m0.071s sys 0m0.010s ``` With this PR ``` ~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz 2173762 real 0m17.603s user 0m0.093s sys 0m0.008s ``` Almost a 250% improvement in performance. This PR avoids a lot of type conversions and instead relies on raw sequences of data and interprets them lazily. ``` benchcmp old new benchmark old ns/op new ns/op delta BenchmarkSQLAggregate_100K-4 551213 259782 -52.87% BenchmarkSQLAggregate_1M-4 6981901985 2432413729 -65.16% BenchmarkSQLAggregate_2M-4 13511978488 4536903552 -66.42% BenchmarkSQLAggregate_10M-4 68427084908 23266283336 -66.00% benchmark old allocs new allocs delta BenchmarkSQLAggregate_100K-4 2366 485 -79.50% BenchmarkSQLAggregate_1M-4 47455492 21462860 -54.77% BenchmarkSQLAggregate_2M-4 95163637 43110771 -54.70% BenchmarkSQLAggregate_10M-4 476959550 216906510 -54.52% benchmark old bytes new bytes delta BenchmarkSQLAggregate_100K-4 1233079 1086024 -11.93% BenchmarkSQLAggregate_1M-4 2607984120 557038536 -78.64% BenchmarkSQLAggregate_2M-4 5254103616 1128149168 -78.53% BenchmarkSQLAggregate_10M-4 26443524872 5722715992 -78.36% ```
11 KiB
get json values quickly
GJSON is a Go package that provides a fast and simple way to get values from a json document. It has features such as one line retrieval, dot notation paths, iteration, and parsing json lines.
Also check out SJSON for modifying json, and the JJ command line tool.
Getting Started
Installing
To start using GJSON, install Go and run go get
:
$ go get -u github.com/tidwall/gjson
This will retrieve the library.
Get a value
Get searches json for the specified path. A path is in dot syntax, such as "name.last" or "age". When the value is found it's returned immediately.
package main
import "github.com/tidwall/gjson"
const json = `{"name":{"first":"Janet","last":"Prichard"},"age":47}`
func main() {
value := gjson.Get(json, "name.last")
println(value.String())
}
This will print:
Prichard
There's also the GetMany function to get multiple values at once, and GetBytes for working with JSON byte slices.
Path Syntax
A path is a series of keys separated by a dot. A key may contain special wildcard characters '*' and '?'. To access an array value use the index as the key. To get the number of elements in an array or to access a child path, use the '#' character. The dot and wildcard characters can be escaped with '\'.
{
"name": {"first": "Tom", "last": "Anderson"},
"age":37,
"children": ["Sara","Alex","Jack"],
"fav.movie": "Deer Hunter",
"friends": [
{"first": "Dale", "last": "Murphy", "age": 44},
{"first": "Roger", "last": "Craig", "age": 68},
{"first": "Jane", "last": "Murphy", "age": 47}
]
}
"name.last" >> "Anderson"
"age" >> 37
"children" >> ["Sara","Alex","Jack"]
"children.#" >> 3
"children.1" >> "Alex"
"child*.2" >> "Jack"
"c?ildren.0" >> "Sara"
"fav\.movie" >> "Deer Hunter"
"friends.#.first" >> ["Dale","Roger","Jane"]
"friends.1.last" >> "Craig"
You can also query an array for the first match by using #[...]
, or find all matches with #[...]#
.
Queries support the ==
, !=
, <
, <=
, >
, >=
comparison operators and the simple pattern matching %
(like) and !%
(not like) operators.
friends.#[last=="Murphy"].first >> "Dale"
friends.#[last=="Murphy"]#.first >> ["Dale","Jane"]
friends.#[age>45]#.last >> ["Craig","Murphy"]
friends.#[first%"D*"].last >> "Murphy"
friends.#[first!%"D*"].last >> "Craig"
JSON Lines
There's support for JSON Lines using the ..
prefix, which treats a multilined document as an array.
For example:
{"name": "Gilbert", "age": 61}
{"name": "Alexa", "age": 34}
{"name": "May", "age": 57}
{"name": "Deloise", "age": 44}
..# >> 4
..1 >> {"name": "Alexa", "age": 34}
..3 >> {"name": "Deloise", "age": 44}
..#.name >> ["Gilbert","Alexa","May","Deloise"]
..#[name="May"].age >> 57
The ForEachLines
function will iterate through JSON lines.
gjson.ForEachLine(json, func(line gjson.Result) bool{
println(line.String())
return true
})
Result Type
GJSON supports the json types string
, number
, bool
, and null
.
Arrays and Objects are returned as their raw json types.
The Result
type holds one of these:
bool, for JSON booleans
float64, for JSON numbers
string, for JSON string literals
nil, for JSON null
To directly access the value:
result.Type // can be String, Number, True, False, Null, or JSON
result.Str // holds the string
result.Num // holds the float64 number
result.Raw // holds the raw json
result.Index // index of raw value in original json, zero means index unknown
There are a variety of handy functions that work on a result:
result.Exists() bool
result.Value() interface{}
result.Int() int64
result.Uint() uint64
result.Float() float64
result.String() string
result.Bool() bool
result.Time() time.Time
result.Array() []gjson.Result
result.Map() map[string]gjson.Result
result.Get(path string) Result
result.ForEach(iterator func(key, value Result) bool)
result.Less(token Result, caseSensitive bool) bool
The result.Value()
function returns an interface{}
which requires type assertion and is one of the following Go types:
The result.Array()
function returns back an array of values.
If the result represents a non-existent value, then an empty array will be returned.
If the result is not a JSON array, the return value will be an array containing one result.
boolean >> bool
number >> float64
string >> string
null >> nil
array >> []interface{}
object >> map[string]interface{}
64-bit integers
The result.Int()
and result.Uint()
calls are capable of reading all 64 bits, allowing for large JSON integers.
result.Int() int64 // -9223372036854775808 to 9223372036854775807
result.Uint() int64 // 0 to 18446744073709551615
Get nested array values
Suppose you want all the last names from the following json:
{
"programmers": [
{
"firstName": "Janet",
"lastName": "McLaughlin",
}, {
"firstName": "Elliotte",
"lastName": "Hunter",
}, {
"firstName": "Jason",
"lastName": "Harold",
}
]
}
You would use the path "programmers.#.lastName" like such:
result := gjson.Get(json, "programmers.#.lastName")
for _, name := range result.Array() {
println(name.String())
}
You can also query an object inside an array:
name := gjson.Get(json, `programmers.#[lastName="Hunter"].firstName`)
println(name.String()) // prints "Elliotte"
Iterate through an object or array
The ForEach
function allows for quickly iterating through an object or array.
The key and value are passed to the iterator function for objects.
Only the value is passed for arrays.
Returning false
from an iterator will stop iteration.
result := gjson.Get(json, "programmers")
result.ForEach(func(key, value gjson.Result) bool {
println(value.String())
return true // keep iterating
})
Simple Parse and Get
There's a Parse(json)
function that will do a simple parse, and result.Get(path)
that will search a result.
For example, all of these will return the same result:
gjson.Parse(json).Get("name").Get("last")
gjson.Get(json, "name").Get("last")
gjson.Get(json, "name.last")
Check for the existence of a value
Sometimes you just want to know if a value exists.
value := gjson.Get(json, "name.last")
if !value.Exists() {
println("no last name")
} else {
println(value.String())
}
// Or as one step
if gjson.Get(json, "name.last").Exists() {
println("has a last name")
}
Validate JSON
The Get*
and Parse*
functions expects that the json is well-formed. Bad json will not panic, but it may return back unexpected results.
If you are consuming JSON from an unpredictable source then you may want to validate prior to using GJSON.
if !gjson.Valid(json) {
return errors.New("invalid json")
}
value := gjson.Get(json, "name.last")
Unmarshal to a map
To unmarshal to a map[string]interface{}
:
m, ok := gjson.Parse(json).Value().(map[string]interface{})
if !ok {
// not a map
}
Working with Bytes
If your JSON is contained in a []byte
slice, there's the GetBytes function. This is preferred over Get(string(data), path)
.
var json []byte = ...
result := gjson.GetBytes(json, path)
If you are using the gjson.GetBytes(json, path)
function and you want to avoid converting result.Raw
to a []byte
, then you can use this pattern:
var json []byte = ...
result := gjson.GetBytes(json, path)
var raw []byte
if result.Index > 0 {
raw = json[result.Index:result.Index+len(result.Raw)]
} else {
raw = []byte(result.Raw)
}
This is a best-effort no allocation sub slice of the original json. This method utilizes the result.Index
field, which is the position of the raw data in the original json. It's possible that the value of result.Index
equals zero, in which case the result.Raw
is converted to a []byte
.
Get multiple values at once
The GetMany
function can be used to get multiple values at the same time.
results := gjson.GetMany(json, "name.first", "name.last", "age")
The return value is a []Result
, which will always contain exactly the same number of items as the input paths.
Performance
Benchmarks of GJSON alongside encoding/json, ffjson, EasyJSON, jsonparser, and json-iterator
BenchmarkGJSONGet-8 3000000 372 ns/op 0 B/op 0 allocs/op
BenchmarkGJSONUnmarshalMap-8 900000 4154 ns/op 1920 B/op 26 allocs/op
BenchmarkJSONUnmarshalMap-8 600000 9019 ns/op 3048 B/op 69 allocs/op
BenchmarkJSONDecoder-8 300000 14120 ns/op 4224 B/op 184 allocs/op
BenchmarkFFJSONLexer-8 1500000 3111 ns/op 896 B/op 8 allocs/op
BenchmarkEasyJSONLexer-8 3000000 887 ns/op 613 B/op 6 allocs/op
BenchmarkJSONParserGet-8 3000000 499 ns/op 21 B/op 0 allocs/op
BenchmarkJSONIterator-8 3000000 812 ns/op 544 B/op 9 allocs/op
JSON document used:
{
"widget": {
"debug": "on",
"window": {
"title": "Sample Konfabulator Widget",
"name": "main_window",
"width": 500,
"height": 500
},
"image": {
"src": "Images/Sun.png",
"hOffset": 250,
"vOffset": 250,
"alignment": "center"
},
"text": {
"data": "Click Here",
"size": 36,
"style": "bold",
"vOffset": 100,
"alignment": "center",
"onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
}
}
}
Each operation was rotated though one of the following search paths:
widget.window.name
widget.image.hOffset
widget.text.onMouseUp
These benchmarks were run on a MacBook Pro 15" 2.8 GHz Intel Core i7 using Go 1.8 and can be be found here.
Contact
Josh Baker @tidwall
License
GJSON source code is available under the MIT License.