We need to have local peer initialized properly
for listen bucket to work, current code did initialize
properly but the resulting code was initializing
peer on a wrong target v/s what listen bucket expected
it to be.
This regression came in de204a0a52Fixes#4158
Avoid using `time.Now()` instead rely on UTC time
for the final deadline, this is to be consistent with
all our internal functions.
Reduce the default read timeout to 15 seconds
in lieu with a newly discovered issue
- https://github.com/minio/minio/issues/4139
Additionally also change the Read() conn wrapper
to set deadline only upon successful Reads().
Current log prints in this form
```
ERRO[8150] Lock maintenance failed to remove entry for write
lock (should never happen)%!!(MISSING)(EXTRA ....
```
Fix this by using proper formatting directive.
Duration for which a lock was held can be computed from the `Since`
field of `OpsLockState`. It is the difference between current time and
time at which the namespace lock was held. This change avoids
superfluous instrumentation.
Previous value was set to avoid large cache value build
up but we can clearly see this can cause lots of GC
pauses which can lead to significant drop in performance.
Change this value to 50% and decrease the value to 25%
once the 75% cache size is used. To have a larger
window for GC pauses.
Another change is to only allow caching if a server has
more than 24GB of RAM instead of 8GB.
Such that in a situation where all errors were
ignored we need to reduce the errors using
readQuorum to get a consistent error value.
Without this change errors generated will
never be consistent with for an expected scenario.
For example in a 6 disk setup 1 disk is missing
and 5 do not have the volume (testbucket)
Without this change Stat() would result in different
errors depending on which disk died. Can cause
confusion to S3 client application.
This change addresses need to track type of
errors we ignored and bring readQuorum to
choose the maximally occuring as the value
of truth.
getBucketInfo() should keep track errors ignored,
such that in a situation where all errors were
ignored we need to reduce the errors using readQuorum
to get a consistent error value.
This is the problem we see with DiskNotFound test
disks are randomly removed.
Fixes#4095
- Due to usage of amazon SDK, spark expects md5sum of empty string to be
returned when it does PUT on a directory.
- The fix returns md5sum of a empty string for the above mentioned case.
- This fixes the issue of Apache Spark not being able to write into Minio.
Ignore any network errors when registering a webhook
notifier during Minio startup sequence. This way server
can be started even if the webhook endpoint is not available
and unreachable.
This is to comply with S3 behavior, we previously removed
reading `fs.json` for optimization reasons but we have a
reason to believe that providing ETag and using gjson
provides needed benefit of not having to deal with
unmarshalling overhead of golang stdlib.
Fixes#4028
Values of canonicalized query resources should be unescaped before calculating
the signature. This bug is not noticed before because partNumber and uploadID
values in Minio doesn't have characters that need to be escaped.
Separate out validating v/s parsing logic in
isValidLocationConstraint() into parseLocationConstraint()
and isValidLocation()
Additionally also set `X-Amz-Bucket-Region` as part of the
common headers for the clients to fallback on in-case of any
region related errors.
Healing of buckets, objects and incomplete uploads are implemented and
available via admin REST APIs. Additionally, it is available via mc admin
sub-command. The warning is no longer relevant.
Fixes#4030
`disksUnavailable` healStatus constant indicates that a given object
needs healing but one or more of disks requiring heal are offline. This
can be used by admin heal API consumers to distinguish between a
successful heal and a no-op since the outdated disks were offline.
This change adds `access` format support for notifications to a
Elasticsearch server, and it refactors `namespace` format support.
In the case of `access` format, for each event in Minio, a JSON
document is inserted into Elasticsearch with its timestamp set to the
event's timestamp, and with the ID generated automatically by
elasticsearch. No events are modified or deleted in this mode.
In the case of `namespace` format, for each event in Minio, a JSON
document is keyed together by the bucket and object name is updated in
Elasticsearch. In the case of an object being created or over-written
in Minio, a new document or an existing document is inserted into the
Elasticsearch index. If an object is deleted in Minio, the
corresponding document is deleted from the Elasticsearch index.
Additionally, this change upgrades Elasticsearch support to the 5.x
series. This is a breaking change, and users of previous elasticsearch
versions should upgrade.
Also updates documentation on Elasticsearch notification target usage
and has a link to an elasticsearch upgrade guide.
This is the last patch that finally resolves#3928.
Do not rely on a specific cipher suite instead let the
go choose the type of cipher needed, if the connection
is coming from clients which do not support forward
secrecy let the go tls handle this automatically based
on tls1.2 specifications.
Fixes#4017
url.Parse() wrongly parses an address of format "address:port"
which is fixed in go1.8. This inculcates a breaking change
on our end. We should fix this wrong usage everywhere so that
migrating to go1.8 eventually becomes smoother.
Previously serverConfigV17 used a global lock that made any instance of
serverConfigV17 depended on single global serverConfigMu.
This patch fixes by having individual lock per instances.
This is an enhancement change to to cater support all
the data fields present on the object. Currently
we only send a subset of data which object info
provides us.
It also helps us keep a full namespace mirror on
notification targets for efficient query.
CopyObjectHandler() was incorrectly performing comparison
between destination and source object paths, which sometimes
leads to a lock race. This PR simplifies comparaison and add
one test case.
This change adds `access` format support for notifications to a Redis
server, and it refactors `namespace` format support.
In the case of `access` format, a list is used to store Minio
operations in Redis. Each entry in the list is a JSON encoded list of
two items - the first is the Minio server timestamp of the event, and
the second is an object describing the operation that created/replaced
the object in the server.
In the case of `namespace` format, a hash is used. Entries in the hash
may be updated or removed if objects in Minio are updated or deleted
respectively. The field values in the Redis hash are JSON encoded.
Also updates documentation on Redis notification target usage.
Towards resolving #3928
The following form of arguments such as
```
minio.exe -C some_dir server dir
```
has stopped working because of lack of handling of
absolute paths for config directory. Always calculate
absolute path for any relative paths on any operating
system.
The following fix converts all config directory relative
paths into absolute paths.
Fixes#3991
We can't use Content-Encoding to verify if `aws-chunked` is set
or not. Just use 'streaming' signature header instead.
While this is considered mandatory, on the contrary aws-sdk-java
doesn't set this value
http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html
```
Set the value to aws-chunked.
```
We will relax it and behave appropriately. Also this PR supports
saving custom encoding after trimming off the `aws-chunked`
parameter.
Fixes#3983
* Add configuration parameter "format" for db targets and perform
configuration migration.
* Add PostgreSQL `access` format: This causes Minio to append all events
to the configured table. Prefix, suffix and event filters continue
to be supported for this mode too.
* Update documentation for PostgreSQL notification target.
* Add MySQL `access` format: It is very similar to the same format for
PostgreSQL.
* Update MySQL notification documentation.
Statically typed BrowserFlag prevents any arbitrary string value
usage. The wrapped bool marshals/unmarshals JSON according to the
typed value ie string value "on" represents boolean true and "off" as
boolean false.
This is to keep the portability and also avoid errors that
might occur using the functions written for URL resource name
Since query param values have different escaping requirements.
In the algorithm to check if an object requires healing, in addition to
checking if all disks have xl.json present we should check if all parts
of the object are present and have valid blake2b checksums.
Also fixed a minor compilation error in heal-objects-list.go.
This patch fixes below
* Previously fatalIf() never writes log other than first logging target.
* quiet flag is not honored to show progress messages other than startup messages.
* Removes console package usage for progress messages.
For listing of objects needing heal, we list all objects present on all
the disks and return the set union. We were incorrectly dropping objects
that weren't already seen in disks so far.
Sample directory layout of disks in a 4-disk setup:
`/tmp/1`, `/tmp/2`, `/tmp/3`, `/tmp/4` are directories used as disks here.
`test` is the bucket, `obj1` and obj2` are the objects.
```
/tmp/1/test
└── obj2
├── part.1
├── part.2
└── xl.json
/tmp/2/test
└── obj1
├── part.1
├── part.2
└── xl.json
/tmp/3/test
├── obj1
│ ├── part.1
│ ├── part.2
│ └── xl.json
└── obj2
├── part.1
├── part.2
└── xl.json
/tmp/4/test
[This is empty]
```
This change adds information like host, port and user-agent of the
client whose request triggered an event notification.
E.g, if someone uploads an object to a bucket using mc. If notifications
were configured on that bucket, the host, port and user-agent of mc
would be sent as part of event notification data.
Sample output:
```
"source": {
"host": "127.0.0.1",
"port": "55808",
"userAgent": "Minio (linux; amd64) minio-go/2.0.4 mc ..."
}
```
* Add a new function Save() which saves given configuration into given file.
* Simplify Load() function.
* Remove unused CheckVersion().
* CheckData() is a private function now.
* quick_test.go is part of quick package now.
* minio server uses top level quick.Load() and quick.Save() functions.
Previously, erasure backend's `listDirFactory` may return errors which
were explicitly ignored. With this change, it returns nil. Superfluous
checks at higher-layers for ignored errors are removed as well.
As a new configuration parameter is added, configuration version is
bumped up from 14 to 15.
The MySQL target's behaviour is identical to the PostgreSQL: rows are
deleted from the MySQL table on delete-object events, and are
created/updated on create/over-write events.
This API is meant for administrative tools like mc-admin to heal an
ongoing multipart upload on a Minio server. N B This set of admin
APIs apply only for Minio servers.
`github.com/minio/minio/pkg/madmin` provides a go SDK for this (and
other admin) operations. Specifically,
func HealUpload(bucket, object, uploadID string, dryRun bool) error
Sample admin API request:
POST
/?heal&bucket=mybucket&object=myobject&upload-id=myuploadID&dry-run
- Header(s): ["x-minio-operation"] = "upload"
Notes:
- bucket, object and upload-id are mandatory query parameters
- if dry-run is set, API returns success if all parameters passed are
valid.
checkURL() is a generic function to check if a passed address
is valid. This commit adds support for addresses like `m1`
and `172.16.3.1` which is needed in MySQL and NATS. This commit
also adds tests.
HEAD Object for FS and XL was returning invalid object name when
an object name has a trailing slash separator, this PR changes the
behavior and will always return 404 object not found, this guarantees
a better compatibility with S3 spec.
This change is cleanup of the postPolicyHandler code
primarily to address the flow and also converting
certain critical parts into self contained functions.
It was possible to upload a big file which overcomes the minimal
disk space limit in XL, PrepareFile was actually checking for disk
space but we weren't checking its returned error. This patch fixes
this behavior.
* fs: Rename tempObjPath variable in fsCreateFile()
* fs/posix: Factor checkDiskFree() function
* fs: Add disk free check in fsCreateFile()
* posix: Move free disk check to createFile()
* xl: Relax free disk check in POSIX initialization
* fs: checkDiskFree checks for space to store data
This improves the startup time significantly
for clusters which have lot of buckets.
Also fixes a bug where `.minio.sys` is created
on disks which do not have `format.json`
startOffset was re-assigned to '0' so it would end up
copying wrong content ignoring the requested startOffset.
This also fixes the corruption issue we observed while
using docker registry.
Fixes https://github.com/docker/distribution/issues/2205
Also fixes#3842 - incorrect routing.
The globalMaxObjectSize limit is instilled in S3 spec perhaps
due to certain limitations on S3 infrastructure. For minio we
don't have such limitations and we can stream a larger file
instead.
So we are going to bump this limit to 16GiB.
Fixes#3825
This function was returning BucketNotFound for all errors
which at least hides the fact that disks could be corrupted.
This commit fixes the behavior by returning all errors that,
are, by the way, Object API errors.
Add missing protection from deleting multiple objects
in parallel. Currently we are deleting objects without
proper locking through this API.
This can cause significant amount of races.
Ignore a disk which wasn't able to successfully perform an action to
avoid eventual perturbations when the disk comes back in the middle
of write change.
This removal comes to avoid some redundant requirements
which are adding more problems on a production setup.
Here are the list of checks for time as they happen
- Fresh connect (during server startup) - CORRECT
- A reconnect after network disconnect - CORRECT
- For each RPC call - INCORRECT.
Verifying time for each RPC aggravates a situation
where a RPC call is rejected in a sequence of events
due to enough load on a production setup. 3 second
might not be enough time window for the call to be
initiated and received by the server.
Currently we document as IP:PORT which doesn't provide
if someone can use HOSTNAME:PORT. This is a change
to clarify this by calling it as ADDRESS:PORT which
encompasses both a HOSTNAME and an IP.
Fixes#3799
This PR is for readability cleanup
- getOrderedDisks as shuffleDisks
- getOrderedPartsMetadata as shufflePartsMetadata
Distribution is now a second argument instead being the
primary input argument for brevity.
Also change the usage of type casted int64(0), instead
rely on direct type reference as `var variable int64` everywhere.
Existing objects before overwrites are renamed to
temp location in completeMultipart. We make sure
that we delete it even if subsequenty calls fail.
Additionally move verifying of parent dir is a
file earlier to fail the entire operation.
Ref #3784
Content-Encoding is set to "aws-chunked" which is an S3 specific
API value which is no meaning for an object. This is how S3
behaves as well for a streaming signature uploaded object.
Make sure to skip reserved bucket names in `ListBuckets()`
current code didn't skip this properly and also generalize
this behavior for both XL and FS.
This is an attempt cleanup code and keep the top level config
functions simpler and easy to understand where as move the
notifier related code and logger setter/getter methods as part
of their own struct.
Locks are now held properly not globally by configMutex, but
instead as private variables.
Final fix for #3700
Also changes the behavior of `secretKeyHash` which is
not necessary to be sent over the network, each node
has its own secretKeyHash to validate.
Fixes#3696
Partial(fix) #3700 (More changes needed with some code cleanup)
Currently the auth rpc client defaults to to a maximum
cap of 30seconds timeout. Make this to be configurable
by the caller of authRPCClient during initialization, if no
such config is provided then default to 30 seconds.
Ideally here if the interface is not found it would
fail the server, as it should be because without these
we can't even have a working server in the first place.
Just like how it fails in master invariably inside Go
net/http code path.
Fixes#3708
Network: total bytes of incoming and outgoing server's data
by taking advantage of our ConnMux Read/Write wrapping
HTTP: total number of different http verbs passed in http
requests and different status codes passed in http responses.
This is counted in a new http handler.
Resource strings and paths are case insensitive on windows
deployments but if user happens to use upper case instead of
lower case for certain configuration params like bucket
policies and bucket notification config. We might not honor
them which leads to a wrong behavior on windows.
This is windows only behavior, for all other platforms case
is still kept sensitive.
Avoid passing size = -1 to PutObject API by requiring content-length
header in POST request (as AWS S3 does) and in Upload web handler.
Post handler is modified to completely store multipart file to know
its size before sending it to PutObject().
Following is a sample list lock API request schematic,
/?lock&bucket=mybucket&prefix=myprefix&duration=holdDuration
x-minio-operation: list
The response would contain the list of locks held on mybucket matching
myprefix for a duration longer than holdDuration.
Current implementation didn't honor quorum properly and didn't
handle the errors generated properly. This patch addresses that
and also moves common code `cleanupMultipartUploads` into xl
specific private function.
Fixes#3665
On macOS, if a process already listens on 127.0.0.1:PORT, net.Listen() falls back
to IPv6 address ie minio will start listening on IPv6 address whereas another
(non-)minio process is listening on IPv4 of given port.
To avoid this error sutiation we check for port availability only for macOS.
Note: checkPortAvailability() tries to listen on given port and closes it.
It is possible to have a disconnected client in this tiny window of time.
Creds don't require secretKeyHash to be calculated
everytime, cache it instead and re-use.
This is an optimization for bcrypt.
Relevant results from the benchmark done locally, negative
value means improvement in this scenario.
```
benchmark old ns/op new ns/op delta
BenchmarkAuthenticateNode-4 160590992 80125647 -50.11%
BenchmarkAuthenticateWeb-4 160556692 80432144 -49.90%
benchmark old allocs new allocs delta
BenchmarkAuthenticateNode-4 87 75 -13.79%
BenchmarkAuthenticateWeb-4 87 75 -13.79%
benchmark old bytes new bytes delta
BenchmarkAuthenticateNode-4 15222 9785 -35.72%
BenchmarkAuthenticateWeb-4 15222 9785 -35.72%
```
An external test that runs cmd.Main() has a difficulty to set cmd arguments
and MINIO_{ACCESS,SECRET}_KEY values, this commit changes a little the current
behavior in a way that helps external tests.
Encode the path of the passed presigned url before calculating the signature. This fixes
presigning objects whose names contain characters that are found encoded in urls.
* Implement heal format REST API handler
* Implement admin peer rpc handler to re-initialize storage
* Implement HealFormat API in pkg/madmin
* Update pkg/madmin API.md to incl. HealFormat
* Added unit tests for ReInitDisks rpc handler and HealFormatHandler
For TLS peekProtocol do not assume the incoming request to be a TLS
connection perform a handshake() instead and validate.
Also add some security related defaults to `tls.Config`.
This restriction has lots of side affects, since
we do not have a mechanism to clear states like
this it is better not to keep them.
Network errors are common and can occur with
simple cable removal etc. Since we already have
a retry mechanism this error count and stateful
nature can bring problems on a long running
cluster.