For listing of objects needing heal, we list all objects present on all
the disks and return the set union. We were incorrectly dropping objects
that weren't already seen in disks so far.
Sample directory layout of disks in a 4-disk setup:
`/tmp/1`, `/tmp/2`, `/tmp/3`, `/tmp/4` are directories used as disks here.
`test` is the bucket, `obj1` and obj2` are the objects.
```
/tmp/1/test
└── obj2
├── part.1
├── part.2
└── xl.json
/tmp/2/test
└── obj1
├── part.1
├── part.2
└── xl.json
/tmp/3/test
├── obj1
│ ├── part.1
│ ├── part.2
│ └── xl.json
└── obj2
├── part.1
├── part.2
└── xl.json
/tmp/4/test
[This is empty]
```
This change adds information like host, port and user-agent of the
client whose request triggered an event notification.
E.g, if someone uploads an object to a bucket using mc. If notifications
were configured on that bucket, the host, port and user-agent of mc
would be sent as part of event notification data.
Sample output:
```
"source": {
"host": "127.0.0.1",
"port": "55808",
"userAgent": "Minio (linux; amd64) minio-go/2.0.4 mc ..."
}
```
* Add a new function Save() which saves given configuration into given file.
* Simplify Load() function.
* Remove unused CheckVersion().
* CheckData() is a private function now.
* quick_test.go is part of quick package now.
* minio server uses top level quick.Load() and quick.Save() functions.
Previously, erasure backend's `listDirFactory` may return errors which
were explicitly ignored. With this change, it returns nil. Superfluous
checks at higher-layers for ignored errors are removed as well.
As a new configuration parameter is added, configuration version is
bumped up from 14 to 15.
The MySQL target's behaviour is identical to the PostgreSQL: rows are
deleted from the MySQL table on delete-object events, and are
created/updated on create/over-write events.
This API is meant for administrative tools like mc-admin to heal an
ongoing multipart upload on a Minio server. N B This set of admin
APIs apply only for Minio servers.
`github.com/minio/minio/pkg/madmin` provides a go SDK for this (and
other admin) operations. Specifically,
func HealUpload(bucket, object, uploadID string, dryRun bool) error
Sample admin API request:
POST
/?heal&bucket=mybucket&object=myobject&upload-id=myuploadID&dry-run
- Header(s): ["x-minio-operation"] = "upload"
Notes:
- bucket, object and upload-id are mandatory query parameters
- if dry-run is set, API returns success if all parameters passed are
valid.
checkURL() is a generic function to check if a passed address
is valid. This commit adds support for addresses like `m1`
and `172.16.3.1` which is needed in MySQL and NATS. This commit
also adds tests.
HEAD Object for FS and XL was returning invalid object name when
an object name has a trailing slash separator, this PR changes the
behavior and will always return 404 object not found, this guarantees
a better compatibility with S3 spec.
This change is cleanup of the postPolicyHandler code
primarily to address the flow and also converting
certain critical parts into self contained functions.
It was possible to upload a big file which overcomes the minimal
disk space limit in XL, PrepareFile was actually checking for disk
space but we weren't checking its returned error. This patch fixes
this behavior.
* fs: Rename tempObjPath variable in fsCreateFile()
* fs/posix: Factor checkDiskFree() function
* fs: Add disk free check in fsCreateFile()
* posix: Move free disk check to createFile()
* xl: Relax free disk check in POSIX initialization
* fs: checkDiskFree checks for space to store data
This improves the startup time significantly
for clusters which have lot of buckets.
Also fixes a bug where `.minio.sys` is created
on disks which do not have `format.json`
startOffset was re-assigned to '0' so it would end up
copying wrong content ignoring the requested startOffset.
This also fixes the corruption issue we observed while
using docker registry.
Fixes https://github.com/docker/distribution/issues/2205
Also fixes#3842 - incorrect routing.
The globalMaxObjectSize limit is instilled in S3 spec perhaps
due to certain limitations on S3 infrastructure. For minio we
don't have such limitations and we can stream a larger file
instead.
So we are going to bump this limit to 16GiB.
Fixes#3825
This function was returning BucketNotFound for all errors
which at least hides the fact that disks could be corrupted.
This commit fixes the behavior by returning all errors that,
are, by the way, Object API errors.
Add missing protection from deleting multiple objects
in parallel. Currently we are deleting objects without
proper locking through this API.
This can cause significant amount of races.
Ignore a disk which wasn't able to successfully perform an action to
avoid eventual perturbations when the disk comes back in the middle
of write change.
This removal comes to avoid some redundant requirements
which are adding more problems on a production setup.
Here are the list of checks for time as they happen
- Fresh connect (during server startup) - CORRECT
- A reconnect after network disconnect - CORRECT
- For each RPC call - INCORRECT.
Verifying time for each RPC aggravates a situation
where a RPC call is rejected in a sequence of events
due to enough load on a production setup. 3 second
might not be enough time window for the call to be
initiated and received by the server.
Currently we document as IP:PORT which doesn't provide
if someone can use HOSTNAME:PORT. This is a change
to clarify this by calling it as ADDRESS:PORT which
encompasses both a HOSTNAME and an IP.
Fixes#3799
This PR is for readability cleanup
- getOrderedDisks as shuffleDisks
- getOrderedPartsMetadata as shufflePartsMetadata
Distribution is now a second argument instead being the
primary input argument for brevity.
Also change the usage of type casted int64(0), instead
rely on direct type reference as `var variable int64` everywhere.
Existing objects before overwrites are renamed to
temp location in completeMultipart. We make sure
that we delete it even if subsequenty calls fail.
Additionally move verifying of parent dir is a
file earlier to fail the entire operation.
Ref #3784
Content-Encoding is set to "aws-chunked" which is an S3 specific
API value which is no meaning for an object. This is how S3
behaves as well for a streaming signature uploaded object.
Make sure to skip reserved bucket names in `ListBuckets()`
current code didn't skip this properly and also generalize
this behavior for both XL and FS.
This is an attempt cleanup code and keep the top level config
functions simpler and easy to understand where as move the
notifier related code and logger setter/getter methods as part
of their own struct.
Locks are now held properly not globally by configMutex, but
instead as private variables.
Final fix for #3700
Also changes the behavior of `secretKeyHash` which is
not necessary to be sent over the network, each node
has its own secretKeyHash to validate.
Fixes#3696
Partial(fix) #3700 (More changes needed with some code cleanup)
Currently the auth rpc client defaults to to a maximum
cap of 30seconds timeout. Make this to be configurable
by the caller of authRPCClient during initialization, if no
such config is provided then default to 30 seconds.
Ideally here if the interface is not found it would
fail the server, as it should be because without these
we can't even have a working server in the first place.
Just like how it fails in master invariably inside Go
net/http code path.
Fixes#3708
Network: total bytes of incoming and outgoing server's data
by taking advantage of our ConnMux Read/Write wrapping
HTTP: total number of different http verbs passed in http
requests and different status codes passed in http responses.
This is counted in a new http handler.
Resource strings and paths are case insensitive on windows
deployments but if user happens to use upper case instead of
lower case for certain configuration params like bucket
policies and bucket notification config. We might not honor
them which leads to a wrong behavior on windows.
This is windows only behavior, for all other platforms case
is still kept sensitive.
Avoid passing size = -1 to PutObject API by requiring content-length
header in POST request (as AWS S3 does) and in Upload web handler.
Post handler is modified to completely store multipart file to know
its size before sending it to PutObject().
Following is a sample list lock API request schematic,
/?lock&bucket=mybucket&prefix=myprefix&duration=holdDuration
x-minio-operation: list
The response would contain the list of locks held on mybucket matching
myprefix for a duration longer than holdDuration.
Current implementation didn't honor quorum properly and didn't
handle the errors generated properly. This patch addresses that
and also moves common code `cleanupMultipartUploads` into xl
specific private function.
Fixes#3665
On macOS, if a process already listens on 127.0.0.1:PORT, net.Listen() falls back
to IPv6 address ie minio will start listening on IPv6 address whereas another
(non-)minio process is listening on IPv4 of given port.
To avoid this error sutiation we check for port availability only for macOS.
Note: checkPortAvailability() tries to listen on given port and closes it.
It is possible to have a disconnected client in this tiny window of time.
Creds don't require secretKeyHash to be calculated
everytime, cache it instead and re-use.
This is an optimization for bcrypt.
Relevant results from the benchmark done locally, negative
value means improvement in this scenario.
```
benchmark old ns/op new ns/op delta
BenchmarkAuthenticateNode-4 160590992 80125647 -50.11%
BenchmarkAuthenticateWeb-4 160556692 80432144 -49.90%
benchmark old allocs new allocs delta
BenchmarkAuthenticateNode-4 87 75 -13.79%
BenchmarkAuthenticateWeb-4 87 75 -13.79%
benchmark old bytes new bytes delta
BenchmarkAuthenticateNode-4 15222 9785 -35.72%
BenchmarkAuthenticateWeb-4 15222 9785 -35.72%
```
An external test that runs cmd.Main() has a difficulty to set cmd arguments
and MINIO_{ACCESS,SECRET}_KEY values, this commit changes a little the current
behavior in a way that helps external tests.
Encode the path of the passed presigned url before calculating the signature. This fixes
presigning objects whose names contain characters that are found encoded in urls.
* Implement heal format REST API handler
* Implement admin peer rpc handler to re-initialize storage
* Implement HealFormat API in pkg/madmin
* Update pkg/madmin API.md to incl. HealFormat
* Added unit tests for ReInitDisks rpc handler and HealFormatHandler
For TLS peekProtocol do not assume the incoming request to be a TLS
connection perform a handshake() instead and validate.
Also add some security related defaults to `tls.Config`.
This restriction has lots of side affects, since
we do not have a mechanism to clear states like
this it is better not to keep them.
Network errors are common and can occur with
simple cable removal etc. Since we already have
a retry mechanism this error count and stateful
nature can bring problems on a long running
cluster.
This is a consolidation effort, avoiding usage
of naked strings in codebase. Whenever possible
use constants which can be repurposed elsewhere.
This also fixes `goconst ./...` reported issues.
`principalId` i.e user identity is kept as AccessKey in
accordance with S3 spec.
Additionally responseElements{} are added starting with
`x-amz-request-id` is a hexadecimal of the event time itself in nanosecs.
`x-minio-origin-server` - points to the server generating the event.
Fixes#3556
URL paths can be empty and not have preceding separator,
we do not yet know the conditions this can happen inside
Go http server.
This patch is to ensure that we do not crash ourselves
under conditions where r.URL.Path may be empty.
Fixes#3553
A client sends escaped characters in values of some query parameters in a presign url.
This commit properly unescapes queires to fix signature calculation.
Golang HTTP client automatically detects content-type but
for S3 clients this content-type might be incorrect or
might misbehave.
For example:
```
Content-Type: text/xml; charset=utf-8
```
Should be
```
Content-Type: application/xml
```
Allow this to be set properly.
* Filter lock info based on bucket, prefix and time since lock was held
* Implement list and clear locks REST API
* madmin: Add list and clear locks API
* locks: Clear locks matching bucket, prefix, relTime.
* Gather lock information across nodes for both list and clear locks admin REST API.
* docs: Add lock API to management APIs
* Rename GenericArgs to AuthRPCArgs
* Rename GenericReply to AuthRPCReply
* Remove authConfig.loginMethod and add authConfig.ServiceName
* Rename loginServer to AuthRPCServer
* Rename RPCLoginArgs to LoginRPCArgs
* Rename RPCLoginReply to LoginRPCReply
* Version and RequestTime are added to LoginRPCArgs and verified by
server side, not client side.
* Fix data race in lockMaintainence loop.
This patch uses a technique where in a retryable storage
before object layer initialization has a higher delay
and waits for longer period upto 4 times with time unit
of seconds.
And uses another set of configuration after the disks
have been formatted, i.e use a lower retry backoff rate
and retrying only once per 5 millisecond.
Network IO error count is reduced to a lower value i.e 256
before we reject the disk completely. This is done so that
combination of retry logic and total error count roughly
come to around 2.5secs which is when we basically take the
disk offline completely.
NOTE: This patch doesn't fix the issue of what if the disk
is completely dead and comes back again after the initialization.
Such a mutating state requires a change in our startup sequence
which will be done subsequently. This is an interim fix to alleviate
users from these issues.
Implement a storage rpc specific rpc client,
which does not reconnect unnecessarily.
Instead reconnect is handled at a different
layer for storage alone.
Rest of the calls using AuthRPC automatically
reconnect, i.e upon an error equal to `rpc.ErrShutdown`
they dial again and call the requested method again.
Attempt a reconnect also if disk not found.
This is needed since any network operation error
is converted to disk not found but we also need
to make sure if disk is really not available.
Additionally we also need to retry more than
once because the server might be in startup
sequence which would render other servers to
wrongly think that the server is offline.
This is written so that to simplify our handler code
and provide a way to only update metadata instead of
the data when source and destination in CopyObject
request are same.
Fixes#3316
- Add a lockStat type to group counters
- Remove unnecessary helper functions
- Fix stats computation on force unlock
- Removed unnecessary checks and cleaned up comments
This is to utilize an optimized version of
sha256 checksum which @fwessels implemented.
blake2b lacks such optimizations on ARM platform,
this can provide us significant boost in performance.
blake2b on ARM64 as expected would be slower.
```
BenchmarkSize1K-4 30000 44015 ns/op 23.26 MB/s
BenchmarkSize8K-4 5000 335448 ns/op 24.42 MB/s
BenchmarkSize32K-4 1000 1333960 ns/op 24.56 MB/s
BenchmarkSize128K-4 300 5328286 ns/op 24.60 MB/s
```
sha256 on ARM64 is faster by orders of magnitude giving close to
AVX performance of blake2b.
```
BenchmarkHash8Bytes-4 1000000 1446 ns/op 5.53 MB/s
BenchmarkHash1K-4 500000 3229 ns/op 317.12 MB/s
BenchmarkHash8K-4 100000 14430 ns/op 567.69 MB/s
BenchmarkHash1M-4 1000 1640126 ns/op 639.33 MB/s
```
ObjectLayer GetObject() now returns the entire object
if starting offset is 0 and length is negative. This
also allows to simplify handler layer code where
we always had to use GetObjectInfo() before proceeding
to read bucket metadata files examples `policy.json`.
This also reduces one additional call overhead.
success_action_redirect in the sent Form means that the server needs to return 303 in addition to a well specific redirection url, this commit adds this feature
This is important in a distributed setup, where the server hosting the
first disk formats a fresh setup. Sorting ensures that all servers
arrive at the same 'first' server.
Note: This change doesn't protect against different disk arguments
with some disks being same across servers.
Previously, more than one goroutine calls RPCClient.dial(), each
goroutine gets a new rpc.Client but only one such client is stored
into RPCClient object. This leads to leaky connection at the server
side. This is fixed by taking lock at top of dial() and release on
return.