Network: total bytes of incoming and outgoing server's data
by taking advantage of our ConnMux Read/Write wrapping
HTTP: total number of different http verbs passed in http
requests and different status codes passed in http responses.
This is counted in a new http handler.
Avoid passing size = -1 to PutObject API by requiring content-length
header in POST request (as AWS S3 does) and in Upload web handler.
Post handler is modified to completely store multipart file to know
its size before sending it to PutObject().
Creds don't require secretKeyHash to be calculated
everytime, cache it instead and re-use.
This is an optimization for bcrypt.
Relevant results from the benchmark done locally, negative
value means improvement in this scenario.
```
benchmark old ns/op new ns/op delta
BenchmarkAuthenticateNode-4 160590992 80125647 -50.11%
BenchmarkAuthenticateWeb-4 160556692 80432144 -49.90%
benchmark old allocs new allocs delta
BenchmarkAuthenticateNode-4 87 75 -13.79%
BenchmarkAuthenticateWeb-4 87 75 -13.79%
benchmark old bytes new bytes delta
BenchmarkAuthenticateNode-4 15222 9785 -35.72%
BenchmarkAuthenticateWeb-4 15222 9785 -35.72%
```
An external test that runs cmd.Main() has a difficulty to set cmd arguments
and MINIO_{ACCESS,SECRET}_KEY values, this commit changes a little the current
behavior in a way that helps external tests.
* Implement heal format REST API handler
* Implement admin peer rpc handler to re-initialize storage
* Implement HealFormat API in pkg/madmin
* Update pkg/madmin API.md to incl. HealFormat
* Added unit tests for ReInitDisks rpc handler and HealFormatHandler
This is a consolidation effort, avoiding usage
of naked strings in codebase. Whenever possible
use constants which can be repurposed elsewhere.
This also fixes `goconst ./...` reported issues.
`principalId` i.e user identity is kept as AccessKey in
accordance with S3 spec.
Additionally responseElements{} are added starting with
`x-amz-request-id` is a hexadecimal of the event time itself in nanosecs.
`x-minio-origin-server` - points to the server generating the event.
Fixes#3556
This patch uses a technique where in a retryable storage
before object layer initialization has a higher delay
and waits for longer period upto 4 times with time unit
of seconds.
And uses another set of configuration after the disks
have been formatted, i.e use a lower retry backoff rate
and retrying only once per 5 millisecond.
Network IO error count is reduced to a lower value i.e 256
before we reject the disk completely. This is done so that
combination of retry logic and total error count roughly
come to around 2.5secs which is when we basically take the
disk offline completely.
NOTE: This patch doesn't fix the issue of what if the disk
is completely dead and comes back again after the initialization.
Such a mutating state requires a change in our startup sequence
which will be done subsequently. This is an interim fix to alleviate
users from these issues.
Implement a storage rpc specific rpc client,
which does not reconnect unnecessarily.
Instead reconnect is handled at a different
layer for storage alone.
Rest of the calls using AuthRPC automatically
reconnect, i.e upon an error equal to `rpc.ErrShutdown`
they dial again and call the requested method again.
Attempt a reconnect also if disk not found.
This is needed since any network operation error
is converted to disk not found but we also need
to make sure if disk is really not available.
Additionally we also need to retry more than
once because the server might be in startup
sequence which would render other servers to
wrongly think that the server is offline.
This change brings in changes at multiple places
- Reuse buffers at almost all locations ranging
from rpc, fs, xl, checksum etc.
- Change caching behavior to disable itself
under low memory conditions i.e < 8GB of RAM.
- Only objects cached are of size 1/10th the size
of the cache for example if 4GB is the cache size
the maximum object size which will be cached
is going to be 400MB. This change is an
optimization to cache more objects rather
than few larger objects.
- If object cache is enabled default GC
percent has been reduced to 20% in lieu
with newly found behavior of GC. If the cache
utilization reaches 75% of the maximum value
GC percent is reduced to 10% to make GC
more aggressive.
- Do not use *bytes.Buffer* due to its growth
requirements. For every allocation *bytes.Buffer*
allocates an additional buffer for its internal
purposes. This is undesirable for us, so
implemented a new cappedWriter which is capped to a
desired size, beyond this all writes rejected.
Possible fix for #3403.
setGlobalsFromContext() is added to set global variables after parsing
command line arguments. Thus, global flags will be honored wherever
they are placed in minio command.
Ref #3229
After review with @abperiasamy we decided to remove all the unnecessary options
- MINIO_BROWSER (Implemented as a security feature but now deemed obsolete
since even if blocking access to MINIO_BROWSER, s3 API port is open)
- MINIO_CACHE_EXPIRY (Defaults to 72h)
- MINIO_MAXCONN (No one used this option and we don't test this)
- MINIO_ENABLE_FSMETA (Enable FSMETA all the time)
Remove --ignore-disks option - this option was implemented when XL layer
would initialize the backend disks and heal them automatically to disallow
XL accidentally using the root partition itself this option was introduced.
This behavior has been changed XL no longer automatically initializes
`format.json` a HEAL is controlled activity, so ignore-disks is not
useful anymore. This change also addresses the problems of our documentation
going forward and keeps things simple. This patch brings in reduction of
options and defaulting them to a valid known inputs. This patch also
serves as a guideline of limiting many ways to do the same thing.
- Adds an interface to update in-memory bucket metadata state called
BucketMetaState - this interface has functions to:
- update bucket notification configuration,
- bucket listener configuration,
- bucket policy configuration, and
- send bucket event
- This interface is implemented by `localBMS` a type for manipulating
local node in-memory bucket metadata, and by `remoteBMS` a type for
manipulating remote node in-memory bucket metadata.
- The remote node interface, makes an RPC call, but the local node
interface does not - it updates in-memory bucket state directly.
- Rename mkPeersFromEndpoints to makeS3Peers and refactored it.
- Use arrayslice instead of map in s3Peers struct
- `s3Peers.SendUpdate` now receives an arrayslice of peer indexes to
send the request to, with a special nil value slice indicating that
all peers should be sent the update.
- `s3Peers.SendUpdate` now returns an arrayslice of errors, representing
errors from peers when sending an update. The array positions
correspond to peer array s3Peers.peers
Improve globalS3Peers:
- Make isDistXL a global `globalIsDistXL` and remove from s3Peers
- Make globalS3Peers an array of (address, bucket-meta-state) pairs.
- Fix code and tests.
For command line arguments we are currently following
- <node-1>:/path ... <node-n>:/path
This patch changes this to
- http://<node-1>/path ... http://<node-n>/path
* Implements a Peer RPC router that sends info to all Minio servers in the cluster.
* Bucket notifications are propagated to all nodes via this RPC router.
* Bucket listener configuration is persisted to separate object layer
file (`listener.json`) and peer RPCs are used to communicate changes
throughout the cluster.
* When events are generated, RPC calls to send them to other servers
where bucket listeners may be connected is implemented.
* Some bucket notification tests are now disabled as they cannot work in
the new design.
* Minor fix in `funcFromPC` to use `path.Join`
- Servers do not exit for invalid credentials instead they print and wait.
- Servers do not exit for version mismatch instead they print and wait.
- Servers do not exit for time differences between nodes they print and wait.
These messages based on our prep stage during XL
and prints more informative message regarding
drive information.
This change also does a much needed refactoring.