This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.
This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.
Some design details and restrictions:
- Objects are distributed using consistent ordering
to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
requirement, you can start with multiple
such sets statically.
- Static sets set of disks and cannot be
changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
slower since List happens on all servers,
and is merged at this sets layer.
Fixes#5465Fixes#5464Fixes#5461Fixes#5460Fixes#5459Fixes#5458Fixes#5460Fixes#5488Fixes#5489Fixes#5497Fixes#5496
This is a generic minimum value. The current reason is to support
Azure blob storage accounts name whose length is less than 5. 3 is the
minimum length for Azure.
- Changes related to moving admin APIs
- admin APIs now have an endpoint under /minio/admin
- admin APIs are now versioned - a new API to server the version is
added at "GET /minio/admin/version" and all API operations have the
path prefix /minio/admin/v1/<operation>
- new service stop API added
- credentials change API is moved to /minio/admin/v1/config/credential
- credentials change API and configuration get/set API now require TLS
so that credentials are protected
- all API requests now receive JSON
- heal APIs are disabled as they will be changed substantially
- Heal API changes
Heal API is now provided at a single endpoint with the ability for a
client to start a heal sequence on all the data in the server, a
single bucket, or under a prefix within a bucket.
When a heal sequence is started, the server returns a unique token
that needs to be used for subsequent 'status' requests to fetch heal
results.
On each status request from the client, the server returns heal result
records that it has accumulated since the previous status request. The
server accumulates upto 1000 records and pauses healing further
objects until the client requests for status. If the client does not
request any further records for a long time, the server aborts the
heal sequence automatically.
A heal result record is returned for each entity healed on the server,
such as system metadata, object metadata, buckets and objects, and has
information about the before and after states on each disk.
A client may request to force restart a heal sequence - this causes
the running heal sequence to be aborted at the next safe spot and
starts a new heal sequence.
In current implementation we used as many dsync clients
as per number of endpoints(along with path) which is not
the expected implementation. The implementation of Dsync
was expected to be just for the endpoint Host alone such
that if you have 4 servers and each with 4 disks we need
to only have 4 dsync clients and 4 dsync servers. But
we currently had 8 clients, servers which in-fact is
unexpected and should be avoided.
This PR brings the implementation back to its original
intention. This issue was found #5160
This fix removes logrus package dependency and refactors the console
logging as the only logging mechanism by removing file logging support.
It rearranges the log message format and adds stack trace information
whenever trace information is not available in the error structure.
It also adds `--json` flag support for server logging.
When minio server is started with `--json` flag, all log messages are
displayed in json format, with no start-up and informational log
messages.
Fixes#5265#5220#5197
This PR allows 'minio update' to not only shows update banner
but also allows for in-place upgrades.
Updates are done safely by validating the downloaded
sha256 of the binary.
Fixes#4781
This PR handles following situations
- secure endpoints provided, server should fail to start
if TLS is not configured
- insecure endpoints provided, server starts ignoring
if TLS is configured or not.
Fixes#5251
The default timeout of 30secs is not enough for high latency
environments, change these values to use 15 minutes instead.
With 30secs I/O timeouts seem to be quite common, this leads
to pretty much most SDKs and clients reconnect. This in-turn
causes significant performance problems. On a low latency
interconnect this can be quite challenging to transfer large
amounts of data. Setting this value to 15minutes covers
pretty much all known cases.
This PR was tested with `wondershaper <NIC> 20000 20000` by
limiting the network bandwidth to 20Mbit/sec. Default timeout
caused a significant amount of I/O timeouts, leading to
constant retires from the client. This seems to be more common
with tools like rclone, restic which have high concurrency set
by default. Once the value was fixed to 15minutes i/o timeouts
stopped and client could steadily upload data to the server
even while saturating the network.
Fixes#4670
* Refactor HTTP server to address bugs
* Remove unnecessary goroutine to start multiple TCP listeners.
* HTTP server waits for shutdown to maximum of Server.ShutdownTimeout
than per serverShutdownPoll.
* Handles new connection errors properly.
* Handles read and write timeout properly.
* Handles error on start of HTTP server properly by exiting minio
process.
Fixes#4494#4476 & fixed review comments
This implementation is similar to AMQP notifications:
* Notifications are published on a single topic as a JSON feed
* Topic is configurable, as is the QoS. Uses the paho.mqtt.golang
library for the mqtt connection, and supports connections over tcp
and websockets, with optional secure tls support.
* Additionally the minio server configuration has been bumped up
so mqtt configuration can be added.
* Configuration migration code is added with tests.
MQTT is an ISO standard M2M/IoT messaging protocol and was
originally designed for applications for limited bandwidth
networks. Today it's use is growing in the IoT space.
This makes lock RPCs similar to other RPCs where requests to the local
server bypass the network. Requests to the local lock-subsystem may
bypass the network layer and directly access the locking
data-structures.
This incidentally fixes#4451.
Previously serverConfigV17 used a global lock that made any instance of
serverConfigV17 depended on single global serverConfigMu.
This patch fixes by having individual lock per instances.
The following form of arguments such as
```
minio.exe -C some_dir server dir
```
has stopped working because of lack of handling of
absolute paths for config directory. Always calculate
absolute path for any relative paths on any operating
system.
The following fix converts all config directory relative
paths into absolute paths.
Fixes#3991
Statically typed BrowserFlag prevents any arbitrary string value
usage. The wrapped bool marshals/unmarshals JSON according to the
typed value ie string value "on" represents boolean true and "off" as
boolean false.
This patch fixes below
* Previously fatalIf() never writes log other than first logging target.
* quiet flag is not honored to show progress messages other than startup messages.
* Removes console package usage for progress messages.
Currently we document as IP:PORT which doesn't provide
if someone can use HOSTNAME:PORT. This is a change
to clarify this by calling it as ADDRESS:PORT which
encompasses both a HOSTNAME and an IP.
Fixes#3799
Also changes the behavior of `secretKeyHash` which is
not necessary to be sent over the network, each node
has its own secretKeyHash to validate.
Fixes#3696
Partial(fix) #3700 (More changes needed with some code cleanup)
On macOS, if a process already listens on 127.0.0.1:PORT, net.Listen() falls back
to IPv6 address ie minio will start listening on IPv6 address whereas another
(non-)minio process is listening on IPv4 of given port.
To avoid this error sutiation we check for port availability only for macOS.
Note: checkPortAvailability() tries to listen on given port and closes it.
It is possible to have a disconnected client in this tiny window of time.
* Implement heal format REST API handler
* Implement admin peer rpc handler to re-initialize storage
* Implement HealFormat API in pkg/madmin
* Update pkg/madmin API.md to incl. HealFormat
* Added unit tests for ReInitDisks rpc handler and HealFormatHandler
This is a consolidation effort, avoiding usage
of naked strings in codebase. Whenever possible
use constants which can be repurposed elsewhere.
This also fixes `goconst ./...` reported issues.
`principalId` i.e user identity is kept as AccessKey in
accordance with S3 spec.
Additionally responseElements{} are added starting with
`x-amz-request-id` is a hexadecimal of the event time itself in nanosecs.
`x-minio-origin-server` - points to the server generating the event.
Fixes#3556
This is important in a distributed setup, where the server hosting the
first disk formats a fresh setup. Sorting ensures that all servers
arrive at the same 'first' server.
Note: This change doesn't protect against different disk arguments
with some disks being same across servers.
setGlobalsFromContext() is added to set global variables after parsing
command line arguments. Thus, global flags will be honored wherever
they are placed in minio command.
This is needed to validate if the `format.json` indeed exists
when a fresh node is brought online.
This wrapped implementation also connects to the remote node
by attempting a re-login. Subsequently after a successful
connect `format.json` is validated as well.
Fixes#3207
Ref #3229
After review with @abperiasamy we decided to remove all the unnecessary options
- MINIO_BROWSER (Implemented as a security feature but now deemed obsolete
since even if blocking access to MINIO_BROWSER, s3 API port is open)
- MINIO_CACHE_EXPIRY (Defaults to 72h)
- MINIO_MAXCONN (No one used this option and we don't test this)
- MINIO_ENABLE_FSMETA (Enable FSMETA all the time)
Remove --ignore-disks option - this option was implemented when XL layer
would initialize the backend disks and heal them automatically to disallow
XL accidentally using the root partition itself this option was introduced.
This behavior has been changed XL no longer automatically initializes
`format.json` a HEAL is controlled activity, so ignore-disks is not
useful anymore. This change also addresses the problems of our documentation
going forward and keeps things simple. This patch brings in reduction of
options and defaulting them to a valid known inputs. This patch also
serves as a guideline of limiting many ways to do the same thing.
- Adds an interface to update in-memory bucket metadata state called
BucketMetaState - this interface has functions to:
- update bucket notification configuration,
- bucket listener configuration,
- bucket policy configuration, and
- send bucket event
- This interface is implemented by `localBMS` a type for manipulating
local node in-memory bucket metadata, and by `remoteBMS` a type for
manipulating remote node in-memory bucket metadata.
- The remote node interface, makes an RPC call, but the local node
interface does not - it updates in-memory bucket state directly.
- Rename mkPeersFromEndpoints to makeS3Peers and refactored it.
- Use arrayslice instead of map in s3Peers struct
- `s3Peers.SendUpdate` now receives an arrayslice of peer indexes to
send the request to, with a special nil value slice indicating that
all peers should be sent the update.
- `s3Peers.SendUpdate` now returns an arrayslice of errors, representing
errors from peers when sending an update. The array positions
correspond to peer array s3Peers.peers
Improve globalS3Peers:
- Make isDistXL a global `globalIsDistXL` and remove from s3Peers
- Make globalS3Peers an array of (address, bucket-meta-state) pairs.
- Fix code and tests.
Default golang net.Listen only listens on the first IP when
host resolves to multiple IPs.
This change addresses a problem for example your ``/etc/hosts``
has entries as following
```
127.0.1.1 minio1
192.168.1.10 minio1
```
Trying to start minio as
```
minio server --address "minio1:9001" ~/Photos
```
Causes the minio server to be bound only to "127.0.1.1" which
is an incorrect behavior since we are generally interested in
`192.168.1.10` as well.
This patch addresses this issue if the hostname is resolvable
and gives back list of addresses associated with that hostname
we just bind on all of them as it is the expected behavior.
For command line arguments we are currently following
- <node-1>:/path ... <node-n>:/path
This patch changes this to
- http://<node-1>/path ... http://<node-n>/path
In a distributed setup that the server should not perform any operation
on the storage layer after it is exported via RPC. e.g, cleaning up of
temporary directories under .minio.sys/tmp may interfere with ongoing
PUT objects being served by the distributed setup.
* Implements a Peer RPC router that sends info to all Minio servers in the cluster.
* Bucket notifications are propagated to all nodes via this RPC router.
* Bucket listener configuration is persisted to separate object layer
file (`listener.json`) and peer RPCs are used to communicate changes
throughout the cluster.
* When events are generated, RPC calls to send them to other servers
where bucket listeners may be connected is implemented.
* Some bucket notification tests are now disabled as they cannot work in
the new design.
* Minor fix in `funcFromPC` to use `path.Join`
- Servers do not exit for invalid credentials instead they print and wait.
- Servers do not exit for version mismatch instead they print and wait.
- Servers do not exit for time differences between nodes they print and wait.
These messages based on our prep stage during XL
and prints more informative message regarding
drive information.
This change also does a much needed refactoring.
* Test code for controller-handler operations:
* Heal operations
* List operation
* Switch to "testing" lib, moving away from gocheck
* Minor refactors
* Remove extra call to initGracefulShutdown
* Remove dead code in mainControl:
Dead code found by the TestControlMain() test function that always
passes.
* Add tests for control-*-main.go
Initialization when disk was down the network disk
reported an incorrect error rather than errDiskNotFound.
This resulted in incorrect error handling during
prepInitStorage() stage.
Fixes#2577
This change initializes rpc servers associated with disks that are
local. It makes object layer initialization on demand, namely on the
first request to the object layer.
Also adds lock RPC service vendorized minio/dsync