Commit Graph

391 Commits

Author SHA1 Message Date
Harshavardhana bd2131ba34
add DNS cache support to avoid DNS flooding (#10693)
Go stdlib resolver doesn't support caching DNS
resolutions, since we compile with CGO disabled
we are more probe to DNS flooding for all network
calls to resolve for DNS from the DNS server.

Under various containerized environments such as
VMWare this becomes a problem because there are
no DNS caches available and we may end up overloading
the kube-dns resolver under concurrent I/O.

To circumvent this issue implement a DNSCache resolver
which resolves DNS and caches them for around 10secs
with every 3sec invalidation attempted.
2020-10-16 14:49:05 -07:00
Harshavardhana ad726b49b4
rename zones to serverSets to avoid terminology conflict (#10679)
we are bringing in availability zones, we should avoid
zones as per server expansion concept.
2020-10-15 14:28:50 -07:00
Harshavardhana 2042d4873c
rename crawler config option to heal (#10678) 2020-10-14 13:51:51 -07:00
Harshavardhana 2760fc86af
Bump default idleConnsPerHost to control conns in time_wait (#10653)
This PR fixes a hang which occurs quite commonly at higher concurrency
by allowing following changes

- allowing lower connections in time_wait allows faster socket open's
- lower idle connection timeout to ensure that we let kernel
  reclaim the time_wait connections quickly
- increase somaxconn to 4096 instead of 2048 to allow larger tcp
  syn backlogs.

fixes #10413
2020-10-12 14:19:46 -07:00
Ritesh H Shukla c2f16ee846
Add basic bandwidth monitoring for replication. (#10501)
This change tracks bandwidth for a bucket and object

- [x] Add Admin API
- [x] Add Peer API
- [x] Add BW throttling
- [x] Admin APIs to set replication limit
- [x] Admin APIs for fetch bandwidth
2020-10-09 20:36:00 -07:00
Harshavardhana a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Harshavardhana 2b4eb87d77
pick disks which are common maximally used (#10600)
further optimization to ensure that good disks
are always used for listing, other than healing
we only use disks that are maximally used.
2020-09-29 22:54:02 -07:00
Harshavardhana 66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
add a hint on the disk to allow for tracking fresh disk
being healed, to allow for restartable heals, and also
use this as a way to track and remove disks.

There are more pending changes where we should move
all the disk formatting logic to backend drives, this
PR doesn't deal with this refactor instead makes it
easier to track healing in the future.
2020-09-28 19:39:32 -07:00
Harshavardhana bebcf4f004 unlock() only if locking was successful 2020-09-25 19:36:47 -07:00
Harshavardhana ca989eb0b3
avoid ListBuckets returning quorum errors when node is down (#10555)
Also, revamp the way ListBuckets work make few portions
of the healing logic parallel

- walk objects for healing disks in parallel
- collect the list of buckets in parallel across drives
- provide consistent view for listBuckets()
2020-09-24 09:53:38 -07:00
Harshavardhana 1cf322b7d4
change leader locker only for crawler (#10509) 2020-09-18 11:15:54 -07:00
Klaus Post c851e022b7
Tweaks to dynamic locks (#10508)
* Fix cases where minimum timeout > default timeout.
* Add defensive code for too small/negative timeouts.
* Never set timeout below the maximum value of a request.
* Protect against (unlikely) int64 wraps.
* Decrease timeout slower.
* Don't re-lock before copying.
2020-09-18 09:18:18 -07:00
Harshavardhana d616d8a857
serialize replication and feed it through task model (#10500)
this allows for eventually controlling the concurrency
of replication and overally control of throughput
2020-09-16 16:04:55 -07:00
Anis Elleuch 8ea55f9dba
obd: Add console log to OBD output (#10372) 2020-09-15 18:02:54 -07:00
Harshavardhana 0104af6bcc
delayed locks until we have started reading the body (#10474)
This is to ensure that Go contexts work properly, after some
interesting experiments I found that Go net/http doesn't
cancel the context when Body is non-zero and hasn't been
read till EOF.

The following gist explains this, this can lead to pile up
of go-routines on the server which will never be canceled
and will die at a really later point in time, which can
simply overwhelm the server.

https://gist.github.com/harshavardhana/c51dcfd055780eaeb71db54f9c589150

To avoid this refactor the locking such that we take locks after we
have started reading from the body and only take locks when needed.

Also, remove contextReader as it's not useful, doesn't work as expected
context is not canceled until the body reaches EOF so there is no point
in wrapping it with context and putting a `select {` on it which
can unnecessarily increase the CPU overhead.

We will still use the context to cancel the lockers etc.
Additional simplification in the locker code to avoid timers
as re-using them is a complicated ordeal avoid them in
the hot path, since locking is very common this may avoid
lots of allocations.
2020-09-14 15:57:13 -07:00
Harshavardhana eb2934f0c1
simplify webhook DNS further generalize for gateway (#10448)
continuation of the changes from eaaf05a7cc
this further simplifies, enables this for gateway deployments as well
2020-09-10 14:19:32 -07:00
Nitish Tiwari eaaf05a7cc
Add Kubernetes operator webook server as DNS target (#10404)
This PR adds a DNS target that ensures to update an entry
into Kubernetes operator when a bucket is created or deleted.

See minio/operator#264 for details.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-09-09 12:20:49 -07:00
Harshavardhana 96997d2b21
allow ctrl+c to be consistent at early startup (#10435)
fixes #10431
2020-09-08 09:10:55 -07:00
Andreas Auernhammer fbd1c5f51a
certs: refactor cert manager to support multiple certificates (#10207)
This commit refactors the certificate management implementation
in the `certs` package such that multiple certificates can be
specified at the same time. Therefore, the following layout of
the `certs/` directory is expected:
```
certs/
 │
 ├─ public.crt
 ├─ private.key
 ├─ CAs/          // CAs directory is ignored
 │   │
 │    ...
 │
 ├─ example.com/
 │   │
 │   ├─ public.crt
 │   └─ private.key
 └─ foobar.org/
     │
     ├─ public.crt
     └─ private.key
   ...
```

However, directory names like `example.com` are just for human
readability/organization and don't have any meaning w.r.t whether
a particular certificate is served or not. This decision is made based
on the SNI sent by the client and the SAN of the certificate.

***

The `Manager` will pick a certificate based on the client trying
to establish a TLS connection. In particular, it looks at the client
hello (i.e. SNI) to determine which host the client tries to access.
If the manager can find a certificate that matches the SNI it
returns this certificate to the client.

However, the client may choose to not send an SNI or tries to access
a server directly via IP (`https://<ip>:<port>`). In this case, we
cannot use the SNI to determine which certificate to serve. However,
we also should not pick "the first" certificate that would be accepted
by the client (based on crypto. parameters - like a signature algorithm)
because it may be an internal certificate that contains internal hostnames. 
We would disclose internal infrastructure details doing so.

Therefore, the `Manager` returns the "default" certificate when the
client does not specify an SNI. The default certificate the top-level
`public.crt` - i.e. `certs/public.crt`.

This approach has some consequences:
 - It's the operator's responsibility to ensure that the top-level
   `public.crt` does not disclose any information (i.e. hostnames)
   that are not publicly visible. However, this was the case in the
   past already.
 - Any other `public.crt` - except for the top-level one - must not
   contain any IP SAN. The reason for this restriction is that the
   Manager cannot match a SNI to an IP b/c the SNI is the server host
   name. The entire purpose of SNI is to indicate which host the client
   tries to connect to when multiple hosts run on the same IP. So, a
   client will not set the SNI to an IP.
   If we would allow IP SANs in a lower-level `public.crt` a user would
   expect that it is possible to connect to MinIO directly via IP address
   and that the MinIO server would pick "the right" certificate. However,
   the MinIO server cannot determine which certificate to serve, and
   therefore always picks the "default" one. This may lead to all sorts
   of confusing errors like:
   "It works if I use `https:instance.minio.local` but not when I use
   `https://10.0.2.1`.

These consequences/limitations should be pointed out / explained in our
docs in an appropriate way. However, the support for multiple
certificates should not have any impact on how deployment with a single
certificate function today.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-09-03 23:33:37 -07:00
Klaus Post c097ce9c32
continous healing based on crawler (#10103)
Design: https://gist.github.com/klauspost/792fe25c315caf1dd15c8e79df124914
2020-08-24 13:47:01 -07:00
Harshavardhana 59352d0ac2
load all blocking metadata in background (#10298)
most of this metadata already has fallbacks
and there is no good reason to load them
in blocking fashion
2020-08-20 10:38:53 -07:00
Harshavardhana e57c742674
use single dynamic timeout for most locked API/heal ops (#10275)
newDynamicTimeout should be allocated once, in-case
of temporary locks in config and IAM we should
have allocated timeout once before the `for loop`

This PR doesn't fix any issue as such, but provides
enough dynamism for the timeout as per expectation.
2020-08-17 11:29:58 -07:00
Harshavardhana 83a82d818e
allow lock tolerance to match storage-class drive tolerance (#10270) 2020-08-14 18:17:14 -07:00
Harshavardhana 038d91feaa
fix: add public certs automatically as part of global CAs (#10256) 2020-08-13 09:46:50 -07:00
Harshavardhana 0dd3a08169
move the certPool loader function into pkg/certs (#10239) 2020-08-11 08:29:50 -07:00
Harshavardhana 2a9819aff8
fix: refactor background heal for cluster health (#10225) 2020-08-07 19:43:06 -07:00
Harshavardhana 77509ce391
Support looking up environment remotely (#10215)
adds a feature where we can fetch the MinIO
command-line remotely, this
is primarily meant to add some stateless
nature to the MinIO deployment in k8s
environments, MinIO operator would run a
webhook service endpoint
which can be used to fetch any environment
value in a generalized approach.
2020-08-06 18:03:16 -07:00
Harshavardhana a20d4568a2
fix: make sure to use uniform drive count calculation (#10208)
It is possible in situations when server was deployed
in asymmetric configuration in the past such as

```
minio server ~/fs{1...4}/disk{1...5}
```

Results in setDriveCount of 10 in older releases
but with fairly recent releases we have moved to
having server affinity which means that a set drive
count ascertained from above config will be now '4'

While the object layer make sure that we honor
`format.json` the storageClass configuration however
was by mistake was using the global value obtained
by heuristics. Which leads to prematurely using
lower parity without being requested by the an
administrator.

This PR fixes this behavior.
2020-08-05 13:31:12 -07:00
poornas a8dd7b3eda
Refactor replication target management. (#10154)
Generalize replication target management so
that remote targets for a bucket can be
managed with ARNs. `mc admin bucket remote`
command will be used to manage targets.
2020-07-30 19:55:22 -07:00
Harshavardhana fe157166ca
fix: Pass context all the way down to the network call in lockers (#10161)
Context timeout might race on each other when timeouts are lower
i.e when two lock attempts happened very quickly on the same resource
and the servers were yet trying to establish quorum.

This situation can lead to locks held which wouldn't be unlocked
and subsequent lock attempts would fail.

This would require a complete server restart. A potential of this
issue happening is when server is booting up and we are trying
to hold a 'transaction.lock' in quick bursts of timeout.
2020-07-29 23:15:34 -07:00
poornas c43da3005a
Add support for server side bucket replication (#9882) 2020-07-21 17:49:56 -07:00
Harshavardhana 11d21d5d1b
fix: pass around the correct drives per set (#10097)
this is a precursor change before adding parity
based SLA across zones instead of same stripe size
2020-07-20 16:38:40 -07:00
Klaus Post 00d3cc4b69
Enforce quota checks after crawl (#10036)
Enforce bucket quotas when crawling has finished. 
This ensures that we will not do quota enforcement on old data.

Additionally, delete less if we are closer to quota than we thought.
2020-07-14 18:59:05 -07:00
Harshavardhana 37c14207d6
fix: cors handling again for not just OPTIONS request (#10025)
CORS is notorious requires specific headers to be
handled appropriately in request and response,
using cors package as part of handlerFunc() for
options method lacks the necessary control this
package needs to add headers.
2020-07-12 10:56:57 -07:00
Harshavardhana 5c15656c55
support bootstrap client to use healthcheck restClient (#10004)
- reduce locker timeout for early transaction lock
  for more eagerness to timeout
- reduce leader lock timeout to range from 30sec to 1minute
- add additional log message during bootstrap phase
2020-07-10 09:26:21 -07:00
Anis Elleuch 2be20588bf
Reroute requests based token heal/listing (#9939)
When manual healing is triggered, one node in a cluster will 
become the authority to heal. mc regularly sends new requests 
to fetch the status of the ongoing healing process, but a load 
balancer could land the healing request to a node that is not 
doing the healing request.

This PR will redirect a request to the node based on the node 
index found described as part of the client token. A similar
technique is also used to proxy ListObjectsV2 requests
by encoding this information in continuation-token
2020-07-03 11:53:03 -07:00
Krishna Srinivas 4c266df863
fix: proxy ListObjects request to one of the server based on hash(bucket) (#9881) 2020-07-02 10:56:22 -07:00
Harshavardhana a38ce29137
fix: simplify background heal and trigger heal items early (#9928)
Bonus fix during versioning merge one of the PR was missing
the offline/online disk count fix from #9801 port it correctly
over to the master branch from release.

Additionally, add versionID support for MRF

Fixes #9910
Fixes #9931
2020-06-29 13:07:26 -07:00
Praveen raj Mani b1705599e1
Fix config leaks and deprecate file-based config setters in NAS gateway (#9884)
This PR has the following changes

- Removing duplicate lookupConfigs() calls.
- Deprecate admin config APIs for NAS gateways. This will avoid repeated reloads of the config from the disk.
- WatchConfigNASDisk will be removed
- Migration guide for NAS gateways users to migrate to ENV settings.

NOTE: THIS PR HAS A BREAKING CHANGE

Fixes #9875

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-06-25 15:59:28 +05:30
Harshavardhana 4915433bd2
Support bucket versioning (#9377)
- Implement a new xl.json 2.0.0 format to support,
  this moves the entire marshaling logic to POSIX
  layer, top layer always consumes a common FileInfo
  construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
  for object placements.

Fixes #2111
2020-06-12 20:04:01 -07:00
Klaus Post 43d6e3ae06
merge object lifecycle checks into usage crawler (#9579) 2020-06-12 10:28:21 -07:00
Harshavardhana 4790868878
allow background IAM load to speed up startup (#9796)
Also fix healthcheck handler to run success
only if object layer has initialized fully
for S3 API access call.
2020-06-09 19:19:03 -07:00
Harshavardhana febe9cc26a
fix: avoid timer leaks in dsync/lsync (#9781)
At a customer setup with lots of concurrent calls
it can be observed that in newRetryTimer there
were lots of tiny alloations which are not
relinquished upon retries, in this codepath
we were only interested in re-using the timer
and use it wisely for each locker.

```
(pprof) top
Showing nodes accounting for 8.68TB, 97.02% of 8.95TB total
Dropped 1198 nodes (cum <= 0.04TB)
Showing top 10 nodes out of 79
      flat  flat%   sum%        cum   cum%
    5.95TB 66.50% 66.50%     5.95TB 66.50%  time.NewTimer
    1.16TB 13.02% 79.51%     1.16TB 13.02%  github.com/ncw/directio.AlignedBlock
    0.67TB  7.53% 87.04%     0.70TB  7.78%  github.com/minio/minio/cmd.xlObjects.putObject
    0.21TB  2.36% 89.40%     0.21TB  2.36%  github.com/minio/minio/cmd.(*posix).Walk
    0.19TB  2.08% 91.49%     0.27TB  2.99%  os.statNolog
    0.14TB  1.59% 93.08%     0.14TB  1.60%  os.(*File).readdirnames
    0.10TB  1.09% 94.17%     0.11TB  1.25%  github.com/minio/minio/cmd.readDirN
    0.10TB  1.07% 95.23%     0.10TB  1.07%  syscall.ByteSliceFromString
    0.09TB  1.03% 96.27%     0.09TB  1.03%  strings.(*Builder).grow
    0.07TB  0.75% 97.02%     0.07TB  0.75%  path.(*lazybuf).append
```
2020-06-08 11:28:40 -07:00
Harshavardhana 5686a7e273
fix NAS gateway support for policy/notification (#9765)
Fixes #9764
2020-06-03 13:18:54 -07:00
Harshavardhana eba423bb9d
Disable crawler in FS/NAS gateway mode (#9695)
No one really uses FS for large scale accounting
usage, neither we crawl in NAS gateway mode. It is
worthwhile to simply disable this feature as its
not useful for anyone.

Bonus disable bucket quota ops as well in, FS
and gateway mode
2020-05-25 00:17:52 -07:00
Harshavardhana 7dbfea1353
avoid net/http ErrorLog for consistent logging experience (#9672)
net/http exposes ErrorLog but it is log.Logger
instance not an interface which can be overridden,
because of this reason the logging is interleaved
sometimes with TLS with messages like this on the
server

```
http: TLS handshake error from 139.178.70.188:63760: EOF
```

This is bit problematic for us as we need to have
consistent logging view for allow --json or --quiet
flags.

With this PR we ensure that this format is adhered to.
2020-05-22 21:59:18 -07:00
Harshavardhana 6656fa3066
simplify further bucket configuration properly (#9650)
This PR is a continuation from #9586, now the
entire parsing logic is fully merged into
bucket metadata sub-system, simplify the
quota API further by reducing the remove
quota handler implementation.
2020-05-20 10:18:15 -07:00
Harshavardhana bd032d13ff
migrate all bucket metadata into a single file (#9586)
this is a major overhaul by migrating off all
bucket metadata related configs into a single
object '.metadata.bin' this allows us for faster
bootups across 1000's of buckets and as well
as keeps the code simple enough for future
work and additions.

Additionally also fixes #9396, #9394
2020-05-19 13:53:54 -07:00
kannappanr a62572fb86
Check for address flags in all positions (#9615)
Fixes #9599
2020-05-17 08:46:23 -07:00
Harshavardhana a1de9cec58
cleanup object-lock/bucket tagging for gateways (#9548)
This PR is to ensure that we call the relevant object
layer APIs for necessary S3 API level functionalities
allowing gateway implementations to return proper
errors as NotImplemented{}

This allows for all our tests in mint to behave
appropriately and can be handled appropriately as
well.
2020-05-08 13:44:44 -07:00
Harshavardhana 2dc46cb153
Report correct error when O_DIRECT is not supported (#9545)
fixes #9537
2020-05-07 16:12:16 -07:00
Harshavardhana 4c9de098b0
heal buckets during init and make sure to wait on quorum (#9526)
heal buckets properly during expansion, and make sure
to wait for the quorum properly such that healing can
be retried.
2020-05-06 14:25:05 -07:00
Harshavardhana b768645fde
fix: unexpected logging with bucket metadata conversions (#9519) 2020-05-04 20:04:06 -07:00
Harshavardhana 9b3b04ecec
allow retries for bucket encryption/policy quorum reloads (#9513)
We should allow quorum errors to be send upwards
such that caller can retry while reading bucket
encryption/policy configs when server is starting
up, this allows distributed setups to load the
configuration properly.

Current code didn't facilitate this and would have
never loaded the actual configs during rolling,
server restarts.
2020-05-04 09:42:58 -07:00
poornas 9a547dcbfb
Add API's for managing bucket quota (#9379)
This PR allows setting a "hard" or "fifo" quota
restriction at the bucket level. Buckets that
have reached the FIFO quota configured, will
automatically be cleaned up in FIFO manner until
bucket usage drops to configured quota.
If a bucket is configured with a "hard" quota
ceiling, all further writes are disallowed.
2020-04-30 15:55:54 -07:00
Klaus Post 073aac3d92
add data update tracking using bloom filter (#9208)
By monitoring PUT/DELETE and heal operations it is possible
to track changed paths and keep a bloom filter for this data. 

This can help prioritize paths to scan. The bloom filter can identify
paths that have not changed, and the few collisions will only result
in a marginal extra workload. This can be implemented on either a
bucket+(1 prefix level) with reasonable performance.

The bloom filter is set to have a false positive rate at 1% at 1M 
entries. A bloom table of this size is about ~2500 bytes when serialized.

To not force a full scan of all paths that have changed cycle bloom
filters would need to be kept, so we guarantee that dirty paths have
been scanned within cycle runs. Until cycle bloom filters have been
collected all paths are considered dirty.
2020-04-27 10:06:21 -07:00
Harshavardhana f14bf25cb9
optimize Listen bucket notification implementation (#9444)
this commit avoids lots of tiny allocations, repeated
channel creates which are performed when filtering
the incoming events, unescaping a key just for matching.

also remove deprecated code which is not needed
anymore, avoids unexpected data structure transformations
from the map to slice.
2020-04-27 06:25:05 -07:00
Nitish Tiwari ebf3dda449
Update server startup example to showcase local erasure code (#9407) 2020-04-21 23:59:13 -07:00
Klaus Post f19cbfad5c
fix: use per test context (#9343)
Instead of GlobalContext use a local context for tests.
Most notably this allows stuff created to be shut down 
when tests using it is done. After PR #9345 9331 CI is 
often running out of memory/time.
2020-04-14 17:52:38 -07:00
Harshavardhana f44cfb2863
use GlobalContext whenever possible (#9280)
This change is throughout the codebase to
ensure that all codepaths honor GlobalContext
2020-04-09 09:30:02 -07:00
Harshavardhana e7276b7b9b
fix: make single locks for both IAM and object-store (#9279)
Additionally add context support for IAM sub-system
2020-04-07 14:26:39 -07:00
Krishna Srinivas 541a778d7b
fix: do not exit on bootstrap Verify() to allow for rolling upgrades (#9235) 2020-04-01 21:40:03 -07:00
Harshavardhana 6f992134a2
fix: startup load time by reusing storageDisks (#9210) 2020-03-27 14:48:30 -07:00
Sidhartha Mani 0c80bf45d0
Implement oboard diagnostics admin API (#9024)
- Implement a graph algorithm to test network bandwidth from every 
  node to every other node
- Saturate any network bandwidth adaptively, accounting for slow 
  and fast network capacity
- Implement parallel drive OBD tests
- Implement a paging mechanism for OBD test to provide periodic updates to client
- Implement Sys, Process, Host, Mem OBD Infos
2020-03-26 21:07:39 -07:00
Harshavardhana 6f6a2214fc
Add rate limiter for S3 API layer (#9196)
- total number of S3 API calls per server
- maximum wait duration for any S3 API call

This implementation is primarily meant for situations
where HDDs are not capable enough to handle the incoming
workload and there is no way to throttle the client.

This feature allows MinIO server to throttle itself
such that we do not overwhelm the HDDs.
2020-03-24 12:43:40 -07:00
Harshavardhana cfc9cfd84a
fix: various optimizations, idiomatic changes (#9179)
- acquire since leader lock for all background operations
  - healing, crawling and applying lifecycle policies.

- simplify lifecyle to avoid network calls, which was a
  bug in implementation - we should hold a leader and
  do everything from there, we have access to entire
  name space.

- make listing, walking not interfere by slowing itself
  down like the crawler.

- effectively use global context everywhere to ensure
  proper shutdown, in cache, lifecycle, healing

- don't read `format.json` for prometheus metrics in
  StorageInfo() call.
2020-03-22 12:16:36 -07:00
Klaus Post 8d98662633
re-implement data usage crawler to be more efficient (#9075)
Implementation overview: 

https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b
2020-03-18 16:19:29 -07:00
Anis Elleuch 7fdeb44372
info: Initialize boot time early so uptime will always be correct (#9154) 2020-03-17 16:37:28 -07:00
Harshavardhana dcd63b4146
fix: avoid double ListBuckets() loading object lock (#9031) 2020-02-24 06:39:11 +05:30
Nitish Tiwari 15e2ea2c96
Fix an issue where MinIO was logging every error twice (#8953)
The logging subsystem was initialized under init() method in
both gateway-main.go and server-main.go which are part of
same package. This created two logging targets and hence
errors were logged twice. This PR moves the init() method
to common-main.go
2020-02-07 13:48:07 +05:30
Krishnan Parthasarathi 026265f8f7
Add support for bucket encryption feature (#8890)
- pkg/bucket/encryption provides support for handling bucket 
  encryption configuration
- changes under cmd/ provide support for AES256 algorithm only

Co-Authored-By: Poorna  <poornas@users.noreply.github.com>
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-02-05 15:12:34 +05:30
Harshavardhana d76160c245
Initialize only one retry timer for all sub-systems (#8913)
Also make sure that we create buckets on all zones
successfully, do not run quick heal buckets if not
running with expansion.
2020-02-02 06:37:43 +05:30
Klaus Post c7178d2066 Profiling: Add base, fix memory profiling (#8850)
For 'snapshot' type profiles, record a 'before' profile that can be used 
as `go tool pprof -base=before ...` to compare before and after.

"Before" profiles are included in the zipped package.

[`runtime.MemProfileRate`](https://golang.org/pkg/runtime/#pkg-variables) 
should not be updated while the application is running, so we set it at startup.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-01-21 15:49:25 -08:00
Harshavardhana 0879a4f743 rest/storage: Remove racy LastError usage (#8817)
instead perform a liveness check call to
verify if server is online and print relevant
errors.

Also introduce a StorageErr string error type
instead of errors.New() deprecate usage of
VerifyFileError, DeleteFileError for gob,
change in datastructure also requires bump in
storage REST version to v13.

Fixes #8811
2020-01-14 18:45:17 -08:00
Klaus Post 37b32199e3 Validate XL sets on format (#8779)
When formatting a set validate if a host failure will likely lead to data loss.

While we don't know what config will be set in the future 
evaluate to our best knowledge, assuming default settings.
2020-01-13 13:09:10 -08:00
Harshavardhana 5aa5dcdc6d
lock: improve locker initialization at init (#8776)
Use reference format to initialize lockers
during startup, also handle `nil` for NetLocker
in dsync and remove *errorLocker* implementation

Add further tuning parameters such as

 - DialTimeout is now 15 seconds from 30 seconds
 - KeepAliveTimeout is not 20 seconds, 5 seconds
   more than default 15 seconds
 - ResponseHeaderTimeout to 10 seconds
 - ExpectContinueTimeout is reduced to 3 seconds
 - DualStack is enabled by default remove setting
   it to `true`
 - Reduce IdleConnTimeout to 30 seconds from
   1 minute to avoid idleConn build up

Fixes #8773
2020-01-10 02:35:06 -08:00
George Xie 7f31d933a8 fixes some typos, for CREDITS change (#8743) 2020-01-03 17:49:01 -08:00
Harshavardhana 725172e13b
fix: Do not need safe-mode for unreachable targets upon restart (#8686) 2019-12-21 22:35:50 -08:00
Harshavardhana 9bb0869b73
fix: populate buckets on etcd after config has loaded (#8658) 2019-12-17 13:50:07 -08:00
Harshavardhana 5f2318567e
Allow metadata updates on meta bucket even in WORM mode (#8657)
This ensures that we can update the

- .minio.sys is updated for accounting/data usage purposes
- .minio.sys is updated to indicate if backend is encrypted
  or not.
2019-12-17 10:13:12 -08:00
Harshavardhana c8d82588c2 Fix crash in console logger and also handle bucket DNS updates (#8654)
Also fix listenBucketNotification bugs seen by minio-js
listen bucket notification API.
2019-12-16 20:30:57 -08:00
Harshavardhana 1dc5f2d0af
Remove safe mode for invalid entries in config (#8650)
The approach is that now safe mode is only invoked when
we cannot read the config or under some catastrophic
situations, but not under situations when config entries
are invalid or unreachable. This allows for maximum
availability for MinIO and not fail on our users unlike
most of our historical releases.
2019-12-14 17:27:57 -08:00
Anis Elleuch 555969ee42 Add data usage collect with its new admin API (#8553)
Admin data usage info API returns the following

(Only FS & XL, for now)

- Number of buckets
- Number of objects
- The total size of objects
- Objects histogram
- Bucket sizes
2019-12-12 06:02:37 -08:00
Harshavardhana c9940d8c3f Final changes to config sub-system (#8600)
- Introduces changes such as certain types of
  errors that can be ignored or which need to 
  go into safe mode.
- Update help text as per the review
2019-12-04 15:32:37 -08:00
Harshavardhana 2ab8d5e47f Enable build verification with race (#8583) 2019-12-02 15:54:26 -08:00
Harshavardhana 5d3d57c12a
Start using error wrapping with fmt.Errorf (#8588)
Use fatih/errwrap to fix all the code to use
error wrapping with fmt.Errorf()
2019-12-02 09:28:01 -08:00
Harshavardhana 5d65428b29
Handle localhost distributed setups properly (#8577)
Fixes an issue reported by @klauspost and @vadmeste

This PR also allows users to expand their clusters
from single node XL deployment to distributed mode.
2019-11-26 11:42:10 -08:00
Harshavardhana c3771df641
Add bootstrap REST handler for verifying server config (#8550) 2019-11-22 12:45:13 -08:00
Harshavardhana 4e9de58675 Avoid pointer based copy, instead use Clone() (#8547)
This PR adds functional test to test expanded
cluster syntax.
2019-11-21 17:54:51 +05:30
Harshavardhana 8392d2f510 Preserve same deploymentID on all zones (#8542) 2019-11-20 15:39:30 +05:30
Harshavardhana 347b29d059 Implement bucket expansion (#8509) 2019-11-19 17:42:27 -08:00
Harshavardhana 26a866a202
Fix review comments and new changes in config (#8515)
- Migrate and save only settings which are enabled
- Rename logger_http to logger_webhook and
  logger_http_audit to audit_webhook
- No more pretty printing comments, comment
  is a key=value pair now.
- Avoid quotes on values which do not have space in them
- `state="on"` is implicit for all SetConfigKV unless
  specified explicitly as `state="off"`
- Disabled IAM users should be disabled always
2019-11-13 17:38:05 -08:00
Harshavardhana e9b2bf00ad Support MinIO to be deployed on more than 32 nodes (#8492)
This PR implements locking from a global entity into
a more localized set level entity, allowing for locks
to be held only on the resources which are writing
to a collection of disks rather than a global level.

In this process this PR also removes the top-level
limit of 32 nodes to an unlimited number of nodes. This
is a precursor change before bring in bucket expansion.
2019-11-13 12:17:45 -08:00
Harshavardhana d97d53bddc
Honor etcd legacy v/s new config settings properly (#8510)
This PR also fixes issues related to

- Add proper newline for `mc admin config get` output
  for more than one targets
- Fixes issue of temporary user credentials to have
  consistent output
- Fixes a crash when setting a key with empty values
- Fixes a parsing issue with `mc admin config history`
- Fixes gateway ENV handling for etcd server and gateway
2019-11-12 03:16:25 -08:00
Harshavardhana aa04f97f95 Config migration should handle plain-text (#8506)
This PR fixes issues found in config migration

 - StorageClass migration error when rrs is empty
 - Plain-text migration of older config
 - Do not run in safe mode with incorrect credentials
 - Update logger_http documentation for _STATE env

Refer more reported issues at #8434
2019-11-11 12:01:21 -08:00
Harshavardhana 822eb5ddc7 Bring in safe mode support (#8478)
This PR refactors object layer handling such
that upon failure in sub-system initialization
server reaches a stage of safe-mode operation
wherein only certain API operations are enabled
and available.

This allows for fixing many scenarios such as

 - incorrect configuration in vault, etcd,
   notification targets
 - missing files, incomplete config migrations
   unable to read encrypted content etc
 - any other issues related to notification,
   policies, lifecycle etc
2019-11-09 09:27:23 -08:00
Harshavardhana 4e63e0e372 Return appropriate errors API versions changes across REST APIs (#8480)
This PR adds code to appropriately handle versioning issues
that come up quite constantly across our API changes. Currently
we were also routing our requests wrong which sort of made it
harder to write a consistent error handling code to appropriately
reject or honor requests.

This PR potentially fixes issues

 - old mc is used against new minio release which is incompatible
   returns an appropriate for client action.
 - any older servers talking to each other, report appropriate error
 - incompatible peer servers should report error and reject the calls
   with appropriate error
2019-11-04 09:30:59 -08:00
Harshavardhana d28bcb4f84 Migrate all backend at .minio.sys/config to encrypted backend (#8474)
- Supports migrating only when the credential ENVs are set,
  so any FS mode deployments which do not have ENVs set will
  continue to remain as is.
- Credential ENVs can be rotated using MINIO_ACCESS_KEY_OLD
  and MINIO_SECRET_KEY_OLD envs, in such scenarios it allowed
  to rotate the encrypted content to a new admin key.
2019-11-01 15:53:16 -07:00
Harshavardhana 9e7a3e6adc Extend further validation of config values (#8469)
- This PR allows config KVS to be validated properly
  without being affected by ENV overrides, rejects
  invalid values during set operation

- Expands unit tests and refactors the error handling
  for notification targets, returns error instead of
  ignoring targets for invalid KVS

- Does all the prep-work for implementing safe-mode
  style operation for MinIO server, introduces a new
  global variable to toggle safe mode based operations
  NOTE: this PR itself doesn't provide safe mode operations
2019-10-30 23:39:09 -07:00
Harshavardhana 47b13cdb80 Add etcd part of config support, add noColor/json support (#8439)
- Add color/json mode support for get/help commands
- Support ENV help for all sub-systems
- Add support for etcd as part of config
2019-10-30 00:04:39 -07:00
Anis Elleuch a49d4a9cb2 xl: Rewrite auto-healing and implement auto new-disk healer (#8114)
The new auto healing model selects one node always responsible
for auto-healing the whole cluster, erasure set by erasure set.
If that node dies, another node will be elected as a leading
operator to perform healing.

This code also adds a goroutine which checks each 10 minutes
if there are any new unformatted disks and performs its healing
in that case, only the erasure set which has the new disk will
be healed.
2019-10-28 10:27:49 -07:00
Harshavardhana ee4a6a823d Migrate config to KV data format (#8392)
- adding oauth support to MinIO browser (#8400) by @kanagaraj
- supports multi-line get/set/del for all config fields
- add support for comments, allow toggle
- add extensive validation of config before saving
- support MinIO browser to support proper claims, using STS tokens
- env support for all config parameters, legacy envs are also
  supported with all documentation now pointing to latest ENVs
- preserve accessKey/secretKey from FS mode setups
- add history support implements three APIs
  - ClearHistory
  - RestoreHistory
  - ListHistory
- add help command support for each config parameters
- all the bug fixes after migration to KV, and other bug
  fixes encountered during testing.
2019-10-22 22:59:13 -07:00
Praveen raj Mani 8836d57e3c The prometheus metrics refractoring (#8003)
The measures are consolidated to the following metrics

- `disk_storage_used` : Disk space used by the disk.
- `disk_storage_available`: Available disk space left on the disk.
- `disk_storage_total`: Total disk space on the disk.
- `disks_offline`: Total number of offline disks in current MinIO instance.
- `disks_total`: Total number of disks in current MinIO instance.
- `s3_requests_total`: Total number of s3 requests in current MinIO instance.
- `s3_errors_total`: Total number of errors in s3 requests in current MinIO instance.
- `s3_requests_current`: Total number of active s3 requests in current MinIO instance.
- `internode_rx_bytes_total`: Total number of internode bytes received by current MinIO server instance.
- `internode_tx_bytes_total`: Total number of bytes sent to the other nodes by current MinIO server instance.
- `s3_rx_bytes_total`: Total number of s3 bytes received by current MinIO server instance.
- `s3_tx_bytes_total`: Total number of s3 bytes sent by current MinIO server instance.
- `minio_version_info`: Current MinIO version with commit-id.
- `s3_ttfb_seconds_bucket`: Histogram that holds the latency information of the requests.

And this PR also modifies the current StorageInfo queries

- Decouples StorageInfo from ServerInfo .
- StorageInfo is enhanced to give endpoint information.

NOTE: ADMIN API VERSION IS BUMPED UP IN THIS PR

Fixes #7873
2019-10-22 21:01:14 -07:00
poornas c4e2af8ca3 Remove cache env from server help message (#8405) 2019-10-16 23:22:57 +05:30
Harshavardhana d48fd6fde9
Remove unusued params and functions (#8399) 2019-10-15 18:35:41 -07:00
Harshavardhana 36e12a6038 Assume local endpoints appropriately in k8s deployments (#8375)
On Kubernetes/Docker setups DNS resolves inappropriately
sometimes where there are situations same endpoints with
multiple disks come online indicating either one of them
is local and some of them are not local. This situation
can never happen and its only a possibility in orchestrated
deployments with dynamic DNS. Following code ensures that we
treat if one of the endpoint says its local for a given host
it is true for all endpoints for the same host. Following code
ensures that this assumption is true and it works in all
scenarios and it is safe to assume for a given host.

This PR also adds validation such that we do not crash the
server if there are bugs in the endpoints list in dsync
initialization.

Thanks to Daniel Valdivia <hola@danielvaldivia.com> for
reproducing this, this fix is needed as part of the
https://github.com/minio/m3 project.
2019-10-10 10:14:17 +05:30
Harshavardhana 290ad0996f Move etcd, logger, crypto into their own packages (#8366)
- Deprecates _MINIO_PROFILER, `mc admin profile` does the job
- Move ENVs to common location in cmd/config/
2019-10-08 11:17:56 +05:30
Harshavardhana 589e32a4ed Refactor config and split them in packages (#8351)
This change is related to larger config migration PR
change, this is a first stage change to move our
configs to `cmd/config/` - divided into its subsystems
2019-10-04 23:05:33 +05:30
Harshavardhana 8b80eca184 List buckets only once per sub-system initialization (#8333)
Current master repeatedly calls ListBuckets() during
initialization of multiple sub-systems

Use single ListBuckets() call for each sub-system as
follows

- LifeCycle
- Policy
- Notification
2019-10-02 05:35:02 +05:30
Harshavardhana ff5bf51952 admin/heal: Fix deep healing to heal objects under more conditions (#8321)
- Heal if the part.1 is truncated from its original size
- Heal if the part.1 fails while being verified in between
- Heal if the part.1 fails while being at a certain offset

Other cleanups include make sure to flush the HTTP responses
properly from storage-rest-server, avoid using 'defer' to
improve call latency. 'defer' incurs latency avoid them
in our hot-paths such as storage-rest handlers.

Fixes #8319
2019-10-02 01:42:15 +05:30
Harshavardhana f45977d371
Fix error handling in DeleteFileBulk storage handler (#8327)
errors.errorString() cannot be marshalled by gob
encoder, so using a slice of []error would fail
to be encoded. This leads to no errors being
generated instead gob.Decoder on the storage-client
would see an io.EOF

To avoid such bugs introduce a typed error for
handling such translations and register this type
for gob encoding support.
2019-09-30 19:01:28 -07:00
poornas 04b92124c5 fs/xl: Log warning if cache config specified (#8251)
in non-gateway mode.
2019-09-16 19:55:52 -07:00
poornas 76df027264 Allow caching only in gateway mode. (#8232)
This PR changes cache on PUT behavior to background fill the cache
after PutObject completes. This will avoid concurrency issues as in #8219.

Added cleanup of partially filled cache to prevent cache corruption
- Fixes #8208
2019-09-17 02:54:04 +05:30
poornas 8a71b0ec5a Add admin API to send console log messages (#7784)
Utilized by mc admin console command.
2019-09-03 23:40:48 +05:30
Krishnan Parthasarathi bbb56739bd Add User-Agent header with MinIO release details in http logs (#7843)
This would allow http log target server to distinguish between log
messages across different versions of MinIO deployments.
2019-08-14 11:43:43 -07:00
Harshavardhana bf8ec8ad73
Cleanup ui-errors and print proper error messages (#8068)
* Cleanup ui-errors and print proper error messages

Change HELP to HINT instead, handle more error
cases when starting up MinIO. One such is related
to #8048

* Apply suggestions from code review
2019-08-12 21:25:34 -07:00
poornas 3385bf3da8 Rewrite cache implementation to cache only on GET (#7694)
Fixes #7458
Fixes #7573 
Fixes #7938 
Fixes #6934
Fixes #6265 
Fixes #6630 

This will allow the cache to consistently work for
server and gateways. Range GET requests will
be cached in the background after the request
is served from the backend.

- All cached content is automatically bitrot protected.

- Avoid ETag verification if a cache-control header
is set and the cached content is still valid.

- This PR changes the cache backend format, and all existing
content will be migrated to the new format. Until the data is
migrated completely, all content will be served from the backend.
2019-08-09 17:09:08 -07:00
Anis Elleuch 1ce8d2c476 Add bucket lifecycle expiry feature (#7834) 2019-08-09 10:02:41 -07:00
Harshavardhana 007a52b546
Add common validation for compression and encryption (#7978) 2019-07-26 02:41:16 -07:00
Krishnan Parthasarathi 559a59220e Add initial support for bucket lifecycle (#7563)
This PR is based off @sinhaashish's PR for object lifecycle
management, which includes support only for,
- Expiration of object
- Filter using object prefix (_not_ object tags)

N B the code for actual expiration of objects will be included in a
subsequent PR.
2019-07-19 21:20:33 +01:00
Krishna Srinivas 58d90ed73c Avoid network transfer for bitrot verification during healing (#7375) 2019-07-08 13:51:18 -07:00
kannappanr 70b350c383
Remove DeploymentID from response headers (#7815)
Response headers need not contain deployment ID.
2019-07-01 12:22:01 -07:00
Krishna Srinivas 338e9a9be9 Put object client disconnect (#7824)
Fail putObject  and postpolicy in case client prematurely disconnects
Use request's context to cancel lock requests on client disconnects
2019-06-28 22:09:17 -07:00
Krishna Srinivas 183ec094c4 Simplify HTTP trace related code (#7833) 2019-06-26 22:41:12 -07:00
Kanagaraj M 48cb271a46 include ip address while doing checkPortAvailability (#7818)
While checking for port availability, ip address should be included.
When a machine has multiple ip addresses, multiple minio instances
or some other applications can be run on same port but different
ip address.

Fixes #7685
2019-06-24 15:02:39 -07:00
Harshavardhana 91ceae23d0 Add support for customizable user (#7569) 2019-06-10 20:27:42 +05:30
Anis Elleuch 7abadfccc2 Add self-healing feature (#7604)
- Background Heal routine receives heal requests from a channel, either to
heal format, buckets or objects
- Daily sweeper lists all objects in all buckets, these objects
don't necessarly have read quorum so they can be removed if
these objects are unhealable
- Heal daily ops receives objects from the daily sweeper
and send them to the heal routine.
2019-06-08 22:14:07 -07:00
poornas 97090aa16c Add admin API to send trace notifications to registered (#7128)
Remove current functionality to log trace to file
using MINIO_HTTP_TRACE env, and replace it with
mc admin trace command on mc client.
2019-06-08 15:54:41 -07:00
ebozduman 67d508b214 Adjusts help content dynamically according to OS (#7646) 2019-05-15 14:02:44 +05:30
Harshavardhana 091b9b661f Complain if we detect sub-optimal ordering in distributed setup (#7576)
Fixes #6156
2019-04-29 10:10:50 +05:30
Praveen raj Mani d96584ef58 Allow server to start if one of local nodes in docker/kubernetes setup is resolved (#7452)
Allow server to start if one of the local nodes in docker/kubernetes setup is successfully resolved

- The rule is that we need atleast one local node to work. We dont need to resolve the
  rest at that point.

- In a non-orchestrational setup, we fail if we do not have atleast one local node up
  and running.

- In an orchestrational setup (docker-swarm and kubernetes), We retry with a sleep of 5
  seconds until any one local node shows up.

Fixes #6995
2019-04-19 10:26:44 -07:00
kannappanr 5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Harshavardhana 8757c963ba
Migrate all Peer communication to common Notification subsystem (#7031)
Deprecate the use of Admin Peers concept and migrate all peer
communication to Notification subsystem. This finally allows
for a common subsystem for all peer notification in case of
distributed server deployments.
2019-01-14 12:14:20 +05:30
Harshavardhana e82dcd195c Deprecate config-dir bring in certs-dir for TLS configuration (#7033)
This PR is to provide indication that config-dir will be removed
in future and all users should migrate to new --certs-dir option

Fixes #7016
Fixes #7032
2019-01-02 10:05:16 -08:00
Anis Elleuch 99b843a64e Add anonymous flag to prevent logging sensitive information (#6899) 2018-12-18 16:08:11 -08:00
Andreas Auernhammer d264d2c899 add auto-encryption feature (#6523)
This commit adds an auto-encryption feature which allows
the Minio operator to ensure that uploaded objects are
always encrypted.

This change adds the `autoEncryption` configuration option
as part of the KMS conifguration and the ENV. variable
`MINIO_SSE_AUTO_ENCRYPTION:{on,off}`.

It also updates the KMS documentation according to the
changes.

Fixes #6502
2018-12-14 13:35:48 -08:00
Harshavardhana bebaff269c Support IPv6 in minio command line (#6947)
Fixes #6946
2018-12-14 13:07:46 +05:30
Praveen raj Mani 4e6d3c093f Errors in notification config should not crash the server (#6881)
Fixes #6870
2018-12-04 18:27:12 -08:00
Harshavardhana a9de303d8b
Update command line docs (#6839) 2018-11-20 17:35:33 -08:00
Harshavardhana 3f19ea98bb Honor certs from config-dir (#6757)
This was a regression introduced in 9fe51e392b
2018-11-02 11:53:45 -07:00
Harshavardhana 9fe51e392b Support etcd TLS certficates (#6719)
This PR supports two models for etcd certs

- Client-to-server transport security with HTTPS
- Client-to-server authentication with HTTPS client certificates
2018-10-29 11:14:12 -07:00
Pontus Leitzler 81d21850ec Root CAs can be used for backend without TLS (#6711) 2018-10-28 06:21:00 +05:30
Harshavardhana 54ae364def Introduce STS client grants API and OPA policy integration (#6168)
This PR introduces two new features

- AWS STS compatible STS API named AssumeRoleWithClientGrants

```
POST /?Action=AssumeRoleWithClientGrants&Token=<jwt>
```

This API endpoint returns temporary access credentials, access
tokens signature types supported by this API

  - RSA keys
  - ECDSA keys

Fetches the required public key from the JWKS endpoints, provides
them as rsa or ecdsa public keys.

- External policy engine support, in this case OPA policy engine

- Credentials are stored on disks
2018-10-09 14:00:01 -07:00
Harshavardhana 12b4971b70 Rename config.json in config-dir with '.deprecated' extension (#6446)
Fixes #6444
2018-09-10 16:15:47 -07:00
Anis Elleuch 92bc7caf7a Reword missing credentials error msg (#6418)
Enhance a little bit the error message that is showing
when access & secret keys are not specified in the
environment when running Minio in gateway and server mode.

This commit also removes a redundant check of access/secret keys.
2018-09-09 22:51:48 +05:30
Harshavardhana 19202bae81 Allow backward compatible way to load creds from config.json (#6435)
Print warning message for users to migrate to newer style of distributed
deployment by always setting credentials as ENVs.

Fixes #6434
2018-09-07 11:18:49 -07:00
Harshavardhana d0d015361c Fix config subsystem to wait on quorum number of formatted disks (#6407) 2018-09-05 20:55:55 +05:30
Harshavardhana b01e69e08f
Initialize global object layer after all subsystems have initialized (#6333)
This is to ensure that object API operations are not performed
on a server on which subsystems are yet to be initialized.
2018-08-22 23:11:17 -07:00
Harshavardhana 7d7e21aebb Merge initConfig logic to ConfigSys (#6312) 2018-08-19 13:57:18 -07:00
poornas e71ef905f9 Add support for SSE-S3 server side encryption with vault (#6192)
Add support for sse-s3 encryption with vault as KMS.

Also refactoring code to make use of headers and functions defined in
crypto package and clean up duplicated code.
2018-08-17 12:52:14 -07:00