Commit Graph

209 Commits

Author SHA1 Message Date
Klaus Post
0d8c74358d
Add erasure and compression self-tests (#11918)
Ensure that we don't use potentially broken algorithms for critical functions, whether it be a runtime problem or implementation problem for a specific platform.
2021-03-31 09:11:37 -07:00
Anis Elleuch
d8b5adfd10
trace: Add storage & OS tracing (#11889) 2021-03-26 23:24:07 -07:00
Anis Elleuch
2c296652f7
Simplify access to local node name (#11907)
The local node name is heavily used in tracing, create a new global 
variable to store it. Multiple goroutines can access it since it won't be
changed later.
2021-03-26 11:37:58 -07:00
Harshavardhana
6386b45c08
[feat] use rename instead of recursive deletes (#11641)
most of the delete calls today spend time in
a blocking operation where multiple calls need
to be recursively sent to delete the objects,
instead we can use rename operation to atomically
move the objects from the namespace to `tmp/.trash`

we can schedule deletion of objects at this
location once in 15, 30mins and we can also add
wait times between each delete operation.

this allows us to make delete's faster as well
less chattier on the drives, each server runs locally
a groutine which would clean this up regularly.
2021-02-26 09:52:27 -08:00
Harshavardhana
1b453728a3
initialize forwarder after init() to avoid crashes (#11330)
DNSCache dialer is a global value initialized in
init(), whereas `go` keeps `var =` before `init()`
, also we don't need to keep proxy routers as
global entities - register the forwarder as
necessary to avoid crashes.
2021-01-22 15:37:41 -08:00
Harshavardhana
a4f6705874
expire stale locks when owner is down (#11247)
fixes #11246
2021-01-07 19:16:18 -08:00
Harshavardhana
5c451d1690
update x/net/http2 to address few bugs (#11144)
additionally also configure http2 healthcheck
values to quickly detect unstable connections
and let them timeout.

also use single transport for proxying requests
2020-12-21 21:42:38 -08:00
Klaus Post
a896125490
Add crawler delay config + dynamic config values (#11018) 2020-12-04 09:32:35 -08:00
Harshavardhana
80d31113e5
fix: etcd import paths again depend on v3.4.14 release (#11020)
Due to botched upstream renames of project repositories
and incomplete migration to go.mod support, our current
dependency version of `go.mod` had bugs i.e it was
using commits from master branch which didn't have
the required fixes present in release-3.4 branches

which leads to some rare bugs

https://github.com/etcd-io/etcd/pull/11477 provides
a workaround for now and we should migrate to this.

release-3.5 eventually claims to fix all of this
properly until then we cannot use /v3 import right now
2020-12-03 11:35:18 -08:00
Harshavardhana
4ec45753e6 rename server sets to server pools 2020-12-01 13:50:33 -08:00
Harshavardhana
4ea31da889
fix: move list quorum ENV to config (#10804) 2020-11-02 17:21:56 -08:00
Harshavardhana
4c773f7068
re-use remote transports in Peer,Storage,Locker clients (#10788)
use one transport for internode communication
2020-11-02 07:43:11 -08:00
Harshavardhana
bd2131ba34
add DNS cache support to avoid DNS flooding (#10693)
Go stdlib resolver doesn't support caching DNS
resolutions, since we compile with CGO disabled
we are more probe to DNS flooding for all network
calls to resolve for DNS from the DNS server.

Under various containerized environments such as
VMWare this becomes a problem because there are
no DNS caches available and we may end up overloading
the kube-dns resolver under concurrent I/O.

To circumvent this issue implement a DNSCache resolver
which resolves DNS and caches them for around 10secs
with every 3sec invalidation attempted.
2020-10-16 14:49:05 -07:00
Harshavardhana
ad726b49b4
rename zones to serverSets to avoid terminology conflict (#10679)
we are bringing in availability zones, we should avoid
zones as per server expansion concept.
2020-10-15 14:28:50 -07:00
Ritesh H Shukla
c2f16ee846
Add basic bandwidth monitoring for replication. (#10501)
This change tracks bandwidth for a bucket and object

- [x] Add Admin API
- [x] Add Peer API
- [x] Add BW throttling
- [x] Admin APIs to set replication limit
- [x] Admin APIs for fetch bandwidth
2020-10-09 20:36:00 -07:00
Harshavardhana
a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Harshavardhana
736e58dd68
fix: handle concurrent lockers with multiple optimizations (#10640)
- select lockers which are non-local and online to have
  affinity towards remote servers for lock contention

- optimize lock retry interval to avoid sending too many
  messages during lock contention, reduces average CPU
  usage as well

- if bucket is not set, when deleteObject fails make sure
  setPutObjHeaders() honors lifecycle only if bucket name
  is set.

- fix top locks to list out always the oldest lockers always,
  avoid getting bogged down into map's unordered nature.
2020-10-08 12:32:32 -07:00
Krishna Srinivas
230fc0d186
Support for "directory" objects (#10499) 2020-09-19 08:39:41 -07:00
Harshavardhana
eb2934f0c1
simplify webhook DNS further generalize for gateway (#10448)
continuation of the changes from eaaf05a7cc
this further simplifies, enables this for gateway deployments as well
2020-09-10 14:19:32 -07:00
Harshavardhana
6a0372be6c
cleanup tmpDir any older entries automatically just like multipart (#10439)
also consider multipart uploads, temporary files in `.minio.sys/tmp`
as stale beyond 24hrs and clean them up automatically
2020-09-08 15:55:40 -07:00
Andreas Auernhammer
fbd1c5f51a
certs: refactor cert manager to support multiple certificates (#10207)
This commit refactors the certificate management implementation
in the `certs` package such that multiple certificates can be
specified at the same time. Therefore, the following layout of
the `certs/` directory is expected:
```
certs/
 │
 ├─ public.crt
 ├─ private.key
 ├─ CAs/          // CAs directory is ignored
 │   │
 │    ...
 │
 ├─ example.com/
 │   │
 │   ├─ public.crt
 │   └─ private.key
 └─ foobar.org/
     │
     ├─ public.crt
     └─ private.key
   ...
```

However, directory names like `example.com` are just for human
readability/organization and don't have any meaning w.r.t whether
a particular certificate is served or not. This decision is made based
on the SNI sent by the client and the SAN of the certificate.

***

The `Manager` will pick a certificate based on the client trying
to establish a TLS connection. In particular, it looks at the client
hello (i.e. SNI) to determine which host the client tries to access.
If the manager can find a certificate that matches the SNI it
returns this certificate to the client.

However, the client may choose to not send an SNI or tries to access
a server directly via IP (`https://<ip>:<port>`). In this case, we
cannot use the SNI to determine which certificate to serve. However,
we also should not pick "the first" certificate that would be accepted
by the client (based on crypto. parameters - like a signature algorithm)
because it may be an internal certificate that contains internal hostnames. 
We would disclose internal infrastructure details doing so.

Therefore, the `Manager` returns the "default" certificate when the
client does not specify an SNI. The default certificate the top-level
`public.crt` - i.e. `certs/public.crt`.

This approach has some consequences:
 - It's the operator's responsibility to ensure that the top-level
   `public.crt` does not disclose any information (i.e. hostnames)
   that are not publicly visible. However, this was the case in the
   past already.
 - Any other `public.crt` - except for the top-level one - must not
   contain any IP SAN. The reason for this restriction is that the
   Manager cannot match a SNI to an IP b/c the SNI is the server host
   name. The entire purpose of SNI is to indicate which host the client
   tries to connect to when multiple hosts run on the same IP. So, a
   client will not set the SNI to an IP.
   If we would allow IP SANs in a lower-level `public.crt` a user would
   expect that it is possible to connect to MinIO directly via IP address
   and that the MinIO server would pick "the right" certificate. However,
   the MinIO server cannot determine which certificate to serve, and
   therefore always picks the "default" one. This may lead to all sorts
   of confusing errors like:
   "It works if I use `https:instance.minio.local` but not when I use
   `https://10.0.2.1`.

These consequences/limitations should be pointed out / explained in our
docs in an appropriate way. However, the support for multiple
certificates should not have any impact on how deployment with a single
certificate function today.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-09-03 23:33:37 -07:00
Harshavardhana
e57c742674
use single dynamic timeout for most locked API/heal ops (#10275)
newDynamicTimeout should be allocated once, in-case
of temporary locks in config and IAM we should
have allocated timeout once before the `for loop`

This PR doesn't fix any issue as such, but provides
enough dynamism for the timeout as per expectation.
2020-08-17 11:29:58 -07:00
Harshavardhana
a20d4568a2
fix: make sure to use uniform drive count calculation (#10208)
It is possible in situations when server was deployed
in asymmetric configuration in the past such as

```
minio server ~/fs{1...4}/disk{1...5}
```

Results in setDriveCount of 10 in older releases
but with fairly recent releases we have moved to
having server affinity which means that a set drive
count ascertained from above config will be now '4'

While the object layer make sure that we honor
`format.json` the storageClass configuration however
was by mistake was using the global value obtained
by heuristics. Which leads to prematurely using
lower parity without being requested by the an
administrator.

This PR fixes this behavior.
2020-08-05 13:31:12 -07:00
Harshavardhana
019fe69a57
fix: reduce an extra system call for writes instead fail later (#10187) 2020-08-04 12:09:41 -07:00
poornas
a8dd7b3eda
Refactor replication target management. (#10154)
Generalize replication target management so
that remote targets for a bucket can be
managed with ARNs. `mc admin bucket remote`
command will be used to manage targets.
2020-07-30 19:55:22 -07:00
poornas
c43da3005a
Add support for server side bucket replication (#9882) 2020-07-21 17:49:56 -07:00
Harshavardhana
9fd836e51f
add dnsStore interface for upcoming operator webhook (#10077) 2020-07-20 12:28:48 -07:00
Anis Elleuch
778e9c864f
Move dependency from minio-go v6 to v7 (#10042) 2020-07-14 09:38:05 -07:00
Harshavardhana
72e0745e2f
fix: migrate to go.etcd.io import path (#9987)
with the merge of https://github.com/etcd-io/etcd/pull/11823
etcd v3.5.0 will now have a properly imported versioned path

this fixes our pending migration to newer repo
2020-07-07 19:04:29 -07:00
Anis Elleuch
2be20588bf
Reroute requests based token heal/listing (#9939)
When manual healing is triggered, one node in a cluster will 
become the authority to heal. mc regularly sends new requests 
to fetch the status of the ongoing healing process, but a load 
balancer could land the healing request to a node that is not 
doing the healing request.

This PR will redirect a request to the node based on the node 
index found described as part of the client token. A similar
technique is also used to proxy ListObjectsV2 requests
by encoding this information in continuation-token
2020-07-03 11:53:03 -07:00
Krishna Srinivas
4c266df863
fix: proxy ListObjects request to one of the server based on hash(bucket) (#9881) 2020-07-02 10:56:22 -07:00
Klaus Post
972d876ca9
Do not select zones with <5% free after upload (#9877)
Looking into full disk errors on zoned setup. We don't take the
5% space requirement into account when selecting a zone.

The interesting part is that even considering this we don't
know the size of the object the user wants to upload when
they do multipart uploads.

It seems quite defensive to always upload multiparts to
the zone where there is the most space since all load will
be directed to a part of the cluster.

In these cases we make sure it can at least hold a 1GiB file
and we disadvantage fuller zones more by subtracting the
expected size before weighing.
2020-06-20 06:36:44 -07:00
Harshavardhana
4915433bd2
Support bucket versioning (#9377)
- Implement a new xl.json 2.0.0 format to support,
  this moves the entire marshaling logic to POSIX
  layer, top layer always consumes a common FileInfo
  construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
  for object placements.

Fixes #2111
2020-06-12 20:04:01 -07:00
Harshavardhana
5e529a1c96
simplify context timeout for readiness (#9772)
additionally also add CORS support to restrict
for specific origin, adds a new config and
updated the documentation as well
2020-06-04 14:58:34 -07:00
Harshavardhana
53aaa5d2a5
Export bucket usage counts as part of bucket metrics (#9710)
Bonus fixes in quota enforcement to use the
new datastructure and use timedValue to cache
a value/reload automatically avoids one less
global variable.
2020-05-27 06:45:43 -07:00
Krishna Srinivas
7d19ab9f62
readiness returns error quickly if any of the set is down (#9662)
This PR adds a new configuration parameter which allows readiness
check to respond within 10secs, this can be reduced to a lower value
if necessary using 

```
mc admin config set api ready_deadline=5s
```

 or

```
export MINIO_API_READY_DEADLINE=5s
```
2020-05-23 17:38:39 -07:00
Harshavardhana
bd032d13ff
migrate all bucket metadata into a single file (#9586)
this is a major overhaul by migrating off all
bucket metadata related configs into a single
object '.metadata.bin' this allows us for faster
bootups across 1000's of buckets and as well
as keeps the code simple enough for future
work and additions.

Additionally also fixes #9396, #9394
2020-05-19 13:53:54 -07:00
kannappanr
a62572fb86
Check for address flags in all positions (#9615)
Fixes #9599
2020-05-17 08:46:23 -07:00
Harshavardhana
6ac48a65cb
fix: use unused cacheMetrics code in prometheus (#9588)
remove all other unusued/deadcode
2020-05-13 08:15:26 -07:00
Krishna Srinivas
94f1a1dea3
add option for O_SYNC writes for standalone FS backend (#9581) 2020-05-12 19:24:59 -07:00
Harshavardhana
a1de9cec58
cleanup object-lock/bucket tagging for gateways (#9548)
This PR is to ensure that we call the relevant object
layer APIs for necessary S3 API level functionalities
allowing gateway implementations to return proper
errors as NotImplemented{}

This allows for all our tests in mint to behave
appropriately and can be handled appropriately as
well.
2020-05-08 13:44:44 -07:00
Harshavardhana
9b3b04ecec
allow retries for bucket encryption/policy quorum reloads (#9513)
We should allow quorum errors to be send upwards
such that caller can retry while reading bucket
encryption/policy configs when server is starting
up, this allows distributed setups to load the
configuration properly.

Current code didn't facilitate this and would have
never loaded the actual configs during rolling,
server restarts.
2020-05-04 09:42:58 -07:00
Egor Rudinsky
f7c91eff54
Share button for public objects (#9162) 2020-05-01 23:55:53 -07:00
poornas
9a547dcbfb
Add API's for managing bucket quota (#9379)
This PR allows setting a "hard" or "fifo" quota
restriction at the bucket level. Buckets that
have reached the FIFO quota configured, will
automatically be cleaned up in FIFO manner until
bucket usage drops to configured quota.
If a bucket is configured with a "hard" quota
ceiling, all further writes are disallowed.
2020-04-30 15:55:54 -07:00
Harshavardhana
f14bf25cb9
optimize Listen bucket notification implementation (#9444)
this commit avoids lots of tiny allocations, repeated
channel creates which are performed when filtering
the incoming events, unescaping a key just for matching.

also remove deprecated code which is not needed
anymore, avoids unexpected data structure transformations
from the map to slice.
2020-04-27 06:25:05 -07:00
Harshavardhana
60d415bb8a
deprecate/remove global WORM mode (#9436)
global WORM mode is a complex piece for which
the time has passed, with the advent of S3 compatible
object locking and retention implementation global
WORM is sort of deprecated, this has been mentioned
in our documentation for some time, now the time
has come for this to go.
2020-04-24 16:37:05 -07:00
Anis Elleuch
8a94aebdb8
config: Add api requests max & deadline configs (#9273)
Add two new configuration entries, api.requests-max and
api.requests-deadline which have the same role of
MINIO_API_REQUESTS_MAX and MINIO_API_REQUESTS_DEADLINE.
2020-04-14 12:46:37 -07:00
Anis Elleuch
7fdeb44372
info: Initialize boot time early so uptime will always be correct (#9154) 2020-03-17 16:37:28 -07:00
Anis Elleuch
496f4a7dc7
Add service account type in IAM (#9029) 2020-03-17 10:36:13 -07:00
poornas
10fd53d6bb
Fix: admin config set API for notifications (#9085)
Filter out targets set via env when
validating incoming config change against
configured notification targets

Fixes #9066
2020-03-14 00:01:15 -07:00