Commit Graph

117 Commits

Author SHA1 Message Date
Harshavardhana
a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Harshavardhana
66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
add a hint on the disk to allow for tracking fresh disk
being healed, to allow for restartable heals, and also
use this as a way to track and remove disks.

There are more pending changes where we should move
all the disk formatting logic to backend drives, this
PR doesn't deal with this refactor instead makes it
easier to track healing in the future.
2020-09-28 19:39:32 -07:00
Anis Elleuch
b302c8a5f4
heal: Fix periodic healing cleanup (#10569)
isEnded() was incorrectly calculating if the current healing sequence is
ended or not. h.currentStatus.Items could be empty if healing is very
slow and mc admin heal consumed all items.
2020-09-25 10:29:00 -07:00
Harshavardhana
96997d2b21
allow ctrl+c to be consistent at early startup (#10435)
fixes #10431
2020-09-08 09:10:55 -07:00
Harshavardhana
b0e1d4ce78
re-attach offline drive after new drive replacement (#10416)
inconsistent drive healing when one of the drive is offline
while a new drive was replaced, this change is to ensure
that we can add the offline drive back into the mix by
healing it again.
2020-09-04 17:09:02 -07:00
Harshavardhana
7778fef6bb
update continous heal metrics appropriately for scanned items (#10352)
bonus make sure to ignore objectNotFound, and versionNotFound
errors properly at all layers, since HealObjects() returns
objectNotFound error if the bucket or prefix is empty.
2020-08-26 08:53:33 -07:00
Klaus Post
c097ce9c32
continous healing based on crawler (#10103)
Design: https://gist.github.com/klauspost/792fe25c315caf1dd15c8e79df124914
2020-08-24 13:47:01 -07:00
Klaus Post
95ae6c4b49
Fix missing unlock in *healSequence.hasEnded() (#10305)
The background healing sequence would always hang when this function is called.
2020-08-20 08:48:09 -07:00
Klaus Post
bb5976d727
healbucket: Send object version ID (#10263)
Based on our previous conversations I assume we should send the version
 id when healing an object.

Maybe we should even list object versions and heal all?
2020-08-17 08:25:44 -07:00
Harshavardhana
2a9819aff8
fix: refactor background heal for cluster health (#10225) 2020-08-07 19:43:06 -07:00
Harshavardhana
6c6137b2e7
add cluster maintenance healthcheck drive heal affinity (#10218) 2020-08-07 13:22:53 -07:00
Harshavardhana
17747db93f
fix: support healing older content (#10076)
This PR adds support for healing older
content i.e from 2yrs, 1yr. Also handles
other situations where our config was
not encrypted yet.

This PR also ensures that our Listing
is consistent and quorum friendly,
such that we don't list partial objects
2020-07-17 17:41:29 -07:00
Harshavardhana
187c3f62df
fix: heal replaced drives properly (#10069)
healing was not working properly when drives were
replaced, due to the error check in root disk
calculation this PR fixes this behavior

This PR also adds additional fix for missing
metadata entries from .minio.sys as part of
disk healing as well.

Added code to ignore and print more context
sensitive errors for better debugging.

This PR is continuation of fix in 7b14e9b660
2020-07-17 10:08:04 -07:00
Harshavardhana
cdb0e6ffed
support proper values for listMultipartUploads/listParts (#9970)
object KMS is configured with auto-encryption,
there were issues when using docker registry -
this has been left unnoticed for a while.

This PR fixes an issue with compatibility.

Additionally also fix the continuation-token implementation
infinite loop issue which was missed as part of #9939

Also fix the heal token to be generated as a client
facing value instead of what is remembered by the
server, this allows for the server to be stateless 
regarding the token's behavior.
2020-07-03 19:27:13 -07:00
Anis Elleuch
2be20588bf
Reroute requests based token heal/listing (#9939)
When manual healing is triggered, one node in a cluster will 
become the authority to heal. mc regularly sends new requests 
to fetch the status of the ongoing healing process, but a load 
balancer could land the healing request to a node that is not 
doing the healing request.

This PR will redirect a request to the node based on the node 
index found described as part of the client token. A similar
technique is also used to proxy ListObjectsV2 requests
by encoding this information in continuation-token
2020-07-03 11:53:03 -07:00
Harshavardhana
810a4f0723
fix: return proper errors Get/HeadObject for deleteMarkers (#9957) 2020-07-02 16:17:27 -07:00
Harshavardhana
a38ce29137
fix: simplify background heal and trigger heal items early (#9928)
Bonus fix during versioning merge one of the PR was missing
the offline/online disk count fix from #9801 port it correctly
over to the master branch from release.

Additionally, add versionID support for MRF

Fixes #9910
Fixes #9931
2020-06-29 13:07:26 -07:00
Harshavardhana
4915433bd2
Support bucket versioning (#9377)
- Implement a new xl.json 2.0.0 format to support,
  this moves the entire marshaling logic to POSIX
  layer, top layer always consumes a common FileInfo
  construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
  for object placements.

Fixes #2111
2020-06-12 20:04:01 -07:00
Anis Elleuch
3aad09be28
heal: Fix passing healing opts (#9756)
Manual healing (as background healing) creates a heal task with a
possiblity to override healing options, such as deep or normal mode.

Use a pointer type in heal opts so nil would mean use the default
healing options.
2020-06-02 09:07:16 -07:00
Harshavardhana
7ea026ff1d
fix: reply back user-metadata in lower case form (#9697)
some clients such as veeam expect the x-amz-meta to
be sent in lower cased form, while this does indeed
defeats the HTTP protocol contract it is harder to
change these applications, while these applications
get fixed appropriately in future.

x-amz-meta is usually sent in lowercased form
by AWS S3 and some applications like veeam
incorrectly end up relying on the case sensitivity
of the HTTP headers.

Bonus fixes

 - Fix the iso8601 time format to keep it same as
   AWS S3 response
 - Increase maxObjectList to 50,000 and use
   maxDeleteList as 10,000 whenever multi-object
   deletes are needed.
2020-05-25 16:51:32 -07:00
Harshavardhana
b768645fde
fix: unexpected logging with bucket metadata conversions (#9519) 2020-05-04 20:04:06 -07:00
Harshavardhana
27d716c663
simplify usage of mutexes and atomic constants (#9501) 2020-05-03 22:35:40 -07:00
Harshavardhana
71ce63f79c
fix: background heal to call HealFormat only if needed (#9491)
In large setups this avoids unnecessary data transfer
across nodes and potential locks.

This PR also optimizes heal result channel, which should
be avoided for each queueHealTask as its expensive
to create/close channels for large number of objects.
2020-04-30 20:23:00 -07:00
Harshavardhana
f44cfb2863
use GlobalContext whenever possible (#9280)
This change is throughout the codebase to
ensure that all codepaths honor GlobalContext
2020-04-09 09:30:02 -07:00
Bala FA
95e89f1712
proactive deep heal object when a bitrot is detected (#9192) 2020-04-01 12:14:00 -07:00
Nitish Tiwari
6b984410d5
Add support for self-healing related metrics in Prometheus (#9079)
Fixes #8988

Co-authored-by: Anis Elleuch <vadmeste@users.noreply.github.com>
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-03-24 22:40:45 -07:00
Anis Elleuch
db2155551a
heal: Pass scan mode to HealObjects to deep scan full quorum objects (#9159)
As an optimization of the healing, HealObjects() avoid sending an
object to the background healing subsystem when the object is
present in all disks.

However, HealObjects() should have checked the scan type, if this
deep, always pass the object to the healing subsystem.
2020-03-18 17:50:00 -07:00
Klaus Post
8d98662633
re-implement data usage crawler to be more efficient (#9075)
Implementation overview: 

https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b
2020-03-18 16:19:29 -07:00
Anis Elleuch
fdf65aa9b9
heal: Add info about the next background healing round (#9122)
- avoid setting last heal activity when starting self-healing

This can be confusing to users thinking that the self healing
cycle was already performed.

- add info about the next background healing round
2020-03-11 23:00:31 -07:00
Harshavardhana
f98616dce7
heal: Optimize heal listing by avoiding batches (#8901)
Also limit the heal per object if there is incoming
requests by suspending heal for longer periods of time.
2020-01-29 12:05:44 +05:30
Harshavardhana
442e1698cb
heal: Avoid spinning up object healing during startup (#8819)
auto-heal disks, metadata and buckets in background but
not objects, let the auto heal kick in for objects after
the cluster has been up for a while.
2020-01-15 01:08:39 -08:00
Harshavardhana
5aa5dcdc6d
lock: improve locker initialization at init (#8776)
Use reference format to initialize lockers
during startup, also handle `nil` for NetLocker
in dsync and remove *errorLocker* implementation

Add further tuning parameters such as

 - DialTimeout is now 15 seconds from 30 seconds
 - KeepAliveTimeout is not 20 seconds, 5 seconds
   more than default 15 seconds
 - ResponseHeaderTimeout to 10 seconds
 - ExpectContinueTimeout is reduced to 3 seconds
 - DualStack is enabled by default remove setting
   it to `true`
 - Reduce IdleConnTimeout to 30 seconds from
   1 minute to avoid idleConn build up

Fixes #8773
2020-01-10 02:35:06 -08:00
Anis Elleuch
555969ee42 Add data usage collect with its new admin API (#8553)
Admin data usage info API returns the following

(Only FS & XL, for now)

- Number of buckets
- Number of objects
- The total size of objects
- Objects histogram
- Bucket sizes
2019-12-12 06:02:37 -08:00
Harshavardhana
822eb5ddc7 Bring in safe mode support (#8478)
This PR refactors object layer handling such
that upon failure in sub-system initialization
server reaches a stage of safe-mode operation
wherein only certain API operations are enabled
and available.

This allows for fixing many scenarios such as

 - incorrect configuration in vault, etcd,
   notification targets
 - missing files, incomplete config migrations
   unable to read encrypted content etc
 - any other issues related to notification,
   policies, lifecycle etc
2019-11-09 09:27:23 -08:00
Harshavardhana
9e7a3e6adc Extend further validation of config values (#8469)
- This PR allows config KVS to be validated properly
  without being affected by ENV overrides, rejects
  invalid values during set operation

- Expands unit tests and refactors the error handling
  for notification targets, returns error instead of
  ignoring targets for invalid KVS

- Does all the prep-work for implementing safe-mode
  style operation for MinIO server, introduces a new
  global variable to toggle safe mode based operations
  NOTE: this PR itself doesn't provide safe mode operations
2019-10-30 23:39:09 -07:00
Anis Elleuch
a49d4a9cb2 xl: Rewrite auto-healing and implement auto new-disk healer (#8114)
The new auto healing model selects one node always responsible
for auto-healing the whole cluster, erasure set by erasure set.
If that node dies, another node will be elected as a leading
operator to perform healing.

This code also adds a goroutine which checks each 10 minutes
if there are any new unformatted disks and performs its healing
in that case, only the erasure set which has the new disk will
be healed.
2019-10-28 10:27:49 -07:00
Harshavardhana
d48fd6fde9
Remove unusued params and functions (#8399) 2019-10-15 18:35:41 -07:00
Harshavardhana
e6d8e272ce
Use const slashSeparator instead of "/" everywhere (#8028) 2019-08-06 12:08:58 -07:00
Anis Elleuch
48f2c98052 admin: Add Background heal status info API (#7774)
This API returns the information related to the self healing routine.

For the moment, it returns:
- The total number of objects that are scanned
- The last time when an item was scanned
2019-06-25 16:42:24 -07:00
Anis Elleuch
7abadfccc2 Add self-healing feature (#7604)
- Background Heal routine receives heal requests from a channel, either to
heal format, buckets or objects
- Daily sweeper lists all objects in all buckets, these objects
don't necessarly have read quorum so they can be removed if
these objects are unhealable
- Heal daily ops receives objects from the daily sweeper
and send them to the heal routine.
2019-06-08 22:14:07 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Anis Elleuch
facbd653ba Add normal/deep type of heal scanning (#7251)
Healing scan used to read all objects parts to check for bitrot
checksum. This commit will add a quicker way of healing scan
by only checking if parts are actually present in disks or not.
2019-03-14 13:08:51 -07:00
Harshavardhana
7079abc931 Implement HealObjects API to simplify healing (#7351) 2019-03-13 17:35:09 -07:00
Harshavardhana
fef5416b3c Support unknown gateway errors and convert at handler layer (#7219)
Different gateway implementations due to different backend
API errors, might return different unsupported errors at
our handler layer. Current code posed a problem for us because
this information was lost and we would convert it to InternalError
in this situation all S3 clients end up retrying the request.

To avoid this unexpected situation implement a way to support
this cleanly such that the underlying information is not lost
which is returned by gateway.
2019-02-12 14:55:52 +05:30
Harshavardhana
082f777281 Revamp bucket metadata healing (#7208)
Bucket metadata healing in the current code was executed multiple
times each time for a given set. Bucket metadata just like
objects are hashed in accordance with its name on any given set,
to allow hashing to play a role we should let the top level
code decide where to navigate.

Current code also had 3 bucket metadata files hardcoded, whereas
we should make it generic by listing and navigating the .minio.sys
to heal such objects.

We also had another bug where due to isObjectDangling changes
without pre-existing bucket metadata files, we were erroneously
reporting it as grey/corrupted objects.

This PR fixes all of the above items.
2019-02-11 09:23:13 +05:30
Harshavardhana
30135eed86 Redo how to handle stale dangling files (#7171)
foo.CORRUPTED should never be created because when
multiple sets are involved we would hash the file
to wrong a location, this PR removes the code.

But allows DeleteBucket() to work properly to delete
dangling buckets/objects. Also adds another option
to Healing where a user needs to specify `--remove`
such that all dangling objects will be deleted with
user confirmation.
2019-02-05 17:58:48 -08:00
Aditya Manthramurthy
2053b3414f Reduce heal parallelism (#7155)
To avoid a large number of concurrent connections between minio
servers and to reduce CPU pressure, it is better to limit the number
of objects healed in parallel to number_of_CPUs.
2019-01-25 13:11:17 -08:00
Harshavardhana
8757c963ba
Migrate all Peer communication to common Notification subsystem (#7031)
Deprecate the use of Admin Peers concept and migrate all peer
communication to Notification subsystem. This finally allows
for a common subsystem for all peer notification in case of
distributed server deployments.
2019-01-14 12:14:20 +05:30
poornas
5a80cbec2a Add double encryption at S3 gateway. (#6423)
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).

If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.

When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.

Fixes #6323, #6696
2019-01-05 14:16:42 -08:00
Harshavardhana
4f31a9a33b Reload users upon AddUser on peers (#6975)
Also migrate ReloadFormat to notification subsystem,
remove GetConfig() we do not use this API anymore
2018-12-18 14:39:21 -08:00
Anis Elleuch
7b579caf68 heal: Fix heal sequences cleanup process (#6780)
The current code triggers a timeout to cleanup a heal seq from
healSeqMap, but we don't know if the user did or not launch a new
healing sequence with the same path.

Add endTime to healSequence struct and add a periodic heal-sequence
cleaner to remove heal sequences only if this latter is older than
10 minutes.
2018-11-16 14:59:51 -08:00
Harshavardhana
a55a298e00 Make sure to log unhandled errors always (#6784)
In many situations, while testing we encounter
ErrInternalError, to reduce logging we have
removed logging from quite a few places which
is acceptable but when ErrInternalError occurs
we should have a facility to log the corresponding
error, this helps to debug Minio server.
2018-11-12 11:07:43 -08:00
Harshavardhana
a9cda850ca Add forceStop flag to provide facility to stop healing (#6718)
This PR also makes sure that we deal with HTTP request
count by ignoring the on-going heal operation, i.e
do not wait on itself.
2018-11-04 19:24:16 -08:00
Harshavardhana
b4772849f9 Heal recursively all entries in config/ prefix (#6545)
This to ensure that we heal all entries in config/
prefix, we will have IAM and STS related files which
are being introduced in #6168 PR

This is a change to ensure that we heal all of them
properly, not just `config.json`
2018-10-01 22:24:26 +05:30
Harshavardhana
aebfceeafb Heal backend configuration file (#6532)
Fixes #6461
2018-09-29 13:47:01 +05:30
Harshavardhana
6fe9a613c0 Prioritize HTTP requests over Heal (#6468)
Additionally also heal 256 objects at any given
time in parallel.

Fixes #6196
Fixes #6241
2018-09-17 18:28:34 -07:00
poornas
5c0b98abf0 Add ObjectOptions to ObjectLayer calls (#6382) 2018-09-10 09:42:43 -07:00
Harshavardhana
e5e522fc61
docs: fix all Chinese doc links for the new docs site (#6097)
Additionally fix typos, default to US locale words
2018-06-28 16:02:02 -07:00
Harshavardhana
1d31ad499f Make sure to re-load reference format after HealFormat (#5772)
This PR introduces ReloadFormat API call at objectlayer
to facilitate this. Previously we repurposed HealFormat
but we never ended up updating our reference format on
peers.

Fixes #5700
2018-04-09 22:55:41 +05:30
kannappanr
f8a3fd0c2a
Create logger package and rename errorIf to LogIf (#5678)
Removing message from error logging
Replace errors.Trace with LogIf
2018-04-05 15:04:40 -07:00
Harshavardhana
de44be86d0 Use readQuorum instead of writeQuorum to check bucket exists (#5715)
Fixes #5708
Fixes #5700
2018-03-26 16:36:57 -07:00
Harshavardhana
f23944aed7 Fix heal bucket deadlock after replacing disks (#5661)
Fixes #5659
2018-03-16 15:09:31 -07:00
Krishna Srinivas
9ede179a21 Use context.Background() instead of nil
Rename Context[Get|Set] -> [Get|Set]Context
2018-03-15 16:28:25 -07:00
Krishna Srinivas
e452377b24 Add context to the object-interface methods.
Make necessary changes to xl fs azure sia
2018-03-15 16:28:25 -07:00
Harshavardhana
fb96779a8a Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
2018-02-15 17:45:57 -08:00
Aditya Manthramurthy
254b05e314 Fix locking in some admin APIs: (#5438)
- read lock for get config
- write lock for update creds
- write lock for format file
2018-01-22 18:09:12 -08:00
Aditya Manthramurthy
a337ea4d11 Move admin APIs to new path and add redesigned heal APIs (#5351)
- Changes related to moving admin APIs
   - admin APIs now have an endpoint under /minio/admin
   - admin APIs are now versioned - a new API to server the version is
     added at "GET /minio/admin/version" and all API operations have the
     path prefix /minio/admin/v1/<operation>
   - new service stop API added
   - credentials change API is moved to /minio/admin/v1/config/credential
   - credentials change API and configuration get/set API now require TLS
     so that credentials are protected
   - all API requests now receive JSON
   - heal APIs are disabled as they will be changed substantially

- Heal API changes
   Heal API is now provided at a single endpoint with the ability for a
   client to start a heal sequence on all the data in the server, a
   single bucket, or under a prefix within a bucket.

   When a heal sequence is started, the server returns a unique token
   that needs to be used for subsequent 'status' requests to fetch heal
   results.

   On each status request from the client, the server returns heal result
   records that it has accumulated since the previous status request. The
   server accumulates upto 1000 records and pauses healing further
   objects until the client requests for status. If the client does not
   request any further records for a long time, the server aborts the
   heal sequence automatically.

   A heal result record is returned for each entity healed on the server,
   such as system metadata, object metadata, buckets and objects, and has
   information about the before and after states on each disk.

   A client may request to force restart a heal sequence - this causes
   the running heal sequence to be aborted at the next safe spot and
   starts a new heal sequence.
2018-01-22 14:54:55 -08:00