Commit Graph

450 Commits

Author SHA1 Message Date
Harshavardhana
641e564b65
fips build tag uses relevant binary link for updates (#12014)
This code is necessary for `mc admin update` command
to work with fips compiled binaries, with fips tags
the releaseInfo will automatically point to fips
specific binaries.
2021-04-08 09:51:11 -07:00
Klaus Post
48c5e7e5b6
Add runtime mem stats to server info (#11995)
Adds information about runtime+gc memory use.
2021-04-07 10:40:51 -07:00
Harshavardhana
abb55bd49e
fix: properly close leaking bandwidth monitor channel (#11967)
This PR fixes

- close leaking bandwidth report channel leakage
- remove the closer requirement for bandwidth monitor
  instead if Read() fails remember the error and return
  error for all subsequent reads.
- use locking for usage-cache.bin updates, with inline
  data we cannot afford to have concurrent writes to
  usage-cache.bin corrupting xl.meta
2021-04-05 16:07:53 -07:00
Ritesh H Shukla
3ddd8b04d1
fix: handle unsupported APIs more granularly (#11674) 2021-03-30 23:19:36 -07:00
Anis Elleuch
d8b5adfd10
trace: Add storage & OS tracing (#11889) 2021-03-26 23:24:07 -07:00
Anis Elleuch
2c296652f7
Simplify access to local node name (#11907)
The local node name is heavily used in tracing, create a new global 
variable to store it. Multiple goroutines can access it since it won't be
changed later.
2021-03-26 11:37:58 -07:00
Klaus Post
749e9c5771
metrics: Add canceled requests (#11881)
Add metric for canceled requests
2021-03-24 10:25:27 -07:00
Anis Elleuch
0eb146e1b2
add additional metrics per disk API latency, API call counts #11250)
```
mc admin info --json
```

provides these details, for now, we shall eventually 
expose this at Prometheus level eventually. 

Co-authored-by: Harshavardhana <harsha@minio.io>
2021-03-16 20:06:57 -07:00
Klaus Post
fa9cf1251b
Imporve healing and reporting (#11312)
* Provide information on *actively* healing, buckets healed/queued, objects healed/failed.
* Add concurrent healing of multiple sets (typically on startup).
* Add bucket level resume, so restarts will only heal non-healed buckets.
* Print summary after healing a disk is done.
2021-03-04 14:36:23 -08:00
Anis Elleuch
7be7109471
locking: Add Refresh for better locking cleanup (#11535)
Co-authored-by: Anis Elleuch <anis@min.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-03-03 18:36:43 -08:00
Harshavardhana
c6a120df0e
fix: Prometheus metrics to re-use storage disks (#11647)
also re-use storage disks for all `mc admin server info`
calls as well, implement a new LocalStorageInfo() API
call at ObjectLayer to lookup local disks storageInfo

also fixes bugs where there were double calls to StorageInfo()
2021-03-02 17:28:04 -08:00
Shireesh Anjal
289b22d911
fix: pool number not added for one server (#11670)
The previous code was iterating over replies from peers and assigning
pool numbers to them, thus missing to add it for the local server.

Fixed by iterating over the server properties of all the servers
including the local one.
2021-03-01 08:09:43 -08:00
Anis Elleuch
98d3f94996
metrics: Add the number of requests in the waiting queue (#11580)
We can use this metric to check if there are too many S3 clients in the
queue and could explain why some of those S3 clients are timing out.

```
minio_s3_requests_waiting_total{server="127.0.0.1:9000"} 9981
```

If max_requests is 10000 then there is a strong possibility that clients
are timing out because of the queue deadline.
2021-02-20 00:21:55 -08:00
Shireesh Anjal
3afa499885
fix: empty buckets/objects nodes in new setup (#11493) 2021-02-09 09:52:38 -08:00
Andreas Auernhammer
33554651e9
crypto: deprecate native Hashicorp Vault support (#11352)
This commit deprecates the native Hashicorp Vault
support and removes the legacy Vault documentation.

The native Hashicorp Vault documentation is marked as
outdated and deprecated for over a year now. We give
another 6 months before we start removing Hashicorp Vault
support and show a deprecation warning when a MinIO server
starts with a native Vault configuration.
2021-01-29 17:55:37 -08:00
Anis Elleuch
00cff1aac5
audit: per object send pool number, set number and servers per operation (#11233) 2021-01-26 13:21:51 -08:00
Harshavardhana
9cdd981ce7
fix: expire locks only on participating lockers (#11335)
additionally also add a new ForceUnlock API, to
allow forcibly unlocking locks if possible.
2021-01-25 10:01:27 -08:00
Harshavardhana
a6c146bd00
validate storage class across pools when setting config (#11320)
```
mc admin config set alias/ storage_class standard=EC:3
```

should only succeed if parity ratio is valid for all
server pools, if not we should fail proactively.

This PR also needs to bring other changes now that
we need to cater for variadic drive counts per pool.

Bonus fixes also various bugs reproduced with

- GetObjectWithPartNumber()
- CopyObjectPartWithOffsets()
- CopyObjectWithMetadata()
- PutObjectPart,PutObject with truncated streams
2021-01-22 12:09:24 -08:00
Anis Elleuch
6f781c5e7a
heal: Reduce whitespace ticker to 5 seconds (#11234)
30 seconds white spaces is long for some setups which time out when no
read activity in short time, reduce the subnet health white space ticker
to 5 seconds, since it has no cost at all.
2021-01-06 13:29:50 -08:00
Harshavardhana
e7ae49f9c9
fix: calculate prometheus disks_offline/disks_total correctly (#11215)
fixes #11196
2021-01-04 09:42:09 -08:00
Anis Elleuch
2ecaab55a6
admin: ServerInfo returns info without object layer initialized (#11142) 2020-12-21 09:35:19 -08:00
Anis Elleuch
e63a10e505
Profiling does not required object layer to be initialized (#11133) 2020-12-18 11:51:15 -08:00
Harshavardhana
4550ac6fff
fix: refactor locks to apply them uniquely per node (#11052)
This refactor is done for few reasons below

- to avoid deadlocks in scenarios when number
  of nodes are smaller < actual erasure stripe
  count where in N participating local lockers
  can lead to deadlocks across systems.

- avoids expiry routines to run 1000 of separate
  network operations and routes per disk where
  as each of them are still accessing one single
  local entity.

- it is ideal to have since globalLockServer
  per instance.

- In a 32node deployment however, each server
  group is still concentrated towards the
  same set of lockers that partipicate during
  the write/read phase, unlike previous minio/dsync
  implementation - this potentially avoids send
  32 requests instead we will still send at max
  requests of unique nodes participating in a
  write/read phase.

- reduces overall chattiness on smaller setups.
2020-12-10 07:28:37 -08:00
Klaus Post
a896125490
Add crawler delay config + dynamic config values (#11018) 2020-12-04 09:32:35 -08:00
Ritesh H Shukla
7e2b79984e
Stream bucket bandwidth measurements (#11014) 2020-12-03 11:34:42 -08:00
Harshavardhana
86409fa93d
add audit/admin trace support for browser requests (#10947)
To support this functionality we had to fork
the gorilla/rpc package with relevant changes
2020-11-20 22:52:17 -08:00
Shireesh Anjal
7bc47a14cc
Rename OBD to Health (#10842)
Also, Remove thread stats and openfds from the health report 
as we already have process stats and numfds
2020-11-20 12:52:53 -08:00
Harshavardhana
0bcb1b679d
fix: disallow update if dates are same (#10890)
fixes #10889
2020-11-12 14:18:59 -08:00
Harshavardhana
cbdab62c1e
fix: heal user/metadata right away upon server startup (#10863)
this is needed such that we make sure to heal the
users, policies and bucket metadata right away as
we do listing based on list cache which only lists
'3' sufficiently good drives, to avoid possibly
losing access to these users upon upgrade make
sure to heal them.
2020-11-10 09:02:06 -08:00
Klaus Post
2294e53a0b
Don't retain context in locker (#10515)
Use the context for internal timeouts, but disconnect it from outgoing 
calls so we always receive the results and cancel it remotely.
2020-11-04 08:25:42 -08:00
Klaus Post
a982baff27
ListObjects Metadata Caching (#10648)
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c

Gist of improvements:

* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no 
  longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
2020-10-28 09:18:35 -07:00
Shireesh Anjal
858e2a43df
Remove logging info from OBDInfoHandler (#10727)
A lot of logging data is counterproductive. A better implementation with
precise useful log data can be introduced later.
2020-10-27 17:41:48 -07:00
Harshavardhana
646d6917ed
turn-off checking for updates completely if MINIO_UPDATE=off (#10752) 2020-10-24 22:39:44 -07:00
Harshavardhana
d9db7f3308
expire lockers if lockers are offline (#10749)
lockers currently might leave stale lockers,
in unknown ways waiting for downed lockers.

locker check interval is high enough to safely
cleanup stale locks.
2020-10-24 13:23:16 -07:00
Ritesh H Shukla
8ceb2a93fd
fix: peer replication bandwidth monitoring in distributed setup (#10652) 2020-10-12 09:04:55 -07:00
Ritesh H Shukla
c2f16ee846
Add basic bandwidth monitoring for replication. (#10501)
This change tracks bandwidth for a bucket and object

- [x] Add Admin API
- [x] Add Peer API
- [x] Add BW throttling
- [x] Admin APIs to set replication limit
- [x] Admin APIs for fetch bandwidth
2020-10-09 20:36:00 -07:00
Harshavardhana
a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Harshavardhana
736e58dd68
fix: handle concurrent lockers with multiple optimizations (#10640)
- select lockers which are non-local and online to have
  affinity towards remote servers for lock contention

- optimize lock retry interval to avoid sending too many
  messages during lock contention, reduces average CPU
  usage as well

- if bucket is not set, when deleteObject fails make sure
  setPutObjHeaders() honors lifecycle only if bucket name
  is set.

- fix top locks to list out always the oldest lockers always,
  avoid getting bogged down into map's unordered nature.
2020-10-08 12:32:32 -07:00
Harshavardhana
c6a9a94f94
fix: optimize ServerInfo() handler to avoid reading config (#10626)
fixes #10620
2020-10-02 16:19:44 -07:00
Harshavardhana
66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
add a hint on the disk to allow for tracking fresh disk
being healed, to allow for restartable heals, and also
use this as a way to track and remove disks.

There are more pending changes where we should move
all the disk formatting logic to backend drives, this
PR doesn't deal with this refactor instead makes it
easier to track healing in the future.
2020-09-28 19:39:32 -07:00
Harshavardhana
eafa775952
fix: add lock ownership to expire locks (#10571)
- Add owner information for expiry, locking, unlocking a resource
- TopLocks returns now locks in quorum by default, provides
  a way to capture stale locks as well with `?stale=true`
- Simplify the quorum handling for locks to avoid from storage
  class, because there were challenges to make it consistent
  across all situations.
- And other tiny simplifications to reset locks.
2020-09-25 19:21:52 -07:00
飞雪无情
d778d034e7
Remove redundant mgmtQueryKey type. (#10557)
Remove redundant type conversion.
2020-09-24 08:40:21 -07:00
Anis Elleuch
8ea55f9dba
obd: Add console log to OBD output (#10372) 2020-09-15 18:02:54 -07:00
Harshavardhana
c13afd56e8
Remove MaxConnsPerHost settings to avoid potential hangs (#10438)
MaxConnsPerHost can potentially hang a call without any
way to timeout, we do not need this setting for our proxy
and gateway implementations instead IdleConn settings are
good enough.

Also ensure to use NewRequestWithContext and make sure to
take the disks offline only for network errors.

Fixes #10304
2020-09-08 14:22:04 -07:00
Andreas Auernhammer
fbd1c5f51a
certs: refactor cert manager to support multiple certificates (#10207)
This commit refactors the certificate management implementation
in the `certs` package such that multiple certificates can be
specified at the same time. Therefore, the following layout of
the `certs/` directory is expected:
```
certs/
 │
 ├─ public.crt
 ├─ private.key
 ├─ CAs/          // CAs directory is ignored
 │   │
 │    ...
 │
 ├─ example.com/
 │   │
 │   ├─ public.crt
 │   └─ private.key
 └─ foobar.org/
     │
     ├─ public.crt
     └─ private.key
   ...
```

However, directory names like `example.com` are just for human
readability/organization and don't have any meaning w.r.t whether
a particular certificate is served or not. This decision is made based
on the SNI sent by the client and the SAN of the certificate.

***

The `Manager` will pick a certificate based on the client trying
to establish a TLS connection. In particular, it looks at the client
hello (i.e. SNI) to determine which host the client tries to access.
If the manager can find a certificate that matches the SNI it
returns this certificate to the client.

However, the client may choose to not send an SNI or tries to access
a server directly via IP (`https://<ip>:<port>`). In this case, we
cannot use the SNI to determine which certificate to serve. However,
we also should not pick "the first" certificate that would be accepted
by the client (based on crypto. parameters - like a signature algorithm)
because it may be an internal certificate that contains internal hostnames. 
We would disclose internal infrastructure details doing so.

Therefore, the `Manager` returns the "default" certificate when the
client does not specify an SNI. The default certificate the top-level
`public.crt` - i.e. `certs/public.crt`.

This approach has some consequences:
 - It's the operator's responsibility to ensure that the top-level
   `public.crt` does not disclose any information (i.e. hostnames)
   that are not publicly visible. However, this was the case in the
   past already.
 - Any other `public.crt` - except for the top-level one - must not
   contain any IP SAN. The reason for this restriction is that the
   Manager cannot match a SNI to an IP b/c the SNI is the server host
   name. The entire purpose of SNI is to indicate which host the client
   tries to connect to when multiple hosts run on the same IP. So, a
   client will not set the SNI to an IP.
   If we would allow IP SANs in a lower-level `public.crt` a user would
   expect that it is possible to connect to MinIO directly via IP address
   and that the MinIO server would pick "the right" certificate. However,
   the MinIO server cannot determine which certificate to serve, and
   therefore always picks the "default" one. This may lead to all sorts
   of confusing errors like:
   "It works if I use `https:instance.minio.local` but not when I use
   `https://10.0.2.1`.

These consequences/limitations should be pointed out / explained in our
docs in an appropriate way. However, the support for multiple
certificates should not have any impact on how deployment with a single
certificate function today.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-09-03 23:33:37 -07:00
Harshavardhana
8a291e1dc0
Cluster healthcheck improvements (#10408)
- do not fail the healthcheck if heal status
  was not obtained from one of the nodes,
  if many nodes fail then report this as a
  catastrophic error.
- add "x-minio-write-quorum" value to match
  the write tolerance supported by server.
- admin info now states if a drive is healing
  where madmin.Disk.Healing is set to true
  and madmin.Disk.State is "ok"
2020-09-02 22:54:56 -07:00
Harshavardhana
2acb530ccd
update rulesguard with new rules (#10392)
Co-authored-by: Nitish Tiwari <nitish@minio.io>
Co-authored-by: Praveen raj Mani <praveen@minio.io>
2020-09-01 16:58:13 -07:00
Andreas Auernhammer
18725679c4
crypto: allow multiple KES endpoints (#10383)
This commit addresses a maintenance / automation problem when MinIO-KES
is deployed on bare-metal. In orchestrated env. the orchestrator (K8S)
will make sure that `n` KES servers (IPs) are available via the same DNS
name. There it is sufficient to provide just one endpoint.
2020-08-31 18:10:52 -07:00
Klaus Post
1b119557c2
getDisksInfo: Attribute failed disks to correct endpoint (#10360)
If DiskInfo calls failed the information returned was used anyway 
resulting in no endpoint being set.

This would make the drive be attributed to the local system since 
`disk.Endpoint == disk.DrivePath` in that case.

Instead, if the call fails record the endpoint and the error only.
2020-08-26 10:11:26 -07:00
Harshavardhana
e57c742674
use single dynamic timeout for most locked API/heal ops (#10275)
newDynamicTimeout should be allocated once, in-case
of temporary locks in config and IAM we should
have allocated timeout once before the `for loop`

This PR doesn't fix any issue as such, but provides
enough dynamism for the timeout as per expectation.
2020-08-17 11:29:58 -07:00
Harshavardhana
2a9819aff8
fix: refactor background heal for cluster health (#10225) 2020-08-07 19:43:06 -07:00
Harshavardhana
6c6137b2e7
add cluster maintenance healthcheck drive heal affinity (#10218) 2020-08-07 13:22:53 -07:00
Harshavardhana
0b8255529a
fix: proxies set keep-alive timeouts to be system dependent (#10199)
Split the DialContext's one for internode and another
for all other external communications especially
proxy forwarders, gateway transport etc.
2020-08-04 14:55:53 -07:00
Harshavardhana
25a55bae6f
fix: avoid buffering of server sent events by proxies (#10164) 2020-07-30 19:45:12 -07:00
Harshavardhana
3a73f1ead5
refactor server update behavior (#10107) 2020-07-23 08:03:31 -07:00
Harshavardhana
e7d7d5232c
fix: admin info output and improve overall performance (#10015)
- admin info node offline check is now quicker
- admin info now doesn't duplicate the code
  across doing the same checks for disks
- rely on StorageInfo to return appropriate errors
  instead of calling locally.
- diskID checks now return proper errors when
  disk not found v/s format.json missing.
- add more disk states for more clarity on the
  underlying disk errors.
2020-07-13 09:51:07 -07:00
Andreas Auernhammer
a317a2531c
admin: new API for creating KMS master keys (#9982)
This commit adds a new admin API for creating master keys.
An admin client can send a POST request to:
```
/minio/admin/v3/kms/key/create?key-id=<keyID>
```

The name / ID of the new key is specified as request
query parameter `key-id=<ID>`.

Creating new master keys requires KES - it does not work with
the native Vault KMS (deprecated) nor with a static master key
(deprecated).

Further, this commit removes the `UpdateKey` method from the `KMS`
interface. This method is not needed and not used anymore.
2020-07-08 18:50:43 -07:00
Harshavardhana
2743d4ca87
fix: Add support for preserving mtime for replication (#9995)
This PR is needed for bucket replication support
2020-07-08 17:36:56 -07:00
Harshavardhana
cdb0e6ffed
support proper values for listMultipartUploads/listParts (#9970)
object KMS is configured with auto-encryption,
there were issues when using docker registry -
this has been left unnoticed for a while.

This PR fixes an issue with compatibility.

Additionally also fix the continuation-token implementation
infinite loop issue which was missed as part of #9939

Also fix the heal token to be generated as a client
facing value instead of what is remembered by the
server, this allows for the server to be stateless 
regarding the token's behavior.
2020-07-03 19:27:13 -07:00
Anis Elleuch
2be20588bf
Reroute requests based token heal/listing (#9939)
When manual healing is triggered, one node in a cluster will 
become the authority to heal. mc regularly sends new requests 
to fetch the status of the ongoing healing process, but a load 
balancer could land the healing request to a node that is not 
doing the healing request.

This PR will redirect a request to the node based on the node 
index found described as part of the client token. A similar
technique is also used to proxy ListObjectsV2 requests
by encoding this information in continuation-token
2020-07-03 11:53:03 -07:00
Harshavardhana
a38ce29137
fix: simplify background heal and trigger heal items early (#9928)
Bonus fix during versioning merge one of the PR was missing
the offline/online disk count fix from #9801 port it correctly
over to the master branch from release.

Additionally, add versionID support for MRF

Fixes #9910
Fixes #9931
2020-06-29 13:07:26 -07:00
Harshavardhana
b8cb21c954
allow more than N number of locks in TopLocks (#9883) 2020-06-20 06:33:01 -07:00
Harshavardhana
4915433bd2
Support bucket versioning (#9377)
- Implement a new xl.json 2.0.0 format to support,
  this moves the entire marshaling logic to POSIX
  layer, top layer always consumes a common FileInfo
  construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
  for object placements.

Fixes #2111
2020-06-12 20:04:01 -07:00
Harshavardhana
96ed0991b5
fix: optimize IAM users load, add fallback (#9809)
Bonus fix, load service accounts properly
when service accounts were generated with
LDAP
2020-06-11 14:11:30 -07:00
Harshavardhana
b2db8123ec
Preserve errors returned by diskInfo to detect disk errors (#9727)
This PR basically reverts #9720 and re-implements it differently
2020-05-28 13:03:04 -07:00
Harshavardhana
53aaa5d2a5
Export bucket usage counts as part of bucket metrics (#9710)
Bonus fixes in quota enforcement to use the
new datastructure and use timedValue to cache
a value/reload automatically avoids one less
global variable.
2020-05-27 06:45:43 -07:00
Sidhartha Mani
c121d27f31
progressively report obd results (#9639) 2020-05-22 17:56:45 -07:00
Harshavardhana
bd032d13ff
migrate all bucket metadata into a single file (#9586)
this is a major overhaul by migrating off all
bucket metadata related configs into a single
object '.metadata.bin' this allows us for faster
bootups across 1000's of buckets and as well
as keeps the code simple enough for future
work and additions.

Additionally also fixes #9396, #9394
2020-05-19 13:53:54 -07:00
Harshavardhana
1bc32215b9
enable full linter across the codebase (#9620)
enable linter using golangci-lint across
codebase to run a bunch of linters together,
we shall enable new linters as we fix more
things the codebase.

This PR fixes the first stage of this
cleanup.
2020-05-18 09:59:45 -07:00
Harshavardhana
814ddc0923
add missing admin actions, enhance AccountUsageInfo (#9607) 2020-05-15 18:16:45 -07:00
Harshavardhana
6ac48a65cb
fix: use unused cacheMetrics code in prometheus (#9588)
remove all other unusued/deadcode
2020-05-13 08:15:26 -07:00
Harshavardhana
337c2a7cb4
add audit logging for all admin calls (#9568)
- add ServiceRestart/ServiceStop actions
- audit log appropriately in all admin handlers

fixes #9522
2020-05-11 10:34:08 -07:00
Harshavardhana
27d716c663
simplify usage of mutexes and atomic constants (#9501) 2020-05-03 22:35:40 -07:00
Harshavardhana
7a5271ad96
fix: re-use connections in webhook/elasticsearch (#9461)
- elasticsearch client should rely on the SDK helpers
  instead of pure HTTP calls.
- webhook shouldn't need to check for IsActive() for
  all notifications, failure should be delayed.
- Remove DialHTTP as its never used properly

Fixes #9460
2020-04-28 13:57:56 -07:00
Praveen raj Mani
322385f1b6
fix: only show active/available ARNs in server startup banner (#9392) 2020-04-21 09:38:32 -07:00
Klaus Post
c4464e36c8
fix: limit HTTP transport tuables to affordable values (#9383)
Close connections pro-actively in transient calls
2020-04-17 11:20:56 -07:00
Harshavardhana
69fb68ef0b
fix simplify code to start using context (#9350) 2020-04-16 10:56:18 -07:00
Sidhartha Mani
ec11e99667
implement configurable timeout for OBD tests (#9324) 2020-04-14 11:48:32 -07:00
Harshavardhana
37d066b563
fix: deprecate requirement of session token for service accounts (#9320)
This PR fixes couple of behaviors with service accounts

- not need to have session token for service accounts
- service accounts can be generated by any user for themselves
  implicitly, with a valid signature.
- policy input for AddNewServiceAccount API is not fully typed
  allowing for validation before it is sent to the server.
- also bring in additional context for admin API errors if any
  when replying back to client.
- deprecate GetServiceAccount API as we do not need to reply
  back session tokens
2020-04-14 11:28:56 -07:00
Praveen raj Mani
bfec5fe200
fix: fetchLambdaInfo should return consistent results (#9332)
- Introduced a function `FetchRegisteredTargets` which will return
  a complete set of registered targets irrespective to their states,
  if the `returnOnTargetError` flag is set to `False`
- Refactor NewTarget functions to return non-nil targets
- Refactor GetARNList() to return a complete list of configured targets
2020-04-14 11:19:25 -07:00
Harshavardhana
4314ee1670
fix: remove unusued PerfInfoHandler code (#9328)
- Removes PerfInfo admin API as its not OBDInfo
- Keep the drive path without the metaBucket in OBD
  global latency map.
- Remove all the unused code related to PerfInfo API
- Do not redefined global mib,gib constants use
  humanize.MiByte and humanize.GiByte instead always
2020-04-12 19:37:09 -07:00
Harshavardhana
f44cfb2863
use GlobalContext whenever possible (#9280)
This change is throughout the codebase to
ensure that all codepaths honor GlobalContext
2020-04-09 09:30:02 -07:00
Harshavardhana
2642e12d14
fix: change policies API to return and take struct (#9181)
This allows for order guarantees in returned values
can be consumed safely by the caller to avoid any
additional parsing and validation.

Fixes #9171
2020-04-07 19:30:59 -07:00
Sidhartha Mani
c8243706b4
Add Parallel NetOBD tests to saturate all nodes at once (#9241) 2020-03-31 17:08:28 -07:00
Sidhartha Mani
7b732b566f
[Bugfix] Fix Net tests being omitted (#9234) 2020-03-31 01:15:21 -07:00
Sidhartha Mani
0c80bf45d0
Implement oboard diagnostics admin API (#9024)
- Implement a graph algorithm to test network bandwidth from every 
  node to every other node
- Saturate any network bandwidth adaptively, accounting for slow 
  and fast network capacity
- Implement parallel drive OBD tests
- Implement a paging mechanism for OBD test to provide periodic updates to client
- Implement Sys, Process, Host, Mem OBD Infos
2020-03-26 21:07:39 -07:00
Harshavardhana
3d3beb6a9d
Add response header timeouts (#9170)
- Add conservative timeouts upto 3 minutes
  for internode communication
- Add aggressive timeouts of 30 seconds
  for gateway communication

Fixes #9105
Fixes #8732
Fixes #8881
Fixes #8376
Fixes #9028
2020-03-21 22:10:13 -07:00
Harshavardhana
b4bfdc92cc fix: admin console logger changes to log.Info 2020-03-20 15:14:14 -07:00
Harshavardhana
ae654831aa
Add madmin package context support (#9172)
This is to improve responsiveness for all
admin API operations and allowing callers
to cancel any on-going admin operations,
if they happen to be waiting too long.
2020-03-20 15:00:44 -07:00
Harshavardhana
cfd12914e1
fix: crash in serverInfo handler when ldap is configured (#9123) 2020-03-11 23:13:32 -07:00
Anis Elleuch
fdf65aa9b9
heal: Add info about the next background healing round (#9122)
- avoid setting last heal activity when starting self-healing

This can be confusing to users thinking that the self healing
cycle was already performed.

- add info about the next background healing round
2020-03-11 23:00:31 -07:00
Nitish Tiwari
7c32f3f554
Fix the URL for MinIO update when using custom download server (#9111)
Co-authored-by: Nitish Tiwari <nitish@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-03-11 20:09:20 +05:30
Anis Elleuch
d4dcf1d722
metrics: Use StorageInfo() instead to have consistent info (#9006)
Metrics used to have its own code to calculate offline disks.
StorageInfo() was avoided because it is an expensive operation
by sending calls to all nodes.

To make metrics & server info share the same code, a new
argument `local` is added to StorageInfo() so it will only
query local disks when needed.

Metrics now calls StorageInfo() as server info handler does
but with the local flag set to false.

Co-authored-by: Praveen raj Mani <praveen@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-02-20 09:21:33 +05:30
Andreas Auernhammer
086fbb745e
fix and improve KMS server info (#8944)
This commit fixes typos in the displayed server info
w.r.t. the KMS and removes the update status.

For more information about why the update status
is removed see: PR #8943
2020-02-06 06:18:34 +05:30
Andreas Auernhammer
4f37c8ccf2
refine the KMS admin API (#8943)
This commit removes the `Update` functionality
from the admin API. While this is technically
a breaking change I think this will not cause
any harm because:
 - The KMS admin API is not complete, yet.
   At the moment only the status can be fetched.
 - The `mc` integration hasn't been merged yet.
   So no `mc` client could have used this API
   in the past.

The `Update`/`Rewrap` status is not useful anymore.
It provided a way to migrate from one master key version
to another. However, KES does not support the concept of
key versions. Instead, key migration should be implemented
as migration from one master key to another.

Basically, the `Update` functionality has been implemented just
for Vault.
2020-02-05 22:47:35 +05:30
Anis Elleuch
52bdbcd046
Add new admin API to return Accounting Usage (#8689) 2020-02-04 18:20:39 -08:00
Anis Elleuch
7432b5c9b2
Use user CAs in checkEndpoint() call (#8911)
The server info handler makes a http connection to other
nodes to check if they are up but does not load the custom
CAs in ~/.minio/certs/CAs.

This commit fix it.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-02-02 07:15:29 +05:30
poornas
5d838edcef
Fix panic in ServerInfoHandler when (#8915)
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-02-01 17:50:04 +05:30
poornas
2232e095d5 Make admin permissions more granular for admin handlers. (#8888) 2020-01-26 20:47:52 -06:00
Harshavardhana
f14f60a487 fix: Avoid double usage calculation on every restart (#8856)
On every restart of the server, usage was being
calculated which is not useful instead wait for
sufficient time to start the crawling routine.

This PR also avoids lots of double allocations
through strings, optimizes usage of string builders
and also avoids crawling through symbolic links.

Fixes #8844
2020-01-21 14:07:49 -08:00
Klaus Post
2bf6cf0e15 Enable multiple concurrent profile types (#8792) 2020-01-10 17:19:58 -08:00
Praveen raj Mani
4cd1bbb50a This PR fixes two things (#8772)
- Stop spawning store replay routines when testing the notification targets
- Properly honor the target.Close() to clean the resources used

Fixes #8707

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-01-09 19:45:44 +05:30
Harshavardhana
6695fd6a61
Add more context aware error for policy parsing errors (#8726)
In existing functionality we simply return a generic
error such as "MalformedPolicy" which indicates just
a generic string "invalid resource" which is not very
meaningful when there might be multiple types of errors
during policy parsing. This PR ensures that we send
these errors back to client to indicate the actual
error, brings in two concrete types such as

 - iampolicy.Error
 - policy.Error

Refer #8202
2020-01-03 11:28:52 -08:00
Ashish Kumar Sinha
abc266caa1 Add bucket and object count along with total object size (#8639) 2019-12-12 09:58:59 -08:00
Anis Elleuch
555969ee42 Add data usage collect with its new admin API (#8553)
Admin data usage info API returns the following

(Only FS & XL, for now)

- Number of buckets
- Number of objects
- The total size of objects
- Objects histogram
- Bucket sizes
2019-12-12 06:02:37 -08:00
Ashish Kumar Sinha
e2c5d29017 Bucket,Object count & Usage removed if set to default (#8638) 2019-12-11 21:56:47 -08:00
kannappanr
d266b3a066
Admin Info: Modify Uptime to return seconds (#8635) 2019-12-11 17:56:02 -08:00
Ashish Kumar Sinha
24fb1bf258 New Admin Info (#8497) 2019-12-11 14:27:03 -08:00
Nitish Tiwari
3df7285c3c Add Support for Cache and S3 related metrics in Prometheus endpoint (#8591)
This PR adds support below metrics

- Cache Hit Count
- Cache Miss Count
- Data served from Cache (in Bytes)
- Bytes received from AWS S3
- Bytes sent to AWS S3
- Number of requests sent to AWS S3

Fixes #8549
2019-12-05 23:16:06 -08:00
poornas
929951fd49 Add support for multiple admins (#8487)
Also define IAM policies for administering
MinIO server
2019-11-19 02:03:18 -08:00
Harshavardhana
e9b2bf00ad Support MinIO to be deployed on more than 32 nodes (#8492)
This PR implements locking from a global entity into
a more localized set level entity, allowing for locks
to be held only on the resources which are writing
to a collection of disks rather than a global level.

In this process this PR also removes the top-level
limit of 32 nodes to an unlimited number of nodes. This
is a precursor change before bring in bucket expansion.
2019-11-13 12:17:45 -08:00
Harshavardhana
822eb5ddc7 Bring in safe mode support (#8478)
This PR refactors object layer handling such
that upon failure in sub-system initialization
server reaches a stage of safe-mode operation
wherein only certain API operations are enabled
and available.

This allows for fixing many scenarios such as

 - incorrect configuration in vault, etcd,
   notification targets
 - missing files, incomplete config migrations
   unable to read encrypted content etc
 - any other issues related to notification,
   policies, lifecycle etc
2019-11-09 09:27:23 -08:00
Harshavardhana
4e63e0e372 Return appropriate errors API versions changes across REST APIs (#8480)
This PR adds code to appropriately handle versioning issues
that come up quite constantly across our API changes. Currently
we were also routing our requests wrong which sort of made it
harder to write a consistent error handling code to appropriately
reject or honor requests.

This PR potentially fixes issues

 - old mc is used against new minio release which is incompatible
   returns an appropriate for client action.
 - any older servers talking to each other, report appropriate error
 - incompatible peer servers should report error and reject the calls
   with appropriate error
2019-11-04 09:30:59 -08:00
Andreas Auernhammer
eac518b178 admin API: change returned HTTP error in hardware info (#8471)
This commit replaces the returned error message by
the hardware info handler from `Method-Not-Allowed`
to `Bad-Request` since the current HTTP error is not
correct according to the HTTP spec.

In particular:
```
The origin server MUST generate an Allow header field
in a 405 response containing a list of the target
resource's currently supported methods.
```
From: https://tools.ietf.org/html/rfc7231#section-6.5.5
2019-10-30 23:41:18 -07:00
Harshavardhana
9e7a3e6adc Extend further validation of config values (#8469)
- This PR allows config KVS to be validated properly
  without being affected by ENV overrides, rejects
  invalid values during set operation

- Expands unit tests and refactors the error handling
  for notification targets, returns error instead of
  ignoring targets for invalid KVS

- Does all the prep-work for implementing safe-mode
  style operation for MinIO server, introduces a new
  global variable to toggle safe mode based operations
  NOTE: this PR itself doesn't provide safe mode operations
2019-10-30 23:39:09 -07:00
Harshavardhana
47b13cdb80 Add etcd part of config support, add noColor/json support (#8439)
- Add color/json mode support for get/help commands
- Support ENV help for all sub-systems
- Add support for etcd as part of config
2019-10-30 00:04:39 -07:00
Harshavardhana
ee4a6a823d Migrate config to KV data format (#8392)
- adding oauth support to MinIO browser (#8400) by @kanagaraj
- supports multi-line get/set/del for all config fields
- add support for comments, allow toggle
- add extensive validation of config before saving
- support MinIO browser to support proper claims, using STS tokens
- env support for all config parameters, legacy envs are also
  supported with all documentation now pointing to latest ENVs
- preserve accessKey/secretKey from FS mode setups
- add history support implements three APIs
  - ClearHistory
  - RestoreHistory
  - ListHistory
- add help command support for each config parameters
- all the bug fixes after migration to KV, and other bug
  fixes encountered during testing.
2019-10-22 22:59:13 -07:00
Praveen raj Mani
8836d57e3c The prometheus metrics refractoring (#8003)
The measures are consolidated to the following metrics

- `disk_storage_used` : Disk space used by the disk.
- `disk_storage_available`: Available disk space left on the disk.
- `disk_storage_total`: Total disk space on the disk.
- `disks_offline`: Total number of offline disks in current MinIO instance.
- `disks_total`: Total number of disks in current MinIO instance.
- `s3_requests_total`: Total number of s3 requests in current MinIO instance.
- `s3_errors_total`: Total number of errors in s3 requests in current MinIO instance.
- `s3_requests_current`: Total number of active s3 requests in current MinIO instance.
- `internode_rx_bytes_total`: Total number of internode bytes received by current MinIO server instance.
- `internode_tx_bytes_total`: Total number of bytes sent to the other nodes by current MinIO server instance.
- `s3_rx_bytes_total`: Total number of s3 bytes received by current MinIO server instance.
- `s3_tx_bytes_total`: Total number of s3 bytes sent by current MinIO server instance.
- `minio_version_info`: Current MinIO version with commit-id.
- `s3_ttfb_seconds_bucket`: Histogram that holds the latency information of the requests.

And this PR also modifies the current StorageInfo queries

- Decouples StorageInfo from ServerInfo .
- StorageInfo is enhanced to give endpoint information.

NOTE: ADMIN API VERSION IS BUMPED UP IN THIS PR

Fixes #7873
2019-10-22 21:01:14 -07:00
Ashish Kumar Sinha
18cb15559d Add network hardware info (#8358)
peerRESTVersion changed to v6
2019-10-17 04:09:49 -07:00
Harshavardhana
d48fd6fde9
Remove unusued params and functions (#8399) 2019-10-15 18:35:41 -07:00
poornas
d7060c4c32 Allow logging targets to be configured to receive minio (#8347)
specific errors, `application` errors or `all` by default.

console logging on server by default lists all logs -
enhance admin console API to accept `type` as query parameter to
subscribe to application/minio logs.
2019-10-11 18:50:54 -07:00
Harshavardhana
290ad0996f Move etcd, logger, crypto into their own packages (#8366)
- Deprecates _MINIO_PROFILER, `mc admin profile` does the job
- Move ENVs to common location in cmd/config/
2019-10-08 11:17:56 +05:30
Ashish Kumar Sinha
74008446fe CPU hardware info (#8187) 2019-10-03 20:18:38 +05:30
Bala FA
2a2ff96ee1 change ReadPerf into ReadThroughput in NetPerfInfo. (#8316)
Previously `ReadPerf` was in time.Duration is changed to `ReadThroughput` in uint64.
2019-09-27 00:01:18 +05:30
Harshavardhana
fd53057654 Add InfoCannedPolicy API to fetch only necessary policy (#8307)
This PR adds
- InfoCannedPolicy() API for efficiency in fetching policies
- Send group memberships for LDAPUser if available
2019-09-26 23:53:13 +05:30
Harshavardhana
9ac12cf898
Remove unusued Set/GetConfigKeys API (#8235) 2019-09-13 16:34:34 -07:00
Krishnan Parthasarathi
6ba323b009 Add ability to test drive speeds on a MinIO setup (#7664)
- Extends existing Admin API to measure disk performance
2019-09-13 03:22:30 +05:30
Harshavardhana
73e4e99942 Hosts should be skipped, when calculating local info (#8191)
endpoint.IsLocal will not have .Host entries so
using them to skip double entries will never work.

change the code such that we look for endpoint.Host
outside of endpoint.IsLocal logic to skip double
hosts appropriately.

Move these functions to their appropriate file.
2019-09-12 23:36:12 +05:30
Aditya Manthramurthy
a0456ce940 LDAP STS API (#8091)
Add LDAP based users-groups system

This change adds support to integrate an LDAP server for user
authentication. This works via a custom STS API for LDAP. Each user
accessing the MinIO who can be authenticated via LDAP receives
temporary credentials to access the MinIO server.

LDAP is enabled only over TLS.

User groups are also supported via LDAP. The administrator may
configure an LDAP search query to find the group attribute of a user -
this may correspond to any attribute in the LDAP tree (that the user
has access to view). One or more groups may be returned by such a
query.

A group is mapped to an IAM policy in the usual way, and the server
enforces a policy corresponding to all the groups and the user's own
mapped policy.

When LDAP is configured, the internal MinIO users system is disabled.
2019-09-10 04:42:29 +05:30
Harshavardhana
b52a3e523c Avoid using fastjson parser pool, move back to jsoniter (#8190)
It looks like from implementation point of view fastjson
parser pool doesn't behave the same way as expected
when dealing many `xl.json` from multiple disks.

The fastjson parser pool usage ends up returning incorrect
xl.json entries for checksums, with references pointing
to older entries. This led to the subtle bug where checksum
info is duplicated from a previous xl.json read of a different
file from different disk.
2019-09-06 04:21:27 +05:30
Andreas Auernhammer
810a44e951 KMS Admin-API: add route and handler for KMS key info (#7955)
This commit adds an admin API route and handler for
requesting status information about a KMS key.

Therefore, the client specifies the KMS key ID (when
empty / not set the server takes the currently configured
default key-ID) and the server tries to perform a dummy encryption,
re-wrap and decryption operation. If all three succeed we know that
the server can access the KMS and has permissions to generate, re-wrap
and decrypt data keys (policy is set correctly).
2019-09-05 01:49:44 +05:30
poornas
8a71b0ec5a Add admin API to send console log messages (#7784)
Utilized by mc admin console command.
2019-09-03 23:40:48 +05:30
Bala FA
fa3546bb03 Add NetPerfInfo() API in madmin (#8112) 2019-08-31 08:27:53 +05:30
Aditya Manthramurthy
847a3ea0a2 Add unit tests and refactor to improve coverage (#7617) 2019-08-29 13:53:27 -07:00
Harshavardhana
83d4c5763c
Decouple ServiceUpdate to ServerUpdate to be more native (#8138)
The change now is to ensure that we take custom URL as
well for updating the deployment, this is required for
hotfix deliveries for certain deployments - other than
the community release.

This commit changes the previous work d65a2c6725
with newer set of requirements.

Also deprecates PeerUptime()
2019-08-28 15:04:43 -07:00
Harshavardhana
d65a2c6725
Implement cluster-wide in-place updates (#8070)
This PR is a breaking change and also deprecates
`minio update` command, from this release onwards
all users are advised to just use `mc admin update`
2019-08-27 11:37:47 -07:00
Bala FA
60f52f461f add network read performance collection support. (#8038)
ReST API on /minio/admin/v1/performance?perfType=net[?size=N] 
returns

```
{
  "PEER-1": [
             {
	       "addr": ADDR,
	       "readPerf": DURATION,
	       "error": ERROR,
	     },
	     ...
	   ],
  ...
  ...
  "PEER-N": [
             {
	       "addr": ADDR,
	       "readPerf": DURATION,
	       "error": ERROR,
	     },
	     ...
	   ]
}
```
2019-08-19 08:26:32 +05:30
Aditya Manthramurthy
bf9b619d86 Set the policy mapping for a user or group (#8036)
Add API to set policy mapping for a user or group

Contains a breaking Admin APIs change.

- Also enforce all applicable policies
- Removes the previous /set-user-policy API

 Bump up peerRESTVersion

Add get user info API to show groups of a user
2019-08-13 13:41:06 -07:00
Harshavardhana
e6d8e272ce
Use const slashSeparator instead of "/" everywhere (#8028) 2019-08-06 12:08:58 -07:00
Aditya Manthramurthy
414a7eca83 Add IAM groups support (#7981)
This change adds admin APIs and IAM subsystem APIs to:

- add or remove members to a group (group addition and deletion is
  implicit on add and remove)

- enable/disable a group

- list and fetch group info
2019-08-02 14:25:00 -07:00
Harshavardhana
123cccaed1 Honor connection pooling while tracing (#7979)
This PR fixes relying on r.Context().Done()
by setting

```
Connection: "close"
```

HTTP Header, this has detrimental issues for
client side connection pooling. Since this
header explicitly tells clients to turn-off
connection pooling. This causing pro-active
connections to be closed leaving many conn's
in TIME_WAIT state. This can be observed with
`mc admin trace -a` when running distributed
setup.

This PR also fixes tracing filtering issue
when bucket names have `minio` as prefixes,
trace was erroneously ignoring them.
2019-07-31 11:08:39 -07:00
Aditya Manthramurthy
7bdaf9bc50 Update on-disk storage format for users system (#7949)
- Policy mapping is now at `config/iam/policydb/users/myuser1.json`
  and includes version.

- User identity file is now versioned.

- Migrate old data to the new format.
2019-07-24 17:34:23 -07:00
poornas
0373a1699b Add error filter to admin trace API (#7923)
This allows MinIO to have the ability to send back only error trace
2019-07-20 01:38:26 +01:00
Krishnan Parthasarathi
fbfc9a61ec Add node address information to logs (#7941) 2019-07-18 09:58:37 -07:00
Harshavardhana
5c0acbc6fc
Add text/event-stream for long running http connections (#7909)
When MinIO is behind a proxy, proxies end up killing
clients when no data is seen on the connection, adding
the right content-type ensures that proxies do not come
in the way.
2019-07-11 13:19:25 -07:00
poornas
0505ef83b5 Fix host address returned in admin API calls (#7846) 2019-07-05 20:41:35 -07:00
Harshavardhana
c43f745449
Ensure that we use constants everywhere (#7845)
This allows for canonicalization of the strings
throughout our code and provides a common space
for all these constants to reside.

This list is rather non-exhaustive but captures
all the headers used in AWS S3 API operations
2019-07-02 22:34:32 -07:00
kannappanr
70b350c383
Remove DeploymentID from response headers (#7815)
Response headers need not contain deployment ID.
2019-07-01 12:22:01 -07:00
Krishna Srinivas
183ec094c4 Simplify HTTP trace related code (#7833) 2019-06-26 22:41:12 -07:00
Anis Elleuch
48f2c98052 admin: Add Background heal status info API (#7774)
This API returns the information related to the self healing routine.

For the moment, it returns:
- The total number of objects that are scanned
- The last time when an item was scanned
2019-06-25 16:42:24 -07:00
Harshavardhana
90ca73af13 Allow trace even if server is not initialized (#7822) 2019-06-21 16:47:51 -07:00
Harshavardhana
cd7d5b59e5
Add DeleteUser() to generate events in etcd (#7804)
Fixes a regression introduced in 6d89435356

Fixes #7797
2019-06-18 15:44:23 -07:00
poornas
97090aa16c Add admin API to send trace notifications to registered (#7128)
Remove current functionality to log trace to file
using MINIO_HTTP_TRACE env, and replace it with
mc admin trace command on mc client.
2019-06-08 15:54:41 -07:00
Harshavardhana
6d89435356 Reload a specific user or policy on peers (#7705)
Fixes #7587
2019-06-06 17:46:22 -07:00
Harshavardhana
ae002aa724 Deprecate updating admin credentials using API calls (#7570)
Root credentials are not allowed to change in all of our
distributed setup deployments, this PR simply removes
that behavior.
2019-04-24 12:54:44 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Harshavardhana
c90999df98 Valid if bucket names are internal (#7476)
This commit fixes a privilege escalation issue against
the S3 and web handlers. An authenticated IAM user
can:

- Read from or write to the internal '.minio.sys'
bucket by simply sending a properly signed
S3 GET or PUT request. Further, the user can
- Read from or write to the internal '.minio.sys'
bucket using the 'Upload'/'Download'/'DownloadZIP'
API by sending a "browser" request authenticated
with its JWT token.
2019-04-03 23:10:37 -07:00
kannappanr
87cf51d5ab
unused code: Remove LoadCredentials function (#7369)
It is required to set the environment variable in the case of distributed
minio. LoadCredentials is used to notify peers of the change and will not work if
environment variable is set. so, this function will never be called.
2019-03-20 18:09:57 -07:00
Krishnan Parthasarathi
8a77a298f2 Add deploymentID to ServerInfo (#7372) 2019-03-19 16:12:24 +05:30
Kirill Motkov
3d29ab4059 Rewrite if-else chains to switch statements (#7382) 2019-03-18 07:46:20 -07:00
Harshavardhana
a51781e5cf Use context to fill in more details about error XML (#7232) 2019-02-13 16:07:21 -08:00
Harshavardhana
df35d7db9d Introduce staticcheck for stricter builds (#7035) 2019-02-13 18:29:36 +05:30
Harshavardhana
fef5416b3c Support unknown gateway errors and convert at handler layer (#7219)
Different gateway implementations due to different backend
API errors, might return different unsupported errors at
our handler layer. Current code posed a problem for us because
this information was lost and we would convert it to InternalError
in this situation all S3 clients end up retrying the request.

To avoid this unexpected situation implement a way to support
this cleanly such that the underlying information is not lost
which is returned by gateway.
2019-02-12 14:55:52 +05:30
Praveen raj Mani
8af1f0cc7b Improved error message for user and access key conflict (#7190) 2019-02-07 17:25:58 -08:00
Sidhartha Mani
34e7259f95 Add Historic CPU and memory stats (#7136)
Collect historic cpu and mem stats.  Also, use actual values 
instead of formatted strings while returning to the client. The string 
formatting prevents values from being processed by the server or 
by the client without parsing it. 

This change will allow the values to be processed (eg. 
compute rolling-average over the lifetime of the minio server)
and offloads the formatting to the client.
2019-01-30 12:47:32 +05:30
kannappanr
ce870466ff
Top Locks command implementation (#7052)
API to list locks used in distributed XL mode
2019-01-24 07:22:14 -08:00
kannappanr
e0d22359e7
Fix lint warnings (#7099) 2019-01-16 12:49:20 -08:00
Harshavardhana
8757c963ba
Migrate all Peer communication to common Notification subsystem (#7031)
Deprecate the use of Admin Peers concept and migrate all peer
communication to Notification subsystem. This finally allows
for a common subsystem for all peer notification in case of
distributed server deployments.
2019-01-14 12:14:20 +05:30
Sidhartha Mani
f3f47d8cd3 Add ServerCPULoadInfo() and ServerMemUsageInfo() admin API (#7038) 2019-01-09 19:04:19 -08:00
Nitish Tiwari
fcb56d864c Add ServerDrivesPerfInfo() admin API (#6969)
This is part of implementation for mc admin health command. The
ServerDrivesPerfInfo() admin API returns read and write speed
information for all the drives (local and remote) in a given Minio
server deployment.

Part of minio/mc#2606
2018-12-31 09:46:44 -08:00
Harshavardhana
4f31a9a33b Reload users upon AddUser on peers (#6975)
Also migrate ReloadFormat to notification subsystem,
remove GetConfig() we do not use this API anymore
2018-12-18 14:39:21 -08:00
Harshavardhana
dba61867e8 Redirect browser requests returning AccessDenied (#6848)
Anonymous requests from S3 resources returning
AccessDenied should be auto redirected to browser
for login.
2018-11-26 12:15:12 -08:00
Harshavardhana
a55a298e00 Make sure to log unhandled errors always (#6784)
In many situations, while testing we encounter
ErrInternalError, to reduce logging we have
removed logging from quite a few places which
is acceptable but when ErrInternalError occurs
we should have a facility to log the corresponding
error, this helps to debug Minio server.
2018-11-12 11:07:43 -08:00
Harshavardhana
a9cda850ca Add forceStop flag to provide facility to stop healing (#6718)
This PR also makes sure that we deal with HTTP request
count by ignoring the on-going heal operation, i.e
do not wait on itself.
2018-11-04 19:24:16 -08:00
Praveen raj Mani
acf46cc3b5 SetConfigHandler should avoid setting an invalid notification config (#6679)
Fixes #6642 
Fixes #6641
2018-10-22 11:51:26 -07:00
Harshavardhana
b251454dd6 Fix toggling users status (#6640) 2018-10-16 14:55:23 -07:00
Harshavardhana
1e7e5e297c
Add canned policy support (#6637)
This PR adds an additional API where we can create
a new set of canned policies which can be used with one
or many users.
2018-10-16 12:48:19 -07:00
Harshavardhana
3ef3fefd54 Add ListUsers API to list all configured users in IAM (#6619) 2018-10-13 12:48:43 +05:30
Harshavardhana
54ae364def Introduce STS client grants API and OPA policy integration (#6168)
This PR introduces two new features

- AWS STS compatible STS API named AssumeRoleWithClientGrants

```
POST /?Action=AssumeRoleWithClientGrants&Token=<jwt>
```

This API endpoint returns temporary access credentials, access
tokens signature types supported by this API

  - RSA keys
  - ECDSA keys

Fetches the required public key from the JWKS endpoints, provides
them as rsa or ecdsa public keys.

- External policy engine support, in this case OPA policy engine

- Credentials are stored on disks
2018-10-09 14:00:01 -07:00
Pontus Leitzler
274b35154c Fix go1.11 vet shadow errors (#6555) 2018-10-02 15:21:34 -07:00
Anis Elleuch
83d7ec09c1 Disable restarting server after setting a new config (#6521)
Also disable listening to service restart event in tests since
we don't do this anymore.
2018-09-28 12:10:51 -07:00
Anis Elleuch
6c7c6bec91 profiler: Download API returns error when all nodes fail (#6525)
When download profiling data API fails to gather profiling data
from all nodes for any reason (including profiler not enabled),
return 400 http code with the appropriate json message.
2018-09-27 10:34:37 -07:00
Harshavardhana
1111419d4a Add debugging for mutex, tracing (#6522) 2018-09-27 09:32:05 +05:30
Anis Elleuch
9531cddb06 Add Profiler Admin API (#6463)
Two handlers are added to admin API to enable profiling and disable
profiling of a server in a standalone mode, or all nodes in the
distributed mode.

/minio/admin/profiling/start/{cpu,block,mem}:
  - Start profiling and return starting JSON results, e.g. one
    node is offline.

/minio/admin/profiling/download:
  - Stop the on-going profiling task
  - Stream a zip file which contains all profiling files that can
    be later inspected by go tool pprof
2018-09-18 16:46:35 -07:00
Anis Elleuch
3099af70a3 Add admin get/set config keys API (#6113)
This PR adds two new admin APIs in Minio server and madmin package:
- GetConfigKeys(keys []string) ([]byte, error)
- SetConfigKeys(params map[string]string) (err error)

A key is a path in Minio configuration file, (e.g. notify.webhook.1)

The user will always send a string value when setting it in the config file,
the API will know how to convert the value to the appropriate type. The user
is also able to set a raw json.

Before setting a new config, Minio will validate all fields and try to connect
to notification targets if available.
2018-09-06 20:33:18 +05:30
Harshavardhana
9f14433cbd Ensure that setConfig uses latest functionality (#6302) 2018-08-17 18:51:34 -07:00
Harshavardhana
0e02328c98 Migrate config.json from config-dir to backend (#6195)
This PR is the first set of changes to move the config
to the backend, the changes use the existing `config.json`
allows it to be migrated such that we can save it in on
backend disks.

In future releases, we will slowly migrate out of the
current architecture.

Fixes #6182
2018-08-15 10:11:47 +05:30
Oleg Kovalov
37de2dbd3b simplifying if-else chains to switches (#6208) 2018-08-06 10:26:40 -07:00
Harshavardhana
556a51120c Deprecate ListLocks and ClearLocks (#6233)
No locks are ever left in memory, we also
have a periodic interval of clearing stale locks
anyways. The lock instrumentation was not complete
and was seldom used.

Deprecate this for now and bring it back later if
it is really needed. This also in-turn seems to improve
performance slightly.
2018-08-02 23:09:42 +05:30
kannappanr
76ddf4d32f Log x-amz-request-id as log and XML error response (#6173)
Currently, requestid field in logEntry is not populated, as the
requestid field gets set at the very end.
It is now set before regular handler functions. This is also
useful in setting it as part of the XML error response.

Travis build for ppc64le has been quite inconsistent and stays queued
for most of the time. Removing this build as part of Travis.yml for
the time being.
2018-07-20 18:46:32 -07:00
Anis Elleuch
e8a008f5b5 Better validation of all config file fields (#6090)
Add Validate() to serverConfig to call it at server
startup and in Admin SetConfig handler to minimize
errors scenario after server restart.
2018-07-18 11:22:29 -07:00
Harshavardhana
371349787f Remove region requirement for Healing (#6031) 2018-06-08 17:54:57 -07:00
Nitish Tiwari
3dc13323e5 Use random host from among multiple hosts to create requests
Also use hosts passed to Minio startup command to populate IP
addresses if MINIO_PUBLIC_IPS is not set.
2018-06-08 10:22:01 -07:00
Harshavardhana
853ea371ce Bring etcd support for bucket DNS federation
- Supports centralized `config.json`
- Supports centralized `bucket` service records
  for client lookups
- implement a new proxy forwarder
2018-06-08 10:22:01 -07:00
Praveen raj Mani
c0cfe21c00 Ignore region in the case of admin API (#5919)
Admin API is not an S3 API and hence it is not required
to honor server region while validating admin API calls.

Fixes #2411
2018-06-07 10:37:31 -07:00
Harshavardhana
6138cae8e7 Persist MINIO_WORM as part of config.json (#6022) 2018-06-06 18:10:51 -07:00
Bala FA
6a53dd1701 Implement HTTP POST based RPC (#5840)
Added support for new RPC support using HTTP POST.  RPC's 
arguments and reply are Gob encoded and sent as HTTP 
request/response body.

This patch also removes Go RPC based implementation.
2018-06-06 14:21:56 +05:30
Anis Elleuch
9439dfef64 Use defer style to stop tickers to avoid current/possible misuse (#5883)
This commit ensures that all tickers are stopped using defer ticker.Stop()
style. This will also fix one bug seen when a client starts to listen to
event notifications and that case will result a leak in tickers.
2018-05-04 10:43:20 -07:00
Harshavardhana
954142a98f Cleanup and make a safer code (#5794) 2018-04-21 20:51:53 -07:00
Nitish Tiwari
638f01f9e4 Generalize loadConfig method to avoid reading from disk (#5819)
As we move to multiple config backends like local disk and etcd,
config file should not be read from the disk, instead the quick
package should load and verify for duplicate entries.
2018-04-13 15:14:19 -07:00
kannappanr
f8a3fd0c2a
Create logger package and rename errorIf to LogIf (#5678)
Removing message from error logging
Replace errors.Trace with LogIf
2018-04-05 15:04:40 -07:00
Krishna Srinivas
e452377b24 Add context to the object-interface methods.
Make necessary changes to xl fs azure sia
2018-03-15 16:28:25 -07:00
Harshavardhana
e4f6877c8b Handle incoming proxy requests ip, scheme (#5591)
This PR implements functions to get the right ip, scheme
from the incoming proxied requests.
2018-03-02 15:23:04 -08:00
Harshavardhana
fb96779a8a Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
2018-02-15 17:45:57 -08:00
Aditya Manthramurthy
018813b98f Fix configuration handling bugs: (#5473)
* Update the GetConfig admin API to use the latest version of
  configuration, along with fixes to the corresponding RPCs.
* Remove mutex inside the configuration struct, and inside
  notification struct.
* Use global config mutex where needed.
* Add `serverConfig.ConfigDiff()` that provides a more granular diff
  of what is different between two configurations.
2018-01-31 08:15:54 -08:00
Aditya Manthramurthy
5cdcc73bd5 Admin API auth and heal related fixes (#5445)
- Fetch region for auth from global state
- Fix SHA256 handling for empty body in heal API
2018-01-25 19:24:00 +05:30
Aditya Manthramurthy
254b05e314 Fix locking in some admin APIs: (#5438)
- read lock for get config
- write lock for update creds
- write lock for format file
2018-01-22 18:09:12 -08:00
Aditya Manthramurthy
a337ea4d11 Move admin APIs to new path and add redesigned heal APIs (#5351)
- Changes related to moving admin APIs
   - admin APIs now have an endpoint under /minio/admin
   - admin APIs are now versioned - a new API to server the version is
     added at "GET /minio/admin/version" and all API operations have the
     path prefix /minio/admin/v1/<operation>
   - new service stop API added
   - credentials change API is moved to /minio/admin/v1/config/credential
   - credentials change API and configuration get/set API now require TLS
     so that credentials are protected
   - all API requests now receive JSON
   - heal APIs are disabled as they will be changed substantially

- Heal API changes
   Heal API is now provided at a single endpoint with the ability for a
   client to start a heal sequence on all the data in the server, a
   single bucket, or under a prefix within a bucket.

   When a heal sequence is started, the server returns a unique token
   that needs to be used for subsequent 'status' requests to fetch heal
   results.

   On each status request from the client, the server returns heal result
   records that it has accumulated since the previous status request. The
   server accumulates upto 1000 records and pauses healing further
   objects until the client requests for status. If the client does not
   request any further records for a long time, the server aborts the
   heal sequence automatically.

   A heal result record is returned for each entity healed on the server,
   such as system metadata, object metadata, buckets and objects, and has
   information about the before and after states on each disk.

   A client may request to force restart a heal sequence - this causes
   the running heal sequence to be aborted at the next safe spot and
   starts a new heal sequence.
2018-01-22 14:54:55 -08:00
Andreas Auernhammer
3f09c17bfe fix authentication bypass against Admin-API (#5412)
This change fixes an authentication bypass attack against the
minio Admin-API. Therefore the Admin-API rejects now all types of
requests except valid signature V2 and signature V4 requests - this
includes signature V2/V4 pre-signed requests.

Fixes #5411
2018-01-17 10:36:25 -08:00
poornas
0bb6247056 Move nslocking from s3 layer to object layer (#5382)
Fixes #5350
2018-01-13 10:04:52 +05:30
Aditya Manthramurthy
cd22feecf8 Remove healing of incomplete multipart uploads (#5390)
Since the server performs automatic clean-up of multipart uploads that
have not been resumed for more than a couple of weeks, it was decided
to remove functionality to heal multipart uploads.
2018-01-11 15:07:43 -08:00
Aditya Manthramurthy
8e4eb591c1 Update error response when heal is not implemented (#5383) 2018-01-11 10:21:41 -08:00
Aditya Manthramurthy
f413224b24 Fix config set handler (#5384)
- Return error when the config JSON has duplicate keys (fixes #5286)

- Limit size of configuration file provided to 256KiB - this prevents
  another form of DoS
2018-01-11 12:36:36 +05:30
Harshavardhana
819d1e80c6 Add more delays on distributed startup for slow network (#5240)
Refer #5237
2017-12-16 08:25:29 -08:00
Krishna Srinivas
14e6c5ec08 Simplify the steps to make changes to config.json (#5186)
This change introduces following simplified steps to follow 
during config migration.

```
 // Steps to move from version N to version N+1
 // 1. Add new struct serverConfigVN+1 in config-versions.go
 // 2. Set configCurrentVersion to "N+1"
 // 3. Set serverConfigCurrent to serverConfigVN+1
 // 4. Add new migration function (ex. func migrateVNToVN+1()) in config-migrate.go
 // 5. Call migrateVNToVN+1() from migrateConfig() in config-migrate.go
 // 6. Make changes in config-current_test.go for any test change
```
2017-11-29 13:12:47 -08:00
Krishna Srinivas
e7a724de0d Virtual host style S3 requests (#5095) 2017-11-14 16:56:24 -08:00
Bala FA
32c6b62932 move credentials as separate package (#5115) 2017-10-31 11:54:32 -07:00
Aditya Manthramurthy
d23ded0d83 Use retryableStorage after healing format.json (#5105)
- Previously networkStorage was being used and this lead to errors
  when listing with a down server/disk

Fixes #5089
2017-10-26 09:52:23 -07:00
poornas
0d154871d5 Admin: Raise error if config and env credentials mismatch (#4870) 2017-09-07 11:16:13 -07:00
Frank Wessels
61e0b1454a Add support for timeouts for locks (#4377) 2017-08-31 14:43:59 -07:00
Harshavardhana
f346ca44f0 config: Avoid stale credentials in memory. (#4466) 2017-08-08 12:14:32 -07:00
Anis Elleuch
465274cd21 server-info: Change Error type to string (#4346)
Golang std error type doesn't marshal/unmarshal with json. So errors
are not actually being sent when a client calls ServerInfo() API.
2017-05-15 07:28:47 -07:00
Anis Elleuch
83abad0b37 admin: ServerInfo() returns info for each node (#4150)
ServerInfo() will gather information from all nodes before returning
it back to the client.
2017-04-21 07:15:53 -07:00
Krishnan Parthasarathi
ca64b86112 Return possible states a heal operation (#4045) 2017-04-14 10:28:35 -07:00
Harshavardhana
27749c2124 admin/info: Add HTTPStats value as part of serverInfo() struct. (#4049)
Remove our counter implementation instead use atomic external
package which supports more types and methods.
2017-04-06 23:08:33 -07:00
Krishnan Parthasarathi
2bd694dbc8 Add disksUnavailable healStatus const (#3990)
`disksUnavailable` healStatus constant indicates that a given object
needs healing but one or more of disks requiring heal are offline. This
can be used by admin heal API consumers to distinguish between a
successful heal and a no-op since the outdated disks were offline.
2017-03-31 17:55:15 -07:00
Krishnan Parthasarathi
c192e5c9b2 Implement heal-upload admin API (#3914)
This API is meant for administrative tools like mc-admin to heal an
ongoing multipart upload on a Minio server.  N B This set of admin
APIs apply only for Minio servers.

`github.com/minio/minio/pkg/madmin` provides a go SDK for this (and
other admin) operations.  Specifically,

  func HealUpload(bucket, object, uploadID string, dryRun bool) error

Sample admin API request:
POST
/?heal&bucket=mybucket&object=myobject&upload-id=myuploadID&dry-run
- Header(s): ["x-minio-operation"] = "upload"

Notes:
- bucket, object and upload-id are mandatory query parameters
- if dry-run is set, API returns success if all parameters passed are
  valid.
2017-03-17 09:25:49 -07:00
Bala FA
21d73a3eef Simplify credential usage. (#3893) 2017-03-16 00:16:06 -07:00
Krishnan Parthasarathi
051f9bb5c6 Implement list uploads heal admin API (#3885) 2017-03-16 00:15:06 -07:00
Bala FA
8a9852220d Make unit testable cert parsing functions. (#3863) 2017-03-08 19:20:01 -08:00
Anis Elleuch
cddc684559 admin: Set Config returns errSet and errMsg (#3822)
There is no way to see if a node encountered an error
when trying to set a new config set, this commit adds
a bool errSet field.
2017-03-03 02:53:48 -08:00
Krishnan Parthasarathi
c9619673fb Implement SetConfig admin API handler. (#3792) 2017-02-27 11:40:27 -08:00
Krishnan Parthasarathi
2745bf2f1f Implement ServerConfig admin REST API (#3741)
Returns a valid config.json of the setup. In case of distributed
setup, it checks if quorum or more number of nodes have the same
config.json.
2017-02-20 12:58:50 -08:00
Anis Elleuch
7f86a21317 admin: Add ServerInfo API() (#3743) 2017-02-15 10:45:45 -08:00
Krishnan Parthasarathi
ce9aa2f2b2 Add uptime to ServiceStatus (#3690) 2017-02-08 00:13:02 -08:00
Harshavardhana
31dff87903 Honor envs properly for access and secret key. (#3703)
Also changes the behavior of `secretKeyHash` which is
not necessary to be sent over the network, each node
has its own secretKeyHash to validate.

Fixes #3696
Partial(fix) #3700 (More changes needed with some code cleanup)
2017-02-07 12:51:43 -08:00
Krishnan Parthasarathi
0472e5c1e1 Change query param name to duration in list/clear locks API (#3664)
Following is a sample list lock API request schematic,

  /?lock&bucket=mybucket&prefix=myprefix&duration=holdDuration
  x-minio-operation: list

The response would contain the list of locks held on mybucket matching
myprefix for a duration longer than holdDuration.
2017-02-01 11:17:30 -08:00
Harshavardhana
85f2b74cfd jwt: Cache the bcrypt password hash. (#3526)
Creds don't require secretKeyHash to be calculated
everytime, cache it instead and re-use.

This is an optimization for bcrypt.

Relevant results from the benchmark done locally, negative
value means improvement in this scenario.

```
benchmark                       old ns/op     new ns/op     delta
BenchmarkAuthenticateNode-4     160590992     80125647      -50.11%
BenchmarkAuthenticateWeb-4      160556692     80432144      -49.90%

benchmark                       old allocs     new allocs     delta
BenchmarkAuthenticateNode-4     87             75             -13.79%
BenchmarkAuthenticateWeb-4      87             75             -13.79%

benchmark                       old bytes     new bytes     delta
BenchmarkAuthenticateNode-4     15222         9785          -35.72%
BenchmarkAuthenticateWeb-4      15222         9785          -35.72%
```
2017-01-26 16:51:51 -08:00
Krishnan Parthasarathi
0e693e0284 Add dry-run query param for HealFormat API (#3618) 2017-01-24 08:11:05 -08:00
Anis Elleuch
d1d89116f1 admin: Add version to service Status API response (#3605)
Add server's version field to service status API:

"version":{
	"version":"DEVELOPMENT.GOGET",
	"commitID":"DEVELOPMENT.GOGET"
}
2017-01-23 08:56:06 -08:00
Krishnan Parthasarathi
586058f079 Implement mgmt REST APIs to heal storage format. (#3604)
* Implement heal format REST API handler
* Implement admin peer rpc handler to re-initialize storage
* Implement HealFormat API in pkg/madmin
* Update pkg/madmin API.md to incl. HealFormat
* Added unit tests for ReInitDisks rpc handler and HealFormatHandler
2017-01-23 00:32:55 -08:00
Anis Elleuch
0715032598 heal: Add ListBucketsHeal object API (#3563)
ListBucketsHeal will list which buckets that need to be healed:
  * ListBucketsHeal() (buckets []BucketInfo, err error)
2017-01-19 09:34:18 -08:00
Anis Elleuch
f803bb4b3d admin: Add service Set Credentials API (#3580) 2017-01-17 14:25:59 -08:00
Krishnan Parthasarathi
c194b9f5f1 Implement mgmt REST APIs for heal subcommands (#3533)
The heal APIs supported in this change are,
- listing of objects to be healed.
- healing a bucket.
- healing an object.
2017-01-17 10:02:58 -08:00
Harshavardhana
caecd75a2a Deprecate and remove service stop API. (#3578)
Fixes #3570
2017-01-14 14:48:52 -08:00
Harshavardhana
926c75d0b5 api: Set appropriate content-type for success/error responses. (#3537)
Golang HTTP client automatically detects content-type but
for S3 clients this content-type might be incorrect or
might misbehave.

For example:

```
Content-Type: text/xml; charset=utf-8
```

Should be
```
Content-Type: application/xml
```

Allow this to be set properly.
2017-01-06 00:37:00 -08:00
Krishnan Parthasarathi
c8f57133a4 Implement list, clear locks REST API w/ pkg/madmin support (#3491)
* Filter lock info based on bucket, prefix and time since lock was held
* Implement list and clear locks REST API
* madmin: Add list and clear locks API
* locks: Clear locks matching bucket, prefix, relTime.
* Gather lock information across nodes for both list and clear locks admin REST API.
* docs: Add lock API to management APIs
2017-01-03 23:39:22 -08:00
Harshavardhana
f57f773189 admin: Add missing madmin examples and API docs. (#3483) 2016-12-20 18:49:48 -08:00
Harshavardhana
e7b4e4e105 admin: ServiceStatus() shouldn't have to write double http headers. (#3484)
Fixes #3482
2016-12-20 18:05:25 -08:00
Krishnan Parthasarathi
b2f920a868 Add service API handler stubs for status, stop and restart (#3417) 2016-12-15 22:26:15 -08:00