Git Repo - linux.git/log

]> Git Repo - linux.git/log

Ilya Dryomov [Thu, 28 Apr 2016 14:07:26 +0000 (16:07 +0200)]

libceph: request_init() and request_release_checks()

These are going to be used by request_reinit() code.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:26 +0000 (16:07 +0200)]

libceph: a major OSD client update

This is a major sync up, up to ~Jewel. The highlights are:

- per-session request trees (vs a global per-client tree)
- per-session locking (vs a global per-client rwlock)
- homeless OSD session
- no ad-hoc global per-client lists
- support for pool quotas
- foundation for watch/notify v2 support
- foundation for map check (pool deletion detection) support

The switchover is incomplete: lingering requests can be setup and
teared down but aren't ever reestablished. This functionality is
restored with the introduction of the new lingering infrastructure
(ceph_osd_linger_request, linger_work, etc) in a later commit.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:26 +0000 (16:07 +0200)]

libceph: protect osdc->osd_lru list with a spinlock

OSD client is getting moved from the big per-client lock to a set of
per-session locks. The big rwlock would only be held for read most of
the time, so a global osdc->osd_lru needs additional protection.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:25 +0000 (16:07 +0200)]

libceph: allocate ceph_osd with GFP_NOFAIL

create_osd() is called way too deep in the stack to be able to error
out in a sane way; a failing create_osd() just messes everything up.
The current req_notarget list solution is broken - the list is never
traversed as it's not entirely clear when to do it, I guess.

If we were to start traversing it at regular intervals and retrying
each request, we wouldn't be far off from what __GFP_NOFAIL is doing,
so allocate OSD sessions with __GFP_NOFAIL, at least until we come up
with a better fix.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:25 +0000 (16:07 +0200)]

libceph: osd_init() and osd_cleanup()

These are going to be used by homeless OSD sessions code.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:25 +0000 (16:07 +0200)]

libceph: handle_one_map()

Separate osdmap handling from decoding and iterating over a bag of maps
in a fresh MOSDMap message. This sets up the scene for the updated OSD
client.

Of particular importance here is the addition of pi->was_full, which
can be used to answer "did this pool go full -> not-full in this map?".
This is the key bit for supporting pool quotas.

We won't be able to downgrade map_sem for much longer, so drop
downgrade_write().

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Linus Torvalds [Wed, 25 May 2016 22:59:09 +0000 (15:59 -0700)]

Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs iov_iter regression fix from Al Viro:
"Fix for braino in 'fold checks into iterate_and_advance()'"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do "fold checks into iterate_and_advance()" right

commit | commitdiff | tree

Linus Torvalds [Wed, 25 May 2016 22:54:35 +0000 (15:54 -0700)]

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs xattr regression fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make xattr_resolve_handlers() safe to use with NULL ->s_xattr
xattr: Fail with -EINVAL for NULL attribute names

commit | commitdiff | tree

Linus Torvalds [Wed, 25 May 2016 22:38:56 +0000 (15:38 -0700)]

Merge tag 'acpi-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Additional ACPI update for v4.7-rc1

  Just one fix for incorrect async_synchronize_cookie() usage in the
  ACPI battery driver (Chris Wilson)"

* tag 'acpi-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / battery: Correctly serialise with the pending async probe

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:25 +0000 (16:07 +0200)]

libceph: allocate dummy osdmap in ceph_osdc_init()

This leads to a simpler osdmap handling code, particularly when dealing
with pi->was_full, which is introduced in a later commit.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:24 +0000 (16:07 +0200)]

libceph: schedule tick from ceph_osdc_init()

Both homeless OSD sessions and watch/notify v2, introduced in later
commits, require periodic ticks which don't depend on ->num_requests.
Schedule the initial tick from ceph_osdc_init() and reschedule from
handle_timeout() unconditionally.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:24 +0000 (16:07 +0200)]

libceph: move schedule_delayed_work() in ceph_osdc_init()

ceph_osdc_stop() isn't called if ceph_osdc_init() fails, so we end up
with handle_osds_timeout() running on invalid memory if any one of the
allocations fails. Call schedule_delayed_work() after everything is
setup, just before returning.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:24 +0000 (16:07 +0200)]

libceph: redo callbacks and factor out MOSDOpReply decoding

If you specify ACK | ONDISK and set ->r_unsafe_callback, both
->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
very confusing.  Redo this so that only one of them is called:

    ->r_unsafe_callback(true), on ack
    ->r_unsafe_callback(false), on commit

or

    ->r_callback, on ack|commit

Decode everything in decode_MOSDOpReply() to reduce clutter.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Thu, 28 Apr 2016 14:07:24 +0000 (16:07 +0200)]

libceph: drop msg argument from ceph_osdc_callback_t

finish_read(), its only user, uses it to get to hdr.data_len, which is
what ->r_result is set to on success. This gains us the ability to
safely call callbacks from contexts other than reply, e.g. map check.

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree

Ilya Dryomov [Wed, 25 May 2016 22:29:52 +0000 (00:29 +0200)]

libceph: switch to calc_target(), part 2

The crux of this is getting rid of ceph_osdc_build_request(), so that
MOSDOp can be encoded not before but after calc_target() calculates the
actual target.  Encoding now happens within ceph_osdc_start_request().

Also nuked is the accompanying bunch of pointers into the encoded
buffer that was used to update fields on each send - instead, the
entire front is re-encoded.  If we want to support target->name_len !=
base->name_len in the future, there is no other way, because oid is
surrounded by other fields in the encoded buffer.

Encoding OSD ops and adding data items to the request message were
mixed together in osd_req_encode_op().  While we want to re-encode OSD
ops, we don't want to add duplicate data items to the message when
resending, so all call to ceph_osdc_msg_data_add() are factored out
into a new setup_request_data().

Signed-off-by: Ilya Dryomov <[email protected]>

commit | commitdiff | tree